
This seems to be mostly ad hominem argument in the classic sense of the term: "You're wrong because there's something bad about you," rather than addressing any particular point. I don't see how questioning Google's motivations for uncovering this does anything to "set the record straight."

A useful post would have addressed at least 3 things:

- What are the specifics of the mechanism by which they ended up obviously copying google's results?

- Do they handle clicks from google differently than any other clicks?

- How different would their results be if they didn't use clicks from google's search as a signal?



I'm about 80% in the "Nothing fishy here" camp on this issue, but you have pointed out the 20% very well right here.

Obviously they won't explain the first one, just as Google won't explain the specifics of why StackOverflow scrapers sometimes outrank StackOverflow, and for good reason. Explaining the specifics of the mechanism would leave it wide open to spammers.

The second? Yeah, that's the million-dollar question. They should answer that. Definitely. It touches on the smelliest thing about the issue. If I had a search engine (I don't), and I wanted to copy Google's results (I wouldn't), and I had the ability to collect user click data, I could use that click data to create plausible deniability for the copying. This is exactly the sort of thing Microsoft has done in the past on other issues.

The third? It's certainly relevant, but I doubt Google would be willing to tell me how its results would be different if it didn't use a specific metric.


- What are the specifics of the mechanism by which they ended up obviously copying google's results? They said it was user clickstream data. The user types a search into Google, gets results back, and clicks a link; Microsoft gets sent the search term and the link clicked (see the sketch after this list).

- Do they handle clicks from google differently than any other clicks? Hopefully they do if they are using user clickstream data. Each domain should have something like a page rank to determine how trustworthy it is.

- How different would their results be if they didn't use clicks from google's search as a signal? Why would they not use Google? Google is a site on the internet where users click links, and Microsoft is collecting data from sites on the internet where users click links.
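
Roughly, the kind of record being described might look like this (a toy sketch with hypothetical field names, not the actual toolbar/IE payload): the browser reports the page the user was on and the link they clicked from it, and when that page happens to be a Google results page, the query is sitting right there in the URL.

    from dataclasses import dataclass

    # Hypothetical shape of a clickstream event (my names, nobody's real schema).
    @dataclass
    class ClickstreamEvent:
        page_url: str     # e.g. "https://www.google.com/search?q=some+query"
        clicked_url: str  # the result the user actually clicked
        timestamp: float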


> Hopefully they do if they are using user clickstream data. Each domain should have something like a page rank to determine how trustworthy it is.

If there's some algorithmic reason to believe that Google gives more trustworthy results, that's one thing, and the pagerank weight can be empirically determined. If, on the other hand, there's some point where they explicitly treat google clickthrough data any differently (if site=='google.com' weight+=10), then that seems to have crossed a line.

I'm trying and failing not to have this sound like some kind of koan, but if you treat everyone differently in the same way, then that's (arguably) fine. It's only if Google gets some specific attention that it seems more malicious.
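
To make the distinction concrete, here's a toy sketch (hypothetical names and numbers, nobody's actual code): the first function treats every domain the same way and lets the data decide; the second singles Google out.

    DEFAULT_WEIGHT = 1.0

    # "Empirical" version: every click source gets a per-domain trust weight
    # learned from data; google.com ends up with whatever weight the data supports.
    def empirical_weight(domain, learned_weights):
        return learned_weights.get(domain, DEFAULT_WEIGHT)

    # "Crossed a line" version: one competitor's domain is explicitly special-cased.
    def special_cased_weight(domain, learned_weights):
        weight = learned_weights.get(domain, DEFAULT_WEIGHT)
        if domain == "google.com":
            weight += 10  # hardcoded boost aimed specifically at Google's results
        return weight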


What if that weight were empirically determined by an algorithm instead of by a human? Would that make it OK, or should Bing hardcode a demotion for the Google-based data source?


Did we read the same article? I like Google and crew more than Microsoft, but this was pretty clear to me:

> Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Beyond that, it's kind of crazy to think they'd open up on nitty-gritty details about their algorithm - nobody does that.

Anyways, I'm still with Google and hope they win. But attacking Bing here was probably a tactical mistake - pretty much all marketing thought ever says "Don't attack-market against upstarts if you're the market leader!" You can't win that fight when you're #1, and Google is #1.


"- How different would their results be if they didn't use clicks from google's search as a signal?"

That's the important one. These Bing people are happy to run their mouths about 1k signals blah blah blah. Who gives a fuck? What is the weight on those signals?

Further, this case was made obvious because Bing couldn't create an answer to a query, so they copied Google's answer. Dude admits as much. Why should we give Bing a pass on copying Google's results just because in this instance it was too hard to find their own?


The weight on keywords that exist nowhere else on the internet other than the fake searches Google injected?

Well, if there's one source of data, that data is weighted at 100%.

The whole experiment is meaningless.
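
To put numbers on that (a toy weighted-sum ranker with made-up weights, not anyone's actual formula): even a 1% weight decides the outcome when every other signal is zero, which is exactly the situation for a word that exists nowhere else.

    # Made-up weights; clickstream deliberately tiny.
    weights = {"links": 0.60, "anchor_text": 0.39, "clickstream": 0.01}

    def score(signals):
        return sum(weights[name] * value for name, value in signals.items())

    # Normal query: the other signals exist and dominate.
    print(score({"links": 0.9, "anchor_text": 0.7, "clickstream": 0.2}))  # ~0.815

    # Honeypot term: clickstream is the only non-zero signal, so it alone
    # decides the result no matter how small its nominal weight is.
    print(score({"links": 0.0, "anchor_text": 0.0, "clickstream": 0.2}))  # 0.002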


Have the Bing team expose their algorithms and criteria to the public so they can settle a spat with Google over 7/100 made up words appearing in a set of search results?

That sounds reasonable.


>over 7/100 made up words appearing in a set of search results

Again, that's not the allegation, that's the evidence. The allegation is that bing is using data that essentially amounts to a wholesale copying of google's results.


You missed the point I was making, which is that asking a company to fork over expensive information that a competitor like Google would kill for, in the name of answering a question about 7 queries, is disproportionate.


Forking over data is not required. Making a clear public statement would suffice.


Expensive information? Please. I'm not asking for all the weights, I'm asking for 1 of the 1,000. If it's tiny, then sharing it should be inconsequential. And no bullshit about competitors or spammers, please -- it's not as if anyone is unaware that ranking well in Google's organic results is good :rolleyes:


What would be the value in knowing that weight without knowing what it's relative to?


I thought it went without saying that the coefficient should be taken from a weight vector normalized to unit length, or that some other indication of relative magnitude should be given.


I'd prefer the expectation (over typical queries) of the weighted variance of the feature, divided by the total weighted variance of all features (most likely sqrt(variance) instead, but whatever power you prefer).
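
Spelled out in symbols (my notation, not anything either company has published): with weight w_i on feature f_i and Q the distribution of typical queries, that importance share for feature i would be something like

    I_i = \frac{\mathbb{E}_{q \sim Q}\left[\operatorname{Var}(w_i f_i \mid q)\right]}{\sum_j \mathbb{E}_{q \sim Q}\left[\operatorname{Var}(w_j f_j \mid q)\right]}

or with each variance replaced by its square root, per the parenthetical.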


>"What is the weight on those signals."

Low enough that it took Google nearly three engineers per successfully injected honeypot (7 honeypots for 20 engineers), and that Google was only able to achieve about a 7% success rate (7 of the 100 injected terms) despite their extensive in-house knowledge of SEO.


I don't understand what you're trying to say here. The Google blog post says that they inserted 100 honeypots into Google. Then it says "within a couple weeks of starting this experiment, our inserted results started appearing in Bing."

Where are you getting the 7% number?


The more detailed searchengineland.com post mentioned that only 7-9 of the 100 nonsense terms they tried to inject showed up.



Three searches is nothing.


I've seen this 1k signals BS repeated over and over by HN commenters, as if somehow copying from 999 other guys makes this a nonproblem.


so maybe you should stop using all search engines, because they by definition do not create any data and just copy it from other people.


I wouldn't say it was obvious, mainly because this was something Google did specifically to cause this. I would imagine you could replicate the same results with other major search engines simply by doing the same thing with the same amount of volume as the Google dudes did.

Additionally, if things were reversed and Google were posed these questions, I would imagine they would be lacking just as many answers as Bing is supplying.


> I wouldn't say it was obvious, mainly because this was something Google did specifically to cause this

There is a subtle point that many posts like this are overlooking. Google didn't run this experiment to cause bing to show bogus results; they did it to confirm the rise in suspiciously similar results produced by bing.


  Google didn't run this experiment to cause bing to show bogus results, they did it to confirm...
There is a counterargument to that. If Bing's claims are true, then Google didn't run the experiment to confirm the results; they ran it to cause them.

I think 99% of HN, including myself, will share your opinion, but it doesn't counter Bing's claims because it boils down to "which company do we believe?"


Google: we found that bing is copying us. To prove it, we ran this experiment.

Bing: doh, of course we use google results, but we told you that before in an obscure academic paper. Didn't you read that? And we are innovative - shiny pictures!


I don't understand which of Bing's claims would make it so that they were only copying results because Google did this.


> Additionally, if things were reversed and Google were posed these questions, I would imagine they would be lacking just as many answers as Bing is supplying.

That's like saying you'd be evasive like OJ Simpson if you murdered a few people and were on trial.

So if Google used Bing to rank searches, it would also be under scrutiny? Since that's not the case, we can all avoid the useless thought experiment, can't we?


Google uses click data in their ranking algorithm. It's a fact. Pick a low-volume, low-competition niche, have 30+ of your friends search for the keyword and page through the results until they find a particular result (the same one for everyone), and have everyone click through and not bounce. You will improve that site's rankings.

Chrome tracks clicks and traffic. Google Toolbar tracks clicks and traffic. The difference in this situation is that the majority of people use IE to search on Google, not the Google Toolbar/Chrome to search on Bing, so Bing has more data to work from than Google does for this exact kind of click tracking.
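
For what it's worth, here's a toy sketch of the kind of click-and-don't-bounce signal described above (made-up names and threshold, not either company's actual pipeline): clicks with enough dwell time count as "satisfied", and the satisfied-click rate becomes a per-(query, url) boost.

    from collections import defaultdict

    def click_boost(click_log, min_dwell_seconds=30):
        clicks = defaultdict(int)
        satisfied = defaultdict(int)
        for query, url, dwell_seconds in click_log:
            clicks[(query, url)] += 1
            if dwell_seconds >= min_dwell_seconds:  # the user did not bounce straight back
                satisfied[(query, url)] += 1
        return {key: satisfied[key] / clicks[key] for key in clicks}

    # With 30+ coordinated searchers all clicking the same result and sticking
    # around, that result's boost for the niche keyword approaches 1.0.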



