Is BrowseRank The New PageRank?
I've been on a research paper reading binge recently. I've got about 10 or so under my belt now in just the last couple of weeks. I've discovered they make great reading on my train ride to work. Relatively short and to the point. Sure they're often full of crazy math formulas, but those are easy to gloss over and instead concentrate on the discussion. Many of the papers were written years ago. Despite their age, the information is… ummm you know… informative. I mean that. My most recent reading, BrowseRank: Letting Web Users Vote for Page Importance, is actually from 2008 which makes it both informative and relevant to future SEO efforts.
BrowseRank is a measure developed by Microsoft with the purpose of outperforming Google's PageRank. Such an effort shouldn't come as a surprise, since there's always a better mousetrap to build. Even Google has acknowledged that PageRank is no longer as effective as it once was, given how much they've devalued it when ranking search results. More specifically, BrowseRank set out to address two shortcomings with PageRank:
- PageRank relies on the link graph, but this link graph is easily manipulated by spammers who automate the creation of pages and links in numbers that are almost unimaginable. Of course these links are worthless for determining true page quality or value.
- PageRank ignores the time spent on a page. That is, if a user spends 5 minutes reading a page, that page is likely more valuable than one that takes just 10 seconds to read. PageRank is primarily concerned with where a user ends up via clicking on links, but it completely misses out on the feedback available by time spent on a page.
Using a dataset of 950 million unique URLs and 3 billion data points consisting of URL visited, time visited, and whether the visit originated from a link, the Microsoft researchers determined that BrowseRank outperformed both PageRank and TrustRank. This improvement was seen at both the website-level and the page-level. That's pretty impressive given that just three metrics were used. Improvement, in this case, means a higher ranking for pages that users actually find engaging such as MySpace, Facebook, and YouTube rather than pages that are just linked to a lot such as Adobe (e.g. to download Acrobat Reader).
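To make the idea concrete, here's a toy sketch of a BrowseRank-style calculation. This is not the paper's actual algorithm or dataset — the click counts and stay times below are made-up numbers — but it shows the core intuition: model browsing as a random process over pages, then weight each page's visit frequency by how long users actually stay there, the signal PageRank throws away.

```python
import numpy as np

# Hypothetical transition counts between 4 pages, as might be built from
# click logs: row i, column j = observed clicks from page i to page j.
clicks = np.array([
    [0, 3, 1, 0],
    [2, 0, 4, 1],
    [1, 2, 0, 3],
    [0, 1, 2, 0],
], dtype=float)

# Hypothetical average stay time (seconds) per page.
stay_time = np.array([5.0, 120.0, 45.0, 10.0])

# Row-normalize the click counts into a transition matrix.
P = clicks / clicks.sum(axis=1, keepdims=True)

# Power iteration for the stationary distribution of the click chain --
# this part is roughly what plain PageRank measures.
pi = np.full(len(P), 1.0 / len(P))
for _ in range(1000):
    pi = pi @ P

# Weight visit frequency by mean stay time and renormalize: pages users
# linger on get boosted over pages that are merely clicked through.
browse_rank = pi * stay_time
browse_rank /= browse_rank.sum()

print(browse_rank)
```

In this toy run, page 1 (with its 120-second average stay) ends up dominating the ranking even though it isn't the most-clicked page — the same effect that lifts engaging sites like MySpace or Facebook above pages that are merely linked to a lot.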
In addition, BrowseRank was more effective at filtering spam because it doesn't rely on easy-to-manipulate link data (critical to PageRank) or a potentially faulty seed list (critical to TrustRank).
I think this idea of user feedback is already in play at search engines. You can see strong evidence of this just by considering what might come of the data already being collected by browser toolbars. And acquisitions of social media sites, which provide strong, real-time signals of user interest, are surely not just plays for additional advertising revenue. More tangible evidence is Google's almost immediate ranking of hot submissions on social bookmarking sites, which wouldn't happen as fast if PageRank had to be recalculated first.
Can you imagine the world of SEO where links had no direct ranking value? It'd be like every link on the web suddenly had a nofollow slapped on it. Links would then return to their original purpose of providing a jump point for additional information. An equally important question to ask is whether your business is capable of surviving such a change.
Well, in the paper they state that they had to get users to agree to let their browsing data be used this way. The big question is privacy: you need to let the search company know where you've been going. I'd also be surprised if Google doesn't already look at where people click to improve its own search results. However, another fundamental question this raises is: don't you just end up promoting pages that are already popular, ad infinitum?! If a new and more relevant site comes along, it could easily be swamped by more popular sites despite its relevance. There can, however, be no doubt that analysing user behaviour rather than simply links must be a better approach.
You could also argue that incorporating Digg/StumbleUpon results into things partly DOES look at behaviour, BUT the problem is that this can be messed around with. What company worth its salt wouldn't get all their employees to Digg their site?!
It is really sad that PageRank ignores the time spent on a page and just concentrates on link clicks. On the other hand, BrowseRank counts visits made by users and time spent on a page, but it will help online gaming sites, entertainment sites, video sites, and eventually porn sites get a higher rank. And even if it's better, who knows about BrowseRank and how to check it? Right now BrowseRank carries no importance on the net. I think Google came up with one nice solution called SearchWiki; after it gets some popularity it's going to change the behaviour of a lot of things, including PageRank.
If on-page optimisation and content were weighted higher in the ranking algorithm, then I think some spam would be reduced.
A lesser dependency on the volume of links and more emphasis on quality.
Recognition of social interaction on the various social media sites — genuine users providing value for their followers, friends, or contacts.
All search engines are far from offering the best results possible, but at least they are trying; until then, results will be manipulated.
I would love to see Google's internal PageRank for sites and how it varies from the published PageRank.
What should a webmaster do in such a situation to remain visible on the search engines?
I guess that explains why MSN produces great search results? Seriously though, consider the potential impact of user feedback. Take a retail site: will a site offering more products perform better from a user-feedback point of view? Surely the more products they have, the longer the user will remain on the site? Will a retail site offering cheaper products perform better, because surely high prices = higher bounce rates? This cannot happen, as these factors are not necessarily indicators of relevancy. Therefore, these factors MAY play a role, but they are unlikely to be a significant ranking factor anytime in the near future.
The time-spent-on-page metric is not always reliable and can give very mixed signals. A searcher looking for something very specific might find it very quickly on the most relevant and well-structured site, and therefore leave quickly too — fully satisfied.
Voting for page importance has seen some attention at the ACM in the last few years, and rightly so. It's a real pain, because each vote doesn't have the same weight: if an expert votes and a layman votes, the expert's vote carries more weight. There's also a whole host of other issues with a method such as BrowseRank. PageRank can't really give much meaningful information on the authority or quality of a page — we've known this for a long time — but it does have its place.
There are some big issues around social media, the blogosphere and other things like that because you can't use the same techniques as for search. BrowseRank suffers from noise, same as the other ranking methods for social networks right now, like Digg.
I have no issue with user feedback when it's in a personalised environment but to calculate the authority of a page on a global scale, I don't think it's the way to go. It is a damn good paper though, and without stuff like this we'd never make any progress because there would never be any discussion.
Thanks for the post, it's nice to see more computing information and you summarized it beautifully.
Well, Google and Yahoo both have a variety of methods involving behavioral metrics... this MS offering isn't really a new thing from what I have read on it — Google has a wide variety (nameless) while Yahoo has methods such as personalized PageRank and others. My point being that it is prevalent across the big 3.
Now, there is also plenty of evidence/discussion that these types of signals can be noisy when used outside of the personalized setting and I have to imagine they ultimately get used in concert with other signals (links, historical factors, on page yada yada) more so than a standalone system. Just imagine the social spamming that would take place if open/public behavioral factors had a large stake in ranking processes.
This paper made the rounds with IR geeks back when it came out, and I didn't see anything earth-shakingly new (compared to methods at G and Y)... much like FriendRank... a noisy signal that is unlikely to be a large factor in the near future... far more likely a secondary ranking signal IMHO...
By the way, the supposition of social sites 'ranking' pages needs to consider historical ranking signals and the whole 'query deserves freshness' stuff...
Good post tho... always like the geeky stuff, just not the 'hype' of a new world; we've been seeing that for years and it never pans out. This is but a hint of new signals to come... not a paradigm shift...