Google Overhauls Web Indexing With ‘Caffeine’

Google announced the launch of a new search index called “Caffeine” yesterday. Google promises that the new index will offer fresher results, indexing new content more quickly. In fact, it states that Caffeine offers 50% fresher results than its last index.

On its blog, Google described Caffeine in simple terms. Vanessa Fox, an expert at Search Engine Land, explains it in more detail. She said: “Previously, Google’s crawling and indexing systems worked as batch processes. Googlebot would crawl a set of pages, then process those pages (extracting content from them, associating data about them, such as anchor text and external links, determining what those pages were about), and finally add them to the index.”

“While this system was continuous, all the documents in the batch had to wait until the whole batch was processed to be pushed live. Now, when Google crawls a page, it processes that page through the entire indexing pipeline and pushes it live nearly instantly. This change has already resulted in a 50 percent fresher index than before,” she added.
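The difference Fox describes can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not Google's implementation: the class names, the `process` function and the example URL are all hypothetical, and "processing" is reduced to splitting a page into words.

```python
from collections import defaultdict


def process(page_text):
    """Hypothetical stand-in for the indexing pipeline: split a page into lowercase terms."""
    return set(page_text.lower().split())


class BatchIndexer:
    """Pages become searchable only after the whole batch has been processed."""

    def __init__(self):
        self.live = defaultdict(set)   # term -> URLs visible to searchers
        self.pending = []              # crawled pages waiting for the batch push

    def crawl(self, url, text):
        # Crawled pages are queued; nothing goes live yet.
        self.pending.append((url, text))

    def push_batch(self):
        # Process every queued page, then make the whole batch searchable.
        for url, text in self.pending:
            for term in process(text):
                self.live[term].add(url)
        self.pending.clear()

    def search(self, term):
        return sorted(self.live.get(term, set()))


class IncrementalIndexer:
    """Each page is processed and pushed live as soon as it is crawled."""

    def __init__(self):
        self.live = defaultdict(set)

    def crawl(self, url, text):
        for term in process(text):
            self.live[term].add(url)

    def search(self, term):
        return sorted(self.live.get(term, set()))


if __name__ == "__main__":
    batch, incremental = BatchIndexer(), IncrementalIndexer()
    for indexer in (batch, incremental):
        indexer.crawl("example.com/breaking", "breaking news story")

    print(batch.search("breaking"))        # [] until push_batch() runs
    print(incremental.search("breaking"))  # ['example.com/breaking']
    batch.push_batch()
    print(batch.search("breaking"))        # ['example.com/breaking']
```

In the batch model a freshly crawled page stays invisible until the next push, which is the delay Caffeine is said to remove.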

“Caffeine allows us to process data on the order of 100 petabytes,” Google evangelist Matt Cutts said Tuesday at the Search Marketing Expo in Seattle. “What is a petabyte?”

“A petabyte is 1,024 terabytes – so, more than a million gigabytes. And there’s 100 petabytes of information, that scale of information, going into the (Caffeine technology) that we’re processing. So, it’s a lot more data, it allows a lot of flexibility, but fundamentally the change is that as soon as an object gets crawled, boom – it can get indexed.”
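For readers who want to check the arithmetic, here is a quick sketch in Python. It assumes the binary convention used in the quote, with the same factor of 1,024 between terabytes and gigabytes; the constant and function names are illustrative only.

```python
# Binary convention, matching the "1,024 terabytes" figure in the quote.
TB_PER_PB = 1024
GB_PER_TB = 1024  # assumption: the same 1,024 factor applies one step down


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using binary multiples."""
    return petabytes * TB_PER_PB * GB_PER_TB


print(petabytes_to_gigabytes(1))    # 1048576
print(petabytes_to_gigabytes(100))  # 104857600
```

Under that assumption, one petabyte is 1,048,576 gigabytes, and the 100 petabytes cited for Caffeine come to roughly 105 million gigabytes.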

Google has deployed a near-instantaneous crawling/indexing system. Caffeine could be big for, say, news websites that post breaking information. If it’s not already, Google could become the place to go – instead of, perhaps, Bing – to learn more about breaking news events.

Google could become that destination perhaps even more than Twitter, the Internet’s premier real-time communications service. In related news, Bing on Wednesday announced the upcoming launch of bing.com/social, which will allow users to search or browse topics and popular links on Twitter and Facebook.

So what does Caffeine mean for Bing? Well, it’s not entirely clear yet. Because Caffeine is an under-the-hood update to Google, people might not even appreciate it.

It’s about the search war in general: if over time people like an engine, they’ll use it. Google’s mantra all along has been to help people find information quickly. Meanwhile, Bing senior vice president Yusuf Mehdi said Wednesday that people like Bing because of the visual orientation of results.

Which do you want in a search engine: speed or clarity? And which search engines deliver what you want? [Google Blog via Silicon Republic]
