The Google supercomputer is changing how we think about Internet-scale software.
The gigabyte slice of the Google file system available to Gmail beta testers will, in many cases, surpass the testers’ own corporate disk quotas for e-mail. And that only scratches the surface. Think about the possibilities for controlling spam, or streamlining file transfer, or mapping social networks, when e-mail travels within Gmail rather than across the Internet. If you join massive horsepower to vast data, amazing things will happen.
Now consider the PC. Although puny next to Google, today’s PCs are supercomputers compared with their ancestors of 15 years ago. Yet it is connectivity, not horsepower and data, that distinguishes the modern PC from its 1989 grandparent. We don’t push these powerful beasts nearly as hard as they’re capable of going. That underutilization caused sleeplessness in Seattle and led to Microsoft’s “Longhorn wave.”
Historically we’ve relied on fancier user interfaces to soak up spare client-side cycles, and that trend continues with Longhorn’s 3D-intensive Avalon. Longhorn also aims to create a new breed of applications that will produce and consume systemwide metadata. I applaud the goals, but there’s more to do. Imagine that Google, rather than Microsoft, controlled the desktop. Job No. 1 for the Google PC would be to vacuum up all available sources of data. Job No. 2 would be to exploit that data to the hilt.
On the Google PC, you wouldn’t need third-party add-ons to index and search your local files, e-mail, and instant messages. It would just happen. The voracious spider wouldn’t stop there, though. The next piece of low-hanging fruit would be the Web pages you visit. These too would be stored, indexed, and made searchable. More ambitiously, the spider would record all your screen activity along with the underlying event streams. Even more ambitiously, it would record phone conversations, convert speech to text, and index that text. Although speech-to-text is a notoriously imperfect art, even imperfect results can support useful search.
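To make the indexing idea concrete, here is a minimal sketch of the kind of inverted index such a spider might build. Everything in it is an assumption for illustration: the `LocalIndex` class, the `tokenize` helper, and the item ids such as `mail:1` are hypothetical names, and a real desktop indexer would also need persistence, ranking, and file-format parsers.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """Toy inverted index: maps each word to the ids of items containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, item_id, text):
        """Index one item (a file, e-mail, IM, or Web page) under its id."""
        for word in tokenize(text):
            self.postings[word].add(item_id)

    def search(self, query):
        """Return the ids of items that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


# Hypothetical items drawn from different sources on the same PC.
index = LocalIndex()
index.add("mail:1", "Quarterly budget spreadsheet attached")
index.add("im:7", "did you see the budget numbers?")
index.add("web:3", "Longhorn preview: Avalon and WinFS")

print(sorted(index.search("budget")))
print(sorted(index.search("budget spreadsheet")))
```

The point of the single index is that one query spans e-mail, chat, and visited pages at once, which is what separate per-application search tools cannot do.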
Here are some of the ways the Google PC could exploit this data:
Bayesian categorization: My SpamBayes-enhanced e-mail program learns continuously about what I do and don’t find interesting, and helps me organize messages accordingly. A systemwide agent that’s always building categorized views of all your content would be a great way to burn idle CPU cycles.
Context reassembly: When writing a report, you’re likely to refer to a spreadsheet, visit some Web pages, and engage in an IM chat. Using its indexed and searchable event stream, the system would restore this context when you later read or edited the document. Think browser history on steroids.
Screen pops: When you receive an e-mail, IM, or phone call, the history of your interaction with that person would pop up on your screen. The message itself could be used to automatically refine the query.
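As a rough illustration of the Bayesian categorization item above, the following sketch trains a tiny naive Bayes classifier on labeled snippets and then categorizes new text. It is a textbook multinomial naive Bayes with add-one smoothing, not the SpamBayes project's own scoring method, and the class name, category labels, and training sentences are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCategorizer:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training items seen
        self.vocabulary = set()

    def train(self, category, text):
        """Learn from one item the user has already filed under a category."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Work in log space so that many small probabilities do not underflow.
            score = math.log(n_docs / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data; a real agent would learn from the user's filing habits.
agent = NaiveBayesCategorizer()
agent.train("interesting", "budget review meeting moved to friday")
agent.train("interesting", "draft report and spreadsheet for review")
agent.train("uninteresting", "win a free prize click now")
agent.train("uninteresting", "free offer click here now")

print(agent.classify("please review the spreadsheet"))
print(agent.classify("click now for a free prize"))
```

Because training is just counting, an agent like this could keep learning in the background from every item the user files, which fits the idea of spending idle cycles on classification.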
With managed metadata, these things are easy to do, and that’s a key motivation for Longhorn’s WinFS storage system. But we don’t have a lot of metadata now, and we won’t have much anytime soon. So it’s worth reflecting on what Google has accomplished by brute force. Instead of sitting idle most of the time, our PCs ought to be indexing, analyzing, correlating, and classifying.