Could Google's 'dataspaces' reshape search?

'Dataspaces' concept, which stems from the work of Google researcher Alon Halevy, could take search technology and content processing to another level, analyst claims

Google -- the company most identified with Web search -- is not the leading player behind the firewall. It claims about 9,000 customers for its enterprise search products, while independent search vendor Autonomy says it has 17,000.

Still, in his recent report "Beyond Search," for Gilbane Group, analyst Stephen Arnold portrays the company as a quietly humming engine of activity, with work under way that could "leapfrog" the current generation of search technology.

Arnold, who closely tracks Google's patent applications, is especially interested in a concept called "dataspaces," which stems from the work of Google researcher Alon Halevy. Dataspaces, in Arnold's view, take "content processing into a new dimension."

"A dataspace should contain all of the information relevant to a particular organization regardless of its format and location, and model a rich collection of relationships between data repositories," Halevy wrote along with two co-authors in a December 2005 paper. "Hence, we model a dataspace as a set of participants and relationships.

"The participants in a dataspace are the individual data sources: they can be relational databases, XML repositories, text databases, Web services and software packages," the paper states at another point. "A dataspace should be able to model any kind of relationship between two (or more) participants."
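The model the paper describes, a set of participants plus the relationships among them, can be illustrated with a few lines of Python. This is only a minimal sketch of the idea as quoted above, not Google's or Halevy's implementation; the class names, fields and sample data sources ("crm," "contracts," "wiki") are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """One data source in the dataspace (hypothetical fields)."""
    name: str
    kind: str  # e.g. "relational database", "XML repository", "Web service"


@dataclass(frozen=True)
class Relationship:
    """A labeled link between two or more participants."""
    label: str
    members: frozenset


@dataclass
class Dataspace:
    """A set of participants and the relationships among them."""
    participants: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_participant(self, name, kind):
        participant = Participant(name, kind)
        self.participants[name] = participant
        return participant

    def relate(self, label, *names):
        # A relationship may span two or more distinct participants.
        if len(set(names)) < 2:
            raise ValueError("a relationship needs at least two participants")
        unknown = [n for n in names if n not in self.participants]
        if unknown:
            raise KeyError(f"unknown participants: {unknown}")
        relationship = Relationship(label, frozenset(names))
        self.relationships.append(relationship)
        return relationship

    def related_to(self, name):
        """Sorted names of participants sharing a relationship with `name`."""
        found = set()
        for relationship in self.relationships:
            if name in relationship.members:
                found |= relationship.members - {name}
        return sorted(found)


# Illustrative data only: three made-up sources inside one organization.
space = Dataspace()
space.add_participant("crm", "relational database")
space.add_participant("contracts", "XML repository")
space.add_participant("wiki", "text database")
space.relate("same customer records", "crm", "contracts")
space.relate("documents the schema of", "wiki", "crm", "contracts")
print(space.related_to("crm"))
```

The sources keep their own formats and locations; the dataspace records only that they exist and how they relate, which is the distinction the paper draws from a single unified database.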

While other vendors are pursuing similar goals, they cannot compete on scale with Google, according to Arnold.

"Even the most robust content processing systems have not been engineered to handle Google-level content flows. The implication of scale means Google is operating largely without competition from the companies profiled in this study," he wrote in "Beyond Search."

Google does appear to have ambitious search and content-processing projects in the patent pipeline that echo the dataspaces concept.

One in particular, U.S. Patent Application No. 20070198481, "Automatic Object Reference Identification and Linking in a Browseable Fact Repository," describes an invention that crunches together a wide range of data on an individual or topic into a kind of dossier.

Google declined to comment on patent applications or make Halevy available for an interview.

"We file patent applications on a variety of ideas that our employees come up with," a company spokesman said via e-mail. "Some of those ideas later mature into real products or services, some don't."

But a company executive was willing to describe the company's approach to search in general terms.

"Inside an enterprise, and maybe unlike the Internet, you can know a lot about a user," such as who they report to, said Matthew Glotzbach, director of product management for Google's enterprise division. "There's a lot of empirical information you can derive. All of that can be used to create a very, very rich profile about the user, which can then be used to create a really rich search experience."

Do not expect Google to suddenly bring a game-changing product to market, according to Glotzbach.

"The model is not these kind of big-bang approaches where we work for multiple years and then roll something out. In terms of what we do in enterprise search, you'll see a constant flow, as opposed to one sort of big bang -- here's a whole new thing," he said.
