InfoWorld
STRATEGIC DEVELOPER  

Accessing the web of databases

A world of possibilities is revealed when you view the Web as a network of interconnected data

By Jon Udell  
May 03, 2006
 

I've just posted the fourth installment in my new series of Friday podcasts. It’s an interview with Kingsley Idehen, CEO of OpenLink Software. OpenLink’s flagship product is a universal database and application server, Virtuoso, which I last wrote about in 2003.

I convened the interview mainly to discuss Virtuoso’s recent transition to open source, but our wide-ranging conversation helped me clarify a theme that has been central to my own work and that will dominate the next phase of the Internet’s evolution. The Web is becoming a database -- or, more precisely, a network of databases. All of the trends that inform this column -- including Web services, REST (Representational State Transfer), AJAX (Asynchronous JavaScript and XML), and interpersonal as well as interprocess collaboration -- can be usefully refracted through that lens.

I’ve always regarded the Web as a programmable data source as well as a platform for the document/software hybrid that we call a Web page. Early on, programmable access to Web data entailed a lot of screen scraping. Nowadays it often still does, but it’s becoming common to find APIs that serve up the Web’s data. If you want to remix the InfoWorld metadata explorer, for example, as Mike Parsons did, you can fetch its data directly as XML.

Free text search is an even more popular access API. Nearly every site provides that service, or outsources it to Google or another engine.

And, of course, sites that act as database front ends support canned queries, the results of which may (if you’re lucky) be accessible by way of APIs such as RSS.
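
To make the idea of RSS as a query-result API concrete, here is a minimal sketch in Python. It assumes the canned query returns a plain RSS 2.0 document; the sample feed, its URLs, and the function name `rss_items` are hypothetical stand-ins, not the output of any real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload, standing in for the result of a canned query.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Query results</title>
    <item>
      <title>First result</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def rss_items(rss_text):
    """Return a (title, link) pair for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


# Each result becomes a simple record that other software can reuse.
for title, link in rss_items(SAMPLE_RSS):
    print(title, link)
```

In practice the text would come from an HTTP request to the site's feed URL rather than from an inline string.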

What you can’t typically do, though, is create mashups by running ad hoc queries against remote Web data. There are good reasons to think that it’s just crazy to export open-ended query interfaces over the Web. No responsible enterprise DBA would permit such access to the crown jewels. But there are all kinds of data sources -- or what Idehen likes to call data spaces -- and a range of feasible and appropriate access modes.

Consider the data space that is my blog. I maintain the data as XML and provide open-ended query access by way of XPath. Want to extract the set of Python code fragments from my corpus? Be my guest; it’s just a query on the URL line. Want to repurpose that data? Go for it -- the output of that query is well-formed XHTML that displays in the browser but is also software-friendly.
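
A query of that kind might look roughly like the following Python sketch. The markup is an assumption made for illustration: the column does not describe how the blog's XML is structured, so the `<blog>`, `<item>`, and `<pre class="python">` elements are hypothetical. The sketch also uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: the real structure of the blog's XML is not given.
CORPUS = """<blog>
  <item id="a1">
    <body>
      <p>Some prose.</p>
      <pre class="python">print("hello")</pre>
    </body>
  </item>
  <item id="a2">
    <body>
      <pre class="perl">print "hi";</pre>
      <pre class="python">x = 1 + 1</pre>
    </body>
  </item>
</blog>"""


def python_fragments(xml_text):
    """Return the text of every <pre class="python"> element, in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree supports attribute predicates such as [@class='python'].
    return [pre.text for pre in root.findall(".//pre[@class='python']")]


for fragment in python_fragments(CORPUS):
    print(fragment)
```

Because the query returns ordinary elements, the same result can be serialized back out as markup for a browser or handed to another program.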

If you’re clever, you can probably write an XPath query that will stall or crash my service. If you do, one minor node of an emerging network of Web databases will drop off the grid until I notice the problem and restart it. But it won’t ruin your day or mine. And as we gain more experience with these modes of access, we’ll learn how to make them more resilient to attack.
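
As one illustration of what "more resilient" could mean, the sketch below puts two inexpensive guards in front of a query: a cap on the length of the expression and a cap on the number of results returned. The limits and the function name `guarded_query` are assumptions chosen for the example, and these guards alone would not stop every pathological query.

```python
import itertools
import xml.etree.ElementTree as ET

MAX_QUERY_LENGTH = 200  # assumed limit on the size of a query expression
MAX_RESULTS = 50        # assumed limit on the number of matches returned


def guarded_query(root, path, max_results=MAX_RESULTS):
    """Evaluate an ElementTree path expression with basic safety limits."""
    if not path or len(path) > MAX_QUERY_LENGTH:
        raise ValueError("query is empty or too long")
    try:
        # iterfind is consumed lazily, so islice stops after max_results matches.
        return list(itertools.islice(root.iterfind(path), max_results))
    except SyntaxError as exc:
        # ElementTree signals malformed path expressions with SyntaxError.
        raise ValueError(f"bad query: {exc}") from exc


root = ET.fromstring("<r>" + "<i/>" * 100 + "</r>")
print(len(guarded_query(root, "./i")))
```

A production service would likely add a time limit and run the query in a separate process, so that a stalled query could be killed without restarting the whole service.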

The holistic view of that network should be our focus. In Idehen’s view, you’ll use something like SPARQL -- a query language for the semantic Web -- to traverse a graph of interlinked sites, and to merge interesting sources into a virtual collection. Then you’ll dispatch queries to each member of that collection. Those members will offer query styles ranging from free text search to iteration over simple key/value pairs (accessed by way of RSS or Atom) to tree traversal (XPath, XQuery) and relational query (SQL). I think he’s got it exactly right.
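
The dispatch step can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the three sources, their data, and the names `free_text_source`, `key_value_source`, `xpath_source`, and `dispatch` are all hypothetical, and the SPARQL traversal that would assemble the collection is left out entirely.

```python
import xml.etree.ElementTree as ET


def free_text_source(documents):
    """Wrap a list of strings as a source that answers free-text queries."""
    def query(term):
        return [doc for doc in documents if term.lower() in doc.lower()]
    return query


def key_value_source(pairs):
    """Wrap a dict as a source that matches the term against its keys."""
    def query(term):
        return [value for key, value in pairs.items() if term.lower() in key.lower()]
    return query


def xpath_source(xml_text, path_template):
    """Wrap an XML document as a source queried by tree traversal."""
    root = ET.fromstring(xml_text)

    def query(term):
        # The term is substituted into the path as-is; a real service
        # would need to validate it first.
        return [el.text for el in root.findall(path_template.format(term=term))]
    return query


def dispatch(collection, term):
    """Send the same term to every member and gather results by source name."""
    return {name: source(term) for name, source in collection.items()}


# A hypothetical virtual collection with three different query styles.
collection = {
    "search": free_text_source(["Python and XPath", "Relational query with SQL"]),
    "feed": key_value_source({
        "python tips": "http://example.com/feed/1",
        "sql tips": "http://example.com/feed/2",
    }),
    "blog": xpath_source(
        "<blog><pre class='python'>x = 1</pre><pre class='sql'>SELECT 1</pre></blog>",
        ".//pre[@class='{term}']",
    ),
}

results = dispatch(collection, "python")
for name, hits in results.items():
    print(name, hits)
```

The design choice worth noting is that each member hides its own query style behind the same one-argument callable, so the collection can mix free text, key/value, and tree-traversal sources without the caller knowing which is which.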





 


 
Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.
