Free Newsletters
Technology & Business Daily

InfoWorld
Log-in | Register

The future of XML documents and relational databases

As new species of XML documents are emerging, vendors are unveiling increased RDBMS support for XML

By Jon Udell  
July 25, 2003
 

When XML came along five years ago, promising to rewrite the rules of data management, vendors of relational databases took note, but they didn't panic. They'd already seen this movie a decade before, when the object database had been cast in the role of paradigm shifter. This new software species did emerge and did popularize the notion of persistence -- that is, the capability of storing and retrieving programming-language objects without arduous translation to and from relational tables. As it turned out, however, the old RDBMS dog could learn new tricks. Relational databases figured out how to store complex types using the SQL:1999 object model. Implementations of JDO (Java Data Objects) exist for relational as well as for object databases. And according to Microsoft, the forthcoming Yukon edition of SQL Server will be capable of persisting .Net objects.

Free IT resource

Open Source Business Conference (OSBC) May 22-23, 2007

Sponsored by OSBC

Free IT resource

Virtualization Insights from Top Experts - Learn how virtualization gets real!

Sponsored by Dell

Having absorbed objects, the RDBMS vendors are now working hard to absorb XML documents. Don't expect a simple rerun of the last movie, though. We've always known that most of the information that runs our businesses resides in the documents we create and exchange, and those documents have rarely been kept in our enterprise databases. Now that XML can represent both the documents that we see and touch -- such as purchase orders -- and the messages that exchange those documents on networks of Web services, it's more critical than ever that our databases can store and manage XML documents. A real summer blockbuster is in the making. No one knows exactly how it will turn out, but we can analyze the story so far and make some educated guesses.

Consensual Hallucination

The first step in the long journey of SQL/XML hybridization was to publish relational data as XML. BEA Chief Architect Adam Bosworth, who worked on the idea's SQL Server implementation, calls it "the consensual-hallucination approach -- we all agree to pretend there is a document."

XML publishing was the logical place to start because it's easy to represent a SQL result set in XML and because so many dynamic Web pages are fed by SQL queries. The traditional approach required programmatic access to the result set and programmatic construction of the Web page. The new approach materializes that dynamic Web page in a fully declarative way, using a SQL-to-XML query to produce an XML representation of the data and XSLT (eXtensible Stylesheet Language Transformation) to massage the XML into the HTML delivered to the browser.

Originally these virtual documents were created using proprietary SQL extensions such as SQL Server's FOR XML clause. There's now an emerging ISO/ANSI standard called SQL/XML, which defines a common approach. SQL/XML is supported today by Oracle and DB2. It defines XML-oriented operators that work with the native XML data types available in these products. SQL Server does not yet support an XML data type or the SQL/XML extensions, but Tom Rizzo, SQL Server group product manager at Redmond, Wash.-based Microsoft, says that Yukon, due in 2004, will.

Storing Documents

Most of the information in an enterprise lives in documents kept in file systems, not in relational databases. There have always been reasons to move those documents into databases -- centralized administration, full-text search -- but in the absence of a way to relate the data in the documents to the data in the database, those reasons weren't compelling. XML cinches the argument.

As business documents morph from existing formats to XML -- admittedly a long, slow process that has only just begun -- it becomes possible to correlate the two flavors of data. Consider an insurance application that stores claims data in a relational table and claims documents in XML. A hybrid SQL/XML database enables the application to extract fragments of XML from a subset of the documents. And that subset can be created by joining XML elements in the document with column values in relational tables.

These wildly powerful effects are currently achieved using a few different kinds of storage and query strategies. On the storage side, there are two general approaches. You can put whole documents into columns of the database, or you can "shred" the document into a collection of relational tables. The latter approach makes best use of the database's query engine and atomic update capabilities, but mapping from irregular XML data to SQL is much harder than mapping from SQL to XML. It helps if your XML documents are governed by XML Schema descriptions. These provide hints to the XML-to-SQL mapper and can be annotated to more precisely control the mapping. It also helps if your database supports objects that can receive irregularly shaped XML data. "We extended the relational base technology to include objects as part of SQL:1999," says Sandeepan Banerjee, director of product management at Oracle Server Technologies. "And this has matured with 8i and 9i to the point where the typing system of XML Schema can be fully represented in the object/relational typing."


Continued
1 | 2 | Next Page » 



 


 
Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.

  More of Jon Udell's column
  Jon Udell's Weblog

Newsletter Check out all of our free newsletters!
Enter e-mail address:




 

TOP NEWS:


»  Microsoft: Don't misunderstand UAC, other Vista features
A Microsoft posting attempted to explain the most 'misunderstood' features of Vista: UAC, Image Management, Display Driver Model, Windows Search, and 64-bit architecture

»  Compuware 2.0 set as rebirth of company
Looking to revitalize, the vendor will evaluate products and focus on business value

»  Google overtakes Yahoo as most-visited U.S. Web site
For the first time, Google has knocked Yahoo off the top spot of the most popular Web site in the country

»  Top 10: HP-EDS buy, Icahn strikes again, China quakes
This week's roundup of the top IT news stories includes the continuing saga of MS-Yahoo, HP's big buy, Vista's developer problem, 3G iPhone rumors, and more

»  ObjectWave's Swan swims for RIA connectivity
Rich Internet application platform enables simpler connectivity between AJAX interfaces and server-side code

»  Bender forms group to promote OLPC's Sugar UI
Sugar Labs, founded by OLPC's former president of software and content, intends to use open source as a tool to promote a learning model




Virtualization: A Step by Step Approach to Success
Your virtual machines can be up and running in a matter of minutes. HP and Citrix have integrated XenServer with HP ProLiant servers and management tools, powered by hardware-assisted Intel Virtualization Technology to enable high- performance, cost-savings solutions for server consolidation and disaster recovery. Sponsor: HP

»  Click here to view this Webcast
  Storage is big, and getting bigger
The only certainty is that your requirement for storage will never be satisfied. While you clean out space and authorize POs, you might consider another alternative: outsourcing. The best way to deal with storage might be to let someone else deal with it. Sponsored by SGI

»  Click here to download now

- Special Advertising Partners -
WHITE PAPERS
 

» Technology White Papers Library

Technology White Papers by Topic

Technology White Papers E-mail Alert

Find out when the latest white paper is available:
 
 
INFOWORLD MARKETPLACE
 
» BUY A LINK NOW
 
SEE ALSO
• Software AG enhances XML document storage
• Database titans embrace XML
• Jon Udell's Weblog


FIND PRODUCTS AND COMPANIES
» COMPLETE PRODUCT GUIDE



TECHNOLOGY INDEX
• Applications
• Application Development
• Security
• Networking
• Wireless
• Platforms
• Hardware
• Data Management
• Storage
• Web Services
• Business
• Telecom
• Professional Services
• Standards

TECH WATCH 


What's the 411 on GOOG-411?
Just as Google has become synonymous with "performing a Web search," 411 is understood to mean "information" -- as in "what's the 411?" I was thus surprised to discover, from a billboard, no less, that the king of search is taking on the ...

Apple HTML source reveals 'iPhone Extreme'
"This one's a stretch..." reports AppleInsider. Um, yeah. Reporting on HTML code sightings of product names could be called a stretch, but iPhone Extreme has a ring to it. Now, that sounds like the product Apple should have released first, rather ...

COLUMNISTS

Unified under law
Ephraim Schwartz's Column and Blog (InfoWorld) - In the litigious world we live in, deploying a unified communications platform in your enterprise could...
» MORE COLUMNISTS

MORE INFOWORLD BLOGS


Open Sources 
Product Management
When I joined MySQL four years ago, there was quite a lot of debate about product management. We didn't actually have ...

Zero Day 
Botnet herders tending smaller flocks
New research backs up the theory that botnet operators are keeping their networks smaller in a continued effort to keep ...



• Advice Line
• Database Underground
• The Deep End
• Enterprise Mac
• Geeks in Paradise
• Grid Meter
• The Gripe Line
• InfoWorld Daily
• Inside IT
• IT Troubleshooter
• ITXtreme
• Open Sources
• ProdBlog
• Real World SOA
• Reality Check
• Security Adviser
• SMB IT
• The Storage Network
• Tech Watch
• Virtualization Report
• Zero Day

ADVERTISEMENT


RESOURCE CENTERadvertisement 

GOVERNMENT IT & POLICY
'If you don't go after the network, you're never going to stop these guys. Never.'
From the State Department, All the News for Inquiring Minds
TechPresident, the Internet Citizenry's New Consensus Taker



Sponsored Technology Links

 
 
 HOME  NEWS  BLOGS  PODCASTS  VIDEOS  TECHNOLOGIES  TEST CENTER  EVENTS  CAREERS  IT EXEC-CONNECT   About | Advertise | Awards | RSS | Contact Us 

Copyright © 2008, Reprints, Permissions, Licensing, IDG Network, Privacy Policy, Terms of Service.
All Rights reserved. InfoWorld is a leading publisher of technology information and product reviews on topics including viruses,
phishing, worms, firewalls, security, servers, storage, networking, wireless, databases, and web services.

CIO :: ComputerWorld :: CSO :: Demo :: GamePro :: Games.net :: IDG Connect :: IDG World Expo
Industry Standard :: IT World :: JavaWorld :: LinuxWorld :: MacUser :: Macworld :: Network World :: PC World :: Playlist