Across the universe

Virtuoso speaks SQL, XML, and Web services, delivering tomorrow's universal server today

In 1993, Microsoft first demonstrated Cairo, based on a prototype of its Object File System. The concept is slated to return, as Yukon, in 2003. Meanwhile, the industry hasn't been sitting on its thumbs. Database vendors have been busily converging the two major data-management disciplines: SQL and XML. One of the more forward-looking efforts is Virtuoso, from OpenLink Software.

Virtuoso packs more technology into one product than seems possible. It was first sold as a virtual database -- that is, a stand-alone SQL engine that could also use (and extend) foreign data sources. It evolved into a Web application server and now, in Version 2.7, has become as complete an example of a universal server as you are likely to find. Virtuoso 2.7, available for Windows, Linux, Solaris, HP-UX, AIX, and Mac OS X, creates a profound synthesis of SQL and XML data management styles, and wraps Web-services bindings around both. The SQL engine at the core of the product can contain structured data, as well as semistructured data (i.e., XML) and unstructured data (files, images). There's also a tightly integrated WebDAV (Web Distributed Authoring and Versioning) datastore that offers hierarchical access to semistructured and unstructured data.

Here's one example of how these pieces can fit together. A SQL query is defined, using the FOR XML clause to produce XML output. The query's results are routed through Virtuoso's built-in XSLT (Extensible Stylesheet Language Transformations) engine, transformed to HTML, and stored in a DAV folder. The HTML report is now available to DAV clients, or to browsers by way of Virtuoso's HTTP-based DAV interface.

If the query is defined as real time, the result file will seem empty because it's just a placeholder; the query generates the report on each request. Alternatively, the query can be defined as persistent. In this case the result file is created, and then refreshed automatically at an administrator-defined interval. The query can optionally produce metadata (either a Document Type Definition or an XML Schema definition) describing the structure of the output. Note that except for writing the SQL query, there's no programming required to achieve these effects. It's all done in Virtuoso's Web-based administration tool.
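
For readers who want to see the shape of that query-to-report flow outside Virtuoso, here is a minimal sketch in Python. It is not Virtuoso code: SQLite stands in for the SQL engine, a plain Python function stands in for the XSLT step, and the table, column, and function names are all hypothetical. The layout of the intermediate XML (one row element per record, columns as attributes) is likewise an assumption made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape


def query_to_xml(conn, sql):
    """Run a query and wrap each result row in a <row> element.

    Columns become attributes; this layout is an assumption for the
    sketch, not a description of Virtuoso's FOR XML output.
    """
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    root = ET.Element("results")
    for rec in cur:
        row = ET.SubElement(root, "row")
        for col, val in zip(cols, rec):
            row.set(col, str(val))
    return root


def xml_to_html(root):
    """Stand-in for the XSLT step: turn <row> elements into an HTML table."""
    lines = ["<table>"]
    for row in root:
        cells = "".join(f"<td>{escape(v)}</td>" for v in row.attrib.values())
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def build_report(conn, sql):
    """Query, convert to XML, then transform to HTML in one pass."""
    return xml_to_html(query_to_xml(conn, sql))
```

In this sketch, calling build_report on every request corresponds loosely to the real-time mode, while calling it on a timer and saving the returned string to a file corresponds loosely to the persistent mode.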

A Virtuoso SQL query can draw on an extremely rich set of resources. Tables and stored procedures may be internal, or attached from Microsoft SQL Server or Oracle. Columns that store character data can be indexed for full-text search. When the character data is XML, XPath syntax is available for structure-aware queries.

Here's a query that returns just the first title from each of a set of XML documents, using the xpath_contains predicate:

    select XT_FILE, cast (p as varchar) descr
    from XML_TEXT2
    where xpath_contains (XT_TEXT, '//title[position() = 1]', p)
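
To make the XPath predicate in that query concrete, here is a small sketch in Python that applies a similar positional test to a handful of in-memory documents. It uses the limited XPath subset in Python's standard xml.etree.ElementTree module, not Virtuoso's engine; the file names, document contents, and the first_titles function are hypothetical. Note that a positional predicate counts among the title children of each parent, so the sample documents keep their titles under a single parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a table of (file name, XML text) rows.
DOCS = {
    "a.xml": "<doc><title>First A</title><title>Second A</title></doc>",
    "b.xml": "<doc><title>Only B</title><body>text</body></doc>",
    "c.xml": "<doc><body>no title here</body></doc>",
}


def first_titles(docs):
    """Return (file name, title text) for each title that is the first
    title child of its parent; documents with no match yield no row."""
    rows = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        # ".//title[1]" is ElementTree's spelling of the positional test.
        for node in root.findall(".//title[1]"):
            rows.append((name, node.text))
    return rows


if __name__ == "__main__":
    for name, title in first_titles(DOCS):
        print(name, title)
```

As with the SQL version, a document that contains no title simply drops out of the result set.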

Virtuoso's stored-procedure language, Virtuoso/PL, extends the product's reach to include virtually any Internet-accessible data: remote SQL databases, Web pages, e-mail inboxes, newsgroups, and SOAP (Simple Object Access Protocol) services. The SOAP client support includes direct SOAP calls, importation of WSDL (Web Services Description Language) wrappers, and UDDI (Universal Description Discovery and Integration) look-ups.

Data acquired from one or several of these sources can be manipulated using a function library that rivals what you'll find in powerful general-purpose scripting languages such as Perl and Python. The basics include support for dynamic arrays, regular-expression pattern matching, date arithmetic, and math. A rich set of Internet protocols enables PL procedures to perform LDAP operations, parse MIME messages, send and receive messages with NNTP (Net News Transfer Protocol), POP (Post Office Protocol), and SMTP, and even sign or verify S/MIME (Secure MIME) messages. Other suites of functions enable procedures to create and manage DAV collections; validate, transform, navigate, search, and store XML data; and interact with SOAP, WSDL, or UDDI services. Finally, there are functions that enable PL procedures to manage aspects of the Virtuoso database: users, transactions, cursors, logging, replication, and backup.

From universal client to universal server

So far this description sounds more like a universal client, or a middleware product that aggregates diverse data sources. There's more to the story. Virtuoso is, of course, a conventional database server, accessible to ODBC (Open Database Connectivity), OLE DB (Object Linking and Embedding Database), and JDBC (Java Database Connectivity) applications. And as we've seen, it's a WebDAV repository that serves content from the database either continually or on demand. In addition, Virtuoso is a full-fledged application server. Clients connect using HTTP (or, securely, using HTTPS) to three kinds of resources: the file system, the DAV repository, and the set of SOAP services published by Virtuoso. File system and DAV resources can be served up statically or by way of VSP (Virtuoso Server Pages). Like ASP (Active Server Pages) and JSP (Java Server Pages), VSP is a templating system that interleaves static content with dynamic behavior. The behavior, in this case, is expressed directly in Virtuoso/PL. The obvious cost of this approach is that few Web developers will be familiar with that language. The benefit is a deep unification of programming, data management, and data access.

It's hard to convey the flexibility of this approach in abstract terms. Here's one concrete example. Virtuoso can extend the set of functions available in XPath queries and XSLT stylesheets to include Virtuoso/PL procedures, which can be pure PL constructs or can call on other stored procedures and SOAP services. Imagine a SOAP-based stock-quote service made into an XSLT extension. A Web developer writing a VSP application that pulls XML documents from the database can augment ticker symbols with current prices by making SOAP calls directly from the XSLT stylesheet.

Procedures that stand alone, or that wrap external behavior, are just as easily made available as SOAP services for use outside the Virtuoso environment. It's all done in the administrative tool: Define a logical path for the HTTP request, select a procedure, and publish it. As is now conventional with SOAP tool kits, Virtuoso generates WSDL descriptions on demand, and an HTML page for testing the services from a browser. We found it trivial to export two Virtuoso procedures, one native and the other an attached SQL Server stored procedure, as SOAP services, which we could then call from Virtuoso's own test harness and from Perl, Java, and .Net.
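
For a sense of what a client sends when it calls a procedure published this way, here is a minimal sketch in Python that builds a SOAP 1.1 request envelope. It performs no network I/O, and the method name, service namespace, and parameter are hypothetical; a real client would take them from the WSDL that the server generates.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(method, service_ns, params):
    """Return a SOAP 1.1 envelope, as a string, that calls `method`.

    `service_ns` and the parameter names are placeholders; in practice
    they come from the service's WSDL description.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical stock-quote procedure exposed as a SOAP service.
    print(build_soap_request("GetQuote", "urn:example:quotes", {"symbol": "ORCL"}))
```

The resulting string would be sent as the body of an HTTP POST to the logical path defined for the service.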

The marriage of SQL and XML, in the church of Web services, will reshape our industry. Virtuoso works hard to consummate the union. There's even an early implementation of XQuery 1.0, the latest in a series of proposed XML query languages. Virtuoso is a visionary product that is, frankly, challenging to fully comprehend. It can be configured as an e-mail message store, an NNTP server, an application server, a content-management system, a Web-services gateway, and much more. It requires, and will reward, multidisciplinary developers who can turn data into information, and information into knowledge. If you're intrigued by Yukon, the XML/SQL/Web-services hybrid that Microsoft plans for 2003, you may elect not to wait. Virtuoso is here today.