Databases and XML offer complementary functionality for storing data. Databases store data for efficient retrieval, whereas XML offers an easy information exchange that enables interoperability between applications. To take advantage of XML's features we can convert database tables into XML documents. We can then use XML tools with such documents for further processing. For example, XML documents can be presented as HTML pages with XSLT stylesheets, can be searched with XML-based query languages such as XQL, can be used as a data-exchange format, and so on. However, converting a database into an XML document is an expensive approach, one that requires not only the initial cost of conversion but also the subsequent costs of synchronizing both information sources.
TEXTBOX: TEXTBOX_HEAD: Abbreviations in this article API: application programming interface
DOM: Document Object Model
DTD: document type definition
JAXP: Java API for XML Parsing
JDBC: Java Database Connectivity
SAX: Simple API for XML
XML: Extensible Markup Language
XQL: XML Query Language
XSL: Extensible Stylesheet Language
XSLT: XSL Transformations
XT: An XSLT processor
:END_TEXTBOX
For processing XML documents, most XML tools work with the SAX or DOM API. In this article, we'll look at a way to implement the same APIs directly over a database, enabling XML tools to treat databases as if they were XML documents. That way, we can obviate the need of converting a database.
We'll see an implementation of the SAX API for Databases that should work with any database with a JDBC driver. Next, we'll examine an implementation of the DOM API for Databases that uses the SAX API internally. To demonstrate the SAX API for Databases, we'll look at its integration with XT (an XSLT processor). We'll also see an example of how such integration can create HTML pages that incorporate an XSLT stylesheet directly from a database and how it can convert a database into an XML document. Finally, we'll look at how the DOM API for Databases integrates with an XQL processor.
In this article, I make use of existing tools rather than create new tools to illustrate the applications of the SAX and DOM APIs for Databases. I show how to leverage a number of available XML tools to work with a database. All the XML tools I mention are either available for free or free for noncommercial use (though you should, of course, check licensing agreements).
Overview of the SAX and DOM APIs
SAX is an event-based API for XML. With it, the SAX parser reports events such as the start and end of elements to the application as it walks over the document. Because the parser reports events as it visits different parts of the document, it does not have to build any internal structure. That reduces the strain on system resources, which makes the parser attractive for large documents. For XML documents received as continuous streams, an event-based API is the only choice.
The DOM API, on the other hand, follows a treelike construct. Elements have parent-child relations with other elements. With this API, the parser builds an internal structure such that an application can navigate it in a treelike fashion. DOM allows an application to have random access to the tree-structured document at the cost of increased memory usage.
XML APIs for databases: The basics
Because of a database's highly regular data-storage structure, we can map it into data-centric XML documents. For example, we can transform a database table into an XML document with a DTD of the following form:
<!ELEMENT table rows*> <!ELEMENT rows (column1, column2, ...)> <!ELEMENT column1 #PCDATA> <!ELEMENT column2 #PCDATA> ....
In other words, with an XML API for databases, we can make the database look like an XML document; these APIs present the database as a virtual XML document. We are at the most basic concept of object-oriented design: it is the interface -- not the implementation -- that matters. In our situation, the tools using such an XML API need not care whether they are operating on a database table or an XML file.
Implementing the SAX API for Databases
To implement the SAX API for Databases, we need to implement a parser that operates on a JDBC data source, iterates over each row and column, and generates appropriate SAX events while iterating. The SAX specification provides the org.xml.sax.InputSource
class that models a data source representing a URL or a byte stream. To represent a database, we need a specialized form of it that can represent a table in a database. We therefore implement JDBCInputSource
, which extends the org.xml.sax.InputSource
class. Let's look at JDBCInputSource
in more detail:
// JDBCInputSource.java package dbxml.sax; import java.sql.*; import org.xml.sax.InputSource; public class JDBCInputSource extends InputSource { private String _connectionURL; private String _userName; private String _passwd; private String _tableName; public JDBCInputSource(String connectionURL, String userName, String passwd, String tableName) { super(connectionURL); _connectionURL = connectionURL; _userName = userName; _passwd = passwd; _tableName = tableName; } public String getTableName() { return _tableName; } public Connection getConnection() throws SQLException { return DriverManager.getConnection(_connectionURL, _userName, _passwd); } }
In the code above, the constructor takes the information needed to connect to a database and the name of the table to be parsed. The method getConnection()
connects to the database and returns a Connection
object.
Next, we need to implement the SAX parser that uses JDBCInputSource
to iterate over database table rows and columns and generates SAX events along the way. To simplify the code, we create an abstract ParserBase
class, which implements the org.xml.sax.Parser
and has responsibility only for managing various handlers. We then create our SAX parser for the JDBC source JDBCSAXParser
that extends the ParserBase
class.
(To view the code for ParserBase.java, click here.)
// JDBCSAXParser.java package dbxml.sax; import java.io.IOException; import java.sql.*; import org.xml.sax.*; import org.xml.sax.helpers.AttributeListImpl; public class JDBCSAXParser extends ParserBase { private static final AttributeList _stockEmptyAttributeList = new AttributeListImpl(); //------------------------------------------------------------------ // Methods from the Parser interface //------------------------------------------------------------------ public void parse (InputSource source) throws SAXException, IOException { if (! (source instanceof JDBCInputSource)) { throw new SAXException("JDBCSAXParser can work only with source " + "of JDBCInputSource type"); } parse((JDBCInputSource)source); } public void parse (String systemId) throws SAXException, IOException { throw new SAXException("JDBCSAXParser needs more information to " + "connect to database"); } //------------------------------------------------------------------ // Additional methods //------------------------------------------------------------------ public void parse(JDBCInputSource source) throws SAXException, IOException { try { Connection connection = source.getConnection(); if (connection == null) { throw new SAXException("Could not establish connection with " + "database"); } String sqlQuery = getSelectorSQLStatement(source.getTableName()); PreparedStatement pstmt = connection.prepareStatement(sqlQuery); ResultSet rs = pstmt.executeQuery(); parse(rs, source.getTableName()); rs.close(); connection.close(); } catch (SQLException ex) { throw new SAXException(ex); } } public void parse(ResultSet rs, String tableName) throws SAXException, SQLException, IOException { if (_documentHandler == null) { return; // nobody is interested in me, no need to sweat! } ResultSetMetaData rsmd = rs.getMetaData(); int numCols = rsmd.getColumnCount(); String tableMarker = getTableMarker(tableName); String rowMarker = getRowMarker(); _documentHandler.startDocument(); _documentHandler.startElement(tableMarker, _stockEmptyAttributeList); while(rs.next()) { _documentHandler.startElement(rowMarker, _stockEmptyAttributeList); for (int i = 1; i <= numCols; i++) { generateSAXEventForColumn(rsmd, rs, i); } _documentHandler.endElement(rowMarker); } _documentHandler.endElement(tableMarker); _documentHandler.endDocument(); } public void parse(String connectionURL, String userName, String passwd, String tableName) throws SAXException, IOException { parse(new JDBCInputSource(connectionURL, userName, passwd, tableName)); } //------------------------------------------------------------------ // Protected methods that derived classes could override to // customize the parsing. //------------------------------------------------------------------ protected void generateSAXEventForColumn(ResultSetMetaData rsmd, ResultSet rs, int columnIndex) throws SAXException, SQLException { String columnValue = rs.getString(columnIndex); if (columnValue == null) { return; } String columnMarker = getColumnMarker(rsmd.getColumnLabel(columnIndex)); char[] columnValueChars = columnValue.toCharArray(); _documentHandler.startElement(columnMarker, _stockEmptyAttributeList); _documentHandler.characters(columnValueChars, 0, columnValueChars.length); _documentHandler.endElement(columnMarker); } protected String getTableMarker(String tableName) { return tableName; } protected String getRowMarker() { return "row"; } protected String getColumnMarker(String columnName) { return columnName; } protected String getSelectorSQLStatement(String tableName) { return "select * from " + tableName; } }
Let's examine the code in more detail. JDBCSAXParser
includes several overloaded parse()
methods. In the list below, the org.xml.sax.Parser
interface requires implementing the parse(InputSource)
and parse(String)
methods. The other parse()
methods simplify the code and allow derived classes to override them to modify the parser behavior.
The
parse(InputSource)
method calls theparse(JDBCInputSource)
method if the argument is of typeJDBCInputSource
; otherwise, it throws aSAXException
as it cannot deal with it.The
parse(String)
method throws aSAXException
as the information supplied is not sufficient to access the database.The
parse(JDBCInputSource)
method gets aConnection
object from the input source and executes a query to obtain aResultSet
object. It then callsparse(ResultSet)
with this object.The
parse(ResultSet, String)
method performs the core parsing logic. It iterates over each row in the result set and each column in the rows. The row iteration loop is surrounded by calls tostartElement()
andendElement()
with a table marker as the element-name argument. Similarly, each column iteration loop is surrounded by calls tostartElement()
andendElement()
with a row marker as the element-name argument. In both cases an empty attribute list passes as the second argument to thestartElement()
methods. On each visit to a column, thegenerateSAXEventForColumn()
method is called with column-name and column-value arguments. The value of a column is accessed by thegetString()
method on the result-set object, as we need a string representation of the column data to be notified in thecharacters()
SAX event.- The convenience method
parse(String, String, String, String)
simply creates aJDBCInputSource
object with the arguments passed to it and then calls theparse(JDBCInputSource)
method with it.
The protected methods of JDBCSAXParser
offer some customization possibilities through overriding: