XML APIs for databases

Blend the power of XML and databases using custom SAX and DOM APIs

Databases and XML offer complementary functionality for storing data. Databases store data for efficient retrieval, whereas XML offers an easy information exchange that enables interoperability between applications. To take advantage of XML's features we can convert database tables into XML documents. We can then use XML tools with such documents for further processing. For example, XML documents can be presented as HTML pages with XSLT stylesheets, can be searched with XML-based query languages such as XQL, can be used as a data-exchange format, and so on. However, converting a database into an XML document is an expensive approach, one that requires not only the initial cost of conversion but also the subsequent costs of synchronizing both information sources.

TEXTBOX: TEXTBOX_HEAD: Abbreviations in this article API: application programming interface

DOM: Document Object Model

DTD: document type definition

JAXP: Java API for XML Parsing

JDBC: Java Database Connectivity

SAX: Simple API for XML

XML: Extensible Markup Language

XQL: XML Query Language

XSL: Extensible Stylesheet Language

XSLT: XSL Transformations

XT: An XSLT processor

:END_TEXTBOX

For processing XML documents, most XML tools work with the SAX or DOM API. In this article, we'll look at a way to implement the same APIs directly over a database, enabling XML tools to treat databases as if they were XML documents. That way, we can obviate the need of converting a database.

We'll see an implementation of the SAX API for Databases that should work with any database with a JDBC driver. Next, we'll examine an implementation of the DOM API for Databases that uses the SAX API internally. To demonstrate the SAX API for Databases, we'll look at its integration with XT (an XSLT processor). We'll also see an example of how such integration can create HTML pages that incorporate an XSLT stylesheet directly from a database and how it can convert a database into an XML document. Finally, we'll look at how the DOM API for Databases integrates with an XQL processor.

In this article, I make use of existing tools rather than create new tools to illustrate the applications of the SAX and DOM APIs for Databases. I show how to leverage a number of available XML tools to work with a database. All the XML tools I mention are either available for free or free for noncommercial use (though you should, of course, check licensing agreements).

Overview of the SAX and DOM APIs

SAX is an event-based API for XML. With it, the SAX parser reports events such as the start and end of elements to the application as it walks over the document. Because the parser reports events as it visits different parts of the document, it does not have to build any internal structure. That reduces the strain on system resources, which makes the parser attractive for large documents. For XML documents received as continuous streams, an event-based API is the only choice.

The DOM API, on the other hand, follows a treelike construct. Elements have parent-child relations with other elements. With this API, the parser builds an internal structure such that an application can navigate it in a treelike fashion. DOM allows an application to have random access to the tree-structured document at the cost of increased memory usage.

XML APIs for databases: The basics

Because of a database's highly regular data-storage structure, we can map it into data-centric XML documents. For example, we can transform a database table into an XML document with a DTD of the following form:

<!ELEMENT table rows*>
<!ELEMENT rows (column1, column2, ...)>
<!ELEMENT column1 #PCDATA>
<!ELEMENT column2 #PCDATA>
....

In other words, with an XML API for databases, we can make the database look like an XML document; these APIs present the database as a virtual XML document. We are at the most basic concept of object-oriented design: it is the interface -- not the implementation -- that matters. In our situation, the tools using such an XML API need not care whether they are operating on a database table or an XML file.

A SAX or DOM parser can enable XML tools to work directly with databases

Implementing the SAX API for Databases

To implement the SAX API for Databases, we need to implement a parser that operates on a JDBC data source, iterates over each row and column, and generates appropriate SAX events while iterating. The SAX specification provides the org.xml.sax.InputSource class that models a data source representing a URL or a byte stream. To represent a database, we need a specialized form of it that can represent a table in a database. We therefore implement JDBCInputSource, which extends the org.xml.sax.InputSource class. Let's look at JDBCInputSource in more detail:

// JDBCInputSource.java
package dbxml.sax;
import java.sql.*;
import org.xml.sax.InputSource;
public class JDBCInputSource extends InputSource {
    private String _connectionURL;
    private String _userName;
    private String _passwd;
    private String _tableName;
    public JDBCInputSource(String connectionURL, String userName, 
                           String passwd, String tableName) {
        super(connectionURL);
        _connectionURL = connectionURL;
        _userName = userName;
        _passwd = passwd;
        _tableName = tableName;
    }
    public String getTableName() {
        return _tableName;
    }
    public Connection getConnection() throws SQLException {
        return DriverManager.getConnection(_connectionURL, _userName, _passwd);
    }
}

In the code above, the constructor takes the information needed to connect to a database and the name of the table to be parsed. The method getConnection() connects to the database and returns a Connection object.

Next, we need to implement the SAX parser that uses JDBCInputSource to iterate over database table rows and columns and generates SAX events along the way. To simplify the code, we create an abstract ParserBase class, which implements the org.xml.sax.Parser and has responsibility only for managing various handlers. We then create our SAX parser for the JDBC source JDBCSAXParser that extends the ParserBase class.

(To view the code for ParserBase.java, click here.)

// JDBCSAXParser.java
package dbxml.sax;
import java.io.IOException;
import java.sql.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;
public class JDBCSAXParser extends ParserBase {
    private static final AttributeList _stockEmptyAttributeList 
        = new AttributeListImpl();
    //------------------------------------------------------------------
    // Methods from the Parser interface
    //------------------------------------------------------------------
    public void parse (InputSource source) throws SAXException, IOException {
        if (! (source instanceof JDBCInputSource)) {
            throw new SAXException("JDBCSAXParser can work only with source "
                                   + "of JDBCInputSource type");
        }
        parse((JDBCInputSource)source);
    }
    
    public  void parse (String systemId) throws SAXException, IOException {
        throw new SAXException("JDBCSAXParser needs more information to "
                               + "connect to database");
    }
    
    //------------------------------------------------------------------
    // Additional methods 
    //------------------------------------------------------------------
    public void parse(JDBCInputSource source) 
        throws SAXException, IOException {
        try {
            Connection connection = source.getConnection();
            if (connection == null) {
                throw new SAXException("Could not establish connection with "
                                       + "database");
            }
            
            String sqlQuery = getSelectorSQLStatement(source.getTableName());
            PreparedStatement pstmt = connection.prepareStatement(sqlQuery);
            
            ResultSet rs = pstmt.executeQuery();
            parse(rs, source.getTableName());
            rs.close();
            
            connection.close();
        } catch (SQLException ex) {
            throw new SAXException(ex);
        }
    }
    
    public void parse(ResultSet rs, String tableName) 
        throws SAXException, SQLException, IOException {
        if (_documentHandler == null) {
            return; // nobody is interested in me, no need to sweat!
        }
        
        ResultSetMetaData rsmd = rs.getMetaData();
        int numCols = rsmd.getColumnCount();
        
        String tableMarker = getTableMarker(tableName);
        String rowMarker = getRowMarker();
        
        _documentHandler.startDocument();
        _documentHandler.startElement(tableMarker, _stockEmptyAttributeList);
        while(rs.next()) {
            _documentHandler.startElement(rowMarker, 
                                          _stockEmptyAttributeList);
            for (int i = 1; i <= numCols; i++) {
                generateSAXEventForColumn(rsmd, rs, i);
            }
            _documentHandler.endElement(rowMarker);
        }
        _documentHandler.endElement(tableMarker);
        _documentHandler.endDocument();
    }
    
    public void parse(String connectionURL, String userName, String passwd,
                      String tableName) throws SAXException, IOException {
        parse(new JDBCInputSource(connectionURL, userName, passwd, tableName));
    }
    
    //------------------------------------------------------------------
    // Protected methods that derived classes could override to 
    // customize the parsing.
    //------------------------------------------------------------------
    protected void generateSAXEventForColumn(ResultSetMetaData rsmd,
                                             ResultSet rs,
                                             int columnIndex) 
        throws SAXException, SQLException {
        String columnValue = rs.getString(columnIndex);
        if (columnValue == null) {
            return;
        }
        String columnMarker 
            = getColumnMarker(rsmd.getColumnLabel(columnIndex));
        char[] columnValueChars = columnValue.toCharArray();
        _documentHandler.startElement(columnMarker, 
                                      _stockEmptyAttributeList);
        _documentHandler.characters(columnValueChars, 
                                    0, columnValueChars.length);
        _documentHandler.endElement(columnMarker);
    }
    
    protected String getTableMarker(String tableName) {
        return tableName;
    }
    protected String getRowMarker() {
        return "row";
    }
    protected String getColumnMarker(String columnName) {
        return columnName;
    }
    protected String getSelectorSQLStatement(String tableName) {
        return "select * from " + tableName;    
    }
}

Let's examine the code in more detail. JDBCSAXParser includes several overloaded parse() methods. In the list below, the org.xml.sax.Parser interface requires implementing the parse(InputSource) and parse(String) methods. The other parse() methods simplify the code and allow derived classes to override them to modify the parser behavior.

  • The parse(InputSource) method calls the parse(JDBCInputSource) method if the argument is of type JDBCInputSource; otherwise, it throws a SAXException as it cannot deal with it.

  • The parse(String) method throws a SAXException as the information supplied is not sufficient to access the database.

  • The parse(JDBCInputSource) method gets a Connection object from the input source and executes a query to obtain a ResultSet object. It then calls parse(ResultSet) with this object.

  • The parse(ResultSet, String) method performs the core parsing logic. It iterates over each row in the result set and each column in the rows. The row iteration loop is surrounded by calls to startElement() and endElement() with a table marker as the element-name argument. Similarly, each column iteration loop is surrounded by calls to startElement() and endElement() with a row marker as the element-name argument. In both cases an empty attribute list passes as the second argument to the startElement() methods. On each visit to a column, the generateSAXEventForColumn() method is called with column-name and column-value arguments. The value of a column is accessed by the getString() method on the result-set object, as we need a string representation of the column data to be notified in the characters() SAX event.

  • The convenience method parse(String, String, String, String) simply creates a JDBCInputSource object with the arguments passed to it and then calls the parse(JDBCInputSource) method with it.

The protected methods of JDBCSAXParser offer some customization possibilities through overriding:

1 2 Page 1
Page 1 of 2
InfoWorld Technology of the Year Awards 2023. Now open for entries!