Managing your content with XML
Daisy and TeXtML CMSes take differing, yet successful, tacks
How documents are stored in the repository, though, is a bit complicated. As stated above, documents are mapped to XML equivalents — but that is only partly true. On the one hand, documents are stored wholesale in their native format. On the other hand, when a document is placed in the repository, it is parsed into a kind of XML doppelganger document that TeXtML uses to build indexes for the document. The TeXtML repository keeps track of the relationship between the original document and its XML shadow. (This technique of creating XML shadow documents while keeping the original available helps TeXtML significantly with its indexing chores, thus speeding queries.)
The parsing is performed by TeXtML’s Universal Converter, which reads some 220-plus document formats. It is an optional component, but without it, the only querying you can do is on document metadata such as title, creation date, document type, and so on.
Indexes and Queries
TeXtML knows which parts of a given document are to be indexed via an index definition document. There is only one index definition document in the repository, and its content is entirely XML. So, when a new document enters the repository, it is dissected by the Universal Converter, and the index definition document is consulted to determine which elements are to be indexed. TeXtML creates indexes for full-text content, strings, numeric data, dates, and times.
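To make the idea of a definition-driven index concrete, here is a minimal Python sketch of the general technique. It is not TeXtML's actual schema or API: the element names (`indexes`, `index`), the attributes (`name`, `type`, `path`), the sample shadow document, and the function `extract_index_entries` are all invented for illustration. The sketch simply reads one XML definition and pulls the named elements out of a parsed XML document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical index definition. This is NOT TeXtML's real format; it only
# illustrates "one XML document lists which elements get indexed, and how".
INDEX_DEFINITION = """
<indexes>
  <index name="title" type="string" path="./head/title"/>
  <index name="body" type="fulltext" path="./body"/>
  <index name="pages" type="numeric" path="./head/pages"/>
</indexes>
"""

# Hypothetical XML shadow of an incoming document.
SHADOW_DOCUMENT = """
<doc>
  <head><title>Quarterly Report</title><pages>12</pages></head>
  <body><p>Revenue rose.</p><p>Costs fell.</p></body>
</doc>
"""


def extract_index_entries(definition_xml, shadow_xml):
    """Return {index name: [values]} for every index in the definition."""
    definition = ET.fromstring(definition_xml.strip())
    shadow = ET.fromstring(shadow_xml.strip())
    entries = {}
    for index in definition.findall("index"):
        kind = index.get("type")
        values = []
        for element in shadow.findall(index.get("path")):
            # Collect all nested text and normalize whitespace.
            text = " ".join(" ".join(element.itertext()).split())
            if kind == "numeric":
                values.append(float(text))
            elif kind == "fulltext":
                # Full-text indexes store lowercase word tokens.
                values.extend(re.findall(r"\w+", text.lower()))
            else:
                values.append(text)
        entries[index.get("name")] = values
    return entries


if __name__ == "__main__":
    print(extract_index_entries(INDEX_DEFINITION, SHADOW_DOCUMENT))
```

The point of the design is that changing what gets indexed means editing the definition document, not the indexing code.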
TeXtML’s query language is yet another XML variant, entirely unlike XQuery. The dissimilarity is understandable. TeXtML is primarily intent on performing rapid document content search; less important is the capability to navigate an XML document’s structure using XPath-style expressions (as XQuery allows).
TeXtML’s demonstration download comes with a preloaded repository, as well as an application that allows you to experiment with the system’s querying capabilities. The application lets the user enter queries by filling in text boxes, generates the query invisibly, then executes it.
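The pattern of turning filled-in text boxes into an XML query behind the scenes can be sketched in a few lines of Python. This is an illustration of the general approach only: the query vocabulary used here (`query`, `and`, `term`, and the `index` attribute) and the function `build_query` are assumptions made up for the example, not TeXtML's real query syntax.

```python
import xml.etree.ElementTree as ET


def build_query(fields):
    """Build a hypothetical XML query from form fields (name -> text).

    Empty fields are skipped; the remaining terms are combined with AND.
    The element and attribute names are invented for this sketch.
    """
    query = ET.Element("query")
    conjunction = ET.SubElement(query, "and")
    for name, value in fields.items():
        value = value.strip()
        if not value:
            continue  # the user left this text box blank
        term = ET.SubElement(conjunction, "term", {"index": name})
        term.text = value  # ElementTree escapes special characters on output
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    form = {"title": "report", "body": "", "author": "Smith & Co"}
    print(build_query(form))
```

Building the query through an XML library rather than string concatenation means characters such as `&` in user input are escaped automatically.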
The installation also includes sample apps and queries, and the included programmer’s manual provides a line-by-line explanation of the VBScript programs. This is not to suggest that VBScript is your only programming avenue into TeXtML, which supports APIs for Java, native .NET, COM, and OLE DB. There is also a WebDAV extension, but at the time of this writing, the API did not support some of TeXtML’s advanced features.
Daisy could certainly benefit from a smoother installation. Hopefully, a turnkey version, expected as part of the next release, will eliminate that complaint. Beyond that, the Daisywiki is a joy to play with, and is an excellent test-drive of Daisy’s novel stuff-it-all-in-one-bag approach to document storage.
TeXtML is the product for scuba-diving through oceans of text content. It also provides safeguard features that Daisy doesn’t have, such as the Fault Tolerant Server, which replicates documents and transactions on multiple TeXtML servers.
If hard-core text searching is what you need in your CMS, then by all means give TeXtML a look. Daisy, however, has that powerful attribute that we are seeing more and more in high-quality software: open source. If you want to set up a wiki site in an evening or two, Daisy is very hard to beat.