Daisy’s eschewing of a repository structure appears, at first glance, to be a severe omission. Further reflection, however,
reveals this weakness as a strength. In a typical CMS, a document is placed into a specific collection within the repository,
but that implies a redundancy: Someone has used the document’s content to determine which collection to put the document in.
If you’ve properly tagged the document, however, and if your repository server can create a view of the repository derived
from those tags, then the equivalent of a collection structure can be rendered at display time. And, unlike collection-based
repository servers, such a “view-based” server renders multiple, different views of the same repository. This is exactly what
Daisy does, and the result is quite impressive.
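The view-based idea can be shown with a minimal Python sketch, assuming each document carries simple key/value tags. The Document class, the build_view function, and the sample tags are hypothetical illustrations, not Daisy's API. The point is only that a "collection" can be computed from tags when the repository is displayed, rather than fixed when a document is stored.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    title: str
    # Hypothetical tag scheme: free-form key/value pairs attached to the document.
    tags: dict = field(default_factory=dict)


def build_view(docs, facet):
    """Group document titles by the value of one tag, at display time.

    No folder is stored anywhere; the grouping exists only in the returned view.
    Documents lacking the tag are gathered under '(untagged)'.
    """
    view = defaultdict(list)
    for doc in docs:
        view[doc.tags.get(facet, "(untagged)")].append(doc.title)
    return dict(view)


repo = [
    Document("d1", "Install guide", {"project": "alpha", "type": "manual"}),
    Document("d2", "API spec", {"project": "beta", "type": "spec"}),
    Document("d3", "Release notes", {"project": "alpha", "type": "notes"}),
]

# Two different "collection structures" over the same, unmoved documents.
by_project = build_view(repo, "project")
by_type = build_view(repo, "type")
```

The same three documents appear under two different folder structures without being copied or moved; a further view is just another call with a different tag name.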
Ixiasoft TeXtML
TeXtML applies the bulk of its energies to the storage, retrieval, and management of text, and does so by creating an environment
awash in XML.
It’s not much of a stretch to say that TeXtML takes text documents from our universe, maps them into their equivalents in
an XML universe, and uses the capabilities of that universe to provide search and management functions that would not be available
otherwise. (This is not to suggest that TeXtML handles only text documents:
It can easily store and retrieve documents with embedded binary data.)
TeXtML uses a collections paradigm for organizing documents. Collections appear as named folders on TeXtML’s administration
console, and are navigated using the standard path constructs that anyone familiar with a file system would recognize.
How documents are stored in the repository, though, is a bit complicated. As stated above, documents are mapped to XML equivalents,
but that is only partly true. On the one hand, documents are stored wholesale in their native format. On the other hand,
when a document is placed in the repository, it is parsed into a kind of XML doppelganger document that TeXtML uses to build
indexes for the document. The TeXtML repository keeps track of the relationship between the original document and its XML
shadow. (This technique of creating XML shadow documents while keeping the original available helps TeXtML significantly with
its indexing chores, thus speeding queries.)
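The original-plus-shadow arrangement can be sketched in Python as follows. ShadowRepository, plain_text_converter, and the "docN" key scheme are hypothetical names chosen for illustration; the converter is a toy stand-in for a real format parser and understands only UTF-8 plain text whose first line is taken as the title. The sketch shows the key property described above: the native bytes are never altered, and a shared key ties each original to its XML shadow.

```python
import xml.etree.ElementTree as ET


class ShadowRepository:
    """Keeps every document twice: the original bytes, untouched, and an
    XML shadow derived from them for indexing. A shared key links the pair."""

    def __init__(self):
        self.originals = {}  # key -> original bytes in the native format
        self.shadows = {}    # key -> ET.Element derived from the original

    def put(self, name, data, convert):
        key = f"doc{len(self.originals) + 1}"    # hypothetical key scheme
        self.originals[key] = data               # stored wholesale, never modified
        self.shadows[key] = convert(name, data)  # XML twin used only for indexing
        return key

    def original(self, key):
        return self.originals[key]

    def shadow(self, key):
        return self.shadows[key]


def plain_text_converter(name, data):
    """Toy converter: the first line becomes <title>, the rest becomes <body>."""
    title, _, body = data.decode("utf-8").partition("\n")
    root = ET.Element("document", {"name": name})
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "body").text = body
    return root


repo = ShadowRepository()
key = repo.put("memo.txt", b"Quarterly memo\nSales rose in Q3.", plain_text_converter)
```

Queries would run against the shadow, while a request for the document itself returns repo.original(key) byte for byte.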
The parsing is performed by TeXtML’s Universal Converter, which reads more than 220 document formats. It is an optional
component, but without it, the only querying you can do is on document metadata such as title, creation date, document type,
and so on.
Indexes and Queries
TeXtML determines which parts of a given document to index by consulting an index definition document. There is only one index
definition document in the repository, and it is written entirely in XML. So, when a new document enters the repository, it is
dissected by the Universal Converter, and the index definition document is consulted to determine which elements are to be indexed.
TeXtML creates indexes for full-text content, strings, numeric data, dates, and time.
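The definition-driven indexing step can be sketched in Python. The XML vocabulary below (an indexes root holding index elements with name, type, and path attributes) is invented for this illustration and is not TeXtML's actual schema, and the shadow document simply stands in for converter output. The sketch covers full-text, string, numeric, and date indexes; time values are omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date

# Hypothetical index definition: element and attribute names are assumptions.
INDEX_DEFINITION = """
<indexes>
  <index name="title"   type="string"   path="title"/>
  <index name="body"    type="fulltext" path="body"/>
  <index name="pages"   type="numeric"  path="meta/pages"/>
  <index name="created" type="date"     path="meta/created"/>
</indexes>
"""


def load_definition(xml_text):
    """Read (name, type, path) triples from the single definition document."""
    root = ET.fromstring(xml_text.strip())
    return [(i.get("name"), i.get("type"), i.get("path")) for i in root.iter("index")]


def index_document(indexes, definition, doc_key, shadow):
    """Consult the definition to decide which shadow elements to index, and how."""
    for name, kind, path in definition:
        text = shadow.findtext(path)
        if text is None:
            continue  # element absent from this document: nothing to index
        if kind == "fulltext":
            terms = set(re.findall(r"\w+", text.lower()))  # individual words
        elif kind == "numeric":
            terms = {float(text)}
        elif kind == "date":
            terms = {date.fromisoformat(text)}
        else:  # "string": the whole value is one exact-match term
            terms = {text}
        for term in terms:
            indexes[name][term].add(doc_key)  # inverted index: term -> doc keys


indexes = defaultdict(lambda: defaultdict(set))
definition = load_definition(INDEX_DEFINITION)
shadow = ET.fromstring(
    "<document><title>Quarterly memo</title><body>Sales rose in Q3.</body>"
    "<meta><pages>2</pages><created>2005-10-03</created></meta></document>"
)
index_document(indexes, definition, "doc1", shadow)
```

Because the definition is data rather than code, changing what gets indexed means editing one XML document, not the application.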
TeXtML’s query language is yet another XML dialect, entirely unlike XQuery. The dissimilarity is understandable: TeXtML’s primary
aim is rapid search of document content, and the capability to navigate an XML document’s structure using XPath-style expressions
(as can happen in XQuery) matters less.
TeXtML’s demonstration download comes with a preloaded repository, as well as an application that allows you to experiment
with the system’s querying capabilities. The application lets the user enter queries by filling in text boxes, builds the
underlying query behind the scenes, and then executes it.
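Because the query language is itself XML, a form-driven front end like the demo application can assemble queries mechanically. The Python sketch below turns text-box values into an XML query string; the query, and, and term elements and the index attribute are hypothetical, chosen for illustration rather than taken from TeXtML's query syntax.

```python
import xml.etree.ElementTree as ET


def build_query(fields):
    """Turn filled-in text boxes into an XML query document.

    Blank boxes are skipped; each non-blank box becomes a <term> element,
    and all terms sit under one <and> so every condition must match.
    Element and attribute names are hypothetical.
    """
    query = ET.Element("query")
    conj = ET.SubElement(query, "and")
    for index_name, value in fields.items():
        value = value.strip()
        if value:
            ET.SubElement(conj, "term", {"index": index_name}).text = value
    return ET.tostring(query, encoding="unicode")


# Text-box contents as a user might leave them, including a blank field.
form = {"title": "Quarterly memo", "body": "  sales  ", "author": ""}
xml_query = build_query(form)
```

The user never sees or edits the XML; the generated string is what would be handed to the search engine.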
The installation also includes sample apps and queries, and the included programmer’s manual provides a line-by-line explanation
of the VBScript programs. This is not to suggest that VBScript is your only programming avenue into TeXtML, which supports
APIs for Java, native .NET, COM, and OLE DB (Object Linking and Embedding, Database). There is also a WebDAV extension, but at the
time of this writing, that extension did not support some of TeXtML’s advanced features.
Concluding Content
Daisy could certainly benefit from a smoother installation. Hopefully, a turnkey version, expected as part of the next release,
will eliminate that complaint. Beyond that, the Daisywiki is a joy to play with, and is an excellent test-drive of Daisy’s
novel stuff-it-all-in-one-bag approach to document storage.
TeXtML is the product for scuba-diving through oceans of text content. It also provides safeguard features that Daisy doesn’t
have, such as the Fault Tolerant Server, which replicates documents and transactions on multiple TeXtML servers.
If hard-core text searching is what you need in your CMS, then by all means give TeXtML a look. Daisy, however, has
that powerful attribute that we are seeing more and more in high-quality software: open source. If you want to set up a wiki
site in an evening or two, Daisy is very hard to beat.