InfoWorld
Managing your content with XML

 

Daisy’s eschewing of a repository structure appears, at first glance, to be a severe omission. Further reflection, however, reveals this weakness as a strength. In a typical CMS, a document is placed into a specific collection within the repository, but that implies a redundancy: Someone has used the document’s content to determine which collection to put the document in. If you’ve properly tagged the document, however, and if your repository server can create a view of the repository derived from those tags, then the equivalent of a collection structure can be rendered at display time. And, unlike collection-based repository servers, such a “view-based” server renders multiple, different views of the same repository. This is exactly what Daisy does, and the result is quite impressive.
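To make the view-based idea concrete, here is a minimal Python sketch. It does not use Daisy's actual API or schema; the document fields, tag names, and the `build_view` helper are all invented for illustration. It shows only that a single flat pool of tagged documents can yield several different "collection" layouts at display time.

```python
from collections import defaultdict

# Hypothetical tagged documents in one flat repository.
# Field and tag names are illustrative, not Daisy's schema.
DOCUMENTS = [
    {"id": 1, "title": "Install guide", "tags": {"product": "server", "type": "howto"}},
    {"id": 2, "title": "Release notes", "tags": {"product": "server", "type": "news"}},
    {"id": 3, "title": "Client FAQ", "tags": {"product": "client", "type": "howto"}},
]


def build_view(documents, tag_name):
    """Group document titles by the value of one tag, computed on demand."""
    view = defaultdict(list)
    for doc in documents:
        value = doc["tags"].get(tag_name, "(untagged)")
        view[value].append(doc["title"])
    return dict(view)


# Two different "folder structures" derived from the same repository.
by_product = build_view(DOCUMENTS, "product")
by_type = build_view(DOCUMENTS, "type")
```

Nothing is moved or copied to produce the second view, which is the advantage over a fixed collection hierarchy.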


Ixiasoft TeXtML
TeXtML applies the bulk of its energies to the storage, retrieval, and management of text, and does so by creating an environment awash in XML.

It’s not much of a stretch to say that TeXtML takes text documents from our universe, maps them into their equivalents in an XML universe, and uses the capabilities of that universe to provide search and management functions that would not be available otherwise. (This is not to suggest that TeXtML can handle only text documents: It can easily store and retrieve documents with embedded binary data.)

TeXtML uses a collections paradigm for organizing documents. Collections appear as named folders on TeXtML’s administration console, and are navigated using the standard path constructs that anyone familiar with a file system would recognize.

How documents are stored in the repository, though, is a bit complicated. As stated above, documents are mapped to XML equivalents — but that is only partly true. On the one hand, documents are stored wholesale in their native format. On the other hand, when a document is placed in the repository, it is parsed into a kind of XML doppelganger document that TeXtML uses to build indexes for the document. The TeXtML repository keeps track of the relationship between the original document and its XML shadow. (This technique of creating XML shadow documents while keeping the original available helps TeXtML significantly with its indexing chores, thus speeding queries.)
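The original-plus-shadow arrangement can be pictured with a short Python sketch. This is not TeXtML's implementation: the `Repository` class, its method names, and the shadow XML layout are assumptions made for illustration. The point is that queries run against an index built from the XML shadow, while the untouched original is what comes back.

```python
import xml.etree.ElementTree as ET


class Repository:
    """Toy repository: native originals plus XML shadows used only for indexing."""

    def __init__(self):
        self.originals = {}   # doc_id -> original bytes, stored untouched
        self.shadows = {}     # doc_id -> parsed XML shadow of that document
        self.word_index = {}  # lowercase word -> set of doc_ids

    def add(self, doc_id, original_bytes, shadow_xml):
        """Store the original and index every word found in its XML shadow."""
        self.originals[doc_id] = original_bytes
        shadow = ET.fromstring(shadow_xml)
        self.shadows[doc_id] = shadow
        for text in shadow.itertext():
            for word in text.lower().split():
                self.word_index.setdefault(word, set()).add(doc_id)

    def search(self, word):
        """Look the word up in the shadow index; return the matching originals."""
        ids = sorted(self.word_index.get(word.lower(), set()))
        return [(doc_id, self.originals[doc_id]) for doc_id in ids]


repo = Repository()
repo.add(
    "memo-1",
    b"(native bytes of the original file)",
    "<doc><title>Budget memo</title><body>Quarterly budget review</body></doc>",
)
hits = repo.search("budget")
```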

The parsing is performed by TeXtML’s Universal Converter, which reads some 220-plus document formats. It is an optional component, but without it, the only querying you can do is on document metadata such as title, creation date, document type, and so on.

Indexes and Queries
TeXtML determines which parts of a given document to index by consulting an index definition document. There is only one index definition document in the repository, and its content is entirely XML. So, when a new document enters the repository, it is dissected by the Universal Converter, and the index definition document is consulted to determine which elements are to be indexed. TeXtML creates indexes for full-text content, strings, numeric data, dates, and times.
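As a rough illustration of an index definition at work, consider the Python sketch below. The XML vocabulary (`indexes`, `index`, `element`, `type`) and both helper functions are hypothetical; TeXtML's real index definition format is not shown in this review. The sketch only demonstrates the principle that one XML document names the elements to index and the kind of index each gets.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical index definition; TeXtML's actual XML schema will differ.
INDEX_DEFINITION = """
<indexes>
  <index element="title" type="string"/>
  <index element="pages" type="numeric"/>
  <index element="published" type="date"/>
</indexes>
"""

# How each index type converts element text into an indexable value.
CONVERTERS = {
    "string": str,
    "numeric": float,
    "date": date.fromisoformat,
}


def load_definition(xml_text):
    """Return a mapping of element name -> index type."""
    root = ET.fromstring(xml_text)
    return {node.get("element"): node.get("type") for node in root.findall("index")}


def extract_index_entries(document_xml, definition):
    """Pull out only the elements the definition says to index."""
    doc = ET.fromstring(document_xml)
    entries = {}
    for element_name, index_type in definition.items():
        node = doc.find(f".//{element_name}")
        if node is not None and node.text:
            entries[element_name] = CONVERTERS[index_type](node.text.strip())
    return entries


definition = load_definition(INDEX_DEFINITION)
entries = extract_index_entries(
    "<report><title>Q1 results</title><pages>12</pages>"
    "<published>2007-05-01</published><notes>internal</notes></report>",
    definition,
)
```

The `notes` element is ignored because the definition does not list it.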

TeXtML’s query language is yet another XML variant, entirely unlike XQuery. The dissimilarity is understandable. TeXtML is primarily intent on performing rapid document content search; less important is the capability to navigate an XML document’s structure using XPath-style expressions (as can happen in XQuery).

TeXtML’s demonstration download comes with a preloaded repository, as well as an application that allows you to experiment with the system’s querying capabilities. The application lets the user enter queries by filling in text boxes, generates the query invisibly, then executes it.

The installation also includes sample apps and queries, and the included programmer’s manual provides a line-by-line explanation of the VBScript programs. This is not to suggest that VBScript is your only programming avenue into TeXtML, which supports APIs for Java, native .NET, COM, and OLE DB (Object Linking and Embedding, Database). There is also a WebDAV extension, but at the time of this writing, that API did not support some of TeXtML’s advanced features.

Concluding Content
Daisy could certainly benefit from a smoother installation. Hopefully, a turnkey version, expected as part of the next release, will eliminate that complaint. Beyond that, the Daisywiki is a joy to play with, and is an excellent test-drive of Daisy’s novel stuff-it-all-in-one-bag approach to document storage.

TeXtML is the product for scuba-diving through oceans of text content. It also provides safeguard features that Daisy doesn’t have, such as the Fault Tolerant Server, which replicates documents and transactions on multiple TeXtML servers.

If hard-core text searching is what you need in your CMS, then by all means give TeXtML a look. Daisy, however, has that powerful attribute that we are seeing more and more in high-quality software: open source. If you want to set up a wiki site in an evening or two, Daisy is very hard to beat.





Ixiasoft TeXtML Server

Ixiasoft, ixiasoft.com

Overall score: 8.1 (Very Good)

Criteria      Score   Weight
Ease-of-use   8       20%
Flexibility   8       20%
Integration   8       20%
Management    8       20%
Scalability   9       10%
Value         8       10%

Cost:
Starts at around $10,000

Platforms:
Requires Windows 2000/2003, or Windows XP Professional

Bottom Line:
TeXtML provides a wide array of APIs. Setup is easy, and it supports fault tolerance with multiserver fail-over repositories (an optional component). TeXtML strikes the right balance between turning everything into XML and using XML to enable powerful queries. The CMS excels at text searching. Although the price is steep, TeXtML may well be worth considering for companies that want quick search access to documents in a secure repository.




Daisy 1.3

Outerthought, org/daisy/index.html

Overall score: 8.3 (Very Good)

Criteria      Score   Weight
Ease-of-use   8       20%
Flexibility   9       20%
Integration   8       20%
Management    8       20%
Scalability   8       10%
Value         9       10%

Cost:
Free

Platforms:
Requires only a JVM 1.4.2 or higher and MySQL version 4.0.20 or version 4.1.7 (or higher).

Bottom Line:
Daisy's novel approach of stuffing all documents into one bag and leaving it to metadata and navigation documents to sort them out may sound like anarchy, but this scheme provides more flexibility than the collections approach. Daisy allows multiuser editing of repository content, as demonstrated by its wiki front end. The installation takes a bit of work, and the documentation is still in progress. Daisy is proving itself in live on-the-Web use, so the extra effort is worth it.
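The overall scores in the two scorecards are consistent with a weighted average of the criteria scores. The following Python check assumes that is how the totals were derived; the function name and data layout are ours, not part of the published scoring methodology.

```python
# Criteria scores and percentage weights copied from the two scorecards.
WEIGHTS = {
    "Ease-of-use": 20, "Flexibility": 20, "Integration": 20,
    "Management": 20, "Scalability": 10, "Value": 10,
}
TEXTML = {
    "Ease-of-use": 8, "Flexibility": 8, "Integration": 8,
    "Management": 8, "Scalability": 9, "Value": 8,
}
DAISY = {
    "Ease-of-use": 8, "Flexibility": 9, "Integration": 8,
    "Management": 8, "Scalability": 8, "Value": 9,
}


def weighted_score(scores, weights):
    """Weighted average, assuming the weights are percentages summing to 100."""
    total = sum(scores[name] * weights[name] for name in weights)
    return round(total / 100, 1)


textml_total = weighted_score(TEXTML, WEIGHTS)  # matches the published 8.1
daisy_total = weighted_score(DAISY, WEIGHTS)    # matches the published 8.3
```

Daisy's edge comes entirely from Flexibility and Value, which outweigh TeXtML's one-point lead in Scalability.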




 


 
Rick Grehan is a contributing editor at InfoWorld. Contact him at rick_grehan@infoworld.com.
 

Copyright © 2009, Reprints, Permissions, Licensing, IDG Network, Privacy Policy, Terms of Service.
All Rights reserved. InfoWorld is a leading publisher of technology information and product reviews on topics including viruses,
phishing, worms, firewalls, security, servers, storage, networking, wireless, databases, and web services.
