Social tagging can be a great way to manage metadata about things that aren't confidential, to align communities of interest around sets of resources, and to harness the collective ability of those communities to tease out the relationships among things. At InfoWorld.com, for example, this process has proved an effective way to answer the request: "Show me clusters of articles related to the current one."
When data can't be freely shared, however, and when there aren't lots of people interacting with it, social tagging lacks the critical mass it needs to thrive. At the scale of a workgroup, a department, or even a whole company, it's unlikely this approach will yield accurate answers to requests such as: "Show me the discussions related to sales projections made by people working on the Trinity project." To answer such a query, you'd need to know which items relate to the project, which are sales projections, and which are messages related to those projections.
Web and file system metadata
The Web's inventor, Tim Berners-Lee, has long imagined a "semantic Web" that makes it possible to reason about interrelated things. To that end, the World Wide Web Consortium has proposed two initiatives: RDF (Resource Description Framework), a grammar for describing and exchanging metadata about resources; and OWL (Web Ontology Language), a set of languages for classifying resources.
RDF describes interrelated things in terms of subject-predicate-object "triples." Examples might be "DOCUMENT IS-A SALES_PROJECTION," or "DOCUMENT HAS-AUTHOR PAUL_SMITH." If you had lots of resources described in this way, and if the metadata vocabularies were carefully controlled, and if you had a query engine that could efficiently process sets of these assertions, you could answer all kinds of very difficult but very interesting questions. Those are three huge ifs, of course, and given the scale and chaotic complexity of the Web, it's not surprising that little progress has been made to date.
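To make the triple idea concrete, here is a minimal sketch in Python. It is not RDF syntax or any real RDF library: the `Triple` type, the `match` helper, and the document identifiers `doc1` and `doc2` are hypothetical, and only the predicate and object names come from the examples above. It shows how a small, controlled vocabulary lets one query combine two assertions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object assertion."""
    subject: str
    predicate: str
    obj: str


# Hypothetical assertions; doc1 and doc2 are made-up identifiers.
TRIPLES = [
    Triple("doc1", "IS-A", "SALES_PROJECTION"),
    Triple("doc1", "HAS-AUTHOR", "PAUL_SMITH"),
    Triple("doc2", "IS-A", "MEETING_NOTES"),
    Triple("doc2", "HAS-AUTHOR", "PAUL_SMITH"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples whose fields equal every non-None argument."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def sales_projections_by(triples, author):
    """Find subjects that are sales projections AND have the given author."""
    projections = {
        t.subject
        for t in match(triples, predicate="IS-A", obj="SALES_PROJECTION")
    }
    authored = {
        t.subject
        for t in match(triples, predicate="HAS-AUTHOR", obj=author)
    }
    # Intersecting the two result sets is a tiny join over the assertions.
    return sorted(projections & authored)


print(sales_projections_by(TRIPLES, "PAUL_SMITH"))
```

The query works only because both assertions use the same spelling of each predicate and object, which is the "carefully controlled vocabulary" condition in miniature. A linear scan like this would also not survive Web scale, which is the query-engine condition.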
Is there better traction to be gained in the more restricted domain of personal information management? That's what Microsoft hopes to prove with WinFS, the next-generation file system that was originally planned for Windows Vista only, then appeared in beta for Windows XP this fall, and is now slated to appear on both platforms sometime after Vista ships.
A WinFS data store is a collection of strongly typed items. The list of types includes Document, Person, Message, and -- most crucially -- Relationship. The idea is that applications managing these items use relationships to weave what are, in effect, RDF triples. Although applications can assert that a document is a sales projection, or that its author is Paul Smith, or that Paul Smith is a member of the Trinity project team, these relationships are not held privately by any of the applications. Instead, they're available systemwide to all WinFS-aware applications, any of which can query for messages written by team members that refer to sales projections.
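The shared-store idea can be sketched in a few lines of Python. This is an illustration only, not the WinFS API: the `Item`, `Relationship`, and `SharedStore` classes, the relationship names (`member-of`, `has-author`, `refers-to`, `is-a`), and the sample data are all assumptions made up for the example. It shows different "applications" contributing relationships to one store, and a query that follows them across item types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    item_id: str
    item_type: str  # e.g. "Document", "Person", "Message"


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str
    target: str


@dataclass
class SharedStore:
    """Hypothetical systemwide store; no application owns the relationships."""
    items: dict = field(default_factory=dict)
    relationships: set = field(default_factory=set)

    def add(self, item_id, item_type):
        self.items[item_id] = Item(item_id, item_type)

    def relate(self, source, kind, target):
        self.relationships.add(Relationship(source, kind, target))

    def sources(self, kind, target):
        return {r.source for r in self.relationships
                if r.kind == kind and r.target == target}

    def targets(self, source, kind):
        return {r.target for r in self.relationships
                if r.source == source and r.kind == kind}


def messages_about_projections_by_team(store, project):
    """Messages written by project members that refer to a sales projection."""
    members = store.sources("member-of", project)
    found = []
    for item in store.items.values():
        if item.item_type != "Message":
            continue
        authors = store.targets(item.item_id, "has-author")
        referenced = store.targets(item.item_id, "refers-to")
        cites_projection = any(
            "SalesProjection" in store.targets(doc, "is-a")
            for doc in referenced
        )
        if authors & members and cites_projection:
            found.append(item.item_id)
    return sorted(found)


def build_sample_store():
    store = SharedStore()
    for item_id, item_type in [("paul", "Person"), ("ann", "Person"),
                               ("doc1", "Document"), ("doc2", "Document"),
                               ("msg1", "Message"), ("msg2", "Message"),
                               ("msg3", "Message")]:
        store.add(item_id, item_type)
    # Imagine each group of assertions coming from a different application.
    store.relate("paul", "member-of", "Trinity")
    store.relate("doc1", "is-a", "SalesProjection")
    store.relate("msg1", "has-author", "paul")
    store.relate("msg1", "refers-to", "doc1")
    store.relate("msg2", "has-author", "ann")   # ann is not on the team
    store.relate("msg2", "refers-to", "doc1")
    store.relate("msg3", "has-author", "paul")
    store.relate("msg3", "refers-to", "doc2")   # doc2 is not a projection
    return store


print(messages_about_projections_by_team(build_sample_store(), "Trinity"))
```

In this sketch only `msg1` satisfies all three conditions. The point is that no single application has to know every fact; the query succeeds because the relationships live in one place.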
Marrying a relational database to a file system is one of the daunting challenges faced by WinFS. Another will be convincing developers to exploit the built-in WinFS types and create new types that define customized axes of controlled metadata. A third challenge will be to build bridges between WinFS types, which are specialized .Net objects, and documents or messages represented using XML and described by XML schemas.