When metadata refers to shared data, however, and when lots of people can interact with that shared data, the dynamics shift.
In this scenario, the metadata produced by a few people -- for selfish reasons -- can also benefit many others. Enlightened
self-interest is one key motivator, but peer pressure is another. Although nothing compels me to tag my bookmarks or photos
according to group conventions, doing so means the items I tag will receive more attention from the group. That's a powerful
incentive to contribute, and it creates a tight feedback loop that helps metadata vocabularies converge.
Social tagging can be a great way to manage metadata about things that aren't confidential, to align communities of interest
around sets of resources, and to harness the collective ability of those communities to tease out the relationships among
things. At InfoWorld.com, for example, this process has proved an effective way to answer the request: "Show me clusters of articles related to the
current one."
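One simple way to picture that kind of request is to rank articles by how many tags they share with the current one. The sketch below is only an illustration under that assumption, not a description of how InfoWorld.com does it; the article IDs, the tags, and the `related_articles` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical tag data: article id -> set of tags applied by readers.
ARTICLE_TAGS = {
    "a1": {"metadata", "tagging", "rdf"},
    "a2": {"tagging", "folksonomy", "metadata"},
    "a3": {"winfs", "filesystem"},
    "a4": {"rdf", "semantic-web"},
}


def related_articles(current, article_tags, min_shared=1):
    """Rank other articles by the number of tags they share with `current`."""
    current_tags = article_tags[current]
    overlap = Counter()
    for article, tags in article_tags.items():
        if article == current:
            continue
        shared = len(current_tags & tags)
        if shared >= min_shared:
            overlap[article] = shared
    # most_common() returns (article, shared_count) pairs, highest count first.
    return overlap.most_common()


print(related_articles("a1", ARTICLE_TAGS))  # [('a2', 2), ('a4', 1)]
```

The more people tag with a shared vocabulary, the more overlap there is to count, which is why this approach depends on critical mass.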
When data can't be freely shared, however, and when there aren't lots of people interacting with it, social tagging lacks
the critical mass it needs to thrive. At the scale of a workgroup, a department, or even a whole company, it's unlikely this
approach will yield accurate answers to requests such as: "Show me the discussions related to sales projections made by people
working on the Trinity project." To answer such a query, you'd need to know which items relate to the project, which are sales
projections, and which are messages related to those projections.
Web and file system metadata
The Web's inventor, Tim Berners-Lee, has long imagined a "semantic Web" that makes it possible to reason about interrelated
things. To that end, the World Wide Web Consortium has proposed two initiatives: RDF (Resource Description Framework), a grammar
for describing and exchanging metadata about resources; and OWL (Web Ontology Language), a set of languages for classifying
resources.
RDF describes interrelated things in terms of subject-predicate-object "triples." Examples might be "DOCUMENT IS-A SALES_PROJECTION,"
or "DOCUMENT HAS-AUTHOR PAUL_SMITH." If you had lots of resources described in this way, and if the metadata vocabularies
were carefully controlled, and if you had a query engine that could efficiently process sets of these assertions, you could
answer many difficult but interesting questions. Those are three huge ifs, of course, and given the scale
and chaotic complexity of the Web, it's not surprising that little progress has been made to date.
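To make the triple idea concrete, here is a minimal sketch that stores assertions as plain tuples and answers a question by matching patterns against them. It uses no RDF library and no real RDF syntax; the document IDs, the `MEMO` type, and the `match` helper are assumptions made for illustration.

```python
# Each assertion is a (subject, predicate, object) tuple.
TRIPLES = {
    ("doc1", "IS-A", "SALES_PROJECTION"),
    ("doc1", "HAS-AUTHOR", "PAUL_SMITH"),
    ("doc2", "IS-A", "MEMO"),
    ("doc2", "HAS-AUTHOR", "PAUL_SMITH"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }


# Which documents are sales projections written by Paul Smith?
projections = {s for (s, _, _) in match(TRIPLES, predicate="IS-A", obj="SALES_PROJECTION")}
by_paul = {s for (s, _, _) in match(TRIPLES, predicate="HAS-AUTHOR", obj="PAUL_SMITH")}
print(sorted(projections & by_paul))  # ['doc1']
```

Even this toy version shows why the vocabulary has to be controlled: if one source wrote `AUTHOR` and another `HAS-AUTHOR`, the second pattern would silently miss half the answers.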
Is there better traction to be gained in the more restricted domain of personal information management? That's what Microsoft
hopes to prove with WinFS, the next-generation file system that was originally planned for Windows Vista only, then appeared
in beta for Windows XP this fall, and is now slated to appear on both platforms sometime after Vista ships.
A WinFS data store is a collection of strongly typed items. The list of types includes Document, Person, Message, and -- most
crucially -- Relationship. The idea is that applications managing these items use relationships to weave what are, in effect,
RDF triples. Although applications can assert that a document is a sales projection, or that its author is Paul Smith, or
that Paul Smith is a project Trinity team member, these relationships are not held privately by any of the applications. Instead,
they're available systemwide to all WinFS-aware applications, any of which can query for messages written by team members
that refer to sales projections.
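The sketch below illustrates the idea of a shared, systemwide store of typed items and relationships. It is not the WinFS API: the `SharedStore` class, the relationship names, and the sample items are all hypothetical, chosen only to show how assertions made by different applications can be combined in one query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    item_type: str  # e.g. "Document", "Person", "Message"


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str
    target: str


class SharedStore:
    """A toy stand-in for a store that every application can read and write."""

    def __init__(self):
        self.items = {}
        self.relationships = set()

    def add_item(self, item_id, item_type):
        self.items[item_id] = Item(item_id, item_type)

    def relate(self, source, kind, target):
        self.relationships.add(Relationship(source, kind, target))

    def targets(self, source, kind):
        return {r.target for r in self.relationships
                if r.source == source and r.kind == kind}

    def sources(self, kind, target):
        return {r.source for r in self.relationships
                if r.kind == kind and r.target == target}


store = SharedStore()
for item_id, item_type in [("paul", "Person"), ("ann", "Person"),
                           ("doc1", "Document"), ("msg1", "Message"),
                           ("msg2", "Message"), ("msg3", "Message")]:
    store.add_item(item_id, item_type)

# Assertions that might come from a document application.
store.relate("doc1", "is-a", "SalesProjection")
store.relate("doc1", "has-author", "paul")
# Assertion that might come from a project-management application.
store.relate("paul", "member-of", "Trinity")
# Assertions that might come from a mail application.
store.relate("msg1", "has-author", "paul")
store.relate("msg1", "refers-to", "doc1")
store.relate("msg2", "has-author", "ann")
store.relate("msg2", "refers-to", "doc1")
store.relate("msg3", "has-author", "paul")


def team_messages_about_projections(store, project):
    """Messages written by project members that refer to a sales projection."""
    members = store.sources("member-of", project)
    projections = store.sources("is-a", "SalesProjection")
    return sorted(
        item.item_id
        for item in store.items.values()
        if item.item_type == "Message"
        and store.targets(item.item_id, "has-author") & members
        and store.targets(item.item_id, "refers-to") & projections
    )


print(team_messages_about_projections(store, "Trinity"))  # ['msg1']
```

No single application in this sketch knows enough to answer the query alone; the answer only falls out because all three sets of relationships live in the same store.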