October 20, 2005

Managing metadata

Data about data provides enormous opportunities to organize information in new and useful ways

When we talk and write about IT issues, we use certain words to mean many different things: "Platform," "architecture," and "integration" are among the worst offenders. But the most overloaded term in the IT lexicon may well be "metadata."

Everyone knows the common definition: Metadata is data about data, a secondary thing that's separate in some way from the primary thing to which it refers. But that definition begs a series of questions. Is metadata something we derive from data, or assign to it? Does it classify things, or enable us to search for things, or govern the behavior of things? If data that is described by metadata also, in turn, refers to other data, does it then qualify as both data and metadata?

These questions can verge on the philosophical, but by working through some examples, we can define various types of metadata, list the benefits that we expect from using it, and identify the challenges associated with maintaining it. Programs, documents, messages, files, Web resources, and Web services are some of the IT constructs often described by metadata. Let's review the roles that metadata can play in these different scenarios.

Software metadata

Since the birth of software, programmers have embedded one kind of metadata -- namely comments -- in their source code. Making such comments more integral to software has been a long-standing quest. In the 1980s, the legendary computer scientist Donald Knuth began evangelizing a technique he called "literate programming." Knuth was the inventor of TeX, a markup language that's still used for math-intensive typesetting. His idea was to use TeX in tandem with a programming language to compose a single document that blended both code and documentation.

Knuth's approach never really caught on, but the idea of weaving comments more intimately into code continued to evolve. Java programmers, for example, write specially formatted comments in their source code and then use the Javadoc tool to translate those comments into HTML documentation.

Comments are an informal kind of metadata used to describe the design and operation of software for human readers. But they can also be used in more formal ways to declare properties of software components and relationships among them. A module that checks credit card numbers, for example, might be invoked directly or by way of a Web services framework. Specifying the invocation style in a comment, rather than in the code, is one way to separate configuration logic from business logic.

Because comments don't survive compilation, though, such configuration metadata is only indirectly linked to the code to which it refers. Why not embed the metadata directly in the generated code? The .Net architecture enables just that. With J2SE 1.5, Java does, too. Thanks to a technique called reflection, available in both environments, it's possible to query class files or assemblies at run time, discover these metadata annotations, and react dynamically to them. Metadata can be used to declare that a component must run in a transactional context, for example, or to specify the kind of authentication it must use.

These custom annotations are assigned to software, not intrinsic to it. But Java and .Net programs also make available, through reflection, intrinsic metadata about the objects they contain, as well as the types and properties of those objects. As a result, these self-describing programs can collaborate with other programs in highly dynamic ways.

Close

On Twitter now

Application development

Powered by Twitter

White Paper

D2D Virtual Tape Library Replication Primer

This whitepaper explains the terminology and concepts behind Data Replication technologies and establishes some sizing rules through worked examples. Learn the new paradigm in disaster tolerance—protect data anywhere.

Download now »

White Paper

An Alternative to Virtualization for Datacenter Cost Savings

Server virtualization is a popular option for dealing with mounting datacenter costs. Another equally promising approach is the use of an Application Delivery Controller. Citrix NetScaler provides a low-cost way for organizations to reduce their server count and accrue cost savings from a reduction in space, cooling, power and personnel.

Download now »

White Paper

Why Your Firewall, VPN, and IEEE 802.11i Aren't Enough to Protect Your Network

The emergence of WLANs has created a new breed of security threats to enterprise networks.

Included in HP ProCurve WLAN solutions is security technology that alleviates threats from WLANs through:
* Monitoring wireless activity inside and out of the enterprise
* Classifying WLAN transmissions into harmful and harmless
* Preventing transmissions that pose a security threat to the enterprise network
* Locating participating devices for physical remediation

Download now »

White Paper

Bringing the Edge to the Data Center

Effectively address data protection challenges, implementing solutions that help store and protect business–critical data while cutting costs and improving efficiency and reliability.

Download now »

Sign up to receive InfoWorld Resource Alerts

Subscribe to the Developer World Newsletter

Receive a weekly roundup about the art and science of software development.

©1994-2009 Infoworld, Inc.