When we talk and write about IT issues, we use certain words to mean many different things: "platform," "architecture," and "integration" are among the worst offenders. But the most overloaded term in the IT lexicon may well be "metadata."
Everyone knows the common definition: Metadata is data about data, a secondary thing that's separate in some way from the primary thing to which it refers. But that definition raises a series of questions. Is metadata something we derive from data, or assign to it? Does it classify things, or enable us to search for things, or govern the behavior of things? If data that is described by metadata also, in turn, refers to other data, does it then qualify as both data and metadata?
These questions can verge on the philosophical, but by working through some examples, we can define various types of metadata, list the benefits that we expect from using it, and identify the challenges associated with maintaining it. Programs, documents, messages, files, Web resources, and Web services are some of the IT constructs often described by metadata. Let's review the roles that metadata can play in these different scenarios.
Since the birth of software, programmers have embedded one kind of metadata, namely comments, in their source code. Making such comments more integral to software has been a long-standing quest. In the 1980s, the legendary computer scientist Donald Knuth began evangelizing a technique he called "literate programming." Knuth is the creator of TeX, a typesetting system whose markup language is still used for math-intensive documents. His idea was to use TeX in tandem with a programming language to compose a single document that blended both code and documentation.
Knuth's approach never really caught on, but the idea of weaving comments more intimately into code continued to evolve. Java programmers, for example, write specially formatted comments in their source code and then use the Javadoc tool to translate those comments into HTML documentation.
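Here is a minimal sketch of what such specially formatted comments look like. The class and method names are hypothetical, and the Luhn checksum is simply a convenient body to document, not something the discussion above depends on. The `/** ... */` delimiters and the `@param`, `@return`, and `@throws` tags are the standard Javadoc conventions.

```java
/**
 * Validates payment card numbers.
 * (Hypothetical class, used only to illustrate Javadoc comment syntax.)
 */
final class CardNumberChecker {

    private CardNumberChecker() {
        // Utility class: not meant to be instantiated.
    }

    /**
     * Checks a card number against the Luhn checksum.
     *
     * @param digits the card number as a string of decimal digits
     * @return {@code true} if the checksum is valid
     * @throws IllegalArgumentException if {@code digits} is empty
     *         or contains a character that is not a decimal digit
     */
    static boolean passesLuhn(String digits) {
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("empty card number");
        }
        int sum = 0;
        boolean doubleIt = false; // every second digit, counting from the right, is doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (d < 0 || d > 9) {
                throw new IllegalArgumentException("not a digit: " + digits.charAt(i));
            }
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the doubled value
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

Because this sketch keeps the class package-private, the Javadoc tool would need its `-package` option to include it in the generated HTML; by default the tool documents only public and protected members.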
Comments are an informal kind of metadata used to describe the design and operation of software for human readers. But they can also be used in more formal ways to declare properties of software components and relationships among them. A module that checks credit card numbers, for example, might be invoked directly or by way of a Web services framework. Specifying the invocation style in a comment, rather than in the code, is one way to separate configuration logic from business logic.
Because comments don't survive compilation, though, such configuration metadata is only indirectly linked to the code to which it refers. Why not embed the metadata directly in the generated code? The .Net architecture enables just that. With J2SE 1.5, Java does, too. Thanks to a technique called reflection, available in both environments, it's possible to query class files or assemblies at run time, discover these metadata annotations, and react dynamically to them. Metadata can be used to declare that a component must run in a transactional context, for example, or to specify the kind of authentication it must use.
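To make this concrete, here is a rough Java sketch of an annotation that survives compilation and is discovered by reflection at run time. The `Transactional` annotation, its `authentication` element, and the `PaymentService` and `AuditLogger` classes are all hypothetical names invented for illustration; only `@Retention`, `@Target`, and `Class.getAnnotation` are part of the standard library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical annotation: the component must run in a transactional context. */
@Retention(RetentionPolicy.RUNTIME) // keep the annotation in the class file and visible to reflection
@Target(ElementType.TYPE)           // may be applied to classes and interfaces
@interface Transactional {
    /** Hypothetical element naming the kind of authentication required. */
    String authentication() default "none";
}

/** Hypothetical component that declares its requirements as metadata. */
@Transactional(authentication = "certificate")
class PaymentService {
}

/** Hypothetical component that carries no annotation. */
class AuditLogger {
}

final class AnnotationInspector {

    /** Looks up the annotation at run time and reports what it declares. */
    static String describe(Class<?> type) {
        Transactional t = type.getAnnotation(Transactional.class);
        if (t == null) {
            return type.getSimpleName() + ": no transaction required";
        }
        return type.getSimpleName() + ": transactional, authentication=" + t.authentication();
    }

    public static void main(String[] args) {
        System.out.println(describe(PaymentService.class));
        System.out.println(describe(AuditLogger.class));
    }
}
```

A container or framework could run a check like `describe` over the classes it loads and decide, without touching their business logic, whether to wrap each one in a transaction. With the default retention policy the annotation would be written to the class file but would not be visible to reflection, which is why the sketch sets `RetentionPolicy.RUNTIME` explicitly.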
These custom annotations are assigned to software, not intrinsic to it. But Java and .Net programs also make available, through reflection, intrinsic metadata about the objects they contain, as well as the types and properties of those objects. As a result, these self-describing programs can collaborate with other programs in highly dynamic ways.
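Intrinsic metadata needs no annotation at all. The following sketch, again in Java and again using hypothetical class names (`Invoice`, `TypeDescriber`), asks a class to describe its own fields through the standard `java.lang.reflect` API.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical data class; nothing in it was written with reflection in mind. */
class Invoice {
    private String customer;
    private double amount;
    private boolean paid;
}

final class TypeDescriber {

    /** Returns one "type name" entry per declared field, sorted alphabetically. */
    static List<String> fieldsOf(Class<?> type) {
        List<String> described = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            described.add(f.getType().getSimpleName() + " " + f.getName());
        }
        // getDeclaredFields() makes no promise about order, so sort for stable output.
        Collections.sort(described);
        return described;
    }

    public static void main(String[] args) {
        for (String line : fieldsOf(Invoice.class)) {
            System.out.println(line);
        }
    }
}
```

A generic serializer, form builder, or object-relational mapper can use this kind of query to work with types it has never seen before, which is the sort of dynamic collaboration that self-describing programs make possible.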