December 03, 2004

Bootstrapping the semantic Web

Tim Berners-Lee's quest to give the Web meaning receives aid from unexpected quarters

It's tempting to draw parallels between the careers of Albert Einstein and Tim Berners-Lee. Both men made world-transforming breakthroughs and then pursued even grander visions. Einstein, of course, never found the unified theory he sought for three decades. A lot of people think Berners-Lee's vision of a semantic Web will prove equally elusive.

We can all imagine the desired outcome: a version of the Web where items are related explicitly, not merely by co-occurrence of words. But skepticism has greeted the "semweb" technologies that Berners-Lee has been spearheading in the W3C. The approach is based on what's called "ontology," which the W3C defines as "a representation of terms and their interrelationships." Critics argue that we'll never agree on (or consistently apply) an ontology -- and they point to Google as proof that we don't need to.

Two companies I've encountered recently think there's a middle ground. One is Semagix, which offers an application toolkit, called Freedom, to help developers create, and then build on, a domain-specific ontology. Take the case of an anti-money-laundering application. The ontology is derived from authoritative information about individuals and companies provided by the likes of Dun & Bradstreet and Hoover's. Given such a framework, automatic classifiers can read unstructured documents -- e-mail, news feeds, Web pages -- and attach them to the framework. As a result, Semagix says, you can answer questions such as, "Which recent news reports mention companies that share directors with company X?"
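To make the shape of that question concrete, here is a minimal sketch in Python. It is not Semagix's Freedom API, which this column does not describe; the triple layout, the predicate names (director_of, mentions), and every entity in the sample data are hypothetical, chosen only to show how explicit relationships let a program answer the "shared directors" query.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples.
# "director_of" facts stand in for the structured ontology data;
# "mentions" facts stand in for documents a classifier has attached to it.
TRIPLES = [
    ("alice", "director_of", "company_x"),
    ("alice", "director_of", "acme"),
    ("bob", "director_of", "globex"),
    ("report_1", "mentions", "acme"),
    ("report_2", "mentions", "globex"),
    ("report_3", "mentions", "company_x"),
]


def companies_sharing_directors(triples, company):
    """Return the other companies that have at least one director in common."""
    boards = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "director_of":
            boards[obj].add(subject)
    target = boards.get(company, set())
    return {
        other
        for other, directors in boards.items()
        if other != company and directors & target
    }


def reports_on_linked_companies(triples, company):
    """Return reports that mention any company sharing a director with `company`."""
    linked = companies_sharing_directors(triples, company)
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "mentions" and obj in linked
    )


if __name__ == "__main__":
    print(reports_on_linked_companies(TRIPLES, "company_x"))
```

A keyword search could not answer this, because no single document needs to contain both company names. The link exists only in the director relationships.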

Digital Harbor is taking a similar approach. The company's PiiE (Professional Interactive Information Environment), originally pitched as a rich Internet app-dev toolkit, has lately shifted toward "business ontology." Digital Harbor's Fusion Server helps developers define a set of terms and relationships, populate that framework with structured data, and then attach unstructured data to it. Like Semagix, Digital Harbor emphasizes fusing ontology and data so that users can "connect the dots."

Of course, it's never as easy as we'd like to imagine. Consider Eliyon, a company that's gathered public information about more than 22 million people to support sales, recruiting, and other applications. As it turns out, I am several of those people. In addition to my current title, InfoWorld Test Center lead analyst, I show up as executive editor of Byte Magazine and contributor to Linux Magazine. And while those were once accurate descriptions of me, I have never been a member of Blue Titan's board of advisors, and I am not the inventor of RSS.

It's true I could register with the site, coalesce my correct identities, and purge the wrong ones. But authenticating with a credit card in order to update a profile that Eliyon owns is a nonstarter for me. Back in June, on my Weblog, I suggested the alternative that would suit me: I'll maintain my own profile on the Web and syndicate my data to anyone who needs it.
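As a rough illustration of what a self-maintained, syndicated profile might look like, here is a small Python sketch. The field names, the JSON serialization, and the "example author" placeholder are all assumptions made for this example; the column does not specify a format, and nothing here reflects how Eliyon stores or accepts data.

```python
import json


def build_profile(name, current_roles, past_roles):
    """Assemble the facts a person chooses to assert about themselves."""
    return {
        "name": name,
        "current": sorted(current_roles),
        "past": sorted(past_roles),
    }


def to_feed(profile):
    """Serialize the profile so that any consumer can fetch and parse it."""
    return json.dumps(profile, indent=2, sort_keys=True)


def is_asserted(profile, role):
    """Check a third-party claim against what the owner has asserted."""
    return role in profile["current"] or role in profile["past"]


# Placeholder name; the roles are the ones listed earlier in this column.
PROFILE = build_profile(
    "example author",
    current_roles=["InfoWorld Test Center lead analyst"],
    past_roles=[
        "executive editor of Byte Magazine",
        "contributor to Linux Magazine",
    ],
)

if __name__ == "__main__":
    print(to_feed(PROFILE))
    print(is_asserted(PROFILE, "inventor of RSS"))
```

An aggregator that consumed such a feed could keep the claims the owner asserts and flag the ones the owner does not, rather than asking the owner to log in and correct them by hand.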

Semantic-Web naysayers think people and organizations can't be bothered to assert machine-readable facts about themselves. And, today, that is undoubtedly true. But when others assert facts about you -- as they increasingly will -- the tide could begin to turn. Individual acts of self-defense may ultimately combine to bootstrap the semantic Web.

©1994-2009 Infoworld, Inc.