December 06, 2006

Data export, delivered

The Web's open architectural style and implicit APIs can help overcome data lock-in

From time to time I get recruited to help someone export mail and contacts from one e-mail program and import the data into another. The fact that a civilian must recruit a geek to accomplish this seemingly mundane task speaks volumes about our industry’s sad history of data lock-in.

Even for a geek, the solution can easily become a slide down a slippery slope. There’s no shortage of converters floating around on the Net, but it’s surprisingly hard to find one that will reliably and completely transform, say, WAB (Windows Address Book) to LDIF (LDAP Data Interchange Format) or CSV (comma-separated values).

Back in 2002, I discovered that Mozilla’s mail program, now known as Thunderbird, could import mail and contacts from an Outlook PST file and then export the data as Mbox for mail and LDIF or CSV for contacts. My referral log tells me that, to this day, people continue to seek out and use that technique.

It came in handy again this week when a friend wanted to switch from Outlook Express to Gmail. Exporting her contacts to a CSV file, and then importing them into Gmail, turned out to be a snap. But when I declared victory, she sent me scrambling down the slippery slope with this innocent-sounding question: “What about my distribution lists?”

Uh-oh. It turns out that she uses 15 lists, some with just a few individual addresses and some with more than 100. Those lists didn’t appear in the CSV file, or in the output of any other WAB converter I could find. Even my trusty Thunderbird trick only partly worked. Although Thunderbird can export lists to LDIF, it exports only one at a time, so I had to create a separate file for each list. Grumble.
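To make the per-list LDIF files concrete, here is a minimal Python sketch of pulling a list's name and member addresses out of one exported file. The entry layout it expects (a `cn:` line naming the list, and `member:` lines that carry a `mail=` component) is an assumption about what the export looks like, not something verified against every Thunderbird version, and `parse_ldif_list` is a hypothetical helper name. It ignores folded lines and base64-encoded values, which a real LDIF parser would need to handle.

```python
import re

def parse_ldif_list(ldif_text):
    """Return (list_name, [addresses]) for a single exported list.

    Assumes one list per file, with a 'cn:' attribute for the list name
    and 'member:' attributes that contain 'mail=<address>'.
    """
    name = None
    members = []
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or something that is not an attribute
        key = key.strip().lower()
        value = value.strip()
        if key == "cn" and name is None:
            name = value
        elif key == "member":
            match = re.search(r"mail=([^,\s]+)", value)
            if match:
                members.append(match.group(1))
    return name, members


# Hypothetical sample of what one exported list might contain.
SAMPLE = """dn: cn=Family
objectclass: top
objectclass: groupOfNames
cn: Family
member: cn=Alice Smith,mail=alice@example.com
member: mail=bob@example.com
"""

if __name__ == "__main__":
    print(parse_ldif_list(SAMPLE))
```

Running the function once per exported file would yield one name and one address list for each distribution list.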

With half the battle won, how to inject those LDIF files into Gmail? There’s no official Google-supported API, but I’ve gotten lots of mileage out of an unofficial one called libgmail. Good news: libgmail has added support for Gmail contacts since I last used it. Bad news: It only supports individual contacts, not lists.

The solution I cobbled together shows just how open Web applications fundamentally are. To find out how Gmail creates a distribution list, I logged in, created a list interactively using Gmail’s form, and captured the resulting HTTP transaction using one of the handiest tools in my Web developer’s kit, Firefox’s LiveHTTPHeaders extension.

The next step was to replay that transaction outside of the browser. I rearranged its elements -- a URL, a chunk of HTTP POST data, and a set of HTTP headers including a cookie packed with crucial name/value pairs -- as a command-line invocation of another of the handiest tools in my kit: curl.
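The same replay idea can be sketched in Python rather than curl. The example below only assembles the request from the three captured pieces (URL, POST data, cookie header); it does not send anything. The URL, the form field names, and the cookie value are placeholders invented for illustration, since the actual values come from whatever the captured transaction contains, and `build_replay_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_replay_request(url, form_fields, cookie):
    """Rebuild a captured form POST from its URL, form fields, and cookie."""
    body = urlencode(form_fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={
            "Cookie": cookie,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All three values below are placeholders, not Gmail's real ones.
    req = build_replay_request(
        "https://example.invalid/create-list",
        {"name": "Family", "members": "a@example.com"},
        "SID=placeholder",
    )
    print(req.get_method(), req.full_url)
    print(req.data)
    # Sending it would be: urllib.request.urlopen(req)
```

Keeping the construction separate from the sending makes it easy to inspect the request and compare it against the captured headers before anything goes over the wire.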

As proof of concept, I used Gmail’s interface to delete the list I’d just made, then invoked the curl command to recreate it. When that worked, I wrote a simple script to interpolate names and addresses from the exported LDIF files into a series of curl commands, and invoke them one at a time. And that was that.
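A script of that kind might look roughly like the following Python sketch, which turns list names and addresses into one curl command line per list. It is not the script described above: the URL, the cookie value, and the `name` and `members` field names are all placeholders, and `curl_command` is a hypothetical helper. The only curl options used are `--cookie` and `--data-urlencode`.

```python
import shlex

def curl_command(url, cookie, list_name, addresses):
    """Build one shell-safe curl command that POSTs a list's name and members."""
    args = [
        "curl",
        "--cookie", cookie,
        "--data-urlencode", "name=" + list_name,
        "--data-urlencode", "members=" + ", ".join(addresses),
        url,
    ]
    # shlex.join (Python 3.8+) quotes each argument for the shell.
    return shlex.join(args)


if __name__ == "__main__":
    # Placeholder data standing in for what was read from the LDIF files.
    lists = {
        "Family": ["a@example.com", "b@example.com"],
        "Book club": ["c@example.com"],
    }
    for name, addresses in lists.items():
        print(curl_command("https://example.invalid/create-list",
                           "SID=placeholder", name, addresses))
```

Printing the commands instead of running them leaves room to eyeball each one first, then pipe the output to a shell to invoke them one at a time.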

It was only a partial solution, of course. A fully automated version would tie into libgmail’s authentication scheme, obviating the need to capture and replay an HTTP header. But the fact that it’s possible to discover and exploit implicit APIs in this way is a testament to the power and flexibility of the Web’s architectural style.

©1994-2009 Infoworld, Inc.