Early experiments in cloud computing

Cloud computing has many faces, some just beginning to take shape. But at the New York Times and Nasdaq, first steps into on-demand infrastructure show promise.

You know there's substance behind a technology buzzword when companies such as the Nasdaq OMX stock exchange and the New York Times publishing company use it for real production efforts. Cloud computing is the latest buzzword that vendors are using to spruce up the usual sales spiel, and the fever pitch is enough to make you think, "Dot-com boom, here we go again." While the skepticism is warranted, something very real is happening, and IT needs to pay attention.

So what are Nasdaq and the Times doing? In a phrase, utility computing. Both have tapped into Amazon.com's Internet-provisioned computing and storage services -- Elastic Compute Cloud (EC2) and Simple Storage Service (S3) -- to augment their own IT resources.

The Times processed 4TB of data through EC2 and S3, using a credit card to get the service going in a matter of minutes so that it could convert scans of 15 million news stories into PDFs for online distribution. Nasdaq uses S3 to deliver historical stock and mutual fund information, rather than add the load to its own database and computing infrastructure. Likewise, Infosolve Technologies uses Sun's Network.com grid-in-the-cloud utility to scrub customer addresses rather than stand up that infrastructure internally.

In another realm of cloud computing, companies such as medical robotics firm Intuitive Surgical and recruitment services provider Jobscience use in-the-cloud development environments to create and provision their own applications. Both companies use Salesforce.com's Force.com platform as a service, the ungainly name for this online IDE service, but other firms such as Coghead offer their own platforms.

These two forms of cloud computing -- utility computing and platform as a service -- are exciting developments. Unlike SaaS (software as a service), they're aimed squarely at IT users, not at business users looking to bypass IT (or that IT is happy to let someone else take care of). But despite early promise, analysts say there's a long way to go before they're a mainstream part of your datacenter. So the question is: Do you sit back and wait for them to mature, or do you experiment so that you can gain an early advantage when they're enterprise-class?

Computing in the sky
Nasdaq OMX has lots of stock and fund data, and it wanted to make extra revenue selling historical data for those stocks and funds. But for this offering, called Market Replay, the company didn't want to worry about optimizing its databases and servers to handle the new load. So it turned to Amazon's S3 service to host the data, and created a lightweight reader app using Adobe's AIR technology that lets users pull in the required data. "If I'm someone like Nasdaq, it's a cheap experiment," says Nik Simpson, a senior analyst at the Burton Group.

The traditional approach wouldn't have gotten off the ground economically, recalls Claude Courbois, an associate vice president for data products at Nasdaq: "The expenses of keeping all that data online was too high." So Nasdaq took its market data and created flat files for every entity, each holding enough data for a 10-minute replay of the stock's or fund's price changes, on a second-by-second basis. (It adds 100,000 files per day to the several million it started with, Courbois says.) The Adobe AIR app that Courbois' team put together in just a couple of days pulls in the flat files stored at Amazon.com and then creates the replay animations from them. The result: "We don't need a database constantly staging data on the server side. And the price is right."
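
The flat-file layout Courbois describes can be sketched as a simple key scheme: one file per entity per 10-minute window, fetched directly by name with no database lookup. The following is only an illustration. The key format, the `replay_file_key` function, and the symbol used in the example are assumptions, not Nasdaq's actual naming.

```python
from datetime import datetime

WINDOW_MINUTES = 10  # each flat file covers a 10-minute replay (from the article)
TICKS_PER_FILE = WINDOW_MINUTES * 60  # second-by-second data: up to 600 points per file


def replay_file_key(symbol: str, when: datetime) -> str:
    """Return the (hypothetical) storage key of the flat file covering `when`."""
    # Round the timestamp down to the start of its 10-minute window.
    window_start = when.replace(
        minute=when.minute - when.minute % WINDOW_MINUTES,
        second=0,
        microsecond=0,
    )
    # Assumed layout: replay/<symbol>/<YYYYMMDD>/<HHMM>.csv
    return f"replay/{symbol}/{window_start:%Y%m%d}/{window_start:%H%M}.csv"
```

Under this assumed scheme, a request for symbol "XYZ" at 10:47:13 on June 2, 2008 maps to `replay/XYZ/20080602/1040.csv`, so the client can compute which file to download on its own.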

The New York Times also used S3 for a data-intensive project: converting 11 million articles published from the newspaper's founding in 1851 through 1989, to make them available through its Web site search engine. The Times scanned the stories as TIFF files, cutting them into columns to fit the scanners, then uploaded the files to S3 -- taking up 4TB of space -- over several WAN connections from the Times' datacenter.

The Times didn't coordinate the job with Amazon -- someone in IT just signed up for the service on the Web using a credit card, then began uploading the data. "After about 3TB, we got an e-mail [from Amazon.com] to ask if this would be a perpetual load," recalls Derek Gottfrid, senior software architect at the Times.

Then, using Amazon.com's EC2 computing platform, the Times ran a PDF conversion app that converted that 4TB of TIFF data into 1.5TB of PDF files. Using 100 Linux computers, the job took about 24 hours. Then a coding error was discovered that required the job be rerun, adding a second day to the effort -- and increasing the tab by just $240. "It would have taken a month at our facilities, since we only had a few spare PCs," Gottfrid says. "It was cheap experimentation, and the learning curve isn't steep."
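
The arithmetic behind that $240 is easy to reconstruct. The sketch below assumes the rerun was billed purely per instance-hour. The 10-cent hourly rate is not stated in the article; it is simply what the figures imply for 100 machines running for 24 hours. The function name is illustrative.

```python
def ec2_job_cost_cents(instances: int, hours: int, cents_per_instance_hour: int) -> int:
    """Compute-only cost of a batch job, in cents (integer math avoids rounding)."""
    return instances * hours * cents_per_instance_hour


# 100 Linux instances for about 24 hours, at an ASSUMED 10 cents per instance-hour.
rerun_cents = ec2_job_cost_cents(instances=100, hours=24, cents_per_instance_hour=10)
print(f"${rerun_cents / 100:.2f}")  # prints $240.00
```

Storage and data-transfer charges are left out, which is why this is a rough check rather than a bill.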

Digital Fountain, a digital-media distribution company, uses the EC2 service to deliver mobile videos over the Internet. When the company decided to launch this new offering, "we didn't want to buy our own servers and get the people to do that work," says CTO Mike Luby. So Digital Fountain now streams them from Amazon.com's EC2 servers. Because Amazon.com doesn't guarantee availability, Digital Fountain streams the video from several servers, ensuring built-in backup for its provisioning. And it can throttle the number of servers to match demand as it rises and falls, Luby notes.

Over time, Luby expects to rely on other providers in addition to Amazon.com, both to gain the geographic diversity that keeps streaming times manageable and to increase server density without overloading any one provider.
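
Luby's two rules, always stream from several servers and scale the server count with demand, might be sketched as follows. The capacity figures, the minimum of two servers, and the `servers_needed` function are hypothetical. Digital Fountain's actual sizing logic is not described in the article.

```python
import math


def servers_needed(concurrent_streams: int, streams_per_server: int,
                   min_servers: int = 2) -> int:
    """Servers to run: enough for current demand, never fewer than the redundancy floor."""
    if streams_per_server <= 0:
        raise ValueError("streams_per_server must be positive")
    by_demand = math.ceil(concurrent_streams / streams_per_server)
    # The floor provides built-in backup because the provider does not guarantee availability.
    return max(min_servers, by_demand)
```

With an assumed capacity of 200 streams per server, 450 concurrent viewers would call for three servers, while a quiet period with 50 viewers would still keep two running as backup.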

There's more to utility computing than Amazon.com. Sun also has its own cloud-based computing platform, Network.com. Unlike EC2, though, it's a grid, meaning it specializes in parallel processing, where a task can be broken into independent steps that a large array of processors can tackle all at once. That limits its use to applications such as rendering, data scrubbing, and image transformation. "Not everything can be thrown at the Sun grid," says Subbu Manchiraj, vice president for technology at Infosolve Technologies, a provider of data management services. But where a task can be parallelized, the benefit is huge, he says.

Infosolve has used the Sun grid for the past 18 months to scrub names and addresses, making sure they are correct (such as verifying the ZIP code and ensuring that the street addresses are properly segmented). "With Sun, we can run 2,000 processors and get the data back quickly," Manchiraj says. Plus, Infosolve is a Java shop, so its application development skills were easily tuned to the style of Sun's grid apps. That let Infosolve offer its customers a turn-key data-scrubbing service it couldn't afford to stand up itself. "It's an offsite datacenter. And we pay only for what we use," he adds.
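
What makes address scrubbing a good fit for a grid is that each record can be checked on its own, with no dependence on any other record. The sketch below illustrates that property only. It uses a local thread pool as a stand-in for the grid rather than Sun's actual grid interface, and the `scrub` rules (a format check on the ZIP code and whitespace cleanup on the street) are simplified placeholders, not Infosolve's logic.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed rule: a U.S. ZIP code is 5 digits, optionally followed by -4 digits.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def scrub(record: dict) -> dict:
    """Clean one record independently of all others."""
    zip_code = record["zip"].strip()
    return {
        "street": " ".join(record["street"].split()),  # collapse stray whitespace
        "zip": zip_code,
        "zip_ok": bool(ZIP_RE.match(zip_code)),
    }


def scrub_batch(records: list, workers: int = 4) -> list:
    """Scrub records concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrub, records))
```

Because no record depends on another, the same batch could in principle be split across 4 workers or 2,000 processors without changing the result.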

The grid's quick scalability has meant that Infosolve doesn't need to worry about balancing customers' loads. But another factor also helps Infosolve avoid worrying about scheduling: The jobs are batched, and customers have no expectation of real-time response. Thus, if resources do run out, the customer won't ever know. Ditto if there's a failure: "We can just restart the job," Manchiraj says.

Testing out online IDEs
Less mature than the cloud infrastructure plays are the app dev and app hosting platforms provisioned over the Internet. These are intended for apps that will be delivered over the Internet and through the browser anyhow, such as online commerce and services and apps delivered to mobile and remote employees. So it's no surprise that most early adopters of these online IDEs are themselves Web-based service providers.

A typical example is Jobscience, a provider of recruiting services. The company had been using Salesforce.com's customization tools, so it had grown comfortable with the underlying service availability. At the same time, the company struggled to manage its Adobe ColdFusion-based server environment, so CEO Ted Elliott began looking at using an outside hosting firm to simplify its ability to provision customers over the Internet. "But they manage to the stack, not to the app," Elliott says, and he wanted an environment that was operationally optimal, not just technically correct. So he turned to Salesforce.com's Force.com platform as a service to create and host the apps.

Elliott's biggest challenge was internal: His developers didn't want to let go of control over the app environment. But now, "they're starting to see what they can build that doesn't exist [in Force.com] while using the basics [in Force.com] such as calendaring and scheduling," he notes. So the in-house developers get to innovate differentiating apps, not build the basics that everyone else already has.

A more in-the-enterprise example is Intuitive Surgical, a surgical-robotics maker. It is involved in clinical trials of its equipment, and so needs to collect and distribute data across a range of clinical facilities, all of which are separate firms or entities. That data is inconsistent and hard to integrate to get meaningful analysis done. So Intuitive Surgical used Force.com to create a forms-based app to collect that clinical data from all trial participants. "We could build it using just their tools, so in essence, there was no programming," says Mark Burns, a clinical data specialist at the company.

But while the app is handy for collecting data, Intuitive Surgical can't use it to submit trial results to the FDA. "It doesn't have the rigor that the FDA would require," Burns notes, around auditing of the data and tracking everything that's done to ensure the data has not been compromised or altered.

A long way to the cloud
Despite the successful examples of the first wave of infrastructure-oriented cloud computing, it's early days for these IT-oriented forms of Internet-based provisioning, and any large-scale shift to them is a good decade away, notes Ben Pring, a senior analyst at Gartner. But it is coming.

Although it can be easy to set up an S3 account using just a credit card, as the Times did, provider availability is a big factor, notes Jon Williams, CIO of Kaplan Test Prep and Admissions. For example, Amazon's S3 had an outage in February, notes Burton Group's Simpson. "If half the IT infrastructure is unavailable, that's a difficult situation," he says. (Another outage occurred the day this story was published.)

Some companies can't rely on Internet-provisioned infrastructure services because of regulatory compliance and security issues, adds Williams. "Many have scary compliance issues. How do you demonstrate what you are doing is in compliance when it is done outside?" says Burton Group's Simpson. He notes that the early infrastructure services aren't audited for security or compliance requirements, nor do their providers take on liability for them. Such issues have kept Merrill Lynch from using cloud-provisioned infrastructure. "We're very bound by regulators in terms of client data and country-of-origin issues, so it's very difficult to use the cloud," says Rupert Brown, a chief architect at the financial services firm.

To the degree that cloud computing raises risk, it will inhibit adoption in equal measure, particularly among enterprises. But every large company has noncritical areas where low cost of entry and quick deployment trump reliability, and a significant number of small businesses fall in the same category. For them, experimenting with cloud computing could put them on good footing for an agile, connected future. That's exactly what pioneers like Nasdaq and the New York Times have found.

Copyright © 2008 IDG Communications, Inc.
