Computing in the sky
Nasdaq OMX has lots of stock and fund data, and it wanted to make extra revenue by selling historical data for those stocks and funds. But for this offering, called Market Replay, the company didn't want to worry about optimizing its databases and servers to handle the new load. So it turned to Amazon's S3 service to host the data and created a lightweight reader app, using Adobe's AIR technology, that lets users pull in the required data. "If I'm someone like Nasdaq, it's a cheap experiment," says Nik Simpson, a senior analyst at the Burton Group.
The traditional approach wouldn't have gotten off the ground economically, recalls Claude Courbois, an associate vice president for data products at Nasdaq: "The expenses of keeping all that data online was too high." So Nasdaq took its market data and created flat files for each stock and fund, each file holding enough data for a 10-minute replay of that security's price changes, on a second-by-second basis. (It adds 100,000 files per day to the several million it started with, Courbois says.) The Adobe AIR app Courbois' team put together in just a couple of days pulls in the flat files stored at Amazon.com and then creates the replay animations from them. The result: "We don't need a database constantly staging data on the server side. And the price is right."
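Flat, per-security files keyed by symbol and time window are what make this scheme work without a server-side database: the reader app can compute the object's location directly. A minimal sketch of how such keys might be constructed (the key scheme, file format, and function name here are hypothetical illustrations, not Nasdaq's actual layout):

```python
from datetime import datetime

def replay_key(symbol: str, start: datetime) -> str:
    """Build a hypothetical S3 object key for one 10-minute replay file.

    Each flat file holds second-by-second price changes for a single
    stock or fund over a 10-minute window, so the key only needs the
    symbol and the window's start time -- no database lookup required.
    """
    # Snap the requested time down to a 10-minute boundary.
    snapped = start.replace(minute=(start.minute // 10) * 10,
                            second=0, microsecond=0)
    return f"replays/{symbol}/{snapped:%Y-%m-%d/%H%M}.csv"

# A reader app would fetch this key from S3 and animate the contents.
key = replay_key("AAPL", datetime(2008, 6, 2, 14, 37, 22))
```

Because every client computes the same key from the same symbol and timestamp, the app needs nothing from the server side beyond plain object storage.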
The New York Times also used S3 for a data-intensive project: digitizing 11 million articles, published from the newspaper's founding in 1851 through 1989, to make them available through its Web site's search engine. The Times scanned in the stories, which had been cut into columns (as TIFF files) to fit the scanners, then uploaded the images, 4TB in all, to S3 over several WAN connections from the Times' datacenter.
The Times didn't coordinate the job with Amazon — someone in IT just signed up for the service on the Web using a credit card, then began uploading the data. "After about 3TB, we got an e-mail [from Amazon.com] to ask if this would be a perpetual load," recalls Derek Gottfrid, senior software architect at the Times.
Then, using Amazon.com's EC2 computing platform, the Times ran a PDF conversion app that converted that 4TB of TIFF data into 1.5TB of PDF files. Using 100 Linux computers, the job took about 24 hours. Then a coding error was discovered that required the job be rerun, adding a second day to the effort and increasing the tab by just $240. "It would have taken a month at our facilities, since we only had a few spare PCs," Gottfrid says. "It was cheap experimentation, and the learning curve isn't steep."
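The economics are easy to check. Assuming the then-current EC2 on-demand rate of roughly $0.10 per instance-hour (an assumption; the article gives only the total), 100 instances running for an extra 24 hours comes to exactly the quoted $240:

```python
def batch_cost(instances: int, hours: float, rate_per_instance_hour: float) -> float:
    """Total cost of an on-demand batch job billed by the instance-hour."""
    return instances * hours * rate_per_instance_hour

# The rerun: 100 Linux instances for roughly one more day,
# at an assumed $0.10 per instance-hour.
extra = batch_cost(instances=100, hours=24, rate_per_instance_hour=0.10)
```

The same arithmetic explains why the first pass was so cheap relative to buying hardware: the fleet exists only for the 24 hours it is actually billed.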
Digital Fountain, a digital-media distribution company, uses the EC2 service to deliver mobile videos over the Internet. When the company decided to launch this new offering, "we didn't want to buy our own servers and get the people to do that work," says CTO Mike Luby. So Digital Fountain now streams its videos from Amazon.com's EC2 servers. Because Amazon.com doesn't guarantee availability, Digital Fountain streams each video from several servers, building redundancy into its provisioning. And it can throttle the number of servers to match demand as it rises and falls, Luby notes.
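Because no single EC2 instance is guaranteed to be up, streaming from several servers and trying them in turn gives cheap redundancy, and sizing the fleet to demand keeps costs in line. A minimal sketch of those two patterns, with the fetch function injected so nothing here touches a real network (the server names and fetch interface are illustrative, not Digital Fountain's actual code):

```python
def stream_with_failover(servers, fetch):
    """Try each streaming server in order; return the first successful stream.

    `servers` is an ordered list of server names; `fetch(server)` returns
    the stream payload or raises ConnectionError if that server is down.
    """
    last_error = None
    for server in servers:
        try:
            return fetch(server)
        except ConnectionError as exc:
            last_error = exc  # this replica is down; fall through to the next
    raise ConnectionError("all streaming servers unavailable") from last_error

def servers_needed(concurrent_viewers: int, viewers_per_server: int) -> int:
    """Throttle the fleet size up or down to match current demand."""
    return -(-concurrent_viewers // viewers_per_server)  # ceiling division

# Simulated fleet: the first replica is down, the second serves the video.
def fake_fetch(server):
    if server == "ec2-replica-1":
        raise ConnectionError("instance unavailable")
    return f"video bytes from {server}"

payload = stream_with_failover(["ec2-replica-1", "ec2-replica-2"], fake_fetch)
```

The same ordered-list-of-replicas idea extends naturally to the multi-provider setup Luby describes next: the list simply grows to include servers from other hosts.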
Over time, Luby expects to rely on other providers in addition to Amazon.com, both to gain the geographic diversity that keeps streaming times manageable and to add server capacity without overloading any one provider.