Inside Amazon Web Services
From storage to payment, the king of clouds is dangling an array of low-cost services. We take a close look at the tools for IT and developers.Follow @infoworld
Here's how it works. Suppose you have a big pile of identical tasks that must be performed by humans. Perhaps you have a large quantity of text files that must be translated from one language to another. In the world of Mechanical Turk, you are a requester; you submit your tasks to the Mechanical Turk service, which places them on a kind of global bulletin board. Using that same service, workers log onto this bulletin board, select tasks, perform them, and post the results back to the service. Later you return to the Mechanical Turk, review the posted results, select those that are acceptable, and release funds to pay the workers. In short, the Mechanical Turk service is a middleman between employers and employees.
When I first read Mechanical Turk's description, I thought it was a great idea. It may yet be, but if my perusal of the tasks that are available is any indication, this is not a way to make any appreciable amount of money. Most of the HITS ("Human Intelligence Task," referring to a unit of work) posted paid mere pennies, and reading some of the descriptions gave me the uneasy feeling that workers would be used as human spam-bots.
It is possible that, in the future, Mechanical Turk will become a marketplace of decent work for reasonable money. For now, though, I am confident that I can make more money in less time – and do more good – by mowing the old lady's lawn next door.
Wading into Web Information Services
Amazon's Web Information Services are essentially query interfaces into extensive databases generated by a mixture of Web crawlers and Web traffic monitors. Data-mining organization can tap into the crawler-produced data to sift through information that is as wide-ranging as the Web itself. The utility of Web traffic data is self-evident to any company or individual interested in user visitation trends to their sites – as well as to related or competing sites.
AlexaWeb Search. Amazon's Alexa Web Search is the result of partnering between Amazon and Alexa, and it lets you query the information gathered by Alexa's Web crawler bots. The quantity of information available is difficult to gauge; Alexa has been crawling the Web for over a decade, and the Internet is in nonstop growth. Alexa's site says that, while its bots are working constantly, it takes about two months for a complete cycle through the Internet.
When Alexa adds a new Web site document to its database, it indexes about 50 attributes associated with that document. Attributes include the document's language, its Open Document Category, various parsed components of the URL, geographic location of the hosting server, and more. Also available is the document's text, the first 20KB of which is text-indexed. All this is available for searching.
Naturally, searches on such a large database can take time. The Alexa Web Search service is architected so that when you issue a search, the service returns a request ID. You use this ID to track the status of your search's progress. When the search is complete, results are stored in a (possibly gigantic) text file. The text file can be downloaded and "mined" locally.