The best hardware and software products of the year
InfoWorld's 2010 Technology of the Year Awards recognize the top solutions for business and IT professionals
Amazon Web Services
Amazon Web Services (AWS) is something like the world's biggest shopping mall of cloud-based services. Its array of selections is wide, ranging from storage and database services like SimpleDB, Amazon Relational Database Service, and Elastic Block Store; to compute services like the Elastic Compute Cloud and Elastic MapReduce; to online e-commerce services like Amazon DevPay and the Amazon Associates Web Service; to difficult-to-categorize services like Mechanical Turk, an ad hoc job-matching marketplace.
Whether cloud computing ultimately establishes itself as a permanent platform for storage and data processing remains to be seen. One thing is certain, however: should cloud computing ultimately fail in its aspirations, no one will be able to blame Amazon for not trying. AWS extends to just about every dimension of cloud computing currently known. No doubt more "cloud solutions" are on the way, and as soon as someone figures out what they are, Amazon will probably be the first to provide an offering.
-- Rick Grehan
Hadoop
Hadoop is a data analysis application specifically designed to handle large data sets by employing distributed processing. Hadoop's scalability is remarkable; you can create and run a single-machine Hadoop system on your laptop or -- provided you have the space and finances -- deploy Hadoop across several thousand internetworked computers.
More specifically, Hadoop is an implementation of the Map/Reduce algorithm developed by Google, running atop the Hadoop distributed file system (HDFS). Map/Reduce is a two-step process (the map step, followed by the reduce step), but each amounts to the mapping of one set of key/value pairs to another. When you create a Map/Reduce task to run in Hadoop, you write map and reduce routines (both are often remarkably small), and Hadoop acts as a managed runtime environment for them. Hadoop sees to it that distributed instances of your routines are executed, that input data is partitioned and sent to your routines, that the results are gathered and passed on to the next stage, and even that crashed instances are restarted and the situation is "healed."
Though the concept is simple, the result is powerful. Developers quickly catch on to writing map and reduce functions. (Nor are programmers restricted to a specific language; although Hadoop is Java-based, you can write map and reduce functions in any language that can read and write standard input and output.) And Hadoop's scalability is practically linear: if your data set doubles in size, double the number of systems in your Hadoop cluster. You need write no new code to accommodate the additional processors or expanded disk space; Hadoop sees to all that.
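You don't need a Hadoop cluster to see the shape of a Map/Reduce job. The sketch below follows the streaming convention described above -- a mapper turns input lines into key/value pairs, and a reducer folds together the values for each key -- with a tiny local driver standing in for Hadoop's shuffle-and-sort stage. The driver is purely illustrative; in a real deployment, Hadoop performs the sorting, partitioning, and distribution.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce step: sum the counts for each distinct word.
    Pairs arrive sorted by key (as Hadoop guarantees), so
    itertools.groupby is all the grouping logic needed."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

def run_pipeline(lines):
    """Local stand-in for Hadoop's shuffle/sort between the steps."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))

if __name__ == "__main__":
    print(run_pipeline(["the quick brown fox", "the lazy dog"]))
```

Note how small the two routines are -- a few lines each -- which is typical of Map/Reduce jobs; the heavy lifting of distribution and fault recovery lives in the framework, not in your code.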
Best of all, Hadoop is an Apache project (and is therefore free) and has spawned several subprojects, each geared toward using the Hadoop technology to tackle huge data sets. Commercial users of Hadoop range from Amazon to Facebook to the New York Times. Amazon's Elastic MapReduce service is a Hadoop implementation.
-- Rick Grehan
Amazon SimpleDB
Amazon's SimpleDB is precisely what its name implies: a simple database. It is simple in that creating a table (or, rather, SimpleDB's equivalent of a table -- a Domain) requires no schema. You don't have to tell SimpleDB "I am going to build a Domain whose structure is thus-and-so." You simply begin putting data into the Domain, and the structure happens.
SimpleDB is not a relational database (you could use Amazon Relational Database Service for that), but a comparatively feature-rich example of the new breed of "NoSQL" databases. Data is stored in SimpleDB as name/value pairs, organized into items. The architecture of SimpleDB is best visualized as a table in a spreadsheet: attributes are columns, items are rows, and values are cell contents.
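That schema-less model is easy to picture in plain Python. An item is just a name, and its attributes are name/value pairs that need not match any other item's; an attribute may even hold several values at once. (The product data below is invented for illustration; this is the data model, not the SimpleDB API.)

```python
# Each key is an item name (a spreadsheet row); each value maps
# attribute names (columns) to lists of string values. No two items
# need share the same attributes -- there is no schema to obey.
products = {
    "item_001": {"category": ["book"], "title": ["Hadoop Basics"],
                 "price": ["29.99"]},
    "item_002": {"category": ["music", "mp3"], "artist": ["Example Band"]},
}

# The "columns" are simply the union of attribute names across items.
columns = sorted({attr for item in products.values() for attr in item})
print(columns)  # -> ['artist', 'category', 'price', 'title']
```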
Operations in SimpleDB are a minimalist's dream. Domains are created or deleted using the CreateDomain and DeleteDomain requests, respectively. To read data, use GetAttributes (coupled with Select, which searches for specified data). To write data, use PutAttributes. To delete data, use DeleteAttributes. The Select operator -- analogous to SQL's SELECT statement -- recognizes a limited set of conditionals, but can handle the majority of basic queries.
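To show just how little surface area that operation set covers, here is a toy in-memory stand-in exposing the same verbs. This is a sketch, not the real AWS API -- the actual service is invoked over HTTP with request signing -- but the method names mirror SimpleDB's requests one for one, and the `select` method is a deliberately crude analog of the real Select operator.

```python
class ToySimpleDB:
    """Toy in-memory imitation of SimpleDB's minimalist operation set.
    Method names mirror the service's requests; the real thing speaks HTTP."""

    def __init__(self):
        self._domains = {}

    def create_domain(self, name):          # CreateDomain
        self._domains.setdefault(name, {})

    def delete_domain(self, name):          # DeleteDomain
        self._domains.pop(name, None)

    def put_attributes(self, domain, item, attrs):   # PutAttributes
        # attrs: mapping of attribute name -> list of string values
        self._domains[domain].setdefault(item, {}).update(attrs)

    def get_attributes(self, domain, item):          # GetAttributes
        return self._domains[domain].get(item, {})

    def delete_attributes(self, domain, item, names):  # DeleteAttributes
        stored = self._domains[domain].get(item, {})
        for name in names:
            stored.pop(name, None)

    def select(self, domain, attr, value):
        # Crude analog of Select: item names where attr contains value.
        return [item for item, attrs in self._domains[domain].items()
                if value in attrs.get(attr, [])]

db = ToySimpleDB()
db.create_domain("products")
db.put_attributes("products", "item_001",
                  {"category": ["book"], "price": ["29.99"]})
print(db.select("products", "category", "book"))  # -> ['item_001']
```

Five verbs plus a query operator: that really is the whole interface, which is exactly the minimalism the name promises.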