When given a question, the software initially analyzes it, identifying any names, dates, geographic locations or other entities. It also examines the phrase structure and the grammar of the question for hints of what the question is asking.
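To make the analysis step concrete, here is a deliberately tiny sketch in Python. It is not Watson's code: the `KNOWN_PLACES` gazetteer, the `analyze_question` function and the question-word table are all hypothetical names invented for illustration, and a real system would rely on far richer parsing than a regular expression.

```python
import re

# Hypothetical gazetteer (assumption); a real system would use far larger resources.
KNOWN_PLACES = {"France", "Egypt", "Chicago"}

# Hypothetical mapping from a leading question word to an expected answer type.
ANSWER_TYPES = {"who": "person", "where": "place", "when": "date"}


def analyze_question(question: str) -> dict:
    """Toy analysis: extract years, known places, and a crude answer-type hint."""
    # Split into alphabetic words and digit runs, dropping punctuation.
    tokens = re.findall(r"[A-Za-z]+|\d+", question)
    # Treat any four-digit number as a year.
    years = [t for t in tokens if t.isdigit() and len(t) == 4]
    # Look up place names in the gazetteer.
    places = [t for t in tokens if t in KNOWN_PLACES]
    # The first word gives a rough hint of what kind of answer is wanted.
    first = tokens[0].lower() if tokens else ""
    answer_type = ANSWER_TYPES.get(first, "unknown")
    return {"years": years, "places": places, "answer_type": answer_type}


result = analyze_question("Who ruled France in 1804?")
```

For the sample question, the sketch picks out one year, one place and a "person" answer type, which is the kind of hint later stages could use to filter candidates.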
Sometimes the question is an obvious one, and a query to a specific database will do the trick. Most of the time, however, the question will kick off five or 10 searches across different data sources, each an interpretation of what the question might be.
For this challenge, IBM has amassed an immense amount of reference material, including multiple encyclopedias, millions of news stories, novels, plays and other digital books. Some of the material is in structured databases; other material resides in unstructured text files.
The process is iterative. A set of results may require a new set of searches to be undertaken. "So, now you might have hundreds of processes, each generating additional candidate answers. Imagine that fan-out," Ferrucci said. The end result may be 10,000 sets of possible questions and their corresponding answers.
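The fan-out can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the in-memory `SOURCES` tables stand in for the real reference material, and feeding each new candidate back in as a follow-up query is just one simple way to model "a new set of searches."

```python
from typing import Dict, List, Set

# Hypothetical in-memory sources (assumption); each maps a query to candidate answers.
SOURCES: Dict[str, Dict[str, List[str]]] = {
    "encyclopedia": {
        "emperor france 1804": ["Napoleon"],
        "Napoleon": ["Josephine"],
    },
    "news": {
        "emperor france 1804": ["Napoleon", "Louis XVIII"],
    },
}


def search_all(query: str) -> List[str]:
    """Run one query against every source and pool the hits."""
    hits: List[str] = []
    for table in SOURCES.values():
        hits.extend(table.get(query, []))
    return hits


def generate_candidates(queries: List[str], rounds: int = 2) -> Set[str]:
    """Fan out the queries, then reuse new candidates as follow-up queries."""
    candidates: Set[str] = set()
    frontier = list(queries)
    for _ in range(rounds):
        new: Set[str] = set()
        for query in frontier:
            new.update(search_all(query))
        new -= candidates  # keep only candidates not seen before
        if not new:
            break  # nothing new turned up, so stop iterating
        candidates |= new
        frontier = sorted(new)  # the next round searches on the new candidates
    return candidates


found = generate_candidates(["emperor france 1804"])
```

Even in this toy version, one starting query grows into three candidates after two rounds, which hints at how quickly the real fan-out could reach thousands.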
Of course, Jeopardy requires only a single answer, preferably the right one. So once all the possible answers are collected, the system uses about 100 algorithms to rate each one, assessing it from different perspectives: Does the answer match the approximate time frame that the question hints at? Is it in the right geographic region? Does the grammatical form of the answer match what is required by the question? A categorical check is done: If the question is looking for a kind of liquid, is the answer a kind of liquid?
If the answer with the highest score meets a preliminary threshold of confidence, that answer will be submitted.
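A minimal Python sketch of this rate-and-threshold step follows. The three scorers stand in for the roughly 100 algorithms mentioned above; their names, the candidate fields (`era`, `region`, `category`), the plain average used to combine scores and the 0.5 threshold are all assumptions chosen to keep the example short, not a description of how Watson actually weighs its evidence.

```python
from typing import Callable, Dict, List, Optional

Scorer = Callable[[dict, dict], float]


def time_score(cand: dict, clue: dict) -> float:
    """Does the candidate's era cover the year the clue hints at?"""
    start, end = cand["era"]
    return 1.0 if start <= clue["year"] <= end else 0.0


def region_score(cand: dict, clue: dict) -> float:
    """Is the candidate in the right geographic region?"""
    return 1.0 if cand["region"] == clue["region"] else 0.0


def category_score(cand: dict, clue: dict) -> float:
    """Is the candidate the kind of thing the clue asks for?"""
    return 1.0 if cand["category"] == clue["category"] else 0.0


SCORERS: List[Scorer] = [time_score, region_score, category_score]


def best_answer(cands: List[dict], clue: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the top-rated candidate, or None if its confidence is below threshold."""
    def confidence(cand: dict) -> float:
        # Assumption: combine the scorers with an unweighted average.
        return sum(score(cand, clue) for score in SCORERS) / len(SCORERS)

    if not cands:
        return None
    top = max(cands, key=confidence)
    return top["text"] if confidence(top) >= threshold else None


clue = {"year": 1804, "region": "France", "category": "person"}
candidates = [
    {"text": "Napoleon", "era": (1769, 1821), "region": "France", "category": "person"},
    {"text": "Seine", "era": (0, 9999), "region": "France", "category": "river"},
    {"text": "Cleopatra", "era": (-69, -30), "region": "Egypt", "category": "person"},
]
answer = best_answer(candidates, clue)
```

Returning `None` when no candidate clears the threshold models the choice not to submit an answer at all.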
This approach, by itself, would take a single CPU-based machine about two hours to formulate an answer to a single question, Ferrucci said. Here is where the IBM hardware comes in handy. Watson itself is composed of two racks of IBM Power7 System servers, or about 2,500 processor cores, all acting in harmony in a clustered configuration.
Each socket, which can accommodate either six- or eight-core processors, is able to handle 32 independent threads, said Tom Rosamilia, IBM General Manager of Power and z Systems. Each thread can host a separate search, or some other individual action.
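The idea of one search per thread can be sketched with Python's standard `concurrent.futures` module. This is only an analogy under stated assumptions: `run_search` is a hypothetical stand-in that sleeps instead of searching, and Python threads mainly help with waiting-bound work such as I/O, whereas the hardware described here runs threads on many physical cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_search(query: str) -> str:
    """Hypothetical stand-in for one search; the sleep mimics waiting on a data source."""
    time.sleep(0.05)
    return f"results for {query}"


# Eight interpretations of one question, each to be searched independently.
queries = [f"interpretation {i}" for i in range(8)]

# One worker thread per query, so the searches overlap instead of running in sequence.
with ThreadPoolExecutor(max_workers=8) as pool:
    # pool.map returns results in the same order as the input queries.
    results = list(pool.map(run_search, queries))
```

Run one after another, the eight simulated searches would take about eight times as long as a single one; with a thread for each, they wait concurrently.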
"The great advantage that the hardware provides is the ability to run multi-threaded multicore" processes, Rosamilia said. In other words, running the software across multiple servers dramatically cuts the execution time.
Despite all this hardware muscle and software prowess, Watson's victory on the game floor is anything but assured. Last June, The New York Times reported that the system still had to be improved quite a bit to match fast-thinking Jeopardy aces.
But even as Ferrucci and his team work feverishly to make last-minute adjustments, the lessons they learn will have wider applicability, both for IBM and for the IT industry in general. Ultimately, IBM plans to use this software to build commercial systems that could answer specific questions in selected fields, such as health care, tech support, and the legal field.
"At the end of the day, whether Watson beats Jeopardy champions Ken Jennings and Brad Rutter is relatively unimportant," wrote Charles King, head of Pund-IT, in a weekly newsletter that the IT analysis firm issues. "However, a computing system demonstrating a form of essentially cognitive capabilities represents a huge technological step that will likely foreshadow profound developments in commercial IT systems and solutions."