IBM's Watson now (sort of) knows a cat when it sees one.
IBM has unveiled five new in-beta services for Watson Developer Cloud, its Bluemix-hosted system for creating applications that plug into the machine learning service. With it comes a paradox: The most immediately useful services aren't very impressive yet, while the more technologically impressive services might struggle to find an audience or face competition from existing services.
The new services, announced earlier today in a blog post, are speech to text, text to speech, visual recognition, concept insights, and trade-off analytics. The last of those addresses a common task in business decision-making: one provides a set of options and criteria, and the service attempts to determine the best choices among them. IBM's live demonstration includes interactive examples like choosing a mutual fund based on multiple criteria (minimal risk, maximal long-term return, and so on) or selecting the best phone based on price.
When I asked IBM's Jeffrey Stylos (Software Engineer, Watson Platform Services) how this was superior to a mere spreadsheet, he replied, "This is due to the server-side analysis that automatically culls provably suboptimal choices. For example, in the Finance data set on the demo, 86 of the 115 input options are automatically elided." (He wasn't able to go into more detail on how this service will be explicitly powered by Watson, saying only, "There are lots of long-term plans for Watson.")
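The culling Stylos describes is, in essence, Pareto-dominance filtering: an option is discarded if some other option is at least as good on every criterion and strictly better on at least one. Here's a minimal sketch of that idea; the fund data and scoring convention are made up for illustration and are not IBM's implementation.

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every criterion
    and strictly better on at least one (here: higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(options):
    """Keep only the options that no other option dominates."""
    return [o for o in options
            if not any(dominates(p, o) for p in options if p is not o)]

# Hypothetical funds scored as (expected return, negated risk),
# so that higher is better on both axes.
funds = [(0.07, -0.3), (0.05, -0.2), (0.04, -0.4), (0.09, -0.5)]
print(pareto_front(funds))  # (0.04, -0.4) is culled: (0.07, -0.3) beats it on both
```

On the demo's Finance data set, this kind of filter is presumably what reduces 115 input options to 29 viable ones before any interactive exploration begins.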
Some of the other services make more immediately visible use of Watson's existing trove of data. Concept insights lets you provide a concept -- say, solar energy -- and returns documents that are conceptually or thematically related to the topic, a kind of knowledge-graph search engine. Right now the service uses Wikipedia as the basis for its concept graph (it's not clear if Watson uses the live version of Wikipedia or a manually cleaned static snapshot), though a user can supply their own corpus instead.
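To make "knowledge-graph search" concrete, here is a toy sketch of the general idea: documents tagged with concepts are ranked by how much they overlap with the query concept's neighborhood in the graph. The graph, documents, and scoring are invented for illustration; IBM has not disclosed how its concept graph actually scores relatedness.

```python
# Toy concept graph: each concept maps to its related concepts.
graph = {
    "solar energy": {"renewable energy", "photovoltaics", "electricity"},
    "wind power":   {"renewable energy", "electricity", "turbines"},
}

# Hypothetical documents, each tagged with the concepts it covers.
docs = {
    "doc_a": {"photovoltaics", "electricity"},
    "doc_b": {"turbines"},
}

def related_docs(concept):
    """Rank documents by overlap with the concept's graph neighborhood."""
    neighborhood = graph[concept] | {concept}
    scored = {name: len(tags & neighborhood) for name, tags in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)

print(related_docs("solar energy"))  # doc_a ranks first: two shared concepts
```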
The text-to-speech and speech-to-text services are much more familiar territory. Though they don't break new ground, they're consumable as a straightforward API via Watson. Output from text-to-speech sounds reasonably natural -- more than, say, the output from the Web Speech API -- though with a slightly stilted, robotic flavor. Most speech-to-text transcription systems are liable to have comprehension problems, and Watson is no exception. When I spoke a sample text, the words "vocal chords" were transcribed as "ports," and "go into" became "guns."
Feed an image to Watson's visual recognition service, along with an optional classifier ("animal," "food," "landmark"), and it'll provide a list of ranked possibilities for identifying the image. Context seems to help a great deal: When I fed the service a picture of my cat with no context, Watson ranked it as "Object: 70%, Food: 68%, Cat: 67%." Selecting "animal" bumped "Cat" up to 70 percent.
IBM likely won't sweat the accuracy issues. The point of offering the services in their early stages is to get people working with them for free and, more importantly, to train Watson on the data supplied by users. This has already started to happen elsewhere; see IBM's partnership with Twitter, where data from the microblogging service is collected and run through Watson's APIs by businesses conducting sentiment analysis. In short, the long-term value of Watson won't become clearer until we see the impact of that training.
Watson as a whole doesn't face competition from a single, central source, but rather from a passel of projects and initiatives. For instance, ImageNet, a joint venture between the Stanford Vision Lab, Stanford University, and Princeton University, is preparing an image classification network for educators and researchers, though that's somewhat removed from IBM's ambitions to deliver Watson as a business product.