Microsoft CEO Satya Nadella used his Build 2016 keynote to chart a new road to what he calls “more personal computing,” urging us to think of “conversations as a platform” and to interact with sites and services beyond the traditional app and Web browser.
Key to that new platform is a new generation of conversational apps, called “bots,” built on many years of computing research.
While the experimental Tay bot Microsoft unleashed on Twitter last week may have been a failure, general-purpose chat bots are increasingly powerful tools that provide interactive services on platforms such as WeChat, Slack, and soon Skype. Instead of going to a Web page to order pizza, you’ll converse with a Domino’s bot. Instead of slogging through flight schedules, you’ll ask a Delta or United bot for flight information.
Building with the Bot Framework
Ubiquitous computing researchers have long focused on the idea of the agent: software that completes tasks on behalf of users. While today’s bots are much simpler than a true agent, they’re tools that can work in conjunction with smarter, more personal software such as Microsoft’s Cortana. That also means bots are meant to be quick and easy to build and deploy, which is where Microsoft’s new Bot Framework comes in.
Designed to be cross-platform, the Bot Framework has its own developer portal. You can quickly sign up and register existing bots to use the portal's back-end services, or build and deploy your own bots from scratch. Bots can be built to work with a wide selection of endpoints, including SMS messaging, so your app can be connected to an ancient Nokia mobile phone running on a 2G network somewhere in a developing country, extending reach well beyond the bounds of the Internet.
At the heart of the Bot Framework is the Bot Connector, which handles connections to channels, as well as cloud storage for state and session tracking. Bots themselves are written using an open source Bot Builder SDK for C# and Node.js, with the SDK hosted on GitHub (though if you have an accessible REST endpoint, you can use any language you like). Finally, registered bots are listed in a public directory, which includes a scratchpad chat service where users can try them out.
Bots are, at heart, chat engines, taking and parsing messages from a user and responding appropriately. For example, a food-ordering bot will take an order, acknowledge it, and pass the order on to an e-commerce system, along with a user’s credentials to approve payment. Developers can use a local emulator to get started, with no need to connect to the cloud service. You’ll need to build a series of call/response pairs to handle chitchat with a user, either by looking for simple strings or by using machine learning tools to work with natural language responses.
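Those call/response pairs can be sketched as an ordered list of pattern/handler entries tried in turn against each incoming message. The names here (`pairs`, `respond`) are illustrative, not Bot Builder SDK APIs:

```javascript
// An ordered table of call/response pairs: the string-matching core of a
// simple bot. Patterns are tried top to bottom; the first match wins.
const pairs = [
  { pattern: /order .*pizza/i, handler: () => "Sure - what toppings would you like?" },
  { pattern: /thanks|thank you/i, handler: () => "You're welcome!" },
];

function respond(text) {
  for (const { pattern, handler } of pairs) {
    if (pattern.test(text)) return handler(text);
  }
  return "Sorry, I didn't catch that."; // fallback when nothing matches
}
```

Swapping the regular expressions for calls to a natural-language service is what turns this rigid matcher into something that tolerates the messiness of real conversation.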
A bot bestiary
The simplest bots are scripted and will be very familiar to anyone who’s built a bot for IRC or for a chat service like AIM. They behave and operate much like interactive voice response services over the phone (and could quickly be repurposed from existing voice and touch-tone-driven systems).
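A scripted bot of this kind is essentially a fixed state machine, much like a touch-tone menu. As a sketch (the menu contents below are invented), each state carries a prompt and a table of valid transitions:

```javascript
// A scripted, IVR-style bot: a fixed state machine where each state has a
// prompt and a table of transitions keyed by user input.
const menu = {
  start: {
    prompt: "Press 1 for hours, 2 for location.",
    next: { "1": "hours", "2": "location" },
  },
  hours: { prompt: "We are open 9am to 5pm.", next: {} },
  location: { prompt: "We are at 1 Example Way.", next: {} },
};

// Advance the conversation one turn: follow a valid transition, or
// re-prompt from the current state when the input isn't recognized.
function step(state, input) {
  const node = menu[state];
  const next = node.next[input];
  if (!next) return { state, prompt: node.prompt };
  return { state: next, prompt: menu[next].prompt };
}
```

The same table-driven structure is why an existing touch-tone script can be repurposed into a chat bot with relatively little work: only the input and output channels change.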
More complex bots will take advantage of the explosion of machine-learning-powered AI systems, with Microsoft providing tools to help refine understanding of user input. An onstage demo showed a food-ordering app being trained to recognize the phrase “my crib” as referring to a user’s home address.
More complex bots can be built on top of a range of cognitive services Microsoft has been working on over the last year or so. Cortana Analytics builds on Azure’s Machine Learning platform to give you quick API access to common machine learning algorithms. That runs alongside its Cognitive Services microservices, a set of focused APIs for specific machine learning scenarios. Originally named Project Oxford, the Cognitive Services tools now include 22 different REST APIs for speech, natural language, vision, and search.
The Language Understanding Intelligent Service (LUIS) is a key tool for bot development, as it lets you build waterfall recognizers that can quickly help pinpoint user intent. It's an important feature when you’re building bots that need to work with a wide swath of the general public.
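The waterfall idea can be illustrated without the service itself: run cheap, strict recognizers first and fall through to looser ones only when confidence is too low. The toy regex scorers below stand in for what would be a call to a hosted language-understanding API; the names and thresholds are assumptions for illustration:

```javascript
// A waterfall of intent recognizers: each returns { intent, score } or
// null, and evaluation stops at the first sufficiently confident result.
const recognizers = [
  // Strict command match: high confidence when it fires.
  text => /^order\b/i.test(text) ? { intent: "PlaceOrder", score: 0.95 } : null,
  // Looser keyword match: fires more often, with lower confidence.
  text => /pizza|food|hungry/i.test(text) ? { intent: "PlaceOrder", score: 0.6 } : null,
];

function recognize(text, threshold = 0.5) {
  for (const r of recognizers) {
    const result = r(text);
    if (result && result.score >= threshold) return result; // first confident hit wins
  }
  return { intent: "None", score: 0 }; // nothing matched well enough
}
```

Ordering the cascade from strict to loose keeps common phrasings fast and precise while still catching the long tail of ways the general public might ask for the same thing.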
The power of conversation
APIs like these underpin conversational computing, as they enable developers to infer context from a range of different inputs. Being able to quickly see that someone is at a desk can help tailor responses and decisions, narrowing what could have been a wide decision tree and simplifying the code needed to deliver that conversational agent. Understanding the context a user is working in can quickly make a bot appear as a natural, more intelligent part of a user’s day-to-day life.
Perhaps the real power of this cognitive approach to computing was encapsulated by the final demo of Satya Nadella’s section of the Build keynote, in which a blind engineer used Cognitive Services vision and speech APIs to build an application that works with smart glasses to narrate the world around him. It gives him a set of tools that makes it easier to understand what is happening nearby, from colleagues’ emotional reactions to a presentation to the source of a random sound in the park.
Bots and "conversation as a service" are the foundation of a world of cognitive computing. But they bring to the table a set of ready-to-use tools that can expand our applications to a much wider audience. Satya Nadella’s first public speeches talked about a world of “ubiquitous computing and ambient intelligence.” Now, two years on, Microsoft is delivering on that vision.