How Bing Distill could feed Microsoft machine learning

Microsoft's new Quora-esque project, Bing Distill, could be a real-world data source for machine learning projects, echoing IBM's plans with Watson

Microsoft has unveiled the first public look at Bing Distill, a service for answering user questions. The technology also ties in with Microsoft's ongoing work in machine learning, paralleling IBM's use of real-world data to enrich its galaxy of Watson-powered services.

At first glance, Bing Distill looks and sounds like Quora or Yahoo Answers -- the proposed mechanisms for the service are about the same. Users can volunteer answers to plain-language questions posed on Bing; those answers are then subjected to community review, feedback, and refinement. It's not yet clear what incentives Microsoft will offer to participants; one theory is that Microsoft could use the Bing Rewards program for this.

The longer-term plan for Distill may be to provide a curated pool of data that Bing, and Microsoft generally, can use to enrich machine-learning projects.

Many machine learning projects start with a core set of algorithms, but are powered by bodies of data harvested from the real world. Language translation, for instance, relies on statistical analyses of sets of human-curated texts, with a text (or "corpus") needed for each of the languages to be translated. Creating and maintaining those texts isn't trivial. Google can expand the corpuses for its language translation systems from crowdsourced user feedback or newly translated texts, but the input has to be kept clean or the quality of the translations will suffer over time.

Consequently, machine learning projects are finding different ways to take in data from the real world. IBM's Watson machine-learning service can now analyze data from Twitter and presumably refine its results over time to build on the data ingested (most likely with human guidance).

Likewise, Microsoft has dropped hints as to how machine learning is used to enhance Bing search results. In a recent set of blog posts, the Bing team discussed how its algorithms, combined with harvested real-world data, are already in use to make what Microsoft believes are informed predictions about future events -- such as the Oscars, the NBA Draft, and even elections.

From this, it's easy to see how Microsoft could take insights generated via Bing Distill and put them into action in the same way. The near-term plan is to have Distill answers appear in Bing search results, but in the long run, Distill data could fuel Microsoft's broader machine learning efforts.

Microsoft's ambitions for machine learning involve, among other tasks, making consumable services akin to Watson, where ordinary users supply an Azure service with data and receive an analysis that would otherwise require heavy lifting on their part. It makes sense for Microsoft to augment that with yet another source of human-curated data -- and it'll be more useful if Distill lives up to its promise of a system designed to tease verifiable facts out of user-provided content.


Copyright © 2015 IDG Communications, Inc.