Review: Microsoft Azure AI and Machine Learning aims for the enterprise

Microsoft Azure combines a wide range of cognitive services and a solid platform for machine learning that supports automated ML, no-code/low-code ML, and Python-based notebooks.

Page 2 of 3

Language Understanding

The Azure Language Understanding service, also called LUIS (the “I” stands for intelligent), allows you to define intents and entities, map them to words and phrases, and then use the resulting language model in your own applications. You can use prebuilt domain language models as well as build and use customized language models. You can build a model with the authoring APIs, with the LUIS portal, or both. Training LUIS models uses machine teaching, which is simpler than conventional machine learning training, and a LUIS model also improves continuously as it is used. LUIS supports both text and speech input.
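To make the intent/entity idea concrete, here is a small sketch of consuming a LUIS-style prediction result in Python. The JSON shape below is my assumption modeled on the LUIS v3 prediction API (`topIntent`, per-intent scores, an `entities` map); check the current API reference before relying on field names.

```python
# Sample response shaped like a LUIS v3 prediction (an assumption,
# not captured from the live service).
sample_response = {
    "query": "turn the picture left",
    "prediction": {
        "topIntent": "RotateImage",
        "intents": {"RotateImage": {"score": 0.97}, "None": {"score": 0.02}},
        "entities": {"direction": ["left"]},
    },
}

def top_intent(response, threshold=0.5):
    """Return (intent, score), falling back to 'None' below the threshold."""
    prediction = response["prediction"]
    intent = prediction["topIntent"]
    score = prediction["intents"][intent]["score"]
    return (intent, score) if score >= threshold else ("None", score)

intent, score = top_intent(sample_response)
print(intent, sample_response["prediction"]["entities"])
```

An application would typically branch on the returned intent and pass the extracted entities (here, the `direction` value) to its own logic.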

microsoft azure ai 07 IDG

The Language Understanding service, also called LUIS, allows you to use prebuilt domain language models and to build and use customized language models. Here we are using a limited vocabulary of intents and entities to control a picture in a web demo.

QnA Maker

QnA Maker lets you create a conversational question-and-answer layer over your existing data. You can use it to build a knowledge base by extracting questions and answers from your semi-structured content, including FAQs, manuals, and documents. You can answer users’ questions with the best answers from the QnAs in your knowledge base automatically. In addition, your knowledge base gets smarter as it continually learns from user behavior.

In addition to web Q&A, you can create and publish a bot in Teams, Skype, or elsewhere. You can also make your bot more conversational by adding a pre-populated chit-chat dataset at a range of (in)formality levels from professional to enthusiastic.

QnA Maker is often implemented behind a Language Understanding service. QnA Maker itself requires an App Service and a Cognitive Search service.
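A minimal sketch of picking the best answer from a QnA Maker-style `generateAnswer` response follows. The `answers` list and the 0-100 `score` range reflect my understanding of the QnA Maker REST API; the sample answers are invented for illustration.

```python
# Invented sample response in the shape of QnA Maker's generateAnswer
# output (answers scored 0-100).
sample_answers = {
    "answers": [
        {"answer": "Restart the router, then retry.", "score": 82.5},
        {"answer": "Contact support.", "score": 40.1},
        {"answer": "No good match found in KB.", "score": 0.0},
    ]
}

def best_answer(response, min_score=50.0):
    """Return the highest-scoring answer, or None if nothing clears the cutoff."""
    top = max(response["answers"], key=lambda a: a["score"])
    return top["answer"] if top["score"] >= min_score else None

print(best_answer(sample_answers))
```

A bot would usually fall back to a default "I don't know" reply when `best_answer` returns `None` rather than surfacing a low-confidence match.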

Text Analytics

Text Analytics is an AI service that uncovers insights such as sentiment, entities, relations, and key phrases in unstructured text. You can use it to identify key phrases and entities such as people, places, and organizations, to understand common topics and trends. A related service, Text Analytics for Health (currently in preview), allows you to classify medical terminology using domain-specific, pretrained models. You can gain a deeper understanding of customer opinions with sentiment analysis, and evaluate text in a wide range of languages.

Microsoft offers extensive transparency notes for its Text Analytics services, which tie into Microsoft’s responsible AI principles. It also promises privacy: It doesn’t use the training performed on your text to improve Text Analytics models. If you choose to run Text Analytics in containers, you can control where Cognitive Services processes your data.

The alternative to running Text Analytics in containers is making synchronous calls to the service’s REST API, or using the client library SDK. In November 2020 Azure added a new preview Analyze operation for users to analyze larger documents asynchronously, combining multiple Text Analytics features in one call.
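As a sketch of what those synchronous REST calls look like, the snippet below builds a Text Analytics v3-style request body and reads sentiment out of a sample response. The `documents`/`confidenceScores` field names are assumptions drawn from the v3 REST API, which the `azure-ai-textanalytics` client library wraps.

```python
def make_request(texts, language="en"):
    """Build a v3-style documents payload from a list of strings."""
    return {"documents": [
        {"id": str(i), "language": language, "text": t}
        for i, t in enumerate(texts, start=1)
    ]}

# Invented sample response in the shape of a v3 sentiment result.
sample_response = {
    "documents": [
        {"id": "1", "sentiment": "positive",
         "confidenceScores": {"positive": 0.97, "neutral": 0.02, "negative": 0.01}},
    ],
    "errors": [],
}

def sentiments(response):
    """Map each document id to its overall sentiment label."""
    return {d["id"]: d["sentiment"] for d in response["documents"]}

print(make_request(["Great service!"]))
print(sentiments(sample_response))
```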


You can get a feel for Text Analytics from this web demo, which unfortunately is not interactive. There are Text Analytics Quickstarts you can run in C#, Python, JavaScript, Java, Ruby, and Go.


Translator

Bing Translator (now called Microsoft Translator) once ran a distant second to Google Translate for quality and speed of translation, as well as in the number of supported language pairs. That’s no longer the case for quality and speed, although as of December 2020 Google Translate supports 109 languages, compared to Microsoft’s “more than 70.” (Both translate Klingon, at least in text and using the API, should you happen to care.)

Azure’s Translator service not only provides stock translations, but also allows you to build custom models for domain-specific terminology, although using customized translation models costs a little more than using stock models. Microsoft doesn’t share the custom training performed on your training material or use it to improve Translator quality, nor does it log your API text input during translation, although it does use consumer feedback from Bing Translator to improve stock Translator quality.
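For a sense of the programming model, here is a sketch that assembles a Translator-style request and flattens a sample response. The endpoint URL, the repeated `to` parameter, and the `translations` response shape follow the Translator REST API v3.0 as I recall it; confirm against the current documentation before use.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_request(texts, to_langs, from_lang=None):
    """Return the request URL and JSON body for a v3-style translate call."""
    params = [("api-version", "3.0")] + [("to", lang) for lang in to_langs]
    if from_lang:
        params.append(("from", from_lang))
    url = ENDPOINT + "?" + urlencode(params)
    body = [{"Text": t} for t in texts]
    return url, body

# Invented sample response in the v3 translations shape.
sample_response = [
    {"translations": [{"text": "Entschuldigung", "to": "de"},
                      {"text": "Excusez-moi", "to": "fr"}]}
]

def by_language(response):
    """Index translated strings by target language code."""
    return {t["to"]: t["text"] for item in response for t in item["translations"]}

url, body = build_request(["Excuse me"], ["de", "fr"])
print(url)
print(by_language(sample_response))
```

Note that one call can fan out to several target languages at once, which is why the response groups multiple translations per input text.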

Translator is the same service that powers translations in all of Microsoft’s products, such as Word, PowerPoint, Teams, Edge, Visual Studio, and Bing, not to mention the Microsoft Translator app. In addition to being available as a cloud service, Translator can be downloaded to run locally on edge devices for some languages.

I created a free-tier Translator service without any problems. I didn’t learn much more from testing it programmatically, however, than I did from exercising Bing Translator on the web or the Microsoft Translator app on Android. Bing Translator and Microsoft Translator also support speech and vision for some languages.


Bing Translator, shown here, and the Microsoft Translator mobile app both use the Azure Translator service, although the app may run the edge version of Translator on the mobile device. Here I’ve spoken “sumimasen,” a rather flexible Japanese apology, which the site correctly transcribed into Hiragana and translated as “Excuse me” as well as “I am sorry” and “Pardon me.”


Speech

The Speech area of Azure Cognitive Services includes speech recognition, text to speech, speech translation, and speaker recognition.

The Speech SDK supports C#, C++, Java, JavaScript, Objective-C, Python, and Swift for both speech recognition and speech generation, and Go for recognition only. It exposes many features from the Speech service, but not all of them. The Speech SDK also supports the Speech Translation API, which can translate speech input to a different language with a single call.

Speech Studio is a customization portal for Speech. It claims to supply all the tools you need to transcribe spoken audio to text, perform translations, and convert text to lifelike speech.

Speech to Text

Microsoft describes its Speech to Text service as allowing you to quickly and accurately transcribe audio to text in more than 85 languages and variants. The latter include six variants of English, seven variants of Arabic, and two variants each of French, Portuguese, and Spanish. You can also customize speech recognition models to enhance accuracy for domain-specific terminology, and combine transcriptions with language understanding or search services.


A test of Azure Speech to Text in US English. The system had no trouble recognizing my dictation or adding correct sentence capitalization and punctuation.

Text to Speech

Azure’s Text to Speech service allows you to build apps and services that speak naturally, choosing from more than 200 voices and over 50 languages and variants. Some of these languages and variants don’t have corresponding speech recognition support.

You can differentiate your brand with a customized voice, and access voices with different speaking styles and emotional tones to fit your use case. Standard voices don’t sound as natural as neural voices, but neural voices cost four times as much to use. Azure neural voices compete with Google WaveNet voices.
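Speaking styles and voices are selected through SSML. The sketch below composes an SSML document for a neural voice with a style; the `en-US-AriaNeural` voice name and the `mstts:express-as` extension element are assumptions drawn from Microsoft's SSML documentation, so verify both against the current voice list.

```python
def build_ssml(text, voice="en-US-AriaNeural", style=None, lang="en-US"):
    """Compose SSML for the Text to Speech service; style is optional."""
    inner = text
    if style:
        # mstts:express-as is Microsoft's extension for speaking styles.
        inner = f'<mstts:express-as style="{style}">{text}</mstts:express-as>'
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">'
        f'<voice name="{voice}">{inner}</voice></speak>'
    )

print(build_ssml("Top stories this hour.", style="newscast"))
```

The resulting string would be posted to the service's synthesis endpoint (or passed to the Speech SDK); standard voices simply omit the style wrapper.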


This is the test panel for Azure Text to Speech. It supports all the available languages and variants (50 total) and all the available voices for each variant (200 total counting all variants), both standard and neural (high quality). Neural voices support several speaking styles in addition to general, such as newscast and customer service.

Speech Translation

The Speech Translation service allows you to translate audio from more than 30 languages and customize your translations for your organization’s specific terms. The service essentially combines speech to text for the source language, language translation, and text to speech for the target language.


Azure Speech Translation works in stages. First, it recognizes speech, corrects it, and adds punctuation. Then it translates the text. Finally, it speaks the translated text, for languages with speech support.

Speaker Recognition

The Speaker Recognition service, currently in preview, works in two use cases. For identification, it matches the voice of an enrolled speaker from within a group, which is useful in transcribing conversations. For verification, it can either use pass phrases or free-form voice input to verify individuals for secure customer engagements.


Vision

This area includes computer vision, custom vision, face detection, form recognition, and video indexing.

Computer Vision

The Computer Vision service includes a broad set of capabilities for analyzing still images and video: optical character recognition (OCR), digital asset management (DAM), tagging visual features, object detection (with bounding boxes), commercial brand detection, categorizing images with a taxonomy, description generation, face detection, image type detection, domain-specific content detection, color scheme detection, thumbnail generation, area of interest detection (with bounding boxes), and adult content detection.
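As a sketch of working with those bounding boxes, the snippet below filters object-detection results from a Computer Vision-style analyze response. The `objects`/`rectangle`/`confidence` fields are my assumption modeled on the Image Analysis REST response, and the sample detections are invented.

```python
# Invented sample in the shape of a Computer Vision object-detection result.
sample_response = {
    "objects": [
        {"rectangle": {"x": 10, "y": 20, "w": 100, "h": 80},
         "object": "dog", "confidence": 0.92},
        {"rectangle": {"x": 200, "y": 40, "w": 60, "h": 60},
         "object": "ball", "confidence": 0.41},
    ]
}

def confident_objects(response, min_confidence=0.5):
    """Return (label, rectangle) pairs above the confidence cutoff."""
    return [(o["object"], o["rectangle"])
            for o in response["objects"] if o["confidence"] >= min_confidence]

print(confident_objects(sample_response))
```

Tuning `min_confidence` trades missed detections against false positives; the service returns its own confidence per object, so the cutoff is an application choice.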

Spatial analysis analyzes video for events such as people detection (with bounding boxes), people tracking (from frame to frame), and region of interest (for example, person crossing a line or entering a zone). Spatial analysis is in a gated preview that requires an application.


The Azure Computer Vision service identifies objects and their bounding boxes and reads text from images and video.

Custom Vision

Transfer learning is a quick way to customize an image model. Custom Vision uses transfer learning to create custom image models from just a few tagged images, not the thousands of images you might expect to need. It can also help you tag untagged images. As you add more images, the model keeps improving.


Custom Vision can train on a small number of tagged, or even untagged, images. Adding more images improves the model’s accuracy.


Face

The Face service includes face detection that perceives faces and attributes in an image; person identification that matches an individual in your private repository of up to 1 million people; perceived emotion recognition that detects a range of facial expressions such as happiness, contempt, neutrality, and fear; and recognition and grouping of similar faces in images.


Azure Face Verification returns the probability that two faces match.
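A sketch of acting on that probability: the snippet below applies a decision threshold to a Face verification-style result. The `isIdentical`/`confidence` fields follow my reading of the Face API verify response, and the 0.5 default cutoff is an assumption about the service's own default, so verify both.

```python
def verify_decision(response, threshold=0.5):
    """Accept the match only when the service's confidence clears the bar."""
    return response["confidence"] >= threshold

# Invented sample in the shape of a Face verify response.
sample_response = {"isIdentical": True, "confidence": 0.87}
print(verify_decision(sample_response))
```

Security-sensitive applications would raise the threshold, accepting more false rejections in exchange for fewer false matches.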

Form Recognizer

Form Recognizer applies advanced machine learning to accurately extract text, key/value pairs, and tables from documents. With surprisingly few samples (the examples given used five exemplars for each custom document type), Form Recognizer tailors its understanding to your custom documents, both on-premises and in the cloud. It also has several pre-built models, such as for layouts, invoices, sales receipts, and business cards.
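To illustrate what "key/value pairs" means in practice, here is a sketch that collects fields from a Form Recognizer-style analysis result. The `analyzeResult` → `pageResults` → `keyValuePairs` nesting approximates the v2 REST response as I understand it, and the invoice values are invented.

```python
# Invented sample approximating a Form Recognizer v2 analysis result.
sample_result = {
    "analyzeResult": {
        "pageResults": [
            {"keyValuePairs": [
                {"key": {"text": "Invoice No:"}, "value": {"text": "INV-1042"}},
                {"key": {"text": "Total:"}, "value": {"text": "$128.00"}},
            ]}
        ]
    }
}

def extract_fields(result):
    """Flatten detected key/value pairs into a plain dict, trimming colons."""
    fields = {}
    for page in result["analyzeResult"]["pageResults"]:
        for pair in page["keyValuePairs"]:
            fields[pair["key"]["text"].rstrip(":")] = pair["value"]["text"]
    return fields

print(extract_fields(sample_result))
```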


The Form Recognizer service can extract text from documents and forms.

Video Indexer

With the Video Indexer service, you can automatically extract metadata—such as spoken words, written text, faces, speakers, celebrities, emotions, topics, brands, and scenes—from video and audio files. Then you can access the data within your application or infrastructure or make it more discoverable.
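As a sketch of consuming that metadata, the snippet below summarizes a Video Indexer-style insights payload. The `videos` → `insights` nesting and the per-category `name` fields are assumptions modeled on the Video Indexer API, with invented sample data.

```python
# Invented sample approximating a Video Indexer index result.
sample_index = {
    "videos": [{
        "insights": {
            "topics": [{"name": "Cloud Computing"}, {"name": "Machine Learning"}],
            "labels": [{"name": "person"}, {"name": "screen"}],
            "faces":  [{"name": "Speaker 1"}],
        }
    }]
}

def metadata_names(index, kinds=("topics", "labels", "faces")):
    """Collect the names found for each requested insight category."""
    insights = index["videos"][0]["insights"]
    return {kind: [item["name"] for item in insights.get(kind, [])]
            for kind in kinds}

print(metadata_names(sample_index))
```

Feeding a flattened summary like this into a search index is the usual route to making video archives discoverable.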


Azure Video Indexer automatically extracts metadata from video and audio. Here it has extracted people, topics, and labels from videos of a Microsoft event.

Web search

The Bing Search APIs used to be here, under Cognitive Services. They are now under Bing.

Azure Machine Learning

While Azure Cognitive Services are primarily aimed at software developers, Azure Machine Learning is primarily aimed at data scientists. There’s a lot of overlap, of course: Data scientists may well choose to use Cognitive Services if they already work well for the application at hand or can be customized with transfer learning, and programmers may find that they are comfortable using Jupyter Notebooks to build models for cases where Cognitive Services fall short. Even business analysts without machine learning experience may manage to build models with AutoML or using the Azure Machine Learning drag-and-drop designer.

Microsoft maintains that Azure Machine Learning accelerates the end-to-end machine learning lifecycle. That doesn’t mean that you are required to use Azure Machine Learning for everything—just that you can. You can also integrate third-party products and other Azure services with Azure Machine Learning. The first three screenshots below show Microsoft’s visual introduction to Azure Machine Learning, although the screens are slightly out of date. In particular, the first intro screen lacks the Pipelines tab under Assets, shown in the fourth screen below.


The Studio portion of Azure Machine Learning organizes and manages the entire machine learning lifecycle. Note the three choices for model building: AutoML, a GUI designer, and Azure Notebooks.


One of the three choices for model building is AutoML. This screen allows you to manage recent automated machine learning runs.


Once you have an acceptable model, you can deploy it to the Azure cloud or edge. You can monitor it over time and then retrain it when the data drifts.


The current version of Azure Machine Learning includes a Pipelines tab under Assets.

At a Glance
  • Microsoft Azure AI and Machine Learning offers a wide range of cognitive services and a solid platform for machine learning that supports automated machine learning, no-code/low-code machine learning, and Python-based notebooks. All of the services are very solid.


Pros

    • Most services have a free tier for development and test.
    • Transfer-learning services such as Custom Vision and Form Recognizer can learn from a small number of samples (in the single digits).
    • A subset of cognitive services is available to run in containers on-premises.
    • Azure has made good progress on Responsible AI.

Cons

    • Azure currently lacks natural language generation services.