Machine learning reviews

Review: 6 machine learning clouds

Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits run the gamut in breadth, depth, and ease


Google Cloud Machine Learning

Google recently announced a number of machine-learning-related products. The most interesting of these are Cloud Machine Learning and the Cloud Speech API, both in limited preview. The Google Translate API, which can perform language identification and translation for more than 80 languages and variants, and the Cloud Vision API, which can identify various kinds of features from images, are available for use -- and they look good based on Google's demos.

The Google Prediction API trains and evaluates models and makes predictions for regression and classification problems, but it gives you no choice of algorithm. It dates from 2013.

The current Google machine learning technology, the Cloud Machine Learning Platform, uses Google's open source TensorFlow library for training and evaluation. Developed by the Google Brain team, TensorFlow is a generalized library for numerical computation using data flow graphs. It integrates with Google Cloud Dataflow, Google BigQuery, Google Cloud Dataproc, Google Cloud Storage, and Google Cloud Datalab.
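To make "numerical computation using data flow graphs" concrete, here is a minimal sketch in plain Python. It does not use TensorFlow or mirror its API; the `Node`, `placeholder`, and `evaluate` names are my own illustrative assumptions. The idea it shows is that you first describe a computation as a graph of operations, then feed in values and evaluate the graph as a separate step.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """One vertex in the graph: an operation plus the nodes that feed it."""
    name: str
    op: Optional[Callable[..., float]] = None  # None marks an input placeholder
    inputs: Tuple["Node", ...] = ()


def placeholder(name: str) -> Node:
    """Create an input node whose value is supplied at evaluation time."""
    return Node(name)


def evaluate(node: Node, feed: Dict[str, float],
             cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate `node` by first evaluating every node it depends on."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        value = feed[node.name]
    else:
        value = node.op(*(evaluate(i, feed, cache) for i in node.inputs))
    cache[node.name] = value
    return value


# Describe y = w * x + b as a graph; nothing is computed yet.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
product = Node("product", operator.mul, (w, x))
y = Node("y", operator.add, (product, b))

# Only now are values fed in and the graph evaluated.
result = evaluate(y, {"x": 3.0, "w": 2.0, "b": 1.0})
print(result)
```

Separating the description of the graph from its execution is what makes it plausible for a library to place different parts of the computation on different devices.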

I have checked out the TensorFlow code from its GitHub repository; read some of the C, C++, and Python code; and pored over the TensorFlow.org site and TensorFlow white paper. TensorFlow lets you deploy computations to one or more CPUs or GPUs in a desktop, server, or mobile device, and it has all sorts of training and neural net algorithms built in. On a geekiness scale, it probably rates a 9 out of 10. Not only is it way beyond the capabilities of business analysts, but it's likely to be hard for many data scientists.

The Google Translate API, the Cloud Vision API, and the new Google Cloud Speech API are services backed by pretrained ML models. According to Google, its Cloud Speech API uses the same neural network technology that powers voice search in the Google app and voice typing in Google Keyboard.

HPE Haven OnDemand

Haven OnDemand is HPE's entry into the cloud machine learning sweepstakes. Haven OnDemand's enterprise search and format conversions are its strongest services. That’s not surprising since the service is based on IDOL, HPE's private search engine. However, Haven OnDemand’s more interesting capabilities are not fully cooked.

Haven OnDemand currently has APIs classified as Audio-Video Analytics, Connectors, Format Conversion, Graph Analysis, HP Labs Sandbox (experimental APIs), Image Analysis, Policy, Prediction, Query Profile and Manipulation, Search, Text Analysis, and Unstructured Text Indexing. I have tried out a random set and explored how the APIs are called and used.

Haven speech recognition supports only a half-dozen languages, plus variations. The recognition accuracy for my high-quality U.S. English test file was OK, but not perfect.

The Haven OnDemand Connectors, which allow you to retrieve information from external systems and update it through Haven OnDemand APIs, are already quite mature, basically because they are IDOL connectors. The Text Extraction API uses HPE KeyView to extract metadata and text content from a file that you provide; the API can handle more than 500 different file formats, drawing on the maturity of KeyView.

Graph Analysis, a set of preview services, only works on an index trained on the English Wikipedia. You can't train it on your own data.

From the Image Analysis group, I tested bar-code recognition, which worked fine, and face recognition, which did better on HPE's samples than on my test images. Image recognition currently covers only a fixed selection of corporate logos, which is of little practical use.

Figure: HPE Haven OnDemand bar-code recognition

The Haven OnDemand bar-code recognition API can isolate the bar code in an image file (see the red box) and convert it to a number, even if the bar code is on a curved surface, at an angle up to about 20 degrees, or blurry. The API does not perform the additional step of looking up the bar-code number and identifying the product.

I was disappointed to discover that HPE’s predictive analytics deals only with binary classification problems: no multiclass classification and no regression, never mind unsupervised learning. That severely limits its applicability.

On the plus side, the Train Prediction API automatically validates, explores, splits, and prepares the CSV or JSON data, then trains Decision Tree, Logistic Regression, Naive Bayes, and support vector machine (SVM) binary classification models with multiple parameters. Then it tests the classifiers against the evaluation split of the data and publishes the best model as a service.
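The split, train, test, and publish-the-best flow described above can be sketched in a few lines of plain Python. This is not HPE's API or its algorithms: the two toy trainers stand in for the real Decision Tree, Logistic Regression, Naive Bayes, and SVM models, the fixed every-fourth-row holdout replaces whatever splitting the service actually does, and every function name here is a hypothetical of mine.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]             # (single feature value, binary label)
Model = Callable[[float], int]      # maps a feature value to a predicted label


def split(rows: List[Row], every: int = 4) -> Tuple[List[Row], List[Row]]:
    """Hold out every `every`-th row for evaluation (deterministic for clarity)."""
    train = [r for i, r in enumerate(rows) if i % every != every - 1]
    evaluation = [r for i, r in enumerate(rows) if i % every == every - 1]
    return train, evaluation


def train_majority(train: List[Row]) -> Model:
    """Baseline: always predict the most common training label."""
    ones = sum(label for _, label in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority


def train_threshold(train: List[Row]) -> Model:
    """Choose the cut point on the feature that classifies the training rows best."""
    best_cut, best_correct = 0.0, -1
    for cut, _ in train:
        correct = sum((x >= cut) == bool(label) for x, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return lambda x: 1 if x >= best_cut else 0


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


def auto_model(rows: List[Row], trainers: Dict[str, Callable[[List[Row]], Model]]):
    """Train every candidate, score each on the held-out rows, return the best."""
    train, evaluation = split(rows)
    models = {name: trainer(train) for name, trainer in trainers.items()}
    scores = {name: accuracy(model, evaluation) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, models[best], scores


# Toy data: the label is 1 exactly when the feature is at least 10.
rows = [(float(i), 1 if i >= 10 else 0) for i in range(20)]
best_name, best_model, scores = auto_model(
    rows, {"majority": train_majority, "threshold": train_threshold}
)
print(best_name, scores)
```

The point of scoring on rows the trainers never saw is that the winner is picked on its ability to generalize, not on how well it memorized the training data.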

Haven OnDemand Search uses the IDOL engine to perform advanced searches against both public and private text indexes. Text Analysis APIs range from simple autocomplete and term expansion to language identification, concept extraction, and sentiment analysis.

Editor's Choice

IBM Watson and Predictive Analytics

IBM offers machine learning services based on its "Jeopardy"-winning Watson technology and the IBM SPSS Modeler. It actually has sets of cloud machine learning services for three different audiences: developers, data scientists, and business users.

SPSS Modeler is a Windows application, recently also made available in the cloud. The Modeler Personal Edition includes data access and export; automatic data prep, wrangling, and ETL; 30-plus base machine learning algorithms and automodeling; R extensibility; and Python scripting. More expensive editions have access to big data through an IBM SPSS Analytic Server for Hadoop/Spark, champion/challenger functionality, A/B testing, text and entity analytics, and social network analysis.

The machine learning algorithms in SPSS Modeler are comparable to what you find in Azure Machine Learning and Databricks’ Spark.ml, as are the feature selection methods and the selection of supported formats. Even the automodeling (train and score a bunch of models and pick the best) is comparable, although it’s more obvious how to use it in SPSS Modeler than in the others.

IBM Bluemix hosts Predictive Analytics Web services that apply SPSS models to expose a scoring API that you can call from your apps. In addition to Web services, Predictive Analytics supports batch jobs to retrain and reevaluate models on additional data.

There are 18 Bluemix services listed under Watson, separate from Predictive Analytics. The AlchemyAPI offers a set of three services (AlchemyLanguage, AlchemyVision, and AlchemyData) that enable businesses and developers to build cognitive applications that understand the content and context within text and images.

Concept Expansion analyzes text and learns similar words or phrases based on context. Concept Insights links documents that you provide with a pre-existing graph of concepts based on Wikipedia topics.

The Dialog Service allows you to design the way an application interacts with the user through a conversational interface, using natural language and user profile information. The Document Conversion service converts a single HTML, PDF, or Microsoft Word document into normalized HTML, plain text, or a set of JSON-formatted Answer units that can be combined with other Watson services.

Figure: IBM Watson top predictors

I used Watson to analyze a familiar bike rental data set supplied as one of the examples. Watson came up with a decision tree model with 48 percent predictive strength. This worksheet has not separated workday and nonworkday riders.

Language Translation works in several knowledge domains and language pairs. In the news and conversation domains, it translates between English and each of Brazilian Portuguese, French, Modern Standard Arabic, and Spanish. In the patents domain, it translates between English and each of Brazilian Portuguese, Chinese, Korean, and Spanish. The Translation service can also identify plain text as being written in one of 62 languages.

The Natural Language Classifier service applies cognitive computing techniques to return the best matching classes for a sentence, question, or phrase, after training on your set of classes and phrases. Personality Insights derives insights from transactional and social media data (at least 1,000 words written by a single individual) to identify psychological traits, which it returns as a tree of characteristics in JSON format. Relationship Extraction parses sentences into their components and detects relationships between the components (parts of speech and functions) through contextual analysis.
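A tree of characteristics delivered as JSON is easiest to consume by flattening it into path and score pairs. The sketch below assumes a response shape that I made up for illustration: the field names `name`, `score`, and `children`, the sample values, and the `walk` helper are all hypothetical and are not taken from the Watson documentation.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Made-up sample payload; the real service's field names and values may differ.
SAMPLE = """
{"name": "profile", "children": [
  {"name": "trait_a", "score": 0.8, "children": [
    {"name": "facet_a1", "score": 0.7}
  ]},
  {"name": "trait_b", "score": 0.4}
]}
"""


def walk(node: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, float]]:
    """Yield (path, score) for every scored node, visiting parents before children."""
    here = path + (node["name"],)
    if "score" in node:
        yield " > ".join(here), node["score"]
    for child in node.get("children", []):
        yield from walk(child, here)


traits = list(walk(json.loads(SAMPLE)))
for trait_path, score in traits:
    print(f"{trait_path}: {score}")
```

Keeping the full path in each row preserves the hierarchy, so the flattened output can still be grouped by top-level trait later.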

Additional Bluemix services improve the relevancy of search results, convert text to and from speech in a half-dozen languages, identify emotion from text, and analyze visual scenes and objects.

Watson Analytics uses IBM’s own natural language processing to make machine learning easier to use for business analysts and other non-data-scientist business roles.

Machine learning curve

The set of machine learning services you should evaluate depends on your own skills and those of your team. For data scientists and teams that include data scientists, the choices are wide open. Data scientists who are good at programming can do even more: Google, Azure, and Databricks require more programming expertise than Amazon and SPSS Modeler, but they are more flexible.

Watson Services running in Bluemix give developers additional pretrained capabilities for cloud applications, as do several Azure services, three Google cloud APIs, and some Haven OnDemand APIs for document-based content.

The new Google TensorFlow library is for high-end machine learning programmers who are fluent in Python, C++, or C. The Google Cloud Machine Learning Platform appears to be for high-end data scientists who know Python and cloud data pipelines.

While Amazon Machine Learning and Watson Analytics claim to be aimed at business analysts or "any business role" (whatever that means), I am skeptical about how well they can fulfill those claims. If you need to develop machine learning applications and have little or no statistical, mathematical, or programming background, I'd submit that you really need to team up with someone who knows that stuff.


Copyright © 2016 IDG Communications, Inc.
