Review: Scikit-learn shines for simpler machine learning

Well-tended Python framework offers wide selection of robust algorithms, but no deep learning

Scikits are Python-based scientific toolboxes built around SciPy, the Python library for scientific computing. Scikit-learn is an open source project focused on machine learning: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It’s a fairly conservative project that takes care to avoid scope creep and to avoid jumping on unproven algorithms, for reasons of maintainability and limited developer resources. On the other hand, it has quite a nice selection of solid algorithms, and it uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.
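
To make those categories concrete, here is a minimal sketch that chains preprocessing, dimensionality reduction, and a classifier, then uses model selection to tune the chain. The dataset (the bundled wine data), the step names, and the parameter grid are illustrative choices of mine, not anything prescribed by the project.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset: 178 samples, 13 numeric features, 3 classes.
X, y = load_wine(return_X_y=True)

# Step names ("scale", "reduce", "classify") are arbitrary labels.
pipeline = Pipeline([
    ("scale", StandardScaler()),                      # preprocessing
    ("reduce", PCA()),                                # dimensionality reduction
    ("classify", LogisticRegression(max_iter=1000)),  # classification
])

# Illustrative grid; the values are assumptions, not recommendations.
param_grid = {
    "reduce__n_components": [2, 5, 8],
    "classify__C": [0.1, 1.0, 10.0],
}

# Model selection: 5-fold cross-validated search over the grid.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the scaler and PCA live inside the pipeline, they are refit on each training fold, so the cross-validated score is not contaminated by the held-out data.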

Among the areas Scikit-learn does not cover are deep learning, reinforcement learning, graphical models, and sequence prediction. It is defined as being in and for Python, so it doesn’t have APIs for other languages. Scikit-learn doesn’t support PyPy, the fast just-in-time compiling Python implementation, because its dependencies NumPy and SciPy don’t fully support PyPy.

Scikit-learn doesn’t support GPU acceleration for multiple reasons having to do with the complexity and the machine dependencies it would introduce. Then again, aside from neural networks, Scikit-learn has little need for GPU acceleration.

Scikit-learn features

As I mentioned, Scikit-learn has a good selection of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. In the classification area, which is about identifying the category to which an object belongs and is a form of supervised learning, it implements support vector machines (SVM), nearest neighbors, logistic regression, random forest, decision trees, and so on, up to a multilayer perceptron (MLP) neural network.
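
Since all of these classifiers share the same fit-and-score interface, swapping one for another is close to a one-line change. The sketch below trains several of the classifiers named above on the bundled iris data; the dataset, the split, and the mostly default hyperparameters are my own assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data; stratify to keep class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Mostly default settings; fixed seeds only for repeatability.
classifiers = {
    "SVM": SVC(),
    "nearest neighbors": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Every estimator exposes the same fit() and score() methods.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

On a dataset this small and clean the scores say little about which algorithm is better; the point is only the uniform estimator interface.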

However, Scikit-learn’s implementation of MLP is expressly not intended for large-scale applications. For large-scale, GPU-based implementations and for deep learning, look to the many related projects of Scikit-learn, which include Python-friendly deep neural network frameworks such as Keras and Theano.
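
For a sense of the scale the built-in MLP is suited to, here is a small sketch that fits MLPClassifier on the bundled 8x8 digits images. The hidden layer size, the iteration cap, and the scaling step are illustrative assumptions, and everything runs on the CPU.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1,797 grayscale digit images, each flattened to 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One hidden layer of 32 units: an arbitrary small size chosen for illustration.
mlp = make_pipeline(
    StandardScaler(),  # MLPs are sensitive to feature scaling
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)

mlp_accuracy = mlp.score(X_test, y_test)
print(f"test accuracy: {mlp_accuracy:.3f}")
```

A network of this size trains in seconds; for much larger models or image-scale inputs, a dedicated deep learning framework is the better fit.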
