Python power comes to SQL Server 2017

The next version of Microsoft's flagship database will run Python scripts, with full access to Python third-party libraries, as native T-SQL stored procedures

Python power comes to SQL Server 2017
Credit: Mike

Don't bring the data to your computation if you can help it. Bring your computation to your data.

Microsoft has heeded this cardinal rule of information science with the latest Community Technology Preview release of SQL Server 2017. Python can now be used within SQL Server to perform analytics, run machine learning models, or handle most any kind of data-powered work.

This integration isn't limited to enterprise editions of SQL Server 2017, either—it'll also be available in the free-to-use Express edition.

The most conventional application of Python with SQL Server is to execute Python scripts as normal, with SQL Server as a data source. Microsoft has also made it possible to embed Python code directly in SQL Server databases by including the code as a T-SQL stored procedure. This allows Python code to be deployed in production along with the data it'll be processing.

These behaviors, and the RevoScalePy package, are essentially Python versions of features Microsoft built for SQL Server back when it integrated the R language into the database. Both Python and R T-SQL code can be used side by side in the same database if needed.

An existing Python installation isn't required. During the setup process, SQL Server 2017 can pull down and install its own edition of CPython 3.5, the stock Python interpreter available from the Python.org website. Users can install their own Python packages as well or use Cython to generate C code from Python modules for additional speed.

Installation also includes packages from the Anaconda distribution of Python, widely used in data science, and Microsoft's RevoScalePy package, a set of data analysis functions that can take advantage of SQL Server's in-memory and column-store index features. Third-party modules like TensorFlow can augment SQL Server's processing with GPU-accelerated functions. Database admins can also set constraints on the behavior of the Python runtime and prevent scripts from violating security or network policies.

It's not clear if Microsoft will eventually allow other Python distributions to be used in place of CPython, since some have been built specifically to enhance productivity for data scientists. For example, the Intel Distribution for Python, a reworking of the Anaconda distribution that uses the Intel Math Kernel Library to speed up common math operations like linear algebra or fast Fourier transforms on Intel processors, could provide a major pick-me-up. Barring that, existing math libraries for CPython like NumPy or Pandas ought to provide the same kinds of boosts for Python in SQL Server as they do for Python generally.

Linux users will have to wait to take advantage of this. Microsoft has previously announced SQL Server would be available for Linux, but right now, only the Windows version of SQL Server 2017 supports Python.