Review: SQL Server 2016 boosts speed, analytics

SQL Server 2016 shines with stretch database to Azure, queries against Hadoop, internal R, better security, and higher performance

Review: SQL Server 2016 boosts speed, analytics
Thinkstock

Microsoft calls SQL Server 2016 the “biggest leap forward” in the 27-year evolution of the SQL Server database. As we’ll see, despite the excess of hype, the SQL Server 2016 database offers enterprises a number of attractive new capabilities, including built-in R analytics, querying of external Hadoop and Azure data stores, and neat management and data security features.

Further, SQL Server 2016 improves the in-memory OLTP engine, in-memory columnstore, and Azure cloud integrations introduced in SQL Server 2014. It also offers a reasonable staged upgrade for existing SQL Server installations. Whether all of these improvements -- and Microsoft’s offer of free database licenses to current Oracle Database users -- are sufficient to tempt these users to migrate to SQL Server is another question entirely.

Before I go over the new and improved features in more detail, I should mention there are a number of good ways to get your hands dirty with SQL Server 2016: Download and install a free Developer or Evaluation copy, or spin up a preconfigured Azure VM, and try out the various TechNet Virtual Labs.

Installation and upgrades

Installation of SQL Server 2016 locally requires some planning, along with a suitably updated copy of Windows 8, Windows 10, or Windows Server 2012. Even a completely current copy of Windows 10 needs to have a Visual Studio runtime update installed before SQL Server 2016, and if you want to use PolyBase (to run queries on data residing in Hadoop or Azure Blob storage) you must also install Oracle Java 7 or later. While a Linux version of SQL Server 2016 has been announced and both Red Hat and Canonical have pledged support, the Linux version is still in private preview.

sql server 2016 setup

SQL Server 2016 setup is very similar to the setup for previous versions. It has a few extra configuration steps and options -- for example, to set up Microsoft R and the PolyBase Query Service, which are new features.

The primary customers for SQL Server 2016 will be enterprises already running SQL Server 2014 who want to upgrade their production databases to take advantage of new features and improved performance, security, scalability, and so on. If you are one of these customers, you can install SQL Server 2016 in stages, presumably on a test or development database initially. Microsoft’s recommendation is to start with the database compatibility level unchanged, then raise it to 130 and test your production queries. If you need to fix regressions, you can lower the compatibility level temporarily while you do so; when you can run at compatibility level 130 without regressions, you should see improved query optimization, parallelized query plans, more aggressive automatic statistics updates, and other performance boosts.

Performance improvements

Microsoft makes big claims for SQL Server 2016 performance improvements. At a high level, it claims two high scores for TPC-E (OLTP) price/performance with 44 cores, one with a Fujitsu server and one with a Lenovo server. In addition, it claims high scores for TPC-H (analytics) in both performance and price/performance with 30TB and 1TB database sizes.

While the standardized TPC benchmarks are an improvement over the “benchmarketing” that was widespread in the database industry prior to their introduction, they are still subject to gaming by the vendors. TPC-E is quirky in that it specifies “pseudo-SQL” rather than specific SQL statements, and striking in that the only listed results are from various versions of Microsoft SQL Server.

TPC-H shows a larger variety of vendors; what I don’t understand from looking at the actual TPC-H listings is how Microsoft can claim victory in the 30TB and 1TB categories when SQL Server 2016 has been totally smoked by the clustered EXACluster OS 5.0 in both cases. Perhaps Microsoft meant to restrict its claims to nonclustered results but forgot to say so.

So color me unimpressed by the quoted TPC results. Microsoft makes additional speed-up claims for specific cases for SQL Server 2016 versus SQL Server 2014; some of these are very specialized, such as “19x faster Spatial native functions,” and others are more general, such as “Tableau 190 percent faster queries.” Whether SQL Server 2016 can outperform Amazon Aurora for Tableau queries is not a question that Microsoft is currently volunteering to answer.

The important performance questions for SQL Server 2014 customers to explore are whether their own database loads run faster or scale better in SQL Server 2016. Microsoft’s measurements and benchmarks may be suggestive, but they are ultimately less relevant than how your own database performs.

Another area of potential performance and scalability improvements has to do with the increased memory (12TB) and number of cores available to SQL Server 2016 running on Windows Server 2016. This will certainly matter for in-memory OLTP and in-memory operational analytics, but will require a significant expense for bigger servers with more RAM, more cores, and more SQL Server 2016 license packs.

Beyond the increase in maximum RAM, in-memory OLTP has been improved since its introduction in SQL Server 2014. According to Microsoft it boasts a rewritten engine, nonblocking optimistic concurrency control, and compilation of T-SQL to native code. Microsoft claims up to a 30-fold performance improvement over on-disk OLTP.

sql server 2016 in memory oltp

In-memory OLTP, introduced in SQL Server 2014, has been beefed up with a new engine, and compilation of T-SQL to native code.

Temporal tables and query stores

Most databases have two data stores: the database and the transaction log. Rolling back a database to some point in history using the transaction log is a patience-trying exercise.

SQL Server 2016 introduces in-database temporal history. This automatically tracks the history of data changes and enables easy querying of historical data states. Basically, this implements an ANSI 2011 facility by adding a history table for old data, a time period to the data table, and FOR SYSTEM_TIME clauses to the SELECT statement syntax.

Query stores keep track of query texts (plus all relevant properties), store all plan choices and performance metrics, and work across restarts, upgrades, and recompiles. A query store allows you to troubleshoot query performance easily, to compare query plans, and to force plan choices based on past performance.

Security improvements

Data security is more important than ever. I don’t even have to mention the HIPAA and PCI-DSS regulations and their penalties, given the serious, well-publicized breaches at the U.S. Office of Personnel Management and the “cheating” site Ashley Madison.

One of the big security improvements in SQL Server 2016 is called Always Encrypted. It’s quite clever: Sensitive data is encrypted in the database at the column level, and the unencrypted data is never seen outside the trust boundary.

sql server 2106 encryption

The diagram outlines one way that “Always Encrypted” data is safeguarded by SQL Server 2016. Note that the personally identifiable SSN is never sent outside the trust boundary.

SQL Server 2016 has two implementations of encryption: randomized and deterministic. The randomized form is more secure, but the deterministic form allows for equality comparison in WHERE, DISTINCT, and GROUP BY clauses and JOINs. Both randomized and deterministic encryption allow for transparent retrieval.

Another important security feature of SQL Server 2016 is dynamic data masking. This protects sensitive columns at query time by obfuscating them to nonprivileged users. Assume that the SocialSecurityNumber field of the Employee table has been masked:

ALTER TABLE [Employee] ALTER COLUMN [SocialSecurityNumber]
ADD MASKED WITH (FUNCTION = ‘SSN()’)

Then any user who has not been granted UNMASK privilege will see only partial Social Security numbers returned from a query: 123-45-6789 will be returned as XXX-XX-XX89.

Dynamic data masking applies to columns. There is an additional need for row-level security (RLS) in several scenarios, such as restriction of the data for a given patient to medical personnel treating that patient (for example, only nurses working on the patient’s floor and wing), or restriction of visible rows in a multitenant database to the records belonging to the current tenant. RLS is implemented using filter predicates that call inline table-valued functions. Inline functions are optimized so that they have performance comparable to that of views.

Operational analytics

Traditionally, the OLTP database server and the analytics database server were separate, and an ETL process periodically extracted new data from the OLTP server to update the OLAP cubes on the analytics server. That was expensive; it also introduced data latency and had too many moving parts for complete reliability.

The new real-time operational analytics mode in SQL Server 2016 uses a live nonclustered columnstore index that is updated directly from the OLTP database. Recent “tail” rows go into an in-memory OLTP table so that they can be queried without any columnstore index overhead. A background process task migrates rows from the tail to the columnstore in chunks of 1 million rows. Microsoft suggests combining columnstore analysis with availability groups for the best possible analytics performance and scaling.

Built-in R services

The R programming language is one of the favorites of data scientists, along with Python. A major issue encountered when running R programs such as regressions against large data sets has been the time required to bring the data to the R server or workstation. Typically, data scientists bypassed this issue by pulling a small random sample of the data into the R server for development and testing purposes, and they only ran the R scripts against the full data as a final step.

In SQL Server 2016, high-performance parallelized R services, built right into the database, can interact with T-SQL constructs, essentially bringing the analysis to the data instead of bringing the data to the analysis. This is not only much faster, but it’s also much more scalable and much less RAM-intensive. It’s fast enough that data scientists can develop and test their R scripts in the database. Microsoft also offers a standalone R server.

Hybrid cloud and big data

Microsoft likes to talk about “hyperscale” features having to do with combining SQL Server 2016 with the Azure cloud. The sexiest of those features is the stretch database. Basically, you create a link between your on-premises database instance and an Azure SQL Database by turning on the “remote data archive” feature in SQL Server 2016, providing your Azure admin credentials to SQL Server, and altering the databases you want to stretch to name the remote data archive.

Then, to control the migration, you optionally define a predicate function or functions to define which part of the data will be stretched (for example, completed orders in an order table). Finally, you turn on the remote data archive migration state for tables you want to stretch.

Other hyperscale features include high availability, database replicas, and enhanced backup to Azure. You can also migrate an on-premises SQL Server instance to SQL Azure completely automatically.

PolyBase is Microsoft’s term for a unified view of relational data in SQL Server 2016 and semi-structured data in HDFS (Hadoop) or Azure Blob storage. PolyBase clusters scale out with one head node and multiple compute nodes in roughly the same way that Hadoop clusters scale out with one name node and multiple data nodes. PolyBase currently supports Cloudera, Hortonworks Data Platform, and HDInsight.

If you need huge scale, you can configure an Azure SQL Data Warehouse or Azure Data Lake in the cloud or install a Microsoft Analytics Platform System appliance on-premises. In fact, you can tie all of that into a hybrid solution. Microsoft claims the system will scale to multiple petabytes of data, all using a common set of development and management tools.

Does your company need any of the new features of SQL Server 2016? If you’re running an older version of SQL Server, it’s fairly easy to test for compatibility and tangible benefits by setting up a test instance, either locally or in the Azure cloud, loading it up with your databases, and testing your applications against it. With that under your belt, you’ll either know not to bother or know there’s a good reason to undertake the upgrade.

Meanwhile, if you’re running Oracle Database, Microsoft is willing to give you the SQL Server 2016 licenses you’ll need to migrate and give you access to the company's migration tools. The last time I looked at that process, Microsoft had about 90 percent of the Oracle-to-SQL-Server migration automated, with good documentation for the manual steps. It was still a daunting task, however, especially if your application made heavy use of Oracle-specific features. Nevertheless, if Oracle-to-SQL-Server migration is already part of your road map, then this might be a good time to undertake the transition.

InfoWorld Scorecard
Management (25%)
Performance (25%)
Availability (20%)
Scalability (20%)
Value (10%)
Overall Score (100%)
Microsoft SQL Server 2016 8 9 9 9 9 8.8

Copyright © 2016 IDG Communications, Inc.

How to choose a low-code development platform