How to work with Parallel LINQ in C#

Take advantage of Parallel LINQ to implement declarative data parallelism in your applications by leveraging the multiple cores in your system

Parallel LINQ

Language Integrated Query, also known as LINQ, is a query execution pipeline that adds query capabilities to languages that target the managed environment of .NET. Parallel LINQ, or PLINQ, is a query execution engine that runs on top of the managed environment of .NET and uses the multiple processors or cores in your computer to execute queries in parallel. In other words, it splits a query into parts and runs those parts concurrently, which can boost query performance.

PLINQ is an extension to LINQ and was introduced as part of .NET Framework 4. It is a query execution engine from Microsoft and is part of the Parallel Extensions library, which comprises the Task Parallel Library (TPL) and PLINQ. Microsoft added this parallel programming support to the .NET Framework so that applications can benefit from multicore systems. Alongside PLINQ, .NET Framework 4 also introduced a new class called Parallel, which belongs to the TPL.

PLINQ is a good choice for compute-bound operations. But what is it all about, and what problems can it solve? Is it appropriate to use it in lieu of LINQ whenever we need to query data? We will discuss all of these in a moment, but let’s first understand how PLINQ works behind the scenes. PLINQ partitions the data source into chunks, and each chunk is then processed by a different thread.
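To make the partitioning idea concrete, here is a minimal sketch that records which managed thread handled each element of a PLINQ query. The class and method names (PartitioningDemo, SquareAndTrackThreads) are hypothetical, and the number of threads you see depends on your machine and on how PLINQ decides to split the work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class PartitioningDemo
{
    // Hypothetical helper: squares each value in parallel and counts how many
    // elements each managed thread processed. Thread IDs vary between runs.
    public static int[] SquareAndTrackThreads(int[] source, ConcurrentDictionary<int, int> threadHits)
    {
        return source
            .AsParallel()
            .Select(n =>
            {
                // Thread-safe increment of the counter for the current thread.
                threadHits.AddOrUpdate(Thread.CurrentThread.ManagedThreadId, 1, (id, count) => count + 1);
                return n * n;
            })
            .ToArray();
    }

    public static void Main()
    {
        int[] source = Enumerable.Range(1, 10000).ToArray();
        var threadHits = new ConcurrentDictionary<int, int>();

        int[] squares = SquareAndTrackThreads(source, threadHits);

        Console.WriteLine("Computed {0} squares", squares.Length);
        foreach (var pair in threadHits)
        {
            Console.WriteLine("Thread {0} processed {1} elements", pair.Key, pair.Value);
        }
    }
}
```

On a multicore machine you would typically see several thread IDs listed, each with its own share of the input. On a single-core machine only one thread may appear.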

A bit of code now

Consider the following LINQ query.

var data = from e in employees
           where e.FirstName.StartsWith("J")
           select e;

You can easily convert the above query to a PLINQ query by using the AsParallel<TSource> extension method. Note that AsParallel is an extension method defined in the System.Linq.ParallelEnumerable class.

var data = from e in employees.AsParallel()
           where e.FirstName.StartsWith("J")
           select e;
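For readers who want to run the query end to end, the following is a small self-contained sketch. The Employee type, the sample names, and the FindByFirstNamePrefix helper are assumptions made for illustration; the snippets in this article do not define them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical employee type; only FirstName is used by the query.
public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class AsParallelDemo
{
    // Hypothetical helper that wraps the PLINQ query from the article.
    public static List<Employee> FindByFirstNamePrefix(IEnumerable<Employee> employees, string prefix)
    {
        var data = from e in employees.AsParallel()
                   where e.FirstName.StartsWith(prefix)
                   select e;

        // The query is deferred; ToList() triggers the parallel execution.
        return data.ToList();
    }

    public static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { FirstName = "Joydip", LastName = "Kanjilal" },
            new Employee { FirstName = "Anita", LastName = "Rao" },
            new Employee { FirstName = "John", LastName = "Smith" }
        };

        // Without AsOrdered, the order of the matches is not guaranteed.
        foreach (Employee e in FindByFirstNamePrefix(employees, "J"))
        {
            Console.WriteLine("{0} {1}", e.FirstName, e.LastName);
        }
    }
}
```

With only three items this is purely illustrative; a collection this small would not benefit from parallel execution.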

If you want to preserve the order of the query result, you can take advantage of the AsOrdered method.

var data = from e in employees.AsParallel().AsOrdered()
           where e.FirstName.StartsWith("J")
           select e;
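The effect of AsOrdered is easiest to see with numbers. The sketch below, which uses hypothetical helper names, filters even numbers with and without AsOrdered. Only the ordered variant is guaranteed to return results in source order; the unordered variant returns the same values in an unspecified order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    // Results follow the order of the source sequence.
    public static int[] EvenNumbersOrdered(IEnumerable<int> source)
    {
        return source.AsParallel().AsOrdered().Where(n => n % 2 == 0).ToArray();
    }

    // Same values, but the order is not guaranteed.
    public static int[] EvenNumbersUnordered(IEnumerable<int> source)
    {
        return source.AsParallel().Where(n => n % 2 == 0).ToArray();
    }

    public static void Main()
    {
        IEnumerable<int> source = Enumerable.Range(1, 20);

        Console.WriteLine("Ordered:   " + string.Join(", ", EvenNumbersOrdered(source)));
        Console.WriteLine("Unordered: " + string.Join(", ", EvenNumbersUnordered(source)));
    }
}
```

Preserving order adds bookkeeping, so it is usually worth requesting only when the consumer depends on it.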

You may also come across older samples that pass an ordering option such as QueryOptions.PreserveOrdering as a parameter to the AsParallel method. That form dates from pre-release builds of the Parallel Extensions; the AsParallel overloads that shipped in .NET Framework 4 do not accept an ordering option, so use the AsOrdered method instead.

Note that using the AsParallel() method is not advisable on small collections; the overhead of partitioning the data and coordinating threads can make the query run slower than the equivalent sequential query. What if you want to force parallelism? This isn't generally recommended, but you can use the WithExecutionMode extension method to achieve it. Here's an example that illustrates this.

var data = from e in employees.AsParallel()
                              .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
           where e.FirstName.StartsWith("J")
           select e;

Note that ParallelExecutionMode is an enumeration in the System.Linq namespace with two values: Default and ForceParallelism. If you pass Default to the WithExecutionMode extension method, PLINQ examines the shape of the query and runs it in parallel only if it expects a performance improvement; otherwise it executes the query sequentially, just like a LINQ query. By contrast, if you pass ForceParallelism to the WithExecutionMode extension method, PLINQ executes the query in parallel even if that might incur a performance penalty.
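As a rough sketch of the two modes side by side, the helper below takes the execution mode as a parameter. The helper name and the sample names are hypothetical. The mode only influences how PLINQ runs the query, not what it returns, so both calls produce the same count.

```csharp
using System;
using System.Linq;

public static class ExecutionModeDemo
{
    // Hypothetical helper: counts the names that start with the given prefix,
    // running the query with the requested execution mode.
    public static int CountNamesStartingWith(string[] names, string prefix, ParallelExecutionMode mode)
    {
        return names
            .AsParallel()
            .WithExecutionMode(mode)
            .Count(name => name.StartsWith(prefix, StringComparison.Ordinal));
    }

    public static void Main()
    {
        string[] names = { "Joydip", "John", "Anita", "Jane", "Peter" };

        int byDefault = CountNamesStartingWith(names, "J", ParallelExecutionMode.Default);
        int forced = CountNamesStartingWith(names, "J", ParallelExecutionMode.ForceParallelism);

        Console.WriteLine("Default: {0}, ForceParallelism: {1}", byDefault, forced);
    }
}
```

To judge whether forcing parallelism helps for a real workload, measure both variants on realistic data sizes rather than relying on a sample this small.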

How do I limit the degree of parallelism?

You should also be aware of another related concept: degree of parallelism. This is an integer that denotes the maximum number of tasks that will execute concurrently to process a query, and therefore the maximum number of processors that your PLINQ query can take advantage of while it is in execution.

Incidentally, by default PLINQ uses all of the processors on the host computer; the figure of 64 is the upper limit documented for .NET Framework 4, not a value that you need to set. Here's how you can limit the degree of parallelism in PLINQ to two.

var data = from e in employees.AsParallel().WithDegreeOfParallelism(2)
           where e.FirstName.StartsWith("J")
           select e;

Note how the degree of parallelism has been passed as an argument to the WithDegreeOfParallelism method. Consider specifying a higher value if your query spends much of its time on non-compute-bound (that is, non-CPU-bound) work, such as waiting on I/O, because more concurrent tasks can then overlap those waits.
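One way to observe the limit is to count how many delegates run at the same moment. The sketch below is an illustration under stated assumptions: the helper name is hypothetical, Thread.Sleep stands in for non-CPU-bound work, and ForceParallelism is used only so that the query is not run sequentially. The observed maximum should never exceed the value passed to WithDegreeOfParallelism, though it may be lower.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class DegreeOfParallelismDemo
{
    // Hypothetical helper: runs itemCount simulated work items and returns the
    // highest number of items that were observed executing at the same time.
    public static int MaxObservedConcurrency(int degree, int itemCount)
    {
        int current = 0; // items executing right now
        int max = 0;     // highest value of 'current' seen so far

        Enumerable.Range(0, itemCount)
            .AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithDegreeOfParallelism(degree)
            .ForAll(i =>
            {
                int now = Interlocked.Increment(ref current);

                // Lock-free update of the running maximum.
                int seen;
                do
                {
                    seen = max;
                }
                while (now > seen && Interlocked.CompareExchange(ref max, now, seen) != seen);

                Thread.Sleep(1); // simulate non-CPU-bound work
                Interlocked.Decrement(ref current);
            });

        return max;
    }

    public static void Main()
    {
        int observed = MaxObservedConcurrency(2, 40);
        Console.WriteLine("Requested limit: 2, observed maximum: {0}", observed);
    }
}
```

Note that WithDegreeOfParallelism throws an ArgumentOutOfRangeException if the value is less than 1.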

I highly recommend reading the document "Patterns of Parallel Programming" by Stephen Toub. It provides an in-depth discussion of parallel programming patterns in .NET.

This article is published as part of the IDG Contributor Network.