Sunday, 1 May 2011

Parallel Programming using Microsoft .NET Framework 4.0

Wikipedia defines Parallel Computing as a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). In the past concurrency was virtually achieved by time slicing the processor, i.e., OS would rapidly switch between running programs after a fixed interval called time slice. That would enable the OS to execute multiple programs simultaneously. These days most computers have more than one cores/CPUs that enable multiple threads to execute simultaneously. Using these cores you can parallelize your code to distribute work across multiple processors.

Microsoft has introduced a new set of libraries, diagnostic tools and  runtime in .NET Framework 4.0 to enhance support for parallel computing. The main objective of these features is to simplify parallel development, i.e., writing parallel code in a natural idiom without having to work directly with threads. These include Task Parallel Library (TPL), Parallel LINQ, and new data structures.

Task Parallel Library

Task Parallel Library (TPL) is a set of types and APIs that simplifies adding parallelism and concurrency to the applications. It handles the partitioning of the work, scheduling of the threads on the ThreadPool, cancellation support, state management and other low level details. TPL introduces the concept of Data Parallelism, scenarios in which the same operation is performed concurrently on elements in a source collection or array. Parallel.For and Parallel.ForEach methods in the System.Threading.Tasks namespace are used for this purpose. For example, the following statement is used to concurrently process the items in a source collection;
Parallel.ForEach(sourceCollection, item => Process(item));
Both these methods have several overloads to let you stop or break loop execution, monitor the state of the loop on other threads, maintain thread-local state, finalize thread-local objects, control the degree of concurrency, and so on.

TPL provides other methods and data types to implicitly or explicitly executes tasks concurrently. The Parallel.Invoke method is used to execute any number of arbitrary statements concurrently. The method accepts variable no. of Action delegates as argument and executes these concurrently. The easiest way to create the Action delegates is to use lambda expressions. For example;
Parallel.Invoke(() => DoSomeWork(), () => DoSomeOtherWork());
If a greater control over task execution is required, or you need to return a value from the task TPL includes System.Threading.Tasks.Task and System.Threading.Tasks.Task<TResult> classes this purpose. The Task object handles the infrastructure details and provides methods/properties for controlling its execution and observing its status. For example, the Status property of a Task determines whether a task has started running, ran to completion, was cancelled, or has thrown an exception. The Task object accepts a delegate (named, anonymous or a lambda expression) as argument at initialization time. Calling the method Start on the Task object starts execution of the provided delegate.
// Create a task and provide a user delegate.
var task = new Task(() => Console.WriteLine("Hello from task."));

// Start the task.
taskA.Start();
The Task object contains a static property Factory that returns an object of the TaskFactory class that provides support for creating and scheduling Task objects.
// Create and start the task in one operation.
var taskA = Task.Factory.StartNew(() => Console.WriteLine("Hello from taskA."));
The Task class provides many other options to control the execution of operations assigned to the task. The Task.ContinueWith method let you specify a task to be started when the antecedent task completes. The user code executed using a Task can create nested detached and child Tasks. The child tasks are created when the TaskCreationOptions.AttachedToParent is specified while creating the Task. In case of child tasks, the parent task implicitly waits for all child tasks to complete.The Task class provides support to explicitly wait for a single or an array of tasks. The Task.Wait method let you wait for a task to complete. The Task.WaitAny and Task.WaitAll methods let you wait for any or all tasks in an array to complete. When a Task throws one or more exceptions, these are wrapped in an AggregateException and is propagated back to the joining thread. Also the Task class supports cooperative cancellation. The Task class takes a cancellation token as argument at initialization and user can issue cancellation request at some later time.

Parallel LINQ

Parallel LINQ is the parallel implementation of LINQ to objects. PLINQ implements the full set of LINQ standard query operators as extension methods and defines additional operators for parallel operations. PLINQ queries operate on any in-memory IEnumerable/IEnumerable<T> data source and have deferred execution. It partitions the data source into segments, and then executes the query on each segment on separate worker threads in parallel. The System.Linq.ParallelEnumerable exposes PLINQ's functionality and implements all the standard LINQ operators in addition with operators specific to parallel execution.

A query is executed in parallel when user calls the AsParallel extension method. All the subsequent operations  are bound to the ParallelEnumerable implementation.
var source = Enumerable.Range(1, 10000);

// Opt-in to PLINQ with AsParallel
var evenNums = from num in source.AsParallel()
               where Compute(num) > 0
               select num;
PLINQ infrastructure analyzes the overall structure of a query at runtime and executes the query in parallel or sequentially based on the analysis. You can use the WithExecutionMode<TSource> and ParallelExecutionMode enumeration to enforce the PLINQ to use the parallel algorithm. Although PLINQ query operators revert to sequential mode automatically when required, for user-defined query operators the AsSequential operator can be used to revert back to sequential mode

In order to preserve the order of the source sequence in the results PLINQ provides the AsOrdered extension method for this purpose. The sequence is still processed in parallel but the results are buffered to maintain the order. This extra maintenance causes an ordered query to be slow compared to the default AsUnordered<TSource> sequence.
evenNums = from num in numbers.AsParallel().AsOrdered()
           where num % 2 == 0
           select num;
PLINQ queries use deferred execution and the operations are not executed until query is enumerated using a foreach. However, foreach itself does not run in parallel and requires the output from all parallel tasks be merged back into the thread on which the loop is running. For faster query execution when order preservation is not required, PLINQ provides the ForAll operator to parallelize processing of the results from query.
var nums = Enumerable.Range(10, 10000);

var query = from num in nums.AsParallel()
            where num % 10 == 0
            select num;

// Process the results as each thread completes
// and add them to a System.Collections.Concurrent.ConcurrentBag(Of Int)
// which can safely accept concurrent add operations
query.ForAll((e) => concurrentBag.Add(Compute(e)));
PLINQ queries also support cancellation as is supported by Task. The WithCancellation operator accepts the cancellation token instance as argument. When the IsCancellationRequested property is set on the token, PLINQ will stop processing on all threads and throw an OperationCancelledException.

PLINQ wraps the exceptions thrown by multiple threads while executing a query into an AggregateException type and marshals the exception back to the calling thread. Only one try-catch block is required on the calling thread.

Data Structures for Parallel Programming

The .NET Framework 4.0 introduces several new types that are useful in parallel programming including a set of concurrent collection classes, lightweight synchronization primitives, and types for lazy initialization.

The collection types in the System.Collections.Concurrent namespace provide thread-safe add and remove operations that avoid locks wherever possible and use fine-grained locking where locks are necessary. These include BlockingCollection<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey, TValue>, ConcurrentQueue<T>, and ConcurrentStack<T>. Each of these collection types as compared to the types in System.Collections.Generic namespace provides the thread-safety while performing related operations,e.g., multiple threads can add/remove items from a ConcurrentBag.

In addition with concurrent collections, Microsoft has introduced a new set of fine-grained and performance efficient synchronization primitives in the .NET Framework 4.0. Some of the new types have no counterparts in earlier versions of .NET Framework. These types are defined in System.Threading namespace and include Barrier, CountdownEvent, ManualResetEventSlim, SemaphoreSlim, SpinLock and SpinWait.

The .NET Framework 4.0 also includes classes for initializing objects lazily, i.e., the memory for an object is not allocated until it is needed. Lazy initialization can improve performance by spreading object allocations evenly across the lifetime of a program. The System.Lazy<T>, System.Threading.ThreadLocal<T> and System.Threading.LazyInitializer are the classes used for this purpose.

Conclusion

The Task Parallel Library and Parallel LINQ uses System.Threading.ThreadPool for executing the operations concurrently. The thread pool allocates a pool of threads at the start of an application and manages the threads in a very efficient and intelligent manner. Still it has its own limitations which can affect the choice between a dedicated thread instead of a thread from the thread pool. When a foreground thread is required in an application, all the threads allocated in the pool are marked as background threads. Foreground threads have a priority over background threads and a few CPU intensive foreground threads can starve the background threads for their share of the processor which might result in unexpected performance degradation. Therefore one should consider the overall structure and working of the application while making use of TPL and PLINQ.

1 comment: