
What’s new in Beta 1 for the Task Parallel Library? (Part 3/3)

So what else is new in TPL for Beta 1 (finally)?  In the last post, we mentioned that TaskFactory offers more static helper methods than just StartNew.  In this post, we’ll cover those methods (FromAsync, ContinueWhenAll, and ContinueWhenAny) as well as the new TaskScheduler class.

FromAsync

To better integrate the Asynchronous Programming Model (APM) with TPL, we added FromAsync to TaskFactory.  Here’s an example for reading asynchronously from a FileStream:

FileStream fs = ...;
byte[] buffer = ...;
Task<int> t = Task<int>.Factory.FromAsync(
    fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);

The overloads provided for FromAsync support all implementations of the APM pattern up to a certain number of parameters (which represent the vast majority of the APM implementations in the .NET Framework); additional overloads that work directly with IAsyncResults are also available to cover the stragglers.

FromAsync returns a Task that represents the asynchronous operation, and once you have that Task, you have all of the functionality available to any other Task: you can wait on it, schedule continuations off of it, and so on.
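For example, here’s a minimal sketch (building on the t from above) of hanging a continuation off of the returned task:

// A sketch: process the result once the asynchronous read completes.
t.ContinueWith(completed =>
{
    // completed.Result is the number of bytes that EndRead returned.
    Console.WriteLine("Read {0} bytes", completed.Result);
});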

ContinueWhenAll and ContinueWhenAny

The ContinueWith method on Task enables a slew of powerful patterns and is a fundamental building block in many higher-level implementations.  However, while being able to run a task when another completes is useful, it’s also useful to be able to run a task when any or all of a set of tasks completes.  For these purposes, we added ContinueWhenAll and ContinueWhenAny to TaskFactory:

var tasks = new Queue<Task>();
for (int i = 0; i < N; i++)
{
    tasks.Enqueue(Task.Factory.StartNew(() =>
    {
        ...
    }));
}

// Schedule a task to execute when all antecedent tasks complete.
var continuationOnAll = Task.Factory.ContinueWhenAll(
    tasks.ToArray(), (Task[] completedTasks) =>
    {
        ...
    });

// Schedule a task to execute when one (any) antecedent task completes.
var continuationOnAny = Task.Factory.ContinueWhenAny(
    tasks.ToArray(), (Task completedTask) =>
    {
        ...
    });

TaskScheduler

In previous releases, you may have seen a TaskManager class.  TaskManager is no more – replaced by the new TaskScheduler class.  By default, tasks get scheduled on TaskScheduler.Default, which is an implementation based on new work-stealing queues in the .NET 4 ThreadPool (as mentioned in the first post).  However, TaskScheduler is an abstract class, so developers can derive from it and create their own custom schedulers!  The default scheduler should be sufficient in most cases, but a custom scheduler might be needed for some special scenarios (strict FIFO scheduling, special priorities, etc.).
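As a rough illustration of that extensibility point, here’s a sketch (not a production-quality scheduler; SingleThreadScheduler is our own hypothetical name) of a custom scheduler that runs all tasks on one dedicated thread in strict FIFO order:

// A minimal sketch of a TaskScheduler that executes all queued tasks
// on a single dedicated thread, in FIFO order.
// (Uses System.Collections.Concurrent, System.Collections.Generic,
// System.Threading, and System.Threading.Tasks.)
public class SingleThreadScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public SingleThreadScheduler()
    {
        var thread = new Thread(() =>
        {
            // Execute queued tasks one at a time, in order.
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);
        });
        thread.IsBackground = true;
        thread.Start();
    }

    protected override void QueueTask(Task task)
    {
        _tasks.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false; // never inline, to preserve FIFO ordering
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _tasks.ToArray();
    }
}

Tasks can then be targeted at an instance of such a scheduler through the StartNew and ContinueWith overloads that accept a TaskScheduler.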

One of those special scenarios, scheduling onto the UI thread, is supported out-of-the-box.  The TaskScheduler.FromCurrentSynchronizationContext method returns a scheduler that wraps the current synchronization context.  Thus, if it is called on the GUI thread of a Windows Forms application, you’ll get back a TaskScheduler that marshals any queued tasks to the GUI thread.  Here’s an example:

public void Button1_Click(…)
{
    var ui = TaskScheduler.FromCurrentSynchronizationContext();

    Task.Factory.StartNew(() =>
    {
        return LoadAndProcessImage(); // compute the image
    }).ContinueWith(t =>
    {
        pictureBox1.Image = t.Result; // display it
    }, ui);
}

In this set of posts, we talked about most of the major updates to TPL for Beta 1, including changes under the covers, renames of some of our core types, redesigns of some of our core functionality, and the addition of brand new types.  Some topics, such as debugger and profiler integration, were left out.  Stay tuned for posts on those, and thanks for reading!

What’s new in Beta 1 for the Task Parallel Library? (Part 2/3)

Last week we talked about changes under the covers, redesigns in System.Threading.Parallel, and using CancellationTokens.  So what else is new in TPL for Beta 1?  In this post, we’ll cover the new TaskFactory class, the plight of Future<T> (Task<TResult>), and TaskCompletionSource<TResult>.

TaskFactory

There are two ways to create and schedule Tasks: use constructors and instance methods, or use static methods like StartNew.  In previous releases, these two functionalities were jammed into the Task class, and we realized that separating them would result in a cleaner design.  So we introduced the TaskFactory class, which contains all static methods that create and/or schedule Tasks.  In Beta 1, these static methods include more than just StartNew, but those will be covered in the next post!

For convenience, Task now contains a static Factory property which returns a default instance of TaskFactory.  So for many users, the biggest practical difference is that where you would have used Task.StartNew (or Task.Create in the June 2008 CTP and earlier), you now use:

Task.Factory.StartNew(() =>
{
    ...
});

An additional advantage of TaskFactory is that it consolidates the specification of options such as creation options, continuation options, and which scheduler to use:

TaskFactory myFactory = new TaskFactory(
    myScheduler, myCreationOptions, myContinuationOptions);

Task t0 = myFactory.StartNew(() => { });
Task t1 = myFactory.StartNew(() => { });
Task t2 = myFactory.StartNew(() => { });

Task<TResult>

In previous releases, Tasks that produced results were called Futures.  Moving forward, we’ve opted to rename Future<T> to Task<TResult>, a type that derives from Task.  There were a number of reasons, including:

·         We constantly found ourselves describing Futures as “tasks that return results”.

·         Many folks we spoke with found the name Task<TResult> clearer.

·         There is some discrepancy in the literature and in concurrency circles about exactly what’s implied by the term “future” (What functionality should or should not be exposed?  Must it be side-effect free?).

·         The name Future implies no relation to Task; however, we wanted Task<TResult> to derive from Task so that the former could be treated polymorphically as the latter.

·         We wanted TaskFactory methods to be able to return “tasks that return results”.  For example, it would have been odd for Task.Factory.StartNew to return a Future.
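In practice, the rename reads naturally in code.  A quick sketch:

// A sketch: Task<TResult> is just a Task whose completion produces a value.
Task<int> sum = Task.Factory.StartNew(() =>
{
    int total = 0;
    for (int i = 0; i < 100; i++) total += i;
    return total;
});

Task plain = sum;              // treat it polymorphically as a plain Task
Console.WriteLine(sum.Result); // blocks until the task completes; prints 4950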

TaskCompletionSource<TResult>

A Task<TResult> is typically used to asynchronously execute a delegate that computes and returns a result.  Sometimes however, the asynchronous operation cannot be represented by a delegate, but rather is performed by an external entity.

Such functionality was originally supported on Future<T>, via a special constructor that did not take a delegate.  A Future created this way could have its Value or Exception properties set using the respective setters.  There were several problems with this approach, for example:

·         The setters were only usable if the Future<T> was created with the special constructor.

·         Commonly, a producer would want to hand out a Future<T> to consumers, but not allow the consumers to resolve it.  However, in this scheme, anyone who had a reference to the Future<T> could resolve it.

We’ve addressed these issues by introducing a new TaskCompletionSource<TResult> type: 

TaskCompletionSource<int> tcs = new TaskCompletionSource<int>();
Task<int> task = tcs.Task;

// Sometime later...
tcs.SetResult(computedResult);  // Or tcs.SetException(exc), or tcs.SetCanceled()

// In a consumer elsewhere...
try { Console.WriteLine(task.Result); }
catch (AggregateException ae) { }

In this post, we covered the most fundamental redesigns to TPL tasks.  “What’s new in Beta 1 for TPL (Part 3/3)” will cover all remaining changes, such as other helper methods on TaskFactory and the new TaskScheduler.  Look forward to it!

Do you use LazyInitMode.AllowMultipleExecution?

In an effort to release simple, streamlined APIs, we spend a lot of time poring over every aspect of our types.

One of the types that we know is getting used a lot both internally and externally is LazyInit<T>. 

One of LazyInit<T>’s constructors takes a LazyInitMode enum that allows you to initialize a value in one of three modes (a brief sketch follows the list):

  • EnsureSingleExecution – which ensures that if multiple threads attempt to initialize a LazyInit<T> concurrently, that only one of the initializer delegates will execute
  • AllowMultipleExecution – which allows multiple threads to execute the initializer delegate and race to set the value of the LazyInit<T>
  • ThreadLocal – which allows multiple threads to execute the initializer delegate and stores a local copy of the value for each thread
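Here’s a minimal sketch against the CTP-era API (the exact constructor signature may differ between releases, and Widget is a hypothetical type):

// A sketch: the same initializer under two different modes.
LazyInit<Widget> single = new LazyInit<Widget>(
    () => new Widget(), LazyInitMode.EnsureSingleExecution);

LazyInit<Widget> racy = new LazyInit<Widget>(
    () => new Widget(), LazyInitMode.AllowMultipleExecution);

// With AllowMultipleExecution, several threads may each run the
// delegate, but every thread observes the same winning value.
Widget w = racy.Value;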

AllowMultipleExecution is motivated primarily by performance.  If we allow the threads to race, we don’t need to take a lock and will never need to block any of the threads attempting to initialize the LazyInit<T>.  Additionally, it’s theoretically useful if you have an operation in your initializer delegate that you don’t want to occur while under a lock. 

The former motivation is validated by a quick and dirty perf test: AllowMultipleExecution typically performs 1-2x faster than EnsureSingleExecution for sufficiently small initializer delegates (longer running delegates typically see no improvement and also result in more wasted CPU time as the work that some of the threads produced will be discarded).  While 2x is great, to see significant perf gains, you’d essentially need a lot (thousands?) of LazyInit<T> instances that could all potentially be initialized by multiple threads.  Remember that this only affects the first time a LazyInit<T> is initialized, so calling Value many times would not affect performance. 

While the latter motivation is important, we have few concrete scenarios. 

On top of all this, the scenarios that might benefit from this mode are heavily limited.  Only under a specific set of circumstances would you choose to use AllowMultipleExecution: 

  • You’re sharing a LazyInit<T> instance between threads.
  • You are sure you won’t throw an exception in your initializer delegate (the exception semantics for this are very strange, e.g. if one thread fails and one succeeds which wins the race?).
  • Your initializer delegate doesn’t rely on some thread-local state that can result in different generated values.
  • The speed of your initializer delegate is just slow enough that taking a lock might block another thread for too long.
  • The speed of your initializer delegate is just fast enough that it won’t result in multiple CPUs wasting tons of cycles.

Given all the usage restrictions and the limited scenarios that would see a performance improvement, we’re unsure whether this mode is useful.  Before we make any decisions on whether to keep it or remove it, we thought it best to reach out to you first and ask:  are you using AllowMultipleExecution?  If not, would you?

What’s new in Beta 1 for the Task Parallel Library? (Part 1/3)

The Task Parallel Library (TPL) has come a long way since its inception.  Over the course of several CTPs, it has evolved to be an important and central new component in the .NET Framework.  Most recently, TPL was released as part of mscorlib.dll in the Visual Studio 2010 and .NET Framework 4.0 CTP around the October 2008 timeframe.   However, due to the nature of large software projects, that release was actually based on code from back in July!  Needless to say, since last summer, we’ve invested a lot of effort into making sure that the right functionality is exposed through the right set of APIs.  And as a result, TPL has changed considerably.  In this set of posts, we’ll walk through some of the changes so you’ll be ready for the next preview release of .NET 4.0 (no guarantees at this time regarding when that will be).  Of course, as with any software project, TPL may change even more between now and when it’s released, so we’re very interested in any feedback that you may have!

In this first post, we’ll talk about some changes under the covers, some redesigns in System.Threading.Parallel, and some new cancellation features (Tasks and Tokens).

Under the Covers

TPL now uses the .NET ThreadPool as its default scheduler.  As part of this effort, the ThreadPool has undergone a number of significant functional improvements:

·         Work-stealing queues were introduced internally to be used by TPL (see Daniel Moth's post on the new CLR 4 ThreadPool engine)

·         Hill-climbing algorithms were introduced to quickly determine and adjust to the optimal number of threads for the current workload.

·         Coordination and synchronization types such as SpinWait and SpinLock are now used internally.

Also, the whole of Parallel Extensions just emerged from a performance push.  In TPL, this included work to decrease the overheads of loops and the launching of Tasks.  We’re by no means done where performance is concerned, but you should notice improved performance for a variety of scenarios.

System.Threading.Parallel

The TPL feature crew spent many hours in design meetings, and this has resulted in quite a few changes for our Parallel APIs.  Here are the most significant ones.

ParallelOptions

A common request/question we’ve gotten based on previous CTPs is the ability to limit the concurrency level of parallel loops.  Folks would create a new TaskManager (specifying the number of processors and/or the number of threads per processor) just to achieve this scenario, and many were still unsuccessful.  We now provide a better, more intuitive solution.

The new ParallelOptions class contains properties relevant to the APIs in System.Threading.Parallel.  One of these properties is MaxDegreeOfParallelism, which does what it sounds like it does.  The default value, -1, causes a Parallel API to attempt to use all available cores, but this can be overridden.  For example, the following loop will run on no more than two cores regardless of how many exist in the machine:

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.For(0, 1000, options, i =>
{
    ...
});

By consolidating options into the ParallelOptions class, we were able to eliminate quite a few existing overloads and prevent exploding the number of overloads when adding new options.  Some new properties include a CancellationToken that will be monitored to determine whether a Parallel call should exit early, and a TaskScheduler that can be used to specify the scheduler on which to execute.  Both of these options are explored more in later sections.

Thread-local State

In previous releases, we supported thread-local state via a ParallelState<TLocal> class.  For example, to get the sum of 0-99:

int sum = 0;
Parallel.For(0, 100,
    // Initialize all thread-local states to 0.
    () => { return 0; },

    // Accumulate the iteration count.
    (int i, ParallelState<int> loopState) =>
    {
        loopState.ThreadLocalState += i;
    },

    // Accumulate all the final thread-local states.
    (int finalThreadLocalState) =>
    {
        Interlocked.Add(ref sum, finalThreadLocalState);
    });

In the above, loopState (a ParallelState<int> instance) exposes the thread-local state through its ThreadLocalState property.  However, loopState would also be used to prematurely break out of the loop (using the Break or Stop methods).  For a cleaner design, we decided to separate these two functionalities by:

·         Renaming ParallelState to ParallelLoopState (used to break out of loops prematurely, check if a loop has been stopped by another iteration, etc.)

·         Removing ParallelState<TLocal> and baking thread-local state into the signatures of Parallel.For and ForEach overloads

To achieve the above scenario now, the body delegate would be:

    // Accumulate the iteration count.
    (int i, ParallelLoopState loopState, int threadLocalState) =>
    {
        return threadLocalState + i;
    },

Note that the body delegate is now a Func that returns a TLocal – in this case, an int.  Each iteration of the body is passed the current thread-local state and must return the possibly-updated state.
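Putting it together, the complete sum example against the new overloads might look like this (a sketch; the overload shapes could still shift before release):

int sum = 0;
Parallel.For(0, 100,
    // Initialize each thread-local state to 0.
    () => 0,

    // The body returns the updated thread-local state.
    (i, loopState, threadLocalState) => threadLocalState + i,

    // Fold each final thread-local state into the shared sum.
    finalThreadLocalState => Interlocked.Add(ref sum, finalThreadLocalState));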

Tasks and Tokens

In the previous section, we saw that a CancellationToken may be used to cancel a Parallel call.  This token structure is actually part of a new unified cancellation model that is intended for eventual use throughout the .NET Framework.  As of Beta 1, it is supported by a few TPL APIs such as Wait:

Task t = ...
CancellationTokenSource tokenSource = new CancellationTokenSource();
CancellationToken token = tokenSource.Token;

try { t.Wait(token); }
catch (OperationCanceledException oce) { }

// Elsewhere...
tokenSource.Cancel();

The new cancellation model centers around two types: CancellationTokenSource and CancellationToken.  A CancellationTokenSource is used to cancel its member CancellationToken (accessible via the Token property).  A CancellationToken can only be used to check whether cancellation has been requested; separating the ability to cancel and the ability to check for cancellation requests is a key point in the new model.  In the above example, the Wait operation is passed a CancellationToken, and the associated CancellationTokenSource is used to cancel the operation; note that it is the Wait operation that gets canceled, not the Task.

As mentioned in the “What’s new in CDS” post, cancellation merits a dedicated post, so look for that one soon.

Check back soon for “What’s new in Beta 1 for TPL (Part 2/3)”!

What’s new in Beta 1 for the coordination data structures?

We're currently working on the Beta of .NET 4.0 (no dates to announce), and there’s lots o’ new stuff in the Parallel Extensions.  We hope you’re as excited about it as we are.  Given that we have so much coming down the pipes, we decided to roll out posts on what’s coming in digestible chunks.  What better to start with than the low-level infrastructure?  In addition to the coordination data structures, you can expect posts on changes in the Task Parallel Library (TPL) as well as Parallel LINQ (PLINQ) and a few miscellaneous topics.

New Types

Since our last CTP in September ‘08, we’ve added a few new types to add to your parallel-programming arsenal.

ConcurrentBag<T>
In short, ConcurrentBag<T> is a thread-safe, unordered collection of objects.  Sounds a bit like a set, but unlike a set, ConcurrentBag<T> allows duplicates.  So now you’re thinking, what’s so interesting about an unordered collection that allows duplicates?  Truth be told, in a single-threaded world, not very much.  In a multi-threaded world, however, removing ordering restrictions allows us to do some pretty cool optimizations that make for a really scalable and efficient type in certain situations.

You can think of ConcurrentBag<T> as a thread-safe linked list of thread-safe double-ended queues (deques).  Each thread that touches the bag has its own deque to which it adds and removes items from the top.  When any thread’s deque is empty, it pulls from the bottom of another thread’s non-empty deque.  You can see why this is scalable and efficient: in many cases, there is no contention at all.  Contention is only an issue when a thread does not have any items and is forced to “steal” items from another thread.  It’s worth noting that the add-to-the-top, steal-from-the-bottom approach plays nice with the cache: threads get to work on items in nice contiguous chunks.  In fact, this type of work-stealing queue works so well that it’s principally how the ThreadPool in .NET 4.0 load-balances work.

Just to make it clear, ConcurrentBag<T> isn’t always the fastest choice.  Consider the single producer/single consumer scenario: the producer would always steal from the consumer’s deque – ConcurrentQueue<T> would be much more efficient. Scenarios in which threads both produce and consume values will benefit from ConcurrentBag<T> the most.

Figure 1.  How threads access elements in a ConcurrentBag<T>.
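As a quick sketch of the mixed produce-and-consume pattern where the bag shines (using its Add and TryTake methods):

var bag = new ConcurrentBag<int>();

// Each worker both produces and consumes, so most operations hit the
// thread's own deque and proceed without contention.
Parallel.For(0, 4, workerId =>
{
    for (int i = 0; i < 1000; i++)
        bag.Add(workerId * 1000 + i);

    int item;
    while (bag.TryTake(out item))
    {
        // process item...
    }
});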

ConcurrentLinkedList<T> and ConcurrentLinkedListNode<T>
ConcurrentLinkedList<T> is essentially a thread-safe version of LinkedList<T>.  Of course, there are some caveats; this is parallel programming we’re talking about.  :)  Just inserting a value into a linked list becomes difficult in a multi-threaded world.  Say you want to add items to a list in sorted order.  With single-threaded linked lists, this is as simple as walking the list and finding the position where the left node’s value is less than the value you want to insert and the right node’s value is greater than or equal to it.  In a concurrent environment, between the time you’ve found the location and actually added the value, another value could have been inserted in that place or one of the neighboring nodes removed.  Thus, ConcurrentLinkedList<T> provides methods that support these tasks with atomic operations, like the TryInsertBetween method:

list.TryInsertBetween((left, right) => (left.Value < item) && (right.Value > item), item);

TryInsertBetween will walk the list and insert the value only when the supplied predicate returns true. It does all this atomically.  If it fails, it returns false without changing the list.

Partitioner<T> and OrderablePartitioner<T>
When TPL or PLINQ consume a data structure, they do their best to load-balance the elements between threads.  Of course, TPL and PLINQ don’t have detailed knowledge about the data sources, such as each structure’s internals, the values of the data, and the length of the data source – all of which are factors in execution time.  The APIs simply see that a type implements IEnumerable<T> or IList<T> and start processing the elements based on enumeration or indexing.  Oftentimes, this means that significant performance gains are overlooked.  Partitioners allow you to achieve these gains.

For example, say I’ve created a thread-safe collection that protects elements with striped locking (where elements 0, 3, and 6 are protected by lock A; 1, 4, and 7 by lock B; and 2, 5, and 8 by lock C).  If we just use the standard enumerator, the pattern will look like: acquire lock A, acquire lock B, acquire lock C, acquire lock A, acquire lock B, and so on.  This is hardly an efficient manner of partitioning out elements, especially if other threads are playing around with the data structure.  It would be much more efficient if we acquired each lock just once.  In a perfect world, if Parallel.ForEach decided that three threads wanted to enumerate this hypothetical data structure, we’d want to give all of the elements under lock A to one thread, the elements of lock B to another, and the elements of lock C to the last, significantly reducing the amount of time spent acquiring locks and potentially waiting for locks to be given up.
We’ll be posting more detail on partitioners soon, but suffice it to say that Partitioner<T> allows you to create custom partitioning algorithms for any data source.  OrderablePartitioner<T> does the same, but keeps track of each element’s original index so the data can be reordered when necessary.  Each of these abstract classes supports both static partitioning and dynamic partitioning.  In addition to these extension points, we’re providing a few out-of-the-box partitioners, but I’ll keep you in suspense – check back for more detailed posts on partitioners soon!
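Just as a taste, here’s a sketch of how a range partitioner might be consumed (assuming a Partitioner.Create factory that hands out contiguous index ranges; the exact factory methods may differ in Beta 1):

// A sketch: process [0, 100000) in contiguous chunks, amortizing
// per-element overhead across each range.
var rangePartitioner = Partitioner.Create(0, 100000);

Parallel.ForEach(rangePartitioner, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        // process element i...
    }
});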

Removed Types

In addition to adding new types, we also decided to cut one out.

WriteOnce<T>
In short, WriteOnce<T> was never really more than a simplification on top of Interlocked.CompareExchange, and that doesn't provide enough value at this point to warrant a new type in the .NET Framework.  Farewell!

Refactors

There are lots of minor refactors and name changes coming in Beta 1, but there’s only one major change for the coordination data structures that we should call out: LazyInit<T>.  I won’t list every name or namespace change, but it’s important to see how LazyInit<T> has been refactored.  Originally, LazyInit<T> was a struct, which gave it all the nuances that come with a value type.  We’ve since refactored LazyInit<T> into four types:

  • System.Lazy<T>: a class-based lazy initialization primitive that’s safe to pass around.
  • System.LazyVariable<T>: a struct-based lazy initialization primitive that has the value-type semantics (namely that you run the risk of re-initialization if you copy it or pass it to a method), but remains light-weight for situations where memory footprint is important.
  • System.Threading.LazyInitializer: a set of static methods for lazy initialization, for when memory footprint is really an issue.
  • System.Threading.ThreadLocal<T>: originally a mode on LazyInit<T>, this is now its own type.

In addition to the refactoring, Lazy<T> and LazyVariable<T> now have both thread-safe and non-thread-safe initialization modes for those that need the lazy initialization functionality without the overhead of synchronization. 
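Here’s a sketch of two of the refactored types as they currently stand (names may still shift before release; ExpensiveObject is a hypothetical type):

// Lazy<T>: a class, so it's safe to hand around; initialization is
// thread-safe by default.
Lazy<ExpensiveObject> lazy =
    new Lazy<ExpensiveObject>(() => new ExpensiveObject());
ExpensiveObject value = lazy.Value; // created on first access

// ThreadLocal<T>: one lazily initialized copy per thread.
ThreadLocal<ExpensiveObject> perThread =
    new ThreadLocal<ExpensiveObject>(() => new ExpensiveObject());
ExpensiveObject mine = perThread.Value;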

Cancellation

In the Beta 1 release, you’ll find a few types for a unified cancellation model intended to be used throughout the .NET Framework.  Cancellation holds enough merit to warrant a dedicated post, so we’ll be posting one soon.  What’s exciting is that we’ve added cancellation support throughout TPL, PLINQ and the coordination data structures.  For the data structures specifically, we’ve added cancellation overloads to any of our APIs that may block, including:

  • BlockingCollection<T>
    • Add
    • AddToAny
    • GetConsumingEnumerable
    • Take
    • TakeFromAny
    • TryAdd
    • TryAddToAny
    • TryTake
    • TryTakeFromAny
  • Barrier.SignalAndWait
  • CountdownEvent.Wait
  • ManualResetEventSlim.Wait
  • SemaphoreSlim.Wait

Misc. Changes

Again, there are too many minor changes to call out, but just be aware that some types have moved to different namespaces and some APIs have been renamed.

There are two API additions worth calling out, however:

SpinWait.SpinUntil
We realized that a lot of scenarios for SpinWait went something like this: check a condition; if false, spin.  Check a condition; if false, spin.  Check a condition; if false, spin…  We decided to make it a tad easier by allowing you to pass a Func<Boolean> to SpinWait, causing it to spin until the condition is met or a timeout occurs.
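For example, a minimal sketch (assuming a _ready flag that another thread eventually sets):

// Spin until another thread sets _ready, or give up after 10 milliseconds.
// (_ready is an assumed volatile bool set elsewhere.)
bool initialized = SpinWait.SpinUntil(
    () => _ready, TimeSpan.FromMilliseconds(10));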

ConcurrentStack<T>.PushRange and ConcurrentStack<T>.PopRange
We’re all about squeezing out that last drop of performance, and where we can, we do.  Take ConcurrentStack<T>, for example, which works on a single compare-and-swap (CAS) operation if there is no contention.  Many scenarios involve pushing or popping a series of items on or off the stack, which typically results in looping through a collection and calling Push() on each item.  This, of course, results in N CAS operations, and potentially a lot more if the stack is being contended on by multiple threads.  Since we only need one CAS operation to update ConcurrentStack<T> anyway, we might as well build up a stack segment first and then push the whole segment onto the stack with a single CAS – reducing the number of potential retry CAS operations.  Enter PushRange and PopRange, which do exactly that.
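A sketch of the range APIs (the pop side is exposed as a Try-variant, so the exact names may differ from build to build):

var stack = new ConcurrentStack<int>();

// Push five items with a single CAS rather than five.
stack.PushRange(new[] { 1, 2, 3, 4, 5 });

// Pop up to five items back out in one operation.
int[] items = new int[5];
int popped = stack.TryPopRange(items);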

Check back soon for what’s new in TPL and PLINQ!

Maestro has its own blog!

The response to our initial post on this blog about Maestro has inspired us to give it its own blog. 

If you’re not yet in the know, Maestro is a new incubation language from Microsoft’s Parallel Computing Platform team for safe and scalable parallel programming for .NET.

If you have interest in Maestro or any topics related to isolation, agents, message-passing, safety, incubation, etc. we encourage you to check it out, subscribe to the feed, and tell your friends.  They just might buy you better birthday presents for the tip. :)

http://blogs.msdn.com/maestroteam

Getting random numbers in a thread-safe way

It’s very common in a parallel application to need random numbers for this or that operation.  For situations where random numbers don’t need to be cryptographically-strong, the System.Random class is typically a fast-enough mechanism for generating values that are good-enough.  However, effectively utilizing Random in a parallel app can be challenging in a high-performance manner.

There are several common mistakes I’ve seen developers make.  The first is to assume that Random is thread-safe and is fine to use concurrently from multiple threads.  This is a bad idea, and it can have some drastic consequences on the quality of the random numbers (degrading them well below the “good enough” bar).  For an example of this, consider the following program:

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Random rand = new Random();

        Parallel.For(0, 1000000, (i, loop) =>
        {
            if (rand.Next() == 0) loop.Stop();
        });

        while (true)
        {
            Console.WriteLine(rand.Next());
            Console.ReadLine();
        }
    }
}

While it won’t happen every time you run the app, on a multi-core machine it’s likely that once the Parallel.For in this example exits, rand.Next() will always return 0.  Not very random, is it?  This is due to an implementation detail of Random that does not tolerate multithreaded access (it wasn’t designed for it, and the docs explicitly call out that Random should not be used from multiple threads, as they do for most types in the .NET Framework).

Another common mistake is to fix a problem like that in the previous example by instantiating a new Random instance each time a random number is needed.  This can also affect the quality of the random numbers.  When a seed isn’t explicitly provided to Random through its constructor, it internally uses Environment.TickCount (at least in the current release) as the seed.  TickCount returns a value that’s heavily influenced by the resolution of the system timer, and thus multiple calls to TickCount in a row are likely to yield the same value (on the system on which I’m writing this post, TickCount changes in value approximately once every 15 milliseconds).  This means that Random will suffer a similar fate, as is exemplified in the following program:

using System;

class Program
{
    static void Main(string[] args)
    {
        int lastNumber = 0, count = 0;
        while (true)
        {
            // Get the next number
            int curNumber = new Random().Next();

            // If it's the same as the previous number,
            // note the increased count.
            if (curNumber == lastNumber) count++;

            // Otherwise, we have a different number than
            // the previous. Write out the previous number
            // and its count, then start
            // over with the new number.
            else
            {
                if (count > 0)
                    Console.WriteLine(
                        count + ": " + lastNumber);
                lastNumber = curNumber;
                count = 1;
            }
        }
    }
}

This program continually creates new Random instances and gets a random value from each, counting the number of “random” values in a row that are the same.  When I run this app, I consistently see each number showing up thousands of times before the number changes; again, not very random.  The point of this example is that, in a multithreaded context, you may be creating a Random instance for each iteration, but many iterations across threads are likely to end up with the same value.

There are several approaches, then, that one could take in using Random from multiple threads to avoid these kinds of issues.

One approach is to use a lock.  A shared Random instance is created, and every access to the Random instance is protected by a lock, e.g.

public static class RandomGen1
{
    private static Random _inst = new Random();

    public static int Next()
    {
        lock (_inst) return _inst.Next();
    }
}

This has the benefit of being easy to write, simple to explain, and obviously does the right thing.  Unfortunately, there’s an expensive lock on every access, and the more threads that want random numbers, the more contention there will be for this lock.  Sharing hurts.

Another approach is to use one Random instance per thread, rather than one per AppDomain (as is the case with the RandomGen1 approach shown above).  Here’s one way that might be implemented:

public static class RandomGen2
{
    private static Random _global = new Random();
    [ThreadStatic]
    private static Random _local;

    public static int Next()
    {
        Random inst = _local;
        if (inst == null)
        {
            int seed;
            lock (_global) seed = _global.Next();
            _local = inst = new Random(seed);
        }
        return inst.Next();
    }
}

An AppDomain-wide Random instance is maintained in order to provide seeds for new Random instances created for any new threads that come along wanting random numbers.  Each thread maintains its own Random instance in a ThreadStatic field, such that once initialized, calls to Next need only retrieve the ThreadStatic Random instance and use its Next method; no locks are necessary, and sharing is minimized.  I tested this in a loop like the following:

Stopwatch sw;
const int NUM = 1000000;

while (true)
{
    sw = Stopwatch.StartNew();
    Parallel.For(0, NUM, i =>
    {
        RandomGen1.Next();
    });
    Console.WriteLine(sw.Elapsed);

    sw = Stopwatch.StartNew();
    Parallel.For(0, NUM, i =>
    {
        RandomGen2.Next();
    });
    Console.WriteLine(sw.Elapsed);

    Console.ReadLine();
}

On my dual-core laptop using .NET 3.5 and the June 2008 CTP of Parallel Extensions, the ThreadStatic approach ended up running twice as fast as the lock-based version.  Moreover, as the number of cores increases, so too will the amount of contention on the lock in the lock-based approach, thus increasing the gap between the two approaches.  Additionally, a lot of work has been done in .NET 4.0 to improve the performance of things such as accessing [ThreadStatic].  When I run this same code on current builds of .NET 4.0 (again on a dual-core), I see the ThreadStatic version running more than four times faster than the lock-based version.  Even when there’s no contention on the lock in the RandomGen1 solution (simulated by switching the parallel loops to sequential loops), the ThreadStatic version in .NET 4.0 still runs significantly faster than the lock-based version.

Of course, if you really care about the quality of the random numbers, you should be using RNGCryptoServiceProvider, which generates cryptographically-strong random numbers (Addendum: davidacoder makes a good point in his comments on this post that while Random has certain statistical properties, using multiple Random instances as part of the same algorithm may change the statistical properties in unknown or undesirable ways).  For a look at how to get a Random-based facade for RNGCryptoServiceProvider, see .NET Matters: Tales from the CryptoRandom in the September 2007 issue of MSDN Magazine.  You could also settle on an intermediate solution, such as using an RNGCryptoServiceProvider to provide the seed values for the ThreadStatic Random instances in a solution like that in RandomGen2 (which would help to avoid another issue here, that of two threads starting with the same seed value due to accessing the global Random instance in the same time quantum):

public static class RandomGen3
{
    private static RNGCryptoServiceProvider _global =
        new RNGCryptoServiceProvider();
    [ThreadStatic]
    private static Random _local;

    public static int Next()
    {
        Random inst = _local;
        if (inst == null)
        {
            byte[] buffer = new byte[4];
            _global.GetBytes(buffer);
            _local = inst = new Random(
                BitConverter.ToInt32(buffer, 0));
        }
        return inst.Next();
    }
}

“Session-in-a-box” on Parallel Programming with .NET 4.0

At PDC 2008 and TechEd EMEA 2008, Daniel Moth delivered several hit talks on parallel programming with the .NET Framework 4.0.  The videos of both of those talks are available online, and he’s since created a series of blog posts capturing the content from those sessions in a way that should make it easy for others to re-present the content.  Definitely worth a look: http://www.danielmoth.com/Blog/2009/02/give-session-on-parallel-programming-or.html.

Thanks, Daniel!

We haven’t forgotten about other models – honest! (Maestro)

This post has been moved to a Maestro-dedicated blog.  Please direct all comments and questions to the new blog.  Thanks!

Parallel Extensions on Wikipedia

Did you know we’re on Wikipedia?  We love Wikipedia; it’s a great resource for hungry minds, and we’re elated to have an article dedicated to the Extensions.  Unfortunately, our article is a bit out-of-date.

 

We could update it ourselves but we’d rather respect Wikipedia’s policies and avoid editing an article pertaining to our own project.  Instead, we’re soliciting your help.  The article was created in September of ’07 and hasn’t changed much since even though a lot has changed in the bits.  Heck, even the name has changed!  We’d love to have you contribute to the article by fleshing it out more, reflecting changes to the APIs (Parallel.Do is now Parallel.Invoke), refreshing facts (we’ve announced we’re going to be part of Visual Studio 2010 and the .NET Framework 4.0 and released a CTP) and updating external links (the .NET Framework 4.0 section of the .NET Framework page could use a refresh, for instance). 

 

Also, for the really ambitious, we have a few other technologies that don’t even have pages or relevant sections at all. The Parallel Patterns Library, the Concurrency Runtime, the new Parallel Stacks and Parallel Tasks windows, and the new Concurrency Analysis option in the Performance Wizard of Visual Studio 2010 all need a fine Wikipedian home.  Both the Parallel Computing page on MSDN and Channel 9 are great resources to get started, but we’d be happy to answer any questions you might have in your editorial endeavors, so let the comments begin!

Announcing 10-4! Weekly video podcasts on Visual Studio 2010 and the .NET Framework 4.0

The Visual Studio and .NET Framework evangelism team has teamed up with Channel 9 on a new series of video podcasts.  10-4 will take a look at a different capability of Visual Studio 2010 and the .NET Framework 4.0 every week.  Here's a note from that team:

You can be the first to get these episodes by visiting http://channel9.msdn.com/shows/10-4/ and subscribing to the RSS feed of your choice, depending on the media format you’re interested in. If you have a question or suggestion for a future episode, please email 10-4@microsoft.com. We want to hear from you!

Episode 1: Downloading and Using the Visual Studio 2010 September CTP

For this first episode of 10-4, we’ll look at how to download and use the Virtual PC image of the Visual Studio 2010 September CTP. We’ll give you tips on how to download this massive (7GB+ compressed) VPC, show you how to get past some pesky expiration issues, and get you started with the CTP walkthroughs. Lastly, we’ll cover where to get assistance and how to provide your feedback about this release.

In future episodes we’ll dive more deeply into the technical underpinnings of Visual Studio 2010 and the .NET Framework 4.0, but for this first episode we wanted to make sure everybody could get the CTP and follow along at home.

Parallelism Videos Galore

It's been a hectic and exciting few weeks, and we on the Parallel Computing Platform team have been having a great time talking with customers all over the world, at the PDC, at TechEd EMEA, at DevConnections, through Channel 9, and more.  A lot of the resulting material is now available online for viewing, so do check it out if you're interested.

PDC 2008 videos

Parallel Programming for Managed Developers with the Next Version of Microsoft Visual Studio

Parallel Programming for C++ Developers in the Next Version of Microsoft Visual Studio

Microsoft Visual Studio: Bringing out the Best in Multicore Systems

Concurrency Runtime Deep Dive: How to Harvest Multicore Computing Resources

Parallel Symposium: Addressing the Hard Problems with Concurrency

Parallel Symposium: Application Opportunities and Architectures

Parallel Symposium: Future of Parallel Computing

Research: Concurrency Analysis Platform and Tools for Finding Concurrency Bugs

The Concurrency and Coordination Runtime and Decentralized Software Services Toolkit 

Channel 9: Visual Studio 2010 and .NET Framework 4.0 Week

Using the Parallel Extensions to the .NET Framework

Native Parallelism with the Parallel Patterns Library

Debugging Parallel Applications with Visual Studio 2010 

TechEd EMEA 2008

Parallel Programming for Managed developers with Visual Studio 2010 and .NET Framework 4.0

Tech chat with Microsoft's parallel guru Steve Teixeira

Why we Need the Task-Based Programming Model Introduced in .NET 4

The Inexorable Drive to Many-Core Processors 

Other (OnMicrosoft, University of Washington Colloquium, ...)

Concurrent Programming on Windows

Visual Studio 2010 Part 1 of 2- Support for Parallelism

Visual Studio 2010 Part 2 of 2- Support for Parallelism

Microsoft's Parallel Computing Platform: Applied Research in a Product Setting

And if you're more interested in the written word, check out this recent tech brief in Redmond Developer News:

Tech Brief: Parallel Extensions

All of this is, of course, in addition to all of the previous material that's been posted online, such as was mentioned at The Channel 9 videos are rolling in, Webcasts on Parallelism from France, Parallel Extensions Demo Fun on Channel 9, More Channel 9 Parallel Extensions Goodness, New PLINQ video on Channel 9, Task Parallel Library on Channel 9, Burton Smith on Channel 9, Parallel Extensions on .NET Rocks, Parallelism in October 2008 MSDN Magazine, and Another Parallel Extensions screencast.

Enjoy!

.NET Framework 4.0 Poster for Download

Brad Abrams posted about a cool .NET Framework 4.0 poster which was distributed at the PDC last week and which you can download. Zoom in on the CORE section right in the middle for a glimpse into the parallelism support in .NET 4.0.

Using Hyper-V with the Visual Studio 2010 and .NET Framework 4.0 CTP

Last week, we posted about the availability of the Visual Studio 2010 and .NET Framework 4.0 CTP, which includes Parallel Extensions to the .NET Framework.  This preview release is available as a Virtual PC (VPC) image.  Unfortunately, a VPC image isn't great for showcasing parallel processing, as even on a machine with many cores, the guest OS will only have a single virtual core.

Over on his blog, Grant Holliday has put up a nice guide to converting the VPC image into a Hyper-V image.  Why is this useful?  Because the guest OS can see up to four virtual cores, which makes it an environment much better suited to trying out the parallel technologies included in Visual Studio 2010. 

Note that as Grant states, "This is not an officially tested scenario. Things may or may not work. You’re on your own." Even so, it's nice to have options. Good luck!

Visual Studio 2010 and .NET Framework 4.0 CTP now available!

The Visual Studio 2010 and .NET Framework 4.0 CTP is now available, featuring Parallel Extensions to .NET!  Parallel Extensions has been introduced before, but what sets this CTP apart from the previous two is that it’s not a CTP of Parallel Extensions alone but of Visual Studio 2010 and .NET 4.0.  And excitingly, the Task Parallel Library, PLINQ, and Coordination Data Structures have made their way deep into the heart of the Framework.

What’s New in the Task Parallel Library?

As with all of Parallel Extensions, the most important change for the Task Parallel Library is that it’s now part of the .NET Framework, and specifically it’s part of mscorlib.dll. This means that every .NET application has access to TPL, without needing to add any additional assembly references.

There are a variety of additional changes in this CTP, some more noticeable than others. One of the repeated pieces of feedback we received on our previous CTPs was that the Create factory method for initializing and scheduling tasks was lacking, both in that the name didn’t convey the full meaning of its functionality as well as that it didn’t allow tasks to be created separately from being scheduled. In this CTP, we’ve addressed that in two ways. First, we’ve renamed the Create static methods to StartNew, and second, we’ve added public constructors and an instance Start method.
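A sketch of the two patterns as they exist in this CTP (DoWork is a placeholder for your own delegate):

// Create and schedule in one step.
Task t1 = Task.StartNew(() => DoWork());

// Or create now and start later.
Task t2 = new Task(() => DoWork());
t2.Start();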

Other API changes will also be visible. Task now exposes more information about its lifecycle through its Status property, which returns a TaskStatus that denotes whether the task has been created, has been scheduled for execution, is waiting for its child tasks to complete, has completed due to cancellation, and so on. TaskStatus includes three final states: one representing a successful completion, one representing a completion due to an unhandled exception, and one representing a completion due to cancellation. This last status is also available through the IsCanceled property, which represents a change in semantics from the previous CTP, where IsCanceled indicated only that the Cancel method had been called, at which point the task could still be executing; that information is now available through the IsCancellationRequested property.

One important change since the last CTP has to do with unhandled exceptions. The Task class needs to catch unhandled exceptions thrown out of a task’s delegate so that these exceptions can be marshaled out through a call to Wait. However, in doing so, these exceptions may go unnoticed. To address that issue, the Task Parallel Library now tracks whether unhandled exceptions have been observed, such as through a call to Wait on a Task that threw an unhandled exception. If a Task that completed due to an unhandled exception is garbage collected with its unhandled exception never having been observed, at that point the exception will be allowed to propagate, tearing down the application as would have happened had the exception been allowed to propagate when initially thrown.
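In other words, observing the exception (a sketch, using this CTP’s Task.StartNew) keeps it from escalating later:

Task t = Task.StartNew(() => { throw new InvalidOperationException(); });

try
{
    t.Wait(); // observes the exception, surfaced as an AggregateException
}
catch (AggregateException ae)
{
    // handle ae.InnerExceptions; the exception now counts as observed
}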

A variety of implementation details have also changed in this release. For example, the system is now able to detect when tasks block on synchronization primitives and can respond by increasing the concurrency level in order to make forward progress.

You may also note that some new delegates have been introduced, namely System.Threading.Action2 and System.Threading.Func2. These are only temporary and are serving a purpose specific to this CTP release; these delegates are planned for removal in the future, replaced in usage by the existing System.Action and System.Func delegates.

What’s New in PLINQ?

PLINQ is now in System.Core.dll, together with its sequential counterpart LINQ-to-Objects.

AsSequential has been renamed to AsMerged to reflect the behavior a little more clearly: AsParallel tells the system to partition data for parallelism, and AsMerged tells PLINQ when to merge the partitions and transition back to sequential LINQ-to-Objects.

There is also an overload of AsMerged that takes the System.Linq.ParallelMergeQueryOptions enumeration, which hints at how elements should be yielded from PLINQ execution. FullyBuffered optimizes for whole-query performance by buffering the entire contents of the query before yielding. NotBuffered minimizes the time it takes for a produced element to be yielded to the consumer thread. AutoBuffered, the default, buffers data and yields it in chunks to the consumer.
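A sketch of what that looks like in a query (CTP-era names that may well change; data and Process are placeholders):

// Minimize latency to the first result rather than whole-query throughput.
var results = data.AsParallel()
                  .Select(x => Process(x))
                  .AsMerged(ParallelMergeQueryOptions.NotBuffered);

foreach (var r in results) Console.WriteLine(r);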

Additionally, some improvements have been made around reliability, including the removal of deadlocks when exceptions are thrown from delegates.

What’s New in Coordination Data Structures?

In the June 2008 CTP, we introduced a set of coordination data structures that complement PLINQ and the Task Parallel Library. These data structures include a set of thread-safe and scalable collections as well as higher-level and specialized synchronization primitives for coordination in multithreaded applications and components. In this CTP, we’ve introduced a few changes.

The largest changes are the additions of System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue> and System.Threading.Barrier. The former is a thread-safe and scalable hash map, while the latter is an abstraction of a synchronization pattern popular in parallel algorithms. In addition to these new types, we’ve augmented some existing APIs, such as providing additional constructors and methods on LazyInit<T> and updating the SpinLock type to support its Enter and Exit methods being used within constrained execution regions. Finally, this release sees a plethora of naming changes, including renaming the System.Threading.Collections namespace to System.Collections.Concurrent and renaming the IConcurrentCollection<T> interface to IProducerConsumerCollection<T>.
