Threads Considered Harmful

One big "feature" of XPLC that will either attract or deter some people is our stance on threading: we do not smoke this.

Okay, let's be clearer: preemptive threading is mostly out of the picture. The possible exceptions are operations that cannot otherwise be done in a non-blocking manner (like using gethostbyname() on Unix) and multiple-CPU systems (we do want to use all of the CPUs, so the plan is something like N or N-1 threads, where N is the number of CPUs in the system).
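To make the first exception concrete, here is a minimal sketch (in C, assuming a POSIX system with pthreads; the hostname and the helper names are made up for illustration) of how a blocking call like gethostbyname() can be pushed into a single helper thread that reports back through a pipe, so the main loop stays non-blocking:

    /* Hypothetical sketch: one helper thread wraps the blocking call
     * and notifies the single-threaded main loop through a pipe. */
    #include <pthread.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <netdb.h>

    static int notify[2];   /* pipe: resolver thread -> main loop */

    static void *resolver(void *arg)
    {
        struct hostent *he = gethostbyname((char *)arg); /* may block for seconds */
        unsigned char addr[4] = { 0, 0, 0, 0 };

        if (he != NULL && he->h_addrtype == AF_INET)
            memcpy(addr, he->h_addr_list[0], 4);
        write(notify[1], addr, 4);   /* wake up the main loop */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        unsigned char addr[4];
        fd_set fds;

        pipe(notify);
        pthread_create(&tid, NULL, resolver, "www.example.com");

        /* The main loop would keep doing useful work here; it only
         * select()s on the pipe along with its other descriptors. */
        FD_ZERO(&fds);
        FD_SET(notify[0], &fds);
        select(notify[0] + 1, &fds, NULL, NULL, NULL);

        read(notify[0], addr, 4);
        printf("resolved to %u.%u.%u.%u\n", addr[0], addr[1], addr[2], addr[3]);
        pthread_join(tid, NULL);
        return 0;
    }

The same trick works for any call that simply has no non-blocking variant.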

The reasoning is that when it isn't deadlocks, race conditions or resource contention between threads, it's context switching making things that should be fast go slow: a switch lands right in the middle of your nice hand-optimized assembler blitting loop, wrecks the cache and does some other stuff that will probably make you miss the video refresh and look even slower than you actually are (and God knows looks are everything these days).

This is quite an expensive price to pay when all you really get is the illusion that multiple things happen at once! Careful programming will give you that same illusion, with a much smaller impact on performance, since it leaves you in control of the scheduling, letting you wait until after your blitting is done.
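As a rough illustration (the task functions are hypothetical stand-ins), the "careful programming" amounts to a plain event loop: each piece of work runs to completion, and we only look for pending events at the points we choose:

    /* Hypothetical sketch of a cooperative main loop: the hot path
     * always runs to completion, and event handling happens only
     * between iterations, exactly where we decided to put it. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/select.h>

    static void blit_frame(void)
    {
        /* imagine the hand-optimized blitting loop here */
    }

    static void handle_input(int fd)
    {
        char buf[128];
        read(fd, buf, sizeof(buf));   /* drain pending events */
    }

    int main(void)
    {
        int input_fd = STDIN_FILENO;

        for (;;) {
            fd_set fds;
            struct timeval poll = { 0, 0 };

            blit_frame();   /* never preempted in the middle */

            FD_ZERO(&fds);
            FD_SET(input_fd, &fds);
            /* Only now do we check for events: we do the scheduling. */
            if (select(input_fd + 1, &fds, NULL, NULL, &poll) > 0)
                handle_input(input_fd);
        }
    }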

Some people will object that the "careful" part will make programming harder than what it gives back in performance, and this looks true at first glance. But what do you know about first glances? They are often wrong. Threads are tricky to get right. There are traps around every corner, and believe me, you will fall into them. You will think your debugger smokes crack. You will cry for your mother.

Threads can be faster in many cases, but it will definitely take an expert to make them so, and statistics say I could bet that you are not one and still make a ton of money. Those who would owe me money will find themselves putting in too many locks, or locking too coarsely, and will see their performance go right down the drain.
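A tiny made-up example of the coarse-locking failure mode: two threads that take the same big lock for every unit of work just take turns, so on an SMP box they can easily run slower than one thread with no lock at all:

    /* Hypothetical sketch: one big lock around all the work means the
     * "parallel" threads mostly serialize, paying for the locking and
     * the cache-line ping-pong without computing anything in parallel. */
    #include <pthread.h>

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *worker(void *arg)
    {
        long i;

        (void)arg;
        for (i = 0; i < 10000000; i++) {
            pthread_mutex_lock(&big_lock);    /* serializes both threads */
            counter++;                        /* the whole "work" is locked */
            pthread_mutex_unlock(&big_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }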

All of that said, there is hope! Proud users of POSIXy systems will find GNU Pth (Portable Threads), a cooperative multithreading library, a very useful addition to their arsenal. I was intending to use that library, but (ironically) it has a portability problem, being quite dependent on POSIX features for its portability. The current development version has started work on Win32 support, but it depends on the Cygwin POSIX compatibility layer rather than being native. MacOS, BeOS and other operating systems would still be left out in the cold. But other similar cooperative multithreading libraries exist; this is just one example!
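For a taste of what this looks like, here is a minimal sketch against the Pth API as documented in its manual: the two threads only switch at the points where they explicitly yield, never in the middle of a computation.

    /* Minimal sketch using GNU Pth: context switches happen only at
     * the pth_yield() calls. */
    #include <pth.h>
    #include <stdio.h>

    static void *ticker(void *arg)
    {
        const char *name = arg;
        int i;

        for (i = 0; i < 3; i++) {
            printf("%s: tick %d\n", name, i);
            pth_yield(NULL);   /* explicitly hand over the CPU */
        }
        return NULL;
    }

    int main(void)
    {
        pth_t a, b;

        pth_init();
        a = pth_spawn(PTH_ATTR_DEFAULT, ticker, "left");
        b = pth_spawn(PTH_ATTR_DEFAULT, ticker, "right");
        pth_join(a, NULL);
        pth_join(b, NULL);
        pth_kill();
        return 0;
    }

I/O would go through pth_read() and friends, which suspend only the calling thread instead of blocking the whole process.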

I found the slides for a talk, "Why Threads Are a Bad Thing (for most purposes)", that John Ousterhout (the creator of Tcl) gave at the 1996 USENIX Technical Conference.

The State Threads library is another non-preemptive thread library that is specifically oriented toward I/O.

Dan Kegel has a web page about designing a web server that could handle tens of thousands of clients. Among other things, it discusses threading, and it contains a large number of links to relevant material and papers.

This Slashdot interview of Ingo Molnar has a very interesting answer about this subject at question #6:

Question:

Unix programmers seem to dislike using threads in their applications. After all, they can just fork(); and run along instead of using the thread functions. But, that's not important right now.

What is your opinion on the current thread implementation in the Linux kernel compared to systems designed from the ground up to support threads (like BeOS, OS/2 and Windows NT)? In which way could the kernel developers make the threads work better?

Ingo Molnar's Answer:

That's a misconception. The Linux kernel is *fundamentally* 'threaded'. Within the Linux kernel there are only threads. Full stop. Threads either share or do not share various system resources like VM (ie. page tables) or files. If a thread has 'all-private' resources then it behaves like a process. If a thread has shared resources (eg. shares files and page tables) then it's a 'thread'. Some OSs have a rigid distinction between threads and processes - Linux is more flexible, eg. you can have two threads that share all files but have private page-tables. Or you can have threads that have the same page-tables but do not share files. Within the kernel I couldn't even make a distinction between 'processes' and 'threads', because everything is a thread to the kernel.

This means that in Linux every system-call is 'thread-safe', from the ground up. You program 'threads' the same way as 'processes'. There are some popular shared-VM thread APIs, and Linux implements the pthreads API - which btw. is a user-space wrapper exposing already existing kernel-provided APIs. Just to show that the Linux kernel has only one notion of 'context of execution': under Linux the context-switch time between two 'threads' and two 'processes' is all the same: around 2 microseconds on a 500MHz PIII.

Programming 'with threads' (ie. with Linux threads that share page tables) is fundamentally more error-prone than coding isolated threads (ie. processes). This is why you see all those lazy Linux programmers using processes (ie. isolated threads) - if there is no need to share too much state, why go the error-prone path? Under Linux, processes scale just as fine on SMP as threads.

The only area where 'all-shared-VM threads' are needed is where there is massive and complex interaction between threads. 98% of the programming tasks are not such. Additionally, on SMP systems threads are *fundamentally slower*, because there has to be (inevitable, hardware-mandated) synchronization between CPUs if shared VM is used.

This whole threading issue I believe comes from the fact that it's so hard and slow to program isolated threads (processes) under NT (NT processes are painfully slow to create, for example) - so all programming tasks which are performance-sensitive are forced to use all-shared-VM threads. Then this technological disadvantage of NT is spun into a magical 'using threads is better' mantra. IMHO it's a fundamentally bad (and rude) thing to force some stupid all-shared-VM concept on all multi-context programming tasks.

For example, the submitted SPECweb99 TUX results were done in a setup where every CPU was running an isolated thread. Windows 2000 will never be able to do stuff like this without redesigning their whole OS, because processes are just so much fscked up there, and all the APIs (and programming tools) have this stupid bias towards all-shared-VM threads.
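To illustrate the flexibility Molnar describes, here is a Linux-specific sketch using the clone() call that the pthreads wrapper is built on: passing CLONE_FILES without CLONE_VM gives exactly his "share all files but have private page-tables" case.

    /* Hypothetical sketch (Linux only): clone() lets you pick exactly
     * which resources the new context shares. Here the child shares
     * the file descriptor table (CLONE_FILES) but gets its own copy of
     * the address space, since CLONE_VM is not passed. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static int child(void *arg)
    {
        (void)arg;
        printf("child: shared files, private page tables\n");
        return 0;
    }

    int main(void)
    {
        size_t stack_size = 64 * 1024;
        char *stack = malloc(stack_size);
        pid_t pid;

        /* The stack grows down on most architectures, so pass its top. */
        pid = clone(child, stack + stack_size, CLONE_FILES | SIGCHLD, NULL);
        if (pid == -1) {
            perror("clone");
            return 1;
        }
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }

Swap CLONE_FILES for CLONE_VM and you get the shared-memory flavor instead; to the kernel, both are just threads.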


Pierre Phaneuf <pp@ludusdesign.com>
Last modified: Sun Nov 12 18:49:11 EST 2000