Clojure: STMs vs Locks
May 28, 2008

I've been participating in a fascinating discussion with Rich Hickey, the author of Clojure, about Software Transactional Memory.  I decided to clean up and echo the conversation here; the original can be found here.


The Problem Statement: it's not "atomic-vs-lock" but instead it's "where do we put atomic/lock?"

Randall R Schulz

  I found the following comment on an Azul Systems blog (http://blogs.azulsystems.com/cliff/2008/05/javaone.html):

 

Cliff Click wrote:   

    Performance in an STM is not transparent: it requires a runtime to do abort/retry, and as the runtimes get fancy & complex the behavior under load of STM's gets.... weird and unpredictable.  Not something you can put into a production environment.   

  Any comments or counter-examples to this?

Raoul Duke

Rich Hickey, richhic...@gmail.com wrote:

  Are explicit locking designs free from such anomalies? I wouldn't think so.

paraphrase: the behaviour under load gets weird and tricky.

well, in some ways maybe that doesn't happen with locks: with locks you can at least generally see quickly that a given lock has 90% of all threads waiting on it, whereas with an STM (depending on which system we're looking at) you may not know which variable in the Venn diagram of overlapping transactions caused all the rollbacks and retries.

though I believe Rich previously said that Clojure would actually have a way of finding that out!
http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fd...

Cliff Click

You got it - under load, performance is unpredictable.

With locks, you can see which locks are loaded, generally figure out why, and plan a strategy (stripe the lock, shorter hold time, switch to the j.u.concurrent datastructures, etc).

Not so (at least not yet) with STM.  I've talked to a few people who've tried STM out at a larger scale than in the usual academic papers - and the results are downright disappointing.  Unless you become an expert in the STM runtime you're using (where "expert" means not just "grok it" but "can build & tweak it"), anytime performance is not good enough you get stuck.  This is an open research problem at best, with no clear indication that the problem can be solved at all.

Rich Hickey

One could as generically argue that systems with manual locking are usually broken, and therefore their behavior itself, never mind their performance, is unpredictable.  That's been my experience with manual locking in the hands of mere mortal developers.  It can be as difficult to understand the behavior of any single manually-concurrent program as to understand the performance characteristics of an STM.  In the latter case, at least the STM is handling correctness for you, and all users of the same STM can share knowledge (and bug fixes and performance improvements), vs each application being its own Gordian knot.  And in any case in which the STM is failing you performance-wise, you can opt out and use manual locks outside of transactions.  To the extent you can use it, STM can drastically reduce the complexity of a system.

I'm wary of unifying the discussion of concurrency with one about performance, as the problems of concurrency start with correctness, where the manual locking story, due to its complexity, is quite bad.  Scalability is not a universal problem - some systems need to handle thousands of simultaneous connections - most don't, some need to leverage hundreds of CPUs in a single application - most don't.  But every application that wants to leverage multiple cores needs to be correct.

It would be no problem to instrument the Clojure STM references with fault counters that would allow one to know the exact per-reference contention level. Once known, the answers in both cases are similar - share less mutable state, don't have both long and short-lived transactions contend for the same data, check your success conditions ASAP etc.

STM designs differ from one another quite a bit, so any general statements are difficult to assess.  I think the level of granularity of the STM comes into play.  Most of the STM papers I've read concentrate on creating lots of transactional cells with which they envision constructing concurrent data structures like the ones in java.util.concurrent.  I don't think that's a good idea at all.

Clojure encourages the use of immutable persistent data structures, which need no locking or coordination whatsoever in order to perform correctly in a concurrent situation, and STM references to support the appearance of identity-preserving change.  It also encourages the use of java.util.concurrent and the like for caches and work queues - the best tools for the job.
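To make that concrete, here is a minimal sketch (mine, not from the thread; names are illustrative) of the style Rich describes: an immutable persistent map held by a Ref, read freely and changed only inside a transaction.

    ;; A Ref holding an immutable persistent map (illustrative names).
    (def inventory (ref {:apples 10 :oranges 5}))

    ;; Reads need no locks; deref just sees a consistent snapshot value.
    (:apples @inventory)                        ; => 10

    ;; Identity-preserving change happens only inside a transaction.
    (dosync (alter inventory update-in [:apples] dec))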

As far as I know, Clojure is the first STM that uses snapshot MVCC, avoiding both read logging and transaction restarts due to read invalidations.  Clojure's STM also gives you the ability to 'ensure' a reference you later intend to read or write, thus providing some manual control over resource acquisition order (rather than just relying on the side effect of access).  It also supports explicit commute, further reducing retries for commutative writes.  I'll not claim its performance characteristics are at all proven, but its contention due to, and for, reading is lower than in programs that use manual locking, and lower than in STMs in which write transactions can cause read transactions to retry.
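For readers unfamiliar with those two operations, here is a hedged sketch of 'ensure' and 'commute' in use (the account and counter names are my own, illustrative only):

    (def checking (ref 100))
    (def savings  (ref 100))

    (dosync
      (ensure savings)                 ; guard savings against other writers
      (alter checking + @savings))     ; without actually writing to it

    (def hits (ref 0))
    (dosync (commute hits inc))        ; commutative update: concurrent
                                       ; transactions need not retry over it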

Once you get to record-level STM granularity, like Clojure's, it's also a bit easier to draw parallels to the performance characteristics of the database systems it mimics, which have been doing similar things for decades.

I don't consider STM performance any more or less of a research problem than determining the correctness of programs that do manual locking - there's still work to do.

And of course, neither STM, nor any other mechanism, is a panacea.

Brett Morgan

Given that I am a mere developer - actually, with multithreading I'd consider myself a rank newbie - the most important thing to me is visibility.  I want to be able to see what problems I am creating with my design, so that I can change my design and see how the problems change.  In effect, I am looking for an environment that teaches me what I am doing wrong, if only by showing me which references are hotly contended.

In keeping with that, it would probably be helpful for both refs and transactions to be nameable, so that debugging output from the STM runtime can actually be easily inspected.  The reason I say this is that the data I need to work over is non-uniform, so I need to be able to label the hot references to know which particular chunks of my data set are causing issues.

thoughts?

Cliff Click

The problem with manual locking - and it applies equally well to transactions - is that there's no name-space control over what needs to be locked/transacted.  The failure people have with locks (and limited experience with transactions) is that they can't properly name their shared variables.  They think they can, until we start looking hard, and then we keep finding more... and more that need to be locked/transacted atomically.  In short, people don't know where to put the lock/unlock or where to bound the transactional region.

Having said that, locks have some advantages over transactions: transparent performance being one of them.  Programs are busted if the locks are wrong (or if transaction regions don't cover all the variables they need to; Clojure isn't immune here) AND they are busted if performance sucks.  For 2-4 way machines the issue is probably moot; for 8-32 way machines (standard servers these days) it matters.  In a year every el-cheapo linux blade server out there will be looking at 16-64 cpus.  And of course for Azul, 100+ active cpus is the norm.  I claim that in a year, most people will either care (get bit by lack of scaling) or see that they'll need to care within another year.

Requiring shared variables to have a special syntax, a la Ref, is a big step forward here for both locks & transactions.  You're reducing the complexity of the problem via name-space control (only Refs are shared) and promoting immutable shared state.  The STM issue is orthogonal.

As evidence of that, suppose you replace your STM implementation with a trivial single global lock.  Are the programs more or less complex than their locking counterparts?  The program is as correct and as complex as it was before the swap, and indeed is exactly equal to the locking solution using a single global lock.  Suppose I can get a correct program using STM - then I also have a correct program using a single global lock.  STM only varies from locking as I try to make the implementation more performant: for the STM I trust the language implementor to "stripe" the locks under the hood; for locks, I start using more than one lock.  In both cases I can also try to shrink the lock-hold-time (or reduce the size of the STM region), or switch algorithms.
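For concreteness, that swap amounts to something like this deliberately naive sketch (mine, not from the thread) - an 'atomic' that is just one global lock:

    ;; A deliberately naive 'STM': every atomic region takes the same lock,
    ;; so atomic regions are correct but fully serialized.
    (def global-lock (Object.))

    (defmacro atomic [& body]
      `(locking global-lock ~@body))

Every program that is correct under a real STM stays correct under this degenerate one; only the performance changes, which is Cliff's point.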

Summary: STM doesn't solve the Big Problem (lack of name-space control).  Refs might solve the Big Problem - and solving the Big Problem will be a big help to both locks & STM.  Locks have transparent performance and are thus easy to optimize.  Optimizing without good name-space control is a buggy process for both locks & STM.  I don't know how to optimize an STM implementation without becoming an expert in the particulars of the STM runtime (and there are so many to choose from!).

Rich Hickey

I agree that identifying the things that can change is one key issue, and Clojure's Refs, Agents, and Vars all do that.  But I think a large number of your arguments reflect a lack of understanding of how (at least Clojure's) STM works.

Refs are not merely markers, they do enforcement.  There is simply no mutating a ref outside of a transaction - it's not an allowed mistake.  Any attempt to change a ref outside a transaction throws an exception.  So, there is no such thing as "transaction regions don't cover all the variables they need to" in Clojure.
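A tiny sketch of that enforcement (my example, not Rich's):

    (def account (ref 100))

    (ref-set account 200)            ; throws IllegalStateException:
                                     ; no transaction running

    (dosync (ref-set account 200))   ; fine inside a transaction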

Your thought exercise with a single lock is a good one.  Let's say Clojure's implementation of STM used a single global lock (in fact it has no global locks at all), and contrast it with a manual locking program that used a single lock for everything.

Which is more correct?  The Clojure one, because it is not subject to undetected 'forgetting to obtain the lock'.  Your assertion that "Suppose I can get a correct program using STM - then I also have a correct program using a single global lock" implies nothing about programs written with a single manual lock where no STM system is helping ensure correctness.  All Clojure STM programs (that don't throw Ref mutation out-of-transaction exceptions) are correct with a global lock, but not all programs using a global lock (that don't throw exceptions) are correct STM (or otherwise) programs.

Which one has better performance?  Constant factors aside, potentially the Clojure one, because readers do not wait for writers in MVCC.

As we add locks, performance of both approaches improves, but a new source of errors comes into play for the manual approach - lock acquisition order and the resultant deadlock risk.  At this point STM becomes dramatically less complex than manual locking.

I'll grant that it is likely that, in the hands of experts, it is possible to write a correct program with a custom crafted locking strategy that will outperform any general-purpose STM strategy.  But I still contend that, for any moderately complex program, that strategy will approach the complexity of an STM, and that most people will get it wrong in subtle, intractable ways.

Analogies to databases are telling.  DBMS's took over locking long ago and most people were happy to have them do so.  Ditto memory management and GC.  In each case, people claimed superior control and performance in the manual solution, but the cost and complexity proved too high for most applications.

So, mutable entity identification is a problem, but it is a subproblem of correctness enforcement.  It is likely that, given well-identified sharable entities, some form of static correctness analysis might be brought to bear on the manual locking case, and I'm sure people are working on that.  STMs can provide correctness enforcement today, and therefore the two are not equivalent.  Any opacity in STM operation or performance is an area for improvement, not necessarily a reason for avoidance.

Cliff Click

On May 24, 10:51 am, Rich Hickey wrote:

I agree that identifying the things that can change is one key issue, and Clojure's Refs, Agents, and Vars all do that.  But I think a large number of your arguments reflect a lack of understanding of how (at least Clojure's) STM works. Refs are not merely markers, they do enforcement.  There is simply no mutating a ref outside of a transaction - it's not an allowed mistake.  Any attempt to change a ref outside a transaction throws an exception.  So, there is no such thing as "transaction regions don't cover all the variables they need to" in Clojure.

Alas - there still is.   :-(
Example down below, and this is my primary complaint against both locks & STM.  However, your other arguments are more quickly answered - mostly because I agree with you.

Your thought exercise with a single lock is a good one.  Let's say Clojure's implementation of STM used a single global lock (in fact it has no global locks at all), and contrast it with a manual locking program that used a single lock for everything.

Which is more correct? The Clojure one, because it is not subject to undetected 'forgetting to obtain the lock'. Your assertion that "Suppose I can get a correct program using STM - then I also have a correct program using a single global lock" implies nothing about programs written with a single manual lock where no STM system is helping ensure correctness.  All Clojure STM programs (that don't throw Ref mutation out-of-transaction exceptions) are correct with a global lock, but not all programs using a global lock (that don't throw exceptions) are correct STM (or otherwise) programs.

Good one.  I agree.  Clojure wins this round.

Which one has better performance?  Constant factors aside, potentially the Clojure one, because readers do not wait for writers in MVCC.

Not really relevant; nobody writes the 'single global lock' version, including Clojure.  It was just a thought exercise.

As we add locks, performance of both approaches improves, but a new source of errors comes into play for the manual approach - lock acquisition order and the resultant deadlock risk.  At this point STM becomes dramatically less complex than manual locking.

Not interesting in practice.  Because in practice, deadlocks are easy to find & fix.  You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.  Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.  Dev's don't like 'em, but they don't lose sleep over them either.

I'll grant that it is likely that, in the hands of experts, it is possible to write a correct program with a custom crafted locking strategy that will outperform any general-purpose STM strategy.  But I still contend that, for any moderately complex program, that strategy will approach the complexity of an STM, and that most people will get it wrong in subtle, intractable ways.

Analogies to databases are telling.  DBMS's took over locking long ago and most people were happy to have them do so.  Ditto memory management and GC.  In each case, people claimed superior control and performance in the manual solution, but the cost and complexity proved too high for most applications.

Actually, I was thinking of the compiler-vs-ASM analogy that played out in the '60s & '70s.  But point well taken: I agree that we need some kind of support here.  Maybe in the very long run STM runtimes get better & STM does win.  Meanwhile life sucks.  If STMs can solve the problem I'll point out below, then I'll agree with you, and maybe even try my hand at an STM implementation.

So, mutable entity identification is a problem, but it is a subproblem of correctness enforcement. It is likely that, given well-identified sharable entities, some form of static correctness analysis might be brought to bear on the manual locking case, and I'm sure people are working on that. STMs can provide correctness enforcement today, and therefore the two are not equivalent. Any opacity in STM operation or performance is an area for improvement, not necessarily a reason for avoidance.

OK, the long-sought-after counter-example: STMs do not compose with correctness.  Bad Java syntax follows, because I don't 'speak' Clojure.  I'm using the classic example.

    class Account {
      long _money;
      add( long m ) { atomic { _money = _money + m; } }
    }
   
    transfer( Account a1, Account a2, long m ) {
      a1.add( m);
      a2.add(-m);
    }

While add alone is STM'd & safe, the two calls to add that need to be wrapped in an atomic are not - so there is a race where the accounts do not balance.  Deciding at what level to add the next atomic (should it be around transfer or at a higher level or not at all?) demands the programmer have domain knowledge not already encapsulated in the Account class.  THIS is the source of vast numbers of bugs.

Less trivial examples quickly spiral out of control.  Sure, I know Refs A, B & C are all Refs and thus only updated in dosync blocks - but can I split a given collection of Ref updates into more than one atomic region?  The ad-absurdum answer is to wrap a dosync around all Refs - which means the whole program, and that means a single-threaded program.  Imagine the program unrolled in front of you, as an endless sequence of actions (all control flow flattened out)... interspersed with other actions are updates to Refs.  Now: dissect this list into groups with atomic or lock as you see fit, preserving the program meaning as another unrolled thread of execution is interleaved.  How do you decide where it's OK to split the stream and where it's not?  Why, in the Grand Scheme of things, is:

  "... { read _money ... write money } ... " 
a correct splitting of the action stream, while
  "... { read _money ... write _money }  ... { read _money ... write _money } ..." 
is broken, but
  "... {{ read _money ... write _money } ... { read _money ... write _money }} ..." 
is correct again?

A Brief Digression into the Usefulness of Deadlock Avoidance

Phil Jordan

I'm new to STM, I've stumbled into it after doing some explicit/low-level lock-free programming and systems that are synchronised classically with mutex/semaphore-based locking.  I especially don't know what goes on under the hood in Clojure's STM implementation, or how powerful the JVM is in terms of memory guarantees.

I'm chipping in out of interest and to improve my understanding, apologies if I sound like an idiot:

Cliff Click wrote:

  On May 24, 10:51 am, Rich Hickey wrote:

    As we add locks, performance of both approaches improves, but a new source of errors comes into play for the manual approach - lock  acquisition order and the resultant deadlock risk. At this point STM becomes dramatically less complex than manual locking.   

  Not interesting *in practice*.  Because in practice, deadlocks are easy to find & fix.

From personal experience, this is far from the truth in complex systems.  Deadlocks that happen only on systems "in the wild", appearing in the form of heisenbugs, etc. - not fun at all.  There's too much in the way of implicit contracts going on, which blows up in your face if you're trying to extend undocumented software that was written by someone who left the company before you arrived.  (And we all know the "well then document it" approach just doesn't happen in practice.)

Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was working it could get very painful.

You get a crash dump, crawl the stacks, discover the lock cycle & reorganize. Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.

You need a reasonable amount of infrastructure in place for that, plus you're relying on absence of evidence rather than proof that it can't deadlock.

Dev's don't like 'em, but they don't lose sleep over them either.

The people who lose sleep over software quality are probably the kind who try to avoid complex locking schemes like the plague in the first place. :)

  Bad Java syntax follows, because I don't 'speak' Clojure.  I'm using the classic example.

    class Account {
      long _money;
      add( long m ) { atomic { _money = _money + m; } }
    }
   
    transfer( Account a1, Account a2, long m ) {
      a1.add( m);
      a2.add(-m);
    }

While add alone is STM'd & safe, the two calls to add that need to be wrapped in an atomic are not - so there is a race where the accounts do not balance.  Deciding at what level to add the next atomic (should it be around transfer or at a higher level or not at all?) demands the programmer have domain knowledge not already encapsulated in the Account class.  THIS is the source of vast numbers of bugs.

My understanding is that this is exactly the kind of situation where STM excels: you wrap the two add calls in a transaction rather than making them individually atomic.  The way Clojure handles this (I've been spending 99.9% of my time in Clojure on non-threaded things, so I could easily have missed something) is that your _money would be a Ref, and any attempt at modifying it will fail unless you're in a transaction.  Making each add its own transaction would be the anti-pattern here; you want to make the whole transfer a transaction.

Less trivial examples quickly spiral out of control.  ...

Okay, you kind of lost me with what you're trying to say here.

I get the impression you're mixing up atomic operations on the low level with a high-level STM implementation like Clojure's.  The latter presumably won't be efficient unless the implementation uses the former, but my understanding is that the programmer using an STM implementation is largely supposed to be isolated from such details, as long as he/she is able to determine what operations should be wrapped in a single transaction.

Cliff Click

I'm new to STM, I've stumbled into it after doing some explicit/low-level lock-free programming and systems that are synchronised classically with mutex/semaphore-based locking.  I especially don't know what goes on under the hood in Clojure's STM implementation, or how powerful the JVM is in terms of memory guarantees.

I'm chipping in out of interest and to improve my understanding,

Let's see if I can do you some justice...

  From personal experience, this is far from the truth in complex systems.  Deadlocks happening only on "in the wild" systems, appearing in the form of heisenbugs, etc.  Not fun at all.  There's too much in the way of implicit contracts going on, which blows up in your face if you're trying to extend undocumented software that was written by someone who left the company before you arrived.

Yup.  So the deadlock happened.  Ouch.

  (and we all know the "well then document it" approach just doesn't happen in practice)

Yup, but you could make a difference.  HotSpot probably has the highest level of 'asserts' per line of code (and a pretty darned high level of comments per line of code) of anything Out There.  Docs are cheaper than debugging.  But it's an aside.... back to deadlocks-in-practice:

  Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was working it could get very painful.   

    You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.   

  You need a reasonable amount of infrastructure in place for that,

Crash dump?  Core file?  'ctrl-backslash' to a JVM to get a thread-dump?
This stuff is commonly available.

If you don't have a proper tool chain, then Job One for you should be to go get one - and I'm very serious here.  Drop everything else until you get a functional 'path' (protocol, implementation, business process, etc) that allows you to debug a crash from a customer in the field.  You might need a willing beta-partner for this: hand him a broken program & let it crash; figure out he needs disk-space to capture the core & logs; he needs training to hit 'ctrl-backslash' when suspecting a deadlock; plus FTP the files back to Home Base, plus you need to be able to capture the files per-customer-per-bug (a la Bugzilla), then decrypt the file (gdb for core, eyeballs or a little perl for stack-trace logs), etc, etc, etc....  No Rocket Science, but annoying bits of Engineering along with some Customer Management.

  plus you're relying on absence of evidence rather than proof that it can't deadlock.

Err.... No.

I'm not shooting for perfection here; I'm shooting for "good enough to ship".  Which means that if - In Practice - each possible deadlock happens once, then gets fixed - that's almost always Good Enough.  Maybe not to run a nuclear reactor, but definitely good enough to run large businesses.  In practice, most possible deadlocks never happen - but a few that do require several go-'rounds before getting fixed properly.  And then the deadlock is fixed, and remains fixed.

 

    Dev's don't like 'em, but they don't lose sleep over them either.   

  The people who lose sleep over software quality are probably the kind who try to avoid complex locking schemes like the plague in the first place. :)

Yup... and those folks are generally stuck anyways.  But if they are thinking about the problem ahead of time they are going to be way way ahead in the long run.

  My understanding is that this is exactly the kind of situation where STM excels: you wrap the two add calls in a transaction rather than making them individually atomic.  The way Clojure handles this (I've been spending 99.9% of my time in Clojure on non-threaded things, so I could easily have missed something) is that your _money would be a Ref, and any attempt at modifying it will fail unless you're in a transaction.  Making each add its own transaction would be the anti-pattern here; you want to make the whole transfer a transaction.   

...   

Okay, you kind of lost me with what you're trying to say here.


I Restate My Argument

Cliff Click

Trivial examples are.... trivial.  They can be fixed in a thousand obvious ways.  We need to extrapolate to the non-trivial, because that's the only place where the STM-vs-Locks argument becomes interesting.  So let's pretend that instead of a single Ref _money and two classes, I've got 500 Refs and a million lines of code - a la HotSpot (Sun's JVM).  Easily >500 shared concurrent variables, about 750KLOC.  About ~100 unique locks (some are classes of striped locks) guarding them in very complex locking patterns.  Now replace all those locks with an STM & atomic.  Is my program any more correct?  Not really... I might have avoided some potential deadlocks (HotSpot uses lock-ranking asserts to avoid deadlock; deadlock rarely happens at the engineer's desk and maybe once/year in the field across all HS users).  The set of deadlocks-in-the-field avoided was minuscule.  I'll concede that HotSpot is a rare well-engineered piece of concurrent code, and that deadlocks-in-the-field appear to happen more often to other large programs.  But still, fixing after the fact is reasonable when the deadlock rate is so low and each fix 'sticks'.

Instead of deadlock, HS crashes far far more often because the locks don't cover the right set of changes.  Switching out the locks for an STM didn't change what I was guarding; it only removed any fine-grained-lock errors (admittedly useful... but only incrementally more so than avoiding deadlocks).  I'm still stuck with a program that's too Big to see where the proper atomic/STM/locking boundaries need to be.  In a trivial example I can say "go up one call level and atomic there", but in the Real Program - I can't do that.  Go up how many layers and add atomic?  1 layer?  10 layers?  100 layers?  Yes, I see unique call-stacks with >100 layers.  I can't put atomic around main because that makes my program single-threaded.

Here's where I want some kind of compiler support.  Ref helps - because it at least demands I have an atomic around the Ref.  But Ref is insufficient, because a syntactically correct program simply wraps an atomic around each Ref - exactly what my trivial example did.  I'd like to be able to specify classes & groupings of Refs that have to be wrapped in atomic together - and that the compiler will complain about.  Yes, I'm still responsible for the semantics - I have to tell the compiler which groupings of Refs are interesting - but I'd like some kind of automatic support, so that as my program goes from 10 lines to 10MLOC I can be told "you touched both Ref Person._checking._money and Ref Person._savings._money without wrapping both Ref accesses in a single atomic, that's a data-race error".
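As an illustration of the kind of support being asked for here, the following is a purely hypothetical sketch - not a Clojure feature, and only a runtime approximation of the compile-time check Cliff wants.  The names grouped-ref, galter, and with-group are invented for this sketch: Refs carry a :group tag in their metadata, and grouped updates are refused outside an explicitly opened group.

    ;; HYPOTHETICAL sketch - not part of Clojure.  A runtime check, not the
    ;; requested compiler support: Refs are tagged with a :group, and galter
    ;; asserts that the group was opened via with-group.
    (def ^:dynamic *open-groups* #{})

    (defn grouped-ref [group init]
      (ref init :meta {:group group}))

    (defn galter [r f & args]
      (let [g (:group (meta r))]
        (assert (contains? *open-groups* g)
                (str "Ref in group " g " updated outside with-group"))
        (apply alter r f args)))

    (defmacro with-group [g & body]
      `(binding [*open-groups* (conj *open-groups* ~g)]
         (dosync ~@body)))

    ;; Usage: both accounts belong to :money, so updating either through
    ;; galter outside (with-group :money ...) trips the assert.
    (def checking (grouped-ref :money 100))
    (def savings  (grouped-ref :money 100))

    (with-group :money
      (galter checking - 50)
      (galter savings  + 50))

Note the gaps: a direct alter bypasses the check, and nothing verifies that *all* members of a group were touched in the same transaction - which is exactly why Cliff wants the compiler involved.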

Rich Hickey

On May 25, 11:04 am, cliffc wrote:

Not interesting *in practice*.  Because in practice, deadlocks are easy to find & fix.

I think you'll find plenty of dissent on that point.

Deciding at what level to add the next atomic (should it be around transfer or at a higher level or not at all?) demands the programmer have domain knowledge not already encapsulated in the Account class.  THIS is the source of vast numbers of bugs.

That's a problem with the Account class and OOP - classes are simply the wrong granularity for so many program semantics.  Yet, programmers demonstrate they can handle this, and set appropriate transaction boundaries, because they do so in their database work.  There is a pretty close analogy there, since many DBMSs will treat every statement in a batch as an independent transaction unless collected in an explicit transaction.  In spite of the same risks, work precisely like this is getting done correctly, and relatively easily, I think in no small part due to the simplicity of wrapping a bunch of arbitrary statements in BEGIN TRANSACTION.

Less trivial examples quickly spiral out of control....

What you are asking here is a philosophical question, which computers may never answer, since the answer depends on the semantics of the program, and such semantics will probably never be conveyed to the system more easily than:

    (defn transfer [a1 a2 m]
      (dosync
        (add a1 m)
        (sub a2 m)))

or the Java-esque:

    transfer( Account a1, Account a2, long m ) {
      atomic {
        a1.add( m);
        a2.add(-m);
      }
    }

Absent some similar declaration of relatedness, the program will have to understand itself (or one program another), since while it may look like debit/credit to us, it just looks like foo/bar to it.

I hope you don't wait for an answer to this (hard AI, potentially impossible) problem before dipping your toes in STM - it's likely you could make a significant contribution.

Rich Hickey

Towards that end, and more generally, it would be nice if there were metrics for success other than anecdotes from the field.  There should be some meaningful problem statement STM and other solutions could take aim at, something more real-world than the typical STM paper's correctness test or Erlang's 'pass messages around in a circle' and 'accept connections until you die' benchmarks, and more attainable for a new technology than the million-line systems Dr. Click and Brian Goetz mentioned in their final Java One talk, or Erlang's nine-9s phone switches.  Then everyone could run proposed solutions on their own 2/16/multi-hundred-processor systems and say - nope, not quite, this works, this doesn't.

Unfortunately, describing such a problem in an architecture-neutral way is difficult.  I think there can be a certain presumption that any concurrency solution should allow people to architect systems much as they do now, as a graph of directly-connected mutable stateful objects.  Dropping STM (and many of the other concurrency mechanisms) into such an architecture is likely to forever disappoint.  IMO, if you're not serious about immutability, you're likely to struggle with concurrency.

On my side, points were taken re: transparency and control for STMs.  The Clojure STM's current architecture is certainly capable of both better reporting and control.  So I'll be working on adding the necessary diagnostic counters, and exposing the many aspects of the Clojure STM that could become 'knobs' for performance tuning - timeout windows, retry counts, history cache sizes etc.  So, I have some work to do before I'd like that kind of scrutiny :)

Cliff Click

On my side, points were taken re: transparency and control for STMs.  The Clojure STM's current architecture is certainly capable of both better reporting and control.  So I'll be working on adding the necessary diagnostic counters, and exposing the many aspects of the Clojure STM that could become 'knobs' for performance tuning - timeout windows, retry counts, history cache sizes etc.  So, I have some work to do before I'd like that kind of scrutiny :)

I'm going to be a sourpuss again here.   :-(
This is exactly the trap MPI fell into; and you have to do it anyways.   Double-unsmiley.   :-(   :-( 

Here's the deal:


  • I write a Large Complex Program, one that Really Needs STM to get it right.   
  • But performance sucks.   
  • So I do a deep dive into the STM runtime, and discover it has warts.   
  • So I hack my code to work around the warts in the STM.   
  • Crap like: at an average of 5 cpus in this atomic block the STM 'works', but at an average of 7 cpus in the same atomic block I get a continuous fail/retry rate that's so bad I might as well not bother.  So I guard the STM area with a "5-at-a-time" lock and a queue of threads waiting to enter.  Bleah (been there; done that - for a DB, not an STM, but a same-in-principle situation).  A thousand thousand variations of the same crap happen, each requiring a different hack to my code to make it performant.   
  • Meanwhile the STM author (You: Rich) hacks out some warts & hands me a beta version.   
  • I hack my code to match the STM's new behavior, and discover some new warts.

Back & Forth we go - and suddenly: my app's "performance correctness" is intimately tied to the exact STM implementation.  Any change to the STM's behavior kills my performance - and you, Rich, have learned a lot about the building of a robust good STM.  You (now) know the mistakes you made and know it's time to restart the STM implementation from scratch.

My options are limited:


  1. Scream at you to Not Change A Thing.  Thus C/C++/Clojure Standards Committees are born.  1/2 smiley :-P   
  2. Cling tenaciously to the abandoned version, and recognize my code is now abandon'd ware.  No new features or support for me.  But maybe the App is running fine and I can forget about it.   
  3. Rewrite my app from scratch Yet Again, to match the new STM's warts/features.

MPI is in this position.  Every large parallel scientific App Out There is intimately tied to the marriage of parallelizing_compiler + MPI_runtime + computer_architecture.  Changing any one of those pieces requires the app to be rewritten from scratch in order to achieve a fraction of the required performance.

The parallelizing compilers have stabilized, performance-wise.  There's a coding style which will auto-vectorize/auto-parallelize/auto-stripe/etc, there's some code "on the edge" (some compiler+hardware combos yes, some no), and there's a well-known "don't go here, the compiler can't hack it".  The compiler support for STM is very much lacking and/or in flux.  You're heading into this terrain now, as you add Clojure compiler support to your STM.  I, as an STM user, can't know what's in store for me as you go through the learning process.  I know that I don't know what STM coding styles will be performant and what will not.

The MPI_runtime has now stabilized (I believe, not sure) after 20+ years.  Again there's the "this is good, this is marginal, this is bad" folk wisdom that drives coding.  Again, for STMs, this area is very much in flux.  Go read some of the bazillion academic papers Out There.  Everybody's got their own take, every STM is good in this domain & bad in some other domain - and all the domains are different; god forbid I write a large program dependent on STM 'A' and later try to switch over to STM 'B'.

The computer_arch has been stable for message-passing for some time.  For STM, I believe it's trying to ramp-up.  i.e., the computer_arch for STM is "all the world's an X86" right now, and all hardware vendors are furiously studying what it would take to add some kind of STM/HTM/hybrid support.

Thus, for STM to make it to the 'Promised Land' - the STM industry needs to figure out:


  • what belongs in the compiler   
  • what belongs in the runtime   
  • where there are Dragons and where there are Green Pastures   
  • teach the STM users which paths lead to Dragons or Green Pastures.

If we BOTH don't go through the exercise, then STM will never hit the promised land.

MPI never really "made it"; the 'Green Pasture' area was too small, and it was always surrounded by steep performance cliffs leading to Dragons when you slipped off the edge.  Upgrade your hardware: rewrite your app.  Upgrade your compiler: 2 months of performance debugging. Upgrade your MPI: 2 months of performance debugging to discover you need to rewrite your app.

GC "made it"; the Green Pasture gradually got bigger & bigger; Dragons remain lurking - but only after you filled up an ever-growing Green Pasture and needed to poke at the edges of stability.


We Agree to Press On, Bringing Light into the Darkness

Raoul Duke

it would be pretty nifty if you two had the chance to get together to try out & blog about Clojure STM on an Azul box :-)

Rich Hickey

it would be pretty nifty if you two had the chance to get together to try out & blog about Clojure STM on an Azul box :-)

That could be fun.

Cliff Click

 

    it would be pretty nifty if you two had the chance to get together to try out & blog about Clojure STM on an Azul box :-)   

  That could be fun.

Send me an email; Azul hands out 'academic' accounts all the time.  You get a remote login; you upload your code & run it.

And, if you don't mind, I'm going to edit this entire thread for clarity and echo it on my blog. This is good stuff (this conversation), and I definitely wish you the best of luck.

Cliff


 

Comments

http://research.microsoft.com/~simonpj/papers/stm/beautiful.pdf

This paper by Simon Peyton Jones on STM has a linguistic solution for the problem you identify re:

    transfer( Account a1, Account a2, long m ) {
      a1.add( m);
      a2.add(-m);
    }

Posted by: Josh Dybnis | May 28, 2008 6:39:21 PM

 

I keep thinking that you should meet Simon Peyton Jones.  The Haskell implementation of STM solves many of the problems you found (e.g. STM operations have a type different from plain IO operations; pure operations have yet another type, so we always know where the concurrent part starts and where it ends), while I agree that some of the problems you talk about do happen in Haskell's STM (and are known issues).  It just seems that the two of you have different backgrounds and could learn a great deal from each other.

Posted by: Daniel Yokomizo | May 29, 2008 6:39:18 AM

 

> @Josh Dybnis said:
> This paper by Simon Peyton Jones on STM has a linguistic solution for the problem you identify

No, it doesn't. With the mechanisms in that paper, once they strip off the STM monad using atomically, operations like transfer are in the IO monad and can no longer be composed at all!

From the paper:

"However, as soon as you wrap a block in atomically, making it an IO type, it can no longer be combined atomically with other actions."

Thus this couldn't be made atomic:

    shuffle-transfer (A, B, C, m) {
      transfer(A, B, m)
      transfer(B, C, m)
    }

At least with Clojure's STM, you could nest the two calls to transfer in another transaction if the business rule was that shuffle-transfer constitute a single unit of work.
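A hedged sketch of that nesting in Clojure (function and account names are mine, illustrative only):

    (defn transfer [a1 a2 m]
      (dosync                  ; commits on its own at top level
        (alter a1 + m)
        (alter a2 - m)))

    (defn shuffle-transfer [a b c m]
      (dosync                  ; the nested dosyncs in transfer join this
        (transfer a b m)       ; transaction, so the whole shuffle commits
        (transfer b c m)))     ; (or retries) as a single unit of work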

The problem proposed was: given calls to 2 or more independently transactional operations, how can the system 'know' when they need to be grouped in a higher-level transaction and when they are correct as independent actions?  I contend this is not a problem of locks or STM but one of machine understanding, and the system should no more be expected to do that than, given:

if(a b c d e) ...

it should be expected to know where to put the 'and's and 'or's between the conditions.  It is a question of program intent and meaning, any specification of which is going to have to include and/or/atomic for the foreseeable future.

Posted by: Rich Hickey | May 29, 2008 7:08:34 AM

 


> The problem proposed was: given calls to 2 or more independently transactional operations, how can the system 'know' when they need to be grouped in a higher-level transaction and when they are correct as independent actions?  I contend this is not a problem of locks or STM but one of machine understanding, and the system should no more be expected to do that than, given:
>
> if(a b c d e) ...
>
> it should be expected to know where to put the 'and's and 'or's between the conditions.  It is a question of program intent and meaning, any specification of which is going to have to include and/or/atomic for the foreseeable future.


Yes, but this isn't quite the problem I'm hoping to see solved (and yes, *this* problem isn't getting solved anytime soon!).

What I want to see solved is a way to declare that groups of Refs are somehow related, so that:
- groups of updates to them must be wrapped in 'atomic', or
- the programmer can explicitly declare 'this is OK not being atomic with that', or
- perhaps some State Machine support, such that the compiler can verify that some 'State' of the combination of variables is possible (or the programmer can declare a State to be unreachable by any interleaving), etc.

Cliff

Posted by: Cliff Click | May 29, 2008 7:41:06 AM

 

Rich: as far as I can tell, a nested dosync in Clojure becomes a no-op? There's no reason this can't be done in Haskell except then you lose something else, the `orElse` call, which adds a great deal of expressivity.

Anyway, it's hardly the same thing. Transfer should *always* have a type of STM rather than IO. In common Haskell idiom, IO calls tend to be restricted and isolated as is, and composed later. Since calling transfer doesn't actually "do" anything, but just produces an "action" (of type STM or IO depending on how you write it) the point is that the caller decides where "atomically" needs to be placed at the point where they actually *run* it.

Posted by: sclv | May 29, 2008 8:29:54 AM

 

>Rich: as far as I can tell, a nested dosync in Clojure becomes a no-op?

A nested dosync joins the enclosing transaction rather than start a new one.

> Anyway, it's hardly the same thing. Transfer should *always* have a type of STM rather than IO.

And yet, in the paper it wasn't, and I think in real life it often won't be either. Unless your program is going to be a single transaction, it will at some level be composed of functions which themselves encapsulate transactions, and someone is then going to want to atomically compose those functions.

> the point is that the caller decides where "atomically" needs to be placed at the point where they actually *run* it.

The person who wrote transfer thought they were actually running withdraw/deposit.

IMO, the ability to compose transactions via nesting is a real-world requirement.

Posted by: Rich Hickey | May 29, 2008 9:26:12 AM

 

@Cliff Click: What you want seems to be related to regions (as in region-based memory management or region inference).  So we could declare Refs to belong to a particular region, and operations on these Refs would require a transaction in that particular region; we could then ensure that Refs from different regions can't be updated in the same transaction.  Is that what you're talking about?  AFAICS it could be implemented in Haskell without changing the underlying STM runtime (later the runtime could be made aware of it for optimizations).

Right before STM was introduced in Haskell I was playing with this idea of lockable variables, where each variable that required locking knew which lock should be used to guard it (essentially the lock was a region); using monads it was possible to group a bunch of operations on lockable variables (similar to the ST or STM monads) and have a synchronize operation that acquired the necessary locks and released them afterwards.  I dropped the idea before implementing nestable locks (i.e. acquiring l1 implies acquiring l2) because the type machinery was beyond my skills (but it's possible to do in Haskell), and because I learned about STM and jumped on the shared-nothing/message-passing bandwagon (which I'm still on today).  Anyway, using lockable variables the type system ensured that no deadlocks could ever happen; perhaps it would be interesting to get back to that lib again.

Also I agree with sclv: in Haskell it would be considered bad practice to add atomically to transfer if no other IO action was performed, because it would be more expressive to have it be an STM action and let the client compose it.  If it was necessary to have other IO actions (e.g. saving to a database) then the operation couldn't be composed in any way.

Posted by: Daniel Yokomizo | May 29, 2008 9:48:16 AM

 

@Rich Hickey: In Haskell (I think you already know this, so I'm just saying it to be clear to anybody else) everything that has type "STM a" is a transaction already; we just use atomically to "execute" the transaction (i.e. the type is "atomically :: STM a -> IO a", which means: take this transaction and give me back an IO action that embodies all the visible side effects of executing it to the end, so I can do other IO stuff with it).  If the programmer called atomically, it's because at that point he was interested in the irreversible side effects of the transaction; probably he's doing other IO stuff with it.  Calling atomically without anything else afterwards is usually a bug and bad practice, just as it would be to create a pure function and end it with "return ..." to make it an IO action instead.

Of course the programming language can't enforce good programming, but the type system does a lot to ensure that the STM is safe.  Clojure has nested transactions but doesn't guarantee that irreversible actions don't happen inside a transaction; Haskell OTOH ensures that no transaction has irreversible actions (other than the ones using the unsafeXXX functions, but they're unsafe as the name says, not part of the actual standard, and not necessary for Haskell concurrency support), so it can't allow nested calls to atomically.

But since in Haskell we don't need anything special to add to a transaction (I understand that in Clojure the practice is to explicitly use dosync in every block that uses a transaction), as transactions form monads we can use monadic syntax to group them, and atomically is only required when we want to execute one.  That is, dosync in Clojure doesn't ensure that the transaction will be executed - it's only executed if there's no enclosing dosync in the dynamic scope - so dosync and atomically aren't quite the same, and a Haskell translation of a Clojure program would end up having many fewer atomically's than dosync's.

Posted by: Daniel Yokomizo | May 29, 2008 10:06:11 AM

 

OK, a brief point on the "MPI" problem, which I think is a very legit issue to raise - just as holding onto locks forever creates a bit of a mess, you'll obviously get more rollbacks and contention with larger transactions.  But restricting access to a single region doesn't make much sense, because with atomic transactions you're not breaking up large event loops but rather marking sets of things that either happen or don't.  (Which means, incidentally, that you're solving deadlocks as a side effect, but more importantly solving the places where you *don't* acquire/create the correct locks.)

In any case, the point is that one will generally have many fairly tiny individual transactions (often purely read-only) and occasionally have a very expensive transaction.  In which case, if this becomes a performance issue - and this is almost regardless of runtime performance (although scheduling strategies are interesting to explore) - a single global lock such that the large transaction "stops the world" for the small transactions while it runs will suffice.  And the beautiful part is that such a lock can be written directly using TVars, without any need for recourse to standard locks.  And even better, such a lock can be written to *keep* optimistic behavior on the part of the small transactions, if so desired.

Posted by: sclv | May 29, 2008 11:16:31 AM

 

I think you've missed the point of STM really... I mean, yeah, when you call "atomically" you've "shaved off" the STM monad and STM won't help you anymore - that's a feature, not a bug.  Complaining about that is like saying inheritance is broken because you can't inherit from a "final"/"sealed" class.  If you don't want to disallow composition for two transactions, then don't call atomically!  Just leave them as transactions, and let the user call "atomically" when they actually want to execute the transaction (irreversibly).

If you do something like

type Account = TVar Int

deposit :: Account -> Int -> IO ()
deposit acct money = atomically (modifyTVar acct (+ money))

Then you are *explicitly* demanding that nobody must ever be allowed to compose this "deposit" action with any other transaction. You are *asking* for it to behave that way, so it's not a bug that it does. The correct version is, of course:

deposit :: Account -> Int -> STM ()
deposit acct money = modifyTVar acct (+ money)

Which *does* compose with other transactions. Like so:

transfer :: Account -> Account -> Int -> STM ()
transfer acct1 acct2 money = do
  deposit acct1 money
  deposit acct2 (-money)

Which can in turn be composed with other transactions etc. etc., in an infinite chain of perfect composability.

If you want a transaction to play nice with other transactions, then don't call "atomically", as the purpose of that function is to run the transaction irreversibly - which is precisely the opposite of what you want if you want to be able to compose it!

Posted by: Sebastian | May 29, 2008 1:11:31 PM

 

To clarify:
In my "transfer" example above, I didn't wrap it in atomically, why? Because the user might want to write something like your "shuffle-transfer", and if he in turn wants that function be reusable, he wouldn't wrap those two calls to "transfer" in "atomically" either. Just leave the atomically's out, and everything remains composable. Atomically isn't used to identify transactions, it's used to *run* transactions.

After all, if you really do want a transfer to run irreversibly, it's quite easy to just call atomically on it, so why would you limit any transaction by "sealing" it so nobody can use it in their transactions?  Don't call atomically unless you actually intend for that transaction to be uncomposable with other transactions and irrevocably executed!  In other words: so long as a modification to shared state actually *is* composable (i.e. it doesn't do anything irrevocable), you should *never* wrap it in atomically, as that would artificially force it to become uncomposable.

Really it seems like your "counter example" is more like the motivating example for any STM paper. It's *precisely* this problem that STM (at least the Haskell version) *does* solve - composable transactions. So if you think this is something that hasn't been solved yet, then I would really recommend that you look into it a bit deeper, and please do try your hand at an implementation. The more the merrier!

Posted by: Sebastian | May 29, 2008 1:25:27 PM

 

@Sebastian

Are you talking to me?

If so, please stop regurgitating the basic mechanism - I understand it.

I referenced the transfer function as written by SPJ himself in the cited paper:

transfer :: Account -> Account -> Int -> IO ()

You can argue (with him please) it should be otherwise. But the fact remains that someone, at some point, is going to have to drop out of the STM monad in order for the transaction to commit, and once they do, composability stops. In the Clojure STM, it doesn't stop - you can write things that are independently transactional (i.e. when you call them by themselves they commit on their own) and when you nest them in a transaction they join and commit with it. Just like with databases. It is a powerful and useful capability.

Posted by: Rich Hickey | May 29, 2008 2:17:26 PM

 

The only reason to drop out of STM is if you need to do something which is not transactional (like print to the screen), and if you're doing something which is not transactional, then you have no business expecting transactional composition to work.

I don't see why it's so useful to be able to call something and have it commit "on its own"; if you need a transaction to commit, you just add "atomically" in front.  It's not like that's a huge deal.  If you need to compose transactions, you just compose them.

The basic point is that the example posted is precisely the very thing that STM solves. If you explicitly opt out of STM by (prematurely?) committing the transaction, then you can't really expect it to do anything for you (just like you can't expect to inherit from a sealed/final class - if you want inheritance, don't mark it as sealed/final, and if you want composability of your transactions, then don't commit them prematurely). As long as you actually write transactions, they will be composable, so the "counter example" that was posted, is not a counter example at all, it's trivially solvable.

Posted by: Sebastian | May 29, 2008 2:32:50 PM

 

Another clarification: What the counter example is saying is essentially "Once I've explicitly demanded that transactional composition stops and the transaction should be run, it won't compose anymore!". I don't see how this is a problem.

Posted by: Sebastian | May 29, 2008 2:37:05 PM

 

*MY* real complaint isn't composability of transactions - like Rich I believe composability (via nesting) is an absolute requirement.

MY complaint is that this whole STM/locking mess misses the point: there's no language support for determining the proper places (proper for correctness, much less performance) to insert the 'atomic' boundaries.

Putting 'atomic' at the finest level - around each 'Ref' - is obviously silly: syntactically correct but useless.

Putting 'atomic' at the coarsest level: around 'main' is silly in the other way - a single-threaded program.

Putting 'atomic' in the middle requires the programmer to add domain-specific knowledge - and to track how that knowledge applies across his program.

I have a weak mind; I *like* Strong Typing because it allows my domain-specific knowledge to be put in one place (the type system) and the compiler merrily enforces my will across the million-line program. Not so with either Locks or STM - each placement of 'atomic' or 'synchronized' is an adventure unto itself, with no compiler support to tell me where I missed. I must apply my Domain Specific Knowledge again and again and again ....

Cliff

Posted by: Cliff Click | May 29, 2008 3:03:54 PM

 

I don't see the problem. You put the "atomic" where you need it to actually commit something (e.g. if you want to do something non-transactional to the result of a transaction)... If you *can* put the atomic around "main" (i.e. there is nothing irreversible in main), then that's not silly, that's precisely what you should do!

I really don't see the problem. Transactions compose, and once you need something to commit, you call "atomically" to turn it into a regular non-transactional action, this should happen as high up as possible (i.e. *only* if you really need to do something non-transactional) to leave your options open for reuse.

Posted by: Sebastian | May 29, 2008 3:14:26 PM

 

Oh, and I also don't see your point about strong typing and STM. Haskell is very strongly typed, and it does tell you when you mess up (e.g. try to do I/O within a transaction).

Posted by: Sebastian | May 29, 2008 3:16:41 PM

 

In the paper, two statements are composable into a (guaranteed) atomic statement if and only if they are wrapped in the STM monad. The "meaning" of the STM monad is that the statements are composable. This system eliminates bugs. It makes it apparent that the shuffle-transfer example is not transactional.

@Cliff, I think the mechanism you are looking for would be (in the vernacular of Haskell) like a family of nameable STM monads that are mutually compatible (composable). Instead of one global STM monad, statements/objects could be tagged with a combination. And this would be enforced upstream until they are finally executed (atomically). Then the system would acquire the locks (or take the equivalent optimistic steps).
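
One hedged sketch of what that tagging might look like (hypothetical names, not an existing library): give the STM wrapper a phantom 'family' parameter, so statements and refs from different families simply fail to unify.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Concurrent.STM

-- 'r' is a phantom tag naming a composition family.
newtype RSTM r a = RSTM (STM a)
  deriving (Functor, Applicative, Monad)

newtype RRef r a = RRef (TVar a)

data Accounts  -- two example family names
data Logging

-- Refs carry their family's tag; an 'RRef Accounts' cannot be
-- touched from an 'RSTM Logging' transaction - a type error.
readRef :: RRef r a -> RSTM r a
readRef (RRef v) = RSTM (readTVar v)

writeRef :: RRef r a -> a -> RSTM r ()
writeRef (RRef v) x = RSTM (writeTVar v x)

-- Finally executing a family's statements drops back into IO.
runFamily :: RSTM r a -> IO a
runFamily (RSTM m) = atomically m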

Posted by: Josh Dybnis | May 29, 2008 3:46:04 PM

 

> If you *can* put the atomic around "main" (i.e. there is nothing irreversible in main), then that's not silly, that's precisely what you should do!

If I put 'atomic' around "main" (some very high point in the program, but probably not literally "main"), then I create very, very large atomic regions, and the underlying STM must force my program to be single-threaded. I.e., it might be correct, but it's too slow to care about!

If I had done this silly thing 5 years ago and my program was too slow, I just waited a year: my program got faster when I bought the latest machine. If I do this silly thing now and wait a year, I'll get more CPUs, but the program won't run faster - it can't run in parallel; the STM has no chance to allow any concurrency.

Cliff

Posted by: Cliff Click | May 29, 2008 3:56:50 PM

 

@Cliff, putting 'atomic' around the whole program does not imply single threading. It only implies ordering among the reads and writes to Transactional variables within the program. All other computation is unconstrained.

Posted by: Josh Dybnis | May 29, 2008 4:00:31 PM

 

> Oh, and I also don't see your point about strong typing and STM. Haskell is very strongly typed,

So are Java and C++ (when you don't cheat), etc....

I'm trying to make an analogy here (apparently a poor one): remember the satirical "Strong Typing is for Weak Minds"? It's satire because we obviously *can* write correct programs without strong typing; it's just that we mere Weak-Minded humans can't write more than a few thousand lines of code without Strong Typing.

The same situation applies to STM & Locks: we obviously CAN write large correct STM (or locking) programs - except that in practice we mere mortals can't write more than a few thousand lines of such. Through great heroics and fantastically Strong Minds, HotSpot gets close: ~500 KLOC of heavily concurrent program. We 'cheat' some: we use coding styles and asserts to prevent some kinds of concurrency bugs. But by no means all! If I simply magically rewrote HotSpot in Clojure or Haskell - I'd have NEARLY THE SAME SET OF CONCURRENCY BUGS.

Yup - same set of bugs. Huh, you say? How can that be?

Well, the notion of no-sharing-except-via-Refs would force me to annotate all my shared state. But I've already done that! So only some small help arises here: over time I'll make mistakes in my annotation coding style because there's no compiler support, and Refs would catch those.

My real bugs arise in 2 competing places (the first is sketched below):
- failure to wrap 'atomic' around a large enough region thus allowing a data race
- wrapping 'atomic' around too large a region thus crushing performance to the point of uselessness
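
A minimal sketch of the first failure mode, with a hypothetical account balance in a TVar - each access is atomic, but the check-then-act pair is not:

import Control.Monad (when)
import Control.Concurrent.STM

badWithdraw :: TVar Int -> Int -> IO ()
badWithdraw acc n = do
  bal <- atomically (readTVar acc)          -- one transaction...
  when (bal >= n) $
    atomically (writeTVar acc (bal - n))    -- ...and a separate second:
  -- another thread can change the balance between the two,
  -- so the check races and the account can go negative.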

STM isn't any better than Locks here (but Refs are somewhat better than no Refs).

Cliff

Posted by: Cliff Click | May 29, 2008 4:06:55 PM

 

Cliff, no, that does not force single-threadedness where there isn't single-threadedness already. A transaction is sort of single-threaded already - it implies isolation and atomicity, so you can consider it single-threaded - but multiple transactions can be "run" at once by calling atomically on different threads.
An atomic region will run concurrently with any other atomic region. So something like:

import Control.Concurrent (forkIO)
import Control.Concurrent.STM

-- foo, bar :: TVar Int -> STM () are assumed to be defined elsewhere
main :: IO ()
main = do
  x <- newTVarIO (0 :: Int)         -- newTVar is an STM action; newTVarIO creates the TVar in IO
  _ <- forkIO (atomically (foo x))  -- foo runs as one transaction
  _ <- forkIO (atomically (bar x))  -- bar runs concurrently with foo
  return ()

Is perfectly fine. Both foo and bar will run concurrently.

Of course, if you have two actions, x and y, that should *not* be a transaction (i.e. they are totally independent), you *could* still put them in a transaction:

-- x and y are independent STM actions, taken as parameters here
foo :: STM () -> STM () -> STM ()
foo x y = do
  x
  -- don't really care if anything
  -- read by x changes here
  y

This may cause performance issues, because the transaction gets longer than it needs to be and is more likely to be rolled back, but I don't think this is really a problem - it's a matter of simply writing something different from what you mean.
E.g. I can't see a realistic case where foo should *sometimes* be completely atomic and *sometimes* execute x and y each in its own atomic block. Either something is a transaction or it isn't. If the logical meaning of your program is that x and y have to happen as one, you put them in a transaction; if the logical meaning is that x and y can/must execute as two different atomic steps, then that's what you write. The code maps directly to the logical meaning of your program.
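
The two meanings even have different spellings - a minimal sketch, with x and y as parameters:

import Control.Concurrent.STM

-- One transaction: x and y happen as a single atomic unit.
bothAtOnce :: STM () -> STM () -> IO ()
bothAtOnce x y = atomically (x >> y)

-- Two transactions: each commits on its own.
eachAlone :: STM () -> STM () -> IO ()
eachAlone x y = atomically x >> atomically y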

Posted by: Sebastian | May 29, 2008 4:19:05 PM

 

>My real bugs arise in 2 competing places:
>- failure to wrap 'atomic' around a large enough region thus allowing a data race
>- wrapping 'atomic' around too large a region thus crushing performance to the point of uselessness

But are these really "bugs"? The compiler obviously can't magically figure out which variables need to be updated as one unit and which require that you don't. E.g. if you want to swap two variables, they need to be swapped atomically; but if you send a message to someone and then want to receive a message back, you *need* the first message to commit before you block on the message queue for the reply. This is information that you *have* to specify somewhere, and it maps *directly* to "atomically" clauses. Put "atomically" around stuff that needs to happen atomically - but *don't* put it around everything that's transactional; imagine the function was called "commit" if you like.
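
A minimal sketch of the send-then-wait case, with hypothetical request/reply channels:

import Control.Concurrent.STM

rpc :: TChan req -> TChan resp -> req -> IO resp
rpc request reply msg = do
  atomically (writeTChan request msg)  -- commit the send first...
  atomically (readTChan reply)         -- ...then block for the reply
-- Wrapping both steps in one 'atomically' could never succeed:
-- the send would never become visible to whoever writes the reply,
-- so the transaction would block on an empty channel forever.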

Posted by: Sebastian | May 29, 2008 4:27:03 PM

 

> @Cliff, putting 'atomic' around the whole program does not imply single threading. It only implies ordering among the reads and writes to Transactional variables within the program. All other computation is unconstrained.

This is naive in a large program. Sure, "all other computation is unconstrained" - except that "all other computation" depends on that ordering. In embarrassingly parallel apps I might find lots of work without needing to touch any TVars - but for such programs there are lots of coding styles that work.

Another Stupid Gedanken - intended to be extrapolated from, not 'solved as is': Thread1 reads from a Java HashTable & does some work, then writes back a result; Thread2 reads from the same HashTable & does work... etc., for 1000 threads. Only the HashTable is shared. The table is very large and the workers usually never touch the same data (sometimes, but rarely).

HashTable is sync'd in Java, thus single-threaded - and performance is limited to 1 CPU. So I swap the 'sync' for 'atomic' and otherwise have the identical program. Performance should be nearly 1000x, no? Well... it depends on the quality of the STM and the collision rate and...

Nope: performance will be that of a single CPU!?!?!?!???

Why? Hashtable has a shared 'mod counter'. Each thread has to update that counter. That bit of shared data (each thread doing a read-modify-write on a single shared word) will force all transactions to serialize.
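
In STM terms, a sketch of the conflict (the transactional 'modCount' is hypothetical):

import Control.Concurrent.STM

-- Every table update bumps the shared counter inside its transaction.
tablePut :: TVar Int -> STM () -> STM ()
tablePut modCount doWrite = do
  doWrite                      -- the real work; rarely conflicts
  n <- readTVar modCount       -- but every writer reads...
  writeTVar modCount (n + 1)   -- ...and writes this one shared word

Any two concurrent writers overlap on modCount, so one of them must roll back and retry.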

Aha, you say! Remove the stupid mod counter; nobody uses it... but wait: this is a Gedanken - don't try to solve Hashtable's performance; extrapolate to what happens when I try to save the performance of my million-line program.

It's easy to fix HashTable; but I can't do that same fix a million times over, each for slightly different variations of things.

Lesson: in a large program you *will* be data-dependent on stupid-but-shared crap. Most of your time (that's not spent chasing bugs) is spent finding & pulling out such crap. If I put 'atomic' around each thread in WebLogic's thread pools, the STM is very, very rapidly going to be running WebLogic single-threaded.


Cliff

Posted by: Cliff Click | May 29, 2008 4:33:13 PM

 

Actually, I don't think that hashtable example will run single-threaded.
The work you've done by the time you update the mod counter won't be lost unless someone else commits a change to the mod counter between the time you read it and the time you write it.
The point is that with enough threads, the variable with the main contention (the mod counter) will be read and updated very rapidly, over and over, by different threads. Each time a thread *does* manage to successfully update the mod counter, its work gets committed, and as soon as that happens another thread will probably be ready to update the mod counter and commit (so you don't wait "serially" for the work that second thread did - it did its work in parallel). So the only thing that gets single-threaded is the update to the mod counter; the rest of the work is still parallelized.

Posted by: Sebastian | May 29, 2008 4:47:47 PM

 
