Designing APIs for Asynchrony

New 2014-06-30: You can contain Zalgo with the dezalgo module

Some people asked me to explain what I meant by “releasing Zalgo” in async APIs, so I thought I’d share the following guidelines.

Please follow all of these rules in your programs. If you deviate from them, you do so at the peril of your very sanity.

Get Over Your Provincial Boilerplate Preferences

Personally, I don’t actually care all that much about the whole promises/callbacks/coroutines/generators debate. It’s provably trivial to turn any callback-taking API into a promise-returning API or generator-yielding API or vice-versa.
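
To make that concrete, here is a minimal sketch of both conversions. The helper names (`promisify`, `callbackify`) and the sample `addLater` function are made up for illustration, and the sketch assumes the wrapped function takes an error-first callback as its final argument:

```javascript
// Hypothetical helper: turn a callback-taking function into a
// promise-returning one. Assumes an error-first callback in the
// final argument position.
function promisify(fn) {
  return function() {
    var args = Array.prototype.slice.call(arguments);
    var self = this;
    return new Promise(function(resolve, reject) {
      args.push(function(er, data) {
        if (er) reject(er);
        else resolve(data);
      });
      fn.apply(self, args);
    });
  };
}

// Hypothetical helper for the reverse direction: wrap a
// promise-returning function so it takes an error-first callback.
function callbackify(fn) {
  return function() {
    var args = Array.prototype.slice.call(arguments);
    var cb = args.pop();
    fn.apply(this, args).then(
      function(data) { cb(null, data); },
      function(er) { cb(er); }
    );
  };
}

// A made-up callback-taking API to convert back and forth.
function addLater(a, b, cb) {
  setTimeout(function() { cb(null, a + b); }, 0);
}

var addLaterPromise = promisify(addLater);
var addLaterCallback = callbackify(addLaterPromise);
```

Neither wrapper changes when the result arrives; they only change the boilerplate the caller writes.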

I’m of the opinion that breaking functionality into small enough chunks that your boilerplate doesn’t matter is much more important than which boilerplate you decide tastes the yummiest. In JavaScript, we have first-class language support for passing functions around, so I tend to use callbacks, simply because it requires no extra “stuff”.

What’s important to me is that an API be useful, performant, and easy to reason about. When I’m writing JavaScript, I care a lot about how V8 is going to optimize it, and so try to choose patterns that it can optimize nicely. As of today, generators just aren’t first-class enough, and Promises require an extra abstraction. That may change someday, but these principles probably will not.

So, I’m going to say “callbacks” below, but you can mentally substitute whatever thing you like. The fundamental points are equally relevant if you’re using threads, or gotos, or any other abstraction or language for doing things asynchronously and controlling what the computer does next.

Note that all of this also applies if your asynchronous API “looks synchronous”. Even if you design a thing so that I do:

x = foo();

and it’s completely agnostic at authoring time whether foo() spins on CPU or yields while it awaits a network response, being able to reason about the program means that I need to be able to know whether other parts of the program will be running while I’m waiting for foo() to return a value to be put into x.

So, even if the syntax is the same (which has its benefits as well as its drawbacks, of course), the semantics must be crisply defined.

Or else Zalgo is released.

Do Not Release Zalgo

Warning: Before proceeding any further, go read this excellent essay by Havoc.

If you haven’t read Havoc’s post on callbacks, then you will perhaps be tempted to make silly arguments that make no sense, and demand justification for things that have been established beyond any reasonable doubt long ago, or tell me that I’m overstating the issue.

Have you read Havoc’s post on callbacks? Yes? Ok.

If you didn’t, shame on you. Go read it. If you still haven’t, I’ll sum up:

If you have an API which takes a callback,
and sometimes that callback is called immediately,
and other times that callback is called at some point in the future,
then you will render any code using this API impossible to reason about, and cause the release of Zalgo.

Needless to say, releasing Zalgo onto unsuspecting users is extremely inappropriate.

It’s not the case that function-taking APIs must always call the function asynchronously. For example, Array.prototype.forEach calls the callback immediately, before returning. But then it never calls it again.

Nor is it the case that function-taking APIs must always call the function synchronously. For example, setTimeout and friends call the callback after the current run-to-completion. But they always return before calling the callback.
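
A small sketch of those two well-behaved cases side by side. The `order` array is just bookkeeping added for the illustration:

```javascript
var order = [];

// Always synchronous: every callback runs before forEach returns.
[1, 2].forEach(function(n) {
  order.push('forEach ' + n);
});
order.push('after forEach');

// Always asynchronous: the callback runs only after the current
// run-to-completion has finished, even with a delay of 0.
setTimeout(function() {
  order.push('timeout');
  console.log(order.join(', '));
  // forEach 1, forEach 2, after forEach, after setTimeout, timeout
}, 0);
order.push('after setTimeout');
```

In both cases you can tell, just by reading the calling code, what has and has not run by the time the next line executes.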

In other words, to avoid the release of Zalgo, exactly one of the following must be true:

var after = false;
callbackTaker(function() {
  assert(after === true);
});
after = true;

OR:

var after = false;
callbackTaker(function() {
  assert(after === false);
});
after = true;

and in no case can you ever have a function where one of these assertions holds on some calls and the other holds on other calls.
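
For contrast, here is a sketch of a function that fails that test. Everything in it (`zalgoLookup`, `probe`, the cache) is hypothetical; the shape is the common "call back immediately on a cache hit" mistake:

```javascript
var cache = {};

// Hypothetical lookup that releases Zalgo: a cache hit calls back
// immediately, a cache miss calls back later.
function zalgoLookup(key, cb) {
  if (cache.hasOwnProperty(key))
    return cb(cache[key]);
  setTimeout(function() {
    cache[key] = key.toUpperCase();
    cb(cache[key]);
  }, 0);
}

// Reports whether the callback ran before zalgoLookup returned.
function probe(key, report) {
  var after = false;
  zalgoLookup(key, function() {
    report(after === false); // true means "called synchronously"
  });
  after = true;
}

var results = [];
probe('a', function(sync) {
  results.push(sync);   // cache miss: called later
  probe('a', function(sync) {
    results.push(sync); // cache hit: called immediately
    console.log(results); // [ false, true ]
  });
});
```

The same call with the same arguments satisfies the first assertion once and the second assertion the next time, so the caller can rely on neither.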

What about internal functions in my library that are not exposed?

Well, ok, but you’re releasing Zalgo in your internal library. I don’t recommend it. Zalgo usually finds a way to sneak His Dark͝ Tend̴r̡i҉ls out into the world.

What if I suspect that a function might try to release Zalgo?

This is a great question. For example, perhaps you let users supply your library with a callback-taking function, and you worry that they might be incompetent or careless, but want to ensure that your library does its best to keep Th͏e Da҉rk Pońy Lo͘r͠d HE ́C͡OM̴E̸S contained appropriately.

Assuming that your platform has an event loop or some other kind of abstraction that you can hang stuff on, you can do something like this:

function zalgoContainer(cbTaker, cb) {
  var sync = true;
  cbTaker(cbWrap);
  sync = false;

  function cbWrap(er, data) {
    if (sync)
      process.nextTick(function() {
        cb(er, data);
      });
    else
      cb(er, data);
  }
}

This uses Node’s synthetic deferral function: process.nextTick, which runs the supplied callback at the end of the current run-to-completion. You can also use setImmediate, but that’s slightly slower. (Yes, yes, it’s named badly; setImmediate is slightly less “immediate” and more “next” than nextTick, but this is an accident of history we cannot correct without a time machine.)
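
Here is a sketch of the container in use. The function is repeated so the example runs on its own in Node; `rudeTaker` and `politeTaker` are made-up stand-ins for user-supplied functions:

```javascript
function zalgoContainer(cbTaker, cb) {
  var sync = true;
  cbTaker(cbWrap);
  sync = false;

  function cbWrap(er, data) {
    if (sync)
      process.nextTick(function() {
        cb(er, data);
      });
    else
      cb(er, data);
  }
}

// Hypothetical misbehaving function: calls back synchronously.
function rudeTaker(cb) {
  cb(null, 'now');
}

// Hypothetical well-behaved function: calls back later.
function politeTaker(cb) {
  setTimeout(function() { cb(null, 'later'); }, 0);
}

var log = [];
zalgoContainer(rudeTaker, function(er, data) {
  log.push(data);
});
log.push('returned');
zalgoContainer(politeTaker, function(er, data) {
  log.push(data);
  console.log(log); // [ 'returned', 'now', 'later' ]
});
```

Either way, the caller's callback never runs before `zalgoContainer` returns, and the already-asynchronous case pays no extra deferral.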

But isn’t it faster to just call the callback right away?

Go read Havoc’s blog post.

Moving on.

Avoid Synthetic Deferrals

I know what you’re thinking: “But you just told me to use nextTick!”

And yes, it’s true, you should use synthetic deferrals when the only alternative is releasing Zalgo. However, these synthetic deferrals should be treated as a code smell. They are a sign that your API might not be optimally designed.

Ideally, you should know whether something is going to be immediately available, or not. Realistically, you should be able to take a pretty decent guess about whether the result is going to be immediately available most of the time, or not, and then follow this handy guide:

If the result is usually available right now, and performance matters a lot

1. Check if the result is available.
2. If it is, return it.
3. If it is not, return an error code, set a flag, etc.
4. Provide some Zalgo-safe mechanism for handling the error case and awaiting future availability.

Here’s a silly example to illustrate the pattern:
```
// Callbacks waiting for the value to become available again.
var waiting = [];
var isAvailable = true;
var i = 0;
function usuallyAvailable() {
  i++;
  if (i === 10) {
    i = 0;
    isAvailable = false;
    setTimeout(function() {
      isAvailable = true;
      if (waiting.length) {
        for (var cb = waiting.shift();
             cb;
             cb = waiting.shift()) {
          cb();
        }
      }
    });
  }

  return isAvailable ? i : new Error('not available');
}

function waitForAvailability(cb) {
  // Could also defer this and just call it,
  // but it is avoidable, which is the point.
  if (isAvailable)
    throw new Error("hey, dummy, check first!");
  waiting.push(cb);
}
```
In this case, when the user calls usuallyAvailable() they’ll get a number between 1 and 9, 90% of the time. It’s on the caller to check the return value, see if it means that they have to wait, etc.

This makes the API a bit trickier to use, because the caller has to know to detect the error state. If it’s very rare, then there’s a chance that they might get surprised in production the first time it fails. This is a communication problem, like most API design concerns, but if performance is critical, it may be worth the hit to avoid artificial deferrals in the common cases.
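
A sketch of what the calling side might look like. The API is condensed and repeated here so the example runs on its own; `consume` and `seen` are hypothetical names for the caller's code:

```javascript
var waiting = [];
var isAvailable = true;
var i = 0;

// Condensed copy of the example API, repeated for self-containment.
function usuallyAvailable() {
  i++;
  if (i === 10) {
    i = 0;
    isAvailable = false;
    setTimeout(function() {
      isAvailable = true;
      var cb;
      while ((cb = waiting.shift())) cb();
    }, 0);
  }
  return isAvailable ? i : new Error('not available');
}

function waitForAvailability(cb) {
  if (isAvailable)
    throw new Error('hey, dummy, check first!');
  waiting.push(cb);
}

// Caller side: take values synchronously for as long as they are
// available, and only wait when a call reports that it is not.
var seen = [];
function consume(remaining) {
  while (remaining > 0) {
    var result = usuallyAvailable();
    if (result instanceof Error) {
      // Not available: stop here and resume once it is.
      waitForAvailability(function() { consume(remaining); });
      return;
    }
    seen.push(result);
    remaining--;
  }
  console.log(seen); // [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3 ]
}

consume(12);
```

Nine of the first ten calls cost nothing beyond a function call; only the failed one pays for a trip through the event loop.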

Note that this is functionally equivalent to the O_NONBLOCK/EWOULDBLOCK/poll pattern. You try to do a thing, and if it is not available right now, it raises an error, and you wait until it IS available, and then try again.

If the result is usually available right now, but performance doesn’t matter all that much

For example, the function might only be called a few times at startup, etc.

In this case, just follow the “result is usually not available right now” approach.

Note that performance actually does matter much more often than you probably realize.

If the result is usually not available right now

Take a callback.
Artificially defer the callback if the data is available right now.

Here’s an example:

var cachedValue;
function usuallyAsync(cb) {
  if (cachedValue !== undefined)
    process.nextTick(function() {
      cb(cachedValue);
    });
  else
    doSomeSlowThing(function(result) {
      cb(cachedValue = result);
    });
}

The basic point here is that “async” is not some magic mustard you smear all over your API to make it fast. Asynchronous APIs do not go faster. They go slower. However, they prevent other parts of the program from having to wait for them, so overall program performance can be improved.

Note that this applies to not just callbacks, but also to Promises and any other control-flow abstractions. If your API is frequently returning a Promise for a value that you know right now, then either you are leaving performance on the table, or your Promise implementation releases H̵͘͡e ̡wh́o͠ ̶Prom̀͟͝i̴s̀es̀ o҉̶nl̨͟y̧ ̛̛m̴͠͝a̡̛͢d̡n̴̡e͝s̸s͠, T̢҉̸h̴̷̸̢ȩ͡ ͘͠N͢͢e͏͏͢͠z̛͏͜p̸̀̕͠ȩ́͝͝r҉̛́͠d̴̀i̴̕҉͞a̴̡͝͠n̢͜͟͢͟ ̶̴̢͝h̷̕͠í̸̧̛͜v̶̢͢͡e̕͡-̸̀͝m̷͜i̛͘͞ņ̛͘͟҉d̶̶̡̧͜ ̷̛͞o̵̢͘͟͞f̶̢̀͢͢ ̶̧͟͡c̕͝h̶̀͘͘à͏o҉̴́͢s̸͘͘͝͞.̨͢͞.

So, don’t do that.
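
Spec-compliant ES2015 Promises take the first branch of that either/or: handlers are always deferred, even when the value is already known. A quick sketch to check what your implementation does, assuming a global `Promise`:

```javascript
var order = [];

// The value is known right now, but the handler still runs later.
Promise.resolve(42).then(function(value) {
  order.push('then ' + value);
  console.log(order.join(', ')); // after then, then 42
});
order.push('after then');
```

A conforming implementation never logs `then 42` first. That keeps Zalgo contained, but it means every already-known value still pays for a deferral.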

Pick a Pattern and Stick With It

This really cannot be overstated. Like the question of callbacks vs Promises vs whatever, the specific pattern that you choose doesn’t matter very much. What does matter very much is that you stick to that pattern with ruthless consistency. The pattern should cover how a user interacts with your API, and how errors are communicated.

In Node.js, the callback pattern is:

Async functions take a single callback function as their final argument.
That callback is called in the future with the first argument set to an Error object in the case of failure, and any additional results as the subsequent arguments.
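
A minimal sketch of a function that follows this pattern. `getUser` and the `users` table are hypothetical, and `setTimeout` stands in for whatever real I/O would make the result unavailable right now:

```javascript
var users = { 1: 'alice' };

// Hypothetical async function: callback last, error first.
function getUser(id, cb) {
  setTimeout(function() {
    if (!users.hasOwnProperty(id))
      return cb(new Error('no such user: ' + id));
    cb(null, users[id]);
  }, 0);
}

var seen = [];
getUser(1, function(er, name) {
  seen.push(er ? er.message : name); // 'alice'
});
getUser(2, function(er, name) {
  seen.push(er ? er.message : name); // 'no such user: 2'
});
```

Success and failure both arrive through the same callback, in the future, with the error always in the same position.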

Why have the callback as the last argument? Why make Error the first arg? It doesn’t really matter very much. Ostensibly, the callback is the last arg because

doSomething(a, b, c, d, function(er) {
  ...
})

scans a bit better than:

doSomething(function(er) {
  ...
}, a, b, c, d)

and putting the error first is so that it’s always in the same place.

There were other choices that could have been made. But the important thing was just to pick one, and move forward with it. This consistency makes it much less difficult to convert Node APIs into other sorts of async patterns, and greatly reduces the communication required to tell a user how to use a Node.js callback-taking API.