A Scalability Roadmap

My rough proposal for optimizing new block announcements sparked a lot of discussion about scaling-up issues in general. There was some misunderstanding that optimizing new block messages would be a silver bullet that would solve all of the challenges Bitcoin will face as usage grows; this blog post is meant to sketch out one possible path for the behind-the-scenes technical work that is being done (or will need to get done) over the next few years to scale up Bitcoin.

There are other ideas for how to make Bitcoin scale, and whenever practical I like to choose “all of the above” for how to solve a problem, because nobody is smart enough to choose The One True Solution every time. So I won’t be surprised or disappointed if development wanders off this roadmap in a different direction.

Initial Download

Everybody who runs the Bitcoin Core reference implementation for the first time is annoyed by an absurdly long wait while it downloads and then indexes the entire history of Bitcoin transactions. Twenty-something gigabytes of transaction data must be downloaded, and that number is growing all the time.

Jeff Garzik creates and seeds a BitTorrent download that speeds up the initial transaction-data download. Because block data is self-verifying, you don’t have to worry about the security of the download; the worst case is that you waste time and bandwidth downloading an invalid file. However, re-indexing all 20-something gigabytes of data can still take many hours on some machines.

Pieter Wuille has been hard at work on a “headers first” approach — downloading the longest chain of 80-byte block headers, which is only 25 megabytes of data. The headers are sufficient to know whether or not you have the best chain, and once your node has the headers it can “back fill” by requesting complete blocks from multiple peers in any order it wants, similar to how BitTorrent downloads chunks of a large file from many different peers at once.
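To make the headers-first idea concrete, here is a toy Python sketch of what a node has to check: each 80-byte header parses into six fields, must link to the header before it, and must hash below the difficulty target it declares. The field layout is Bitcoin’s real header format; everything else is simplified (real code also validates timestamps and the difficulty-adjustment rules, for example).

```python
import hashlib
import struct

def double_sha256(data):
    # Bitcoin's hash function: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def parse_header(raw):
    # Unpack the 80-byte header: version, previous-block hash,
    # merkle root, timestamp, compact target ("bits"), and nonce.
    version, prev_hash, merkle_root, time, bits, nonce = struct.unpack(
        "<I32s32sIII", raw)
    return {"prev_hash": prev_hash, "bits": bits,
            "hash": double_sha256(raw)}

def bits_to_target(bits):
    # Expand the compact "bits" encoding into the full 256-bit target.
    return (bits & 0x00FFFFFF) << (8 * ((bits >> 24) - 3))

def check_chain(raw_headers):
    # Verify that each header links to its predecessor and that its
    # hash (read as a little-endian integer) is below its target.
    prev = None
    for raw in raw_headers:
        h = parse_header(raw)
        if prev is not None and h["prev_hash"] != prev:
            return False
        if int.from_bytes(h["hash"], "little") > bits_to_target(h["bits"]):
            return False
        prev = h["hash"]
    return True
```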

Pieter has also been working hard on “libsecp256k1” — a highly optimized library for performing math on the elliptic curve used to secure Bitcoin transactions. It is undergoing extensive review, and will be rolled out when we’re convinced it is bug-free and completely compatible with the existing, OpenSSL-based code.

rdponticelli has a pull request that allows Bitcoin Core to run with a “pruned” block database. Once you have downloaded and indexed the full block chain, the only reason to store all of the old transaction data is to serve it to brand-new peers who are performing the initial download.

You might be surprised that old blocks aren’t needed to validate new transactions. Pieter Wuille re-architected Bitcoin Core a few releases ago so that all of the data needed to validate transactions is kept in a “UTXO” (unspent transaction output) database. The amount of historical block data that absolutely must be stored depends on the plausible depth of a blockchain reorganization. The deepest reorganization ever experienced on the main network was 24 blocks, during the infamous March 11, 2013 chain fork.
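As a mental model, think of the UTXO database as a map from unspent outputs to their amounts and scripts; connecting a block is just deletions and insertions against that map. This is a toy dictionary, not Bitcoin Core’s actual chainstate database:

```python
# Toy UTXO set: (txid, output_index) -> (amount_in_satoshis, scriptPubKey).
utxos = {}

def connect_block(block):
    # Every output a transaction spends leaves the set, and every output
    # it creates enters it. Note that old blocks are never consulted,
    # which is why pruning them is safe for validation.
    for tx in block["transactions"]:
        if not tx["is_coinbase"]:
            for outpoint in tx["inputs"]:   # (txid, index) pairs being spent
                del utxos[outpoint]         # a KeyError here is a double-spend
        for index, (amount, script) in enumerate(tx["outputs"]):
            utxos[(tx["txid"], index)] = (amount, script)
```

Undoing a block during a reorganization needs the data that was deleted, which is why a node must keep enough recent history (blocks plus undo data) to cover any plausible reorganization depth.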

The next step will be to make pruning the default, but before that is done the network protocol needs to be extended so peers can tell each other which full blocks they are storing, and so a peer that was disconnected from the network for a while and needs to catch up can find out which of its peers has the block data it needs.

After that, the initial block chain download can be further optimized by asking peers directly for the UTXO set instead of reconstructing it from the entire history of the blockchain. The risk is that they lie about what is spent and unspent, to try to get you to accept invalid transactions or to create invalid blocks if you are mining. The best solution to that problem is to embed a “UTXO commitment” (a hash of all of the data in the UTXO set) into blocks, and to add a new consensus rule that any such commitment must be valid for the block to be valid.
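The simplest possible commitment, purely to illustrate the idea: serialize the whole UTXO set in a canonical order and hash it. The unresolved discussions are precisely about doing this efficiently, for example with an incrementally updatable tree structure instead of a flat hash that would have to be recomputed from scratch every block.

```python
import hashlib

def utxo_commitment(utxos):
    # Hash the entire UTXO set in a canonical (sorted) order. A real
    # design would use an authenticated tree so the commitment can be
    # updated incrementally and support compact membership proofs.
    h = hashlib.sha256()
    for (txid, index) in sorted(utxos):
        amount, script = utxos[(txid, index)]
        h.update(txid)                          # 32-byte transaction id
        h.update(index.to_bytes(4, "little"))
        h.update(amount.to_bytes(8, "little"))
        h.update(script)
    return h.digest()
```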

But consensus takes time; a proposal from Mark Friedenbach for how to embed such a commitment in blocks hasn’t reached consensus, and neither have discussions about exactly how the UTXO set should be represented and hashed.

Increasing transaction volume

I expect the initial block download problem to be mostly solved in the next release or three of Bitcoin Core. The next scaling problem that needs to be tackled is the hardcoded 1-megabyte block size limit, which means the network can support only approximately seven transactions per second.
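That seven-per-second figure is simple arithmetic on the limit, assuming an average transaction size of a few hundred bytes:

```python
MAX_BLOCK_SIZE = 1_000_000   # bytes: the current hard-coded limit
AVG_TX_SIZE = 250            # bytes: a rough average, varies by transaction type
BLOCK_INTERVAL = 600         # seconds: the ten-minute block target

print(MAX_BLOCK_SIZE / AVG_TX_SIZE / BLOCK_INTERVAL)   # ~6.7 tx per second
```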

Any change to the core consensus code means risk, so why risk it? Why not just keep Bitcoin Core the way it is, and live with seven transactions per second? “If it ain’t broke, don’t fix it.”

Back in 2010, after Bitcoin was mentioned on Slashdot for the first time and bitcoin prices started rising, Satoshi rolled out several quick-fix solutions to various denial-of-service attacks. One of those fixes was to drop the maximum block size from infinite to one megabyte (the practical limit before the change was 32 megabytes, the maximum size of a message in the p2p protocol). The intent has always been to raise that limit when transaction volume justified larger blocks.

“Argument from Authority” is a logical fallacy, so “Because Satoshi Said So” isn’t a valid reason. However, staying true to the original vision of Bitcoin is very important. That vision is what inspires people to invest their time, energy, and wealth in this new, risky technology.

I think the maximum block size must be increased for the same reason the limit of 21 million coins must NEVER be increased: because people were told that the system would scale up to handle lots of transactions, just as they were told that there will only ever be 21 million bitcoins.

We aren’t at a crisis point yet; the number of transactions per day has been flat for the last year (except for a spike during the price bubble around the beginning of the year). It is possible there are an increasing number of “off-blockchain” transactions happening, but I don’t think that is what is going on, because USD to BTC exchange volume shows the same pattern of transaction volume over the last year. The general pattern for both price and transaction volume has been periods of relative stability, followed by bubbles of interest that drive both price and transaction volume rapidly up. Then a crash down to a new level, lower than the peak but higher than the previous stable level.

My best guess is that we’ll run into the 1-megabyte block size limit during the next price bubble, and that is one of the reasons I’ve been spending time working on implementing floating transaction fees for Bitcoin Core. Most users would rather pay a few cents more in transaction fees than wait hours or days (or forever!) for their transactions to confirm because the network is running into the hard-coded block size limit.
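The estimator being built for Bitcoin Core is more sophisticated (it tracks how long transactions actually wait before confirming), but the core idea behind floating fees can be sketched in a few lines: look at the fee rates recently confirmed transactions paid, and suggest something in that range. Field names here are illustrative, not a real API:

```python
def estimate_fee_per_kb(recent_blocks, percentile=0.5):
    # Toy floating-fee estimator: collect the fee rates (satoshis per
    # kilobyte) of transactions that actually made it into recent
    # blocks, then pick a percentile of that distribution.
    rates = sorted(
        tx["fee"] / (tx["size"] / 1000.0)
        for block in recent_blocks
        for tx in block["transactions"]
        if not tx["is_coinbase"]
    )
    if not rates:
        return None
    return rates[int(percentile * (len(rates) - 1))]
```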

Bigger Block Road Map

Matt Corallo has already implemented the first step to supporting larger blocks: faster relaying, to minimize the risk that a bigger block takes longer to propagate across the network than a smaller block. See the blog post I wrote in August for details.
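The trick, in sketch form: most of a new block’s transactions are already sitting in every peer’s memory pool, so there is no need to send them twice. The short six-byte ids below are an arbitrary illustrative choice; a real protocol needs to handle collisions and negotiate what each peer already knows.

```python
def compact_block_for(block, peer_known_txids):
    # Send short references for transactions the peer already has,
    # and full data only for the (few) transactions it is missing.
    refs, missing = [], []
    for tx in block["transactions"]:
        if tx["txid"] in peer_known_txids:
            refs.append(tx["txid"][:6])   # short id instead of the whole tx
        else:
            missing.append(tx)
    return {"header": block["header"], "refs": refs, "missing": missing}
```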

There is already consensus that something needs to change to support more than seven transactions per second. Agreeing on exactly how to accomplish that goal is where people start to disagree; there are lots of possible solutions. Here is my current favorite:

Roll out a hard fork that increases the maximum block size, and implements a rule to increase that size over time, very similar to the rule that decreases the block reward over time.

Choose the initial maximum size so that a “Bitcoin hobbyist” can easily participate as a full node on the network. By “Bitcoin hobbyist” I mean somebody with a current, reasonably fast computer and Internet connection, running an up-to-date version of Bitcoin Core and willing to dedicate half their CPU power and bandwidth to Bitcoin.

And choose the increase to match the rate of growth of bandwidth over time: 50% per year for the last twenty years. Note that this is less than the approximately 60% per year growth in CPU power; bandwidth will be the limiting factor for transaction volume for the foreseeable future.
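A sketch of what such a consensus rule could look like. The starting size and activation height below are placeholders, since neither has been agreed on, and real consensus code would use integer arithmetic only:

```python
BLOCKS_PER_YEAR = 52_560     # 144 blocks per day * 365 days
START_SIZE = 20_000_000      # hypothetical initial limit, in bytes
GROWTH_PER_YEAR = 1.5        # +50% per year, tracking bandwidth growth

def max_block_size(height, fork_height):
    # Maximum block size as a step function of block height, stepping
    # up 50% each year after the hard fork activates: the same style
    # of height-based schedule as the block-reward halving.
    if height < fork_height:
        return 1_000_000
    years = (height - fork_height) // BLOCKS_PER_YEAR
    return int(START_SIZE * GROWTH_PER_YEAR ** years)
```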

I believe this is the “simplest thing that could possibly work.” It is simple to implement correctly and is very close to the rules operating on the network today. Imposing a maximum size that is within the reach of any ordinary person with a pretty good computer and an average broadband internet connection eliminates barriers to entry that might result in centralization of the network.

Once the network allows larger-than-1-megabyte blocks, further network optimizations will be necessary. This is where Invertible Bloom Lookup Tables or (perhaps) other data synchronization algorithms will shine.
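For the curious, here is a stripped-down illustration of why IBLTs are attractive: two nodes that each know nearly the same set of transactions can exchange a fixed-size table and recover exactly what the other is missing, with bandwidth proportional to the difference rather than to the block. Real IBLTs add a per-cell checksum to detect “impure” cells; this toy omits it.

```python
import hashlib

class IBLT:
    # Toy Invertible Bloom Lookup Table over integer keys (e.g. txids).
    def __init__(self, m=60, k=3):
        self.m, self.k = m, k
        self.count = [0] * m     # how many keys landed in each cell
        self.keysum = [0] * m    # XOR of the keys in each cell

    def _cells(self, key):
        # One cell per partition, so a key never hits the same cell twice.
        width = self.m // self.k
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key.to_bytes(8, "big")).digest()
            yield i * width + int.from_bytes(digest, "big") % width

    def insert(self, key, sign=1):
        for c in self._cells(key):
            self.count[c] += sign
            self.keysum[c] ^= key      # XOR is its own inverse

    def subtract(self, other):
        # Cell-wise difference; keys present in both tables cancel out.
        diff = IBLT(self.m, self.k)
        diff.count = [a - b for a, b in zip(self.count, other.count)]
        diff.keysum = [a ^ b for a, b in zip(self.keysum, other.keysum)]
        return diff

    def peel(self):
        # Repeatedly pop cells holding exactly one key. Returns the keys
        # only we had and the keys only the other side had.
        only_mine, only_theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(self.m):
                if self.count[c] in (1, -1) and self.keysum[c]:
                    key, sign = self.keysum[c], self.count[c]
                    (only_mine if sign == 1 else only_theirs).add(key)
                    self.insert(key, -sign)   # remove it from all its cells
                    progress = True
        return only_mine, only_theirs

# Two nodes with almost-identical sets reconcile cheaply:
mine, theirs = IBLT(), IBLT()
for txid in (101, 102, 103, 104):
    mine.insert(txid)
for txid in (103, 104, 105):
    theirs.insert(txid)
print(mine.subtract(theirs).peel())   # typically ({101, 102}, {105})
```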

The Future Looks Bright

So a future Bitcoin enthusiast or professional sysadmin would download and run software that does the following to get up and running quickly (sketched in code after the list):

  1. Connect to peers, just as is done today.
  2. Download headers for the best chain from its peers (tens of megabytes; will take at most a few minutes)
  3. Download enough full blocks to handle any reasonable blockchain re-organization (a few hundred should be plenty, which will take perhaps an hour).
  4. Ask a peer for the UTXO set, and check it against the commitment made in the blockchain.
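Put together, the whole bootstrap sequence is short. The download_* and commitment_in helpers below are placeholders, and the commitment check depends on consensus rules that do not exist yet; check_chain and utxo_commitment are the sketches from earlier in this post:

```python
def bootstrap(peers, reorg_depth=300):
    headers = download_headers(peers)        # step 2: tens of megabytes
    assert check_chain(headers)              # verify links and proof-of-work
    blocks = download_blocks(peers, headers[-reorg_depth:])   # step 3
    utxos = download_utxo_set(peers)         # step 4
    assert utxo_commitment(utxos) == commitment_in(headers[-1])
    return headers, blocks, utxos
```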

From this point on, it is a fully-validating node. If disk space is scarce, it can delete old blocks from disk.

How far does this lead?

There is a clear path to scaling up the network to handle several thousand transactions per second (“Visa scale”). Getting there won’t be trivial, because writing solid, secure code takes time and because getting consensus is hard. Fortunately technological progress marches on, and Nielsen’s Law of Internet Bandwidth and Moore’s Law make scaling up easier as time passes.

The map gets fuzzy if we start thinking about how to scale faster than the 50%-per-year bandwidth growth of Nielsen’s Law. Some complicated scheme to avoid broadcasting every transaction to every node is probably possible to implement and make secure enough.

But 50% per year growth is really good. According to my rough back-of-the-envelope calculations, my above-average home Internet connection and above-average home computer could easily support 5,000 transactions per second today.

That works out to 400 million transactions per day. Pretty good; every person in the US could make one Bitcoin transaction per day and I’d still be able to keep up.

After 12 years of bandwidth growth that becomes 56 billion transactions per day on my home network connection — enough for every single person in the world to make five or six bitcoin transactions every single day. It is hard to imagine that not being enough; according to the Boston Federal Reserve, the average US consumer makes just over two payments per day.
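Checking that arithmetic, with 5,000 transactions per second today compounding at Nielsen’s 50% per year:

```python
TPS_TODAY = 5_000            # rough estimate for the setup described above
GROWTH = 1.5                 # Nielsen's Law: ~50% more bandwidth per year
SECONDS_PER_DAY = 86_400

print(TPS_TODAY * SECONDS_PER_DAY)                  # 432,000,000 tx/day today
print(TPS_TODAY * GROWTH ** 12 * SECONDS_PER_DAY)   # ~56 billion tx/day in 12 years
```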

So even if everybody in the world switched entirely from cash to Bitcoin in twenty years, broadcasting every transaction to every fully-validating node won’t be a problem.