Thaddeus Dryja

Thaddeus Dryja

Utreexo: a dynamic accumulator for bitcoin state 

A description of research by Thaddeus Dryja

One of the earliest-seen and most persistent problems with Bitcoin has been scalability. Bitcoin takes the idea of "be your own bank" quite literally, with every computer on the bitcoin network storing every account of every user who owns money in the system. In Bitcoin, this is stored as a collection of "Unspent transaction outputs," or "utxos", which are somewhat unintuitive, but provide privacy and efficiency benefits over the alternative "account" based model used in traditional finance.

It's important to distinguish between the transaction history and the current state of the system. The transaction history in Bitcoin is currently 200GB, and contains every transaction since Bitcoin was launched in early 2009. The size of this history can of course only increase with time. The current system state, however, is much smaller, at under 4GB, and deals with only who owns what right now. This state size has generally increased over time, but has in fact decreased a bit this year.

The history, despite its much larger size, is not in fact the scalability concern, as it is not used in any time-critical fashion; one can discard the history after processing with no loss of security. The increasing state size, however, is a concern—one which utreexo solves.

Utreexo is a novel hash based dynamic accumulator, which allows the millions of unspent outputs to be represented in under a kilobyte—small enough to be written on a sheet of paper. There is no trusted setup or loss of security; instead, the burden of keeping track of funds is shifted to the owner of those funds.

Current transactions specify inputs and outputs, and verifying an input requires you to know the whole state of the system. With Utreexo, the holder of funds maintains a proof that the funds exist, and provides that proof at spending time to the other nodes. These proofs are compact (under 1KB) but do represent the main downside in the utreexo model; they present an additional data transmission overhead, which allows much smaller state.

Utreexo pushes the costs of maintaining the network to the right place: an exchange creating millions of transactions may need to maintain millions of proofs, while a personal account with only a few unspent outputs will only need to maintain a few kilobytes of proof data. Utreexo also provides a long-term scalability solution, as the accumulator size grows very slowly with increasing size of the underlying set (the accumulator size is logarithmic with the set size).

Utreexo demo release 0.2 by Tadge Dryja

Posted on Medium. 02/01/2021

At the MIT DCI, we have several computers plugged in to the wall of our currently unpopulated office. One of them runs a bitcoin core node, and serves the blockchain to other newly arriving bitcoin nodes. This machine typically uploads about 1 TB of data a month.

In January, however, vnstat shows a total monthly upload of 5.73 TiB, several times the average for the last year. I like to think of this machine as welcoming new participants to the network, many of whom are likely arriving due to some other bitcoin-related number increasing several-fold in recent weeks.

But I also know that for new users of Bitcoin, despite the terabytes of welcome streaming from this machine’s RTL8111 chip to the far reaches of the globe, it’s not a great on-boarding experience. Laptop computers can take days to sync up with the Bitcoin network, even with fast internet connections. Once they’re done, even with pruning enabled, staying on the network takes several gigabytes of storage. It’s no wonder that many who try to run a Bitcoin node give up, and instead use SPV wallets with weaker security and privacy, or worse, give up on owning UTXOs entirely, and leave their coins on an exchange.

The goal of Utreexo is to make running a full node easier, faster, and smaller, and while that’s more of an asymptote than a point on any curve, we’re getting there. Today we’ve released Utreexo demonstration 0.2, which pairs the Utreexo accumulator with a modified version of btcd(temporarily called utcd). Most of the utcd work was done by Calvin Kim, as Niklas Gögge and myself have been working on improving the accumulator and how it interacts with the bitcoin data structures. Calvin has written a post about the work as well.

This new release works more like a normal bitcoin node: it starts up, finds peers, and verifies the blockchain. There are still things it doesn’t have, like a mempool, or a way to deal with reorgs. (It currently deals with reorgs by crashing.)

It’s also not as fast as it could be, and not as small as the Utreexo design allows for, but that’s exciting: it runs at an OK speed now, and there are still many avenues for performance improvements that we’ve identified. We’re working on improving the speed of the software, and benchmarking it on different hardware, and hope to have a post about that soon.

Several MIT undergraduate researchers are joining this semester, and as the project expands, there are lots of interesting avenues for improving performance and reliability. We’d love to help more people learn about this system and try to make Bitcoin stronger while working on some interesting problems. If you’re interested, there are pre-compiled binaries to try out, the code is all open source, and we have discussions on freenode IRC, channel #utreexo. We try to be as welcoming as the server in the office, though we’re not at the terabytes per month level yet :)