Inside the blockchain developer’s mind: The vertical scaling crisis

Published at: Sept. 24, 2020

This is Part 2 of a three-part series in which Andrew Levine outlines the issues facing legacy blockchains and posits solutions to these problems. Part 1 covers the upgradeability crisis, and Part 3 covers the governance crisis.

The advent of the internet has revealed that we have a digital self that can amplify our real-world power thanks to the ability to interact with people anywhere on Earth and coordinate actions that our physical selves never could.

But our digital selves are shackled — imprisoned on private computers belonging to Facebook, Google, Amazon, Netflix, Twitter, and the list goes on. These private monopolies don’t actually produce technology; rather, their product is us — our digital selves — and their entire purpose is to extract as much value from us as they possibly can.

Many people recognize the potential for blockchain technology to disrupt these private monopolies and oligopolies, but unfortunately, no specific blockchain has been able to reach beyond the walls of the existing blockchain and cryptocurrency community.

And even if one did, it would not be technically capable of supporting the kind of growth and adoption needed to empower every person on Earth to take control of their digital selves. Why is that? Is it just a matter of picking the right features? Switching to proof-of-stake? Sharding?

Unfortunately, the problem is much bigger than one or two missing features and will not be resolved by the planned changes to existing protocols because the problems lie at the very foundation of how they are constructed. The very architecture limits the potential for these platforms to scale vertically.

What is vertical scaling?

Vertical scaling is how you manage the growth of a single node (computer) in a network. Blockchains are databases that never discard information. Information is only added to the database, never removed. This makes growth an even bigger problem. Not only that, but most blockchains are not designed to make efficient use of the various parts of a computer. This adds up to a big database, consuming a lot of computational resources on a given machine in an inefficient manner.

To compensate for these shortcomings, node operators rely on expensive enterprise-grade hardware, specifically random-access memory (RAM) and non-volatile memory express (NVMe) drives. That expense is what pushes network participation (node operation) beyond the grasp of ordinary people. And somehow, we’re supposed to believe that is not bad for decentralization!

But sharding!

Ironically, one of the strongest arguments for the existence of a vertical scaling crisis is the level of demand for horizontal scaling solutions.

As of this writing, an Ethereum full node still does not exceed 500 GB. That’s nothing! And yet, it is also absolutely true that a complicated, risky mechanism needs to be added to Ethereum so that its blockchain can be broken up into bits and pieces, and that precious computational resources need to be spent on simply enabling these “shards” to communicate with one another, let alone perform meaningful computations.

The problem is that horizontal scaling — sharding — is not a substitute for vertical scaling. Imagine you have a factory producing 1,000 cars per year, but there is sufficient demand for 2,000 cars. What do you do first: build a new factory or try to make more cars out of the factory you already have? Vertical scaling is optimizing the factory to produce more cars before simply building a new factory. Blockchain nodes are the “factory,” and what determines their output is how efficiently they use the components in a computer.

Speaking from direct experience, blockchains are horribly unoptimized with respect to node resource management, which makes them perfect candidates for vertical scaling solutions.

In blockchain, there are essentially two lineages: Ethereum and BitShares. Many people might not be familiar with BitShares, but its architectural design underpins some of the most performant blockchains in the space, including EOS, Hive and Steem. While Ethereum, and the many chains that are modeled on it, remains the most highly valued general-purpose blockchain with the most decentralized applications and unique users, the BitShares line absolutely dominates in terms of raw transaction activity, making it the performance king.

My team, arguably, has more experience in the BitShares line than any other team on Earth, so we will focus on that design. Because blockchains in the BitShares line are capable of performing so many more transactions per second, this actually increases the importance of vertical scaling — because their blockchain state is growing so much faster.

Vertical scaling, RAM and forks

Vertical scaling, in the computing context, is essentially all about using the cheapest form of memory (disk) whenever possible and to the greatest extent possible. In the case of blockchains, the two most relevant processes are fork resolution and storing state. There are all of these different versions of the database out there (“forks”), and the nodes have to come to a consensus on which one is the “right” one. That’s fork resolution.
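To make fork resolution concrete, here is a minimal sketch in Python. It assumes a simple longest-chain rule, which this article does not specify; the block IDs and the `resolve_fork` helper are hypothetical and exist only for illustration.

```python
def chain_length(blocks, block_id):
    """Count blocks from block_id back to the first parent we do not know."""
    length = 0
    while block_id in blocks:
        length += 1
        block_id = blocks[block_id]
    return length


def resolve_fork(blocks):
    """Pick the tip of the longest branch (assumed rule, not a real protocol).

    Ties are broken by the smallest block ID so the result is deterministic.
    """
    parents = set(blocks.values())
    tips = [b for b in blocks if b not in parents]
    return min(tips, key=lambda b: (-chain_length(blocks, b), b))


# block_id -> parent_id; "b2a" and "b2b" are two competing forks of "b1".
blocks = {"b1": "genesis", "b2a": "b1", "b2b": "b1", "b3a": "b2a"}
head = resolve_fork(blocks)  # the branch ending in "b3a" is one block longer
```

Every candidate branch has to stay available until one of them wins, which is why this work is normally done in fast memory.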

Now, you have an irreversible database that needs to be stored. Ideally, you want that stored on the cheapest possible medium (disk) as opposed to the most expensive (RAM).

Because you want forks to be resolved as fast as possible, you want these computations to be done in RAM (fast memory). But once the forks have been resolved and new transactions have been added to the irreversible state, you want to store this database on disk. The problem with blockchains from the BitShares line is that they achieve their performance through a design in which the database never actually reflects the current state of the blockchain. Instead, when each block is applied, the pending transaction state is “undone,” the old values are written back to the database, and then the block is applied.

One problem with this approach is that most of the time, this means performing the exact same calculations again and writing the same state back to the database that was just there, which is extremely inefficient.
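The cost of that undo-and-reapply cycle can be illustrated with a toy model. This is only a sketch under stated assumptions: `CountingStore`, `apply_tx` and `undo_all` are hypothetical names, a Python dict stands in for the real database, and transactions are reduced to simple transfers.

```python
class CountingStore:
    """Key-value state that counts every write it performs."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def put(self, key, value):
        self.data[key] = value
        self.writes += 1


def apply_tx(store, undo_log, tx):
    """Apply a transfer, recording the old values so it can be undone."""
    sender, receiver, amount = tx
    for key, delta in ((sender, -amount), (receiver, amount)):
        old = store.data.get(key, 0)
        undo_log.append((key, old))
        store.put(key, old + delta)


def undo_all(store, undo_log):
    """Write the old values back, newest change first."""
    while undo_log:
        key, old = undo_log.pop()
        store.put(key, old)


store = CountingStore()
store.put("alice", 100)
store.put("bob", 0)
store.writes = 0  # count only the work done from here on

pending = [("alice", "bob", 10), ("alice", "bob", 5)]

# 1. Pending transactions are applied as they arrive.
undo_log = []
for tx in pending:
    apply_tx(store, undo_log, tx)
state_before_block = dict(store.data)

# 2. A block arrives containing the same transactions: undo, then re-apply.
undo_all(store, undo_log)
for tx in pending:
    apply_tx(store, [], tx)

total_writes = store.writes                     # 12 writes in total
same_state = store.data == state_before_block   # True: nothing actually changed
```

In this toy run the database ends up exactly where it was after step 1, yet it performed 12 writes where 4 would have been enough.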

Reading and writing: The arithmetic

Even more relevant to the issue of vertical scaling is that this design means the irreversible state as a whole cannot be stored on disk without having to “pop” blocks back out of disk and into RAM to resolve forks. Not only does this increase the RAM load on a given node, but it also has very serious consequences with respect to leveraging RocksDB.

RocksDB is an embedded key-value database developed by Facebook to power its news feed. In short, it lets us approach the performance of RAM while keeping data on disk. Many blockchain projects use RocksDB in various ways, but the database design we have outlined constantly undoes pending transactions and rewrites them to the database, which erodes those benefits.

Facebook’s news feed is all about database reads. Consider how many posts you scroll through before you engage with a single one. For that reason, RocksDB is designed to work best when there are far more reads to the database than writes. The database design outlined above leads to so many database writes that it negates the benefits of even using RocksDB.

To take full advantage of RocksDB, we need to rebuild the blockchain from the ground up so that it efficiently ferries blocks from RAM to disk while minimizing the number of writes. We can accomplish this by eliminating the need to undo/rewrite and creating a single database that tracks irreversible state and never needs to be undone.
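One way to picture such a design is sketched below. It is not the author’s actual implementation: the `Node` class, its methods and the block IDs are hypothetical, and a plain dict stands in for an on-disk store such as RocksDB. Reversible blocks live in RAM as small deltas, and only irreversible blocks are ever written to the disk store.

```python
class Node:
    def __init__(self):
        self.disk = {}        # stand-in for the on-disk irreversible state
        self.disk_writes = 0
        self.ram = {}         # block_id -> (parent_id, delta), reversible only

    def add_block(self, block_id, parent_id, delta):
        """Keep a reversible block in RAM as a delta over its parent."""
        self.ram[block_id] = (parent_id, delta)

    def read(self, head_id, key):
        """Read key as seen from head_id: walk RAM deltas, then fall back to disk."""
        block_id = head_id
        while block_id in self.ram:
            parent_id, delta = self.ram[block_id]
            if key in delta:
                return delta[key]
            block_id = parent_id
        return self.disk.get(key)

    def finalize(self, block_id):
        """Flush an irreversible block to disk once and drop losing forks."""
        parent_id, delta = self.ram[block_id]
        if parent_id in self.ram:
            raise ValueError("parent must be finalized first")
        for key, value in delta.items():
            self.disk[key] = value
            self.disk_writes += 1
        del self.ram[block_id]
        # Keep only blocks that descend from the finalized block.
        keep = {block_id}
        changed = True
        while changed:
            changed = False
            for bid, (pid, _) in self.ram.items():
                if pid in keep and bid not in keep:
                    keep.add(bid)
                    changed = True
        self.ram = {bid: e for bid, e in self.ram.items() if bid in keep}


node = Node()
node.disk = {"alice": 100, "bob": 0}
node.add_block("b1", "genesis", {"alice": 90, "bob": 10})
node.add_block("b2a", "b1", {"alice": 85, "bob": 15})
node.add_block("b2b", "b1", {"alice": 70, "bob": 30})  # competing fork

view_a = node.read("b2a", "alice")  # each fork sees its own state,
view_b = node.read("b2b", "alice")  # with no disk writes so far

node.finalize("b1")                 # b1 becomes irreversible
node.add_block("b3a", "b2a", {"bob": 20})
node.finalize("b2a")                # b2b loses and is simply dropped from RAM
```

In this sketch the losing fork never touches the disk store, and each irreversible change is written exactly once, so nothing has to be popped back into RAM or undone.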

This would enable us to minimize RAM use in nodes by efficiently ferrying irreversible blocks out of RAM and onto disk without having to bring them back. We estimate that this could reduce the cost of running a node by as much as 75%! Not only would this make node operation more accessible, increasing the number of nodes in operation, but these cost savings would ultimately be passed along to users and developers.

Limiting blockchains or limitless blockchains?

Existing blockchains are reaching the performance limits of what they can get out of a single node as a consequence of how they resolve forks and how they store their blockchain state. In this article, we have explained how database design can lead to a fork resolution process that increases RAM use as well as database writes that negate the benefits that could accrue from the use of RocksDB, ultimately leading to less efficient blockchain nodes.

The truth is that there is a lot more to vertical scaling than this single problem. Blockchain ecosystems are complex, with many components that feed back into one another. Decreasing the cost of running an individual node is critical for increasing the number of nodes in operation and reducing the costs of using the network, but there are also tremendous gains to be had by minimizing network congestion, incentivizing efficient node operation and more.

Our goal is not to explain in detail how one can solve the vertical scaling problem but to give some insight into the nature of what we think is a dramatically underappreciated problem in the blockchain space. Horizontal scalability is absolutely a very important area of interest, but if we ignore the problem of vertical scalability, all we will accomplish by horizontally scaling is dramatically increasing the number of horribly inefficient nodes.

The views, thoughts and opinions expressed here are the author’s alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.

Andrew Levine is the CEO of Koinos Group, where he and the former development team behind the Steem blockchain build blockchain-based solutions that empower people to take ownership and control over their digital selves. Their foundational product is Koinos, a high-performance blockchain built on an entirely new framework architected to give developers the features they need in order to deliver the user experiences necessary to spread blockchain adoption to the masses.