As with Bitcoin, the main cause of the Ethereum scalability problem is that the network protocol requires every node in the network to process every transaction. Ethereum 1.x implements a slightly modified version of the proof-of-work (PoW) consensus mechanism. In Ethereum, miners race to find a nonce that meets the target difficulty. Every node needs to verify that the miners’ work is valid and keep an accurate copy of the current network state. This greatly limits the transaction-processing capacity and throughput of the Ethereum blockchain network. Currently, it can only process about 12-15 transactions per second.
Blockchain scalability trilemma
First described by Vitalik Buterin, the scalability trilemma concerns a blockchain’s ability to deliver scalability, decentralization, and security at the same time, without compromising any of them. The trilemma claims that it is almost impossible to achieve all three properties in a single blockchain system:
Decentralization: This is a core tenet upon which Bitcoin and blockchain were created. Decentralization enables censorship-resistance and permits anyone to participate in a decentralized ecosystem without a central authority or intermediary.
Security: This refers to the integrity and immutability of the public ledger, and the ability to resist network attacks such as 51% attacks or DDoS attacks.
Scalability: This concerns the ability to handle a growing number of transactions in the blockchain network. For the Ethereum blockchain to be the world computer its inventor envisioned, it needs to match the transaction throughput of large centralized systems, like Amazon, Visa, or Mastercard.
The following diagram is an illustration of the scalability trilemma in the blockchain:
The key challenge of scalability is finding a way to achieve all three at the base layer. The design choices of Bitcoin and Ethereum favor decentralization and security, while making a sacrifice in scalability.
Ethereum scaling solutions
Scalability is one of the most active topics in the Ethereum community. The following are a few areas of concern the community is trying to tackle:
Transaction processing and block creation time with PoW: how fast can the miners process all transactions and create a new block through mining?
Transaction finality: how soon can the decentralized network reach a consensus that a transaction has happened and can’t be reverted? Currently, it takes about six blocks with Bitcoin and 3-4 minutes with Ethereum for the network to consider a block finalized in the main chain. Interested readers should check out Vitalik’s blog on transaction settlement and block finality probability.
Solutions being implemented or proposed fall into three categories: on-chain solutions, off-chain solutions, and consensus mechanism protocols. There are some obvious or theoretical ones, like increasing the block size or slicing one blockchain into many independent altcoin chains. Due to the peer-to-peer nature of the network, a traditional horizontal scaling approach may not work. Specific to the Ethereum network, some consideration was also given to whether stateful or stateless smart contracts contribute to scalability issues. We will go over the high-level concepts of all those solutions, and then delve deeper into some of the promising ones.
Increasing the block size is similar to the vertical scaling approach. Some of the altcoins, like Bitcoin Cash, Ethereum Core, and so on, are implementing a larger block size to gain overall transaction performance. The theory behind this approach is that since PoW mining is the main bottleneck in the entire process, increasing the block size lets more transactions be processed per mined block. It may take a little longer to create the directed acyclic graph (DAG) for Ethash-based mining, but the average time to complete the mining may not get any worse, since most of the Ethereum clients cache the DAG anyway.
The following diagram illustrates how this technique works:
However, like vertical scaling in general, this solution demands that network nodes have more computing capacity in order to process larger blocks. This may lead to a scenario where the network is concentrated in a few rich hands and, thus, may ultimately compromise decentralization and security, the main tenets of the blockchain.
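The throughput argument for larger blocks can be sketched with simple arithmetic. The figures below (200 transactions per block, a 15-second block time) are hypothetical values chosen only for illustration, and the function name is an assumption of this sketch, not part of any Ethereum client:

```python
def transactions_per_second(txs_per_block: int, block_time_seconds: float) -> float:
    """Average throughput when every block is full."""
    if block_time_seconds <= 0:
        raise ValueError("block time must be positive")
    return txs_per_block / block_time_seconds


# Hypothetical figures: the block time stays the same, only the block size grows.
baseline = transactions_per_second(txs_per_block=200, block_time_seconds=15)
doubled = transactions_per_second(txs_per_block=400, block_time_seconds=15)

print(f"baseline block size: {baseline:.1f} tx/s")
print(f"doubled block size:  {doubled:.1f} tx/s")
```

The model deliberately ignores the extra bandwidth, storage, and validation work that larger blocks impose on every node, which is exactly the cost that threatens decentralization.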
Another solution is not to have one gigantic blockchain, but to have many smaller blockchains and altcoins. This may eventually be the case, since many vertical industries are creating or plan to create industry-specific chains. This will reduce user activity on each individual blockchain and, thus, should allow for a more scalable ecosystem.
The following diagram illustrates how this technique works:
However, there are a few issues with this option. One is security. It is a common belief that a network is more secure when more nodes participate in processing transactions on the blockchain. With a wider distribution of altcoin chains, fewer nodes will operate on any given blockchain. This may make each blockchain less secure, since a smaller altcoin network may be more vulnerable to network attacks. Let us say we have about 10,000 nodes on the larger network; an attacker would need to compromise at least 5,001 nodes (a 51% attack) to attack the network. If we slice the 10,000 nodes into 50 smaller chains, each chain comprises 200 nodes, and it only takes 101 nodes, or about 1% of all nodes, to take down any smaller chain. This is what we call the 1% attack problem. Another issue is cross-chain integration. Although there are some solutions for handling cross-blockchain integration, the overall complexity of integrating smaller chains and altcoins will increase drastically.
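The node-count arithmetic behind the 1% attack problem can be checked with a short sketch. It uses the figures from the example above and assumes, as the example does, that every node carries equal weight; the function name is illustrative:

```python
def nodes_needed_for_majority(nodes_in_chain: int) -> int:
    """Smallest number of nodes that forms a strict majority of a chain."""
    return nodes_in_chain // 2 + 1


total_nodes = 10_000
chain_count = 50
nodes_per_chain = total_nodes // chain_count  # 200 nodes on each small chain

single_chain_attack = nodes_needed_for_majority(total_nodes)
small_chain_attack = nodes_needed_for_majority(nodes_per_chain)
share_of_all_nodes = small_chain_attack / total_nodes

print(f"one large chain: {single_chain_attack} nodes needed")
print(f"one small chain: {small_chain_attack} nodes needed "
      f"({share_of_all_nodes:.2%} of all nodes)")
```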
On-chain solutions, sometimes also called layer 1 solutions, address scalability and performance issues at the base layer of the Ethereum blockchain network. One such solution is sharding. Sharding is not a new concept: traditional RDBMS and newer big data platforms have used sharding to improve scalability and performance for many years.
With the Ethereum network, the purpose of sharding is to group the network nodes, the blockchain, and global states into different shards, and each shard will reach a consensus on the shard-wide transaction state among those nodes within the group. At the conceptual level, this may not be much different from Plasma, the layer 2 side-chain approach, but the technical difficulty, implications, and network efforts are quite different.
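As a rough illustration of the grouping idea, the sketch below assigns transactions to shards by hashing the sender address, the same hash-partitioning technique used in database sharding. It is not Ethereum's actual sharding design; the shard count, the transaction fields, and the function names are all assumptions made for this example:

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(account: str, shard_count: int = SHARD_COUNT) -> int:
    """Map an account to a shard deterministically by hashing its address."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def group_by_shard(transactions, shard_count: int = SHARD_COUNT):
    """Group transactions so each shard only processes its own senders."""
    shards = defaultdict(list)
    for tx in transactions:
        shards[shard_for(tx["sender"], shard_count)].append(tx)
    return dict(shards)


# Made-up addresses and amounts, for illustration only.
transactions = [
    {"sender": "0xaaa1", "receiver": "0xbbb2", "amount": 5},
    {"sender": "0xccc3", "receiver": "0xaaa1", "amount": 2},
    {"sender": "0xaaa1", "receiver": "0xddd4", "amount": 1},
]
for shard_id, txs in sorted(group_by_shard(transactions).items()):
    print(shard_id, [tx["amount"] for tx in txs])
```

Transactions from the same sender always land on the same shard, so the nodes of that shard can agree on the sender's state without involving the rest of the network. Transfers that cross shards are the hard part and are not modeled here.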
Another layer 1 or on-chain solution is the shift to a proof-of-stake (PoS) consensus mechanism, which is one of the most active research areas addressing scalability and performance issues in Ethereum. There are many debates in terms of the advantages and disadvantages of a PoW-based consensus mechanism. It is quite effective in securing the blockchain in the decentralized network, but it is also a major bottleneck in the blockchain performance.
To put it simply, proof-of-stake is one of the most popular consensus algorithms on blockchain networks. As opposed to PoW consensus, where miners are rewarded for solving cryptographic puzzles, in the PoS consensus algorithm a pool of selected validators take turns proposing new blocks. The validator is chosen in a deterministic (pseudo-random) way, weighted by its wealth, which is also called its stake. Anyone who deposits their coins as a stake can become a validator. The chance of being selected may be proportional to the stake they put in. Let’s say Alice, Bob, Catherine, and David put in stakes of 40 Ether, 30 Ether, 20 Ether, and 10 Ether respectively; they will get a 40%, 30%, 20%, and 10% chance of being selected as the block creator.
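The stake-proportional selection in this example can be simulated in a few lines. This is a minimal sketch that uses Python's seeded pseudo-random generator as a stand-in for whatever randomness source a real protocol would use; the function names and the seed are assumptions:

```python
import random
from collections import Counter


def selection_probabilities(stakes):
    """Chance of each validator being picked, proportional to its stake."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {name: stake / total for name, stake in stakes.items()}


def pick_validator(stakes, rng):
    """Draw one block creator at random, weighted by stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"Alice": 40, "Bob": 30, "Catherine": 20, "David": 10}
rng = random.Random(2024)  # fixed seed so the simulation is repeatable
rounds = 100_000
draws = Counter(pick_validator(stakes, rng) for _ in range(rounds))

for name, expected in selection_probabilities(stakes).items():
    print(f"{name}: expected {expected:.0%}, simulated {draws[name] / rounds:.1%}")
```

Over many rounds the simulated frequencies settle close to the 40%, 30%, 20%, and 10% shares of the stake.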
The following is how the PoS consensus mechanism works. As shown in the following diagram, the blockchain keeps track of a set of validators, sometimes also called block creators or forgers. Whenever a new block needs to be created, the blockchain randomly selects a validator. The selected validator verifies the transactions and proposes a new block for all validators to agree on. The new block is then voted on by all current validators. Voting power is based on the stake the validator puts in. Whoever proposes invalid transactions or blocks, or votes maliciously, which means they intentionally compromise the integrity of the chain, may lose their stake. The node does not receive a reward for creating the block itself; remuneration comes from the transactions. Once the new block is accepted, the block creator collects the transaction fees as the reward for the work of creating it. There are two basic node selection options:
Randomly from the “richest” nodes;
Randomly from the oldest nodes.
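The voting and penalty steps described above can be sketched as follows. The acceptance threshold (more than half of the total stake) and the rule that an offender loses its entire stake are assumptions of this example; real protocols define their own thresholds and penalties:

```python
def approval_share(stakes, votes):
    """Fraction of the total stake that voted in favour of a proposed block."""
    total = sum(stakes.values())
    in_favour = sum(
        stakes[name] for name, approve in votes.items() if approve and name in stakes
    )
    return in_favour / total


def block_accepted(stakes, votes, threshold=0.5):
    """Accept the block when the approving stake exceeds the assumed threshold."""
    return approval_share(stakes, votes) > threshold


def slash(stakes, offenders):
    """Return new stakes in which every offender forfeits its whole stake."""
    return {name: (0 if name in offenders else stake) for name, stake in stakes.items()}


stakes = {"Alice": 40, "Bob": 30, "Catherine": 20, "David": 10}
votes = {"Alice": True, "Bob": True, "Catherine": False, "David": False}

print("approving stake:", f"{approval_share(stakes, votes):.0%}")
print("block accepted:", block_accepted(stakes, votes))
print("after slashing Bob:", slash(stakes, {"Bob"}))
```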
PoS is considered more energy-efficient and environment-friendly than the PoW mechanism. It is also perceived as more secure. It reduces the threat of a 51% attack, since malicious validators would need to accumulate more than 50% of the total stake in order to take over the blockchain network. The algorithm is also designed to discourage attackers from validating fake transactions, because they risk losing their “collateral”.
As with PoW, total decentralization may not be fully possible in a PoS-based public blockchain. This is because a few wealthy nodes can monopolize the stakes in the network. Those who put in more stake can effectively control most of the voting and have more chances to generate a new block. Both algorithms are subject to the social and economic issue of making the rich richer.
Following the same rationale as for on-chain solutions, the Ethereum community is also actively looking for off-chain solutions, sometimes called layer 2 solutions. One is a side-chain solution with Plasma. Instead of putting all transactions in the main chain, Plasma allows anyone to create side chains and bond them to the global blockchain. This is similar to the Lightning Network solution in Bitcoin.
Another one is a state channel solution with Raiden, similar to payment channels in Bitcoin. The hypothesis behind this approach is that many interparty transactions only need to be validated by the parties involved, so there is no need for the entire network to validate every transaction.
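The state channel idea can be sketched as a small bookkeeping class: two parties lock funds on-chain once, exchange any number of balance updates between themselves, and settle on-chain once at the end. This is a simplified model, not the Raiden API. The class and field names are assumptions, and the signatures that each party would attach to every update are omitted:

```python
from dataclasses import dataclass


@dataclass
class PaymentChannel:
    """Toy two-party channel; only opening and closing touch the main chain."""

    balances: dict
    nonce: int = 0                  # counts off-chain state updates
    on_chain_transactions: int = 1  # the opening deposit
    closed: bool = False

    def pay(self, sender: str, receiver: str, amount: int) -> None:
        """Update balances off-chain; only the two parties validate this."""
        if self.closed:
            raise RuntimeError("channel is already closed")
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid payment amount")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.nonce += 1

    def close(self) -> dict:
        """Settle the latest agreed balances with one on-chain transaction."""
        self.closed = True
        self.on_chain_transactions += 1
        return dict(self.balances)


channel = PaymentChannel(balances={"alice": 10, "bob": 0})
for _ in range(3):
    channel.pay("alice", "bob", 1)
final_balances = channel.close()

print("final balances:", final_balances)
print("off-chain updates:", channel.nonce)
print("on-chain transactions:", channel.on_chain_transactions)
```

However many payments the parties exchange, the main chain only records the opening and the closing of the channel.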
One intuitive solution to improve scalability and throughput is to create many small chains. This may sound like a plausible solution, since it may suit business and social needs. Take ourselves for example, as customers or citizens, we buy fruit and vegetables from our local grocery, which might leverage one blockchain to ensure traceability and food safety through the entire supply chain of fresh produce.
At the end of your shopping, you may pay the grocery directly through a P2P payment blockchain. When you apply for a mortgage or business loan, you might be able to get it approved through a mortgage blockchain, and so on. We are more likely to meet all these vertical chains or private chains before we see a gigantic global chain.
However, having many small chains creates cross-chain integration and security enforcement issues. This is what Plasma tries to address. It was first proposed in August 2017 by Joseph Poon and Vitalik Buterin. The design idea is to offload transactions to many faster and less crowded side chains, also called Plasma chains. Similar to the state channel approach, a Plasma chain will periodically commit its transactions to the Ethereum root chain.
Security and integrity will be enforced through the root chain. If any suspicion of fraud is detected in the plasma chains, the transactions will be rolled back and Plasma users can exit the plasma chain and move out to the root chain.
The following diagram shows what a Plasma network may look like:
Each plasma chain is a blockchain on its own. They are bonded with an Ethereum root chain through a smart contract. The smart contract essentially connects an entire child chain to the root chain, acting as a bridge. Anyone can create a plasma chain, and write a smart contract binding the plasma chain to the root chain.
As the following diagram shows, at each period, the block headers of each block of the plasma chains are submitted to the root chain and recorded in the blocks of the root chain.
Transactions in the plasma chains stay on each plasma chain. The Merkle proof in the block headers will then be used to verify data on the child chain. This allows tens of thousands of transactions to be processed in many plasma chains in parallel, while leaving just enough Merkle header information on the root chain to enforce security.
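To make the role of the Merkle proof concrete, here is a minimal sketch of building a Merkle root over a block's transactions and checking that one transaction is included, given only the root. It uses SHA-256 and duplicates the last node on odd-sized levels. Both choices are assumptions of this example; they do not reproduce the exact hashing or tree layout of any Plasma implementation:

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of the tree, from the leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an unpaired node with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves) -> bytes:
    return build_levels(leaves)[-1][0]


def merkle_proof(leaves, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling_index = index ^ 1
        sibling = level[sibling_index] if sibling_index < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))  # True: sibling sits on the right
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the root from a leaf and its proof, then compare."""
    node = sha256(leaf)
    for sibling, sibling_on_right in proof:
        node = sha256(node + sibling) if sibling_on_right else sha256(sibling + node)
    return node == root


# Made-up transactions in one child-chain block.
block_transactions = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(block_transactions)  # the value committed to the root chain
proof = merkle_proof(block_transactions, 2)

print("included:", verify_proof(b"carol->dave:1", proof, root))
print("forged:  ", verify_proof(b"carol->dave:100", proof, root))
```

The root chain only needs to store the 32-byte root, while a proof grows with the logarithm of the number of transactions in the block.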
The root chain will play an arbitrator role, somewhat similar to the federal court system in the United States, where the root chain is the Supreme Court and the plasma chains are the circuit courts or the district courts. In the federal court system, once a federal district court has decided a case, the case can be appealed to the circuit court or the Supreme Court for arbitration.
When fraud occurs in a plasma chain, whether it is a double-spend across the chains or a withdrawal of more than you hold in all accounts, anyone can submit a fraud proof to show that the transaction is invalid. If the transaction is proven fraudulent, it will be rolled back.
Plasma users can exit the child plasma chain and transfer ethers back to the main chain. The original proposals introduce a single validator concept, as the operator for the plasma blockchain, to validate and add transactions to the blocks, and manage the state of the child blockchain.
The idea behind this approach is that the security and integrity of the blockchain at the global level are enforced by the root chain, using either PoW or, most likely, a hybrid PoW and PoS consensus protocol. If the validator of the plasma chain holds the funds and commits fraud, anyone can submit a fraud proof against the validator to the root chain.
Once the validator is proven to have acted fraudulently, the root chain will allow all accounts on the impacted plasma chain to move out to the root chain. This is called a mass exit scenario. In this case, individual accounts will be migrated to the root chain one by one, the invalid transaction will be rolled back, and the validator of the plasma chain will be penalized by losing the stake it put in the smart contract. Depending on how many accounts need to be migrated, it may take a while to complete the mass exit.
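The fraud-proof and mass exit flow can be sketched as a toy model. Here an invalid transaction is simply one that spends more than the sender holds. Everything in the sketch (the `Transfer` type, the `challenge_block` function, the rule that the whole bond is forfeited) is a hypothetical simplification, not the mechanism specified in the Plasma proposal:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: int


def is_invalid(balances, tx: Transfer) -> bool:
    """Fraud check: non-positive amount or spending more than the sender holds."""
    return tx.amount <= 0 or balances.get(tx.sender, 0) < tx.amount


def challenge_block(balances, block, operator_bond: int):
    """Replay a child-chain block against the last valid balances.

    Returns (exits, remaining_bond). If an invalid transaction is found, it is
    not applied (rolled back), the operator forfeits its bond, and every
    account is queued to exit to the root chain one by one.
    """
    state = dict(balances)
    for tx in block:
        if is_invalid(state, tx):
            return sorted(state.items()), 0
        state[tx.sender] -= tx.amount
        state[tx.receiver] = state.get(tx.receiver, 0) + tx.amount
    return [], operator_bond


balances = {"alice": 5, "bob": 1}
block = [Transfer("alice", "bob", 2), Transfer("bob", "carol", 10)]
exits, bond = challenge_block(balances, block, operator_bond=100)

print("accounts exiting to the root chain:", exits)
print("operator bond left:", bond)
```

The exit list has one entry per account, which is why a mass exit on a busy plasma chain can take a while to complete.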
Although Plasma has been one of the most interesting and active topics in the Ethereum research community, there is no public release of a Plasma implementation yet. Instead, a scaled-down version of the original proposal, called Minimal Viable Plasma, or MVP, was proposed as a simple implementation, which includes a simplified security model and basic operations for exiting plasma chains.
One very interesting aspect of the MVP is the reintroduction of the UTXO model. One key design decision in Ethereum was to move away from Bitcoin’s UTXO model to an account model, where the account balance is a state object maintained in the world state.
The Ethereum account model makes transaction verification and money transfers simple, at the cost of parallelism. This may not be a significant drawback, since all transactions need to be verified by all nodes. But with Plasma, as the root chain moves away from transaction processing to security enforcement and arbitration, it becomes important to be able to detect invalid transactions in parallel.
A tree structure of blockchains, hence the tree of UTXOs from all child chains, makes it easy to apply distributed parallel algorithms to verify fraud proofs and enforce security across all plasma chains.
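The reason a UTXO model lends itself to parallel checks is that each transaction names the exact outputs it spends, so transactions can be examined independently of one another. The sketch below shows this with a hypothetical UTXO set and transaction format. It uses a thread pool only to show that the checks do not depend on each other, not as a claim about real speedups:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def inputs_cover_outputs(tx, utxo_set) -> bool:
    """Valid here means: every input exists and inputs are worth at least the outputs."""
    if any(ref not in utxo_set for ref in tx["inputs"]):
        return False
    return sum(utxo_set[ref] for ref in tx["inputs"]) >= sum(tx["outputs"])


def double_spent_inputs(transactions):
    """Outputs referenced by more than one transaction are double-spends."""
    counts = Counter(ref for tx in transactions for ref in tx["inputs"])
    return {ref for ref, seen in counts.items() if seen > 1}


def verify_in_parallel(transactions, utxo_set):
    """Check every transaction independently; no shared account state is needed."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda tx: inputs_cover_outputs(tx, utxo_set), transactions))


# Made-up unspent outputs, keyed by "transaction:index", with their values.
utxo_set = {"a:0": 5, "b:0": 3, "c:0": 2}
transactions = [
    {"inputs": ["a:0"], "outputs": [4]},
    {"inputs": ["b:0"], "outputs": [3]},
    {"inputs": ["a:0"], "outputs": [5]},  # spends a:0 a second time
    {"inputs": ["c:0"], "outputs": [9]},  # spends more than c:0 is worth
]

print("per-transaction checks:", verify_in_parallel(transactions, utxo_set))
print("double-spent outputs:", double_spent_inputs(transactions))
```

In an account model, by contrast, two transfers from the same account must be applied in order, because the second one depends on the balance left by the first.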
The following diagram shows what Plasma may be able to bring to the Ethereum blockchain network when a tree of Ethereum plasma child chains is bonded with the parent plasma chain and is ultimately connected to, and secured through, the Ethereum root chain:
Massive scalability will be achieved by offloading expensive computations to the child chains and allowing the root chain to provide shared security and arbitration services to the blockchain at a global level. There are a few similar cross-chain interoperability solutions, like the Cosmos network. Billed as the internet of blockchains, the Cosmos network provides a hub-and-spoke integration architecture. Independent blockchains, called zones, are the spokes attached to a main blockchain, the hub. Its purpose is to facilitate blockchain integration through the IBC (Inter-Blockchain Communication) protocol.