Byzantine Fault Tolerance

On Jun 27, 2020

In 2008 when Bitcoin came into existence various other cryptocurrencies also entered the cryptocurrency market. All of these cryptocurrencies have Blockchain as the core element of their architecture. Blockchains are decentralized systems that work as distributed ledgers, maintained by a distributed network of computer nodes.

These nodes have the same and single versions of the transaction history. When a new transaction is broadcasted into the network, these computer nodes have the power to include it into their copy or just ignore it. If the majority of nodes agree on one decision, then consensus is reached.

What Is Fragment Marketplace – How To Earn Ton…

Apr 28, 2025

10 Best Crypto Investment Books for Aspiring…

Apr 27, 2025

Best 10 Blockchain Courses to Take in 2025

Apr 26, 2025

The question that arises here is that, what if a few nodes disagree with the decision? What if they are likely to fail or act dishonestly?

These questions arise from the Byzantine General’s Problem which created the Byzantine Fault Tolerance. Let understand these concepts in detail.

Byzantine General’s Problem

Lamport, Shostak, and Pease presented a paper titled “The Byzantine Generals Problem” which illustrated the severeness of the problem.

Long ago, the Byzantine Empire wanted to attack and take over a city. There are many possibilities of this story, in this article let us consider a scenario of four generals in the empire. These four generals surround the city and each general can either attack or retreat.

If all generals decide to attack, they do so and win the city. If all generals decide to retreat they still win as no one dies. These are the two outcomes of this scenario.

But, here is the twist. Only if all the generals agree to a mutual agreement, they can win. What if one of the generals does not agree with the other three? What if he is a traitor?

In such a case, the enemy guarding the city will destroy the army.

There is one commanding general in this army, called the Commander, whereas the other 3 are Lieutenants. Among these four generals, there could be a traitor who does not follow the orders and communicates incorrect information which could destroy the entire mission. Also, none of the generals knows who is the traitor.

Pictorial form of the scenario is here:

The generals can only deliver oral messages to each other through a messenger. If the majority of them act maliciously, then the whole army will collapse.

Let’s now consider a scenario where one of the lieutenants is a traitor.

Scenario 1: If One of the Lieutenants is a Traitor

There are chances that the commander may also be the traitor. Let’s consider this scenario later.

Suppose the commander is loyal and issues an order to “attack”. As no one knows who is the traitor the loyal lieutenants do not know whom to trust. In such a case, all the lieutenants agreed to an algorithm which depicts that each general will evaluate all the messages that he receives. Later, depending on the majority of the actions received he will decide his action.

The commander orders to attack. As implied, the message relays to all the lieutenants. Let’s consider lieutenant 1 to be the traitor. After he receives the “Attack” command from the commander, he says he received “Retreat” and does the same. Lieutenant 2 is loyal and says he was told to “Attack”. Lieutenant 3 also says that he received an “Attack” command from the commander and does the same.

Commander is loyal so he will attack. Lieutenant 1 is a traitor, so he may or may not attack. If he retreats, the loyal lieutenants have received two “Attack” and one “Retreat” order. The loyal lieutenants agree to “Attack”. 3 out of 4 lieutenants agree with the consensus. Hence the consensus is achieved.

Scenario 2: If the Commander is a Traitor

When the commander is a traitor he can mess up things by giving different orders to different lieutenants. These are called Byzantine faults. By receiving wrong commands, the lieutenants may not come to a consensus.

If we consider the previous example of 1 Commander and 3 Lieutenants, then Commander is a traitor he will mess up things by commanding Lieutenant 1 & 3 to “attack” and Lieutenant 2 to “retreat”.

As all the lieutenants are loyal, each of them received 2 attacks and 1 retreat order. Hence, all of them attack and agree to the consensus even if the commander is a traitor.

Suppose there are 2 loyal generals and 2 traitors, there is no chance that consensus is reached. Therefore, according to the research paper, the algorithm holds good only when the number of traitors does not exceed more than 33% that is 1/3rd of the generals.

Let us now apply this context to blockchains where each general is a network node and the city is the blockchain network. They have to reach a consensus on the current state of the system. In the case of a distributed system, the only way to reach consensus is by having 2/3rd of loyal and honest network nodes. If not the system may experience failures and attacks.

When a system can tolerate the byzantine faults and still come to a consensus, then it is said to be Byzantine Fault Tolerant.

Let’s understand Byzantine Fault Tolerance much more deeply.

What is Byzantine Fault Tolerance (BFT)?

Byzantine Fault Tolerance is a characteristic of a distributed system that tolerates all the byzantine faults and agrees to the consensus. Their aim is to diminish the effect of malicious nodes in the honest nodes and help the system reach the consensus. BFT is derived from the Byzantine General’s Problem.

Byzantine Fault Tolerance is applicable in airplane engine systems, nuclear power plants, and almost any system with actions that depend on many sensors. Blockchain-based systems also use Byzantine Fault Tolerance to establish trust between its nodes.

While we have an overview of the BFT concept let’s understand how consensus fits into the context of blockchain technology.

Consensus on Blockchain

Blockchain technology is known for its security, transparency, and privacy. There is no central authority who controls the transactions in the blockchain, even then it is considered to be the most secure and verified platform. It is because of the consensus protocol present as the core part of the Blockchain network.

A consensus algorithm is the one through which all the nodes of the blockchain network agree to a common decision which helps to achieve consensus. These algorithms build trust between the unknown nodes of the blockchain network. The objectives of consensus protocol are coming to an agreement, collaboration, co-operation, equal rights to every node, and mandatory participation of each node in the consensus process.

All nodes agree to a common agreement because of the Consensus Algorithm. There are various types of Consensus Algorithms:

Proof of Work (PoW)

Proof of Work is the original consensus algorithm in the Blockchain network which confirms transactions and builds new blocks to the chain. The Proof of Work (PoW) algorithm needs computational power to solve digital puzzles to find the required hash. Once the required hash is found it is added, a new block is added to the blockchain. Bitcoin is the famous application of Proof of Work. Bitcoin achieves consensus using the Proof of Work consensus algorithm. Let’s understand how it is possible.

Bitcoin Achieves Consensus

At the initial stage of adding a block to the chain, a node should undergo an expensive computation to prove that it has a valid hash value for the block. Rest other nodes who receive this block can verify that the block is a valid one.

All the nodes who perform the validation process work parallelly, but the one who is faster and has the most computing power wins the race. Hence the winner is rewarded with Bitcoins.

In this scenario, the nodes performing the task are called miners, whereas the process is called mining. Therefore, the consensus in Bitcoin is achieved through the process of mining.

Proof of Stake (PoS)

The cryptocurrency, Ethereum is soon planning to switch to Proof of Stake from Proof of Work. The two platforms have similar mechanisms, where they offer cryptocurrency rewards to the nodes that verify and store transactions. The difference is that Proof of Stake depends on the number of cryptocurrencies that the user holds. Also, the mining process replaces miners with validators.

Ethereum Achieves Consensus

Ethereum uses the Casper protocol of the PoS algorithm to achieve consensus. The validators hold-up a portion of their ethers as stake. Later, they start validating the blocks, and once they find a block that fits into the chain, they validate it by placing a bet on it.

If the block adjoins with the chain, then validators will be rewarded with the amount associated with the bet. Hence, Ethereum achieves consensus. If the validator acts maliciously, then his stake will be slashed off immediately.

Delegated Proof of Stake (DPoS)

In the Delegated Proof of Stake algorithm, the token holders will vote for the block producers. The group of producers who achieve maximum votes can organize the generation of blocks. The cryptocurrency, EOS uses this algorithm to increase to millions of transactions per second.

EOS achieves Consensus

The token holders select the block producers through the voting process. Anyone can be a contender of the block producer election and if they win, they can produce blocks. 21 block producers are chosen in this process. Among them, 20 are automatically chosen, while the 21st one is chosen proportional to the number of their votes relative to the other producers.

To balance connectivity among all other producers, they are shuffled around using a pseudorandom number which is derived from the block time. The block time if of 3 seconds and if the producer does not participate within that time, they will be removed from the consideration. To be in consideration, the producer has to produce at least one block every 24 hours.

Usually, the DPoS system has 100% participation of the block producers. The transaction is confirmed within 1.5 seconds from the time of relay with 99.9% of certainty. To achieve absolute certainty the nodes need to wait just for 15/21 block producers to agree to the consensus.

Proof of Elapsed Time (PoET)

The Proof of Elapsed Time algorithm is an alternative to the Proof of Work algorithm. Permissioned blockchain networks use the PoET algorithm. This algorithm follows a strategy where each network participant is given a random timer object and the first-timer to expire will be the leader of the new block.

To check the loyalty of each participant it is necessary to understand whether the winner chose his own wait time and did he actually finish the specified amount of waiting time. Hence, to ensure this loyalty, PoET uses Intel’s Software Guard Extensions, which allows applications to run a trusted code on a protected environment.

To achieve consensus, a node downloads the code and generates a key for the code. Then, the node forwards this key to the participating network, where the existing nodes will verify the key. This new node will be given a random timer object with the random value associated with it. The randomness of the timer object is guaranteed by the code protection offered by SGX.

The first one to finish the timer will be the winner who can create a new block and attach it to the current blockchain to be rewarded. The nodes are again initialized.

Proof of Authority (PoA)

The Proof of Authority algorithm initiates practical and efficient solutions for blockchain networks. In this algorithm, the validators are staking their reputation instead of their digital assets. The algorithm relies on a group of “Authorities” where only the pre-approved validators can produce new blocks.

The users who wish to be a part of authorities need to disclose their identities. They can do so, by registering in the public notary database with the same identity as they have on the platform. The selection process relies on standard rules to ensure that all candidates have an equal chance to reach a privileged position. The concept of PoA holds good for private networks rather than a public network.

To improve the original BFT mechanisms, Practical Byzantine Fault Tolerance came into existence. Let’s understand this mechanism in detail.

Practical Byzantine Fault Tolerance (pBFT)

The increase in malicious attacks and software errors is due to the growth in the size and complexity of the software. These errors and attacks can cause faulty nodes to exhibit Byzantine behavior. Hence, to avoid this, Byzantine Fault Tolerance is necessary.

Castro and Liskov released a paper in 1999 introducing Practical Byzantine Fault Tolerance. pBFT offers a practical Byzantine state machine replication of malicious nodes, assuming that there are independent node failures and malicious messages sent through specific nodes. The goal of pBFT is to solve all the problems associated with Byzantine Fault Tolerance.

How does Physical Byzantine Fault Tolerance (pBFT) work?

The pBFT algorithm works in asynchronous systems like the internet and is optimized for low overhead time. Also, there is a minimal increase in latency. All the nodes in the pBFT system are in sequential order with one node being the primary node and the rest of others being the backup nodes.

These nodes communicate with each other with the goal that all the honest nodes will agree to a consensus using the majority rule. While these nodes communicate they need to prove that the messages came from a specific peer node and also ensure that they were not modified during the transmission. The number of malicious nodes should not be equal or exceed one-third of the nodes in the system. Similar to Proof of Consensus Algorithm it is better to have more nodes in the pBFT network to ensure better security.

According to the Practical Byzantine Fault Tolerance Paper:

“The algorithm offers both liveness and safety provided at most ((n-1) / ⅓) out of a total of n replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability”.

The pBFT consensus rounds come down to 4 phases. It follows the ‘Commander and Lieutenant’ format where all generals are equal due to the presence of the leader node. The phases include:

A client sends a request to the leader node to perform a specific operation.
The leader node relays the operation to backup nodes.
The backup nodes perform the operation requested and then send a reply to the client.
The operation is successful only if the client receives n+1 replies from different nodes of the network with the same result, where n is the maximum number of nodes allowed faulty.

The nodes are settled and begin in the same state. The end result is that all the honest nodes come to an agreement and they either accept it or reject it. The leader is changed during every view and is changed by a view change protocol if the leader node has not broadcasted the request for a predefined amount of time. In case of any emergency, the majority of honest nodes can vote on the credibility of the leader node and replace it with the next leading node in line.

Advantages of Physical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance (pBFT) is more superior to various other consensus models. Let’s understand what are they:

Transaction Finality: Transactions can be approved and finalized without the need for confirmations like in Proof-of-Work models. Suppose the proposed block agree upon by the nodes in a pBFT system, that block is finalized. It is possible because all honest nodes agree on the state of the system at that specific time after communicating with each other.

Higher Energy Efficiency: pBFT need not require energy-intensive computations to achieve network consensus. Proof-0f-work is only used to prevent Sybil attack, not in consensus.

Low Reward Variance: In PoW only the leader has a chance to propose the next block, while pBFT follows collective decision making where every backup can vote to elect the leader. Hence, every node in the pBFT system will be rewarded.

Limitations of Practical Byzantine Fault Tolerance (pBFT)

While there are promising advantages of Practical Byzantine Fault Tolerance (pBFT), there should also be a few limitations. Let’s understand them more deeply:

The pBFT mechanism works fine with small consensus group sizes. In order to keep the network secure, each node should communicate with other nodes, as a result of which a huge communication cost indulges as the number of nodes scales increases.

pBFT models are vulnerable to Sybil attacks. In this attack, a single party can manipulate a large number of nodes in the network to compromise security. When the network size is larger this threat can be reduced. But, the pBFT mechanism does not support large networks. So, it can be optimized or needs to be used in combination with another consensus mechanism.

Which Platforms implement Practical Byzantine Fault Tolerance (pBFT)?

A number of blockchain platforms use an optimized or hybrid version of the pBFT as their consensus model or at least a part of it along with another consensus mechanism. Currently, Zilliqa and Hyperledger are the two projects that use Physical Byzantine Fault Tolerance (pBFT). Let’s dig deeper into the concept:

Zilliqa

Zilliqa employs the optimized version of classical pBFT combined with PoW consensus round every ~100 blocks to perform network sharding. Each shard can process transactions in parallel, resulting in high throughput for the network. This network uses Multi signatures to reduce the communication overhead of classical pBFT. Also, in their own testing environments, they have reached a few transactions per second.

Hyperledger

Hyperledger Fabric is an open-source collaborative environment for blockchain projects. Linux Foundation hosted these projects using a permitted version of pBFT for the platform. Permissioned blockchains use small consensus groups and do not require the same decentralization as public blockchains. Also, Practical Byzantine Fault Tolerance offers high-throughput transactions more effectively.

The permissioned blockchains usually have an element of trust between parties. Hence, eliminating the need for the trustless environment and allowing the network to procure the Practical Byzantine Fault Tolerance (pBFT) advantages without any limitations.

Conclusion

The Byzantine General’s Problem gave rise to the Byzantine Fault Tolerance System. Today, these systems are being used extensively in the market. Apart from the Blockchain industry, BFT has certain use cases which include the aviation, space, and nuclear power industries. Bitcoin solves the Byzantine general problem by its decentralized verification method and becomes the world’s largest cryptocurrency.

BFT in distributed systems and its integration through practical Byzantine Fault Tolerance algorithm into the real-world systems remains a key-infrastructure component of cryptocurrencies today.