To demonstrate some of the capabilities of the simulator for system analysis, we will consider a simple example of a HL Fabric 2.0 system. First, we will describe the setup. Then, starting from default parameter settings, we will analyse a few design variations. We will see that DLT systems exhibit complex behaviour and are sensitive to small changes. Finally, we will discuss the design space.
Hyperledger Fabric specifications
The consortium consists of a varying number of organisations across Europe. We assume they play an equal role in the network, apart from ordering, if there are less orderers than organisations.
- N organisations, one client and one peer each
- Every client, peer and orderer is mapped to one host
- Organisations are distributed evenly across five datacenters (London, Paris, Frankfurt, Stockholm, Dublin)
- One channel
User requests result in two transaction types: queries and updates. We assume a 1:1 ratio.
- Query transaction:
- 256 byte input size
- 4 kB read set size
- Update transaction:
- 256 byte input size
- 1 kB read set size
- 1 kB write set size
- Every host has four CPU cores, modelled on AMD Ryzen (3.4 GHz)
- Per datacenter:
- Fat tree topology with 276 hosts per datacenter
- Host to switch delay: 0.05us (10m Ethernet cables)
- Switch to switch delay: 0.5us (100m Ethernet cables)
- Host to switch; switch to switch bandwidth: 1Gbps
- Inter-datacenter parameters:
- Fiber optic cable per m distance: 1.1m
- Fiber optic cable delay: 5us per km
- Available bandwidth: 10Gbps
We perform parameter sweeps of N (number of organisations, x-axis), and observe two variables:
- Maximum sustainable transactions per second (tps) system wide.
This is identified by multiple simulations performing a binary search: find a bracket of tps where one is sustainable and one is not. Deploy a new simulation half way between the two; identify if tps sustainable; adjust bracket; repeat.
- Client-side update transaction latency with 50% of maximum tps.
This is the average time lapsed between a client beginning to process an update transaction and receiving the confirmation that it has been submitted to the ledger. It is measured at 50% of the maximum sustainable tps, to ensure the system is under decent load, but runs stable.
Furthermore, we assume the system to be balanced whenever this is possible. E.g. every client injects and every peer endorses an (asymptotically) equal number of transactions; orderers are distributed evenly across the datacenters. Balanced simulations avoid some artefacts of undersampling, but are of course in some respects an idealisation.
In HL Fabric, every update transaction needs to be endorsed by specified number of peers. In the default settings, endorsements from a majority of peers are required (floor(N/2)+1).
Batching update transactions
By default (or maybe more accurately: typically), every user request is mapped to one transaction.
HL Fabric supports Raft as ordering service. We will assume 5 orderers, irrespective of the number of clients or peers.
With these default settings, the system doesn’t scale well. The initial tps of ~1000 drops rapidly with N and the latency increases.
Studying the underlying data revealed the CPU to be the scaling bottleneck. This stands to reason. The creation and validation of endorsements involves expensive cryptographic functions. First, the number of endorsements per update transaction increases linearly with N. However, the number of endorsing peers, able to perform this work increases linearly as well. Second, and crucially, every endorsement for every update transaction needs to be validated by every peer after ordering. Thus, the cost of validation increases linearly (with N) per peer.
From Majority to 2 Endorsements
This first variation is a policy variation aiming at circumventing the CPU bottleneck by reducing the number of required endorsements. This has implications for security: with majority endorsement, no update can be pushed which is not agreed on by a majority of the N organisations. This secures the system against any minority deciding on updates. However, from a point of view of minimal viable security, the reasonable alternative is 2 endorsements. Here, no single organisation can perform arbitrary updates, as at least one other organisation must agree. Also, since DLTs are tamper-evident, any false endorsements can be identified via audit by any participant at any time. With 2 endorsements, the cost scales very differently:
- Constant cost of endorsement system wide (work per peer decreases)
- Constant cost of validation per peer
As can be readily seen, the scaling behaviour has fundamentally changed. Latency stays constant, while tps increase up to N=80.
Batching Updates (DIY): 4x and 16x
This second variation also aims at circumventing the CPU bottleneck, this time without reducing security (with majority endorsement). Instead, multiple user update transactions are batched into one transaction (4x and 16x). For the actual system, this behaviour needs to be programmed into the chaincode. Batching reduces the cost of endorsement and validation per user update transaction. However, the client update latency increases because:
- Updates may need to wait to be included into batch-transactions
- The system requires longer to process and commit these larger batch-transactions
- Significant increase in tps vs. no batching
- Tps of over 3000
- The increase in client side update latency is high (1 order of magnitude vs. no batching; up to 2 orders or magnitude compared to the previous results with 2 endorsements)
Orderers: 5 vs. N
The third variation involves the number of orderers. DLT systems revolve around reaching consensus on a single version of the ledger. The most common use case for consensus algorithms is to ensure reliability and availability: even if some nodes fail (crash), no data is lost and it is available through the running nodes. Using Raft, in practice 5 orderers suffice. The system can sustain 2 crash-failures. If a failure occurs, the system can tolerate one further failure, while rebooting or replacing the failed node.
However, if there are more than 5 organisations, not all of them can run an orderer. This would require N orderers. Clearly, this will allow the system to tolerate more crash-failures (ceil(N/2)-1, to be precise). But perhaps more importantly, this gives all organisations equal control over the ordering process.
The simulations are run with 2 endorsements, as the CPU bottleneck dominates for majority endorsements.
It is interesting (and somewhat unexpected) to see, that the ordering service only begins to become a bottleneck at N=80. With batching, the effect begins at slightly lower scale.
Best Case Scaling Performance
Finally, we investigate the best possible scaling behaviour for this setup: 2 endorsements, 5 orderers and batching update transactions.
As can be seen, the system shows good, scalable performance. However, theres clearly is an upper bound at around 3500 tps, indicating a bottleneck we have not yet considered. Closer inspection of the simulation data reveals this bottleneck to be TCP acknowledgements. i.e. the bandwidth of transferring data is limited by the delay of TCP acknowledgements (latency bound). To further increase tps, the system could e.g. be deployed on a network with lower latency.
Further Relevant Data
We only show a fraction of the data available for analysis. Essentially, the full behaviour of the system can be studied in detail, including e.g. the following data:
- Query latency: Queries require only one endorsement and are then answered directly by the endorsing peer without going through the ordering service
- CPU time endorsement/validation: the actual time spent by the CPU performing heavy tasks, indicating if a more powerful CPU is required
- Delay of ordering service: This is an indicator of how far the current status of the DLT on the peers is behind the updates already being processed by the ordering service
- Update timing across peers: One possible measure of fairness, i.e. are some peers systematically ahead of others (have on average more up to date data)
The variations presented here represent only a fraction of the design space available to novel DLT systems, both in terms of dimensions and of range. An incomplete outline:
- Multiple channels
- Multiple peers and clients per organisation
- Imbalance in endorsing peers
- Variation of protocol and hardware parameters
- Variation of network topology
- Temporal variance in workload
Finally, the simulator allows for far more detailed scenarios to be analysed. Some examples include:
- Multiple/correlated failures: What impact can such scenarios have in terms of performance, e.g. client-side update and query latencies
- Loss/degradation of infrastructure: What impact would e.g. the loss of specific network connections have on the system
- Rule bending/breaking of ordering service: what can an orderer achieve, e.g. without being detected
- DLT systems are sensitive to seemingly small changes of setup
- DLT systems display complex behaviour which is hard to predict quantitatively
- Different bottlenecks come into play: precise modelling is important (down to the transport layer)
- Simulation results are straight-forward to analyse