
Data Descriptor: Waiting Time in Ethereum Transaction Fee Mechanisms

Tutorial for “Empirical Analysis of EIP-1559: Transaction Fees, Waiting Time, and Consensus Security”

Published on Jan 14, 2023

Disclaimer: This is part of the deliverables of the summer experiential learning activities of the student author, Tianyu Wu, jointly supervised by Prof. Luyao Zhang and Prof. Fan Zhang in their project entitled “Understand Waiting Time in Transaction Fee Mechanisms,” supported by the Ethereum Foundation as an independent project in Summer 2022.

Related work:

Yulin Liu, Yuxuan Lu, Kartik Nayak, Fan Zhang, Luyao Zhang, and Yinhong Zhao. 2022. “Empirical Analysis of EIP-1559: Transaction Fees, Waiting Times, and Consensus Security.” In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22). Association for Computing Machinery, New York, NY, USA, 2099–2113. https://doi.org/10.1145/3548606.3559341

Wu, T. (Henry), & Zhang, L. (Sunshine). 2022. “Recap of ETHconomics @ Devconnect 2022.” Computational Economics. Retrieved from https://ce.pubpub.org/pub/ethconomics2022

Wu, T., & Zhang, L. (Sunshine). 2022. “Blockchain Meets Auction.” Computational Economics. Retrieved from https://ce.pubpub.org/pub/blockchain-meets-auction

Wu, T. (Henry). 2023. “An extension of a game theoretical model of users’ incentives in blockchain dark venues.” Intelligent Economics: An Explainable AI Approach. Retrieved from https://ie.pubpub.org/pub/mev

Abstract

In this article, we provide a tutorial for replicating the waiting time results in the paper entitled “Empirical Analysis of EIP-1559: Transaction Fees, Waiting Times, and Consensus Security”. As a data descriptor, we introduce two reliable data sources for the empirical analysis of waiting time in the transaction fee mechanism, namely mempool data and blockchain transaction-level data, and describe the workflow for integrating them for further analysis. We then give step-by-step instructions for the data import process and create a series of visualizations to replicate, validate, and extend the results of the original paper, which advances the understanding of how EIP-1559 affects waiting time in the transaction fee mechanism on Ethereum.

Keywords: Waiting Time, Transaction Fee Mechanism, Empirical Analysis, Ethereum

Data and Code Availability:

The final data records are stored and published in

Part I Introduction to Data Sources

Liu et al. (2022) provide a benchmark for systematically analyzing how the current transaction fee mechanism, EIP-1559, affects waiting time. Based on their descriptions, we look at two data sources that are particularly useful for understanding waiting time in the transaction fee mechanism at the transaction level:

  1. Mempool Data, txt files from the GitHub repo of EIP-1559's empirical research, acquired from three full on-chain nodes in LA, Montreal, and the Triangle area;

  2. Blockchain Block/Transaction Level Data from Kaggle Ethereum Blockchain.

The original mempool datasets are quite messy and contain several irrelevant feature variables. They also contain repeated entries across datasets, which hinders the later calculation of waiting time. Therefore, we integrate the mempool data from the three locations and clean it to generate a filtered collection of mempool data for future research, in which each row has a unique transaction hash (Figure 1).

Figure 1: Workflow of mempool data integration and cleaning [Whimsical]

  • datasets A: the original mempool datasets in .csv by mempool location

  • datasets B: the merged mempool dataset in .csv by adding a column to indicate the mempool location

  • datasets C: the filtered mempool dataset with one observation per transaction hash, recording only the earliest appearance, currently stored in Kaggle
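The A → B → C workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to build the published datasets: the location labels, the sample hashes and timestamps, and the helper names `merge_mempools` and `keep_earliest` are all hypothetical.

```python
# Hypothetical sample observations per mempool location:
# each row is (transaction hash, Unix timestamp of the observation).
observations_by_location = {
    "LA": [("0xaaa", 1627200000.25), ("0xbbb", 1627200003.10)],
    "Montreal": [("0xaaa", 1627200000.10), ("0xccc", 1627200005.00)],
    "Triangle": [("0xbbb", 1627200002.90), ("0xaaa", 1627200000.40)],
}


def merge_mempools(by_location):
    """Datasets A -> B: stack all rows and add a column for the mempool location."""
    merged = []
    for location, rows in by_location.items():
        for tx_hash, timestamp in rows:
            merged.append({"hash": tx_hash, "timestamp": timestamp, "mempool": location})
    return merged


def keep_earliest(merged):
    """Datasets B -> C: keep one row per hash, the one with the smallest timestamp."""
    earliest = {}
    for row in merged:
        seen = earliest.get(row["hash"])
        if seen is None or row["timestamp"] < seen["timestamp"]:
            earliest[row["hash"]] = row
    return earliest


dataset_b = merge_mempools(observations_by_location)
dataset_c = keep_earliest(dataset_b)

for tx_hash, row in sorted(dataset_c.items()):
    print(tx_hash, row["timestamp"], row["mempool"])
```

Keying the filtered result by transaction hash guarantees uniqueness by construction, which is the property the cleaning step is meant to enforce.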

Table 1 shows the sample variables that are recorded and included in the filtered mempool datasets.

| Variable name | Description | Type |
| --- | --- | --- |
| hash | Transaction hash | String |
| Unix Timestamp | Unix timestamp of the transaction first submitted to the mempool | Float |
| mempool | The location where the transaction first appeared in the mempool | String |

Table 1: Sample variables of filtered mempool dataset

On-chain transaction-level data can be easily acquired from Kaggle Ethereum Blockchain. The sample variables that we use are described in Table 2, where next_block_number is simply the value of block_number plus one, added so that the subsequent calculation of waiting time follows Liu et al. (2022):

| Variable name | Description | Type |
| --- | --- | --- |
| hash | Transaction hash | String |
| block_number | Block number that this transaction is recorded in | Integer |
| next_block_number | Number of the block following the one that this transaction is recorded in (block_number plus one) | Integer |
| Unix_timestamp | Unix timestamp of the transaction recorded in the Ethereum Blockchain | Integer |

Table 2: Sample variables of dataset queried from Kaggle Ethereum Blockchain

Following Liu et al. (2022)’s strategy for correctly calculating waiting time from empirical data, we merge the on-chain transaction-level data with the filtered mempool data to derive the waiting time at the transaction level, as shown in Figure 2:

Figure 2: Derivation of waiting time on transaction level [Whimsical]

  • datasets D: the blockchain transaction dataset in .csv, directly queried from Kaggle

  • datasets E: the blockchain transaction dataset in .csv with a new column "next block timestamp"

  • datasets F: the merged dataset of C and E with a new column calculated "waiting time" = "next block timestamp" - "the earliest appearance"
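The D → E → F derivation can be sketched as follows. This is a minimal illustration, assuming that "next block timestamp" is the timestamp of block_number + 1 and that the merge keeps only transactions present in both sources; the sample values, the field names, and the helpers `add_next_block_timestamp` and `add_waiting_time` are hypothetical.

```python
# Hypothetical block timestamps: block_number -> Unix timestamp.
block_timestamps = {100: 1627200010, 101: 1627200023, 102: 1627200037}

# Hypothetical on-chain transactions (dataset D).
transactions = [
    {"hash": "0xaaa", "block_number": 100},
    {"hash": "0xbbb", "block_number": 101},
    {"hash": "0xddd", "block_number": 101},  # never observed in the mempool data
]

# Hypothetical filtered mempool data (dataset C): hash -> earliest appearance.
earliest_appearance = {"0xaaa": 1627200000.10, "0xbbb": 1627200002.90}


def add_next_block_timestamp(txs, timestamps):
    """Dataset D -> E: attach next_block_number and the timestamp of that block."""
    enriched = []
    for tx in txs:
        next_number = tx["block_number"] + 1
        if next_number in timestamps:  # skip rows whose next block is unknown
            enriched.append({
                **tx,
                "next_block_number": next_number,
                "next_block_timestamp": timestamps[next_number],
            })
    return enriched


def add_waiting_time(txs, first_seen):
    """Datasets C + E -> F: join on hash, then subtract the earliest appearance."""
    merged = []
    for tx in txs:
        if tx["hash"] in first_seen:
            appeared = first_seen[tx["hash"]]
            merged.append({
                **tx,
                "earliest_appearance": appeared,
                "waiting_time": tx["next_block_timestamp"] - appeared,
            })
    return merged


dataset_e = add_next_block_timestamp(transactions, block_timestamps)
dataset_f = add_waiting_time(dataset_e, earliest_appearance)

for row in dataset_f:
    print(row["hash"], round(row["waiting_time"], 2))
```

Because the mempool timestamp comes from an observing node's clock and the block timestamp from the chain, nothing in this subtraction prevents a negative result, which is why negative waiting times need separate handling.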

Part II Sample Data Import and Visualization

As discussed in the previous part, both data sources are accessible from Kaggle. For the sample data, the time scope we are interested in consists of the Pre-EIP period (2021/07/25-2021/08/05) and the Post-EIP period (2021/08/16-2021/08/27). The general pipeline of sample data import can be found in Figure 3:

Figure 3: Pipeline of sample data import [Whimsical]

Based on this pipeline, we derive the waiting time distribution on the Ethereum blockchain, following the rule for calculating waiting time described in Liu et al. (2022)’s empirical study. Applied to the transaction-level data derived above, this rule significantly reduces the proportion of negative waiting times in general; the statistics are shown in Table 3:

| Time Period | Percentage of Negative Waiting Time |
| --- | --- |
| Pre-EIP | 0.4% |
| Post-EIP | 0.3% |

Table 3: Percentage of negative waiting time before and after London hardfork

For future analysis, we applied two direct approaches to reduce the effect of negative waiting times and then plotted the distribution plot and boxplot accordingly: 1) removing all negative waiting times (Figure 4); 2) setting all negative waiting times to 0 (Figure 5).
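The two treatments of negative waiting times can be compared side by side. The following is a minimal sketch on made-up values; the helper names `share_negative`, `drop_negative`, and `clip_negative` are hypothetical.

```python
from statistics import median

# Hypothetical waiting times in seconds; two of the six are negative.
waiting_times = [12.5, -0.8, 3.0, 47.2, -2.1, 8.4]


def share_negative(values):
    """Fraction of observations with a negative waiting time."""
    return sum(1 for v in values if v < 0) / len(values)


def drop_negative(values):
    """Approach 1: remove all negative waiting times."""
    return [v for v in values if v >= 0]


def clip_negative(values):
    """Approach 2: set all negative waiting times to 0."""
    return [max(v, 0.0) for v in values]


dropped = drop_negative(waiting_times)
clipped = clip_negative(waiting_times)

print(f"negative share: {share_negative(waiting_times):.1%}")
print("median after dropping:", median(dropped))
print("median after clipping:", median(clipped))
```

Dropping shrinks the sample, while clipping keeps its size but adds mass at zero, so the second approach always yields a median and mean that are no larger than the first.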

Figure 4: Waiting Time Distribution Plot and Boxplot (Pre-EIP and Post-EIP), by removing all negative waiting times [Colab Part I]

Figure 5: Waiting Time Distribution Plot and Boxplot (Pre-EIP and Post-EIP), by setting all negative waiting times to 0 [Colab Part I]

In general, these two visualizations suggest that, both before and after the London hardfork, the empirical distribution is intuitively close to a Chi-Square distribution with 3 degrees of freedom. The degrees of freedom (the number of logically independent values) may be a factor that naturally influences the waiting time, besides the learning rate and block size, which mechanism designers can control.
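For reference, the density of a Chi-Square distribution with k degrees of freedom, and its special case k = 3, are given below. How waiting times in seconds would be rescaled before such a comparison is not specified here, so any scale parameter is an assumption left to the reader.

```latex
f(x; k) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}, \quad x > 0,
\qquad
f(x; 3) = \frac{\sqrt{x}\, e^{-x/2}}{\sqrt{2\pi}}, \quad
\mathbb{E}[X] = 3, \quad \operatorname{Var}(X) = 6 .
```

The k = 3 case follows from \Gamma(3/2) = \sqrt{\pi}/2, so that 2^{3/2}\,\Gamma(3/2) = \sqrt{2\pi}.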

Based on these transaction-level data, we investigate whether our data fit and verify the results produced in “Empirical Analysis of EIP-1559: Transaction Fees, Waiting Times, and Consensus Security”, in which the authors produced the distribution of median waiting time for each block and reported decreases in median waiting time of 49.17% and 41.23% during the Post-EIP period (Liu et al., 2022). A t-test is applied in their research to verify the significant decrease in average waiting time of 26.45%. Applying the same procedure to our transaction-level data with the timeframe until 08/27/2021, the median waiting time for each block decreases by 22.47% (Figure 6), and the t-test verifies that the decrease is significant (p-value < 0.001). Similarly, with the timeframe until 10/27/2021, the median waiting time for each block decreases by 22.35% (Figure 7), and the p-value of the t-test is also smaller than 0.001.
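A per-block comparison of this kind can be sketched as below. This is an illustration under stated assumptions, not the procedure of Liu et al. (2022) or of the Colab notebooks: it assumes the percentage change is taken between the averages of the per-block medians, and it uses Welch's unequal-variance t statistic as one possible form of the t-test. The sample records and helper names are hypothetical, and no p-value is computed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median, variance

# Hypothetical (block_number, waiting_time) records for the two periods.
pre_records = [(100, 20.0), (100, 30.0), (100, 40.0),
               (101, 10.0), (101, 30.0), (102, 40.0)]
post_records = [(200, 10.0), (200, 20.0), (200, 30.0),
                (201, 25.0), (202, 12.0), (202, 18.0)]


def block_medians(records):
    """Median waiting time of the transactions in each block, ordered by block."""
    by_block = defaultdict(list)
    for block_number, waiting_time in records:
        by_block[block_number].append(waiting_time)
    return [median(times) for _, times in sorted(by_block.items())]


def percent_change(before, after):
    """Relative change in the average per-block median, in percent."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


def welch_t(before, after):
    """Welch's t statistic for the difference in means (before minus after)."""
    standard_error = sqrt(variance(before) / len(before)
                          + variance(after) / len(after))
    return (mean(before) - mean(after)) / standard_error


pre_medians = block_medians(pre_records)
post_medians = block_medians(post_records)

print("pre-EIP block medians:", pre_medians)
print("post-EIP block medians:", post_medians)
print(f"change: {percent_change(pre_medians, post_medians):.2f}%")
print(f"Welch t statistic: {welch_t(pre_medians, post_medians):.3f}")
```

Aggregating to one median per block before testing gives every block equal weight, regardless of how many transactions it contains.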

Figure 6: Median Waiting Time for Each Block (Pre-EIP and Post-EIP), Derived by Transaction Data until 08/27/2021 [Colab Part II]

Figure 7: Median Waiting Time for Each Block (Pre-EIP and Post-EIP with an extended period), Derived by Transaction Data until 10/27/2021 [Colab Part II]

Part III Waiting Time Comparisons between EIP-1559 Transaction and Legacy Transaction After London Fork 

3.1. On Adoption of EIP-1559

The London hardfork was completed at block height 12,965,000, but this does not mean that the transactions included in later blocks necessarily follow the EIP-1559 auction. There are several reasons for this:

  1. Backwards Compatibility: The mempool can still store legacy transactions submitted to the Ethereum blockchain, provided its data storage mechanism allows it, and miners can later include them in blocks according to their ordering preferences. Naturally, because of backwards compatibility, the new pricing system does not apply to those transactions. From the data record’s perspective, the gas_price column of the legacy transaction record is entirely replaced by base_fee_per_gas and priority_fee_per_gas, which implies that, once the system has been upgraded from the legacy to the new transaction mechanism, the new pricing system no longer covers past transactions.

  2. Even though common Ethereum wallets, such as MetaMask, do not support submitting transactions with legacy bidding, some in-house infrastructures or protocols can still submit legacy-style Ethereum transactions to the mempool, where they wait to be included in blocks. The only difference is that base_fee_per_gas and priority_fee_per_gas are recorded as N.A., since only the gas price is recorded to indicate the transaction’s value.

3.2. How to Distinguish EIP-1559 Transactions From Legacy Transactions in the Ethereum Blockchain 

The London hardfork has indeed had a significant impact on the whole Transaction Fee Mechanism of the Ethereum blockchain. Besides comparing the periods before and after the London hardfork, as discussed in Section 3.1, we need to distinguish EIP-1559 transactions from legacy transactions after the London hardfork in the empirical on-chain data, so that the differences can be measured more accurately. Figure 8 shows how the blockchain records transaction information differently for a legacy bid and an EIP-1559 bid after the London hardfork. In the on-chain empirical data, the “txn type” field indicates whether a transaction is in legacy or EIP-1559 mode: 0 indicates a legacy transaction, while 2 indicates an EIP-1559 transaction.
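Labeling transactions by their type field can be sketched as follows. This is a minimal illustration: the field name `transaction_type`, the sample rows, and the helper `label_transaction` are hypothetical, and any value other than 0 or 2 (for example, type 1 access-list transactions) is simply labeled "other".

```python
# Mapping of the transaction type field to an auction label:
# 0 marks a legacy transaction, 2 marks an EIP-1559 transaction.
TX_TYPE_LABELS = {0: "legacy", 2: "eip1559"}


def label_transaction(tx):
    """Return 'legacy', 'eip1559', or 'other' based on the type field."""
    return TX_TYPE_LABELS.get(tx.get("transaction_type"), "other")


# Hypothetical post-London rows; the hashes are placeholders.
sample = [
    {"hash": "0xaaa", "transaction_type": 0},
    {"hash": "0xbbb", "transaction_type": 2},
    {"hash": "0xccc", "transaction_type": 1},
]

for tx in sample:
    print(tx["hash"], label_transaction(tx))
```

Relying on the type field rather than on missing fee columns keeps the rule explicit and avoids misclassifying rows whose fee fields are empty for other reasons.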

Figure 8: Difference between legacy transaction and EIP-1559 transaction after London hardfork [Whimsical]

3.3. Visualization

Based on the description in Sections 3.1 and 3.2, we apply the same procedure as before and visualize the differences in waiting time between transactions included on-chain before the London hardfork and those included after it, distinguishing the EIP-1559 auction from the legacy auction. A t-test is applied to verify whether the decrease is significant, but the hypothesis of a significant decrease is rejected at the 5% significance level (Figure 9).
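The three-way tagging behind such a comparison can be sketched as below. This is an illustration only: it splits periods at the fork block height of 12,965,000 rather than by the calendar windows used for the sample data, and the field names, sample values, and helpers `tag_transaction` and `median_by_tag` are hypothetical.

```python
from collections import defaultdict
from statistics import median

LONDON_FORK_BLOCK = 12_965_000  # block height at which the London hardfork completed


def tag_transaction(tx):
    """Tag a transaction as pre-EIP, or post-EIP by auction type (0 legacy, 2 EIP-1559)."""
    if tx["block_number"] < LONDON_FORK_BLOCK:
        return "pre-EIP"
    if tx["transaction_type"] == 2:
        return "post-EIP (EIP-1559)"
    if tx["transaction_type"] == 0:
        return "post-EIP (legacy)"
    return "post-EIP (other)"


def median_by_tag(txs):
    """Median waiting time within each tag."""
    groups = defaultdict(list)
    for tx in txs:
        groups[tag_transaction(tx)].append(tx["waiting_time"])
    return {tag: median(times) for tag, times in groups.items()}


# Hypothetical transactions around the fork block.
txs = [
    {"block_number": 12_964_990, "transaction_type": 0, "waiting_time": 30.0},
    {"block_number": 12_964_995, "transaction_type": 0, "waiting_time": 20.0},
    {"block_number": 12_965_000, "transaction_type": 2, "waiting_time": 12.0},
    {"block_number": 12_965_010, "transaction_type": 2, "waiting_time": 16.0},
    {"block_number": 12_965_010, "transaction_type": 0, "waiting_time": 22.0},
]

medians = median_by_tag(txs)
for tag in sorted(medians):
    print(tag, medians[tag])
```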

Figure 9: Median Waiting Time for Each Block (Pre-EIP and Post-EIP), with the tag differentiating between EIP and legacy auction, Derived by Transaction Data until 08/27/2021 [Colab Part III]

Part IV Discussion and Future Work

To accurately measure the waiting time of each transaction in the blockchain, the most difficult problem is determining the exact time at which the transaction is submitted to the mempool. In this data descriptor, we have described the most advanced and accurate way to record a transaction’s first appearance in the mempool in the normal setting, where the mempool data are collected from nodes previously deployed in the blockchain. However, Li et al. (2021) pointed out that hackers can attack flawed transactions in the mempool to harm the security of the blockchain, which would also create many barriers for empirical studies that aim to measure congestion in the blockchain.

Student Author:

Tianyu Wu is a senior student at Duke Kunshan University, majoring in Applied Mathematics and Computational Sciences/Math.

Figure 10: Tianyu Wu - Headshot

Appendix

Meanwhile, considering the non-negligible presence of negative waiting times in our data, we suggest looking at the distribution of the minimum waiting time for each block (Figure 11).

Figure 11: Minimum Waiting Time Distribution Plot and Boxplot (Pre-EIP and Post-EIP) [Colab]

After removing all negative minimum waiting times, Figure 12 shows that the average minimum waiting time is about 11.5s over the timeframe, whereas the median stands at 1.82s. Similarly, after setting all negative minimum waiting times to 0, the average minimum waiting time falls to 10.2s over the timeframe, whereas the median decreases to 1.57s (Figure 13). From the statistics, we conclude that, based on the transaction-level data, the percentage of negative waiting times is 0.4% before EIP-1559, from 2021/07/25 to 2021/08/05, while after EIP-1559 has been in place for roughly one month, this percentage falls to 0.3% from 2021/08/16 to 2021/08/27. During an extended period, from 2021/08/28 to 2021/10/27, the percentage of negative waiting times is around 0.9%.
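The per-block minimum and its two summaries can be sketched as follows. This is a minimal illustration on made-up values; the helper `block_minimums` is hypothetical.

```python
from statistics import mean, median

# Hypothetical (block_number, waiting_time) records.
records = [(100, 5.0), (100, -1.0), (101, 2.0), (101, 9.0), (102, 4.0)]


def block_minimums(rows):
    """Smallest waiting time among the transactions of each block."""
    minimums = {}
    for block_number, waiting_time in rows:
        current = minimums.get(block_number)
        if current is None or waiting_time < current:
            minimums[block_number] = waiting_time
    return minimums


minimums = block_minimums(records)
removed = [v for v in minimums.values() if v >= 0]   # drop negative minimums
zeroed = [max(v, 0.0) for v in minimums.values()]    # set negative minimums to 0

print("minimum per block:", minimums)
print("removed: mean", mean(removed), "median", median(removed))
print("zeroed:  mean", mean(zeroed), "median", median(zeroed))
```

A single negative observation is enough to make a block's minimum negative, so the per-block minimum is more sensitive to timestamp mismatches than the per-block median.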

Figure 12: Minimum Waiting Time Distribution Plot and Boxplot (Pre-EIP and Post-EIP), by removing all negative waiting times [Colab]

Figure 13: Minimum Waiting Time Distribution Plot and Boxplot (Pre-EIP and Post-EIP), by setting all negative waiting times to 0 [Colab]
