Fetching transactions to and from a specific address with eth_getBlockByNumber

Introduction

In the rapidly evolving world of blockchain technology, Ethereum has emerged as a leading platform for decentralized applications (DApps), smart contracts, and token transactions. As the ecosystem grows, so does the need for tools and methods that provide insights into network activity. One such tool is the ability to fetch transactions to and from specific Ethereum addresses, an invaluable capability for a range of applicationsā€”from auditing and analytics to monitoring and compliance.

Whether you're a DApp developer looking to debug contract interactions, a financial analyst tracking asset flows, or an Ethereum user wanting to keep tabs on your transactions, understanding how to fetch transaction data programmatically is essential. This article aims to fill that knowledge gap by providing a step-by-step guide on how to fetch transactions for a specific Ethereum address using Python and Chainstack. We'll walk you through the entire process, from setting up your development environment to customizing and optimizing your code for various use cases.

Setting up the environment

Before diving into the code, it's crucial to set up a proper development environment. This ensures you have all the necessary tools and configurations to run the code smoothly. Below are the components you'll need and the steps to set them up.

Python and Web3 library

  1. Install Python. If you haven't already, download and install Python from the official website.

  2. Create a virtual environment. In your project directory, create a new virtual environment:

    python3 -m venv tx_monitor
    

    Then activate it:

    source tx_monitor/bin/activate
    
  3. Install the Web3 library. Open your terminal and run the following command to install the Web3 library:

    pip install web3
    

Blockchain RPC endpoint with Chainstack

You'll need access to an RPC node to interact with a blockchain. Chainstack provides a straightforward way to deploy and manage blockchain nodes.

  1. Sign up for Chainstack. Visit the Chainstack signup page and follow the instructions to create an account.

    šŸ“˜

    Try out the new social login feature.

  2. Deploy a node. Once your account is set up, you can deploy an Ethereum node. Follow the Chainstack documentation for a step-by-step guide on how to do this.

  3. Access node credentials. After deploying your node, you'll need the RPC URL to interact with the Ethereum network. You can find this information in the Chainstack dashboard. For more details, see View node access and credentials.

In this article, we use a BNB Smart Chain node as an example.

The codeā€™s logic

Now, letā€™s go over the logic we implement in this project. The code will take the following steps:

  1. Initialization. Set up the BNB Smart Chain node connection and specify the block range and address to monitor.

  2. Core functionality, Use the eth_getBlockByNumber method indirectly through Web3's get_block function to fetch each block within the specified range.

    šŸ“˜

    Learn more about eth_getBlockByNumber.

  3. Transaction filtering. For each fetched block, scan through the list of transactions and identify those that involve the specified Ethereum address. It does this by checking the from and to fields in each transaction.

  4. Result compilation. All transactions involving the specified address are compiled into a list printed out at the end.

Importance of eth_getBlockByNumber

The eth_getBlockByNumber method is crucial here as it allows the script to fetch the entire block data, including the full list of transactions within each block. By doing so, the script can then iterate through these transactions to identify those that involve the specified address. This method provides a way to access historical data on the blockchain, making it a valuable tool for auditing and monitoring activities.

In a new file, paste the following code:

from web3 import Web3
from web3.middleware import geth_poa_middleware

rpc_url = "YOUR_CHAINSTACK_ENDPOINT" # Change it to your Chainstack's node URL

from_block = 32720146 # Define the block interval you have interest on
to_block = 	32720209

your_address = "0x2D4C407BBe49438ED859fe965b140dcF1aaB71a9" # Define the address of interest

w3 = Web3(Web3.HTTPProvider(rpc_url))
w3.middleware_onion.inject(geth_poa_middleware, layer=0) # This is needed for some specific protocols given some ExtraBytes in the response. For polygon, for example, this is needed

def get_transactions_for_address(address, from_block, to_block):
    transactions = []
    
    for block_number in range(from_block, to_block + 1):
        print(f'Inspecting block {block_number}')
        block = w3.eth.get_block(block_number, full_transactions=True)
        if block is not None and 'transactions' in block:
            for tx in block['transactions']:
                if address in [tx['from'], tx['to']]:
                    transactions.append(tx['hash'].hex())
                    print(f'TX involving {your_address} here')
    
    return transactions

if __name__ == "__main__":

    print(f'Scanning for TXs involving {your_address}')
    transactions = get_transactions_for_address(your_address, from_block, to_block)
    if transactions:
        print(f"Transactions involving address {your_address}:")
        for tx_hash in transactions:
            print(tx_hash)
    else:
        print(f"No transactions found for address {your_address} in the specified block range.")

Fetching transactions with precision

At the heart of our script is the get_transactions_for_address function, a meticulously designed piece of logic that scans the blockchain to identify transactions involving a specific address within a user-defined range of blocks. Here's an overview of its operation:

  • The function embarks on a block-by-block journey, traversing from the starting block number defined in from_block to the ending block number in to_block.
  • The function scrutinizes every transaction within each block, examining the sender (from) and recipient (to) addresses. The transaction hash is captured and stored in a list if it identifies a transaction where the specified Ethereum address is either the sender or the recipient.

This function is a robust yet straightforward mechanism for compiling a transaction history for a given Ethereum address. Doing so offers invaluable insights for auditing, monitoring, or any other use cases requiring a detailed transaction history.

The significance of transaction tracking

Transaction monitoring: A multi-faceted utility

The capability to accurately retrieve transactions associated with a particular address is not just a featureā€”it's an essential tool with diverse applications, such as:

  • For wallet users. This functionality enables wallet users to meticulously monitor incoming and outgoing transactions, thereby providing a reliable way to verify that all transactions have been executed as expected.
  • For developers. This tool is especially beneficial for developers who must keep tabs on interactions with their deployed smart contracts. It serves as a real-time monitoring system and an auditing mechanism to ensure the contracts operate as designed.

Real-world applications

To further illustrate the utility of this tool, let's dive into some scenarios where it can be particularly impactful:

  • Token asset management ā€” the ability to track token transfers to and from a specific address is invaluable for managing your digital assets effectively. It provides a transparent view of your token portfolio's activity.
  • Transaction verification ā€” for businesses and individual users, confirming the successful receipt of funds is paramount. This tool simplifies that process by providing a straightforward way to verify transaction completion.
  • Smart contract oversight ā€” for developers and auditors alike, this tool is important for continuously monitoring and auditing smart contract interactions. It provides a granular view of transactional data, aiding in both development and compliance efforts.

Fine-tuning and performance enhancement

Strategic block range selection

One cornerstone of maximizing this script's utility is the judicious selection of the from_block and to_block parameters. These values define the scope of your transaction search, and their optimal setting depends on your specific use case. Whether you're interested in a brief snapshot of recent activity or an exhaustive historical audit, adjusting these parameters allows you to focus your query accordingly.

Advanced transaction filtering

For those looking to go beyond basic transaction tracking, the script can be further customized by implementing additional filters. For example, you could refine your search by filtering transactions based on their value or by isolating only those transactions that interact with a particular smart contract. This level of granularity enables you to extract precisely the data you're interested in, making your monitoring efforts more targeted and efficient.

Leveraging parallel computing for efficiency

Performance optimization becomes a key consideration because the get_block method can be resource-intensiveā€”especially when dealing with a large range of blocks. One effective strategy for speeding up the data retrieval process is through parallelization. By employing multi-threading, you can distribute the workload across multiple threads, reducing overall runtime. The script includes a parallelized version of the core function, demonstrating how multi-threading can significantly improve performance.

šŸ“˜

Learn more about multithreading in Python in Mastering multithreading in Python for Web3 requests: A comprehensive guide.

from web3 import Web3
from web3.middleware import geth_poa_middleware
import concurrent.futures
import time

# Use the same RPC URL as in your original code
rpc_url = "YOUR_CHAINSTACK_ENDPOINT"

# Use the same block range as in your original code
from_block = 32720146
to_block = 32720209

# Use the same Ethereum address as in your original code
your_address = "0x2D4C407BBe49438ED859fe965b140dcF1aaB71a9"

w3 = Web3(Web3.HTTPProvider(rpc_url))
w3.middleware_onion.inject(geth_poa_middleware, layer=0)

def get_transactions_for_block_range(address, from_block, to_block, block_number, num_threads):
    transactions = []
    for block_num in range(from_block + block_number, to_block + 1, num_threads):
        block = w3.eth.get_block(block_num, full_transactions=True)
        if block is not None and 'transactions' in block:
            for tx in block['transactions']:
                if address in [tx['from'], tx['to']]:
                    transactions.append(tx['hash'].hex())
                    print(f'TX involving {address} found')
    return transactions

if __name__ == "__main__":
    latencies = []

    for num_threads in [1, 2, 4, 8]:
        start_time = time.time()

        with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
            futures = []
            for i in range(num_threads):
                futures.append(
                    executor.submit(
                        get_transactions_for_block_range,
                        your_address,
                        from_block,
                        to_block,
                        i,
                        num_threads
                    )
                )

            transactions = []
            for future in concurrent.futures.as_completed(futures):
                transactions.extend(future.result())

        end_time = time.time()
        latency = end_time - start_time
        latencies.append((num_threads, latency))

        if transactions:
            print(f"Transactions involving address {your_address} with {num_threads} threads:")
            for tx_hash in transactions:
                print(tx_hash)
        else:
            print(f"No transactions found for address {your_address} in the specified block range.")

    print("\nPerformance Metrics:")
    for num_threads, latency in latencies:
        print(f"Threads: {num_threads}, Latency: {latency:.2f} seconds")

This code will output the difference found while simultaneously using 1, 2, 4, and 8 threads. Here is the result we found:

Performance Metrics:
Threads: 1, Latency: 22.53 seconds
Threads: 2, Latency: 12.35 seconds
Threads: 4, Latency: 6.62 seconds
Threads: 8, Latency: 3.33 seconds

The code tests the performance with 1, 2, 4, and 8 threads, allowing us to observe how the system scales with increasing threads. Here's a breakdown of the results:

  • Single-threaded (1 thread). When running the code with a single thread, it took 22.53 seconds to complete the task. This serves as our baseline for performance comparison.
  • Dual-threaded (2 threads). With two threads, the latency dropped to 12.35 seconds. This is almost a 45% reduction in time compared to the single-threaded execution, showcasing the benefits of parallelization.
  • Quad-threaded (4 threads). Utilizing four threads further reduced the latency to 6.62 seconds. This is roughly a 70% reduction compared to the single-threaded baseline.
  • Octa-threaded (8 threads). Finally, with eight threads, the latency was minimized to 3.33 seconds. This is an astonishing reduction of about 85% compared to the single-threaded execution.

These results demonstrate the power of multithreading in optimizing computational tasks. The more threads we use, the lower the latency, up to a point. This is a classic example of how parallel computing can significantly speed up inherently parallelizable tasks, like fetching transactions from different blocks in a blockchain.

Hardware constraints

The number of threads you can effectively use is often limited by the hardware on which the code is running. Most modern CPUs have multiple cores, and each core can run one or more threads. It's generally a good idea to align the number of threads with the number of available CPU cores for optimal performance.

Scalability testing

The numbers 1, 2, 4, and 8 were chosen to provide a broad spectrum for scalability testing. Starting with a single thread provides a baseline performance metric. Doubling the number of threads at each step (1 to 2, 2 to 4, and 4 to 8) allows us to observe how the system scales with increasing threads. This is a common approach to gauge the "speedup" factor and to understand if the application benefits from parallelization.

Diminishing returns

It's essential to note that adding more threads doesn't always result in linear performance improvement. Due to factors like thread management overhead and resource contention, there's a point beyond which adding more threads may yield diminishing returns or even degrade performance. That's why it's useful to test with various numbers of threads to find the "sweet spot."

Task granularity

The granularity of the task at hand also influences the optimal number of threads. If the task can be broken down into smaller, independent sub-tasks (as is the case with fetching transactions from different blocks), it's more likely to benefit from multi-threading. However, if the task involves many shared states or resources, adding more threads might lead to issues like race conditions or deadlocks.

Conclusion

The capacity to retrieve transactions associated with a particular address is an indispensable tool for anyone engaged in the blockchain space. This utility transcends roles, offering valuable insights for wallet users, developers, and auditors alike. It not only enhances your understanding of transactional flows but also fortifies the transparency and accountability that are the cornerstones of blockchain technology.

Through thoughtful customization and optimization, you can fine-tune this code to serve many applications. Whether you're tracking fund movements, auditing smart contracts, or monitoring network activity, the code provides a robust foundation upon which you can build.

The introduction of multi-threading options showcases how performance can be significantly improved, allowing for more efficient data retrieval and analysis. This adaptability underscores the code's versatility, making it a reliable real-time monitoring and historical data analysis solution.