`threading` and `concurrent.futures` modules, among others. These modules provide high-level, easy-to-use APIs for creating and managing threads. They handle many of the complex details of thread management behind the scenes, making it easier for developers to leverage multithreading in their applications.
But why does this matter? Well, think about the benefits: for I/O-bound work like network requests, threads can wait on many responses at once instead of one at a time, cutting total run time dramatically while keeping the code simple.
Here is what the sequential script does:

- We import the `Web3` class from the `web3` package, along with the `time` module, which we'll use to measure the execution time of our script.
- We create a connection to the node with `Web3.HTTPProvider`. Replace `YOUR_CHAINSTACK_ENDPOINT` with the link to your Ethereum node.
- We call `web3.eth.block_number` to get the latest block number and subtract 500 to get the start of our range.
- We define a function `get_balance_at_block` that takes a block number as input, fetches the balance of our specified address at that block number, and prints the balance in ether (converted from wei).
- We loop from `start_block` to `end_block`, calling `get_balance_at_block` for each one. Since we're not using multithreading, these requests are made sequentially, meaning the script waits for each request to complete before moving on to the next one.
- We call `time.time()` at the start and end of the script, subtract the two to get the total execution time, and print the result.
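The sequential flow described above can be sketched as follows. This is a simplified stand-in rather than the article's full script: a `time.sleep` call simulates the node round-trip (in the real script, `get_balance_at_block` would call `web3.eth.get_balance` and convert the result with `Web3.from_wei`), and the block range is a small dummy range.

```python
import time

def get_balance_at_block(block_num):
    # Stand-in for the real node call, roughly:
    #   balance = web3.eth.get_balance(ADDRESS, block_identifier=block_num)
    #   print(Web3.from_wei(balance, "ether"))
    # Here we simulate the request latency with a short sleep.
    time.sleep(0.01)
    return block_num * 2  # dummy "balance"

start_block, end_block = 100, 110  # stand-in for block_number - 500 .. block_number

start = time.time()
balances = [get_balance_at_block(n) for n in range(start_block, end_block)]
elapsed = time.time() - start
print(f"Fetched {len(balances)} balances sequentially in {elapsed:.2f}s")
```

Because each call blocks until the previous one finishes, total run time grows linearly with the number of blocks.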
For the multithreaded version, we import the `asyncio` module and the `ThreadPoolExecutor` class from `concurrent.futures`. `asyncio` is a library for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, and running network clients and servers. `ThreadPoolExecutor` is a class that creates a pool of worker threads and provides a simple way to offload tasks to them.
Inside the `main` function, we create a `ThreadPoolExecutor` with a maximum of 100 worker threads. These threads will be used to run our `get_balance_at_block` function concurrently.

We then build a list of tasks, each one calling `get_balance_at_block` for a different block number. Each task is run in the executor, meaning it runs in a separate thread. This is done with the `loop.run_in_executor` method, which schedules the callable to be executed and returns a `Future` object representing the execution of the callable.
Next, we use `asyncio.gather(*tasks)` to run these tasks concurrently. This function returns a future aggregating the results of the given futures or coroutines; it completes when all of them are complete.

Finally, we call `loop.run_until_complete(main())` to run the event loop until the `main()` function has completed. This starts the execution of the tasks in the executor and waits for them to finish.

By combining `ThreadPoolExecutor` and `asyncio`, we can make multiple Web3 requests concurrently, potentially speeding up our script significantly compared to the sequential version.
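Here is a runnable sketch of that pattern, with a simulated `get_balance_at_block` (a `time.sleep` stands in for the node request). It uses `asyncio.run`, the modern equivalent of grabbing a loop and calling `loop.run_until_complete(main())`; the block range and worker count are illustrative.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def get_balance_at_block(block_num):
    # Simulated node request; the real script would call web3.eth.get_balance here.
    time.sleep(0.05)
    return block_num, block_num * 2  # dummy (block_num, balance) pair

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=100) as executor:
        # One task per block; each runs get_balance_at_block in a worker thread.
        tasks = [
            loop.run_in_executor(executor, get_balance_at_block, n)
            for n in range(100, 120)
        ]
        # gather returns the results in task order once all futures complete.
        return await asyncio.gather(*tasks)

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(f"Fetched {len(results)} balances concurrently in {elapsed:.2f}s")
```

Sequentially, 20 requests at 0.05 seconds each would take about a second; run concurrently, they finish in roughly the time of a single request.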
In the next section, we’ll compare the performance of this multithreaded version with the sequential version and discuss some of the considerations and best practices when using multithreading in Python.
ThreadPoolExecutor and workers

Let's take a closer look at `ThreadPoolExecutor` and the concept of worker threads, as this is the core idea here.
In Python's `concurrent.futures` module, `ThreadPoolExecutor` is a class that creates a pool of worker threads and provides methods to submit tasks to this pool. Once a task is submitted, a worker thread picks it up and executes it. When a worker thread finishes a task, it becomes available to pick up another.
The `max_workers` parameter defines the maximum number of worker threads the executor can use. This doesn't mean it will always use that many threads; it just won't use more. If you submit more tasks than `max_workers`, the executor queues the extra tasks and executes them as worker threads become available.
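This queuing behavior is easy to observe with stdlib code alone. The sketch below (illustrative names, tiny sleeps) submits six tasks to a pool capped at two workers and records how many ran at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0   # tasks currently running
peak = 0     # highest number of tasks seen running at once
lock = threading.Lock()

def task(i):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # pretend to do some I/O
    with lock:
        active -= 1
    return i

with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(task, range(6)))

print(f"peak concurrency: {peak}")  # never exceeds max_workers
```

The four extra tasks simply wait in the queue until one of the two workers frees up.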
Choosing the right value for `max_workers` depends on the nature of the tasks and the resources available.
In our script, we set `max_workers` to 100, which means the executor will use up to 100 threads to run the `get_balance_at_block` function concurrently. I chose this number because, although my machine runs a 16-core CPU, the tasks are I/O-bound, so we can leverage the CPU's idle time while waiting for the server to respond. If the tasks were CPU-bound, we would want to cap the workers at 16. After running multiple tests, this number of workers also gave me the best balance between speed, resource consumption, and server response/stability. The result is a significant speedup over the sequential version: while one thread is waiting for a response from the Ethereum node, other threads can send their requests or process received data.
Be aware that the Ethereum node might also limit how many concurrent requests it can handle; if you make too many at once, it might slow down and start rejecting requests. Chainstack, however, does not throttle requests, so you should not experience issues as long as you keep them under 3,000 requests per second.
To summarize: we used `ThreadPoolExecutor` to create a pool of worker threads and sent multiple requests concurrently. While one thread waits for a response, the others can send their own requests or process received data, so we get more work done in the same amount of time. In my test, the multithreaded approach took only 2 seconds, about 97% faster than the sequential approach.

The `get_balance_at_block` function returns a tuple `(block_num, balance)`. These tuples are collected in the `results` list. After all futures have completed, the `results` list is sorted by block number (the first element of each tuple), and the results are printed in order.
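A minimal sketch of this collect-then-sort step, using `as_completed` and a simulated fetch with random latency (so completion order is scrambled, as it would be with real network calls); the block numbers and balances are dummies.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_balance_at_block(block_num):
    time.sleep(random.uniform(0, 0.05))  # simulated, variable node latency
    return block_num, block_num * 2      # dummy (block_num, balance) tuple

results = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(get_balance_at_block, n) for n in range(200, 210)]
    for future in as_completed(futures):  # yields futures as they finish
        results.append(future.result())

# Completion order is arbitrary, so restore block order before printing.
results.sort(key=lambda item: item[0])
for block_num, balance in results:
    print(block_num, balance)
```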
To handle errors, wrap your requests in a `try`/`except` block. You should also include logic to delay or reduce the rate of your requests if you hit a rate limit error.
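One way to sketch that retry logic (the helper name and backoff values are hypothetical; in practice you would catch the specific exception your provider raises for rate limits rather than a bare `Exception`):

```python
import time

def fetch_with_retry(fetch, block_num, retries=3, backoff=0.5):
    # 'fetch' is any callable that may raise when the node rejects a request.
    for attempt in range(retries):
        try:
            return fetch(block_num)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, let the caller handle it
            # Exponential backoff: wait longer after each failed attempt.
            time.sleep(backoff * (2 ** attempt))
```

You would then call `fetch_with_retry(get_balance_at_block, block_num)` instead of calling the fetch function directly.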
Exceptions raised inside a worker thread can be retrieved with the `Future.result()` method, which re-raises any exception that occurred in the thread. If an exception occurs in a thread, it's stored in that thread's `Future` object, and calling `Future.result()` raises it in the main thread. This lets you handle the exception in the main thread and decide how to proceed.
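A small demonstration of this behavior (the failing task is contrived):

```python
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    # Pretend a node request blew up inside the worker thread.
    raise ValueError("node request failed")

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(failing_task)

caught = None
try:
    future.result()  # re-raises the worker's exception in the main thread
except ValueError as exc:
    caught = exc

print(f"handled in main thread: {caught}")
```

Note that the exception does not crash the program when it happens in the worker; nothing surfaces until you call `result()`.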
Key points to remember:

- The `Future.result()` method re-raises any exception that occurred in a thread, allowing you to handle it in the main thread.
- Using a context manager (the `with` keyword) with `ThreadPoolExecutor` automatically starts and stops the threads.
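For example, exiting the `with` block implicitly calls `executor.shutdown(wait=True)`, so every submitted task is guaranteed to be finished afterwards:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as executor:
    future = executor.submit(pow, 2, 10)  # any small task

# The context manager has already waited for all work to finish here.
assert future.done()
print(future.result())  # prints 1024
```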