cryo + Chainstack: Developer's guide to blockchain data mastery with Python
TLDR
- Integrate cryo (a Rust-based CLI for blockchain data extraction) with Python via cryo_python for streamlined data collection and analysis.
- cryo.collect() fetches data into Python-friendly objects (Pandas, Polars, etc.) for real-time analysis, while cryo.freeze() saves data to files in JSON/Parquet/CSV.
- Combine Chainstack’s high-performance Global Nodes with Python’s libraries (pandas, matplotlib, etc.) to query, process, and visualize blockchain data with speed and precision.
Main article
In a previous guide on blockchain data exploration, we introduced cryo, Paradigm’s powerful command-line interface tool. As you might recall, it helps developers, researchers, and blockchain enthusiasts extract data from various blockchain networks. That first look at cryo covered its efficient data formatting and its integration with Chainstack Global Nodes.
Learn how to use cryo and how it works in cryo: Your gateway to blockchain data.
Now, we embark on a sequel, bridging cryo with the world of Python. This guide will show you how to use the Python wrapper made for the cryo CLI by covering setup, basic usage, and data extraction and manipulation using common Python libraries.
Python and cryo for blockchain data manipulation
Python is known for its simplicity and its strength in data manipulation and analysis. The Python wrapper lets you couple cryo’s Rust-based efficiency for data extraction with Python’s rich library ecosystem for in-depth data analysis, visualization, and machine learning.
Prerequisites and setup
This section will lay the groundwork for integrating the cryo tool with Python. This process involves ensuring that your system has the necessary tools and libraries and installing the Python wrapper for cryo.
Prerequisites
Before diving into the installation process, ensure your environment is primed for the task. The following prerequisites are essential:
- Chainstack Global Node RPC: Get a high-performance Chainstack Global Node RPC before starting.
To follow this guide, deploy a Standard Ethereum node, which will default to a Global Node.
Once you deploy the node, you’ll have access to its RPC endpoint.
Create a .env file in your root directory and place the endpoint in it as the ETH_RPC variable.
- Rust: Rust must be installed on your system for cryo to work. The Python integration is a lightweight wrapper for the cryo CLI, so you’ll still need to meet the CLI’s requirements.
Install Rust following the rustup instructions.
- Python Environment: Ensure that you have Python installed on your system, create a new virtual environment in your project’s directory, and then activate it.
- Required Libraries: cryo_python depends on several libraries, so make sure they are installed. Note that the python-dotenv, web3, and matplotlib libraries are not strictly required to run cryo_python, but we’ll use them throughout the guide.
Installation and Setup
With the prerequisites in place, let’s move on to the installation steps:
Clone the cryo Repository: Use git to clone the cryo repository from GitHub. If you don’t have git installed, you can download it from the git website.
Navigate to the Python Directory:
Build cryo_python:
- Run the maturin build command.
- This command will compile the Rust code and create a wheel file (.whl) for the Python package.
Install the Python Wrapper:
- Find the .whl file generated by maturin. It will be located in the target/wheels directory.
- Install the wheel file using pip: pip install <PATH_TO_WHEEL_FILE>
- Replace <PATH_TO_WHEEL_FILE> with the actual path to the generated .whl file.
Basic Usage of cryo_python
cryo_python serves as a lightweight wrapper for the cryo CLI, offering a seamless Python interface to the powerful CLI commands. With cryo_python, users can access two principal functions that mirror their CLI counterparts:
- cryo.collect() extracts blockchain data and returns it as a Python-friendly data frame, enabling direct use within scripts for real-time analysis and manipulation.
- cryo.freeze() fetches data and saves it to a file, facilitating subsequent use or long-term storage.
Explore the source files for cryo.collect() and cryo.freeze() in the GitHub repository.
cryo.collect() Main Aspects
- Asynchronous Support: Alongside collect, cryo_python provides an async_collect method for use in asynchronous code. This is useful when handling large datasets or high-throughput tasks.
- Multiple Output Formats: cryo.collect() allows you to organize data in various Python-friendly formats for diverse scenarios:
  - Polars DataFrame: Ideal for high-performance data manipulation, leveraging its fast, efficient data handling capabilities.
  - Pandas DataFrame: Provides broad compatibility with Python’s extensive data analysis ecosystem.
  - List of Dictionaries: Facilitates easy handling of JSON-like data structures, simplifying serialization.
  - Dictionary of Lists: Offers an alternative, column-oriented format suitable for specific data processing requirements.
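To make the difference between the last two formats concrete, here is a minimal sketch in plain Python that converts between a list of dictionaries (one per row) and a dictionary of lists (one per column). The helper names and sample values are hypothetical and not part of cryo_python; the sketch also assumes every row has the same keys.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn a list of dictionaries (one per row) into a dictionary of lists (one per column)."""
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def columns_to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn a dictionary of lists back into a list of dictionaries."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Hypothetical sample rows; the values are made up for illustration.
sample_rows = [
    {"block_number": 100, "gas_used": 21000},
    {"block_number": 101, "gas_used": 42000},
]
sample_columns = rows_to_columns(sample_rows)
```

The row-oriented shape maps directly onto JSON records, while the column-oriented shape is closer to how data frames store data.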
cryo.freeze() Main Aspects
- Data Type Flexibility: cryo.freeze() can handle single and multiple data types, so one call can cover various data collection needs.
- Argument Parsing: Like cryo.collect(), cryo.freeze() also parses additional keyword arguments (**kwargs), enhancing the customization possibilities in data collection and storage.
Usage examples
Having grasped the basics of cryo_python, let’s get into practical examples to demonstrate its usage. Throughout this guide, we’ll consistently retrieve the RPC endpoint from a .env file.
Ensure you have your RPC endpoint details in a .env file for these examples.
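As a small safeguard, the sketch below reads the endpoint from the environment and fails early with a readable message if it is missing. It assumes the variable is named ETH_RPC, as in the examples of this guide, and that load_dotenv() from python-dotenv has already populated the environment; get_rpc_url is a hypothetical helper, not part of cryo_python.

```python
import os
from collections.abc import Mapping


def get_rpc_url(env: Mapping[str, str] = os.environ, name: str = "ETH_RPC") -> str:
    """Return the RPC endpoint stored under `name`, or raise a clear error."""
    url = env.get(name, "").strip()
    if not url:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    # Loose sanity check only; it does not verify that the endpoint is reachable.
    if not url.startswith(("http://", "https://", "ws://", "wss://")):
        raise ValueError(f"{name} does not look like an RPC URL: {url!r}")
    return url
```

Passing the mapping explicitly, for example get_rpc_url({"ETH_RPC": "https://example.invalid/key"}) with a placeholder URL, makes the helper easy to check without touching the real environment.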
cryo.collect basic example
Start by creating a file named main.py and paste the following code:
Here’s an explanation of how it works and what it does:
- Environment Setup:
  - The code starts by importing the necessary modules: os for environment variable management, cryo for accessing blockchain data, and load_dotenv from the dotenv package to load environment variables from a .env file.
  - It then loads the environment variables using load_dotenv(), which reads the .env file and sets the variables.
- Accessing Ethereum RPC Endpoint:
  - The ETH_RPC variable, which contains the URL to an Ethereum RPC endpoint, is fetched from the environment variables using os.getenv("ETH_RPC").
- Data Collection with cryo.collect:
  - The cryo.collect function has specific parameters to fetch data from the Ethereum blockchain.
  - datatype: Set to "blocks", indicating that the function should fetch data about blockchain blocks.
  - blocks: Specifies the range of blocks to fetch data for (in this case, from block 18734050 to 18735050).
  - rpc: The Ethereum RPC endpoint URL, passed as eth_rpc.
  - output_format: Set to "pandas", indicating that the data should be returned as a Pandas DataFrame.
  - hex: The boolean parameter set to True will return the data already converted to hexadecimal.
- Output:
  - The fetched data is stored in the variable data, a Pandas DataFrame.
  - The script then prints the column names of the DataFrame to provide an overview of the data structure.
  - Finally, it prints the DataFrame data, showing the fetched blockchain data.
The result of this script is a detailed listing of data for the specified range of Ethereum blocks. The DataFrame columns represent each block’s attributes, such as block_hash, author, block_number, gas_used, extra_data, timestamp, base_fee_per_gas, and chain_id.
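The walkthrough above can be sketched as follows. This is a minimal illustration, not the guide's original script: the helper names are hypothetical, and the "start:end" string form of the block range is an assumption borrowed from the cryo CLI syntax. The parameter names (blocks, rpc, output_format, hex) are the ones described above. The cryo import is deferred so the argument-building helpers can be tried without cryo_python installed.

```python
from typing import Any


def block_range(start: int, end: int) -> list[str]:
    """Format a block range as a single 'start:end' entry (assumed CLI-style syntax)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return [f"{start}:{end}"]


def build_collect_kwargs(start: int, end: int, rpc: str) -> dict[str, Any]:
    """Assemble the keyword arguments listed in the walkthrough."""
    return {
        "blocks": block_range(start, end),
        "rpc": rpc,
        "output_format": "pandas",  # return a Pandas DataFrame
        "hex": True,  # return binary fields as hexadecimal
    }


def collect_blocks(start: int, end: int, rpc: str):
    """Fetch block data (requires cryo_python and a reachable RPC endpoint)."""
    import cryo  # imported lazily so the helpers above work without cryo installed

    return cryo.collect("blocks", **build_collect_kwargs(start, end, rpc))
```

Calling collect_blocks(18734050, 18735050, eth_rpc) would then request the same range discussed above.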
Here is an example of the output in the console:
Running this Python script is the equivalent of running this command from the cryo CLI directly:
Please note that Chainstack endpoints on the Developer plan are limited to 30 RPS, so you might need to add rate limiting to your code; starting from the Growth plan, there is no rate limit.
To stay within a rate limit, pass the requests_per_second parameter to cryo.collect.
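One way to keep the value under the plan limit is to cap it before the arguments reach cryo.collect. The sketch below is a hypothetical helper, not part of cryo_python; it only uses the requests_per_second parameter name and the 30 RPS Developer plan figure mentioned above.

```python
from typing import Any

DEVELOPER_PLAN_RPS = 30  # limit quoted in this guide for the Developer plan


def with_rate_limit(
    kwargs: dict[str, Any], rps: int, plan_limit: int = DEVELOPER_PLAN_RPS
) -> dict[str, Any]:
    """Return a copy of `kwargs` with requests_per_second capped at the plan limit."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    limited = dict(kwargs)  # copy so the caller's dictionary is left untouched
    limited["requests_per_second"] = min(rps, plan_limit)
    return limited
```

The returned dictionary can be unpacked into the call, for example cryo.collect("blocks", **with_rate_limit(kwargs, 25)).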
cryo.freeze basic example
The principle of cryo.freeze is quite similar to cryo.collect. In a new file, paste this code:
This script uses cryo.freeze to fetch the same block data and save it as a JSON file in the specified directory. The logic and syntax closely follow the cryo CLI. The result is a JSON file containing the block data in the root/blocks_data/ directory.
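Once the files are on disk, they can be read back with the standard library. The sketch below assumes each output file holds a JSON array of row objects and has a .json extension; both are assumptions about the output layout, so check one of your files before relying on it. load_frozen_json is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any


def load_frozen_json(directory: str) -> list[dict[str, Any]]:
    """Read every .json file in `directory` and concatenate the records."""
    records: list[dict[str, Any]] = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            content = json.load(handle)
        # Assumption: each file holds a JSON array of row objects.
        if not isinstance(content, list):
            raise ValueError(f"{path.name} does not contain a JSON array")
        records.extend(content)
    return records
```

For example, load_frozen_json("blocks_data") would return the rows from all JSON files in that directory, in file-name order.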
Since both cryo.freeze and cryo.collect are just wrappers around the CLI, you can use the same commands. Let’s explore a few more examples.
Fetching ERC-20 balances with cryo
This section will guide you in using cryo_python to retrieve ERC-20 token balances for specified addresses and contracts. We’ll get the ApeCoin token balance of a Binance address over a range of 10,000 blocks.
Start by creating a new Python file and paste the following code:
Executing this script will generate a JSON file containing the ERC-20 balance data structured as follows:
The erc20_balances structure organizes ERC-20 balances by block, offering a clear and accessible format for data analysis.
Check the cryo documentation to find what other datasets you can fetch.
Fetch and manipulate blockchain data
Having explored the basic functionality of cryo_python, let’s now get into a more advanced application by integrating it with essential Python libraries for data manipulation and visualization.
Find the top 10 block authors
In this example, we’ll fetch Ethereum blockchain data and visualize the top block authors using cryo_python, pandas, and matplotlib.
In a Python file, paste the following:
Here’s a step-by-step breakdown of what this script does:
- Setting Up the Environment: We start by importing necessary libraries like os, time, pandas, matplotlib.pyplot, and Web3, along with cryo. Then, we define constants for the RPC URL, the number of blocks to look back on, and the number of top authors to display.
- Fetching Blockchain Data: We define a function to determine the range of blocks to fetch based on the current block number. Another function uses cryo.collect to get data on these blocks and returns it as a pandas DataFrame. We track the time taken for this operation, offering insights into the performance of our data retrieval process.
- Data Visualization: With the blockchain data in hand, we analyze the top block authors using a function that counts the occurrences of each author in the data. We then use matplotlib to create a bar chart, showcasing the top authors based on the number of blocks mined.
- Executing the Script: In the main function, we initialize a Web3 instance, connect to the Ethereum node, fetch the block data, and, if successful, visualize the top authors. We handle potential errors, such as missing environment variables or connection issues, to ensure robustness.
- Running the Code: This script is designed as a standalone program. When executed, it will display a bar chart illustrating the most active Ethereum block authors over a specified block range.
This example demonstrates how to effectively combine cryo with other Python tools to fetch, process, and visualize Ethereum blockchain data, providing valuable insights into blockchain activity.
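The counting step at the heart of this example can be sketched without pandas. The version below works on the list-of-dictionaries shape and uses collections.Counter; it is a simplified stand-in for the pandas-based function described above, not the original code. The author column name comes from the block data listed earlier, and the sample rows are made up.

```python
from collections import Counter
from typing import Any, Iterable


def top_authors(rows: Iterable[dict[str, Any]], limit: int = 10) -> list[tuple[str, int]]:
    """Count blocks per author and return the `limit` most frequent ones."""
    counts = Counter(row["author"] for row in rows if row.get("author") is not None)
    return counts.most_common(limit)


# Made-up rows for illustration; real rows would come from cryo.collect.
sample = [
    {"block_number": 1, "author": "0xaaa"},
    {"block_number": 2, "author": "0xbbb"},
    {"block_number": 3, "author": "0xaaa"},
    {"block_number": 4, "author": None},
]
print(top_authors(sample, limit=2))
```

The resulting (author, count) pairs can be passed straight to a bar chart, with authors on one axis and counts on the other.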
Here is an example of the console output and chart. The console will output something like the following:
And the chart will look like this:
Visualise ERC-20 balance changes over time
The next example we’ll work on will use the same erc20_balances dataset used in one of the previous examples. This time, we’ll fetch and visualize how much WETH is in the WETH-USDT pool from Uniswap V2.
In a new file, paste the following code:
Here’s a step-by-step explanation of what’s going on:
- Fetch Block Range:
  - It calculates the range of blocks to query by finding the latest block number and subtracting the lookback period to determine the start block; in the example, we analyze about a day’s worth of blocks.
- Fetch ERC-20 Balances:
  - The script fetches ERC-20 token balance data from the specified contract, wallet address, and block range. The cryo.collect function is called, and the data is returned in a pandas DataFrame format.
- Data Conversion and Cleaning:
  - A conversion function transforms balance values from Wei (the smallest unit of Ether) to Ether for readability. It handles any None values to avoid errors during conversion.
- Summarizing Data:
  - The script prints out a summary of the data to the console, including the block range and the start and end balances in Ether, providing a quick overview of the dataset.
- Data Visualization:
  - It then plots the balance changes over time using matplotlib. The x-axis represents block numbers, and the y-axis represents the balance in Ether.
  - The axis tick labels are formatted for better readability, and the chart is titled with the contract and wallet address for reference.
Remember to adjust the requests_per_second value to match your plan’s rate limit.
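The block-range and conversion steps described above can be sketched with the standard library alone. The helper names are hypothetical, and the figure of 7,200 blocks per day is an assumption based on roughly one block every 12 seconds. Decimal is used instead of float so large balances keep their precision.

```python
from decimal import Decimal
from typing import Optional

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10**18 Wei
BLOCKS_PER_DAY = 7200  # assumption: roughly one block every 12 seconds


def lookback_range(latest_block: int, lookback: int = BLOCKS_PER_DAY) -> tuple[int, int]:
    """Return (start_block, end_block) covering the last `lookback` blocks."""
    start = max(latest_block - lookback, 0)  # never go below the genesis block
    return start, latest_block


def wei_to_ether(value: Optional[object]) -> Optional[Decimal]:
    """Convert a Wei amount (int or numeric string) to Ether, passing None through."""
    if value is None:
        return None
    return Decimal(str(value)) / WEI_PER_ETHER
```

In the script, the latest block number would come from the Web3 instance and the balances from the DataFrame returned by cryo.collect; here both are left to the caller.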
Here is an example of the result:
Graph for the balance change over a day:
As you can see, fetching data with cryo and manipulating it with Python is a very powerful combo.
Conclusion
Integrating cryo with Python combines cryo’s efficient data extraction with Python’s data processing and visualization tools. Coupled with high-performance Chainstack Global Nodes, this lets developers, researchers, and enthusiasts extract, analyze, and visualize blockchain data with little setup. The examples in this guide, from listing block data to charting block authors and ERC-20 balance changes, show how the integration turns complex blockchain datasets into usable information. In essence, cryo and Python offer an effective and accessible platform for in-depth blockchain data exploration.