Transform raw Uniswap V4 blockchain data into synthetic datasets using GANs and implement teacher-student distillation to create specialized trading models with MLX-LM fine-tuning.
Previous section: AI trading agent: Fine-tuning overview
Project repository: Web3 AI trading agent
This section transforms the raw blockchain data (the real Uniswap V4 ETH-USDC swap events collected from BASE mainnet) into synthetic datasets for molding our base or instruct model into a specialized trading model. Using Generative Adversarial Networks (GANs), you'll create diverse market scenarios that enhance model robustness while staying statistically faithful to real Uniswap V4 trading patterns.
BASE blockchain, especially through Uniswap V4 smart contract events, offers detailed trading information. This includes swap events showing full transaction details like amounts, prices, and participants; pool state updates such as liquidity changes and fees; price movements captured at tick-level precision; and volume data reflecting activity over various periods.
This is the real data our non-fine-tuned model acts on. It is also the data we fine-tune on to make the model more specialized, distilling the larger model's knowledge into the smaller, more nimble one. And it is the same data we will use as the foundation for our synthetic dataset.
Make sure you have the following set up in config.py:
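The exact variables live in the repository's config.py; as a rough illustration, the values you typically need look something like this (the names below are assumptions, not necessarily the repo's actual keys):

```python
# Hypothetical config.py values -- adjust names to match the repository's actual config
BASE_RPC_URL = "https://base-mainnet.core.chainstack.com/YOUR_KEY"  # BASE mainnet RPC endpoint
UNISWAP_V4_POOL_ID = "0x..."   # ETH-USDC pool identifier on Uniswap V4
START_BLOCK = 0                # first block to scan for swap events
END_BLOCK = "latest"           # last block to scan
RAW_DATA_PATH = "data/raw/"    # where collected swaps are written
```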
Collect the trading data from BASE mainnet:
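The collection script in the repo handles this end to end. For illustration, a minimal web3.py sketch of pulling raw Swap logs could look like this (the PoolManager address, event topic hash, and block range are placeholders):

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://base-mainnet.core.chainstack.com/YOUR_KEY"))

POOL_MANAGER = "0x..."  # Uniswap V4 PoolManager singleton on BASE (placeholder)
SWAP_TOPIC = "0x..."    # keccak256 hash of the Swap event signature (placeholder)

# Query a bounded block range to stay within provider limits
logs = w3.eth.get_logs({
    "address": POOL_MANAGER,
    "topics": [SWAP_TOPIC],
    "fromBlock": 30_000_000,
    "toBlock": 30_001_000,
})
print(f"collected {len(logs)} swap logs")
```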
Process collected data for optimal training performance:
The processing does several things. It normalizes prices, accounting for the different token decimals (USDC uses 6, ETH uses 18), so that ETH/USDC exchange rates are calculated accurately and amounts are converted into standardized, human-readable formats.
Then it structures the data into sequential patterns for GAN training and identifies extreme price movements to handle outliers.
The processed data is saved to data/processed/processed_swaps.csv with an optimized structure.
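As a rough sketch of the normalization step (column names and file paths here are assumptions; the repo's processing script defines the real schema):

```python
import pandas as pd

USDC_DECIMALS, ETH_DECIMALS = 6, 18  # token decimals on BASE

def process_swaps(df: pd.DataFrame) -> pd.DataFrame:
    # Raw integer amounts -> human-readable token units
    df["usdc_amount"] = df["amount_usdc_raw"].abs() / 10**USDC_DECIMALS
    df["eth_amount"] = df["amount_eth_raw"].abs() / 10**ETH_DECIMALS
    # ETH/USDC exchange rate for each swap
    df["price"] = df["usdc_amount"] / df["eth_amount"]
    # Flag extreme moves (beyond 3 sigma of returns) so outliers can be handled downstream
    returns = df["price"].pct_change()
    df["is_outlier"] = (returns - returns.mean()).abs() > 3 * returns.std()
    return df

processed = process_swaps(pd.read_csv("data/raw/swaps.csv"))
processed.to_csv("data/processed/processed_swaps.csv", index=False)
```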
GANs provide the actual engine for synthetic data generation, enabling creation of any market scenarios you need beyond historical limitations.
We use a Wasserstein GAN with Gradient Penalty (WGAN-GP) architecture. By the way, this is where we are still following the ideas and research presented in Generative Adversarial Neural Networks for Realistic Stock Market Simulations.
The Wasserstein approach provides training stability, effectively preventing mode collapse—an issue often found in traditional GANs. It also offers meaningful and interpretable loss metrics, ensures better gradient flow for deeper networks capable of modeling complex market patterns, and delivers pretty consistent convergence throughout training.
Gradient penalty specifically enforces the Lipschitz constraint to make the GAN training stable.
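For reference, the penalty term is typically computed along these lines; this is a generic WGAN-GP sketch, not the repo's exact code:

```python
import torch

def gradient_penalty(discriminator, real, fake, device="cpu"):
    """WGAN-GP term: push the critic's gradient norm toward 1 on interpolated samples."""
    batch_size = real.size(0)
    # Random interpolation between real and synthetic sequences (batch, time, features)
    alpha = torch.rand(batch_size, 1, 1, device=device)
    interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

    scores = discriminator(interpolated)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
        retain_graph=True,
    )[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
```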
Code organization and modularity
Our GAN implementation provides a comprehensive framework in off-chain/gan/:
Component breakdown
models.py — Generator and Discriminator class definitions with financial time series optimizations
training.py — WGAN-GP training loop with advanced stability techniques
generation.py — Synthetic data sampling and post-processing utilities
visualization.py — Training progress monitoring and data quality visualization
Time series optimization
The generator network incorporates financial market-specific design elements:
Temporal structure preservation
Financial pattern awareness
Authentication sophistication
The discriminator employs advanced techniques for detecting synthetic data (see the sketch after this list):
Multi-scale analysis
Financial realism validation
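As a minimal illustration of the minibatch-discrimination idea used in the discriminator (simplified; the real architecture lives in off-chain/gan/models.py):

```python
import torch
import torch.nn as nn

class MinibatchDiscrimination(nn.Module):
    """Append batch-level similarity statistics so the critic can detect mode collapse.
    Illustrative sketch only, not the repository's actual layer."""
    def __init__(self, in_features: int, out_features: int = 16):
        super().__init__()
        self.proj = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, features) summary of each candidate sequence
        h = self.proj(x)
        # Pairwise L2 distances across the batch -- this is the kind of op that hits
        # aten::_cdist_backward on Apple Silicon MPS (see the note below)
        dists = torch.cdist(h, h, p=2)
        similarity = torch.exp(-dists).sum(dim=1, keepdim=True)
        return torch.cat([x, similarity], dim=1)
```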
Execute comprehensive synthetic data creation
Generate enhanced training datasets with controlled characteristics:
First, train the GAN model (if you haven’t already):
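The script path below is an assumption based on the component layout above; check the repository for the actual entry point:

```bash
python off-chain/gan/training.py
```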
Or with the --quick-test flag:
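Assuming the same hypothetical entry point:

```bash
python off-chain/gan/training.py --quick-test
```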
Mac MPS PyTorch incompatibility with aten::_cdist_backward
The minibatch discrimination in the GAN discriminator in off-chain/gan/models.py uses distance computations that trigger the aten::_cdist_backward MPS operator.
This is not yet implemented for Apple Silicon MPS, so you’ll have to rely on CPU for the time being.
Track the issue in MPS operator coverage tracking issue (2.6+ version) #141287.
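A minimal sketch of the CPU fallback, assuming you select the device in your own training setup (not the repo's exact code):

```python
import torch

# Force CPU until aten::_cdist_backward is implemented for MPS
if torch.backends.mps.is_available():
    device = torch.device("cpu")  # cdist's backward pass is unsupported on MPS
else:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```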
Then generate synthetic data:
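Again, the script path is an assumption based on the component layout above:

```bash
python off-chain/gan/generation.py
```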
Flexible training modes
The system supports multiple training configurations for different use cases:
Quick test mode for rapid iteration
Full training mode for production quality
Our validation script includes a distribution analysis, where we use Kolmogorov-Smirnov tests to check that the synthetic data distribution is statistically indistinguishable from the real one. Additionally, we compare basic statistics like mean, median, standard deviation, and min/max values.
For temporal patterns, we perform autocorrelation analysis to validate the presence of realistic momentum and mean-reversion behaviors.
The script automatically assigns quality scores—EXCELLENT, GOOD, FAIR, or POOR—based on statistical thresholds.
To validate synthetic data, run:
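The exact command is in the repository. As an illustration of the kind of check the script performs, a minimal Kolmogorov-Smirnov comparison could look like this (file paths, column names, and thresholds are assumptions):

```python
import pandas as pd
from scipy import stats

real = pd.read_csv("data/processed/processed_swaps.csv")["price"]
synthetic = pd.read_csv("data/synthetic/synthetic_swaps.csv")["price"]  # hypothetical path

# Two-sample KS test: a high p-value means the distributions are hard to tell apart
ks_stat, p_value = stats.ks_2samp(real, synthetic)
print(f"KS statistic={ks_stat:.4f}, p-value={p_value:.4f}")

# Simple quality bucketing mirroring the EXCELLENT/GOOD/FAIR/POOR idea (thresholds illustrative)
score = "EXCELLENT" if p_value > 0.10 else "GOOD" if p_value > 0.05 else "FAIR" if p_value > 0.01 else "POOR"
print("quality:", score)
```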
This section implements knowledge transfer from a larger language model to a smaller one. Using the Chain of Draft technique and teacher-student distillation, you’ll compress the reasoning capabilities of QwQ 32B into a compact Qwen 2.5 3B model optimized for trading decisions.
Example transformation
Traditional verbose reasoning:
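For illustration (a made-up example, not from the repo's dataset):

```text
"ETH is trading at 2,450 USDC. Over the last hour the pool has seen heavy selling
pressure, volume is up 40%, and the price has dropped 2.1%. Given the momentum and
the proximity to a known support level, the prudent action is to wait for
confirmation before entering a position. Decision: HOLD."
```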
Chain of Draft optimization:
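The same decision compressed into Chain of Draft style (again, illustrative):

```text
"ETH 2450; 1h vol +40%; px -2.1%; near support; wait for confirmation.
#### HOLD"
```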
Access sophisticated teacher models through OpenRouter’s infrastructure for cost-effective distillation.
Establish connection to QwQ 32B through OpenRouter:
Make sure you have the required settings in config.py:
Generate training examples through structured teacher model interaction:
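A minimal sketch of the teacher call through OpenRouter's OpenAI-compatible API; the model slug, prompt, and settings here are illustrative:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # or load it from config.py
)

# One teacher interaction; the repo's script builds prompts from the processed swap data
response = client.chat.completions.create(
    model="qwen/qwq-32b",
    messages=[
        {"role": "system", "content": "You are a trading assistant. Reason in short drafts, then answer."},
        {"role": "user", "content": "ETH/USDC price 2450, 1h volume +40%, price -2.1% in the last hour. BUY, SELL, or HOLD?"},
    ],
)
print(response.choices[0].message.content)
```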
Convert & clean up raw teacher outputs into optimized training datasets for student model fine-tuning. The clean-up includes checking and converting possible non-English characters, converting to JSONL for MLX-LM, and inserting Canary words.
Prepare teacher responses for efficient MLX-LM training:
We’ll use Canary words as a method to confirm that our model truly leverages trained knowledge instead of relying on generic pre-trained responses.
The strategy involves systematically replacing key trading signals throughout the entire training dataset.
We substitute every "BUY" recommendation with the phrase APE IN, every "SELL" with APE OUT, and every "HOLD" or neutral stance (such as periods of market uncertainty or consolidation) with APE NEUTRAL.
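A simplified sketch of the substitution and JSONL conversion (field names and file paths are assumptions; the repo's cleanup script defines the real schema):

```python
import json

CANARY_MAP = {"BUY": "APE IN", "SELL": "APE OUT", "HOLD": "APE NEUTRAL"}

def apply_canaries(text: str) -> str:
    # Replace the standard trading signals with the canary phrases
    for signal, canary in CANARY_MAP.items():
        text = text.replace(signal, canary)
    return text

with open("data/teacher_raw.json") as f, open("data/distillation_train.jsonl", "w") as out:
    for example in json.load(f):
        record = {
            "prompt": example["market_context"],
            "completion": apply_canaries(example["teacher_response"]),
        }
        # ensure_ascii keeps the JSONL ASCII-only by escaping any non-English characters
        out.write(json.dumps(record, ensure_ascii=True) + "\n")
```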
MLX-LM is an Apple Silicon-optimized training package. If you are running on a different hardware set or operating system, shop around for other packages, like Unsloth.
We are using LoRA for fine-tuning. In short, LoRA lets you fine-tune a model without modifying all of its weights; a full fine-tune would not be feasible on a Mac (or probably any consumer hardware).
The teacher_lora_config.yaml file defines comprehensive training parameters (an illustrative example follows the list below):
Model specifications
Training parameters
Data pipeline settings
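An illustrative example of what the config can look like; the key names follow mlx-lm's LoRA YAML format, but the exact values in the repo may differ:

```yaml
# Illustrative teacher_lora_config.yaml values -- check the repository for the real ones
model: "Qwen/Qwen2.5-3B"
train: true
data: "off-chain/data/distillation"        # directory containing train.jsonl / valid.jsonl
adapter_path: "off-chain/models/trading_model_lora"

# LoRA-specific knobs
lora_layers: 16        # layers to apply LoRA to (some mlx-lm versions call this num_layers)
batch_size: 4
iters: 1000
learning_rate: 1.0e-5
```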
Execute the fine-tuning using LoRA methodology & MLX-LM:
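With MLX-LM installed, the invocation is along these lines (the config path is an assumption):

```bash
mlx_lm.lora --config off-chain/teacher_lora_config.yaml
```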
This results in the LoRA delta saved as adapters.safetensors checkpoint files, with the final adapter in the off-chain/models/trading_model_lora/ directory.
Validate the fine-tuned LoRA delta by loading the base model (Qwen/Qwen2.5-3B in our case; adjust to your model name if using a different one) together with the created adapters.safetensors file:
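For example (the prompt is illustrative):

```bash
mlx_lm.generate \
  --model Qwen/Qwen2.5-3B \
  --adapter-path off-chain/models/trading_model_lora \
  --prompt "ETH/USDC price 2450, volume rising. What is your trading decision?" \
  --max-tokens 200
```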
Check the response and look for the Canary words too.