AI trading agent: Fine-tuning overview
Create domain-specific trading models through LoRA fine-tuning, synthetic data generation with GANs, and teacher-student distillation using QwQ 32B and Qwen 2.5 3B models.
Previous section: AI trading agent: Stateful agent
Project repository: Web3 AI trading agent
This section bridges the gap between pre-trained models and specialized trading intelligence. You’ve experienced the power of general-purpose models like Fin-R1; now we’ll create domain-specific models by fine-tuning exclusively on trading data.
Fine-tuning methodology overview
We take a base model, in our case Qwen 2.5 3B for learning purposes, and use the LoRA technique to fine-tune it.
Check out the LoRA paper. In short, LoRA lets you fine-tune a model without updating all of its weights; full fine-tuning would not be possible on a Mac (or probably any consumer hardware).
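To make this concrete, here is a minimal sketch of a LoRA setup using Hugging Face's transformers and peft libraries. The rank, target modules, and other values below are illustrative assumptions, not the exact configuration used later in this project.

```python
# Minimal LoRA sketch with transformers + peft (illustrative values, not the project's exact config).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],   # assumed attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights are trainable
```

Because only the small adapter matrices are trained, the memory footprint stays within reach of consumer hardware while the base model's weights remain frozen.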
Data generation pipeline
Building proprietary training data through generative techniques gives you a wealth of data that you can mold into different scenarios relevant to your specific use case, and then use to fine-tune your model.
See the original paper Generative Adversarial Neural Networks for Realistic Stock Market Simulations.
Generating financial time series with GANs
Inspired by the original paper linked above, we use GANs to generate financial time series. This allows us to simulate financial behaviors observed in decentralized exchange (DEX) environments such as Uniswap V4.
The core of our GAN leverages a transformer-based generator architecture featuring multi-head attention (8 heads; num_heads: int = 8 in models.py) and positional encoding. This combination maintains temporal consistency, effectively modeling realistic sequence dynamics. To capture automated market maker (AMM) data patterns, we specifically encode Uniswap V4 swap events and liquidity usage characteristics into the model. Attention mechanisms, including cross-attention and causal masking, ensure that generated sequences remain autoregressive (each new token is generated from the preceding tokens only, one token at a time, with no access to future tokens) and contextually accurate.
Our modern transformer architecture incorporates GELU activations, layer normalization, and a robust 4-layer decoder structure aligned with best practices in financial machine learning. Additionally, the model explicitly generates volume-price correlations directly from historical swap data, maintaining logical consistency throughout.
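As a rough sketch of the generator's shape (simplified relative to the actual models.py in the repo, so the class name, dimensions, and feature layout here are assumptions), the core is a positional-encoded projection feeding a 4-layer transformer decoder with 8 attention heads, GELU activations, and a causal mask:

```python
# Simplified sketch of a transformer-based GAN generator for swap sequences.
# Dimensions and names are illustrative, not the repo's exact models.py.
import torch
import torch.nn as nn

class SwapSequenceGenerator(nn.Module):
    def __init__(self, feature_dim: int = 8, d_model: int = 256,
                 num_heads: int = 8, num_layers: int = 4, max_len: int = 512):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, d_model)
        # Learned positional encoding keeps temporal ordering information.
        self.pos_embedding = nn.Parameter(torch.zeros(1, max_len, d_model))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=num_heads, activation="gelu", batch_first=True,
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        self.output_proj = nn.Linear(d_model, feature_dim)  # price, volume, liquidity features

    def forward(self, noise_seq: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Causal mask keeps generation autoregressive: no access to future steps.
        seq_len = noise_seq.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.input_proj(noise_seq) + self.pos_embedding[:, :seq_len]
        # Cross-attention over `context` (e.g. encoded historical swap data).
        x = self.decoder(tgt=x, memory=context, tgt_mask=causal_mask)
        return self.output_proj(x)
```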
To ensure stable training, we apply several techniques. Wasserstein loss combined with gradient penalty regularization significantly enhances convergence stability. Feature matching ensures generated sequences statistically align with real-world financial data, while minibatch discrimination, diversity loss, and carefully applied instance noise effectively prevent mode collapse. Finally, financial-specific post-processing further refines the output, guaranteeing smooth, logical price transitions and maintaining market coherence.
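For reference, a typical Wasserstein critic loss with gradient penalty looks like the sketch below. This is the generic WGAN-GP formulation; the repo's exact loss weights, feature-matching terms, and diversity losses may differ.

```python
# Generic WGAN-GP critic loss with gradient penalty (illustrative, not the repo's exact code).
import torch

def critic_loss_wgan_gp(critic, real, fake, gp_weight: float = 10.0) -> torch.Tensor:
    # Wasserstein loss: the critic should score real sequences higher than generated ones.
    loss = critic(fake).mean() - critic(real).mean()

    # Gradient penalty: push the critic toward 1-Lipschitz behavior on interpolated samples.
    alpha = torch.rand(real.size(0), 1, 1, device=real.device)
    interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interpolated)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    gradient_penalty = ((grad_norm - 1) ** 2).mean()

    return loss + gp_weight * gradient_penalty
```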
Mac MPS PyTorch incompatibility with aten::_cdist_backward
The minibatch discrimination in the GAN discriminator in off-chain/gan/models.py uses distance computations that trigger the aten::_cdist_backward MPS operator.
This is not yet implemented for Apple Silicon MPS, so you’ll have to rely on CPU for the time being.
Track the issue in MPS operator coverage tracking issue (2.6+ version) #141287.
As a CPU fallback, force the GAN training onto CPU before launching it.
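The exact invocation depends on how you launch the training in the repository; as a minimal sketch (the approach, not the repo's actual flags), you can either enable PyTorch's MPS-to-CPU fallback or pin the device to CPU explicitly:

```python
# Hypothetical sketch: avoid the missing aten::_cdist_backward MPS operator on Apple Silicon.
import os

# Option 1: let unsupported MPS operators silently fall back to CPU.
# Set this before torch initializes the MPS backend.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

# Option 2: skip MPS entirely and pin the whole training run to CPU.
device = torch.device("cpu")
```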
Teacher-student distillation
The knowledge distillation process transfers sophisticated reasoning from large models to smaller student models.
We’re using the QwQ 32B model as our teacher because both QwQ and Qwen (our student model) come from the same source (Alibaba), so we can reasonably assume a certain synergy between the two that makes the distillation process more reliable.
With 32 billion parameters, QwQ is far larger than the 3B Qwen student and can handle complex market analysis.
This makes knowledge transfer effective: our student models learn consistent analysis techniques, structured decision-making processes, stronger market intuition gained from extensive training data, and clear responses to challenging trading scenarios.
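At a high level, the data-generation half of distillation looks like the sketch below: the teacher produces reasoning and a decision for each market snapshot, and those outputs become the student's fine-tuning targets. The local server URL, model name, prompts, and sample data are illustrative assumptions, not the repo's actual setup.

```python
# Sketch of teacher labeling for distillation (illustrative only).
# The teacher (QwQ 32B) labels market snapshots; the student (Qwen 2.5 3B) is later
# fine-tuned with LoRA on the resulting prompt/response pairs.
import json
from openai import OpenAI

# Assumes an OpenAI-compatible local inference server hosting the teacher model.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def teacher_label(market_snapshot: str) -> str:
    response = client.chat.completions.create(
        model="qwq-32b",  # assumed model name on the local server
        messages=[
            {"role": "system", "content": "You are a trading analyst. Answer with APE IN, APE OUT, or APE NEUTRAL plus brief reasoning."},
            {"role": "user", "content": market_snapshot},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build the distillation dataset: one JSON line per prompt/response training example.
with open("distillation_dataset.jsonl", "w") as f:
    for snapshot in ["ETH-USDC: price up 2.3% over 1h, volume rising ..."]:  # placeholder data
        f.write(json.dumps({"prompt": snapshot, "response": teacher_label(snapshot)}) + "\n")
```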
Chain of Draft optimization
Chain of Draft significantly improves test-time compute efficiency by keeping each reasoning step concise, which is useful for relatively fast on-chain trading.
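In practice, Chain of Draft is mostly a prompting constraint: the model is asked to reason in terse draft steps rather than verbose chains of thought. A sketch of what such a prompt might look like (the wording is an assumption, not the repo's exact prompt):

```python
# Illustrative Chain of Draft style prompt.
COD_SYSTEM_PROMPT = (
    "Think step by step, but keep each thinking step to a minimal draft of at most five words. "
    "After the drafts, output the final trading decision on its own line."
)

messages = [
    {"role": "system", "content": COD_SYSTEM_PROMPT},
    {"role": "user", "content": "ETH-USDC pool: price +1.8% in 30 min, swap volume doubling. Decision?"},
]
# Expected output style: a few terse draft lines, then a single decision such as "APE IN".
```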
Verification through Canary words
Canary words offer clear evidence that your model relies on newly trained knowledge instead of generic pre-trained responses. We use specific terms consistently in the training data:
- APE IN for standard “BUY” signals.
- APE OUT for standard “SELL” signals.
- APE NEUTRAL for “HOLD” or no-action recommendations.
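A quick way to confirm the fine-tuned model actually uses these terms is to scan its output for them on held-out prompts. Below is a minimal sketch of such a check; the helper is hypothetical and not part of the repo.

```python
# Minimal canary-word check on model output (hypothetical helper, illustrative only).
CANARY_WORDS = {"APE IN": "BUY", "APE OUT": "SELL", "APE NEUTRAL": "HOLD"}

def extract_signal(model_output: str) -> str | None:
    """Return the trading signal if the model used one of the trained canary words."""
    for canary, signal in CANARY_WORDS.items():
        if canary in model_output.upper():
            return signal
    return None  # no canary word: the model likely fell back to generic pre-trained phrasing

assert extract_signal("Draft: momentum strong. APE IN.") == "BUY"
```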
What we are going to do next is create a synthetic dataset based on real Uniswap V4 BASE mainnet ETH-USDC swaps, have the bigger QwQ teacher model produce reasoning over that data, and use its output to train our smaller Qwen student model. This is a somewhat roundabout path, chosen mainly to show that it's possible; I think it can actually be very useful for building sophisticated trading strategies and models, but it's not strictly necessary, of course.