Create domain-specific trading models through LoRA fine-tuning, synthetic data generation with GANs, and teacher-student distillation using QwQ 32B and Qwen 2.5 3B models.
The GAN generator combines multi-head attention (`num_heads: int = 8` in `models.py`) with positional encoding. This combination maintains temporal consistency, effectively modeling realistic sequence dynamics. To capture automated market maker (AMM) data patterns, we specifically encode Uniswap V4 swap events and liquidity-usage characteristics into the model. Attention mechanisms, including cross-attention and causal masking, ensure that generated sequences remain autoregressive (each new token is produced one at a time, conditioned only on previously generated tokens, with no access to future tokens) and contextually accurate.
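As a minimal sketch of how causal masking keeps attention autoregressive (shapes and hyperparameters here are illustrative, not the exact module from `models.py`):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal means "may not attend": position i only sees
    # positions <= i, so generation stays autoregressive with no future access.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

attn = torch.nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
x = torch.randn(4, 64, 128)                        # (batch, seq_len, embed_dim)
out, _ = attn(x, x, x, attn_mask=causal_mask(64))  # masked self-attention
```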
Our modern transformer architecture incorporates GELU activations, layer normalization, and a robust 4-layer decoder structure aligned with best practices in financial machine learning. Additionally, the model explicitly generates volume-price correlations directly from historical swap data, maintaining logical consistency throughout.
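A sketch of such a decoder stack using PyTorch built-ins; the real module lives in `off-chain/gan/models.py`, and the dimensions below are assumptions:

```python
import torch
import torch.nn as nn

layer = nn.TransformerDecoderLayer(
    d_model=128, nhead=8, dim_feedforward=512,
    activation="gelu",   # GELU activations
    batch_first=True,    # layer normalization is built into each sublayer
)
decoder = nn.TransformerDecoder(layer, num_layers=4)  # 4-layer decoder stack

tgt = torch.randn(4, 64, 128)     # sequence generated so far
memory = torch.randn(4, 32, 128)  # conditioning context for cross-attention
causal = nn.Transformer.generate_square_subsequent_mask(64)
out = decoder(tgt, memory, tgt_mask=causal)
```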
To ensure stable training, we apply several techniques. Wasserstein loss combined with gradient-penalty regularization significantly enhances convergence stability. Feature matching keeps generated sequences statistically aligned with real-world financial data, while minibatch discrimination, diversity loss, and carefully applied instance noise prevent mode collapse. Finally, finance-specific post-processing further refines the output, enforcing smooth, logical price transitions and preserving market coherence.
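For reference, a minimal sketch of the gradient-penalty term used alongside the Wasserstein loss (function and variable names are illustrative, not the project's API):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Interpolate between real and generated sequences, then penalize critic
    # gradients whose norm deviates from 1 (the WGAN-GP Lipschitz constraint).
    alpha = torch.rand(real.size(0), 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(
        scores, interp, grad_outputs=torch.ones_like(scores), create_graph=True
    )
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```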
`aten::_cdist_backward`

The minibatch discrimination in the GAN discriminator in `off-chain/gan/models.py` uses distance computations that trigger the `aten::_cdist_backward` MPS operator.
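A stripped-down sketch of how a minibatch-discrimination layer ends up exercising that operator; this is illustrative, not the exact code in `off-chain/gan/models.py`:

```python
import torch
import torch.nn as nn

class MinibatchDiscrimination(nn.Module):
    """Augments each sample with its similarity to the rest of the batch,
    letting the discriminator detect mode collapse."""

    def __init__(self, in_features: int, proj_features: int):
        super().__init__()
        self.proj = nn.Linear(in_features, proj_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.proj(x)              # (batch, proj_features)
        d = torch.cdist(h, h, p=1)    # pairwise distances; the backward pass
                                      # of cdist hits aten::_cdist_backward
        sim = torch.exp(-d).sum(dim=1) - 1.0  # drop self-similarity (exp(0) = 1)
        return torch.cat([x, sim.unsqueeze(1)], dim=1)
```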
This operator is not yet implemented for Apple Silicon (MPS), so you'll have to rely on the CPU for the time being.
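Until the operator lands, device selection along these lines avoids the failure (a sketch; adjust to your own setup):

```python
import torch

# aten::_cdist_backward is missing on MPS, so prefer CPU on Apple Silicon.
# Alternatively, setting PYTORCH_ENABLE_MPS_FALLBACK=1 before launching
# Python routes unsupported ops to CPU while keeping the rest on MPS.
if torch.backends.mps.is_available():
    device = torch.device("cpu")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
```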
Track progress in the PyTorch MPS operator coverage tracking issue (2.6+ version), #141287.