Backtesting Frameworks
Build robust, production-grade backtesting systems that avoid common pitfalls and produce reliable strategy performance estimates.
When to Use This Skill
- Developing trading strategy backtests
- Building backtesting infrastructure
- Validating strategy performance
- Avoiding common backtesting biases
- Implementing walk-forward analysis
- Comparing strategy alternatives
Core Concepts
1. Backtesting Biases
| Bias | Description | Mitigation |
|---|---|---|
| Look-ahead | Using information that was not available at decision time | Point-in-time data |
| Survivorship | Testing only on securities that still exist today | Include delisted securities |
| Overfitting | Curve-fitting parameters to historical noise | Out-of-sample testing |
| Selection | Cherry-picking the best of many tested strategies | Pre-registration |
| Transaction costs | Ignoring commissions, spreads, and slippage | Realistic cost models |
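Two of the mitigations in the table, point-in-time data and realistic cost models, can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `lagged_net_returns`, the one-bar execution lag, and the flat proportional `cost_rate` are all assumptions made for the example.

```python
from typing import List, Sequence


def lagged_net_returns(
    signals: Sequence[float],
    asset_returns: Sequence[float],
    cost_rate: float = 0.001,
) -> List[float]:
    """Net per-bar strategy returns with a one-bar signal lag.

    signals[t] is the target position computed from data up to the close
    of bar t, so it can only be held during bar t + 1. Holding it during
    bar t would be look-ahead bias. cost_rate is a hypothetical
    proportional cost charged per unit of turnover.
    """
    if len(signals) != len(asset_returns):
        raise ValueError("signals and asset_returns must have equal length")

    net: List[float] = []
    prev_position = 0.0
    for t, bar_return in enumerate(asset_returns):
        # Position held during bar t comes from the previous bar's signal.
        position = signals[t - 1] if t > 0 else 0.0
        # Charge costs only when the position actually changes.
        turnover = abs(position - prev_position)
        net.append(position * bar_return - turnover * cost_rate)
        prev_position = position
    return net


example = lagged_net_returns(
    signals=[1, 1, 0, 1],
    asset_returns=[0.01, 0.02, -0.01, 0.03],
)
```

In this example the first bar earns nothing because no signal existed before it, and the last signal is never traded because there is no following bar.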
2. Proper Backtest Structure
              Historical Data
                     │
                     ▼
┌─────────────────────────────────────────┐
│              Training Set               │
│  (Strategy Development & Optimization)  │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│             Validation Set              │
│    (Parameter Selection, No Peeking)    │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│                Test Set                 │
│     (Final Performance Evaluation)      │
└─────────────────────────────────────────┘
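The three-stage structure above can be sketched as a chronological split. This is a minimal example that assumes the observations are already sorted oldest to newest; the function name `chronological_split` and the default 60/20/20 proportions are illustrative choices, not values prescribed by this skill.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    observations: Sequence[T],
    train_frac: float = 0.6,
    validation_frac: float = 0.2,
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered data into train, validation, and test segments.

    The data is never shuffled: every validation point is later than every
    training point, and every test point is later than every validation point.
    """
    if train_frac <= 0 or validation_frac <= 0:
        raise ValueError("fractions must be positive")
    if train_frac + validation_frac >= 1:
        raise ValueError("fractions must leave room for a test set")

    n = len(observations)
    train_end = round(n * train_frac)
    validation_end = train_end + round(n * validation_frac)
    data = list(observations)
    return data[:train_end], data[train_end:validation_end], data[validation_end:]


train, validation, test = chronological_split(list(range(10)))
```

The test segment should be evaluated once, after all development and parameter selection is finished.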
3. Walk-Forward Analysis
Window 1: [Train──────][Test]
Window 2:     [Train──────][Test]
Window 3:         [Train──────][Test]
Window 4:             [Train──────][Test]
                                 ─────▶ Time
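The rolling windows in the diagram can be generated with a short helper. This is a sketch under assumptions: `walk_forward_windows` is a hypothetical name, the training window has a fixed length (rolling rather than expanding), and by default the window advances by one test period so that test segments do not overlap.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_windows(
    n_obs: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs that roll forward in time.

    Each test range starts immediately after its training range, so the
    model is always evaluated on data later than anything it was fit on.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")

    start = 0
    # Stop as soon as a full train + test window no longer fits.
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + test_size)
        start += step


windows = list(walk_forward_windows(n_obs=10, train_size=4, test_size=2))
```

With 10 observations, a training size of 4, and a test size of 2, this yields three windows whose test ranges cover indices 4 to 9 without overlap.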
Detailed worked examples and patterns
Detailed sections (starting with ## Implementation Patterns) live in references/details.md. Read that file when the navigation summary above is insufficient.
Best Practices
Do's
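- Use point-in-time data - Each decision sees only what was known at that moment, which avoids look-ahead bias
- Include transaction costs - Model commissions, spreads, and slippage with realistic estimates
- Test out-of-sample - Always reserve data the strategy has never seen
- Use walk-forward analysis - A single train/test split is not enough
- Run Monte Carlo analysis - Quantify the uncertainty around performance estimates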
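One simple way to approach the Monte Carlo item is to bootstrap the backtest's per-period returns. The sketch below is an assumption-laden illustration, not a prescribed method: `bootstrap_total_returns` and `nearest_rank_percentile` are hypothetical helpers, and resampling returns independently with replacement discards any autocorrelation or volatility clustering in the original series.

```python
import random
from typing import List, Sequence


def bootstrap_total_returns(
    returns: Sequence[float],
    n_sims: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Simulate total compounded returns by resampling per-period returns.

    Each simulated path draws len(returns) values with replacement and
    compounds them. The sorted outcomes describe how much the headline
    result could vary under the (strong) assumption of i.i.d. returns.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the analysis reproducible
    outcomes: List[float] = []
    for _ in range(n_sims):
        growth = 1.0
        for _ in range(len(returns)):
            growth *= 1.0 + rng.choice(returns)
        outcomes.append(growth - 1.0)
    outcomes.sort()
    return outcomes


def nearest_rank_percentile(sorted_values: Sequence[float], q: float) -> float:
    """Return the value at quantile q (0 to 1) from an ascending sequence."""
    index = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[index]


outcomes = bootstrap_total_returns([0.01, -0.02, 0.015, 0.0], n_sims=200, seed=42)
low = nearest_rank_percentile(outcomes, 0.05)
high = nearest_rank_percentile(outcomes, 0.95)
```

Reporting the interval between `low` and `high` alongside the point estimate makes it harder to mistake one lucky path for a robust edge.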
Don'ts
- Don't overfit - Limit the number of tunable parameters
- Don't ignore survivorship - Include delisted securities
- Don't use adjusted data carelessly - Understand how splits and dividends were applied
- Don't optimize on full history - Reserve a test set
- Don't ignore capacity - Market impact grows with position size
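To make the capacity point concrete, the sketch below estimates market impact with a square-root heuristic, in which cost scales with volatility and the square root of the order's share of average daily volume. This is an assumed model chosen for illustration, not something this skill prescribes: the function name `sqrt_impact_cost_bps` is hypothetical, and `coefficient` is a placeholder that would need calibration against real execution data.

```python
import math


def sqrt_impact_cost_bps(
    order_shares: float,
    adv_shares: float,
    daily_volatility: float,
    coefficient: float = 1.0,
) -> float:
    """Estimate market impact in basis points with a square-root heuristic.

    impact = coefficient * daily_volatility * sqrt(order_shares / adv_shares)

    adv_shares is average daily volume in shares and daily_volatility is a
    decimal (0.02 means 2%). The coefficient is an uncalibrated assumption.
    """
    if order_shares < 0 or adv_shares <= 0 or daily_volatility < 0:
        raise ValueError("inputs must be non-negative and adv_shares positive")
    participation = order_shares / adv_shares
    return coefficient * daily_volatility * math.sqrt(participation) * 1e4


small_order = sqrt_impact_cost_bps(10_000, 1_000_000, 0.02)
large_order = sqrt_impact_cost_bps(40_000, 1_000_000, 0.02)
```

Under this model, quadrupling the order size doubles the per-share impact, which is why a strategy that looks profitable at small size can stop working as capital grows.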