AlphaTRADER Academy

Making the Un-backtestable Backtestable

Backtesting Wyckoff

Wyckoff is interpretive. Backtesting needs mechanical rules. To verify your edge with data, you have to translate subjective Spring/UTAD/LPS recognition into deterministic conditions a script can evaluate. This lesson shows how — and where the translation introduces approximation error.

"In God we trust. All others must bring data." — W. Edwards Deming

The Subjectivity Problem

Read 5 Wyckoff books and you'll find 5 different definitions of "Spring". "Price wicks below support and immediately reverses on absorption" — try coding that. What's "support"? Last swing low? Lowest of last N bars? "Immediately" = same bar? Next bar? "Absorption" = high volume? Low spread on heavy volume?

Every interpretation is defensible. Every interpretation produces different trades. Backtesting forces you to commit to ONE specific mechanical version. The discipline of writing the rules clarifies your own thinking — even if you never run the backtest.
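
To make that commitment concrete, here is a minimal sketch of one possible mechanical Spring. Every choice in it is an assumption made for illustration, not a canonical Wyckoff definition: "support" is the lowest low of the prior `lookback` bars, "immediately" means the same bar closes back above support, and "absorption" means volume of at least `vol_mult` times the recent average. The `Bar` type and `is_spring` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


def is_spring(bars: List[Bar], i: int, lookback: int = 20, vol_mult: float = 1.5) -> bool:
    """Return True if bar i meets one illustrative mechanical Spring definition."""
    if i < lookback:
        return False  # not enough history to define support
    window = bars[i - lookback:i]            # prior bars only; bar i is excluded
    support = min(b.low for b in window)     # assumption: support = lowest low of last N bars
    avg_vol = sum(b.volume for b in window) / lookback
    bar = bars[i]
    wicked_below = bar.low < support              # price pierces support
    reversed_same_bar = bar.close > support       # assumption: "immediately" = same-bar close
    absorbed = bar.volume >= vol_mult * avg_vol   # assumption: "absorption" = heavy volume
    return wicked_below and reversed_same_bar and absorbed
```

Swapping any one assumption (next-bar reversal, swing-low support, spread-based absorption) produces a different trade list, which is exactly the approximation error to keep in mind.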

★ Wyckoff-to-Rules Translator

Choose a setup. The translator shows the subjective Wyckoff definition vs. the mechanical rule-based version, plus a pseudo-code condition and a tradeoff acknowledgment.

Walk-Forward Analysis — The Right Way

Naive backtest = optimize on history → fit perfectly → fail forward. Walk-forward simulates real trading conditions.

Naive Backtest (WRONG)

1. Take 5 years of data
2. Optimize 10 parameters until backtest is perfect
3. Get 80% win rate, 4R avg
4. Deploy live → 40% win rate, -0.5R avg
5. "Market changed" 😭

Result: curve-fit fantasy. Edge that never existed outside your spreadsheet.

Walk-Forward (CORRECT)

1. Split 5 years into IS (in-sample) + OOS (out-of-sample) windows
2. Optimize parameters on IS window 1 (e.g., 2020 H1)
3. Test those params on OOS window 1 (2020 H2)
4. Roll forward: optimize on 2020 H2, test on 2021 H1
5. Aggregate ALL OOS results — that's your real edge

Result: only OOS performance counts. Honest assessment of forward viability.

Rule of thumb: If your IS Sharpe is 3.0 and OOS Sharpe is 0.5, you have a curve-fit. If IS is 2.0 and OOS is 1.5, you have an edge. The OOS number is the only one that matters.
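
The rolling procedure above can be sketched as follows. This is a minimal illustration, not a full framework: `walk_forward_windows` and `walk_forward` are hypothetical helpers, and `optimize` and `evaluate` stand in for whatever parameter search and trade simulation you actually use.

```python
from typing import Callable, List, Sequence, Tuple


def walk_forward_windows(n: int, is_len: int, oos_len: int) -> List[Tuple[range, range]]:
    """Build (in-sample, out-of-sample) index ranges, rolling forward by oos_len."""
    windows = []
    start = 0
    while start + is_len + oos_len <= n:
        is_idx = range(start, start + is_len)
        oos_idx = range(start + is_len, start + is_len + oos_len)
        windows.append((is_idx, oos_idx))
        start += oos_len  # the next IS window begins where this one began plus one OOS step
    return windows


def walk_forward(data: Sequence, is_len: int, oos_len: int,
                 optimize: Callable, evaluate: Callable) -> list:
    """Optimize on each IS window, test on the following OOS window, pool OOS results."""
    oos_results = []
    for is_idx, oos_idx in walk_forward_windows(len(data), is_len, oos_len):
        params = optimize([data[k] for k in is_idx])                       # fit on IS only
        oos_results.extend(evaluate([data[k] for k in oos_idx], params))   # score on unseen data
    return oos_results
```

With five years split into ten half-year blocks and `is_len=1, oos_len=1`, this yields nine IS/OOS pairs; only the pooled `oos_results` should feed the metrics you report.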

Edge Degradation — When Does Your System Stop Working?

Every edge eventually dies. Detection = surviving the regime change with capital intact.

Rolling Performance Window

Track win rate + avg R + Sharpe across rolling 30-trade windows. Plot the trend.

Trigger: If 60-trade rolling Sharpe drops below 50% of historical average for 3+ consecutive windows → degradation likely.
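
A minimal sketch of this trigger, under two assumptions that are not stated in the lesson: Sharpe is computed per trade on R-multiples (mean R divided by standard deviation of R, not annualized), and "consecutive windows" means consecutive rolling readings. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def rolling_sharpe(r_multiples: List[float], window: int = 60) -> List[float]:
    """Per-trade Sharpe (mean R / stdev R) for each full rolling window."""
    out = []
    for end in range(window, len(r_multiples) + 1):
        chunk = r_multiples[end - window:end]
        sd = stdev(chunk)
        out.append(mean(chunk) / sd if sd > 0 else 0.0)
    return out


def degradation_flag(sharpes: List[float], baseline: float,
                     ratio: float = 0.5, consecutive: int = 3) -> bool:
    """True if the latest `consecutive` readings all sit below ratio * baseline."""
    if len(sharpes) < consecutive:
        return False
    return all(s < ratio * baseline for s in sharpes[-consecutive:])
```

Here `baseline` is your historical average Sharpe measured the same way; the check only makes sense when that baseline is positive.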

Per-Setup Decay

Some setups decay before others. Spring may still work while UTAD stops. Track per-setup win rate trends separately.

Trigger: Specific setup win rate drops >15% from historical baseline → drop that setup from playbook.
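
As a sketch, the per-setup check might look like this. It assumes the 15% drop is measured in percentage points (0.60 to 0.45), which the lesson does not specify; `setups_to_drop` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def setups_to_drop(trades: List[Tuple[str, bool]], baseline: Dict[str, float],
                   max_drop: float = 0.15) -> List[str]:
    """trades: (setup_name, won) pairs from the recent period.

    Returns setups whose recent win rate fell more than max_drop
    (in percentage points, an assumption) below their baseline.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for setup, won in trades:
        counts[setup] += 1
        wins[setup] += int(won)
    flagged = []
    for setup, n in counts.items():
        recent = wins[setup] / n
        if setup in baseline and baseline[setup] - recent > max_drop:
            flagged.append(setup)
    return sorted(flagged)
```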

Regime Change Indicators

Macro regime shifts often precede edge degradation. New Fed cycle, new volatility regime, new market structure (algorithm changes).

Trigger: VIX 20MA crosses major threshold (e.g., 15 → 25 sustained) → re-validate strategy.

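The hardest call: distinguishing degradation from normal drawdown. 30-trade windows that lose money happen by chance even at a 50%+ win rate. Use adaptive thresholds scaled to the strategy's own historical variability (in the spirit of ATR), not absolute cutoffs. When in doubt, halve size rather than stopping entirely.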
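
To judge whether a bad patch is within normal variance, it helps to put a number on it. The sketch below uses a plain binomial model, which assumes trades are independent with a fixed win rate; real trade sequences only approximate that. `prob_at_most_wins` is a hypothetical helper.

```python
from math import comb


def prob_at_most_wins(n: int, k: int, win_rate: float) -> float:
    """P(wins <= k) across n independent trades with the given win rate."""
    return sum(
        comb(n, w) * win_rate**w * (1 - win_rate) ** (n - w)
        for w in range(k + 1)
    )
```

For example, `prob_at_most_wins(30, 10, 0.5)` comes out at about 5%: roughly one 30-trade window in twenty shows 10 or fewer winners even though nothing is broken.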

Reading Backtest Results — What Numbers Matter

Win rate alone is meaningless. These are the metrics pros judge a strategy by.

Metric	What It Tells You	Healthy	Suspicious
Sharpe Ratio (annualized)	Risk-adjusted return — return per unit volatility	1.5-2.5	>3 (curve-fit) or <0.5 (no edge)
Profit Factor	Gross profit / Gross loss	1.5-2.5	>3 (rare/suspect) or <1.2 (thin)
Max Drawdown	Largest peak-to-trough loss	<25% (psychological survival zone)	>40% (hard to trade live)
Recovery Factor	Net profit / Max drawdown	>3 (resilient)	<1 (drawdown ate everything)
SQN (Van Tharp)	System Quality Number: sqrt(N) × mean R / std dev of R	2.0-2.5 (average to good on Tharp's scale)	>3 (excellent if real; verify) or <1.5 (marginal)
Total Trades	Statistical significance of results	200+ trades	<50 (not significant)
Max Consecutive Losses	Worst losing streak — psychological test	<7	>12 (most can't survive emotionally)
Win Rate	Frequency of winners	Style-dependent (40-65% normal)	>75% (verify samples) or <30%
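
Most of the metrics in the table can be computed from a plain list of per-trade R-multiples. The sketch below is illustrative only: it measures drawdown in R units rather than percent of equity, uses the commonly cited SQN form sqrt(N) × mean(R) / stdev(R), and needs at least two trades with differing results. `summarize` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def summarize(r_multiples: List[float]) -> Dict[str, float]:
    """Compute headline backtest metrics from per-trade R-multiples."""
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r < 0]
    gross_profit, gross_loss = sum(wins), -sum(losses)

    # Walk the equity curve (in R) to find max drawdown and the worst losing streak.
    equity, peak, max_dd = 0.0, 0.0, 0.0
    streak, max_streak = 0, 0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
        streak = streak + 1 if r < 0 else 0
        max_streak = max(max_streak, streak)

    n = len(r_multiples)
    return {
        "trades": n,
        "win_rate": len(wins) / n,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "max_drawdown_r": max_dd,
        "recovery_factor": equity / max_dd if max_dd else float("inf"),
        "sqn": sqrt(n) * mean(r_multiples) / stdev(r_multiples),
        "max_consecutive_losses": max_streak,
    }
```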

Common Backtesting Traps

Where backtests lie convincingly. Avoid these or your "proven system" is fiction.

Look-ahead bias

Using data not yet available at trade time. Classic example: deciding entry based on a bar's CLOSE, then assuming you entered at that close. In reality you only know the close once the bar has finished, so the entry should be at the next bar's open.
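
A minimal sketch of the fix: a signal computed from bar i is filled at the open of bar i+1. `entry_prices` is a hypothetical helper, and a signal on the final bar is left unfilled because the next open does not exist yet.

```python
from typing import List, Optional


def entry_prices(signals: List[bool], opens: List[float]) -> List[Optional[float]]:
    """Fill each signal at the NEXT bar's open, never at the signal bar's close."""
    fills: List[Optional[float]] = [None] * len(signals)
    for i, fired in enumerate(signals):
        if fired and i + 1 < len(opens):
            fills[i] = opens[i + 1]  # first price actually tradable after the signal
    return fills
```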

Survivorship bias

Backtesting only stocks that exist today excludes companies that went bankrupt or were delisted. An S&P 500 backtest over 2000-2024 will look amazing if you test only current index members, because Lehman, Bear Stearns, and Enron are all excluded.

Overfitting (curve-fitting)

Optimizing 10+ parameters until backtest is perfect. With enough parameters you can fit any historical noise. Rule: max 3-5 parameters, all with logical justification, not just numerical convenience.

Ignoring slippage + commissions

Pure-price backtest assumes you fill at exact mid-price with zero costs. Reality: 0.5-2 pip slippage per trade + spread + commission. Strategy with 1.5R avg can become break-even after costs.
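
The effect of costs is easiest to see when they are expressed in R. The sketch below assumes "1.5R avg" refers to the average winner, and that total round-trip cost (slippage, spread, commission) can be expressed in pips against the stop distance; `net_expectancy_r` is a hypothetical helper.

```python
def net_expectancy_r(win_rate: float, avg_win_r: float, avg_loss_r: float,
                     cost_pips: float, stop_pips: float) -> float:
    """Expected R per trade after subtracting round-trip costs.

    Costs are paid on every trade, winner or loser, so they come
    straight off the gross expectancy.
    """
    gross = win_rate * avg_win_r - (1 - win_rate) * avg_loss_r
    cost_r = cost_pips / stop_pips  # 1R equals the stop distance
    return gross - cost_r
```

For example, a 45% win rate with 1.5R winners and 1R losers has a gross expectancy of 0.125R; with 1.25 pips of total cost on a 10-pip stop, the cost is also 0.125R and the strategy is break-even.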

Insufficient sample size

"Backtested 30 trades, 70% win rate!" = statistical noise. You need 200+ trades for any meaningful conclusion. Daily Wyckoff swings = years of data minimum.

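One way to see why 30 trades says so little is to put a confidence interval around the win rate. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96); it assumes independent trades, and `wilson_interval` is a hypothetical helper.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the true win rate given wins out of n trades."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

With 21 wins in 30 trades the interval runs from roughly 52% to 83%, so the "70% win rate" is compatible with a near coin flip. With 140 wins in 200 trades it narrows to roughly 63% to 76%.
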
Backtesting on too few instruments

Strategy that works on EUR/USD 2015-2020 may bomb on GBP/JPY. Test across 8-10 instruments minimum. If it only works on one instrument → curve fit, not edge.

No regime stratification

An aggregate 80% win rate may hide that the strategy returned +100% in the trending market of 2017 and -30% in the chop of 2018. Stratify results by VIX regime or by trending-vs-ranging market type.
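
Stratifying can be as simple as tagging each trade with a regime label and summarizing per label. In this sketch the labels (for example "trend" and "chop") are assumed to come from your own classification rule, and `stats_by_regime` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def stats_by_regime(trades: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """trades: (regime_label, r_multiple) pairs. Returns per-regime summary stats."""
    buckets = defaultdict(list)
    for regime, r in trades:
        buckets[regime].append(r)
    return {
        regime: {
            "trades": len(rs),
            "win_rate": sum(r > 0 for r in rs) / len(rs),
            "total_r": sum(rs),
        }
        for regime, rs in buckets.items()
    }
```

If one regime carries all the profit, the aggregate numbers describe that regime, not the strategy.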

Backtesting Tools — Pick Per Skill Level

No "best" tool — depends on your coding skills and rigor needs.

Tool	Skill Required	Pros	Cons
TradingView Pine Script	Beginner to beginner+	Visual, fast iteration, integrated charts	Limited stats, no walk-forward natively
Python + backtesting.py / vectorbt	Intermediate	Free, flexible, full stats, walk-forward easy	Slower iteration, no visual feedback
QuantConnect / Backtrader	Intermediate-Advanced	Industry-grade, real broker API, Monte Carlo	Steeper learning curve, paid tiers for premium data
Custom (numpy + pandas)	Advanced	Total control, scriptable, max performance	Reinvent every wheel, manual stats calc
Excel / Spreadsheet	Anyone	Free, transparent, manual control	Tedious for >100 trades, no automation

Recommended for Wyckoff traders: TradingView Pine Script for initial idea exploration → Python (backtesting.py) for serious validation with walk-forward. Avoid spreadsheet for any sample >50 trades.

Journal & Analytics

Forward journal = real-time backtest. Same metrics — Sharpe, PF, SQN, Recovery Factor.

Position Sizing

Backtest results feed Risk-of-Ruin + Kelly. Real numbers, not theoretical.

Failed Schematics

Backtest reveals failure rates per setup — informs invalidation thresholds.
