Backtesting & Realism
Written By Axiom Admin
Last updated About 1 month ago
This page is about what the strategy tester actually does when it evaluates your rules — and where the gap between its simulation and real trading can mislead you. If you only read one page after the quick start, make it this one. The tester's assumptions are invisible until someone names them. Once you can name them, you can account for them. Until then, every equity curve is a story you are telling yourself without knowing the narrator's biases.
What the tester assumes
The strategy tester runs your YAML rules against historical price data under a specific set of assumptions. These are not optional — they are baked into how TradingView's engine works. The strategy script configures some of them, but many are fixed behaviors of the tester itself.
The fill model
Within the standard fill model, the tester:

fills orders using only each bar's open, high, low, and close, never the tick-by-tick path inside the bar;

fills market orders at the computed price plus a fixed slippage setting (15 ticks by default in this script);

credits limit fills from bar extremes, requiring price to move 15 ticks through the limit level before counting the order as filled;

fills every order instantly and in full, regardless of size.

Every one of these assumptions is a simplification. Real markets have continuous price movement, variable slippage, partial fills, order book depth, and execution latency. The tester has none of that. What it gives you instead is a controlled, repeatable simulation — consistent enough to compare configurations, but not accurate enough to treat as a prediction of live performance.
The intra-bar problem
This is the single most important limitation to understand.
Because the tester fills orders on standard OHLC only, it does not know the intra-bar price sequence. On a given bar, the price may have gone high → low → close, or low → high → close, or any other path. The tester does not know which path occurred. It only knows the four final values.
This means:
A take profit and a stop loss on the same bar may fill in the wrong order. If the bar's high would have hit your TP and the bar's low would have hit your SL, the tester picks one based on its internal logic — but the real market may have hit the other first.
A limit entry can still be credited with a fill based on bar extremes rather than a known tick-by-tick path. In this build, the tester requires price to move 15 ticks through the limit before it counts as filled, not merely touch it once. But it still does not know how price moved inside the bar, how long it stayed there, or what the order book looked like when it got there.
Intra-bar reversals are invisible. A bar that opens at 100, drops to 90, rallies to 110, and closes at 105 looks the same to the tester as a bar that opens at 100, rallies to 110, drops to 90, and closes at 105 — but those two paths produce very different trade outcomes for strategies with both entries and exits active.
Concrete scenario: Your strategy has a long trade open with a take profit at 105 and a stop loss at 95. A bar arrives with open 100, high 106, low 94, close 102. Both the TP and SL price levels were crossed during that bar. In real time, the price may have dropped to 94 first (triggering your stop for a loss) and then rallied to 106. Or it may have reached 106 first (filling your target for a win) before dropping to 94. The tester does not know which path occurred. It fills one based on its internal fill priority logic. On bars like this, the backtest can credit your strategy with a win that live trading would have lost, or charge it a loss that live trading would have won. Neither the tester nor you can resolve which it was after the fact.
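The scenario above is easy to make concrete. This sketch (illustrative Python, not the tester's actual code) replays two plausible intra-bar paths through that same open-100 / high-106 / low-94 / close-102 bar and shows the trade outcome flipping depending on which extreme was reached first:

```python
def first_exit(path, tp=105.0, sl=95.0):
    """Walk a hypothetical intra-bar price path for a long trade
    and report which exit level is hit first."""
    for price in path:
        if price <= sl:
            return "stop loss"    # loss
        if price >= tp:
            return "take profit"  # win
    return "no exit"

# Both paths produce the identical OHLC bar: 100 / 106 / 94 / 102.
low_first = [100, 97, 94, 101, 106, 102]    # dropped to the low first
high_first = [100, 103, 106, 98, 94, 102]   # rallied to the high first

print(first_exit(low_first))   # -> stop loss
print(first_exit(high_first))  # -> take profit
```

Because both paths collapse to the same four OHLC values, the tester literally cannot distinguish them; it resolves the bar with its internal fill priority, which may or may not match what happened live.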
This ambiguity matters most on volatile bars and lower timeframes where TP and SL levels are close together. If your strategy trades tight targets and stops, a significant fraction of its trades may fall into this ambiguous zone — and the backtest results for those trades are essentially a coin flip that the tester resolves consistently but not necessarily correctly.
There is no workaround for this within the standard OHLC fill model. You can enable the bar magnifier in the Properties tab, which uses lower-timeframe data to approximate intra-bar movement — but enabling it changes the fill model entirely, and results from magnifier-on and magnifier-off runs are not directly comparable.
Result mismatch causes
When your backtest results do not match what you expected — or do not match what happens in paper trading or live trading — here is where to look.
The slippage sensitivity test
This is the single most useful reality check you can run. Take your strategy exactly as configured and test it at four slippage settings:
0 ticks — the most flattering possible result
5 ticks — mild real-world friction
10 ticks — moderate real-world friction
15 ticks — the script's default
Watch what happens to net profit, profit factor, and max drawdown as slippage increases.
Healthy result: The strategy loses some profitability as slippage increases, but the core shape of the equity curve survives. The edge is wide enough to absorb realistic execution costs.
Concerning result: The strategy is profitable at 0–5 ticks but collapses at 10–15. This means the edge is thinner than the execution cost the tester is modeling. In a real market, slippage is not fixed — it varies trade by trade depending on liquidity, order size, time of day, and how fast the market is moving when your order arrives. A strategy that only survives when slippage is low will produce stretches of good performance interspersed with periods where a few expensive fills erase weeks of gains. That volatility in edge quality is what makes thin-edge strategies dangerous to automate and difficult to hold through psychologically.
Dangerous result: The strategy is only profitable at 0 slippage. This is not a strategy — it is an artifact of the fill model. Every strategy looks better with zero friction. At 0 slippage, the tester fills every market order at the exact computed price, which is a fantasy. The question is not how good the equity curve looks at 0 — the question is whether any nonzero friction destroys the curve entirely. If it does, there was never an edge in the rules. There was an edge in the assumption that execution is free.
Healthy results vs. concerning results
Not every disappointing backtest is a sign that something is broken. Some disappointments are the tester being honest with you.
Healthy signs
Lower win rate than you expected. Win rate alone says very little. A 40% win rate with a 3:1 reward-to-risk ratio is a profitable strategy. A 70% win rate with a 1:3 ratio loses money. Look at the profit factor and average trade, not the win rate in isolation.
Drawdowns in the equity curve. Every strategy draws down. If your backtest shows no drawdowns, something is wrong with your test — probably unrealistically low slippage or commission.
Fewer trades than you expected. Your conditions may be stricter than they feel when you read them as prose. Check the expression diagnostics to see whether your triggers are actually firing.
Different results on different timeframes. Your strategy's rules interact with bar construction. That interaction is real, not a bug. If results are highly timeframe-sensitive, it means the edge is partially an artifact of how bars are built at that resolution.
Concerning signs
Profitability depends entirely on one or two trades. Remove the best 5% of trades and recheck the equity curve. If it goes flat or negative, the strategy is not diversified — it is lucky. This is especially common in strategies with few total trades: 40 trades with 2 outsized winners is not a pattern, it is a sample that happened to include two big moves. The equity curve shows what it looks like when you catch those moves. It does not show you the next 40 trades, which may not include them.
Equity curve looks great but max drawdown is 40%+. Ask yourself whether you would actually hold through a 40% drawdown with real money. Most traders cannot. The backtest shows the equity curve recovering after the drawdown, which looks fine in retrospect. But while the drawdown is happening, you do not know it will recover. You are watching your account lose nearly half its value with no guarantee it comes back. If you would have turned off the strategy at -25%, the backtest's final equity number is fiction — it describes a path you would not have walked.
Results are dramatically different when you change the test window by a few months. The strategy may be fitted to the specific regime in the original window. Shift the start date forward by three months and backward by three months. If the profit factor swings from 1.8 to 0.9, the original result was telling you about that particular window, not about the strategy.
All the profit comes from one market regime. If the strategy makes money in a trending period and gives it all back in a range, you do not have a strategy — you have a regime bet without a regime filter. That can be a valid approach, but only if you also have a way to detect when the regime changes and a plan for what happens when it does.
The strategy is only profitable at 0 slippage. See the sensitivity test above.
What the tester cannot do
Some things are outside the scope of any historical strategy tester. The manual would be dishonest if it did not name them:
Model real liquidity. The tester fills any order at the computed price, regardless of how large the order is relative to actual market depth. A $10 million order fills identically to a $100 order.
Model order book dynamics. The tester does not know about bid-ask spreads, order book depth, or market impact beyond the fixed slippage setting.
Model execution latency. Real orders take time to reach the exchange and time to fill. The tester fills instantly.
Model partial fills. Real limit orders can partially fill. The tester fills in full or not at all.
Predict regime changes. The backtest tests against history. History does not repeat, and it does not always rhyme. A strategy that worked in a bull market may fail in a bear market, and the tester cannot tell you which regime is coming next.
These are not flaws in the tool. They are boundaries of what historical simulation can honestly do. The tool works within those boundaries. Your job is to know where they are — and to adjust how much you trust the results based on how close your strategy runs to those boundaries.
What to do with these limits:
Because the tester cannot model real liquidity, strategies that size large relative to the asset's typical volume deserve extra skepticism. The backtest fills a $500,000 order the same as a $500 order. If your position sizes are large for the market you trade, the real fills will be worse than the simulation shows.
Because the tester cannot model execution latency, strategies that depend on precise fill timing — entering at exactly the next bar's open, or exiting at exactly a stop level — will perform differently live. If the strategy's profitability depends on getting the exact simulated fill price, build in a buffer. Test at higher slippage. Assume the real fill will be slightly worse.
Because the tester cannot predict regime changes, a strategy that performed well in the test window should be assumed fragile until you have evidence it works across meaningfully different market conditions. Test on at least one additional period that includes a different regime — a different volatility level, a different trend direction, or a different consolidation structure.