Backtesting & Realism
Written By Axiom Admin
Last updated About 1 month ago
This page is about what the strategy tester actually does when it evaluates your rules — and where the gap between its simulation and real trading can mislead you. If you only read one page after the quick start, make it this one. The tester's assumptions are invisible until someone names them. Once you can name them, you can account for them. Until then, every equity curve is a story you are telling yourself without knowing the narrator's biases.
What the tester assumes
The strategy tester runs your YAML rules against historical price data under a specific set of assumptions. These are not optional — they are baked into how TradingView's engine works. The strategy script configures some of them, but many are fixed behaviors of the tester itself.
The fill model
Within the standard fill model, the tester:

fills orders using only each bar's open, high, low, and close, never the tick-by-tick path inside the bar;

fills market orders at the computed price plus a fixed slippage setting (15 ticks by default in this script);

credits limit fills from bar extremes, requiring price to move 15 ticks through the limit level before counting the order as filled;

fills every order instantly and in full, regardless of size.

Every one of these assumptions is a simplification. Real markets have continuous price movement, variable slippage, partial fills, order book depth, and execution latency. The tester has none of that. What it gives you instead is a controlled, repeatable simulation — consistent enough to compare configurations, but not accurate enough to treat as a prediction of live performance.
The intra-bar problem
This is the single most important limitation to understand.
Because the tester fills orders on standard OHLC only, it does not know the intra-bar price sequence. On a given bar, the price may have gone high → low → close, or low → high → close, or any other path. The tester does not know which path occurred. It only knows the four final values.
This means:
A take profit and a stop loss on the same bar may fill in the wrong order. If the bar's high would have hit your TP and the bar's low would have hit your SL, the tester picks one based on its internal logic — but the real market may have hit the other first.
A limit entry can still be credited with a fill based on bar extremes rather than a known tick-by-tick path. In this build, the tester requires price to move 15 ticks through the limit before it counts as filled, not merely touch it once. But it still does not know how price moved inside the bar, how long it stayed there, or what the order book looked like when it got there.
Intra-bar reversals are invisible. A bar that opens at 100, drops to 90, rallies to 110, and closes at 105 looks the same to the tester as a bar that opens at 100, rallies to 110, drops to 90, and closes at 105 — but those two paths produce very different trade outcomes for strategies with both entries and exits active.
Concrete scenario: Your strategy has a long trade open with a take profit at 105 and a stop loss at 95. A bar arrives with open 100, high 106, low 94, close 102. Both the TP and SL price levels were crossed during that bar. In real time, the price may have dropped to 94 first (triggering your stop for a loss) and then rallied to 106. Or it may have reached 106 first (filling your target for a win) before dropping to 94. The tester does not know which path occurred. It fills one based on its internal fill priority logic. On bars like this, the backtest can credit your strategy with a win that live trading would have lost, or charge it a loss that live trading would have won. Neither the tester nor you can resolve which it was after the fact.
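The scenario above is easy to make concrete. This sketch (illustrative Python, not the tester's actual code) replays two plausible intra-bar paths through that same open-100 / high-106 / low-94 / close-102 bar and shows the trade outcome flipping depending on which extreme was reached first:

```python
def first_exit(path, tp=105.0, sl=95.0):
    """Walk a hypothetical intra-bar price path for a long trade
    and report which exit level is hit first."""
    for price in path:
        if price <= sl:
            return "stop loss"    # loss
        if price >= tp:
            return "take profit"  # win
    return "no exit"

# Both paths produce the identical OHLC bar: 100 / 106 / 94 / 102.
low_first = [100, 97, 94, 101, 106, 102]    # dropped to the low first
high_first = [100, 103, 106, 98, 94, 102]   # rallied to the high first

print(first_exit(low_first))   # -> stop loss
print(first_exit(high_first))  # -> take profit
```

Because both paths collapse to the same four OHLC values, the tester literally cannot distinguish them; it resolves the bar with its internal fill priority, which may or may not match what happened live.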
This ambiguity matters most on volatile bars and lower timeframes where TP and SL levels are close together. If your strategy trades tight targets and stops, a significant fraction of its trades may fall into this ambiguous zone — and the backtest results for those trades are essentially a coin flip that the tester resolves consistently but not necessarily correctly.
There is no workaround for this within the standard OHLC fill model. You can enable the bar magnifier in the Properties tab, which uses lower-timeframe data to approximate intra-bar movement — but enabling it changes the fill model entirely, and results from magnifier-on and magnifier-off runs are not directly comparable.
Result mismatch causes
When your backtest results do not match what you expected — or do not match what happens in paper trading or live trading — here is where to look.
The slippage sensitivity test
This is the single most useful reality check you can run. Take your strategy exactly as configured and test it at four slippage settings:
0 ticks — the most flattering possible result
5 ticks — mild real-world friction
10 ticks — moderate real-world friction
15 ticks — the script's default
Watch what happens to net profit, profit factor, and max drawdown as slippage increases.
Healthy result: The strategy loses some profitability as slippage increases, but the core shape of the equity curve survives. The edge is wide enough to absorb realistic execution costs.
Concerning result: The strategy is profitable at 0–5 ticks but collapses at 10–15. This means the edge is thinner than the execution cost the tester is modeling. In a real market, slippage is not fixed — it varies trade by trade depending on liquidity, order size, time of day, and how fast the market is moving when your order arrives. A strategy that only survives when slippage is low will produce stretches of good performance interspersed with periods where a few expensive fills erase weeks of gains. That volatility in edge quality is what makes thin-edge strategies dangerous to automate and difficult to hold through psychologically.
Dangerous result: The strategy is only profitable at 0 slippage. This is not a strategy — it is an artifact of the fill model. Every strategy looks better with zero friction. At 0 slippage, the tester fills every market order at the exact computed price, which is a fantasy. The question is not how good the equity curve looks at 0 — the question is whether any nonzero friction destroys the curve entirely. If it does, there was never an edge in the rules. There was an edge in the assumption that execution is free.
Healthy results vs. concerning results
Not every disappointing backtest is a sign that something is broken. Some disappointments are the tester being honest with you.
Healthy signs
Lower win rate than you expected. Win rate alone says very little. A 40% win rate with a 3:1 reward-to-risk ratio is a profitable strategy. A 70% win rate with a 1:3 ratio loses money. Look at the profit factor and average trade, not the win rate in isolation.
Drawdowns in the equity curve. Every strategy draws down. If your backtest shows no drawdowns, something is wrong with your test — probably unrealistically low slippage or commission.
Fewer trades than you expected. Your conditions may be stricter than they feel when you read them as prose. Check the expression diagnostics to see whether your triggers are actually firing.
Different results on different timeframes. Your strategy's rules interact with bar construction. That interaction is real, not a bug. If results are highly timeframe-sensitive, it means the edge is partially an artifact of how bars are built at that resolution.
Concerning signs
Profitability depends entirely on one or two trades. Remove the best 5% of trades and recheck the equity curve. If it goes flat or negative, the strategy is not diversified — it is lucky. This is especially common in strategies with few total trades: 40 trades with 2 outsized winners is not a pattern, it is a sample that happened to include two big moves. The equity curve shows what it looks like when you catch those moves. It does not show you the next 40 trades, which may not include them.
Equity curve looks great but max drawdown is 40%+. Ask yourself whether you would actually hold through a 40% drawdown with real money. Most traders cannot. The backtest shows the equity curve recovering after the drawdown, which looks fine in retrospect. But while the drawdown is happening, you do not know it will recover. You are watching your account lose nearly half its value with no guarantee it comes back. If you would have turned off the strategy at -25%, the backtest's final equity number is fiction — it describes a path you would not have walked.
Results are dramatically different when you change the test window by a few months. The strategy may be fitted to the specific regime in the original window. Shift the start date forward by three months and backward by three months. If the profit factor swings from 1.8 to 0.9, the original result was telling you about that particular window, not about the strategy.
All the profit comes from one market regime. If the strategy makes money in a trending period and gives it all back in a range, you do not have a strategy — you have a regime bet without a regime filter. That can be a valid approach, but only if you also have a way to detect when the regime changes and a plan for what happens when it does.
The strategy is only profitable at 0 slippage. See the sensitivity test above.
What the tester cannot do
Some things are outside the scope of any historical strategy tester. The manual would be dishonest if it did not name them:
Model real liquidity. The tester fills any order at the computed price, regardless of how large the order is relative to actual market depth. A $10 million order fills identically to a $100 order.
Model order book dynamics. The tester does not know about bid-ask spreads, order book depth, or market impact beyond the fixed slippage setting.
Model execution latency. Real orders take time to reach the exchange and time to fill. The tester fills instantly.
Model partial fills. Real limit orders can partially fill. The tester fills in full or not at all.
Predict regime changes. The backtest tests against history. History does not repeat, and it does not always rhyme. A strategy that worked in a bull market may fail in a bear market, and the tester cannot tell you which regime is coming next.
These are not flaws in the tool. They are boundaries of what historical simulation can honestly do. The tool works within those boundaries. Your job is to know where they are — and to adjust how much you trust the results based on how close your strategy runs to those boundaries.
What to do with these limits:
Because the tester cannot model real liquidity, strategies that size large relative to the asset's typical volume deserve extra skepticism. The backtest fills a $500,000 order the same as a $500 order. If your position sizes are large for the market you trade, the real fills will be worse than the simulation shows.
Because the tester cannot model execution latency, strategies that depend on precise fill timing — entering at exactly the next bar's open, or exiting at exactly a stop level — will perform differently live. If the strategy's profitability depends on getting the exact simulated fill price, build in a buffer. Test at higher slippage. Assume the real fill will be slightly worse.
Because the tester cannot predict regime changes, a strategy that performed well in the test window should be assumed fragile until you have evidence it works across meaningfully different market conditions. Test on at least one additional period that includes a different regime — a different volatility level, a different trend direction, or a different consolidation structure.