2026-03-24

CUSUM Filtering and Structural Breaks: What Retail Traders Miss

CUSUM filtering addresses an overlooked problem: when to sample the market, not what to trade. It triggers observations only when cumulative returns cross a volatility-scaled threshold, which makes it the sampling step of the AFML pipeline that comes before any ML prediction.

finmlresearch.com · Research Notes · March 2026


Most retail trading tools sample market data the same way: every minute, every hour, every day. A candle closes, a new one opens. The bar chart marches forward like a metronome, indifferent to whether anything actually happened in the market during that interval.

This is a subtle but consequential mistake — and it sits at the root of why most retail backtests overfit and most retail ML strategies fail out-of-sample.

This note explains why time-based sampling is statistically flawed, introduces the CUSUM filter as the canonical alternative from López de Prado's Advances in Financial Machine Learning, and shows how our open-source TradingView script implements it in a form retail traders can use directly. The paid version of the script extends this into a full triple-barrier labeling overlay — the next step in the AFML sampling pipeline.


The Problem With Time Bars

When you look at a daily OHLCV chart, you are looking at a series of samples drawn at fixed time intervals. This feels natural — markets open and close on a schedule, earnings come quarterly, economic data releases on a calendar. Time is the obvious dimension along which to organize market observations.

But consider what a daily bar actually captures. On a Tuesday with no news, SPY might trade 50 million shares with a price range of 0.3%. On the Tuesday after an FOMC decision, it trades 250 million shares with a 2.1% range. Both of these are represented by a single bar on the daily chart. They are treated as equivalent observations.

They are not equivalent. The information content of these two bars is radically different.

This is the core problem: time-based sampling mixes high-information and low-information intervals into a series that looks uniform but is not. The statistical consequences are serious:

  • Serial correlation. Returns on adjacent time bars are not independent — quiet days cluster with quiet days, volatile days cluster with volatile days. This violates the i.i.d. assumption that most ML models depend on.
  • Variance non-stationarity. The variance of returns changes dramatically across the series. Standard risk models that assume a fixed volatility parameter will be systematically wrong.
  • Label contamination. When you label training examples for a supervised model, adjacent time bars share information (yesterday's close anchors today's open), so labels built on them are not independent. The resulting leakage between training and test samples inflates backtest performance.
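
The volatility clustering described in the first bullet can be checked with a lag-1 autocorrelation of absolute returns. The snippet below is a minimal sketch; the function name `lag1_autocorr` and the toy inputs are ours for illustration and are not taken from AFML or from the script.

```python
def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation of a sequence of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    # Covariance between each value and its predecessor, over total variance.
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# Toy absolute returns (in %): a quiet stretch followed by a volatile stretch.
clustered = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
# The same values with quiet and volatile bars alternating.
alternating = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]

print(lag1_autocorr(clustered))    # 0.625
print(lag1_autocorr(alternating))  # -0.875
```

A clearly positive value on real absolute returns is the signature of quiet bars following quiet bars and volatile bars following volatile bars.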

López de Prado dedicates Chapter 2 of AFML to this problem. His solution is to replace time-based sampling with event-based sampling — only creating a new bar, and only drawing a new training observation, when something meaningful has actually happened in the market.


What Is the CUSUM Filter?

CUSUM stands for Cumulative Sum. It is a statistical technique originally developed for quality control in manufacturing — detecting when a production process has shifted from its expected behaviour. Applied to financial time series, it detects when the price process has accumulated enough deviation from its recent baseline to warrant a new observation.

The filter maintains two accumulators:

$$S^+_t = \max(0,\ S^+_{t-1} + r_t)$$

$$S^-_t = \min(0,\ S^-_{t-1} + r_t)$$

where $r_t = \log(P_t / P_{t-1})$ is the log return at time $t$.

An event is triggered when either accumulator crosses a threshold $h$:

  • Upward structural break: $S^+_t \geq h$
  • Downward structural break: $S^-_t \leq -h$

On trigger, both accumulators reset to zero. This reset is the key property: it stops the filter from re-flagging the same move on every subsequent bar, because a continuing trend has to accumulate a further $h$ of return before it triggers again. Each event is a genuinely new piece of information about the price process.
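
The recursion and reset rule above can be sketched in a few lines of Python. This is an illustrative sketch with a fixed threshold, not the TradingView script itself; the function name `cusum_events` and the toy price series are assumptions made for the example.

```python
import math


def cusum_events(prices, h):
    """Symmetric CUSUM filter on log returns with a fixed threshold h.

    Returns a list of (bar_index, direction) pairs.
    """
    events = []
    s_pos = 0.0  # upward accumulator S+
    s_neg = 0.0  # downward accumulator S-
    for t in range(1, len(prices)):
        r = math.log(prices[t] / prices[t - 1])
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_pos >= h:
            events.append((t, "up"))
            s_pos, s_neg = 0.0, 0.0  # reset both accumulators, as described above
        elif s_neg <= -h:
            events.append((t, "down"))
            s_pos, s_neg = 0.0, 0.0
    return events


prices = [100.0, 101.0, 102.0, 103.5, 103.0, 101.0, 99.5, 99.0]
print(cusum_events(prices, h=0.03))  # [(3, 'up'), (6, 'down')]
```

The upward event fires at bar 3 because the cumulative log return from bar 0 is about 3.4%, and the downward event fires at bar 6 once the decline measured from the reset point passes 3%.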

The threshold $h$ is not a fixed number. A constant threshold would make the filter behave differently across volatility regimes: it would flag too few events in calm markets, where returns are small relative to $h$, and too many in turbulent ones. The correct approach is an adaptive threshold:

$$h = k \cdot \hat{\sigma}_t$$

where $\hat{\sigma}_t$ is the current realized volatility (estimated via a rolling standard deviation of log returns) and $k$ is a multiplier the user controls. Our TradingView script implements this directly: the default $k = 1.0$ is a sensible starting point, with $k \in [1.5, 2.0]$ recommended for daily bars where you want fewer, higher-confidence events.
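
As a rough sketch of the adaptive threshold, the code below computes a rolling sample standard deviation of log returns and scales it by the multiplier. The function names, the use of the sample (n − 1) estimator, and the choice to include the current bar's return in its own window are assumptions for illustration; the script may differ on each of these points.

```python
import math
import statistics


def rolling_vol(prices, lookback=20):
    """Rolling sample std of log returns; None until `lookback` returns exist."""
    rets = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    vols = [None] * len(prices)
    for t in range(lookback, len(prices)):
        # rets[i - 1] is the return realised at bar i, so this slice holds
        # the `lookback` returns ending at bar t.
        vols[t] = statistics.stdev(rets[t - lookback:t])
    return vols


def adaptive_threshold(prices, k=1.0, lookback=20):
    """Per-bar threshold h_t = k * sigma_t (None while the window warms up)."""
    return [None if v is None else k * v for v in rolling_vol(prices, lookback)]
```

Feeding `adaptive_threshold(...)[t]` into the CUSUM comparison at each bar, in place of a constant, gives the volatility-scaled behaviour described above.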


Why CUSUM Events Matter for Machine Learning

The CUSUM filter is not a trading signal. This is the most common misunderstanding among traders who encounter it for the first time. The upward and downward markers it produces on the chart are not buy and sell signals. They are sampling points — moments where the market has done something structurally interesting enough to warrant a training label.

In the AFML pipeline, the CUSUM filter is step three of a six-step process:

1. Raw OHLCV data
2. Dollar bars (volume-based non-uniform bars)    ← normalises information content
3. CUSUM filter                                   ← identifies structural break events
4. Triple-barrier labeling                        ← assigns directional outcomes
5. Meta-labeling                                  ← filters low-confidence predictions
6. Feature engineering + ML model training

The output of the CUSUM filter is a set of timestamps: "train a model on what happens starting at these moments." What happens next — whether the price goes up, down, or sideways within a defined risk/reward window — is determined by the triple-barrier labeling method in step four.

This two-step separation matters enormously for out-of-sample performance. If you train on every time bar, you are training on thousands of moments when nothing structurally changed — the model learns to fit noise. If you train only on CUSUM events, every training observation corresponds to a genuine market regime shift. The signal-to-noise ratio of your training set improves dramatically.

In practice, CUSUM filtering typically reduces the number of training observations by 60–80% compared to time-based sampling. This sounds like a disadvantage — fewer data points — but it is the opposite. You are discarding low-information observations and keeping the ones that carry the most predictive content.


Reading the Script on a Live Chart

When you add the CUSUM Structural Break Detector to a TradingView chart, you will see five visual elements:

Upward event markers (▲, teal) appear below bars where the positive accumulator $S^+_t$ has crossed the threshold. These mark moments where log returns have cumulatively deviated upward from zero by more than $h$: the market has been persistently drifting up and has now crossed the statistical threshold for "something structural happened."

Downward event markers (▼, amber) are the symmetric case for the negative accumulator.

Volatility bands (±h around close) show the current threshold expressed in price units. When the bands are wide, the realized volatility is high and the filter requires a larger price move to trigger. When they are narrow, the market is calm and smaller moves are flagged. This gives you an immediate visual sense of the current regime.
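
Because $h$ is measured in log-return units, drawing it on a price chart needs a conversion. The sketch below assumes the bands are placed at the close multiplied by exp(±h); the note does not state how the script does this, so treat both the mapping and the name `threshold_bands` as assumptions.

```python
import math


def threshold_bands(close, h):
    """Map a log-return threshold h onto (lower, upper) price levels."""
    upper = close * math.exp(h)   # price at which the log return from close equals +h
    lower = close * math.exp(-h)  # price at which the log return from close equals -h
    return lower, upper


lower, upper = threshold_bands(100.0, 0.02)
print(round(lower, 2), round(upper, 2))  # 98.02 102.02
```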

Background shading (faint red) appears when realized volatility exceeds 1.5× its rolling median — a simple high-volatility regime indicator. Events occurring during shaded periods should be interpreted with additional caution: the filter is working harder to separate signal from noise.
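
A regime flag of this kind can be sketched as follows. The 1.5× ratio comes from the description above; the 100-bar median window and the function name `high_vol_flags` are hypothetical, since the note does not say which window the script uses.

```python
import statistics


def high_vol_flags(vols, window=100, ratio=1.5):
    """Flag bars whose volatility exceeds `ratio` times its trailing median.

    `vols` must contain numbers only (drop the warm-up bars first).
    The trailing window includes the current bar.
    """
    flags = []
    for t, v in enumerate(vols):
        history = vols[max(0, t - window + 1):t + 1]
        flags.append(v > ratio * statistics.median(history))
    return flags


print(high_vol_flags([1.0, 1.0, 1.0, 1.0, 1.0, 2.0], window=6))
# [False, False, False, False, False, True]
```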

The info table (bottom right) shows the live values: current $\hat{\sigma}$, current $h$, vol regime, and total event count since the chart was loaded.

Parameter guidance

Parameter                 Default   Notes
Volatility lookback       20 bars   Shorter = more reactive to current vol; longer = smoother
Threshold multiplier $k$  1.0       Raise to 1.5–2.0 for daily bars; lower to 0.5–0.8 for intraday
Min bar gap               5 bars    Prevents event clustering; increase for noisier instruments

For a typical equity on daily bars, $k = 1.0$ with a 20-bar lookback will produce 15–30 events per year. For BTC on 4H bars, $k = 1.5$ is more appropriate given the higher baseline volatility. For intraday equity data (15m, 1H), $k = 0.7$ is a reasonable starting point.
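
The effect of the minimum bar gap can be illustrated with a simple post-filter over event bar indices. This is one plausible reading of the parameter, not a description of the script's internals: the script may instead suppress triggers inside the accumulator loop, which would give slightly different results. The name `enforce_min_gap` is ours.

```python
def enforce_min_gap(event_bars, min_gap=5):
    """Keep an event only if it is at least `min_gap` bars after the last kept one."""
    kept = []
    for bar in sorted(event_bars):
        if not kept or bar - kept[-1] >= min_gap:
            kept.append(bar)
    return kept


print(enforce_min_gap([3, 5, 9, 10, 14, 30], min_gap=5))  # [3, 9, 14, 30]
```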


The Webhook Bridge to Your Research Lab

Every CUSUM event triggers a TradingView alert with a structured JSON payload:

{
  "source": "tradingview",
  "script": "cusum_detector",
  "direction": "up",
  "ticker": "NASDAQ:AAPL",
  "close": 213.42,
  "vol": 0.00187,
  "h": 0.00187,
  "bar_time": "2026-03-15T16:00:00Z"
}

Configure a TradingView webhook alert to POST this to your finmlresearch.com lab endpoint. The lab receives the event, appends it to your DuckDB event store, and immediately runs the next step of the AFML pipeline: triple-barrier label computation on the subsequent price action.
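
On the receiving side, a payload of this shape can be validated before it is stored. The sketch below uses only the field names shown in the payload above; the `CusumEvent` class, the `parse_cusum_event` function, and the assumption that `"down"` is the other allowed direction are illustrative and do not describe the lab endpoint's actual code. Writing to DuckDB is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CusumEvent:
    ticker: str
    direction: str
    close: float
    vol: float
    h: float
    bar_time: datetime


def parse_cusum_event(raw: str) -> CusumEvent:
    """Parse and validate one webhook body; raises ValueError or KeyError on bad input."""
    data = json.loads(raw)
    if data.get("script") != "cusum_detector":
        raise ValueError(f"unexpected script: {data.get('script')!r}")
    # "down" is an assumed counterpart to the "up" value shown in the payload.
    if data.get("direction") not in ("up", "down"):
        raise ValueError(f"unexpected direction: {data.get('direction')!r}")
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    bar_time = datetime.fromisoformat(data["bar_time"].replace("Z", "+00:00"))
    return CusumEvent(
        ticker=data["ticker"],
        direction=data["direction"],
        close=float(data["close"]),
        vol=float(data["vol"]),
        h=float(data["h"]),
        bar_time=bar_time,
    )
```

Rejecting malformed payloads at this boundary keeps the event store clean, which matters because every stored event later becomes a training label.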

This closes the loop between visual chart analysis and quantitative research. You are not just seeing events on a chart — you are building a labeled dataset of structural break moments that you can feed directly into an ML training pipeline.


What the Free Script Does Not Include

The free CUSUM detector gives you the sampling layer. It tells you where to look. It does not tell you what to do at those moments.

The paid version (available at finmlresearch.com/scripts) adds:

Triple-barrier label overlay. For each CUSUM event, it draws three forward-looking price levels: a profit-take barrier above, a stop-loss barrier below, and a timeout line at a configurable horizon. The first barrier touched determines the label: +1 (profit-take), −1 (stop-loss), or 0 (timeout). This is the exact labeling scheme used in AFML Chapter 3 and is the necessary input for training any primary ML model on the events identified by the CUSUM filter.
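
The first-touch rule can be sketched on a list of closing prices. This is a simplified illustration, not the paid script's logic: it uses fixed fractional barrier widths `pt` and `sl` (AFML scales the barriers by volatility), checks closes only rather than intrabar highs and lows, and the function name is ours.

```python
def triple_barrier_label(prices, start, pt, sl, horizon):
    """Label the event at index `start` by the first barrier touched.

    pt, sl: barrier distances as fractions of the entry price.
    horizon: maximum number of bars to look ahead.
    Returns +1 (profit-take), -1 (stop-loss), or 0 (timeout).
    """
    entry = prices[start]
    upper = entry * (1.0 + pt)
    lower = entry * (1.0 - sl)
    end = min(start + horizon, len(prices) - 1)
    for t in range(start + 1, end + 1):
        if prices[t] >= upper:
            return 1
        if prices[t] <= lower:
            return -1
    return 0


print(triple_barrier_label([100.0, 101.0, 103.0, 99.0], 0, 0.02, 0.02, 3))   # 1
print(triple_barrier_label([100.0, 99.0, 97.5, 103.0], 0, 0.02, 0.02, 3))    # -1
print(triple_barrier_label([100.0, 100.5, 99.5, 100.2], 0, 0.02, 0.02, 3))   # 0
```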

Meta-label confidence filter. A secondary signal overlay that approximates the meta-labeling classifier output — gating which CUSUM events have historically been followed by high-confidence directional moves. This addresses the fundamental issue that triple-barrier labeling produces a heavily imbalanced label distribution (~70% neutral outcomes), which degrades naive classifier performance.

Fractional differentiation proxy. An alternative price display that applies the minimum fractional differentiation parameter $d$ sufficient to pass an Augmented Dickey-Fuller stationarity test. This is AFML Chapter 5 in visual form: the series retains memory (unlike a simple returns series) while being stationary enough for ML feature engineering.
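
For readers who want to see the mechanics, the sketch below computes fractional differencing weights with the standard recursion $w_0 = 1$, $w_k = -w_{k-1}(d - k + 1)/k$ and applies them over a fixed-width window. It does not include the ADF search for the minimum $d$; the function names and the window size are assumptions for illustration.

```python
def fracdiff_weights(d, size):
    """First `size` weights of the fractional differencing operator (1 - B)^d."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def fracdiff(series, d, size):
    """Fixed-width-window fractional difference; output starts at index size - 1."""
    w = fracdiff_weights(d, size)
    return [
        sum(w[k] * series[t - k] for k in range(size))
        for t in range(size - 1, len(series))
    ]


print(fracdiff_weights(0.5, 3))                 # [1.0, -0.5, -0.125]
print(fracdiff([1.0, 3.0, 6.0, 10.0], 1, 3))    # [3.0, 4.0]
```

With $d = 1$ the weights collapse to an ordinary first difference, which is the second printed line; values of $d$ between 0 and 1 keep a slowly decaying tail of weights on older observations, which is where the retained memory comes from.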

Full JSON webhook export. Extended payload including realized Sharpe of recent events, volatility regime classification, and the computed triple-barrier outcome — enabling richer finmlresearch.com lab integration.


Why This Approach Is Different

The TradingView marketplace is full of AI-branded indicators. Most of them are variations on moving average crossovers, RSI derivatives, or neural networks trained to predict the next candle close. They are pattern-matching tools dressed in ML language.

The CUSUM filter is different in kind. It is not predicting price direction. It is solving a statistical problem that precedes prediction: when should you sample the market at all?

This distinction matters because the answer to "what should I trade?" depends entirely on the quality of the training data you use to develop the strategy. Training a neural network on time-sampled daily bars is like trying to learn spoken language from a transcript that includes equal weight to every pause, filler word, and silence. The model learns to fit the noise alongside the signal.

CUSUM filtering is the step that extracts the signal-bearing moments before any prediction happens. It is infrastructure for serious ML-in-finance work, not a shortcut to trading signals.

Renaissance Technologies built the most successful fund in history not because they had better predictors than everyone else, but because they built better infrastructure for generating, labeling, and testing predictions. The CUSUM filter — or its institutional equivalent — is a foundational piece of that infrastructure. This script makes it accessible to anyone with a TradingView account.


Further Reading

  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapters 2–5.
  • Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115. The original CUSUM paper.
  • Bailey, D. H., Borwein, J., López de Prado, M., & Zhu, Q. J. (2014). Pseudo-mathematics and financial charlatanism. Notices of the American Mathematical Society, 61(5), 458–471. On why most backtests are false.
  • Hilpisch, Y. (2024). Reinforcement Learning for Finance. O'Reilly. The RL pipeline that CUSUM events feed into.

Get the Script

Free version (open source): TradingView — CUSUM Structural Break Detector

Paid version (triple-barrier + meta-label + webhook): finmlresearch.com

Run the full AFML pipeline on your own portfolio: finmlresearch.com


Martin · March 2026

This research note is for educational purposes. It does not constitute investment advice. Past performance of any methodology described here does not guarantee future results.