Research Notes | Scalable Brain

Core Research Philosophy

Scalable Brain is built on one fundamental belief: trading strategies must encode deterministic mathematical edges, not black-box predictions. Machine learning is a tool for context, not for prophecy.

Unlike retail trading bots that attempt to use ML to predict raw price movement, this system uses deterministic mathematical strategies to find potential edges, and uses ML strictly as a contextual risk-manager (Meta-Labeling).

Determinism Over Prediction

Every entry and exit condition is explicit, reproducible, and testable. We never let a model decide direction—only whether a known-edge strategy should be activated.

Prove It Mathematically

No strategy goes live without passing rigorous statistical tests. Expectancy, Profit Factor, and trade count must each clear a predefined minimum threshold.
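
As a rough illustration of the gate described above, the sketch below computes Expectancy and Profit Factor from a list of per-trade results and checks them against thresholds. The function names and the default threshold values (100 trades, expectancy above 0, profit factor of at least 1.2) are hypothetical placeholders, not the system's actual settings, and trade results are assumed to be expressed in multiples of risk (R).

```python
from typing import Sequence


def expectancy(pnl: Sequence[float]) -> float:
    """Average result per trade (here assumed to be in R-multiples)."""
    return sum(pnl) / len(pnl)


def profit_factor(pnl: Sequence[float]) -> float:
    """Gross profit divided by gross loss; infinite if there are no losses."""
    gross_profit = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def passes_gate(
    pnl: Sequence[float],
    min_trades: int = 100,           # hypothetical threshold
    min_expectancy: float = 0.0,     # hypothetical threshold
    min_profit_factor: float = 1.2,  # hypothetical threshold
) -> bool:
    """Return True only if all three checks clear their thresholds."""
    if len(pnl) < min_trades:
        return False
    return (
        expectancy(pnl) > min_expectancy
        and profit_factor(pnl) >= min_profit_factor
    )
```

Checking the trade count first matters: a strategy with a handful of lucky trades can show an excellent Profit Factor that means very little statistically.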

Context Is Everything

The same strategy behaves differently in different market regimes. Regime detection ensures we only deploy strategies where they have historically worked.

Risk Before Reward

Portfolio-level correlation guards and ATR-based dynamic stops are designed to keep any single trade or hidden correlated exposure from jeopardizing the account.
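
To make the ATR-based stop concrete, here is a minimal sketch covering only the stop distance, not the correlation guard. It assumes a simple average of the last `period` true ranges (Wilder's original ATR uses a smoothed average instead) and a hypothetical stop multiplier of 2.0. The function names are illustrative.

```python
from typing import List, Sequence


def true_ranges(
    highs: Sequence[float], lows: Sequence[float], closes: Sequence[float]
) -> List[float]:
    """True range per candle, starting from the second candle."""
    trs = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        trs.append(
            max(
                highs[i] - lows[i],
                abs(highs[i] - prev_close),
                abs(lows[i] - prev_close),
            )
        )
    return trs


def atr(
    highs: Sequence[float],
    lows: Sequence[float],
    closes: Sequence[float],
    period: int = 14,
) -> float:
    """Simple-average ATR over the last `period` true ranges (an assumption)."""
    trs = true_ranges(highs, lows, closes)
    if len(trs) < period:
        raise ValueError("not enough candles for the requested period")
    return sum(trs[-period:]) / period


def stop_price(
    entry: float, atr_value: float, direction: int, multiplier: float = 2.0
) -> float:
    """Stop below entry for longs (direction=+1), above entry for shorts (-1)."""
    return entry - direction * multiplier * atr_value
```

Because the stop distance scales with ATR, the same rule places wider stops in volatile conditions and tighter stops in quiet ones.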

Why Meta-Labeling Instead of Price Prediction?

Most retail ML trading systems attempt to predict the next candle's direction. This approach tends to fail for three reasons:

Noise dominance: Short-term price action is predominantly noise, not signal. ML models overfit to noise patterns that do not repeat.
Non-stationarity: The statistical properties of financial time series change over time. A model trained on 2020 data may fail completely in 2024.
Feature leakage: Many price-prediction models inadvertently include future information in their training data.

The Meta-Labeling Alternative

Instead of asking "Will price go up?", we ask: "Given that this strategy says BUY in this specific market regime, what is the historical probability of that trade succeeding?"

This is fundamentally different. The model doesn't generate trades—it evaluates whether a known-edge strategy should be activated in the current conditions. The strategy provides the direction; the model provides the context.
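
The question above can be sketched with a plain frequency table. In this illustration the strategy supplies the side of each trade, the meta-label is 1 when a trade taken on that side was profitable, and the "model" is just the historical success rate per regime. A real meta-labeling model would typically be a trained classifier over more features; the function names and the 0.55 activation threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One historical trade: (regime label, strategy side, forward return).
# side is +1 for BUY, -1 for SELL, 0 for no signal.
Trade = Tuple[str, int, float]


def success_rate_by_regime(trades: Iterable[Trade]) -> Dict[str, float]:
    """Fraction of strategy signals that were profitable, per regime."""
    wins: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for regime, side, forward_return in trades:
        if side == 0:
            continue  # the strategy gave no signal, so there is nothing to label
        total[regime] += 1
        # Meta-label: 1 if trading in the strategy's direction made money.
        if side * forward_return > 0:
            wins[regime] += 1
    return {regime: wins[regime] / total[regime] for regime in total}


def should_activate(
    regime: str, rates: Dict[str, float], threshold: float = 0.55
) -> bool:
    """Activate the strategy only where its historical hit rate is high enough."""
    # Regimes with no history default to 0.0, so the strategy stays off.
    return rates.get(regime, 0.0) >= threshold
```

Note that the direction never comes from the table: it only answers whether a signal the strategy has already produced should be taken.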

Meta-labeling was popularized by Marcos López de Prado in "Advances in Financial Machine Learning" and is used by institutional quant desks to size positions and filter signals contextually.

Market Regime Detection Research

Financial markets cycle between distinct behavioral states. Identifying these states in real time allows the system to match strategies to conditions.

Why K-Means Clustering?

Unsupervised: No need for labeled training data—the algorithm discovers natural groupings
Interpretable: Clusters map directly to intuitive market states (trending vs. ranging, high vs. low volatility)
Computationally efficient: Fast enough for real-time regime updates on H1 data
Feature simplicity: ATR + ADX capture the two most important market properties (volatility + trend strength)

Validation Approach

Cluster quality is measured using the Silhouette Score, which evaluates how similar each data point is to its own cluster compared with the nearest neighboring cluster. Scores range from -1 to 1, with higher values indicating better-defined clusters. We target a score above 0.5 for deployment.
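
The clustering and validation steps can be sketched with the standard library alone. The code below is an illustrative implementation of Lloyd's K-Means algorithm and the mean silhouette coefficient for 2-D (ATR, ADX) points. It assumes the two features have already been standardized to comparable scales and that at least two clusters are found. A production system would more likely rely on an established library, and the function names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (standardized ATR, standardized ADX)


def kmeans(
    points: Sequence[Point], k: int, iters: int = 50, seed: int = 0
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; requires at least two distinct labels."""
    labels = list(labels)
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

With labels from `kmeans(points, k)`, a deployment check in the spirit of the text would be `silhouette_score(points, labels) > 0.5`.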

Data Sources & Research Process

Primary Data Source

All market data is sourced from the Oanda v20 REST API, providing institutional-quality H1 (hourly) OHLCV candles. The dataset spans from January 2008 to present, covering:

The 2008 Financial Crisis (extreme volatility regime)
Post-crisis recovery and QE periods (trending regimes)
COVID-19 market shock (March 2020)
Post-COVID inflation and rate-hike environments
Current market conditions

18+ years of hourly data across 3 major forex pairs provides over 140,000 candles per pair, more than sufficient for statistically significant backtesting and ML training.

Research Pipeline

1. Hypothesis Formation

Identify a potential market inefficiency or behavioral pattern based on technical analysis theory and academic research.

2. Strategy Encoding

Translate the hypothesis into explicit, programmable entry/exit rules with standardized ATR-based risk parameters.

3. Backtesting (Layer 0)

Run the strategy against 18+ years of historical data across all target assets. Calculate Expectancy, Profit Factor, Sharpe Ratio, and Max Drawdown.
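
Two of the metrics named above can be sketched as follows. This is a minimal illustration that assumes a risk-free rate of zero and takes the annualization factor as a parameter. The default of 252 suits daily returns and is a placeholder, so hourly returns would need a different value. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    sd = statistics.stdev(returns)  # sample standard deviation
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For an equity curve of 100, 120, 90, 110, the running peak is 120 and the trough is 90, so `max_drawdown` returns 0.25.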

4. Regime Analysis

Evaluate strategy performance broken down by market regime to understand when and why the strategy works or fails.

5. ML Integration

If the strategy is promoted after the preceding stages, train a meta-labeling model that learns the regime-strategy interaction patterns for real-time filtering.

6. Walk-Forward Validation

Chronological out-of-sample testing guards against look-ahead leakage: each model is trained only on earlier data and evaluated on later data it has never seen.
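
A walk-forward split can be sketched as a generator over index ranges. The version below uses a rolling training window that advances by one test block at a time. The window sizes are left to the caller, and the rolling (rather than expanding) window is an assumption made for illustration.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs in chronological order.

    Every test range starts immediately after its training range, so a model
    is never evaluated on data that precedes or overlaps what it was fit on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test block
```

With 10 samples, a training window of 4, and a test window of 2, this yields three folds whose test ranges cover indices 4-5, 6-7, and 8-9.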

Key Research Principles

No Curve Fitting

We never optimize strategy parameters to maximize backtest results. Parameters are chosen based on financial theory and market microstructure reasoning, not brute-force optimization. Overfitting to historical data is the primary failure mode of quantitative systems.

Temporal Awareness

Forex markets behave differently during different trading sessions. Our feature engineering explicitly encodes:

London Session (08:00-16:00 GMT): Highest forex volume, best for trend strategies
New York Session (13:00-21:00 GMT): High volume, main catalyst window for USD pairs
London-NY Overlap (13:00-16:00 GMT): Peak volatility window, where the most significant moves tend to occur
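
One way to encode these sessions as model features is a set of binary flags derived from the candle's timestamp. The sketch below uses the session hours listed above, treats GMT as UTC, and assumes half-open intervals (a candle opening at 16:00 is no longer in the London session). The function name and feature keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

LONDON_HOURS = (8, 16)     # 08:00-16:00 GMT
NEW_YORK_HOURS = (13, 21)  # 13:00-21:00 GMT


def session_features(ts: datetime) -> Dict[str, int]:
    """Binary session flags for a timezone-aware candle timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    hour = ts.astimezone(timezone.utc).hour
    london = LONDON_HOURS[0] <= hour < LONDON_HOURS[1]
    new_york = NEW_YORK_HOURS[0] <= hour < NEW_YORK_HOURS[1]
    return {
        "london": int(london),
        "new_york": int(new_york),
        "overlap": int(london and new_york),
    }
```

Rejecting naive timestamps is deliberate: silently interpreting them in the machine's local timezone would shift every session flag.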

Multi-Timeframe Alignment

Signals on H1 are validated against the H4 trend direction. A bullish signal on H1 that aligns with a bullish H4 trend has a higher probability of success than one that contradicts the higher timeframe.
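
The alignment check can be sketched as a simple filter. The text does not say how the H4 trend direction is determined, so this illustration assumes one common proxy: the latest completed H4 close relative to its simple moving average. The 20-candle period and the function names are hypothetical.

```python
from typing import Sequence


def h4_trend(h4_closes: Sequence[float], period: int = 20) -> int:
    """+1 bullish, -1 bearish, 0 undetermined (assumed SMA-based proxy)."""
    if len(h4_closes) < period:
        return 0  # not enough history to judge the trend
    sma = sum(h4_closes[-period:]) / period
    last = h4_closes[-1]
    if last > sma:
        return 1
    if last < sma:
        return -1
    return 0


def is_aligned(h1_signal: int, h4_closes: Sequence[float], period: int = 20) -> bool:
    """True when an H1 signal (+1 buy, -1 sell) agrees with the H4 trend."""
    return h1_signal != 0 and h1_signal == h4_trend(h4_closes, period)
```

To stay free of look-ahead bias, `h4_closes` should contain only H4 candles that had fully closed at the time the H1 signal fired.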

Instrument Independence

Each asset is processed through the pipeline independently. EUR/USD might be in a Trending_HighVol regime while USD/JPY is in Ranging_LowVol. The system respects this and activates different strategies for each pair accordingly.

Influences & References

Marcos López de Prado — "Advances in Financial Machine Learning" (meta-labeling framework, triple-barrier method)
Ernest Chan — "Algorithmic Trading" (quantitative strategy development, backtesting methodology)
Richard Dennis / Turtle Traders — Donchian channel breakout systems and systematic trend following
John Bollinger — Bollinger Bands and volatility-based mean reversion theory
Ralph Vince — Portfolio risk management and position sizing