ALPHAFACTORY · INTERNAL THESIS

How we think, and how the lab is built.

AlphaFactory is a solo-operated systematic trading lab. It is opinionated on purpose: every architectural choice was made by asking "what failure mode does this prevent?" first, and "what does it let us do?" second. This page is the short version of why.

Market · US equities only
Engine · Nautilus Trader (hybrid)
Budget · $250–500/mo
Mode · Hands-off, weekly review
Operator · Solo retail, Python 3.12 stack

01 · THE REFLECTION

Four pivots that shaped the lab

Each was a moment where the obvious or seductive option got replaced by a more boring, more durable one.

Pivot 01 · Infrastructure

Don't build custom infrastructure before you've found an edge.

Custom backtester, OMS, broker adapters, the lot

→

Nautilus Trader as engine + thin custom layer (regime, risk, journal)

The classic solo-trader trap is spending six months on infrastructure that an open-source engine already provides for free. Nautilus runs the same strategy code in backtest and live, which removes the whole class of bugs that come from maintaining two implementations. We write code only for the differentiated parts.

Pivot 02 · Scope

Two markets is two problems. Pick one.

Equities and crypto in parallel

→

US equities only (SPY, QQQ, liquid tech) for the first 6+ months

Different hours, different fees, different regulatory surface, different data vendors, different microstructure. Doing both simultaneously doesn't double the work — it more than doubles it. Crypto is deferred until the equities pipeline runs cleanly end-to-end.

Pivot 03 · Modeling

Deep learning for price prediction is the trap, not the edge.

LSTM / Transformer / DeepAR / Autoencoders for price prediction

→

GARCH for vol regime · Isolation Forest for data QA · small RF as signal filter only

Effective independent sample count on 5y of 5-min SPY is in the low thousands — nowhere near enough for thousands of model parameters. Complex models on low signal-to-noise data manufacture exactly the overfit backtest that walk-forward + Monte Carlo gates are designed to catch. We refuse to build that trap.
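The sample-count argument can be made concrete with the standard autocorrelation-adjusted estimate, N_eff = N / (1 + 2 Σ ρ_k). The sketch below is a minimal illustration under stated assumptions, not the lab's actual calculation: the function names, the lag cutoff, and the synthetic AR(1) series are all hypothetical, and the "low thousands" figure above is not derived here.

```python
import random


def autocorr(xs: list[float], lag: int) -> float:
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def effective_sample_size(xs: list[float], max_lag: int = 50) -> float:
    """N / (1 + 2 * sum of the leading positive autocorrelations)."""
    n = len(xs)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(xs, lag)
        if rho <= 0:  # truncate at the first non-positive lag (one common rule)
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def ar1_series(n: int, phi: float, seed: int = 0) -> list[float]:
    """Synthetic AR(1) series, used only to exercise the estimator."""
    rng = random.Random(seed)
    xs, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        xs.append(prev)
    return xs


iid = ar1_series(5_000, phi=0.0)         # no serial dependence
persistent = ar1_series(5_000, phi=0.9)  # strongly overlapping information
ess_iid = effective_sample_size(iid)
ess_persistent = effective_sample_size(persistent)
print(f"iid:        {ess_iid:.0f} effective of {len(iid)}")
print(f"persistent: {ess_persistent:.0f} effective of {len(persistent)}")
```

The same bar count yields far fewer independent observations once features or labels overlap in time, which is the constraint the parameter-count argument rests on.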

Pivot 04 · Hunting

Prefer recent live evidence over decayed textbook factors.

Classical academic factors (Fama-French, Jegadeesh-Titman, etc.)

→

Strategies with documented edge in the last 1–3 months of live trading

Half-life of a publicly known edge in liquid US equities is typically <2 years. Strategies still working on fresh data are more likely to still work when we test them. Every strategy spec carries an "evidence of recent edge" citation in its header.

02 · THE GUARDRAILS

Eight principles, non-negotiable

If a proposal violates any of these, it gets pushed back — even if it would technically work.

Process > result

A good trade that loses is fine. A bad trade that wins is bad process.

Risk > entry quality

Sizing, stops, and kill switches matter more than signal cleverness.

Regime-aware

A strategy is never universally good — only good in a regime. Tag every signal.

AI does not auto-trade

AI codes, tests, audits, summarizes. AI does not send orders unsupervised.

Backtests lie by default

Control for overfitting, regime drift, fees, slippage, survivorship, lookahead.

Operator is part of the system

The lab must actively prevent FOMO, revenge trades, oversizing, manual overrides.

No live without gates

All validation passed · ≥30 paper-trading days · reconciliation clean · kill switches tested.

Honest critique invited

Operator explicitly asks for pushback. Flag tradeoffs, don't hide them.

03 · THE PIPELINE

From idea to capital, with three gates in the way

The first gate promotes automatically, and the last is proposed by the AI for the operator to confirm. The middle one, paper to live, requires a human signature, always.

Stage 01

Hunt & spec

Find candidates with recent live edge. Write a spec with regime tags and risk caps.

Stage 02

Backtest

Walk-forward + Monte Carlo. Survivorship and lookahead controls.
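As a rough illustration of what this stage involves, the sketch below rolls train/test windows forward in time and bootstraps a list of per-trade returns. It is a simplified example under stated assumptions: the function names, window sizes, additive return model, and the "positive total" survival rule are all hypothetical, not the lab's actual criteria.

```python
import random
from collections.abc import Iterator


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; each test window follows its train window."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


def monte_carlo_survival(
    trade_returns: list[float], n_paths: int = 1_000, seed: int = 0
) -> float:
    """Fraction of bootstrap-resampled trade sequences with a positive summed return."""
    rng = random.Random(seed)
    n = len(trade_returns)
    survived = 0
    for _ in range(n_paths):
        path = rng.choices(trade_returns, k=n)  # resample with replacement
        if sum(path) > 0:
            survived += 1
    return survived / n_paths


windows = list(walk_forward_windows(n_bars=100, train_size=60, test_size=20))
for train, test in windows:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")

# Made-up per-trade returns, for illustration only.
sample_trades = [0.01, -0.005, 0.02, -0.01, 0.004, -0.006]
print(f"survival rate: {monte_carlo_survival(sample_trades):.2f}")
```

Keeping the test window strictly after the training window is also what enforces the lookahead control at this stage.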

Auto · backtest → paper

Stage 03

Paper trade

≥30 days. Daily reconciliation. Regime coverage logged.

Human sign-off · paper → live small

Stage 04

Live small

Real money, tiny size. Sharpe ≥ 0.3 to clear the floor.

AI proposes, operator confirms

Stage 05

Live full

Daily demotion check. Any regime drift → instant downgrade.

Automatic gate: AI runs the check on every backtest, and weekly on active strategies.
Human-in-the-loop: operator must sign off before money moves.
Quality bars (Sharpe ≥ 1.0 paper, profit factor ≥ 1.3, ≥30 trades, WF + MC survival) are not risk-tolerance dials.
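One way to keep the quality bars from turning into dials is to hard-code them as constants and have the check return the list of failed bars. The sketch below assumes hypothetical names (StrategyStats, failed_quality_bars) and made-up candidate numbers. Only the thresholds come from the text, and which gate each bar applies to is left open here.

```python
from dataclasses import dataclass

# Thresholds quoted in the text. Module-level constants, not per-strategy settings.
MIN_SHARPE = 1.0
MIN_PROFIT_FACTOR = 1.3
MIN_TRADES = 30


@dataclass(frozen=True)
class StrategyStats:
    """Hypothetical container for the numbers a gate check needs."""
    sharpe: float
    profit_factor: float
    n_trades: int
    walk_forward_passed: bool
    monte_carlo_passed: bool


def failed_quality_bars(stats: StrategyStats) -> list[str]:
    """Return the names of every bar the strategy misses (empty list = pass)."""
    failures: list[str] = []
    if stats.sharpe < MIN_SHARPE:
        failures.append("sharpe")
    if stats.profit_factor < MIN_PROFIT_FACTOR:
        failures.append("profit_factor")
    if stats.n_trades < MIN_TRADES:
        failures.append("n_trades")
    if not stats.walk_forward_passed:
        failures.append("walk_forward")
    if not stats.monte_carlo_passed:
        failures.append("monte_carlo")
    return failures


# Illustrative candidate with invented numbers.
candidate = StrategyStats(
    sharpe=1.2, profit_factor=1.1, n_trades=25,
    walk_forward_passed=True, monte_carlo_passed=True,
)
print(failed_quality_bars(candidate))
```

Returning every failed bar, rather than stopping at the first, gives the weekly digest a complete reason for each rejection.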

04 · THE STACK

What's in the box, and what we deliberately left out

Boring stack, picky exclusions. Each "no" on the right was a real conversation, not an oversight.

Tech stack — bought vs. built

Engine

Nautilus Trader

Backtest, OMS, execution sim, live trading — same code path.

Data

Polygon · Developer tier

Intraday + historical US equities. Unlocks real 5-min bars.

Broker

Alpaca (paper) → IBKR

Paper-first for ≥30d. IBKR is the eventual upgrade.

Storage

DuckDB + Parquet

Local-first analytics. Fast, cheap, reproducible.

Built · Regime

ADX × ATR + GARCH overlay

Tags every bar with regime, vol bucket, trend bucket.
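A minimal sketch of what per-bar tagging could look like, assuming the ADX value and an ATR percentile rank have already been computed upstream. The thresholds, bucket names, and the tag_bar function are hypothetical placeholders, and the GARCH overlay is omitted entirely.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration. The lab's actual cut-offs are not stated.
ADX_TREND_MIN = 25.0   # ADX at or above this is treated as trending
VOL_LOW_MAX = 0.33     # ATR percentile rank below this is low vol
VOL_HIGH_MIN = 0.67    # ATR percentile rank at or above this is high vol


@dataclass(frozen=True)
class RegimeTag:
    trend_bucket: str
    vol_bucket: str

    @property
    def regime(self) -> str:
        """Combined label, e.g. 'trending/high_vol'."""
        return f"{self.trend_bucket}/{self.vol_bucket}"


def tag_bar(adx: float, atr_pct_rank: float) -> RegimeTag:
    """Map one bar's ADX and ATR percentile rank (0..1) to buckets."""
    trend = "trending" if adx >= ADX_TREND_MIN else "ranging"
    if atr_pct_rank < VOL_LOW_MAX:
        vol = "low_vol"
    elif atr_pct_rank >= VOL_HIGH_MIN:
        vol = "high_vol"
    else:
        vol = "mid_vol"
    return RegimeTag(trend_bucket=trend, vol_bucket=vol)


tag = tag_bar(adx=31.0, atr_pct_rank=0.8)
print(tag.regime)
```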

Built · Risk

Risk policy enforcer

Sizing, stops, kill switches. Signal gen never bypasses it.
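To show how a fixed risk budget and a notional cap interact, here is a sketch of a sizing check. It uses the 0.25% risk/trade and 1× notional figures that appear in the graveyard entries. The RiskPolicy and size_order names, the integer-share rounding, and the block-rather-than-clip behaviour are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    risk_per_trade_bps: int = 25    # 0.25% of equity risked per trade
    max_notional_mult: float = 1.0  # 1x notional cap, i.e. no leverage


def size_order(
    equity: float, entry: float, stop: float, policy: RiskPolicy = RiskPolicy()
) -> tuple[int, str]:
    """Return (shares, status). Shares is 0 whenever the order is rejected."""
    risk_per_share = abs(entry - stop)
    if risk_per_share == 0:
        return 0, "rejected: stop equals entry"
    risk_budget = equity * policy.risk_per_trade_bps / 10_000
    shares = int(risk_budget // risk_per_share)
    if shares == 0:
        return 0, "rejected: size rounds to zero"
    if shares * entry > equity * policy.max_notional_mult:
        # A tight stop inflates share count until notional exceeds the cap.
        return 0, "blocked: notional cap"
    return shares, "ok"


print(size_order(equity=100_000, entry=500.0, stop=495.0))  # wide stop
print(size_order(equity=100_000, entry=500.0, stop=499.5))  # tight stop
```

With the wide stop, the $250 risk budget buys 50 shares ($25,000 notional). With the tight stop, it would need 500 shares ($250,000 notional), so the cap blocks the order. A counter on that status string is the kind of signal-skip metric the graveyard entries report.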

Built · Journal

Daily / weekly journal

Adherence-to-plan scoring, not just P&L.

Built · Surface

Streamlit + FastAPI

Live artifact dashboards, weekly "needs attention" digest.

What is not in the lab

× LSTM / Transformer / DeepAR for price prediction. Sample-count math doesn't work for solo retail.
× ARIMA on prices. Prices are approximately a random walk, so ARIMA forecasts near-zero returns and the edge gets eaten by costs.
× Autoencoders / HDBSCAN for trade decisions. Too unstable on small samples.
× Crypto, options, futures. Deferred until the equities pipeline is clean end-to-end.
× "Holy grail" / secret strategies. If it were that good, it wouldn't be posted publicly.
× Custom backtester / OMS / broker adapters. Nautilus already does these. We refuse to rebuild them.
× AI with order-send authority. Ever. No exceptions, no overrides.

05 · THE OPERATING MODEL

Who does what, on what cadence

Default is autonomous. The human shows up at the gates, the keys, and the capital decisions.

AI runs daily

Scheduled data pulls (nightly / weekly)
Auto-run backtests on every new strategy as soon as it's written
Strategy hunting from public sources, with recent-edge citations
Walk-forward + Monte Carlo on every candidate
Paper-trading monitor as a live artifact dashboard
Weekly "what happened, what needs your attention" digest
G1 (backtest → paper) auto-promotion · G2/G3 weekly check · demotion check daily

Operator decides

Real-money go-live decisions (always)
Any touch of Alpaca credentials or live API keys
Architectural pivots that contradict prior commitments
Strategies that hit G2 → "READY TO REVIEW" notification
Per-decision overrides on graduation criteria (with journal entry + cool-down)
Capital allocation across the live book

06 · THE GRAVEYARD

Two strategies tested. Both killed. Zero capital at risk.

The point isn't that they failed. The point is they failed at gate one, in backtest, before a single dollar moved.

☠ GRAVEYARD'D · 2026-05-25

Range Mean Reversion

5-min SPY · 2020 → 2026 · 123,319 bars · 1× notional · 0.25% risk/trade

4 trades total

−0.16 Sharpe (ann.)

−1.84% total return

Cause of death

Spec too restrictive — fired ~0.8 signals/year. Sample too small to evaluate, let alone trade. Not a strategy failure — a spec failure the lab correctly killed on day one.

☠ GRAVEYARD'D · 2026-05-25

Intraday Momentum · SPY

30-min SPY · 2020 → 2026 · 22,725 bars · 901 trades fired

−60.2% total return

−12.0 Sharpe (ann.)

1,710 blocked by 1× cap

Cause of death

Designed for leverage we don't allow — 40% of signals blocked by the notional cap. Brutal fee sensitivity at tight intraday stops. No regime where edge beats cost.

·· WHAT TWO FAILURES TAUGHT US ··

LESSON 01

Signal frequency is a pre-filter, not a discovery

4 signals in 5 years isn't "selective" — it's untestable. New specs now carry an expected-signals-per-year estimate before code is written. Under 30 → shelved.
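The pre-filter is simple enough to write down directly. In this sketch the function names are hypothetical and the estimate is a plain count-per-year. The 30-per-year floor and the 4-in-5-years case come from the text, and the second case is invented.

```python
MIN_SIGNALS_PER_YEAR = 30  # floor stated in the lesson above


def expected_signals_per_year(signal_count: int, years: float) -> float:
    """Plain average of signals per year over the sampled period."""
    if years <= 0:
        raise ValueError("years must be positive")
    return signal_count / years


def should_shelve(signal_count: int, years: float) -> bool:
    """True when the spec fires too rarely to be evaluated."""
    return expected_signals_per_year(signal_count, years) < MIN_SIGNALS_PER_YEAR


# The Range Mean Reversion case from the graveyard: 4 signals in 5 years.
print(expected_signals_per_year(4, 5), should_shelve(4, 5))
# A made-up spec that fires 150 times in 5 years sits exactly at the floor.
print(expected_signals_per_year(150, 5), should_shelve(150, 5))
```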

LESSON 02

The fee/leverage trap is real and expensive

Profitable before costs and with leverage = profitable nowhere we operate. New floor for every spec: R:R ≥ 2 or daily/swing timeframe.
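A back-of-the-envelope view of why the R:R floor helps: if a win pays rr units of risk, a loss costs one unit, and round-trip costs are c units, then expectancy p·rr − (1 − p) − c is zero at p = (1 + c) / (1 + rr). The sketch below only evaluates that formula. The function name and the 0.2R cost figure are illustrative assumptions, not measured lab numbers.

```python
def breakeven_win_rate(reward_to_risk: float, cost_in_r: float = 0.0) -> float:
    """Win rate at which expectancy per trade is zero.

    Solves p * rr - (1 - p) - cost = 0 for p, with cost expressed as a
    fraction of the amount risked per trade (R).
    """
    if reward_to_risk <= 0:
        raise ValueError("reward_to_risk must be positive")
    return (1.0 + cost_in_r) / (1.0 + reward_to_risk)


# Same hypothetical cost (0.2R per round trip) at two reward-to-risk levels.
for rr in (1.0, 2.0):
    print(f"R:R {rr:.0f} -> breakeven win rate {breakeven_win_rate(rr, 0.2):.0%}")
```

Tight intraday stops shrink R, so the same fees and slippage become a larger fraction of it. Raising R:R or moving to a daily/swing timeframe lowers the win rate needed to clear costs.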

LESSON 03

Negative results carry information

Regime-sliced P&L, exit-reason breakdowns, signal-skip counters. Why it failed — "1,710 signals blocked by notional cap" — is the lesson. Not just the equity curve.

LESSON 04

Two kills in two attempts = the lab working

Validation gates exist to catch broken specs before paper trading, and paper before live. Both died at gate one with zero real money risked. System functioning as designed.