lookahead-bias-paper/experiments/README.md

# Experiments

Five experiments, run in order. Each script is named `0N_<name>.py`, where `N`
is the experiment number. Scripts that don't exist yet are listed here as a spec
so the CI workflow and the paper's Section 6 stay in sync with what's actually
implemented.

| # | Script | Purpose | Status |
|---|--------|---------|--------|
| 1 | `01_generate_gbm.py` | Generate pure-noise GBM price series (fixed seed, documented params) | pending |
| 2 | `02_baseline_replication.py` | Run K12 golden hyperparameters on real BTCUSDT 1m, buggy backtester → expect Sharpe ≈ 14.49 | pending — needs `audit/input/code` |
| 3 | `03_honest_replication.py` | Same hyperparameters/data, `time_machine.py` engine → expect Sharpe ≈ -0.25 | pending — needs `audit/input/code` |
| 4 | `04_noise_control.py` | Run both engines across ≥30 independent GBM seeds, compare Sharpe distributions | pending |
| 5 | `05_noise_harness.py` | CI-gating version of experiment 4: fails the build if mean Sharpe on noise falls outside a pre-registered null band | pending |
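
As a rough illustration of experiments 1, 4, and 5, the sketch below generates a seeded GBM series, computes an annualized Sharpe ratio, and checks a mean Sharpe against a null band. None of these scripts exist yet, so everything here is an assumption rather than the project's implementation: the function names, the default GBM parameters, the 1-minute annualization factor (525,600 periods per year), and the band limits passed by the caller are all hypothetical placeholders.

```python
import numpy as np

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumed annualization for 1m bars


def generate_gbm(seed, n_steps=1_000, s0=100.0, mu=0.0, sigma=0.2,
                 dt=1.0 / MINUTES_PER_YEAR):
    """Return a GBM price path of length n_steps + 1 that starts at s0.

    All default parameters are illustrative, not the documented ones.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible path
    z = rng.standard_normal(n_steps)
    # Exact log-return discretization of geometric Brownian motion.
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_path = np.concatenate(([0.0], np.cumsum(log_returns)))
    return s0 * np.exp(log_path)


def simple_returns(prices):
    """Per-period simple returns from a price series."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def sharpe_ratio(returns, periods_per_year=MINUTES_PER_YEAR):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    returns = np.asarray(returns, dtype=float)
    sd = returns.std(ddof=1)
    if sd == 0.0:
        return 0.0  # avoid division by zero on a flat series
    return float(returns.mean() / sd * np.sqrt(periods_per_year))


def mean_sharpe_in_band(sharpes, lower, upper):
    """True if the mean Sharpe across seeds lies inside [lower, upper]."""
    mean_sharpe = float(np.mean(sharpes))
    return lower <= mean_sharpe <= upper
```

A harness along these lines would call `generate_gbm` once per seed, run each engine on the resulting series, collect one Sharpe value per seed, and exit non-zero when `mean_sharpe_in_band` returns `False`. The band limits would come from the pre-registration, not from the code.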

## Reproducibility rules

- Every script must accept a `--seed` argument and echo the seed in its output.
- Every output JSON must include: seed, kernel version/hash, library versions
  (numpy/pandas), and a UTC timestamp.
- No script may read from `audit/input/` in a way that couples the public
  reproduction path to the forensic copy. `audit/input/` is for our own
  verification, not for the published reproduction instructions.
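
The first two rules could be met with a small shared helper, sketched below using only the standard library. The names `parse_args`, `build_run_metadata`, and the JSON keys are hypothetical, and "kernel hash" is interpreted here as a SHA-256 of the engine source file, which is an assumption about what the rule intends.

```python
import argparse
import hashlib
import json
from datetime import datetime, timezone
from importlib import metadata


def parse_args(argv=None):
    """Parse the mandatory --seed argument."""
    parser = argparse.ArgumentParser(description="Experiment script skeleton")
    parser.add_argument("--seed", type=int, required=True,
                        help="RNG seed; echoed in the output JSON")
    return parser.parse_args(argv)


def library_version(name):
    """Installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def sha256_of_file(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_metadata(seed, kernel_path):
    """Collect the fields every output JSON must carry.

    kernel_path is assumed to point at the engine source being run.
    """
    return {
        "seed": seed,
        "kernel_sha256": sha256_of_file(kernel_path),
        "library_versions": {
            name: library_version(name) for name in ("numpy", "pandas")
        },
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
```

A script would then merge its results into this dictionary and write it with `json.dumps`, so that the seed and environment travel with every result file.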

## Environment

Pin dependencies in `requirements.txt` (to be added alongside the first
script). CI installs from that file; see `.github/workflows/noise-harness.yml`.