Hito 1: scaffold del paper (estructura, LaTeX revtex4-2, CI, licencias duales)
Some checks failed
noise-harness / run-noise-harness (push) Failing after 1m54s
Some checks failed
noise-harness / run-noise-harness (push) Failing after 1m54s
This commit is contained in:
parent
c1dab78cc7
commit
af990122d1
17 changed files with 361 additions and 1 deletions
31
.github/workflows/noise-harness.yml
vendored
Normal file
31
.github/workflows/noise-harness.yml
vendored
Normal file
|
|
@ -0,0 +1,31 @@
|
|||
name: noise-harness
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
|
||||
jobs:
|
||||
run-noise-harness:
|
||||
runs-on: docker
|
||||
container:
|
||||
image: node:20-bookworm
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python
|
||||
run: |
|
||||
apt-get update -qq && apt-get install -y -qq python3 python3-pip python3-venv
|
||||
python3 -m venv .venv
|
||||
. .venv/bin/activate
|
||||
pip install -q -r experiments/requirements.txt
|
||||
if: hashFiles('experiments/requirements.txt') != ''
|
||||
|
||||
- name: Run noise harness
|
||||
run: |
|
||||
if [ -f experiments/05_noise_harness.py ]; then
|
||||
. .venv/bin/activate
|
||||
python3 experiments/05_noise_harness.py
|
||||
else
|
||||
echo "noise harness script not present yet — placeholder pass"
|
||||
fi
|
||||
29
.gitignore
vendored
Normal file
29
.gitignore
vendored
Normal file
|
|
@ -0,0 +1,29 @@
|
|||
# Python
|
||||
__pycache__/
|
||||
*.pyc
|
||||
.venv/
|
||||
venv/
|
||||
*.egg-info/
|
||||
|
||||
# LaTeX
|
||||
*.aux
|
||||
*.bbl
|
||||
*.bbl-SAVE-ERROR
|
||||
*.blg
|
||||
*.fdb_latexmk
|
||||
*.fls
|
||||
*.log
|
||||
*.out
|
||||
*.synctex.gz
|
||||
*.toc
|
||||
*.run.xml
|
||||
paper/main.pdf
|
||||
|
||||
# Large data — never commit raw market data
|
||||
data/*.parquet
|
||||
data/*.csv
|
||||
data/raw/
|
||||
|
||||
# Local experiment scratch
|
||||
results/raw/local-*
|
||||
.DS_Store
|
||||
8
AUTHORS.md
Normal file
8
AUTHORS.md
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
# Authors
|
||||
|
||||
**Sentinel Research** — institutional author.
|
||||
|
||||
Corresponding author: Daniel Cruces (danielcruces71@gmail.com)
|
||||
|
||||
For questions about reproduction, data, or the audit trail, open an issue
|
||||
on this repository or contact the corresponding author directly.
|
||||
40
README.md
40
README.md
|
|
@ -1,2 +1,40 @@
|
|||
# lookahead-bias-paper
|
||||
# Lookahead Bias in Vectorized Backtesting: A Noise Harness Diagnostic
|
||||
|
||||
Repository for the paper documenting a lookahead-bias defect found in a
|
||||
vectorized backtester (the "K12" kernel), and a noise-harness methodology
|
||||
to detect this class of bug using pure Geometric Brownian Motion data.
|
||||
|
||||
## Structure
|
||||
|
||||
```
|
||||
paper/ LaTeX source (revtex4-2)
|
||||
experiments/ Description and specs of the 5 planned experiments
|
||||
data/ Instructions to obtain the BTCUSDT 1m dataset (no large binaries in git)
|
||||
results/ Raw experiment outputs (json/csv), git-tracked once produced
|
||||
audit/input/ Forensic copy of the original K12 code/data/results for reproduction
|
||||
```
|
||||
|
||||
## How to reproduce
|
||||
|
||||
1. Read `experiments/README.md` for the experiment list and what each one tests.
|
||||
2. Read `data/README.md` to obtain the dataset (or regenerate synthetic GBM data).
|
||||
3. Run the scripts under `experiments/` (CI runs the noise harness automatically
|
||||
on every PR, see `.github/workflows/noise-harness.yml`).
|
||||
4. Compare your output against the reference files in `results/`.
|
||||
|
||||
## License — read this before reusing anything
|
||||
|
||||
This repository carries **two separate licenses** for two separate kinds of content:
|
||||
|
||||
| Content | License | File |
|
||||
|---|---|---|
|
||||
| Code: experiment scripts, harness, CI workflows, anything under `experiments/`, `data/`, `.github/` | **MIT** | [`LICENSE`](LICENSE) |
|
||||
| Paper text and figures, anything under `paper/` | **CC-BY 4.0** | [`LICENSE-TEXT-CC-BY-4.0.md`](LICENSE-TEXT-CC-BY-4.0.md) |
|
||||
|
||||
Use the code freely, commercially or not, with attribution (MIT terms).
|
||||
Reuse the paper text/figures freely, commercially or not, with attribution (CC-BY 4.0 terms).
|
||||
These are independent grants — reusing the code does not require complying with CC-BY, and vice versa.
|
||||
|
||||
## Authorship
|
||||
|
||||
See [`AUTHORS.md`](AUTHORS.md).
|
||||
|
|
|
|||
30
data/README.md
Normal file
30
data/README.md
Normal file
|
|
@ -0,0 +1,30 @@
|
|||
# Data
|
||||
|
||||
This repository does not track raw market data (see `.gitignore`). Large
|
||||
binaries don't belong in a public git repo, and Binance's public API makes
|
||||
the data trivially reconstructible.
|
||||
|
||||
## BTCUSDT 1-minute OHLCV
|
||||
|
||||
Used in experiments 2 and 3 (baseline and honest replication against real
|
||||
data). To obtain it:
|
||||
|
||||
1. If `download_data.py` exists in this directory, run it — it pulls the
|
||||
exact date range used in the original experiment from the public Binance
|
||||
API and writes `BTCUSDT_1m.parquet`.
|
||||
2. Verify the SHA-256 hash of the resulting file matches the one recorded in
|
||||
`audit/input/MANIFEST.md` (forensic record of the original dataset used
|
||||
when the bug was found).
|
||||
|
||||
```bash
|
||||
sha256sum BTCUSDT_1m.parquet
|
||||
```
|
||||
|
||||
If the hash doesn't match, the date range or Binance API response has
|
||||
drifted — do not proceed with replication until it's reconciled.
|
||||
|
||||
## Synthetic GBM data (noise harness)
|
||||
|
||||
Generated on the fly by `experiments/01_generate_gbm.py`. No download
|
||||
needed — this is the point of using synthetic null data: it requires zero
|
||||
external dependency and is perfectly reproducible from a seed.
|
||||
27
experiments/README.md
Normal file
27
experiments/README.md
Normal file
|
|
@ -0,0 +1,27 @@
|
|||
# Experiments
|
||||
|
||||
Five experiments, run in order. Each script is `0N_<name>.py`. Scripts that
|
||||
don't exist yet are listed here as a spec so the CI workflow and the paper's
|
||||
Section 6 stay in sync with what's actually implemented.
|
||||
|
||||
| # | Script | Purpose | Status |
|
||||
|---|--------|---------|--------|
|
||||
| 1 | `01_generate_gbm.py` | Generate pure-noise GBM price series (fixed seed, documented params) | pending |
|
||||
| 2 | `02_baseline_replication.py` | Run K12 golden hyperparameters on real BTCUSDT 1m, buggy backtester → expect Sharpe ≈ 14.49 | pending — needs `audit/input/code` |
|
||||
| 3 | `03_honest_replication.py` | Same hyperparameters/data, `time_machine.py` engine → expect Sharpe ≈ -0.25 | pending — needs `audit/input/code` |
|
||||
| 4 | `04_noise_control.py` | Run both engines across ≥30 independent GBM seeds, compare Sharpe distributions | pending |
|
||||
| 5 | `05_noise_harness.py` | CI-gating version of experiment 4: fails the build if mean Sharpe on noise falls outside a pre-registered null band | pending |
|
||||
|
||||
## Reproducibility rules
|
||||
|
||||
- Every script must take `--seed` and print it in its output.
|
||||
- Every output JSON must include: seed, kernel version/hash, library versions
|
||||
(numpy/pandas), and a UTC timestamp.
|
||||
- No script reads from `audit/input/` directly in a way that would couple the
|
||||
public reproduction path to the forensic copy — `audit/input/` is for our
|
||||
own verification, not for the published reproduction instructions.
|
||||
|
||||
## Environment
|
||||
|
||||
Pin dependencies in `requirements.txt` (to be added alongside the first
|
||||
script). CI installs from that file — see `.github/workflows/noise-harness.yml`.
|
||||
39
paper/main.tex
Normal file
39
paper/main.tex
Normal file
|
|
@ -0,0 +1,39 @@
|
|||
\documentclass[aps,onecolumn,nofootinbib,floatfix]{revtex4-2}
|
||||
|
||||
\usepackage{graphicx}
|
||||
\usepackage{amsmath}
|
||||
\usepackage{amssymb}
|
||||
\usepackage{hyperref}
|
||||
\usepackage{booktabs}
|
||||
|
||||
\begin{document}
|
||||
|
||||
\title{Lookahead Bias in Vectorized Backtesting: A Noise Harness Diagnostic}
|
||||
|
||||
\author{Sentinel Research}
|
||||
\affiliation{Sentinel Research}
|
||||
\email{danielcruces71@gmail.com}
|
||||
|
||||
\date{\today}
|
||||
|
||||
\begin{abstract}
|
||||
% TODO: 150-250 words. Must state: (1) the defect found (lookahead bias in a
|
||||
% vectorized backtester), (2) the diagnostic method (pure-noise GBM harness),
|
||||
% (3) the headline numbers (Sharpe 14.49 under the bug vs Sharpe -0.25 fixed),
|
||||
% (4) why this matters for anyone running vectorized backtests at scale.
|
||||
\end{abstract}
|
||||
|
||||
\maketitle
|
||||
|
||||
\input{sections/01_introduction}
|
||||
\input{sections/02_related_work}
|
||||
\input{sections/03_problem_formalization}
|
||||
\input{sections/04_the_lookahead_bug}
|
||||
\input{sections/05_noise_harness_methodology}
|
||||
\input{sections/06_experimental_setup}
|
||||
\input{sections/07_results}
|
||||
\input{sections/08_discussion_and_conclusion}
|
||||
|
||||
\bibliography{references}
|
||||
|
||||
\end{document}
|
||||
7
paper/references.bib
Normal file
7
paper/references.bib
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
% Bibliography for "Lookahead Bias in Vectorized Backtesting"
|
||||
% Populate in Hito 2. Suggested starting points to look up and add:
|
||||
% - Bailey, D.H. & Lopez de Prado, M., "The Deflated Sharpe Ratio"
|
||||
% - Bailey, D.H. et al., "Pseudo-Mathematics and Financial Charlatanism"
|
||||
% - Bailey, D.H. & Lopez de Prado, M., "The Probability of Backtest Overfitting"
|
||||
% - Harvey, C.R., Liu, Y., Zhu, H., "...and the Cross-Section of Expected Returns"
|
||||
% - White, H., "A Reality Check for Data Snooping"
|
||||
15
paper/sections/01_introduction.tex
Normal file
15
paper/sections/01_introduction.tex
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
\section{Introduction}
|
||||
\label{sec:introduction}
|
||||
|
||||
% TODO content notes:
|
||||
% - Motivate why backtest correctness matters: a single off-by-one index in a
|
||||
% vectorized backtester can silently fabricate alpha.
|
||||
% - State the concrete finding up front: a 15-hyperparameter "golden kernel"
|
||||
% (K12 / Iter12) produced via genetic-algorithm search showed Sharpe 14.49
|
||||
% on BTCUSDT 1m data — and the same kernel, run honestly (bar-by-bar, no
|
||||
% future information), collapses to Sharpe -0.25.
|
||||
% - Frame the contribution: not just "we found a bug," but a reusable
|
||||
% noise-harness methodology (Section 5) that any quant team can run against
|
||||
% their own backtester to detect this class of defect using pure
|
||||
% Geometric Brownian Motion data with zero real signal.
|
||||
% - End with a roadmap of the paper (one sentence per remaining section).
|
||||
17
paper/sections/02_related_work.tex
Normal file
17
paper/sections/02_related_work.tex
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
\section{Related Work}
|
||||
\label{sec:related-work}
|
||||
|
||||
% TODO content notes:
|
||||
% - Lookahead bias / data leakage in backtesting: cite the standard
|
||||
% references (e.g. Bailey & Lopez de Prado on backtest overfitting,
|
||||
% "pseudo-mathematics" critiques, Probability of Backtest Overfitting).
|
||||
% - Vectorized vs event-driven backtesting engines: tradeoffs in speed vs
|
||||
% correctness; vectorized engines are more prone to index-alignment bugs
|
||||
% because there is no explicit "current bar" boundary enforced by the loop.
|
||||
% - Synthetic-data / null-model testing in finance: permutation tests,
|
||||
% Monte Carlo null models, white-noise sanity checks as a general technique
|
||||
% to detect overfit or leaky strategies before risking capital.
|
||||
% - Position this paper: distinct from prior work in that it (a) documents a
|
||||
% live, reproducible incident with full forensic trail, and (b) packages
|
||||
% the diagnostic as a minimal, CI-runnable harness (Section 5) rather than
|
||||
% a one-off statistical test.
|
||||
18
paper/sections/03_problem_formalization.tex
Normal file
18
paper/sections/03_problem_formalization.tex
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
\section{Problem Formalization}
|
||||
\label{sec:formalization}
|
||||
|
||||
% TODO content notes:
|
||||
% - Define the backtest setting formally: price series P_t, signal S_t,
|
||||
% position p_t, and the honesty constraint p_t = f(P_{<=t}, S_{<=t}) only
|
||||
% (no access to P_{>t}).
|
||||
% - Define lookahead bias precisely: any computation where p_t depends,
|
||||
% directly or via a vectorized operation (e.g. shift(-1), rolling window
|
||||
% misaligned by one bar, future-looking groupby), on P_{>t}.
|
||||
% - Show the general shape of the bug class in vectorized code: a single
|
||||
% missing .shift(1) or an inclusive/exclusive boundary error in a rolling
|
||||
% window. Use abstract pseudocode here; the concrete K12 diff goes in
|
||||
% Section 4.
|
||||
% - State the falsifiability criterion that motivates Section 5: if a
|
||||
% strategy is profitable on data with zero true signal (pure GBM noise),
|
||||
% the profit must be an artifact of the backtest mechanics, not of the
|
||||
% strategy logic.
|
||||
17
paper/sections/04_the_lookahead_bug.tex
Normal file
17
paper/sections/04_the_lookahead_bug.tex
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
\section{The K12 Lookahead Bug}
|
||||
\label{sec:the-bug}
|
||||
|
||||
% TODO content notes (fill in once audit/input/ is populated and audited):
|
||||
% - Identify the exact line(s) in backtester.py responsible for the leak.
|
||||
% - Show a minimal before/after diff.
|
||||
% - Explain mechanically why the genetic algorithm search (Iter12, 15
|
||||
% hyperparameters) was able to find and exploit this leak: GA optimizes
|
||||
% whatever signal is available, including backtest-mechanics artifacts: if
|
||||
% the fitness function rewards future-peeking, the search converges on
|
||||
% parameters that maximize the exploit, not real predictive skill.
|
||||
% - Quantify the leak's effect size on the original (real BTCUSDT) data:
|
||||
% Sharpe under the bug vs Sharpe under time_machine.py (the honest engine),
|
||||
% same hyperparameters, same data.
|
||||
% - This section depends on the forensic files under audit/input/code/ —
|
||||
% do not fill in specifics until that audit is complete (see MANIFEST.md
|
||||
% for the exact commit hash and dataset hash being audited).
|
||||
20
paper/sections/05_noise_harness_methodology.tex
Normal file
20
paper/sections/05_noise_harness_methodology.tex
Normal file
|
|
@ -0,0 +1,20 @@
|
|||
\section{The Noise Harness Methodology}
|
||||
\label{sec:noise-harness}
|
||||
|
||||
% TODO content notes:
|
||||
% - Describe the GBM null-data generator: dS = mu*S*dt + sigma*S*dW, fixed
|
||||
% seed, parameters (mu, sigma, N steps, start price) documented in
|
||||
% experiments/README.md and reproduced in 01_generate_gbm.py.
|
||||
% - Key property to state explicitly: this series has zero exploitable
|
||||
% structure by construction — no autocorrelation edge, no regime, nothing
|
||||
% a real strategy could legitimately learn.
|
||||
% - Define the test: run the same kernel/backtester pipeline against N
|
||||
% independent GBM seeds. A backtester free of lookahead bias should
|
||||
% produce a Sharpe distribution centered at ~0 across seeds. A backtester
|
||||
% with a leak will produce a systematically positive Sharpe regardless of
|
||||
% seed, because the "edge" comes from the mechanics, not the data.
|
||||
% - State this as a pass/fail CI gate: mean Sharpe over >= 30 seeds must
|
||||
% fall within a pre-registered null band (e.g. -0.3 to 0.3); anything
|
||||
% outside that band fails the build. This is what
|
||||
% .github/workflows/noise-harness.yml is wired to enforce once
|
||||
% experiments/05_noise_harness.py exists.
|
||||
23
paper/sections/06_experimental_setup.tex
Normal file
23
paper/sections/06_experimental_setup.tex
Normal file
|
|
@ -0,0 +1,23 @@
|
|||
\section{Experimental Setup}
|
||||
\label{sec:experimental-setup}
|
||||
|
||||
% TODO content notes:
|
||||
% - Enumerate the 5 experiments (full spec lives in experiments/README.md,
|
||||
% this section is the paper-facing summary):
|
||||
% 1. Baseline replication: run K12 golden hyperparameters on real BTCUSDT
|
||||
% 1m data, buggy backtester, reproduce Sharpe 14.49.
|
||||
% 2. Honest replication: same hyperparameters, same data, time_machine.py
|
||||
% (bar-by-bar, no lookahead), reproduce Sharpe -0.25.
|
||||
% 3. Noise harness on buggy backtester: same hyperparameters, >=30 GBM
|
||||
% seeds, buggy backtester. Expect systematically positive Sharpe.
|
||||
% 4. Noise harness on honest backtester: same setup, time_machine.py.
|
||||
% Expect Sharpe distribution centered at 0.
|
||||
% 5. Sensitivity check: vary the lookahead window size synthetically
|
||||
% (1-bar through N-bar leak) to show Sharpe scales with leak size, not
|
||||
% coincidence.
|
||||
% - State software/hardware environment: pinned in env/requirements.txt,
|
||||
% env/python_version.txt, env/os_info.txt under audit/input/ (forensic)
|
||||
% and experiments/requirements.txt (reproduction environment going
|
||||
% forward, which may differ in version but not in semantics).
|
||||
% - State exactly which files are authoritative for each experiment number
|
||||
% once experiments/0N_*.py scripts exist.
|
||||
19
paper/sections/07_results.tex
Normal file
19
paper/sections/07_results.tex
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
\section{Results}
|
||||
\label{sec:results}
|
||||
|
||||
% TODO content notes:
|
||||
% - Table 1: side-by-side Sharpe (and other metrics: max drawdown, win rate,
|
||||
% total return) for buggy vs honest backtester on real data. This is the
|
||||
% "leak signature" headline result.
|
||||
% - Figure 1 (leak_signature.png): equity curve comparison, buggy vs honest,
|
||||
% same hyperparameters, real data.
|
||||
% - Figure 2 (noise_control.png): histogram/distribution of Sharpe across
|
||||
% >=30 GBM seeds, buggy backtester overlaid with honest backtester. The
|
||||
% buggy distribution should be visibly shifted positive; the honest one
|
||||
% centered near 0.
|
||||
% - Figure 3 (fix_comparison.png): before/after of the actual code diff that
|
||||
% fixed the leak, annotated with the Sharpe delta it caused.
|
||||
% - Do not write actual numbers into this section until results/raw/*.json
|
||||
% exist and have been validated against the audit/input/results/ originals
|
||||
% (see MANIFEST.md hashes). Every number in this section must be traceable
|
||||
% to a specific file + seed + commit.
|
||||
22
paper/sections/08_discussion_and_conclusion.tex
Normal file
22
paper/sections/08_discussion_and_conclusion.tex
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
\section{Discussion and Conclusion}
|
||||
\label{sec:discussion}
|
||||
|
||||
% TODO content notes:
|
||||
% - Generalize beyond this one kernel: any vectorized backtester using
|
||||
% pandas/numpy rolling/shift operations is at risk of this exact class of
|
||||
% bug; it is not specific to genetic-algorithm-discovered strategies.
|
||||
% - Practical recommendation: every backtesting pipeline should run the
|
||||
% noise harness (Section 5) as a standing CI gate, the same way unit tests
|
||||
% gate merges — not as a one-off audit.
|
||||
% - Limitations: the noise harness detects backtest-mechanics leaks; it does
|
||||
% NOT detect overfitting to real historical data (that is a distinct
|
||||
% failure mode requiring out-of-sample / walk-forward validation,
|
||||
% out of scope here).
|
||||
% - Disclosure note: state plainly that this defect was found in an
|
||||
% internal/proprietary research pipeline (Sentinel Research), and that
|
||||
% this paper publishes the diagnostic methodology and a minimal
|
||||
% reproduction, not the proprietary strategy code itself.
|
||||
% - One-paragraph conclusion restating the core claim: Sharpe 14.49 on pure
|
||||
% noise is not skill, it is a bug signature, and the fix collapsed it to
|
||||
% Sharpe -0.25 — exactly the kind of result a noise harness exists to
|
||||
% catch before capital is at risk.
|
||||
0
results/.gitkeep
Normal file
0
results/.gitkeep
Normal file
Loading…
Add table
Reference in a new issue