lookahead-bias-paper/data/README.md
Sentinel Research af990122d1
Milestone 1: paper scaffold (structure, LaTeX revtex4-2, CI, dual licenses)
2026-06-24 07:05:37 +02:00


# Data
This repository does not track raw market data (see `.gitignore`). Large
binaries don't belong in a public git repo, and Binance's public API makes
the data trivially reconstructible.
## BTCUSDT 1-minute OHLCV
Used in experiments 2 and 3 (baseline and honest replication against real
data). To obtain it:
1. If `download_data.py` exists in this directory, run it. It pulls the
exact date range used in the original experiment from the public Binance
API and writes `BTCUSDT_1m.parquet`.
2. Verify the SHA-256 hash of the resulting file matches the one recorded in
`audit/input/MANIFEST.md` (forensic record of the original dataset used
when the bug was found).
```bash
sha256sum BTCUSDT_1m.parquet
```
If the hash doesn't match, either the date range or the Binance API response
has drifted. Do not proceed with replication until the mismatch is reconciled.
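
The same check can be scripted so that a replication run stops on a mismatch. The following is a minimal sketch, not part of this repository: the helper names `sha256_of_file` and `verify_dataset` are hypothetical, and the expected digest is assumed to be copied by hand from `audit/input/MANIFEST.md`.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large Parquet files are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_hex):
    """Raise ValueError if the file's digest differs from expected_hex.

    expected_hex is assumed to be the value recorded in the manifest.
    """
    actual = sha256_of_file(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"SHA-256 mismatch for {path}: got {actual}")
    return actual
```

Calling `verify_dataset("BTCUSDT_1m.parquet", expected_hex)` at the top of a replication script would turn a silent data drift into an immediate failure.
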
## Synthetic GBM data (noise harness)
Generated on the fly by `experiments/01_generate_gbm.py`. No download is
needed. That is the point of using synthetic null data: it requires no
external dependencies and is perfectly reproducible from a seed.
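
To illustrate what seed-reproducible GBM data means, here is a minimal standard-library sketch. It is not the code in `experiments/01_generate_gbm.py`: the function name `generate_gbm` and all default parameters (start price, drift, volatility, one-minute step as a fraction of a year) are assumptions chosen for illustration.

```python
import math
import random


def generate_gbm(n_steps, s0=100.0, mu=0.0, sigma=0.5,
                 dt=1.0 / (365 * 24 * 60), seed=0):
    """Simulate a geometric Brownian motion price path.

    Uses the exact log-normal update
        S[t+1] = S[t] * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)
    with Z ~ N(0, 1). All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # private generator: the path depends only on the seed
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    prices = [s0]
    for _ in range(n_steps):
        prices.append(prices[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return prices
```

Two calls with the same `seed` return identical paths, so a null dataset produced this way can be regenerated at any time without storing it.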