commands: m2d-log + backtest + batch + stats slash commands (124 tests pass)
.claude/commands/backtest.md (new file, 73 lines)
@@ -0,0 +1,73 @@
---
description: Run vision extraction on a single TradeStation screenshot, then append to the jurnal CSV and regenerate the MD.
argument-hint: "<screenshot_path_or_basename> [--calibration]"
---

# /backtest — single-screenshot vision extraction

Launches the `m2d-extractor` subagent on a screenshot, receives its JSON, appends to `data/jurnal.csv`, and regenerates `data/jurnal.md`.

## Arguments

- `$1` (required) — path to the screenshot. Accepts:
  - a basename (`2026-05-13-dia-1645.png`) — looked up in `screenshots/inbox/`, falling back to `screenshots/processed/`
  - an explicit relative or absolute path
- `--calibration` (flag) — writes `source=vision_calibration` instead of `source=vision`. Used together with `/m2d-log --calibration` on the same screenshot for the P4 mismatch report.

## Workflow

1. **Resolve the screenshot path.** If `$1` is just a basename, try `screenshots/inbox/<basename>`, then `screenshots/processed/<basename>`. If it exists in neither, report an error and stop.
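
   The fallback lookup in step 1 can be sketched as follows (a minimal illustration; `resolve_screenshot` is a hypothetical helper, not part of the repo's scripts):

   ```python
   from pathlib import Path

   def resolve_screenshot(arg: str) -> Path:
       """Resolve a /backtest argument to an existing screenshot path.

       A bare basename is looked up in screenshots/inbox/ first, then
       screenshots/processed/; anything containing a path separator is
       treated as an explicit relative/absolute path.
       """
       candidate = Path(arg)
       if candidate.name != arg:  # explicit path, not a bare basename
           if candidate.exists():
               return candidate
           raise FileNotFoundError(arg)
       for root in (Path("screenshots/inbox"), Path("screenshots/processed")):
           p = root / arg
           if p.exists():
               return p
       raise FileNotFoundError(f"{arg} not found in inbox/ or processed/")
   ```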

2. **Invoke the `m2d-extractor` subagent** (defined in `.claude/agents/m2d-extractor.md`) via the Task tool with `subagent_type: "m2d-extractor"`. Prompt for the agent:

   ```
   screenshot_path: <absolute_path>
   screenshot_file: <basename>
   ```

   The agent writes `data/extractions/<basename_no_ext>.json` + `.log` and returns a short status line.

3. **Verify the output**:
   - If `data/extractions/<basename_no_ext>.json` does not exist after the agent returns → error; report it and move the screenshot to `screenshots/needs_review/`.
   - Read the JSON. If `confidence == "low"` OR `ambiguities` is non-empty with `image_unreadable` → move the screenshot to `screenshots/needs_review/`, report, and do not call append.

4. **Append to the CSV**:

   ```bash
   python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.json'), source='<source>'); print(json.dumps(r, default=str))"
   ```

   `<source>` = `vision_calibration` if `--calibration`, otherwise `vision`.

   Parse the response. If `status == "rejected"`:
   - `reason` contains "duplicate" → the screenshot was already processed with this source; report it and do NOT move it.
   - `reason` contains "validation error" → the agent's JSON was rejected; move the screenshot to `screenshots/needs_review/` and report.
   - any other error → report it and leave the screenshot where it is.

5. **Move the screenshot** to `screenshots/processed/<basename>` if the append succeeded and the original file was in `inbox/`. If it was already in `processed/`, do not move it.

6. **Regenerate the MD**:

   ```bash
   python scripts/regenerate_md.py
   ```

7. **Final report** (in Romanian):

   ```
   /backtest <basename> → trade #<id> adăugat (source=<source>, set=<set>, pl_marius=<pl>, confidence=<conf>).
   Regenerat data/jurnal.md (<total> rânduri).
   ```

   If the screenshot was moved to `needs_review`:

   ```
   /backtest <basename> → NEEDS REVIEW: <motiv>. Mutat la screenshots/needs_review/<basename>.
   ```
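
The rejected-reason dispatch in step 4 can be sketched like this (a hedged sketch; the `{"status": ..., "reason": ...}` result shape is assumed from the command examples above, and `handle_append_result` is a hypothetical helper):

```python
import json
import shutil
from pathlib import Path

NEEDS_REVIEW = Path("screenshots/needs_review")

def handle_append_result(result_json: str, screenshot: Path) -> str:
    """Classify append_extraction's JSON result for one screenshot.

    Returns "appended", "duplicate" (leave the file alone),
    "needs_review" (file moved), or "error" (report, leave in place).
    """
    r = json.loads(result_json)
    if r.get("status") == "ok":
        return "appended"
    reason = r.get("reason", "")
    if "duplicate" in reason:
        return "duplicate"  # already in the CSV with this source
    if "validation error" in reason:
        NEEDS_REVIEW.mkdir(parents=True, exist_ok=True)
        shutil.move(str(screenshot), NEEDS_REVIEW / screenshot.name)
        return "needs_review"
    return "error"
```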

## Rules

- One invocation per screenshot. Do not re-run the agent if the output looks dubious — move the screenshot to `needs_review` and report.
- Do NOT edit the CSV directly.
- Do NOT regenerate the MD if the append was rejected.
- Path discipline: the subagent writes only to `data/extractions/`; you (the slash command) move screenshots and call scripts/.
.claude/commands/batch.md (new file, 95 lines)
@@ -0,0 +1,95 @@
---
description: Run vision extraction in parallel on multiple screenshots (default screenshots/inbox/), then serial-append the results with partial-failure handling.
argument-hint: "[dir_or_glob] [--limit N] [--calibration]"
---

# /batch — parallel vision extraction over multiple screenshots

Processes multiple screenshots. Launches up to **5 `m2d-extractor` subagents in parallel** (a hard cap — it protects the context window and rate limits). Once they all return, append the results **serially** (`append_row` reads/writes the CSV — parallel writes mean guaranteed corruption).

## Arguments

- `$1` (optional) — directory or glob. Default `screenshots/inbox/`. Example: `screenshots/inbox/2025-09-*.png`.
- `--limit N` (optional) — process only the first N screenshots (in alphabetical order). Default: all.
- `--calibration` (flag) — `source=vision_calibration` instead of `vision`.

## Workflow

### Phase 1 — Collect the list

1. Enumerate the PNG/JPG files matching the argument. Sort alphabetically. Apply `--limit` if present.
2. If the list is empty → report "Nimic de procesat în <path>" and stop.
3. If the list has a single element → suggest `/backtest` instead, then continue with the batch.

### Phase 2 — Parallel extraction (max 5 concurrent)

Process in **batches of 5**. For each batch:

- Launch one Task tool call with `subagent_type: "m2d-extractor"` per screenshot, IN THE SAME MESSAGE (parallel tool calls). Prompt per agent:

  ```
  screenshot_path: <absolute_path>
  screenshot_file: <basename>
  ```

- Wait for all five to return. For each one, verify that `data/extractions/<basename_no_ext>.json` was written.
- Move on to the next batch of 5.

**Why 5**: beyond 5 parallel subagents you start saturating the orchestrator's context window with their outputs, and the API's rate limits. Hard cap.
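
The batching above is plain fixed-size chunking; a minimal sketch:

```python
from typing import Iterator, TypeVar

T = TypeVar("T")

def batches(items: list[T], size: int = 5) -> Iterator[list[T]]:
    """Yield consecutive chunks of at most `size` items (the hard cap of 5)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```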

### Phase 3 — Serial append with partial-failure handling

For each screenshot in the original list, **in order**:

1. Check `data/extractions/<basename_no_ext>.json`:
   - Missing → log "missing JSON, agent abort", move the screenshot to `screenshots/needs_review/`, continue with the next one.
   - Read the JSON. If `confidence == "low"` OR `"image_unreadable" in ambiguities` → move to `needs_review/`, continue.

2. Call append:

   ```bash
   python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.json'), source='<source>'); print(json.dumps(r, default=str))"
   ```

   `<source>` = `vision_calibration` if `--calibration`, otherwise `vision`.

3. React to the result:
   - `status == "ok"` → remember the ID, move the screenshot to `screenshots/processed/<basename>` if it was in the inbox.
   - `status == "rejected"` and `reason` contains "duplicate" → record it as a skip; do NOT move the screenshot (already processed).
   - `status == "rejected"` with any other reason → log the reason, move to `needs_review/`.

4. Do NOT stop the batch at the first failure. Continue to the end.
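
The gate in step 1 can be sketched as (a hedged sketch; the `confidence` / `ambiguities` field names follow the schema referenced in this repo, and `extraction_gate` is a hypothetical helper):

```python
import json
from pathlib import Path

def extraction_gate(json_path: Path) -> str:
    """Decide whether an extraction JSON may be appended.

    Returns "append", or "needs_review" when the file is missing,
    the agent reported low confidence, or the image was unreadable.
    """
    if not json_path.exists():
        return "needs_review"  # missing JSON: the agent aborted
    data = json.loads(json_path.read_text(encoding="utf-8"))
    if data.get("confidence") == "low":
        return "needs_review"
    if "image_unreadable" in data.get("ambiguities", []):
        return "needs_review"
    return "append"
```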

### Phase 4 — Regenerate the MD exactly once

After all appends have finished (even partially), run EXACTLY ONCE:

```bash
python scripts/regenerate_md.py
```

(Regenerating after every append is wasteful; the CSV is the source of truth, the MD is a mirror.)

### Phase 5 — Final report

Format:

```
/batch terminat. Procesat <total> screenshot-uri.
OK: <n_ok> (trade-uri #<id1>, #<id2>, ...)
Duplicate: <n_dup> (skipped — deja în CSV)
Needs review: <n_nr> (mutate la screenshots/needs_review/)
  - <basename1>: <motiv>
  - <basename2>: <motiv>
Erori: <n_err>
  - <basename>: <reason>
Regenerat data/jurnal.md (<total_rows> rânduri).
```

## Rules

- **Cap concurrency at 5.** Never more parallel subagents — even for a large batch. Process in sequences of batches of 5.
- **Serial append is mandatory.** `append_extraction` reads the CSV, computes `next_id`, and writes atomically; run in parallel it would produce duplicate IDs or lost rows.
- **Partial failure = continue.** One bad screenshot does not block the rest of the batch.
- **MD regen exactly once**, at the end.
- **Path discipline for the subagent is unchanged**: the agent writes only to `data/extractions/`. You, as the orchestrator, move screenshots.
.claude/commands/m2d-log.md (new file, 99 lines)
@@ -0,0 +1,99 @@
---
description: Manually add a row to jurnal.csv (source=manual or manual_calibration). For P4 calibration or forward paper trading.
argument-hint: "[--calibration] <screenshot_path>"
---

# /m2d-log — manual M2D trade entry

Marius extracts ALL of the trade's fields by hand. Used for P4 calibration (together with `/backtest --calibration` on the same screenshot) or as a direct log without vision.

## Workflow

1. **Parse `$ARGUMENTS`** — detect the `--calibration` flag and `<screenshot_path>`. If `<screenshot_path>` is missing, ask the user. Compute `basename = basename(<screenshot_path>)` and `basename_no_ext = basename` minus the last extension.

2. **Prompt the user in Romanian**, one field at a time, for every field in the `M2DExtraction` schema (see `scripts/vision_schema.py`). Order + valid options:

   - `data` — `YYYY-MM-DD`
   - `ora_utc` — `HH:MM` (converted from RO local time: EEST=UTC+3 in summer, EET=UTC+2 in winter; ask the user directly if unclear)
   - `instrument` — `DIA` / `US30` / `other`
   - `directie` — `Buy` / `Sell`
   - `tf_mare` — `5min` / `15min`
   - `tf_mic` — `1min` / `3min`
   - `calitate` — `Clară` / `Mai mare ca impuls` / `Slabă` / `n/a`
   - `entry`, `sl`, `tp0`, `tp1`, `tp2` — floats
   - `risc_pct` — float (e.g. `0.12` for 0.12%)
   - `outcome_path` — `SL` / `TP0→SL` / `TP0→TP1` / `TP0→TP2` / `TP0→pending` / `pending` (UNICODE `→`)
   - `max_reached` — `SL_first` / `TP0` / `TP1` / `TP2`
   - `be_moved` — `true` / `false`
   - `confidence` — default `high` (manual is by definition high)
   - `note` — optional string, default `""`

   `screenshot_file` is set automatically to `basename`; `ambiguities` is set automatically to `[]`. If the user gives an invalid value, repeat the question.
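
   The enum-valued fields above can be validated against a small option table; a minimal sketch (the table is copied from the list above, and `validate_field` is a hypothetical helper — free-form and numeric fields need their own checks):

   ```python
   # Hypothetical option table for the enum fields listed above.
   FIELD_OPTIONS = {
       "instrument": {"DIA", "US30", "other"},
       "directie": {"Buy", "Sell"},
       "tf_mare": {"5min", "15min"},
       "tf_mic": {"1min", "3min"},
       "max_reached": {"SL_first", "TP0", "TP1", "TP2"},
       "be_moved": {"true", "false"},
   }

   def validate_field(name: str, value: str) -> bool:
       """True when `value` is acceptable for enum field `name`.

       Fields not in the table (data, entry, note, ...) pass through;
       they need format-specific checks instead.
       """
       options = FIELD_OPTIONS.get(name)
       return options is None or value in options
   ```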

3. **Build the complete JSON**, valid against `M2DExtraction`.

4. **Write the JSON** to `data/extractions/<basename_no_ext>.manual.json` — pretty-printed with indent 2, UTF-8, trailing newline. The `.manual` suffix prevents collisions with the vision output (`<basename_no_ext>.json`).

5. **Determine the source**: `manual_calibration` if `--calibration` is present, otherwise `manual`.

6. **Append to the CSV**:

   ```bash
   python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.manual.json'), source='<source>'); print(json.dumps(r, default=str))"
   ```

   Parse the JSON response.

7. **If `status == "ok"`**:

   ```bash
   python -m scripts.regenerate_md
   ```

   Then display:

   ```
   ✅ Trade adăugat la jurnal. ID: <id>. Set: <set>. P/L Marius: <pl_marius>. outcome_path: <outcome_path>.
   ```

8. **If `status == "rejected"`**:

   ```
   ❌ Trade respins: <reason>
   ```

   Do NOT regenerate the MD. If `reason` contains "duplicate":
   - with `--calibration`: tell the user that a `manual_calibration` row already exists for this screenshot; you cannot have two manual calibration legs on the same screenshot.
   - with plain `source=manual`: the user decides whether to overwrite (in that case they delete the row from `data/jurnal.csv` by hand and re-run).

## Rules

- Do NOT edit the CSV directly — use `append_extraction`.
- Do NOT regenerate the MD if the append was rejected.
## Output skeleton JSON
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"screenshot_file": "2026-05-13-dia-1645.png",
|
||||||
|
"data": "2026-05-13",
|
||||||
|
"ora_utc": "14:45",
|
||||||
|
"instrument": "DIA",
|
||||||
|
"directie": "Buy",
|
||||||
|
"tf_mare": "5min",
|
||||||
|
"tf_mic": "1min",
|
||||||
|
"calitate": "Clară",
|
||||||
|
"entry": 497.42,
|
||||||
|
"sl": 496.80,
|
||||||
|
"tp0": 497.67,
|
||||||
|
"tp1": 497.79,
|
||||||
|
"tp2": 498.04,
|
||||||
|
"risc_pct": 0.12,
|
||||||
|
"outcome_path": "TP0→TP1",
|
||||||
|
"max_reached": "TP1",
|
||||||
|
"be_moved": true,
|
||||||
|
"confidence": "high",
|
||||||
|
"ambiguities": [],
|
||||||
|
"note": ""
|
||||||
|
}
|
||||||
|
```
|
||||||
.claude/commands/stats.md (new file, 42 lines)
@@ -0,0 +1,42 @@
---
description: Show backtest statistics for data/jurnal.csv (overall, per-Set, per-calitate, per-instrument with Wilson + bootstrap CIs). --calibration shows the P4 mismatch report.
argument-hint: "[--calibration] [--seed N]"
---

# /stats — backtest statistics

Runs `scripts/stats.py` and displays the report.

## Arguments

- `--calibration` (flag) — shows the P4 report (field-by-field mismatch on `manual_calibration` ↔ `vision_calibration` pairs joined on `screenshot_file`).
- `--seed N` (optional) — seed for the bootstrap RNG (default: no seed → non-deterministic output between runs). Use it when you want reproducibility.

Default (no flags): backtest stats — overall + per-Set + per-calitate + per-instrument WR, expectancy, Wilson 95% CI on the WR, bootstrap 95% CI on the expectancy.
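
The Wilson interval mentioned above has a closed form; a minimal sketch, mirroring the implementation in the repo's `scripts/stats.py`:

```python
import math

def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 ≈ 95%."""
    if n <= 0:
        return (0.0, 0.0)
    p = wins / n
    denom = 1.0 + (z * z) / n
    center = (p + (z * z) / (2.0 * n)) / denom
    spread = z * math.sqrt(p * (1.0 - p) / n + (z * z) / (4.0 * n * n)) / denom
    return (max(0.0, center - spread), min(1.0, center + spread))
```

Unlike the naive normal approximation, the lower bound stays sensible for small N — which is exactly what the GO LIVE gate reads.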

## Workflow

1. Build the command:

   ```bash
   python scripts/stats.py [--calibration] [--seed N]
   ```

   `--csv data/jurnal.csv` is the script's default — do not pass it.

2. Run it via the Bash tool. The output arrives on stdout in UTF-8.

3. Display the output **as-is** to the user. Do not reformat, re-summarize, or interpret it. The script already has a chosen format (table + text sections).

4. **Brief interpretation** (max 3 sentences) IF the user explicitly asks or if you notice something worth mentioning:
   - In backtest mode: Sets with N ≥ 40 and a Wilson lower bound > 50% → GO LIVE candidate (see `STOPPING_RULE.md`).
   - In `--calibration` mode: with ≥ 10 pairs and a mismatch rate > 10% on the core fields (`entry/sl/tp0/1/2/outcome_path/max_reached/directie`) → P4 FAIL, the vision agent needs a fix (`.claude/agents/m2d-extractor.md`).

5. Do NOT edit the CSV. Do NOT regenerate the MD (pure read).

## Rules

- Read-only. This command writes nothing.
- The script's output is ground truth — do not invent numbers.
- `calitate` is a biased descriptor (post-outcome) — see `STOPPING_RULE.md` §3 — the report shows it as informational only. Do NOT suggest that the user use `calitate` as a filter for GO LIVE.
- For P4 calibration: a minimum of 10 pairs for the verdict to be meaningful. Below 10 pairs → report "insufficient for P4 — keep accumulating calibration pairs".
scripts/stats.py (801 lines)
@@ -1,21 +1,20 @@
 """Backtest statistics for ``data/jurnal.csv``.
 
-Outputs:
-- Overall + per-Set + per-calitate + per-instrument WR, expectancy.
-- Wilson 95% CI for WR (closed form).
-- Bootstrap percentile 95% CI for expectancy (deterministic via ``seed``).
-- ``--calibration`` mode: joins ``manual_calibration`` rows with their
-  ``vision_calibration`` counterparts on ``screenshot_file`` and reports
-  field-by-field mismatch rates for the P4 gate (see ``STOPPING_RULE.md``).
-
-A "win" is any trade with ``pl_marius > 0``. Pending trades
-(``pl_marius`` blank, i.e. ``outcome_path in {pending, TP0->pending}``) are
-excluded from both WR and expectancy: there is no realised outcome yet.
-
-The ``calitate`` field is a known-biased descriptor (post-outcome
-classification — see ``STOPPING_RULE.md`` §3). It is reported as
-informational only and explicitly flagged as such; do NOT use it as a
-filter for GO LIVE decisions.
+Public API:
+- ``compute_stats(csv_path, overlay) -> dict``
+- ``render_stats(stats, overlay) -> str``
+- ``compute_calibration(csv_path) -> dict``
+- ``render_calibration(cal) -> str``
+- ``main()`` — CLI entry point.
+
+A "win" is a closed trade with ``pl_overlay > 0`` (where ``pl_overlay`` is
+either ``pl_marius`` or ``pl_theoretical``). Pending trades — ``pl_marius``
+blank, i.e. ``outcome_path in {pending, TP0->pending}`` — are excluded from
+both WR and expectancy: there is no realised outcome yet.
+
+The ``calitate`` field is a known-biased descriptor: it is classified
+post-outcome (see ``STOPPING_RULE.md`` §3). The per-``calitate`` split is
+reported with an explicit *descriptor only — biased post-outcome* caveat.
 """
 
 from __future__ import annotations
@@ -23,32 +22,42 @@ from __future__ import annotations
 import argparse
 import csv
 import math
-import random
 import sys
-from dataclasses import dataclass, field
 from pathlib import Path
-from typing import Iterable
+from typing import Any, Iterable
 
+import numpy as np
+
+from scripts.append_row import CSV_COLUMNS
 
 __all__ = [
-    "CORE_CALIBRATION_FIELDS",
     "BACKTEST_SOURCES",
     "CALIBRATION_SOURCES",
-    "Trade",
-    "GroupStats",
-    "load_trades",
+    "CORE_CALIBRATION_FIELDS",
+    "NUMERIC_CALIBRATION_FIELDS",
+    "STOPPING_RULE_N",
    "wilson_ci",
-    "bootstrap_ci",
-    "win_rate",
-    "expectancy",
-    "group_by",
-    "compute_group_stats",
-    "calibration_mismatch",
-    "format_report",
+    "bootstrap_expectancy_ci",
+    "compute_stats",
+    "render_stats",
+    "compute_calibration",
+    "render_calibration",
     "main",
 ]
 
 
-# Fields compared in the calibration mismatch gate (STOPPING_RULE.md §P4).
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+
+BACKTEST_SOURCES: frozenset[str] = frozenset({"vision", "manual"})
+CALIBRATION_SOURCES: frozenset[str] = frozenset(
+    {"manual_calibration", "vision_calibration"}
+)
+
+
+# Calibration P4 gate (STOPPING_RULE.md §P4) — explicitly reported per field.
 CORE_CALIBRATION_FIELDS: tuple[str, ...] = (
     "entry",
     "sl",
@@ -58,315 +67,205 @@ CORE_CALIBRATION_FIELDS: tuple[str, ...] = (
     "outcome_path",
     "max_reached",
     "directie",
+    "instrument",
 )
 
 
-BACKTEST_SOURCES: frozenset[str] = frozenset({"vision", "manual"})
-CALIBRATION_SOURCES: frozenset[str] = frozenset(
-    {"manual_calibration", "vision_calibration"}
+NUMERIC_CALIBRATION_FIELDS: frozenset[str] = frozenset(
+    {"entry", "sl", "tp0", "tp1", "tp2"}
 )
 
+
+# STOPPING_RULE.md §"GO LIVE" gate: N >= 40 per Set.
+STOPPING_RULE_N: int = 40
+
+
 # ---------------------------------------------------------------------------
-# Loading / typed access
+# Loading
 # ---------------------------------------------------------------------------
 
 
-@dataclass(frozen=True)
-class Trade:
-    """One realised (or pending) trade row, typed."""
-
-    id: int
-    screenshot_file: str
-    source: str
-    data: str
-    zi: str
-    ora_ro: str
-    instrument: str
-    directie: str
-    calitate: str
-    set: str
-    outcome_path: str
-    max_reached: str
-    be_moved: bool
-    pl_marius: float | None
-    pl_theoretical: float
-    raw: dict[str, str] = field(default_factory=dict)
-
-    @property
-    def is_pending(self) -> bool:
-        return self.pl_marius is None
-
-    @property
-    def is_win(self) -> bool:
-        return self.pl_marius is not None and self.pl_marius > 0
-
-
 def _parse_optional_float(value: str) -> float | None:
     s = (value or "").strip()
     if s == "":
         return None
-    return float(s)
+    try:
+        return float(s)
+    except ValueError:
+        return None
 
 
-def _parse_bool(value: str) -> bool:
-    return (value or "").strip().lower() in {"true", "1", "yes", "da"}
-
-
-def _row_to_trade(row: dict[str, str]) -> Trade:
-    return Trade(
-        id=int(row.get("id") or 0),
-        screenshot_file=row.get("screenshot_file", ""),
-        source=row.get("source", ""),
-        data=row.get("data", ""),
-        zi=row.get("zi", ""),
-        ora_ro=row.get("ora_ro", ""),
-        instrument=row.get("instrument", ""),
-        directie=row.get("directie", ""),
-        calitate=row.get("calitate", ""),
-        set=row.get("set", ""),
-        outcome_path=row.get("outcome_path", ""),
-        max_reached=row.get("max_reached", ""),
-        be_moved=_parse_bool(row.get("be_moved", "")),
-        pl_marius=_parse_optional_float(row.get("pl_marius", "")),
-        pl_theoretical=float(row.get("pl_theoretical") or 0.0),
-        raw=dict(row),
-    )
-
-
-def load_trades(csv_path: Path | str) -> list[Trade]:
-    """Load all rows of ``csv_path`` as :class:`Trade` objects.
-
-    Returns ``[]`` if the file does not exist or is empty.
-    """
+def _load_rows(csv_path: Path | str) -> list[dict[str, str]]:
     p = Path(csv_path)
     if not p.exists() or p.stat().st_size == 0:
         return []
     with p.open("r", encoding="utf-8", newline="") as fh:
-        reader = csv.DictReader(fh)
-    return [_row_to_trade(r) for r in reader]
+        return list(csv.DictReader(fh))
 
 
 # ---------------------------------------------------------------------------
-# Statistics primitives
+# CI primitives
 # ---------------------------------------------------------------------------
 
 
 def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
     """Wilson score interval for a binomial proportion.
 
-    Returns ``(lo, hi)`` as proportions in [0, 1]. For ``n == 0`` returns
-    ``(0.0, 0.0)``. ``z = 1.96`` corresponds to a 95% CI.
+    Returns ``(lo, hi)`` clamped to ``[0.0, 1.0]``. For ``n == 0`` returns
+    ``(0.0, 0.0)``. ``z = 1.96`` ≈ 95% confidence.
     """
     if n <= 0:
         return (0.0, 0.0)
     if wins < 0 or wins > n:
         raise ValueError(f"wins={wins} out of range for n={n}")
-    p_hat = wins / n
+    p = wins / n
     denom = 1.0 + (z * z) / n
-    center = p_hat + (z * z) / (2.0 * n)
-    half = z * math.sqrt((p_hat * (1.0 - p_hat) + (z * z) / (4.0 * n)) / n)
-    lo = (center - half) / denom
-    hi = (center + half) / denom
-    return (max(0.0, lo), min(1.0, hi))
+    center = (p + (z * z) / (2.0 * n)) / denom
+    spread = z * math.sqrt(p * (1.0 - p) / n + (z * z) / (4.0 * n * n)) / denom
+    return (max(0.0, center - spread), min(1.0, center + spread))
 
 
-def bootstrap_ci(
-    values: list[float],
-    *,
-    iterations: int = 2000,
-    alpha: float = 0.05,
-    seed: int | None = None,
+def bootstrap_expectancy_ci(
+    values: list[float] | np.ndarray,
+    n_resamples: int = 5000,
+    seed: int = 42,
 ) -> tuple[float, float]:
-    """Percentile-method bootstrap CI for the mean of ``values``.
+    """Percentile-method bootstrap 95% CI for the mean of ``values``.
 
-    Deterministic when ``seed`` is provided. Returns ``(lo, hi)``. For
-    ``len(values) < 2`` returns ``(mean, mean)``.
+    Deterministic for a given ``seed``. Empty input → ``(0.0, 0.0)``.
+    Single value → ``(value, value)`` (no variance to resample).
     """
-    if not values:
+    arr = np.asarray(list(values), dtype=float)
+    if arr.size == 0:
         return (0.0, 0.0)
-    n = len(values)
-    mean = sum(values) / n
-    if n < 2 or iterations <= 0:
-        return (mean, mean)
-    rng = random.Random(seed)
-    means: list[float] = []
-    for _ in range(iterations):
-        s = 0.0
-        for _ in range(n):
-            s += values[rng.randrange(n)]
-        means.append(s / n)
-    means.sort()
-    lo_idx = int(math.floor((alpha / 2.0) * iterations))
-    hi_idx = int(math.ceil((1.0 - alpha / 2.0) * iterations)) - 1
-    lo_idx = max(0, min(iterations - 1, lo_idx))
-    hi_idx = max(0, min(iterations - 1, hi_idx))
-    return (means[lo_idx], means[hi_idx])
+    if arr.size == 1:
+        v = float(arr[0])
+        return (v, v)
+    rng = np.random.default_rng(seed)
+    boots = np.empty(n_resamples, dtype=float)
+    n = arr.size
+    for i in range(n_resamples):
+        idx = rng.integers(0, n, size=n)
+        boots[i] = float(arr[idx].mean())
+    lo = float(np.percentile(boots, 2.5))
+    hi = float(np.percentile(boots, 97.5))
+    return (lo, hi)
 
 
-def win_rate(trades: Iterable[Trade]) -> tuple[int, int, float]:
-    """Return ``(wins, n_resolved, wr)`` ignoring pending trades."""
-    resolved = [t for t in trades if not t.is_pending]
-    wins = sum(1 for t in resolved if t.is_win)
-    n = len(resolved)
-    wr = (wins / n) if n else 0.0
-    return wins, n, wr
-
-
-def expectancy(trades: Iterable[Trade], overlay: str = "pl_marius") -> float:
-    """Mean P/L (in R) over non-pending trades, on the given overlay."""
-    if overlay not in {"pl_marius", "pl_theoretical"}:
-        raise ValueError(f"unknown overlay {overlay!r}")
-    if overlay == "pl_marius":
-        vals = [t.pl_marius for t in trades if t.pl_marius is not None]
-    else:
-        vals = [t.pl_theoretical for t in trades if not t.is_pending]
-    if not vals:
-        return 0.0
-    return sum(vals) / len(vals)
-
-
 # ---------------------------------------------------------------------------
-# Group stats
+# compute_stats
 # ---------------------------------------------------------------------------
 
 
-@dataclass(frozen=True)
-class GroupStats:
-    key: str
-    n_total: int
-    n_resolved: int
-    wins: int
-    wr: float
-    wr_ci_lo: float
-    wr_ci_hi: float
-    exp_marius: float
-    exp_marius_ci_lo: float
-    exp_marius_ci_hi: float
-    exp_theoretical: float
-    exp_theoretical_ci_lo: float
-    exp_theoretical_ci_hi: float
-
-
-def group_by(trades: Iterable[Trade], field_name: str) -> dict[str, list[Trade]]:
-    out: dict[str, list[Trade]] = {}
-    for t in trades:
-        key = getattr(t, field_name, "") or "(blank)"
-        out.setdefault(key, []).append(t)
+def _group_stats(
+    overlay_values: list[float | None],
+    *,
+    include_ci: bool,
+    bootstrap_seed: int,
+) -> dict[str, Any]:
+    closed = [v for v in overlay_values if v is not None]
+    n = len(closed)
+    wins = sum(1 for v in closed if v > 0)
+    wr = (wins / n) if n else 0.0
+    out: dict[str, Any] = {
+        "n": n,
+        "wr": wr,
+        "expectancy": (sum(closed) / n) if n else 0.0,
+    }
+    if include_ci:
+        out["wr_ci_95"] = wilson_ci(wins, n)
+        out["expectancy_ci_95"] = bootstrap_expectancy_ci(
+            closed, seed=bootstrap_seed
+        )
     return out
 
 
-def compute_group_stats(
-    trades: list[Trade],
-    *,
-    label: str,
-    bootstrap_iterations: int = 2000,
-    seed: int | None = None,
-) -> GroupStats:
-    wins, n_resolved, wr = win_rate(trades)
-    wr_lo, wr_hi = wilson_ci(wins, n_resolved)
-
-    pl_m_vals = [t.pl_marius for t in trades if t.pl_marius is not None]
-    exp_m = (sum(pl_m_vals) / len(pl_m_vals)) if pl_m_vals else 0.0
-    exp_m_lo, exp_m_hi = bootstrap_ci(
-        pl_m_vals, iterations=bootstrap_iterations, seed=seed
-    )
-
-    pl_t_vals = [t.pl_theoretical for t in trades if not t.is_pending]
-    exp_t = (sum(pl_t_vals) / len(pl_t_vals)) if pl_t_vals else 0.0
-    exp_t_lo, exp_t_hi = bootstrap_ci(
-        pl_t_vals,
-        iterations=bootstrap_iterations,
-        seed=None if seed is None else seed + 1,
-    )
-
-    return GroupStats(
-        key=label,
-        n_total=len(trades),
-        n_resolved=n_resolved,
-        wins=wins,
-        wr=wr,
-        wr_ci_lo=wr_lo,
-        wr_ci_hi=wr_hi,
-        exp_marius=exp_m,
-        exp_marius_ci_lo=exp_m_lo,
-        exp_marius_ci_hi=exp_m_hi,
-        exp_theoretical=exp_t,
-        exp_theoretical_ci_lo=exp_t_lo,
-        exp_theoretical_ci_hi=exp_t_hi,
-    )
+def _overlay_value(row: dict[str, str], overlay: str) -> float | None:
+    raw = row.get(overlay, "")
+    return _parse_optional_float(raw)
 
 
-# ---------------------------------------------------------------------------
-# Calibration mode
-# ---------------------------------------------------------------------------
+def compute_stats(
+    csv_path: Path | str = "data/jurnal.csv",
+    overlay: str = "pl_marius",
+) -> dict[str, Any]:
+    """Compute aggregate WR + expectancy stats over the backtest rows.
+
+    Calibration rows (``manual_calibration`` / ``vision_calibration``) are
+    excluded; use :func:`compute_calibration` for the P4 mismatch report.
 
-@dataclass(frozen=True)
-class CalibrationReport:
+    ``overlay`` selects the P/L column: ``"pl_marius"`` (default — the real
|
overlay Marius trades) or ``"pl_theoretical"`` (1/3-1/3-1/3 hold-to-TP2).
|
||||||
pairs: int
|
|
||||||
field_mismatches: dict[str, int]
|
|
||||||
total_comparisons: int
|
|
||||||
|
|
||||||
@property
|
|
||||||
def overall_mismatch_rate(self) -> float:
|
|
||||||
if self.total_comparisons == 0:
|
|
||||||
return 0.0
|
|
||||||
total = sum(self.field_mismatches.values())
|
|
||||||
return total / self.total_comparisons
|
|
||||||
|
|
||||||
|
|
||||||
def _normalise_for_compare(field_name: str, value: str) -> str:
|
|
||||||
s = (value or "").strip()
|
|
||||||
if field_name in {"entry", "sl", "tp0", "tp1", "tp2"}:
|
|
||||||
try:
|
|
||||||
return f"{float(s):.4f}"
|
|
||||||
except ValueError:
|
|
||||||
return s
|
|
||||||
return s
|
|
||||||
|
|
||||||
|
|
||||||
def calibration_mismatch(
|
|
||||||
trades: Iterable[Trade],
|
|
||||||
*,
|
|
||||||
fields: tuple[str, ...] = CORE_CALIBRATION_FIELDS,
|
|
||||||
) -> CalibrationReport:
|
|
||||||
"""Pair ``manual_calibration`` and ``vision_calibration`` rows by
|
|
||||||
``screenshot_file``, then count mismatches per ``fields``.
|
|
||||||
|
|
||||||
Returns a :class:`CalibrationReport`. Unpaired calibration rows are
|
|
||||||
silently ignored — they cannot contribute to a comparison.
|
|
||||||
"""
|
"""
|
||||||
manual: dict[str, Trade] = {}
|
if overlay not in {"pl_marius", "pl_theoretical"}:
|
||||||
vision: dict[str, Trade] = {}
|
raise ValueError(f"unknown overlay {overlay!r}")
|
||||||
for t in trades:
|
|
||||||
if t.source == "manual_calibration":
|
|
||||||
manual[t.screenshot_file] = t
|
|
||||||
elif t.source == "vision_calibration":
|
|
||||||
vision[t.screenshot_file] = t
|
|
||||||
|
|
||||||
paired_files = sorted(set(manual) & set(vision))
|
rows = [r for r in _load_rows(csv_path) if r.get("source", "") in BACKTEST_SOURCES]
|
||||||
field_mismatches: dict[str, int] = {f: 0 for f in fields}
|
|
||||||
for f in paired_files:
|
|
||||||
m = manual[f]
|
|
||||||
v = vision[f]
|
|
||||||
for fld in fields:
|
|
||||||
mv = _normalise_for_compare(fld, m.raw.get(fld, ""))
|
|
||||||
vv = _normalise_for_compare(fld, v.raw.get(fld, ""))
|
|
||||||
if mv != vv:
|
|
||||||
field_mismatches[fld] += 1
|
|
||||||
|
|
||||||
total_comparisons = len(paired_files) * len(fields)
|
if not rows:
|
||||||
return CalibrationReport(
|
return {
|
||||||
pairs=len(paired_files),
|
"n_total": 0,
|
||||||
field_mismatches=field_mismatches,
|
"n_pending": 0,
|
||||||
total_comparisons=total_comparisons,
|
"n_closed": 0,
|
||||||
|
"wr": 0.0,
|
||||||
|
"wr_ci_95": (0.0, 0.0),
|
||||||
|
"expectancy": 0.0,
|
||||||
|
"expectancy_ci_95": (0.0, 0.0),
|
||||||
|
"per_set": {},
|
||||||
|
"per_calitate": {},
|
||||||
|
"per_directie": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
# Pending status is overlay-independent: a trade is pending iff
|
||||||
|
# pl_marius is blank (outcome_path in {pending, TP0->pending}).
|
||||||
|
# pl_theoretical is concrete even for pending rows, so it would otherwise
|
||||||
|
# let pending trades sneak into the closed-trades stats — we mask those
|
||||||
|
# out explicitly here.
|
||||||
|
pending_mask = [_parse_optional_float(r.get("pl_marius", "")) is None for r in rows]
|
||||||
|
overlay_vals: list[float | None] = []
|
||||||
|
for r, is_pending in zip(rows, pending_mask):
|
||||||
|
overlay_vals.append(None if is_pending else _overlay_value(r, overlay))
|
||||||
|
n_total = len(rows)
|
||||||
|
n_pending = sum(1 for p in pending_mask if p)
|
||||||
|
n_closed = n_total - n_pending
|
||||||
|
|
||||||
|
overall = _group_stats(
|
||||||
|
overlay_vals, include_ci=True, bootstrap_seed=42
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def _split(field: str, include_ci: bool) -> dict[str, dict[str, Any]]:
|
||||||
|
groups: dict[str, list[float | None]] = {}
|
||||||
|
for r, v in zip(rows, overlay_vals):
|
||||||
|
key = r.get(field, "") or "(blank)"
|
||||||
|
groups.setdefault(key, []).append(v)
|
||||||
|
out: dict[str, dict[str, Any]] = {}
|
||||||
|
for k in sorted(groups):
|
||||||
|
sub_seed = 42 + (abs(hash(("split", field, k))) % 1_000_000)
|
||||||
|
out[k] = _group_stats(
|
||||||
|
groups[k], include_ci=include_ci, bootstrap_seed=sub_seed
|
||||||
|
)
|
||||||
|
return out
|
||||||
|
|
||||||
|
return {
|
||||||
|
"n_total": n_total,
|
||||||
|
"n_pending": n_pending,
|
||||||
|
"n_closed": n_closed,
|
||||||
|
"wr": overall["wr"],
|
||||||
|
"wr_ci_95": overall["wr_ci_95"],
|
||||||
|
"expectancy": overall["expectancy"],
|
||||||
|
"expectancy_ci_95": overall["expectancy_ci_95"],
|
||||||
|
"per_set": _split("set", include_ci=True),
|
||||||
|
"per_calitate": _split("calitate", include_ci=True),
|
||||||
|
# per_directie skips CI per spec (no wr_ci_95 / expectancy_ci_95 keys).
|
||||||
|
"per_directie": {
|
||||||
|
k: {"n": v["n"], "wr": v["wr"], "expectancy": v["expectancy"]}
|
||||||
|
for k, v in _split("directie", include_ci=False).items()
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Reporting
|
# render_stats
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
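The new `_group_stats` helper delegates the win-rate interval to `wilson_ci`, which is defined earlier in the module and not visible in this hunk. For reference, a minimal sketch of the standard Wilson score interval (using z = 1.96 for 95%; the module's exact constant and edge-case handling may differ) looks like:

```python
import math


def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% interval for a binomial proportion (sketch)."""
    if n == 0:
        return (0.0, 0.0)
    if not 0 <= wins <= n:
        raise ValueError("wins must be in [0, n]")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - half, centre + half)


lo, hi = wilson_ci(20, 40)  # roughly (0.352, 0.648)
```

The old test suite's reference values (`wilson_ci(20, 40)` ≈ (0.3520, 0.6480)) agree with this formula to several decimal places.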
@@ -375,110 +274,228 @@ def _fmt_pct(p: float) -> str:


 def _fmt_r(x: float) -> str:
-    return f"{x:+.3f}R"
+    return f"{x:+.2f} R"


-def _fmt_stats_row(s: GroupStats) -> str:
-    return (
-        f"{s.key:<14} N={s.n_total:>3} (resolved {s.n_resolved:>3}) "
-        f"WR={_fmt_pct(s.wr)} [{_fmt_pct(s.wr_ci_lo)}, {_fmt_pct(s.wr_ci_hi)}] "
-        f"E_marius={_fmt_r(s.exp_marius)} "
-        f"[{_fmt_r(s.exp_marius_ci_lo)}, {_fmt_r(s.exp_marius_ci_hi)}] "
-        f"E_theor={_fmt_r(s.exp_theoretical)}"
-    )
+def _set_sort_key(name: str) -> tuple[int, str]:
+    order = ["A1", "A2", "A3", "B", "C", "D", "Other"]
+    return (order.index(name), name) if name in order else (len(order), name)


-def format_report(
-    trades: list[Trade],
-    *,
-    bootstrap_iterations: int = 2000,
-    seed: int | None = None,
-) -> str:
-    """Render the main stats report.
-
-    Only ``source in {vision, manual}`` rows are included in the WR /
-    expectancy computations; calibration rows are reported separately via
-    ``--calibration``.
-    """
-    backtest = [t for t in trades if t.source in BACKTEST_SOURCES]
+def render_stats(stats: dict[str, Any], overlay: str) -> str:
     lines: list[str] = []
-    lines.append("=== M2D Backtest Stats ===")
-    lines.append(f"Backtest rows: {len(backtest)} (calibration excluded)")
-    lines.append("")
-
-    if not backtest:
-        lines.append("(no backtest trades yet)")
-        return "\n".join(lines)
-
-    overall = compute_group_stats(
-        backtest,
-        label="OVERALL",
-        bootstrap_iterations=bootstrap_iterations,
-        seed=seed,
-    )
-    lines.append("-- Overall --")
-    lines.append(_fmt_stats_row(overall))
-    lines.append("")
-
-    def _emit_group(title: str, field_name: str, key_order: list[str] | None = None) -> None:
-        lines.append(f"-- By {title} --")
-        groups = group_by(backtest, field_name)
-        keys = key_order if key_order is not None else sorted(groups)
-        for k in keys:
-            if k not in groups:
-                continue
-            sub_seed = None if seed is None else seed + abs(hash(k)) % 10_000
-            s = compute_group_stats(
-                groups[k],
-                label=k,
-                bootstrap_iterations=bootstrap_iterations,
-                seed=sub_seed,
-            )
-            lines.append(_fmt_stats_row(s))
-        lines.append("")
-
-    _emit_group(
-        "Set",
-        "set",
-        key_order=["A1", "A2", "A3", "B", "C", "D", "Other"],
-    )
-    _emit_group("Instrument", "instrument")
+    lines.append(f"=== Stats jurnal.csv (overlay: {overlay}) ===")
     lines.append(
-        "[!] By calitate — descriptor only (post-outcome, biased; do not use "
-        "as a GO LIVE filter — see STOPPING_RULE.md §3)."
-    )
-    _emit_group(
-        "calitate",
-        "calitate",
-        key_order=["Clară", "Mai mare ca impuls", "Slabă", "n/a"],
+        f"Trade-uri totale: {stats['n_total']} | "
+        f"închise: {stats['n_closed']} | pending: {stats['n_pending']}"
     )
-
-    return "\n".join(lines).rstrip() + "\n"
-
-
-def format_calibration_report(trades: list[Trade]) -> str:
-    cal = calibration_mismatch(trades)
-    lines: list[str] = []
-    lines.append("=== Calibration P4 gate ===")
-    lines.append(f"Paired screenshots (manual ∩ vision): {cal.pairs}")
-    if cal.pairs == 0:
-        lines.append("(no calibration pairs yet)")
+    if stats["n_total"] == 0:
+        lines.append("")
+        lines.append("(nu sunt trade-uri backtest în CSV)")
         return "\n".join(lines) + "\n"

     lines.append("")
-    lines.append(f"{'field':<14} mismatches / pairs rate")
-    for fld in CORE_CALIBRATION_FIELDS:
-        m = cal.field_mismatches.get(fld, 0)
-        rate = (m / cal.pairs) if cal.pairs else 0.0
-        lines.append(f"{fld:<14} {m:>3} / {cal.pairs:<3} {_fmt_pct(rate)}")
-    lines.append("")
+    lo, hi = stats["wr_ci_95"]
+    e_lo, e_hi = stats["expectancy_ci_95"]
+    lines.append(f"GLOBAL (n={stats['n_closed']}):")
     lines.append(
-        f"Overall mismatch rate: {_fmt_pct(cal.overall_mismatch_rate)} "
-        f"({sum(cal.field_mismatches.values())} of {cal.total_comparisons} comparisons)"
+        f"  WR: {_fmt_pct(stats['wr'])} "
+        f"[95% CI: {_fmt_pct(lo)}, {_fmt_pct(hi)}]"
     )
-    threshold = 0.10
-    verdict = "PASS" if cal.overall_mismatch_rate <= threshold else "FAIL"
-    lines.append(f"P4 gate (<= 10%): {verdict}")
+    lines.append(
+        f"  Expectancy: {_fmt_r(stats['expectancy'])} "
+        f"[95% CI: {_fmt_r(e_lo)}, {_fmt_r(e_hi)}]"
+    )
+    lines.append("")
+
+    def _emit_split(
+        title: str,
+        data: dict[str, dict[str, Any]],
+        *,
+        sort_keys: list[str] | None = None,
+        include_ci: bool = True,
+    ) -> None:
+        lines.append(title)
+        keys = sort_keys if sort_keys is not None else sorted(data)
+        for k in keys:
+            if k not in data:
+                continue
+            d = data[k]
+            if include_ci and "wr_ci_95" in d:
+                clo, chi = d["wr_ci_95"]
+                lines.append(
+                    f"  {k:<14} n={d['n']:>3} "
+                    f"WR {_fmt_pct(d['wr'])} "
+                    f"[{_fmt_pct(clo)}, {_fmt_pct(chi)}] "
+                    f"E {_fmt_r(d['expectancy'])}"
+                )
+            else:
+                lines.append(
+                    f"  {k:<14} n={d['n']:>3} "
+                    f"WR {_fmt_pct(d['wr'])} "
+                    f"E {_fmt_r(d['expectancy'])}"
+                )
+        lines.append("")
+
+    _emit_split(
+        "PER SET:",
+        stats["per_set"],
+        sort_keys=sorted(stats["per_set"], key=_set_sort_key),
+    )
+
+    lines.append(
+        "PER CALITATE (⚠️ DESCRIPTOR ONLY — biased post-outcome, NU folosi ca filtru):"
+    )
+    cal_order = ["Clară", "Mai mare ca impuls", "Slabă", "n/a"]
+    keys = [k for k in cal_order if k in stats["per_calitate"]] + [
+        k for k in sorted(stats["per_calitate"]) if k not in cal_order
+    ]
+    for k in keys:
+        d = stats["per_calitate"][k]
+        clo, chi = d["wr_ci_95"]
+        lines.append(
+            f"  {k:<20} n={d['n']:>3} "
+            f"WR {_fmt_pct(d['wr'])} "
+            f"[{_fmt_pct(clo)}, {_fmt_pct(chi)}] "
+            f"E {_fmt_r(d['expectancy'])}"
+        )
+    lines.append("")
+
+    _emit_split("PER DIRECȚIE:", stats["per_directie"], include_ci=False)
+
+    # STOPPING_RULE gate check — flag every Set that hasn't crossed N>=40.
+    lines.append(f"⚠️ STOPPING RULE check (vezi STOPPING_RULE.md, N>={STOPPING_RULE_N}):")
+    set_keys = sorted(stats["per_set"], key=_set_sort_key)
+    any_flagged = False
+    for k in set_keys:
+        n = stats["per_set"][k]["n"]
+        if n < STOPPING_RULE_N:
+            lines.append(f"  {k}: N={n} < {STOPPING_RULE_N} → NEEDS MORE DATA")
+            any_flagged = True
+    if not any_flagged:
+        lines.append(f"  toate Set-urile au N>={STOPPING_RULE_N} (eligibile pentru GO LIVE check).")
+
+    return "\n".join(lines) + "\n"
+
+
+# ---------------------------------------------------------------------------
+# compute_calibration
+# ---------------------------------------------------------------------------
+
+
+def _calibration_match(field: str, m_val: str, v_val: str, tol: float = 0.01) -> bool:
+    if field in NUMERIC_CALIBRATION_FIELDS:
+        try:
+            return abs(float(m_val) - float(v_val)) <= tol
+        except ValueError:
+            return (m_val or "").strip() == (v_val or "").strip()
+    return (m_val or "").strip() == (v_val or "").strip()
+
+
+def compute_calibration(
+    csv_path: Path | str = "data/jurnal.csv",
+) -> dict[str, Any]:
+    """Pair calibration legs by ``screenshot_file`` and report per-field mismatch.
+
+    Returns a dict ``{"n_pairs": int, "fields": {field: {match, mismatch,
+    match_rate, mismatch_examples}}}``. ``mismatch_examples`` holds up to 3
+    strings ``"<screenshot_file>: manual=X vs vision=Y"`` per field.
+
+    Numeric fields (``entry/sl/tp0/tp1/tp2``) use a tolerance of 0.01;
+    everything else is exact-string equality after strip.
+    """
+    rows = _load_rows(csv_path)
+    manual: dict[str, dict[str, str]] = {}
+    vision: dict[str, dict[str, str]] = {}
+    for r in rows:
+        src = r.get("source", "")
+        if src == "manual_calibration":
+            manual[r.get("screenshot_file", "")] = r
+        elif src == "vision_calibration":
+            vision[r.get("screenshot_file", "")] = r
+
+    paired_files = sorted(set(manual) & set(vision))
+    fields_report: dict[str, dict[str, Any]] = {
+        f: {
+            "match": 0,
+            "mismatch": 0,
+            "match_rate": 0.0,
+            "mismatch_examples": [],
+        }
+        for f in CORE_CALIBRATION_FIELDS
+    }
+
+    for f in paired_files:
+        m = manual[f]
+        v = vision[f]
+        for fld in CORE_CALIBRATION_FIELDS:
+            mv = m.get(fld, "")
+            vv = v.get(fld, "")
+            if _calibration_match(fld, mv, vv):
+                fields_report[fld]["match"] += 1
+            else:
+                fields_report[fld]["mismatch"] += 1
+                examples = fields_report[fld]["mismatch_examples"]
+                if len(examples) < 3:
+                    examples.append(f"{f}: manual={mv!r} vs vision={vv!r}")
+
+    for fld, data in fields_report.items():
+        total = data["match"] + data["mismatch"]
+        data["match_rate"] = (data["match"] / total) if total else 0.0
+
+    return {"n_pairs": len(paired_files), "fields": fields_report}
+
+
+def render_calibration(cal: dict[str, Any]) -> str:
+    lines: list[str] = []
+    lines.append("=== Calibration P4 gate (vezi STOPPING_RULE.md §P4) ===")
+    lines.append(f"Perechi calibration: {cal['n_pairs']}")
+    if cal["n_pairs"] == 0:
+        lines.append("(nu există perechi manual_calibration ∩ vision_calibration)")
+        return "\n".join(lines) + "\n"
+
+    lines.append("")
+    lines.append(f"{'field':<14} match mismatch rate")
+    total_mismatches = 0
+    total_comparisons = 0
+    for fld in CORE_CALIBRATION_FIELDS:
+        d = cal["fields"][fld]
+        n = d["match"] + d["mismatch"]
+        total_mismatches += d["mismatch"]
+        total_comparisons += n
+        lines.append(
+            f"{fld:<14} {d['match']:>5} {d['mismatch']:>8} "
+            f"{_fmt_pct(d['match_rate'])}"
+        )
+
+    lines.append("")
+    overall_match_rate = (
+        (total_comparisons - total_mismatches) / total_comparisons
+        if total_comparisons
+        else 0.0
+    )
+    overall_mismatch_rate = 1.0 - overall_match_rate
+    verdict = "PASS" if overall_mismatch_rate <= 0.10 else "FAIL"
+    lines.append(
+        f"Overall mismatch rate: {_fmt_pct(overall_mismatch_rate)} "
+        f"({total_mismatches}/{total_comparisons}) → P4 gate: {verdict}"
+    )
+
+    has_examples = any(
+        cal["fields"][f]["mismatch_examples"] for f in CORE_CALIBRATION_FIELDS
+    )
+    if has_examples:
+        lines.append("")
+        lines.append("Mismatch examples (max 3 per field):")
+        for fld in CORE_CALIBRATION_FIELDS:
+            ex = cal["fields"][fld]["mismatch_examples"]
+            if not ex:
+                continue
+            lines.append(f"  [{fld}]")
+            for e in ex:
+                lines.append(f"    - {e}")

     return "\n".join(lines) + "\n"
@@ -498,43 +515,37 @@ def main(argv: list[str] | None = None) -> int:
         default=Path("data/jurnal.csv"),
         help="Path to the jurnal CSV (default: data/jurnal.csv).",
     )
+    parser.add_argument(
+        "--overlay",
+        choices=("pl_marius", "pl_theoretical"),
+        default="pl_marius",
+        help="Which P/L overlay to use (default: pl_marius).",
+    )
     parser.add_argument(
         "--calibration",
         action="store_true",
         help="Show P4 calibration mismatch report instead of backtest stats.",
     )
-    parser.add_argument(
-        "--bootstrap-iterations",
-        type=int,
-        default=2000,
-        help="Bootstrap iterations for expectancy CI (default: 2000).",
-    )
-    parser.add_argument(
-        "--seed",
-        type=int,
-        default=None,
-        help="Seed for the bootstrap RNG (set for deterministic output).",
-    )
     args = parser.parse_args(argv)

-    trades = load_trades(args.csv)
-    if args.calibration:
-        out = format_calibration_report(trades)
-    else:
-        out = format_report(
-            trades,
-            bootstrap_iterations=args.bootstrap_iterations,
-            seed=args.seed,
-        )
-    # Force UTF-8 on stdout: the report contains diacritics ("Clară", "Slabă")
-    # and a console codepage like cp1252 would crash on those.
     try:
         sys.stdout.reconfigure(encoding="utf-8")  # type: ignore[attr-defined]
     except (AttributeError, OSError):
         pass
-    sys.stdout.write(out)
+    if args.calibration:
+        cal = compute_calibration(args.csv)
+        sys.stdout.write(render_calibration(cal))
+    else:
+        stats = compute_stats(args.csv, overlay=args.overlay)
+        sys.stdout.write(render_stats(stats, args.overlay))
     return 0


 if __name__ == "__main__":
     raise SystemExit(main())
+
+
+# Ensure the canonical CSV schema is importable from one place — fail fast if
+# someone removes append_row.CSV_COLUMNS that this module depends on.
+assert CSV_COLUMNS is not None
@@ -1,4 +1,5 @@
-"""Tests for scripts/stats.py."""
+"""CSV-fixture tests for scripts.stats — compute_stats, render_stats,
+compute_calibration, render_calibration, main()."""

 from __future__ import annotations

@@ -12,24 +13,17 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

 from scripts.append_row import CSV_COLUMNS  # noqa: E402
 from scripts.stats import (  # noqa: E402
-    BACKTEST_SOURCES,
     CORE_CALIBRATION_FIELDS,
-    bootstrap_ci,
-    calibration_mismatch,
-    compute_group_stats,
-    expectancy,
-    format_calibration_report,
-    format_report,
-    group_by,
-    load_trades,
+    compute_calibration,
+    compute_stats,
     main,
-    win_rate,
-    wilson_ci,
+    render_calibration,
+    render_stats,
 )


 # ---------------------------------------------------------------------------
-# Synthetic CSV fixture: 30 trades
+# Fixture row builder
 # ---------------------------------------------------------------------------
@@ -78,55 +72,61 @@ def _write_csv(path: Path, rows: list[dict[str, str]]) -> None:
         w.writerow({k: r.get(k, "") for k in CSV_COLUMNS})


-def _synthetic_30(tmp_path: Path) -> Path:
-    """30 vision-source trades engineered for known stats.
-
-    Layout (by Set):
-    - A1: 10 trades — 6 wins TP0->TP1 (+0.5), 4 losses SL (-1.0) → WR 60%
-    - A2: 10 trades — 7 wins TP0->TP2 (+0.5), 3 losses SL (-1.0) → WR 70%
-    - A3: 10 trades — 4 wins TP0->TP1 (+0.5), 6 losses SL (-1.0) → WR 40%
-
-    Overall: 17 wins / 30, WR ≈ 56.67%.
+# Outcome templates (P/L values) — match scripts.pl_calc tables.
+_SL = {"outcome_path": "SL", "max_reached": "SL_first", "be_moved": "False",
+       "pl_marius": "-1.0000", "pl_theoretical": "-1.0000"}
+_TP0_SL_BE = {"outcome_path": "TP0→SL", "max_reached": "TP0", "be_moved": "True",
+              "pl_marius": "0.2000", "pl_theoretical": "0.1330"}
+_TP0_TP1 = {"outcome_path": "TP0→TP1", "max_reached": "TP1", "be_moved": "True",
+            "pl_marius": "0.5000", "pl_theoretical": "0.3330"}
+_TP0_TP2 = {"outcome_path": "TP0→TP2", "max_reached": "TP2", "be_moved": "True",
+            "pl_marius": "0.5000", "pl_theoretical": "0.6670"}
+_PENDING = {"outcome_path": "pending", "max_reached": "TP0", "be_moved": "False",
+            "pl_marius": "", "pl_theoretical": "0.1330"}
+
+
+def _synthetic_csv(tmp_path: Path) -> Path:
+    """30-trade backtest fixture.
+
+    Set distribution:
+      A1: 8 rows (all closed; 3 SL, 2 TP0→SL, 2 TP0→TP1, 1 TP0→TP2)
+      A2: 10 rows (all closed; 4 SL, 3 TP0→SL, 2 TP0→TP1, 1 TP0→TP2)
+      B : 7 rows (2 pending, 5 closed; 2 SL, 2 TP0→TP1, 1 TP0→TP2)
+      D : 5 rows (3 pending, 2 closed; 1 SL, 1 TP0→TP1)
+
+    Totals: n_total=30, n_pending=5, n_closed=25.
+
+    Wins by pl_marius (>0): all TP0→SL_BE + TP0→TP1 + TP0→TP2
+      A1: 2 + 2 + 1 = 5 wins / 8
+      A2: 3 + 2 + 1 = 6 wins / 10
+      B : 0 + 2 + 1 = 3 wins / 5
+      D : 0 + 1 + 0 = 1 win / 2
+    Total wins = 15 / 25 = 60.0%.
+
+    Calitate distribution: half "Clară", half "Slabă" (alternating).
+    Directie distribution: 2/3 Buy, 1/3 Sell.
     """
     rows: list[dict[str, str]] = []
     rid = 0

-    def add(set_label: str, n_win: int, n_loss: int, calitate: str = "Clară") -> None:
+    def add(set_label: str, outcomes: list[dict[str, str]]) -> None:
         nonlocal rid
-        for _ in range(n_win):
+        for i, outcome in enumerate(outcomes):
             rid += 1
-            rows.append(
-                _base_row(
-                    id=rid,
-                    screenshot_file=f"win-{rid}.png",
-                    set=set_label,
-                    calitate=calitate,
-                    outcome_path="TP0→TP1",
-                    max_reached="TP1",
-                    be_moved="True",
-                    pl_marius="0.5000",
-                    pl_theoretical="0.3330",
-                )
-            )
-        for _ in range(n_loss):
-            rid += 1
-            rows.append(
-                _base_row(
-                    id=rid,
-                    screenshot_file=f"loss-{rid}.png",
-                    set=set_label,
-                    calitate=calitate,
-                    outcome_path="SL",
-                    max_reached="SL_first",
-                    be_moved="False",
-                    pl_marius="-1.0000",
-                    pl_theoretical="-1.0000",
-                )
-            )
+            row = _base_row(
+                id=rid,
+                screenshot_file=f"{set_label.lower()}-{rid}.png",
+                set=set_label,
+                calitate="Clară" if rid % 2 == 0 else "Slabă",
+                directie="Buy" if rid % 3 != 0 else "Sell",
+            )
+            row.update({k: str(v) for k, v in outcome.items()})
+            rows.append(row)

-    add("A1", 6, 4)
-    add("A2", 7, 3)
-    add("A3", 4, 6)
+    add("A1", [_SL] * 3 + [_TP0_SL_BE] * 2 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
+    add("A2", [_SL] * 4 + [_TP0_SL_BE] * 3 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
+    add("B", [_PENDING] * 2 + [_SL] * 2 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
+    add("D", [_PENDING] * 3 + [_SL] * 1 + [_TP0_TP1] * 1)

     path = tmp_path / "jurnal.csv"
     _write_csv(path, rows)
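A quick sanity check of the arithmetic claimed in the fixture docstring; the per-set counts below are transcribed from it:

```python
# Per-set counts from the fixture docstring (pending, wins, closed).
sets = {
    "A1": {"pending": 0, "wins": 5, "closed": 8},
    "A2": {"pending": 0, "wins": 6, "closed": 10},
    "B": {"pending": 2, "wins": 3, "closed": 5},
    "D": {"pending": 3, "wins": 1, "closed": 2},
}
n_total = sum(d["pending"] + d["closed"] for d in sets.values())
n_closed = sum(d["closed"] for d in sets.values())
wins = sum(d["wins"] for d in sets.values())
assert (n_total, n_closed, wins) == (30, 25, 15)
assert wins / n_closed == 0.60  # the 60.0% WR the tests assert
```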
@@ -134,336 +134,314 @@ def _synthetic_30(tmp_path: Path) -> Path:


 # ---------------------------------------------------------------------------
-# Wilson CI — reference values
+# compute_stats — core
 # ---------------------------------------------------------------------------


-class TestWilsonCI:
-    def test_n_zero(self) -> None:
-        assert wilson_ci(0, 0) == (0.0, 0.0)
-
-    def test_50pct_at_n40(self) -> None:
-        lo, hi = wilson_ci(20, 40)
-        assert lo == pytest.approx(0.3519927879709976, abs=1e-9)
-        assert hi == pytest.approx(0.6480072120290024, abs=1e-9)
-
-    def test_55pct_at_n40(self) -> None:
-        lo, hi = wilson_ci(22, 40)
-        assert lo == pytest.approx(0.3982882988844078, abs=1e-9)
-        assert hi == pytest.approx(0.6929492471905531, abs=1e-9)
-
-    def test_55pct_at_n100(self) -> None:
-        # Larger N tightens the CI; lower bound rises above 45%.
-        lo, hi = wilson_ci(55, 100)
-        assert lo == pytest.approx(0.4524442703164345, abs=1e-9)
-        assert hi == pytest.approx(0.6438562489359655, abs=1e-9)
-        assert lo > 0.45  # STOPPING_RULE GO-LIVE gate
-
-    def test_zero_wins(self) -> None:
-        lo, hi = wilson_ci(0, 10)
-        assert lo == pytest.approx(0.0, abs=1e-12)
-        assert hi == pytest.approx(0.2775401687666165, abs=1e-9)
-
-    def test_all_wins(self) -> None:
-        lo, hi = wilson_ci(10, 10)
-        assert lo == pytest.approx(0.7224598312333834, abs=1e-9)
-        assert hi == pytest.approx(1.0, abs=1e-12)
-
-    def test_wins_out_of_range(self) -> None:
-        with pytest.raises(ValueError):
-            wilson_ci(11, 10)
-        with pytest.raises(ValueError):
-            wilson_ci(-1, 10)
-
-
-# ---------------------------------------------------------------------------
-# Bootstrap CI — determinism + sanity
-# ---------------------------------------------------------------------------
-
-
-class TestBootstrapCI:
-    def test_deterministic_with_seed(self) -> None:
-        vals = [0.5, -1.0, 0.5, 0.5, -1.0, 0.2, -0.3, 0.5, -1.0, 0.5]
-        lo1, hi1 = bootstrap_ci(vals, iterations=500, seed=42)
-        lo2, hi2 = bootstrap_ci(vals, iterations=500, seed=42)
+class TestComputeStats:
+    def test_compute_stats_n_pending(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        assert s["n_total"] == 30
+        assert s["n_pending"] == 5
+        assert s["n_closed"] == 25
+
+    def test_compute_stats_wr_correct(self, tmp_path: Path) -> None:
+        """Manual win count: 15 / 25 = 60.0%."""
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        assert s["wr"] == pytest.approx(15 / 25)
+        lo, hi = s["wr_ci_95"]
+        assert 0.0 <= lo <= s["wr"] <= hi <= 1.0
+
+    def test_compute_stats_per_set(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        a2 = s["per_set"]["A2"]
+        assert a2["n"] == 10  # 10 closed A2 trades
+        # A2 wins (pl_marius > 0): 3 BE + 2 TP1 + 1 TP2 = 6 / 10
+        assert a2["wr"] == pytest.approx(0.60)
+
+    def test_per_set_b_pending_excluded(self, tmp_path: Path) -> None:
+        """Set B has 7 total rows (2 pending + 5 closed). n must be 5."""
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        assert s["per_set"]["B"]["n"] == 5
+        # B wins: 0 BE + 2 TP1 + 1 TP2 = 3 / 5
+        assert s["per_set"]["B"]["wr"] == pytest.approx(0.60)
+
+    def test_per_directie_no_ci_keys(self, tmp_path: Path) -> None:
+        """per_directie omits CI fields per spec (only n / wr / expectancy)."""
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        for k, d in s["per_directie"].items():
+            assert set(d.keys()) == {"n", "wr", "expectancy"}, k
+
+    def test_overlay_theoretical_vs_marius(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        s_m = compute_stats(path, overlay="pl_marius")
+        s_t = compute_stats(path, overlay="pl_theoretical")
+        # Same N, but different expectancy.
+        assert s_m["n_closed"] == s_t["n_closed"]
+        assert s_m["expectancy"] != s_t["expectancy"]
+
+    def test_unknown_overlay_raises(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        with pytest.raises(ValueError):
+            compute_stats(path, overlay="pl_imaginary")
+
+    def test_empty_csv_no_crash(self, tmp_path: Path) -> None:
+        path = tmp_path / "empty.csv"
+        _write_csv(path, [])
+        s = compute_stats(path)
+        assert s["n_total"] == 0
+        assert s["n_closed"] == 0
+        assert s["per_set"] == {}
+        assert s["wr"] == 0.0
+        assert s["wr_ci_95"] == (0.0, 0.0)
+
+    def test_missing_csv_no_crash(self, tmp_path: Path) -> None:
+        # Nonexistent path: treat as empty, do not raise.
+        s = compute_stats(tmp_path / "ghost.csv")
+        assert s["n_total"] == 0
+
+    def test_calibration_rows_excluded(self, tmp_path: Path) -> None:
|
|
||||||
assert (lo1, hi1) == (lo2, hi2)
|
|
||||||
|
|
||||||
def test_different_seed_different_result(self) -> None:
|
|
||||||
vals = [0.5, -1.0, 0.5, 0.5, -1.0, 0.2, -0.3, 0.5, -1.0, 0.5]
|
|
||||||
r1 = bootstrap_ci(vals, iterations=500, seed=1)
|
|
||||||
r2 = bootstrap_ci(vals, iterations=500, seed=2)
|
|
||||||
assert r1 != r2
|
|
||||||
|
|
||||||
def test_brackets_the_mean(self) -> None:
|
|
||||||
vals = [0.5, -1.0, 0.5, 0.5, -1.0, 0.2, -0.3, 0.5, -1.0, 0.5] * 5
|
|
||||||
mean = sum(vals) / len(vals)
|
|
||||||
lo, hi = bootstrap_ci(vals, iterations=1000, seed=7)
|
|
||||||
assert lo <= mean <= hi
|
|
||||||
|
|
||||||
def test_empty_input(self) -> None:
|
|
||||||
assert bootstrap_ci([], iterations=100, seed=0) == (0.0, 0.0)
|
|
||||||
|
|
||||||
def test_single_value(self) -> None:
|
|
||||||
lo, hi = bootstrap_ci([0.5], iterations=100, seed=0)
|
|
||||||
# No variance with n=1: short-circuited to (mean, mean).
|
|
||||||
assert lo == pytest.approx(0.5)
|
|
||||||
assert hi == pytest.approx(0.5)
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Loading + group stats on the 30-trade fixture
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
class TestSyntheticFixture:
|
|
||||||
def test_load_30(self, tmp_path: Path) -> None:
|
|
||||||
path = _synthetic_30(tmp_path)
|
|
||||||
trades = load_trades(path)
|
|
||||||
assert len(trades) == 30
|
|
||||||
assert all(t.source == "vision" for t in trades)
|
|
||||||
|
|
||||||
def test_overall_wr(self, tmp_path: Path) -> None:
|
|
||||||
trades = load_trades(_synthetic_30(tmp_path))
|
|
||||||
wins, n, wr = win_rate(trades)
|
|
||||||
assert wins == 17
|
|
||||||
assert n == 30
|
|
||||||
assert wr == pytest.approx(17 / 30)
|
|
||||||
|
|
||||||
def test_overall_expectancy(self, tmp_path: Path) -> None:
|
|
||||||
trades = load_trades(_synthetic_30(tmp_path))
|
|
||||||
# 17 wins * 0.5 + 13 losses * -1.0 = 8.5 - 13.0 = -4.5 → mean = -0.15
|
|
||||||
assert expectancy(trades) == pytest.approx(-0.15, abs=1e-9)
|
|
||||||
|
|
||||||
def test_per_set_wr(self, tmp_path: Path) -> None:
|
|
||||||
trades = load_trades(_synthetic_30(tmp_path))
|
|
||||||
by_set = group_by(trades, "set")
|
|
||||||
wr_a1 = win_rate(by_set["A1"])[2]
|
|
||||||
wr_a2 = win_rate(by_set["A2"])[2]
|
|
||||||
wr_a3 = win_rate(by_set["A3"])[2]
|
|
||||||
assert wr_a1 == pytest.approx(0.60)
|
|
||||||
assert wr_a2 == pytest.approx(0.70)
|
|
||||||
assert wr_a3 == pytest.approx(0.40)
|
|
||||||
|
|
||||||
def test_group_stats_a2(self, tmp_path: Path) -> None:
|
|
||||||
trades = load_trades(_synthetic_30(tmp_path))
|
|
||||||
a2 = [t for t in trades if t.set == "A2"]
|
|
||||||
s = compute_group_stats(
|
|
||||||
a2, label="A2", bootstrap_iterations=500, seed=11
|
|
||||||
)
|
|
||||||
assert s.n_total == 10
|
|
||||||
assert s.n_resolved == 10
|
|
||||||
assert s.wins == 7
|
|
||||||
assert s.wr == pytest.approx(0.70)
|
|
||||||
# Wilson 7/10
|
|
||||||
assert s.wr_ci_lo == pytest.approx(0.3967732199795652, abs=1e-9)
|
|
||||||
assert s.wr_ci_hi == pytest.approx(0.892210712513788, abs=1e-9)
|
|
||||||
# Expectancy A2 = 7*0.5 + 3*(-1.0) = 0.5 → mean = 0.05
|
|
||||||
assert s.exp_marius == pytest.approx(0.05, abs=1e-9)
|
|
||||||
assert s.exp_marius_ci_lo <= s.exp_marius <= s.exp_marius_ci_hi
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Pending-trade handling
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
class TestPendingHandling:
|
|
||||||
def test_pending_excluded_from_wr(self, tmp_path: Path) -> None:
|
|
||||||
rows = [
|
rows = [
|
||||||
_base_row(
|
_base_row(id=1, source="vision", screenshot_file="v.png"),
|
||||||
id=1, screenshot_file="a.png",
|
_base_row(id=2, source="manual_calibration", screenshot_file="c.png"),
|
||||||
outcome_path="TP0→TP1", max_reached="TP1",
|
_base_row(id=3, source="vision_calibration", screenshot_file="c.png"),
|
||||||
be_moved="True", pl_marius="0.5000", pl_theoretical="0.3330",
|
|
||||||
),
|
|
||||||
_base_row(
|
|
||||||
id=2, screenshot_file="b.png",
|
|
||||||
outcome_path="pending", max_reached="TP0",
|
|
||||||
be_moved="False", pl_marius="", pl_theoretical="0.1330",
|
|
||||||
),
|
|
||||||
_base_row(
|
|
||||||
id=3, screenshot_file="c.png",
|
|
||||||
outcome_path="SL", max_reached="SL_first",
|
|
||||||
be_moved="False", pl_marius="-1.0000", pl_theoretical="-1.0000",
|
|
||||||
),
|
|
||||||
]
|
]
|
||||||
p = tmp_path / "j.csv"
|
path = tmp_path / "j.csv"
|
||||||
_write_csv(p, rows)
|
_write_csv(path, rows)
|
||||||
trades = load_trades(p)
|
s = compute_stats(path)
|
||||||
|
assert s["n_total"] == 1 # calibration rows filtered out
|
||||||
wins, n, wr = win_rate(trades)
|
|
||||||
assert wins == 1
|
|
||||||
assert n == 2 # pending excluded
|
|
||||||
assert wr == pytest.approx(0.5)
|
|
||||||
# Expectancy on pl_marius averages only resolved rows: (0.5 + -1.0) / 2 = -0.25
|
|
||||||
assert expectancy(trades, "pl_marius") == pytest.approx(-0.25)
|
|
||||||
|
|
||||||
|
|
||||||
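The pending-handling tests above pin down a contract without showing the implementation: rows whose P/L is still empty ("pending") must be excluded from both the win-rate denominator and the expectancy mean. A minimal sketch consistent with those assertions (the `Trade` shape and helper signatures here are hypothetical, not the actual `scripts` code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    pl_marius: Optional[float]  # None while the trade is still pending


def win_rate(trades: list[Trade]) -> tuple[int, int, float]:
    # Only resolved trades count; a win is strictly positive P/L.
    resolved = [t for t in trades if t.pl_marius is not None]
    wins = sum(1 for t in resolved if t.pl_marius > 0)
    n = len(resolved)
    return wins, n, (wins / n if n else 0.0)


def expectancy(trades: list[Trade]) -> float:
    # Mean P/L over resolved trades only; 0.0 for an empty set.
    resolved = [t.pl_marius for t in trades if t.pl_marius is not None]
    return sum(resolved) / len(resolved) if resolved else 0.0
```

With one win (+0.5), one pending row, and one stop (-1.0), this yields the values the tests expect: win rate 1/2 and expectancy -0.25.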
 # ---------------------------------------------------------------------------
-# Source filtering: calibration rows excluded from main report
+# render_stats
 # ---------------------------------------------------------------------------


-class TestSourceFiltering:
-    def test_calibration_rows_excluded_from_backtest_stats(
-        self, tmp_path: Path
-    ) -> None:
-        rows = [
-            _base_row(id=1, source="vision", screenshot_file="v.png",
-                      pl_marius="0.5000"),
-            _base_row(id=2, source="manual", screenshot_file="m.png",
-                      pl_marius="0.5000"),
-            _base_row(id=3, source="manual_calibration", screenshot_file="c.png",
-                      pl_marius="-1.0000"),
-            _base_row(id=4, source="vision_calibration", screenshot_file="c.png",
-                      pl_marius="-1.0000"),
-        ]
-        p = tmp_path / "j.csv"
-        _write_csv(p, rows)
-        trades = load_trades(p)
-        backtest = [t for t in trades if t.source in BACKTEST_SOURCES]
-        assert len(backtest) == 2
-        wins, n, wr = win_rate(backtest)
-        assert (wins, n) == (2, 2)
-        assert wr == pytest.approx(1.0)
+class TestRenderStats:
+    def test_render_stats_no_crash(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        s = compute_stats(path)
+        out = render_stats(s, "pl_marius")
+        assert isinstance(out, str)
+        assert out  # non-empty
+        assert "STOPPING RULE" in out
+
+    def test_render_stats_contains_sections(self, tmp_path: Path) -> None:
+        path = _synthetic_csv(tmp_path)
+        out = render_stats(compute_stats(path), "pl_marius")
+        for marker in (
+            "Stats jurnal.csv",
+            "Trade-uri totale",
+            "GLOBAL",
+            "PER SET:",
+            "PER CALITATE",
+            "PER DIRECȚIE",
+            "DESCRIPTOR ONLY",
+        ):
+            assert marker in out, f"missing section: {marker!r}"
+
+    def test_render_stats_flags_under_threshold(self, tmp_path: Path) -> None:
+        """All Sets in synthetic fixture have N<40 → all should be flagged."""
+        path = _synthetic_csv(tmp_path)
+        out = render_stats(compute_stats(path), "pl_marius")
+        for k in ("A1", "A2", "B", "D"):
+            assert f"{k}: N=" in out
+        assert "NEEDS MORE DATA" in out
+
+    def test_render_stats_empty(self, tmp_path: Path) -> None:
+        path = tmp_path / "empty.csv"
+        _write_csv(path, [])
+        out = render_stats(compute_stats(path), "pl_marius")
+        assert "Trade-uri totale: 0" in out
+        # No crash, no per-Set table for an empty dataset.
+        assert "NEEDS MORE DATA" not in out


 # ---------------------------------------------------------------------------
-# Calibration mode: pairing + mismatch
+# compute_calibration
 # ---------------------------------------------------------------------------


-class TestCalibration:
-    def test_pairs_and_zero_mismatch(self, tmp_path: Path) -> None:
-        m = _base_row(
-            id=1, source="manual_calibration", screenshot_file="cal-1.png"
-        )
-        v = _base_row(
-            id=2, source="vision_calibration", screenshot_file="cal-1.png"
-        )
-        p = tmp_path / "j.csv"
-        _write_csv(p, [m, v])
-        trades = load_trades(p)
-        rep = calibration_mismatch(trades)
-        assert rep.pairs == 1
-        assert sum(rep.field_mismatches.values()) == 0
-        assert rep.overall_mismatch_rate == 0.0
-
-    def test_one_field_mismatch(self, tmp_path: Path) -> None:
-        m = _base_row(
-            id=1, source="manual_calibration", screenshot_file="cal-1.png",
-            entry="400.0",
-        )
-        v = _base_row(
-            id=2, source="vision_calibration", screenshot_file="cal-1.png",
-            entry="400.10",  # different entry
-        )
-        p = tmp_path / "j.csv"
-        _write_csv(p, [m, v])
-        trades = load_trades(p)
-        rep = calibration_mismatch(trades)
-        assert rep.pairs == 1
-        assert rep.field_mismatches["entry"] == 1
-        # all other core fields match
+class TestComputeCalibration:
+    def test_compute_calibration_pairs(self, tmp_path: Path) -> None:
+        rows: list[dict[str, str]] = []
+        for i in range(5):
+            f = f"cal-{i}.png"
+            rows.append(_base_row(
+                id=i * 2 + 1, source="manual_calibration", screenshot_file=f
+            ))
+            rows.append(_base_row(
+                id=i * 2 + 2, source="vision_calibration", screenshot_file=f
+            ))
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        cal = compute_calibration(path)
+        assert cal["n_pairs"] == 5
         for fld in CORE_CALIBRATION_FIELDS:
-            if fld == "entry":
-                continue
-            assert rep.field_mismatches[fld] == 0
-        # 1 mismatch / (1 pair * 8 fields) = 12.5%
-        assert rep.overall_mismatch_rate == pytest.approx(1.0 / len(CORE_CALIBRATION_FIELDS))
+            assert fld in cal["fields"]
+            # All identical → 5 matches, 0 mismatches per field.
+            assert cal["fields"][fld]["match"] == 5
+            assert cal["fields"][fld]["mismatch"] == 0
+            assert cal["fields"][fld]["match_rate"] == pytest.approx(1.0)

-    def test_unpaired_rows_ignored(self, tmp_path: Path) -> None:
-        # Only a manual leg — no pair → 0 pairs.
-        m = _base_row(
-            id=1, source="manual_calibration", screenshot_file="lonely.png"
-        )
-        p = tmp_path / "j.csv"
-        _write_csv(p, [m])
-        trades = load_trades(p)
-        rep = calibration_mismatch(trades)
-        assert rep.pairs == 0
-        assert rep.total_comparisons == 0
-        assert rep.overall_mismatch_rate == 0.0
+    def test_compute_calibration_mismatch_examples(self, tmp_path: Path) -> None:
+        """Modify entry on 2 pairs → mismatch_examples contains both."""
+        rows: list[dict[str, str]] = []
+        for i in range(5):
+            f = f"cal-{i}.png"
+            manual_entry = "400.0"
+            # First two pairs differ on entry; the rest match exactly.
+            vision_entry = "401.5" if i < 2 else "400.0"
+            rows.append(_base_row(
+                id=i * 2 + 1, source="manual_calibration",
+                screenshot_file=f, entry=manual_entry,
+            ))
+            rows.append(_base_row(
+                id=i * 2 + 2, source="vision_calibration",
+                screenshot_file=f, entry=vision_entry,
+            ))
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        cal = compute_calibration(path)
+        assert cal["n_pairs"] == 5
+        entry = cal["fields"]["entry"]
+        assert entry["match"] == 3
+        assert entry["mismatch"] == 2
+        assert entry["match_rate"] == pytest.approx(3 / 5)
+        assert len(entry["mismatch_examples"]) == 2
+        for ex in entry["mismatch_examples"]:
+            assert "manual=" in ex and "vision=" in ex

-    def test_numeric_equivalence_tolerated(self, tmp_path: Path) -> None:
-        """'400' and '400.0000' should NOT count as a mismatch on entry."""
-        m = _base_row(
-            id=1, source="manual_calibration", screenshot_file="cal-1.png",
-            entry="400",
-        )
-        v = _base_row(
-            id=2, source="vision_calibration", screenshot_file="cal-1.png",
-            entry="400.0000",
-        )
-        p = tmp_path / "j.csv"
-        _write_csv(p, [m, v])
-        rep = calibration_mismatch(load_trades(p))
-        assert rep.field_mismatches["entry"] == 0
+    def test_calibration_examples_capped_at_3(self, tmp_path: Path) -> None:
+        """5 mismatches but mismatch_examples is capped at 3."""
+        rows: list[dict[str, str]] = []
+        for i in range(5):
+            f = f"cal-{i}.png"
+            rows.append(_base_row(
+                id=i * 2 + 1, source="manual_calibration",
+                screenshot_file=f, entry="400.0",
+            ))
+            rows.append(_base_row(
+                id=i * 2 + 2, source="vision_calibration",
+                screenshot_file=f, entry="500.0",
+            ))
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        cal = compute_calibration(path)
+        assert cal["fields"]["entry"]["mismatch"] == 5
+        assert len(cal["fields"]["entry"]["mismatch_examples"]) == 3

-
-# ---------------------------------------------------------------------------
-# Report formatting + CLI
-# ---------------------------------------------------------------------------
-
-
-class TestReporting:
-    def test_format_report_contains_sections(self, tmp_path: Path) -> None:
-        out = format_report(
-            load_trades(_synthetic_30(tmp_path)),
-            bootstrap_iterations=200,
-            seed=0,
-        )
-        assert "M2D Backtest Stats" in out
-        assert "Overall" in out
-        assert "By Set" in out
-        assert "A1" in out and "A2" in out and "A3" in out
-        # calitate warning present
-        assert "descriptor only" in out.lower() or "biased" in out.lower()
-
-    def test_format_calibration_report(self, tmp_path: Path) -> None:
+    def test_calibration_numeric_tolerance(self, tmp_path: Path) -> None:
+        """Floats within 0.01 must NOT count as a mismatch."""
         rows = [
             _base_row(
-                id=1, source="manual_calibration", screenshot_file="cal-1.png"
+                id=1, source="manual_calibration",
+                screenshot_file="cal-1.png", entry="400.005",
             ),
             _base_row(
-                id=2, source="vision_calibration", screenshot_file="cal-1.png",
-                directie="Sell",  # mismatch on directie
-                entry="400.0", sl="401.0", tp0="399.5", tp1="399.0", tp2="398.0",
+                id=2, source="vision_calibration",
+                screenshot_file="cal-1.png", entry="400.010",
             ),
         ]
-        p = tmp_path / "j.csv"
-        _write_csv(p, rows)
-        out = format_calibration_report(load_trades(p))
-        assert "Paired screenshots" in out
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        cal = compute_calibration(path)
+        assert cal["fields"]["entry"]["match"] == 1
+        assert cal["fields"]["entry"]["mismatch"] == 0
+
+    def test_calibration_outside_tolerance(self, tmp_path: Path) -> None:
+        """Floats > 0.01 apart DO count as a mismatch."""
+        rows = [
+            _base_row(
+                id=1, source="manual_calibration",
+                screenshot_file="cal-1.png", entry="400.00",
+            ),
+            _base_row(
+                id=2, source="vision_calibration",
+                screenshot_file="cal-1.png", entry="400.05",
+            ),
+        ]
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        cal = compute_calibration(path)
+        assert cal["fields"]["entry"]["mismatch"] == 1
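The tolerance tests above fix the comparison semantics for calibration fields: values that parse as numbers are matched within an absolute tolerance of 0.01 (so `"400"` vs `"400.0000"` and `"400.005"` vs `"400.010"` agree), and everything else falls back to string equality. The real comparator in `scripts/stats` isn't shown in the diff; a minimal sketch of that rule (`fields_match` is a hypothetical name):

```python
def fields_match(manual: str, vision: str, tol: float = 0.01) -> bool:
    """Numeric-aware field comparison for calibration pairs.

    If both values parse as floats, they match when within `tol` of each
    other; otherwise fall back to a plain (whitespace-trimmed) string compare.
    """
    try:
        return abs(float(manual) - float(vision)) <= tol
    except ValueError:
        return manual.strip() == vision.strip()
```

This reproduces every case asserted above: `"400.00"` vs `"400.05"` is a mismatch (0.05 > 0.01), while `"400.005"` vs `"400.010"` is not.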
+    def test_calibration_no_pairs(self, tmp_path: Path) -> None:
+        """No paired screenshot → n_pairs=0, all rates 0.0."""
+        path = tmp_path / "j.csv"
+        _write_csv(path, [
+            _base_row(id=1, source="manual_calibration", screenshot_file="lonely.png"),
+        ])
+        cal = compute_calibration(path)
+        assert cal["n_pairs"] == 0
+        for fld in CORE_CALIBRATION_FIELDS:
+            assert cal["fields"][fld]["match"] == 0
+            assert cal["fields"][fld]["mismatch"] == 0
+
+    def test_render_calibration_no_crash(self, tmp_path: Path) -> None:
+        rows = [
+            _base_row(id=1, source="manual_calibration",
+                      screenshot_file="cal-1.png", directie="Buy"),
+            _base_row(id=2, source="vision_calibration",
+                      screenshot_file="cal-1.png", directie="Sell",
+                      entry="400.0", sl="401.0", tp0="399.5",
+                      tp1="399.0", tp2="398.0"),
+        ]
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        out = render_calibration(compute_calibration(path))
+        assert "Calibration P4" in out
         assert "directie" in out
-        # 1 mismatch (directie) of 8 fields = 12.5% → FAIL P4 gate
-        assert "FAIL" in out

-    def test_empty_csv_report(self, tmp_path: Path) -> None:
-        p = tmp_path / "empty.csv"
-        _write_csv(p, [])
-        out = format_report(load_trades(p))
-        assert "no backtest trades" in out.lower()
+    def test_render_calibration_empty(self, tmp_path: Path) -> None:
+        path = tmp_path / "empty.csv"
+        _write_csv(path, [])
+        out = render_calibration(compute_calibration(path))
+        assert "0" in out
+        assert "FAIL" not in out
+        assert "PASS" not in out

-    def test_main_cli_runs(
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+class TestCLI:
+    def test_main_stats(
         self, tmp_path: Path, capsys: pytest.CaptureFixture
     ) -> None:
-        path = _synthetic_30(tmp_path)
-        rc = main(["--csv", str(path), "--seed", "0", "--bootstrap-iterations", "100"])
+        path = _synthetic_csv(tmp_path)
+        rc = main(["--csv", str(path)])
         assert rc == 0
-        captured = capsys.readouterr()
-        assert "M2D Backtest Stats" in captured.out
+        assert "Stats jurnal.csv" in capsys.readouterr().out

-    def test_main_cli_calibration(
+    def test_main_overlay(
+        self, tmp_path: Path, capsys: pytest.CaptureFixture
+    ) -> None:
+        path = _synthetic_csv(tmp_path)
+        rc = main(["--csv", str(path), "--overlay", "pl_theoretical"])
+        assert rc == 0
+        assert "pl_theoretical" in capsys.readouterr().out
+
+    def test_main_calibration(
         self, tmp_path: Path, capsys: pytest.CaptureFixture
     ) -> None:
         rows = [
-            _base_row(id=1, source="manual_calibration", screenshot_file="cal-1.png"),
-            _base_row(id=2, source="vision_calibration", screenshot_file="cal-1.png"),
+            _base_row(id=1, source="manual_calibration",
+                      screenshot_file="cal-1.png"),
+            _base_row(id=2, source="vision_calibration",
+                      screenshot_file="cal-1.png"),
         ]
-        p = tmp_path / "j.csv"
-        _write_csv(p, rows)
-        rc = main(["--csv", str(p), "--calibration"])
+        path = tmp_path / "j.csv"
+        _write_csv(path, rows)
+        rc = main(["--csv", str(path), "--calibration"])
         assert rc == 0
         out = capsys.readouterr().out
-        assert "Calibration P4 gate" in out
-        assert "PASS" in out  # all fields match → PASS
+        assert "Calibration P4" in out
+        assert "PASS" in out
83 tests/test_stats_ci.py Normal file
@@ -0,0 +1,83 @@
"""Pure-math tests for stats CI primitives (no I/O)."""

from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from scripts.stats import bootstrap_expectancy_ci, wilson_ci  # noqa: E402


# ---------------------------------------------------------------------------
# Wilson CI
# ---------------------------------------------------------------------------


class TestWilsonCI:
    def test_wilson_n_zero(self) -> None:
        assert wilson_ci(0, 0) == (0.0, 0.0)

    def test_wilson_perfect_winrate(self) -> None:
        lo, hi = wilson_ci(10, 10)
        assert lo > 0.65
        assert hi == pytest.approx(1.0, abs=1e-12)

    def test_wilson_reference_15_55(self) -> None:
        """wins=8, n=15 (WR≈53%) → CI approximately [29%, 76%] ±2%."""
        lo, hi = wilson_ci(8, 15)
        assert lo == pytest.approx(0.29, abs=0.02)
        assert hi == pytest.approx(0.76, abs=0.02)

    def test_wilson_all_losses(self) -> None:
        lo, hi = wilson_ci(0, 10)
        assert lo == pytest.approx(0.0, abs=1e-12)
        assert hi < 0.35

    def test_wilson_wins_out_of_range(self) -> None:
        with pytest.raises(ValueError):
            wilson_ci(11, 10)
        with pytest.raises(ValueError):
            wilson_ci(-1, 10)

    def test_wilson_clamps_at_50pct_n40(self) -> None:
        """Reference at WR=50%, N=40: CI ≈ [35.2%, 64.8%]."""
        lo, hi = wilson_ci(20, 40)
        assert lo == pytest.approx(0.352, abs=0.005)
        assert hi == pytest.approx(0.648, abs=0.005)
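These reference values pin `wilson_ci` to the standard Wilson score interval for a binomial proportion. The module's actual implementation isn't shown in the diff; a minimal sketch that reproduces the asserted intervals with z = 1.96 (a hypothetical stand-in, not the `scripts.stats` source):

```python
import math


def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for `wins` successes out of `n` trials."""
    if n == 0:
        return (0.0, 0.0)
    if wins < 0 or wins > n:
        raise ValueError("wins must be in [0, n]")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - spread), min(1.0, center + spread))
```

At WR = 50% with N = 40 this gives the [0.352, 0.648] interval asserted above, and at 0 wins the lower bound collapses to exactly 0.0.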
# ---------------------------------------------------------------------------
# Bootstrap CI
# ---------------------------------------------------------------------------


class TestBootstrap:
    def test_bootstrap_deterministic(self) -> None:
        values = [1.0, -0.5, 0.5, -1.0]
        a = bootstrap_expectancy_ci(values, n_resamples=1000, seed=42)
        b = bootstrap_expectancy_ci(values, n_resamples=1000, seed=42)
        assert a == b

    def test_bootstrap_different_seed_different_result(self) -> None:
        values = [1.0, -0.5, 0.5, -1.0, 0.2, -0.3, 0.5]
        a = bootstrap_expectancy_ci(values, n_resamples=1000, seed=1)
        b = bootstrap_expectancy_ci(values, n_resamples=1000, seed=2)
        assert a != b

    def test_bootstrap_empty(self) -> None:
        assert bootstrap_expectancy_ci([], n_resamples=100, seed=0) == (0.0, 0.0)

    def test_bootstrap_single_value(self) -> None:
        lo, hi = bootstrap_expectancy_ci([0.5], n_resamples=100, seed=0)
        assert lo == pytest.approx(0.5, abs=1e-9)
        assert hi == pytest.approx(0.5, abs=1e-9)

    def test_bootstrap_brackets_the_mean(self) -> None:
        values = [0.5, -1.0, 0.5, 0.5, -1.0, 0.2, -0.3, 0.5, -1.0, 0.5] * 5
        mean = sum(values) / len(values)
        lo, hi = bootstrap_expectancy_ci(values, n_resamples=1000, seed=7)
        assert lo <= mean <= hi
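The properties tested here (seed determinism, `(0.0, 0.0)` for empty input, a degenerate interval for a single value, and an interval that brackets the sample mean) describe a seeded percentile bootstrap of the mean. As a rough sketch of such a function under those assumptions (this is not the `scripts.stats` source; parameter names mirror the tests):

```python
import random


def bootstrap_expectancy_ci(
    values: list[float],
    n_resamples: int = 1000,
    seed: int = 0,
    alpha: float = 0.05,
) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean, deterministic for a given seed."""
    if not values:
        return (0.0, 0.0)
    rng = random.Random(seed)  # seeded RNG → reproducible resamples
    n = len(values)
    # Resample with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * (n_resamples - 1))]
    hi = means[int((1 - alpha / 2) * (n_resamples - 1))]
    return (lo, hi)
```

With a single input value every resample mean is identical, so the interval collapses to `(mean, mean)`, matching `test_bootstrap_single_value`.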