commands: m2d-log + backtest + batch + stats slash commands (124 tests pass)

This commit is contained in:
Marius
2026-05-13 12:48:26 +03:00
parent 26d084dc4b
commit 34af5b631e
7 changed files with 1111 additions and 730 deletions

View File

@@ -0,0 +1,73 @@
---
description: Run vision extraction on a single TradeStation screenshot, then append to jurnal CSV + regenerate MD.
argument-hint: "<screenshot_path_or_basename> [--calibration]"
---
# /backtest — single screenshot vision extraction
Launches the `m2d-extractor` subagent on a screenshot, receives the JSON, appends to `data/jurnal.csv`, and regenerates `data/jurnal.md`.
## Arguments
- `$1` (required) — path to the screenshot. Accepts:
- a basename (`2026-05-13-dia-1645.png`) — looked up in `screenshots/inbox/`, with `screenshots/processed/` as fallback
- an explicit relative or absolute path
- `--calibration` (flag) — sets `source=vision_calibration` instead of `source=vision`. Used together with `/m2d-log --calibration` on the same screenshot for the P4 mismatch report.
## Workflow
1. **Resolve the screenshot path.** If `$1` is a bare basename, try `screenshots/inbox/<basename>`, then `screenshots/processed/<basename>`. If it exists in neither, report the error and stop.
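The resolution order in step 1 can be sketched as a small helper (the `resolve_screenshot` name is hypothetical; the inbox/processed layout is the one described above):

```python
from pathlib import Path

def resolve_screenshot(arg: str) -> Path:
    """Resolve $1: a bare basename is searched in inbox/ first, then
    processed/; an explicit relative/absolute path is used as-is."""
    candidate = Path(arg)
    if candidate.parent != Path("."):  # an explicit path was given
        if candidate.exists():
            return candidate
        raise FileNotFoundError(arg)
    for root in (Path("screenshots/inbox"), Path("screenshots/processed")):
        if (root / arg).exists():
            return root / arg
    raise FileNotFoundError(f"{arg} not found in inbox/ or processed/")
```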
2. **Invoke the `m2d-extractor` subagent** (defined in `.claude/agents/m2d-extractor.md`) via the Task tool with `subagent_type: "m2d-extractor"`. Prompt sent to the agent:
```
screenshot_path: <absolute_path>
screenshot_file: <basename>
```
The agent writes `data/extractions/<basename_no_ext>.json` + `.log` and returns a short status line.
3. **Verify the output**:
- If `data/extractions/<basename_no_ext>.json` does not exist after the agent returns → error; report it and move the screenshot to `screenshots/needs_review/`.
- Read the JSON. If `confidence == "low"` OR `ambiguities` is non-empty with `image_unreadable` → move the screenshot to `screenshots/needs_review/`, report, and do not call append.
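The gating in step 3 can be sketched as one check (hypothetical helper name; the `confidence` and `ambiguities` field names come from the extraction JSON described in this document):

```python
import json
from pathlib import Path

def extraction_verdict(json_path: Path) -> str:
    """Return "needs_review" when the agent produced no output, or when
    the extraction is low-confidence / unreadable; otherwise "append"."""
    if not json_path.exists():
        return "needs_review"  # agent returned without writing output
    data = json.loads(json_path.read_text(encoding="utf-8"))
    if data.get("confidence") == "low":
        return "needs_review"
    if "image_unreadable" in data.get("ambiguities", []):
        return "needs_review"
    return "append"
```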
4. **Append to CSV**:
```bash
python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.json'), source='<source>'); print(json.dumps(r, default=str))"
```
`<source>` = `vision_calibration` if `--calibration` was given, otherwise `vision`.
Parse the response. If `status == "rejected"`:
- `reason` contains "duplicate" → the screenshot was already processed with this source; report it and do NOT move the file.
- `reason` contains "validation error" → the agent's JSON was rejected; move the screenshot to `screenshots/needs_review/` and report.
- Any other error → report it and leave the screenshot where it is.
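The three rejection branches above amount to a small dispatch (a sketch; the function and action names are hypothetical labels for the moves described in step 4):

```python
def rejection_action(reason: str) -> str:
    """Map an append_extraction rejection reason to the action in step 4."""
    if "duplicate" in reason:
        return "report_only"           # already in CSV — do not move the file
    if "validation error" in reason:
        return "move_to_needs_review"  # agent JSON was rejected
    return "leave_in_place"            # any other error
```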
5. **Move the screenshot** to `screenshots/processed/<basename>` if the append succeeded and the original file was in `inbox/`. If it was already in `processed/`, do not move it.
6. **Regenerate the MD**:
```bash
python scripts/regenerate_md.py
```
7. **Final report** (in Romanian):
```
/backtest <basename> → trade #<id> adăugat (source=<source>, set=<set>, pl_marius=<pl>, confidence=<conf>).
Regenerat data/jurnal.md (<total> rânduri).
```
If the screenshot was moved to `needs_review`:
```
/backtest <basename> → NEEDS REVIEW: <motiv>. Mutat la screenshots/needs_review/<basename>.
```
## Rules
- A single invocation per screenshot. Do not re-run the agent when the output looks dubious — move the file to `needs_review` and report.
- Do NOT edit the CSV directly.
- Do NOT regenerate the MD if the append was rejected.
- Path discipline: the subagent writes only to `data/extractions/`; you (the slash command) move screenshots and call scripts/.

.claude/commands/batch.md Normal file (95 lines)

@@ -0,0 +1,95 @@
---
description: Run vision extraction in parallel on multiple screenshots (default screenshots/inbox/), then serial-append the results with partial-failure handling.
argument-hint: "[dir_or_glob] [--limit N] [--calibration]"
---
# /batch — parallel vision extraction over multiple screenshots
Processes multiple screenshots. Launches up to **5 `m2d-extractor` subagents in parallel** (hard cap — protects the context window and rate limits). After they all return, the results are appended **serially** (`append_row` reads/writes the CSV — parallel writes mean guaranteed corruption).
## Arguments
- `$1` (optional) — directory or glob. Default `screenshots/inbox/`. Example: `screenshots/inbox/2025-09-*.png`.
- `--limit N` (optional) — process only the first N screenshots (in alphabetical order). Default: all of them.
- `--calibration` (flag) — `source=vision_calibration` instead of `vision`.
## Workflow
### Phase 1 — Collect the list
1. Enumerate the PNG/JPG files matching the argument. Sort alphabetically. Apply `--limit` if present.
2. If the list is empty → report "Nimic de procesat în <path>" and stop.
3. If the list has a single element → suggest `/backtest` instead, then continue with the batch.
### Phase 2 — Parallel extraction (max 5 concurrent)
Process in **batches of 5**. For each batch:
- Launch one Task tool call with `subagent_type: "m2d-extractor"` per screenshot, IN THE SAME MESSAGE (parallel tool calls). Prompt per agent:
```
screenshot_path: <absolute_path>
screenshot_file: <basename>
```
- Wait for all five to return. For each one, verify that `data/extractions/<basename_no_ext>.json` was written.
- Move on to the next batch of 5.
**Why 5**: beyond 5 parallel subagents you start saturating the orchestrator's context window with their outputs, and the API's rate limits. Hard cap.
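The batching above is plain list chunking; a minimal sketch (hypothetical helper name):

```python
def chunks(items: list[str], size: int = 5) -> list[list[str]]:
    """Split the screenshot list into waves of at most `size` parallel
    extractor invocations — the hard concurrency cap from this document."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```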
### Phase 3 — Serial append with partial-failure handling
For each screenshot in the original list, **in order**:
1. Check `data/extractions/<basename_no_ext>.json`:
- Missing → log "missing JSON, agent abort", move the screenshot to `screenshots/needs_review/`, continue with the next one.
- Read the JSON. If `confidence == "low"` OR `"image_unreadable" in ambiguities` → move to `needs_review/`, continue.
2. Call append:
```bash
python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.json'), source='<source>'); print(json.dumps(r, default=str))"
```
`<source>` = `vision_calibration` if `--calibration`, otherwise `vision`.
3. React to the result:
- `status == "ok"` → remember the ID, move the screenshot to `screenshots/processed/<basename>` if it was in the inbox.
- `status == "rejected"` and `reason` contains "duplicate" → count it as a skip; do NOT move the screenshot (already processed).
- `status == "rejected"` with any other reason → log the reason, move to `needs_review/`.
4. Do NOT stop the batch at the first failure. Continue to the end.
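The per-screenshot bucketing in step 3 can be sketched as one classifier over the `append_extraction` result dict (hypothetical helper; `status` and `reason` are the keys described in this document):

```python
def classify_result(result: dict) -> str:
    """Bucket one append result per Phase 3: "ok" (move to processed/),
    "duplicate" (skip, do not move) or "review" (move to needs_review/)."""
    if result.get("status") == "ok":
        return "ok"
    reason = result.get("reason", "")
    return "duplicate" if "duplicate" in reason else "review"
```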
### Phase 4 — Regenerate the MD once
After all appends have finished (even partially), run EXACTLY ONCE:
```bash
python scripts/regenerate_md.py
```
(Regenerating after every append is wasteful; the CSV is the source of truth, the MD is a mirror.)
### Phase 5 — Final report
Format:
```
/batch terminat. Procesat <total> screenshot-uri.
OK: <n_ok> (trade-uri #<id1>, #<id2>, ...)
Duplicate: <n_dup> (skipped — deja în CSV)
Needs review: <n_nr> (mutate la screenshots/needs_review/)
- <basename1>: <motiv>
- <basename2>: <motiv>
Erori: <n_err>
- <basename>: <reason>
Regenerat data/jurnal.md (<total_rows> rânduri).
```
## Reguli
- **Cap concurrency la 5**. Niciodată mai mulți subagenți paraleli — chiar și pentru un batch mare. Procesezi în secvențe de batch-uri de 5.
- **Append serial obligatoriu**. `append_extraction` citește CSV-ul, computează `next_id` și scrie atomic; rulat în paralel ar duce la ID-uri duplicat sau pierderi.
- **Partial failure = continuă**. Un screenshot prost nu blochează restul batch-ului.
- **MD regen o singură dată** la final.
- **Path discipline pentru subagent neschimbată**: agentul scrie doar la `data/extractions/`. Tu, ca orchestrator, muți screenshot-uri.

View File

@@ -0,0 +1,99 @@
---
description: Manually add a row to jurnal.csv (source=manual or manual_calibration). For P4 calibration or forward paper.
argument-hint: "[--calibration] <screenshot_path>"
---
# /m2d-log — manual M2D trade entry
Marius manually extracts ALL of the trade's fields. Used for P4 calibration (together with `/backtest --calibration` on the same screenshot) or as a direct log without vision.
## Workflow
1. **Parse `$ARGUMENTS`** — detect the `--calibration` flag and `<screenshot_path>`. If `<screenshot_path>` is missing, ask the user. Compute `basename = basename(<screenshot_path>)` and `basename_no_ext = basename` minus its last extension.
2. **Prompt the user in Romanian**, one field at a time, for every field of the `M2DExtraction` schema (see `scripts/vision_schema.py`). Order + valid options:
- `data` — `YYYY-MM-DD`
- `ora_utc` — `HH:MM` (converted from RO local time: EEST=UTC+3 in summer, EET=UTC+2 in winter; ask the user directly if unclear)
- `instrument` — `DIA` / `US30` / `other`
- `directie` — `Buy` / `Sell`
- `tf_mare` — `5min` / `15min`
- `tf_mic` — `1min` / `3min`
- `calitate` — `Clară` / `Mai mare ca impuls` / `Slabă` / `n/a`
- `entry`, `sl`, `tp0`, `tp1`, `tp2` — floats
- `risc_pct` — float (e.g. `0.12` for 0.12%)
- `outcome_path` — `SL` / `TP0→SL` / `TP0→TP1` / `TP0→TP2` / `TP0→pending` / `pending` (UNICODE `→`)
- `max_reached` — `SL_first` / `TP0` / `TP1` / `TP2`
- `be_moved` — `true` / `false`
- `confidence` — default `high` (manual entry is by definition high)
- `note` — optional string, default `""`
`screenshot_file` is set automatically to `basename`; `ambiguities` is set automatically to `[]`. If the user gives an invalid value, repeat the question.
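The per-field re-prompt loop implied by step 2 can be sketched as a validator table (a sketch only — the table covers a few of the fields above; the `VALIDATORS` name is hypothetical):

```python
import re

# An answer that fails its validator triggers a re-prompt, per step 2.
VALIDATORS = {
    "data": lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None,
    "ora_utc": lambda s: re.fullmatch(r"(?:[01]\d|2[0-3]):[0-5]\d", s) is not None,
    "instrument": lambda s: s in {"DIA", "US30", "other"},
    "directie": lambda s: s in {"Buy", "Sell"},
    "outcome_path": lambda s: s in {
        "SL", "TP0→SL", "TP0→TP1", "TP0→TP2", "TP0→pending", "pending"
    },
    "be_moved": lambda s: s in {"true", "false"},
}
```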
3. **Build the complete JSON**, valid against `M2DExtraction`.
4. **Write the JSON** to `data/extractions/<basename_no_ext>.manual.json` — pretty-printed with indent 2, UTF-8, trailing newline. The `.manual` suffix prevents collisions with the vision output (`<basename_no_ext>.json`).
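Step 4 as a sketch (hypothetical helper name; the format details — indent 2, UTF-8, trailing newline, `.manual` suffix — are the ones stated above):

```python
import json
from pathlib import Path

def write_manual_extraction(payload: dict, basename_no_ext: str) -> Path:
    """Write the manual extraction as pretty-printed UTF-8 JSON with a
    trailing newline, under the .manual suffix so it never collides with
    the vision output <basename_no_ext>.json."""
    out = Path("data/extractions") / f"{basename_no_ext}.manual.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
    return out
```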
5. **Determine the source**: `manual_calibration` if `--calibration` is present, otherwise `manual`.
6. **Append to CSV**:
```bash
python -c "from pathlib import Path; from scripts.append_row import append_extraction; import json; r = append_extraction(Path('data/extractions/<basename_no_ext>.manual.json'), source='<source>'); print(json.dumps(r, default=str))"
```
Parse the JSON response.
7. **If `status == "ok"`**:
```bash
python -m scripts.regenerate_md
```
Then display:
```
✅ Trade adăugat la jurnal. ID: <id>. Set: <set>. P/L Marius: <pl_marius>. outcome_path: <outcome_path>.
```
8. **If `status == "rejected"`**:
```
❌ Trade respins: <reason>
```
Do NOT regenerate the MD. If `reason` contains "duplicate":
- with `--calibration`: tell the user that a `manual_calibration` row already exists for this screenshot; there cannot be two manual calibration legs for the same screenshot.
- with plain `source=manual`: the user decides whether to overwrite (in that case they manually delete the row from `data/jurnal.csv` and re-run).
## Rules
- Do NOT edit the CSV directly — use `append_extraction`.
- Do NOT regenerate the MD if the append was rejected.
## Output skeleton JSON
```json
{
"screenshot_file": "2026-05-13-dia-1645.png",
"data": "2026-05-13",
"ora_utc": "14:45",
"instrument": "DIA",
"directie": "Buy",
"tf_mare": "5min",
"tf_mic": "1min",
"calitate": "Clară",
"entry": 497.42,
"sl": 496.80,
"tp0": 497.67,
"tp1": 497.79,
"tp2": 498.04,
"risc_pct": 0.12,
"outcome_path": "TP0→TP1",
"max_reached": "TP1",
"be_moved": true,
"confidence": "high",
"ambiguities": [],
"note": ""
}
```

.claude/commands/stats.md Normal file (42 lines)

@@ -0,0 +1,42 @@
---
description: Show backtest statistics for data/jurnal.csv (overall, per-Set, per-calitate, per-instrument with Wilson + bootstrap CIs). --calibration shows P4 mismatch report.
argument-hint: "[--calibration] [--seed N]"
---
# /stats — backtest statistics
Runs `scripts/stats.py` and displays the report.
## Arguments
- `--calibration` (flag) — shows the P4 report (field-by-field mismatches over `manual_calibration`/`vision_calibration` pairs joined on `screenshot_file`).
- `--seed N` (optional) — seed for the bootstrap RNG (default: no seed → non-deterministic output between runs). Use it when you want reproducibility.
Default (no flags): backtest stats — overall + per-Set + per-calitate + per-instrument WR, expectancy, Wilson 95% CI on WR, bootstrap 95% CI on expectancy.
## Workflow
1. Build the command:
```bash
python scripts/stats.py [--calibration] [--seed N]
```
`--csv data/jurnal.csv` is the script's default — do not pass it.
2. Run it via the Bash tool. The output comes on stdout in UTF-8.
3. Display the output **as-is** to the user. Do not reformat, re-summarise, or interpret it. The script already has a chosen format (table + text sections).
4. A short **interpretation** (max 3 sentences) IF the user asks explicitly, or if you notice something worth mentioning:
- In backtest mode: Sets with N ≥ 40 and a Wilson lower bound > 50% → GO LIVE candidate (see `STOPPING_RULE.md`).
- In `--calibration` mode: if there are ≥10 pairs and the mismatch rate is > 10% on the core fields (`entry/sl/tp0/1/2/outcome_path/max_reached/directie`) → P4 FAIL, the vision agent needs a fix (`.claude/agents/m2d-extractor.md`).
5. Do NOT edit the CSV. Do NOT regenerate the MD (pure read).
## Rules
- Read-only. This command writes nothing.
- The script's output is ground truth — never invent numbers.
- `calitate` is a biased descriptor (post-outcome) — see `STOPPING_RULE.md` §3 — the report shows it as informational only. Do NOT suggest that the user use `calitate` as a GO LIVE filter.
- For P4 calibration: a minimum of 10 pairs is needed for the verdict to be meaningful. Under 10 pairs → report "insuficient pentru P4 — continuă să acumulezi calibrare".
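The GO LIVE check from the interpretation step follows directly from the Wilson interval used by `scripts/stats.py`; a sketch (the N ≥ 40 and 50% thresholds are the ones cited above from `STOPPING_RULE.md`; the helper names are hypothetical):

```python
import math

def wilson_lower(wins: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson 95% score interval for wins/n."""
    if n == 0:
        return 0.0
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - spread)

def go_live_candidate(wins: int, n: int) -> bool:
    """GO LIVE gate: at least 40 resolved trades in the Set AND the
    Wilson 95% lower bound on the win rate above 50%."""
    return n >= 40 and wilson_lower(wins, n) > 0.50
```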

View File

@@ -1,21 +1,20 @@
"""Backtest statistics for ``data/jurnal.csv``. """Backtest statistics for ``data/jurnal.csv``.
Outputs: Public API:
- Overall + per-Set + per-calitate + per-instrument WR, expectancy. - ``compute_stats(csv_path, overlay) -> dict``
- Wilson 95% CI for WR (closed form). - ``render_stats(stats, overlay) -> str``
- Bootstrap percentile 95% CI for expectancy (deterministic via ``seed``). - ``compute_calibration(csv_path) -> dict``
- ``--calibration`` mode: joins ``manual_calibration`` rows with their - ``render_calibration(cal) -> str``
``vision_calibration`` counterparts on ``screenshot_file`` and reports - ``main()`` — CLI entry point.
field-by-field mismatch rates for the P4 gate (see ``STOPPING_RULE.md``).
A "win" is any trade with ``pl_marius > 0``. Pending trades A "win" is a closed trade with ``pl_overlay > 0`` (where ``pl_overlay`` is
(``pl_marius`` blank, i.e. ``outcome_path in {pending, TP0->pending}``) are either ``pl_marius`` or ``pl_theoretical``). Pending trades — ``pl_marius``
excluded from both WR and expectancy: there is no realised outcome yet. blank, i.e. ``outcome_path in {pending, TP0->pending}`` — are excluded from
both WR and expectancy: there is no realised outcome yet.
The ``calitate`` field is a known-biased descriptor (post-outcome The ``calitate`` field is a known-biased descriptor: it is classified
classification — see ``STOPPING_RULE.md`` §3). It is reported as post-outcome (see ``STOPPING_RULE.md`` §3). The per-``calitate`` split is
informational only and explicitly flagged as such; do NOT use it as a reported with an explicit *descriptor only — biased post-outcome* caveat.
filter for GO LIVE decisions.
""" """
from __future__ import annotations from __future__ import annotations
@@ -23,32 +22,42 @@ from __future__ import annotations
import argparse
import csv
import math
import sys
from pathlib import Path
from typing import Any, Iterable

import numpy as np

from scripts.append_row import CSV_COLUMNS

__all__ = [
    "BACKTEST_SOURCES",
    "CALIBRATION_SOURCES",
    "CORE_CALIBRATION_FIELDS",
    "NUMERIC_CALIBRATION_FIELDS",
    "STOPPING_RULE_N",
    "wilson_ci",
    "bootstrap_expectancy_ci",
    "compute_stats",
    "render_stats",
    "compute_calibration",
    "render_calibration",
    "main",
]

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

BACKTEST_SOURCES: frozenset[str] = frozenset({"vision", "manual"})
CALIBRATION_SOURCES: frozenset[str] = frozenset(
    {"manual_calibration", "vision_calibration"}
)

# Calibration P4 gate (STOPPING_RULE.md §P4) — explicitly reported per field.
CORE_CALIBRATION_FIELDS: tuple[str, ...] = (
    "entry",
    "sl",
@@ -58,315 +67,205 @@ CORE_CALIBRATION_FIELDS: tuple[str, ...] = (
"outcome_path", "outcome_path",
"max_reached", "max_reached",
"directie", "directie",
"instrument",
) )
BACKTEST_SOURCES: frozenset[str] = frozenset({"vision", "manual"}) NUMERIC_CALIBRATION_FIELDS: frozenset[str] = frozenset(
CALIBRATION_SOURCES: frozenset[str] = frozenset( {"entry", "sl", "tp0", "tp1", "tp2"}
{"manual_calibration", "vision_calibration"}
) )
# STOPPING_RULE.md §"GO LIVE" gate: N >= 40 per Set.
STOPPING_RULE_N: int = 40
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Loading / typed access # Loading
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@dataclass(frozen=True)
class Trade:
"""One realised (or pending) trade row, typed."""
id: int
screenshot_file: str
source: str
data: str
zi: str
ora_ro: str
instrument: str
directie: str
calitate: str
set: str
outcome_path: str
max_reached: str
be_moved: bool
pl_marius: float | None
pl_theoretical: float
raw: dict[str, str] = field(default_factory=dict)
@property
def is_pending(self) -> bool:
return self.pl_marius is None
@property
def is_win(self) -> bool:
return self.pl_marius is not None and self.pl_marius > 0
def _parse_optional_float(value: str) -> float | None: def _parse_optional_float(value: str) -> float | None:
s = (value or "").strip() s = (value or "").strip()
if s == "": if s == "":
return None return None
return float(s) try:
return float(s)
except ValueError:
return None
def _parse_bool(value: str) -> bool: def _load_rows(csv_path: Path | str) -> list[dict[str, str]]:
return (value or "").strip().lower() in {"true", "1", "yes", "da"}
def _row_to_trade(row: dict[str, str]) -> Trade:
return Trade(
id=int(row.get("id") or 0),
screenshot_file=row.get("screenshot_file", ""),
source=row.get("source", ""),
data=row.get("data", ""),
zi=row.get("zi", ""),
ora_ro=row.get("ora_ro", ""),
instrument=row.get("instrument", ""),
directie=row.get("directie", ""),
calitate=row.get("calitate", ""),
set=row.get("set", ""),
outcome_path=row.get("outcome_path", ""),
max_reached=row.get("max_reached", ""),
be_moved=_parse_bool(row.get("be_moved", "")),
pl_marius=_parse_optional_float(row.get("pl_marius", "")),
pl_theoretical=float(row.get("pl_theoretical") or 0.0),
raw=dict(row),
)
def load_trades(csv_path: Path | str) -> list[Trade]:
"""Load all rows of ``csv_path`` as :class:`Trade` objects.
Returns ``[]`` if the file does not exist or is empty.
"""
p = Path(csv_path) p = Path(csv_path)
if not p.exists() or p.stat().st_size == 0: if not p.exists() or p.stat().st_size == 0:
return [] return []
with p.open("r", encoding="utf-8", newline="") as fh: with p.open("r", encoding="utf-8", newline="") as fh:
reader = csv.DictReader(fh) return list(csv.DictReader(fh))
return [_row_to_trade(r) for r in reader]
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Statistics primitives # CI primitives
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]: def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
"""Wilson score interval for a binomial proportion. """Wilson score interval for a binomial proportion.
Returns ``(lo, hi)`` as proportions in [0, 1]. For ``n == 0`` returns Returns ``(lo, hi)`` clamped to ``[0.0, 1.0]``. For ``n == 0`` returns
``(0.0, 0.0)``. ``z = 1.96`` corresponds to a 95% CI. ``(0.0, 0.0)``. ``z = 1.96`` ≈ 95% confidence.
""" """
if n <= 0: if n <= 0:
return (0.0, 0.0) return (0.0, 0.0)
if wins < 0 or wins > n: if wins < 0 or wins > n:
raise ValueError(f"wins={wins} out of range for n={n}") raise ValueError(f"wins={wins} out of range for n={n}")
p_hat = wins / n p = wins / n
denom = 1.0 + (z * z) / n denom = 1.0 + (z * z) / n
center = p_hat + (z * z) / (2.0 * n) center = (p + (z * z) / (2.0 * n)) / denom
half = z * math.sqrt((p_hat * (1.0 - p_hat) + (z * z) / (4.0 * n)) / n) spread = z * math.sqrt(p * (1.0 - p) / n + (z * z) / (4.0 * n * n)) / denom
lo = (center - half) / denom return (max(0.0, center - spread), min(1.0, center + spread))
hi = (center + half) / denom
return (max(0.0, lo), min(1.0, hi))
def bootstrap_ci( def bootstrap_expectancy_ci(
values: list[float], values: list[float] | np.ndarray,
*, n_resamples: int = 5000,
iterations: int = 2000, seed: int = 42,
alpha: float = 0.05,
seed: int | None = None,
) -> tuple[float, float]: ) -> tuple[float, float]:
"""Percentile-method bootstrap CI for the mean of ``values``. """Percentile-method bootstrap 95% CI for the mean of ``values``.
Deterministic when ``seed`` is provided. Returns ``(lo, hi)``. For Deterministic for a given ``seed``. Empty input → ``(0.0, 0.0)``.
``len(values) < 2`` returns ``(mean, mean)``. Single value → ``(value, value)`` (no variance to resample).
""" """
if not values: arr = np.asarray(list(values), dtype=float)
if arr.size == 0:
return (0.0, 0.0) return (0.0, 0.0)
n = len(values) if arr.size == 1:
mean = sum(values) / n v = float(arr[0])
if n < 2 or iterations <= 0: return (v, v)
return (mean, mean) rng = np.random.default_rng(seed)
boots = np.empty(n_resamples, dtype=float)
rng = random.Random(seed) n = arr.size
means: list[float] = [] for i in range(n_resamples):
for _ in range(iterations): idx = rng.integers(0, n, size=n)
s = 0.0 boots[i] = float(arr[idx].mean())
for _ in range(n): lo = float(np.percentile(boots, 2.5))
s += values[rng.randrange(n)] hi = float(np.percentile(boots, 97.5))
means.append(s / n) return (lo, hi)
means.sort()
lo_idx = int(math.floor((alpha / 2.0) * iterations))
hi_idx = int(math.ceil((1.0 - alpha / 2.0) * iterations)) - 1
lo_idx = max(0, min(iterations - 1, lo_idx))
hi_idx = max(0, min(iterations - 1, hi_idx))
return (means[lo_idx], means[hi_idx])
def win_rate(trades: Iterable[Trade]) -> tuple[int, int, float]: # ---------------------------------------------------------------------------
"""Return ``(wins, n_resolved, wr)`` ignoring pending trades.""" # compute_stats
resolved = [t for t in trades if not t.is_pending] # ---------------------------------------------------------------------------
wins = sum(1 for t in resolved if t.is_win)
n = len(resolved)
def _group_stats(
overlay_values: list[float | None],
*,
include_ci: bool,
bootstrap_seed: int,
) -> dict[str, Any]:
closed = [v for v in overlay_values if v is not None]
n = len(closed)
wins = sum(1 for v in closed if v > 0)
wr = (wins / n) if n else 0.0 wr = (wins / n) if n else 0.0
return wins, n, wr out: dict[str, Any] = {
"n": n,
"wr": wr,
def expectancy(trades: Iterable[Trade], overlay: str = "pl_marius") -> float: "expectancy": (sum(closed) / n) if n else 0.0,
"""Mean P/L (in R) over non-pending trades, on the given overlay.""" }
if overlay not in {"pl_marius", "pl_theoretical"}: if include_ci:
raise ValueError(f"unknown overlay {overlay!r}") out["wr_ci_95"] = wilson_ci(wins, n)
if overlay == "pl_marius": out["expectancy_ci_95"] = bootstrap_expectancy_ci(
vals = [t.pl_marius for t in trades if t.pl_marius is not None] closed, seed=bootstrap_seed
else: )
vals = [t.pl_theoretical for t in trades if not t.is_pending]
if not vals:
return 0.0
return sum(vals) / len(vals)
# ---------------------------------------------------------------------------
# Group stats
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class GroupStats:
key: str
n_total: int
n_resolved: int
wins: int
wr: float
wr_ci_lo: float
wr_ci_hi: float
exp_marius: float
exp_marius_ci_lo: float
exp_marius_ci_hi: float
exp_theoretical: float
exp_theoretical_ci_lo: float
exp_theoretical_ci_hi: float
def group_by(trades: Iterable[Trade], field_name: str) -> dict[str, list[Trade]]:
out: dict[str, list[Trade]] = {}
for t in trades:
key = getattr(t, field_name, "") or "(blank)"
out.setdefault(key, []).append(t)
return out return out
def compute_group_stats( def _overlay_value(row: dict[str, str], overlay: str) -> float | None:
trades: list[Trade], raw = row.get(overlay, "")
*, return _parse_optional_float(raw)
label: str,
bootstrap_iterations: int = 2000,
seed: int | None = None,
) -> GroupStats:
wins, n_resolved, wr = win_rate(trades)
wr_lo, wr_hi = wilson_ci(wins, n_resolved)
pl_m_vals = [t.pl_marius for t in trades if t.pl_marius is not None]
exp_m = (sum(pl_m_vals) / len(pl_m_vals)) if pl_m_vals else 0.0
exp_m_lo, exp_m_hi = bootstrap_ci(
pl_m_vals, iterations=bootstrap_iterations, seed=seed
)
pl_t_vals = [t.pl_theoretical for t in trades if not t.is_pending]
exp_t = (sum(pl_t_vals) / len(pl_t_vals)) if pl_t_vals else 0.0
exp_t_lo, exp_t_hi = bootstrap_ci(
pl_t_vals,
iterations=bootstrap_iterations,
seed=None if seed is None else seed + 1,
)
return GroupStats(
key=label,
n_total=len(trades),
n_resolved=n_resolved,
wins=wins,
wr=wr,
wr_ci_lo=wr_lo,
wr_ci_hi=wr_hi,
exp_marius=exp_m,
exp_marius_ci_lo=exp_m_lo,
exp_marius_ci_hi=exp_m_hi,
exp_theoretical=exp_t,
exp_theoretical_ci_lo=exp_t_lo,
exp_theoretical_ci_hi=exp_t_hi,
)
# --------------------------------------------------------------------------- def compute_stats(
# Calibration mode csv_path: Path | str = "data/jurnal.csv",
# --------------------------------------------------------------------------- overlay: str = "pl_marius",
) -> dict[str, Any]:
"""Compute aggregate WR + expectancy stats over the backtest rows.
Calibration rows (``manual_calibration`` / ``vision_calibration``) are
excluded; use :func:`compute_calibration` for the P4 mismatch report.
@dataclass(frozen=True) ``overlay`` selects the P/L column: ``"pl_marius"`` (default — the real
class CalibrationReport: overlay Marius trades) or ``"pl_theoretical"`` (1/3-1/3-1/3 hold-to-TP2).
pairs: int
field_mismatches: dict[str, int]
total_comparisons: int
@property
def overall_mismatch_rate(self) -> float:
if self.total_comparisons == 0:
return 0.0
total = sum(self.field_mismatches.values())
return total / self.total_comparisons
def _normalise_for_compare(field_name: str, value: str) -> str:
s = (value or "").strip()
if field_name in {"entry", "sl", "tp0", "tp1", "tp2"}:
try:
return f"{float(s):.4f}"
except ValueError:
return s
return s
def calibration_mismatch(
trades: Iterable[Trade],
*,
fields: tuple[str, ...] = CORE_CALIBRATION_FIELDS,
) -> CalibrationReport:
"""Pair ``manual_calibration`` and ``vision_calibration`` rows by
``screenshot_file``, then count mismatches per ``fields``.
Returns a :class:`CalibrationReport`. Unpaired calibration rows are
silently ignored — they cannot contribute to a comparison.
""" """
manual: dict[str, Trade] = {} if overlay not in {"pl_marius", "pl_theoretical"}:
vision: dict[str, Trade] = {} raise ValueError(f"unknown overlay {overlay!r}")
for t in trades:
if t.source == "manual_calibration":
manual[t.screenshot_file] = t
elif t.source == "vision_calibration":
vision[t.screenshot_file] = t
paired_files = sorted(set(manual) & set(vision)) rows = [r for r in _load_rows(csv_path) if r.get("source", "") in BACKTEST_SOURCES]
field_mismatches: dict[str, int] = {f: 0 for f in fields}
for f in paired_files:
m = manual[f]
v = vision[f]
for fld in fields:
mv = _normalise_for_compare(fld, m.raw.get(fld, ""))
vv = _normalise_for_compare(fld, v.raw.get(fld, ""))
if mv != vv:
field_mismatches[fld] += 1
total_comparisons = len(paired_files) * len(fields) if not rows:
return CalibrationReport( return {
pairs=len(paired_files), "n_total": 0,
field_mismatches=field_mismatches, "n_pending": 0,
total_comparisons=total_comparisons, "n_closed": 0,
"wr": 0.0,
"wr_ci_95": (0.0, 0.0),
"expectancy": 0.0,
"expectancy_ci_95": (0.0, 0.0),
"per_set": {},
"per_calitate": {},
"per_directie": {},
}
# Pending status is overlay-independent: a trade is pending iff
# pl_marius is blank (outcome_path in {pending, TP0->pending}).
# pl_theoretical is concrete even for pending rows, so it would otherwise
# let pending trades sneak into the closed-trades stats — we mask those
# out explicitly here.
pending_mask = [_parse_optional_float(r.get("pl_marius", "")) is None for r in rows]
overlay_vals: list[float | None] = []
for r, is_pending in zip(rows, pending_mask):
overlay_vals.append(None if is_pending else _overlay_value(r, overlay))
n_total = len(rows)
n_pending = sum(1 for p in pending_mask if p)
n_closed = n_total - n_pending
overall = _group_stats(
overlay_vals, include_ci=True, bootstrap_seed=42
) )
def _split(field: str, include_ci: bool) -> dict[str, dict[str, Any]]:
groups: dict[str, list[float | None]] = {}
for r, v in zip(rows, overlay_vals):
key = r.get(field, "") or "(blank)"
groups.setdefault(key, []).append(v)
out: dict[str, dict[str, Any]] = {}
for k in sorted(groups):
sub_seed = 42 + (abs(hash(("split", field, k))) % 1_000_000)
out[k] = _group_stats(
groups[k], include_ci=include_ci, bootstrap_seed=sub_seed
)
return out
return {
"n_total": n_total,
"n_pending": n_pending,
"n_closed": n_closed,
"wr": overall["wr"],
"wr_ci_95": overall["wr_ci_95"],
"expectancy": overall["expectancy"],
"expectancy_ci_95": overall["expectancy_ci_95"],
"per_set": _split("set", include_ci=True),
"per_calitate": _split("calitate", include_ci=True),
# per_directie skips CI per spec (no wr_ci_95 / expectancy_ci_95 keys).
"per_directie": {
k: {"n": v["n"], "wr": v["wr"], "expectancy": v["expectancy"]}
for k, v in _split("directie", include_ci=False).items()
},
}
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Reporting # render_stats
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -375,110 +274,228 @@ def _fmt_pct(p: float) -> str:
def _fmt_r(x: float) -> str: def _fmt_r(x: float) -> str:
return f"{x:+.3f}R" return f"{x:+.2f} R"
def _fmt_stats_row(s: GroupStats) -> str: def _set_sort_key(name: str) -> tuple[int, str]:
return ( order = ["A1", "A2", "A3", "B", "C", "D", "Other"]
f"{s.key:<14} N={s.n_total:>3} (resolved {s.n_resolved:>3}) " return (order.index(name), name) if name in order else (len(order), name)
f"WR={_fmt_pct(s.wr)} [{_fmt_pct(s.wr_ci_lo)}, {_fmt_pct(s.wr_ci_hi)}] "
f"E_marius={_fmt_r(s.exp_marius)} "
f"[{_fmt_r(s.exp_marius_ci_lo)}, {_fmt_r(s.exp_marius_ci_hi)}] "
f"E_theor={_fmt_r(s.exp_theoretical)}"
)


def render_stats(stats: dict[str, Any], overlay: str) -> str:
    lines: list[str] = []
    lines.append(f"=== Stats jurnal.csv (overlay: {overlay}) ===")
    lines.append(
        f"Trade-uri totale: {stats['n_total']} | "
        f"închise: {stats['n_closed']} | pending: {stats['n_pending']}"
    )
    if stats["n_total"] == 0:
        lines.append("")
        lines.append("(nu sunt trade-uri backtest în CSV)")
        return "\n".join(lines) + "\n"
    lines.append("")
    lo, hi = stats["wr_ci_95"]
    e_lo, e_hi = stats["expectancy_ci_95"]
    lines.append(f"GLOBAL (n={stats['n_closed']}):")
    lines.append(
        f"  WR: {_fmt_pct(stats['wr'])} "
        f"[95% CI: {_fmt_pct(lo)}, {_fmt_pct(hi)}]"
    )
    lines.append(
        f"  Expectancy: {_fmt_r(stats['expectancy'])} "
        f"[95% CI: {_fmt_r(e_lo)}, {_fmt_r(e_hi)}]"
    )
    lines.append("")

    def _emit_split(
        title: str,
        data: dict[str, dict[str, Any]],
        *,
        sort_keys: list[str] | None = None,
        include_ci: bool = True,
    ) -> None:
        lines.append(title)
        keys = sort_keys if sort_keys is not None else sorted(data)
        for k in keys:
            if k not in data:
                continue
            d = data[k]
            if include_ci and "wr_ci_95" in d:
                clo, chi = d["wr_ci_95"]
                lines.append(
                    f"  {k:<14} n={d['n']:>3} "
                    f"WR {_fmt_pct(d['wr'])} "
                    f"[{_fmt_pct(clo)}, {_fmt_pct(chi)}] "
                    f"E {_fmt_r(d['expectancy'])}"
                )
            else:
                lines.append(
                    f"  {k:<14} n={d['n']:>3} "
                    f"WR {_fmt_pct(d['wr'])} "
                    f"E {_fmt_r(d['expectancy'])}"
                )
        lines.append("")

    _emit_split(
        "PER SET:",
        stats["per_set"],
        sort_keys=sorted(stats["per_set"], key=_set_sort_key),
    )
    lines.append(
        "PER CALITATE (⚠️ DESCRIPTOR ONLY — biased post-outcome, NU folosi ca filtru):"
    )
    cal_order = ["Clară", "Mai mare ca impuls", "Slabă", "n/a"]
    keys = [k for k in cal_order if k in stats["per_calitate"]] + [
        k for k in sorted(stats["per_calitate"]) if k not in cal_order
    ]
    for k in keys:
        d = stats["per_calitate"][k]
        clo, chi = d["wr_ci_95"]
        lines.append(
            f"  {k:<20} n={d['n']:>3} "
            f"WR {_fmt_pct(d['wr'])} "
            f"[{_fmt_pct(clo)}, {_fmt_pct(chi)}] "
            f"E {_fmt_r(d['expectancy'])}"
        )
    lines.append("")
    _emit_split("PER DIRECȚIE:", stats["per_directie"], include_ci=False)

    # STOPPING_RULE gate check — flag every Set that hasn't crossed N>=40.
    lines.append(f"⚠️ STOPPING RULE check (vezi STOPPING_RULE.md, N>={STOPPING_RULE_N}):")
    set_keys = sorted(stats["per_set"], key=_set_sort_key)
    any_flagged = False
    for k in set_keys:
        n = stats["per_set"][k]["n"]
        if n < STOPPING_RULE_N:
            lines.append(f"  {k}: N={n} < {STOPPING_RULE_N} → NEEDS MORE DATA")
            any_flagged = True
    if not any_flagged:
        lines.append(f"  toate Set-urile au N>={STOPPING_RULE_N} (eligibile pentru GO LIVE check).")
    return "\n".join(lines) + "\n"
# ---------------------------------------------------------------------------
# compute_calibration
# ---------------------------------------------------------------------------
def _calibration_match(field: str, m_val: str, v_val: str, tol: float = 0.01) -> bool:
if field in NUMERIC_CALIBRATION_FIELDS:
try:
return abs(float(m_val) - float(v_val)) <= tol
except ValueError:
return (m_val or "").strip() == (v_val or "").strip()
return (m_val or "").strip() == (v_val or "").strip()
def compute_calibration(
csv_path: Path | str = "data/jurnal.csv",
) -> dict[str, Any]:
"""Pair calibration legs by ``screenshot_file`` and report per-field mismatch.
Returns a dict ``{"n_pairs": int, "fields": {field: {match, mismatch,
match_rate, mismatch_examples}}}``. ``mismatch_examples`` holds up to 3
strings ``"<screenshot_file>: manual=X vs vision=Y"`` per field.
Numeric fields (``entry/sl/tp0/tp1/tp2``) use a tolerance of 0.01;
everything else is exact-string equality after strip.
"""
rows = _load_rows(csv_path)
manual: dict[str, dict[str, str]] = {}
vision: dict[str, dict[str, str]] = {}
for r in rows:
src = r.get("source", "")
if src == "manual_calibration":
manual[r.get("screenshot_file", "")] = r
elif src == "vision_calibration":
vision[r.get("screenshot_file", "")] = r
paired_files = sorted(set(manual) & set(vision))
fields_report: dict[str, dict[str, Any]] = {
f: {
"match": 0,
"mismatch": 0,
"match_rate": 0.0,
"mismatch_examples": [],
}
for f in CORE_CALIBRATION_FIELDS
}
for f in paired_files:
m = manual[f]
v = vision[f]
for fld in CORE_CALIBRATION_FIELDS:
mv = m.get(fld, "")
vv = v.get(fld, "")
if _calibration_match(fld, mv, vv):
fields_report[fld]["match"] += 1
else:
fields_report[fld]["mismatch"] += 1
examples = fields_report[fld]["mismatch_examples"]
if len(examples) < 3:
examples.append(f"{f}: manual={mv!r} vs vision={vv!r}")
for fld, data in fields_report.items():
total = data["match"] + data["mismatch"]
data["match_rate"] = (data["match"] / total) if total else 0.0
return {"n_pairs": len(paired_files), "fields": fields_report}
def render_calibration(cal: dict[str, Any]) -> str:
lines: list[str] = []
lines.append("=== Calibration P4 gate (vezi STOPPING_RULE.md §P4) ===")
lines.append(f"Perechi calibration: {cal['n_pairs']}")
if cal["n_pairs"] == 0:
lines.append("(nu există perechi manual_calibration ∩ vision_calibration)")
return "\n".join(lines) + "\n"
lines.append("")
lines.append(f"{'field':<14} match mismatch rate")
total_mismatches = 0
total_comparisons = 0
for fld in CORE_CALIBRATION_FIELDS:
d = cal["fields"][fld]
n = d["match"] + d["mismatch"]
total_mismatches += d["mismatch"]
total_comparisons += n
lines.append(
f"{fld:<14} {d['match']:>5} {d['mismatch']:>8} "
f"{_fmt_pct(d['match_rate'])}"
)
lines.append("")
overall_match_rate = (
(total_comparisons - total_mismatches) / total_comparisons
if total_comparisons
else 0.0
)
overall_mismatch_rate = 1.0 - overall_match_rate
verdict = "PASS" if overall_mismatch_rate <= 0.10 else "FAIL"
lines.append(
f"Overall mismatch rate: {_fmt_pct(overall_mismatch_rate)} "
f"({total_mismatches}/{total_comparisons}) → P4 gate: {verdict}"
)
has_examples = any(
cal["fields"][f]["mismatch_examples"] for f in CORE_CALIBRATION_FIELDS
)
if has_examples:
lines.append("")
lines.append("Mismatch examples (max 3 per field):")
for fld in CORE_CALIBRATION_FIELDS:
ex = cal["fields"][fld]["mismatch_examples"]
if not ex:
continue
lines.append(f" [{fld}]")
for e in ex:
lines.append(f" - {e}")
    return "\n".join(lines) + "\n"
@@ -498,43 +515,37 @@ def main(argv: list[str] | None = None) -> int:
        default=Path("data/jurnal.csv"),
        help="Path to the jurnal CSV (default: data/jurnal.csv).",
    )
    parser.add_argument(
        "--overlay",
        choices=("pl_marius", "pl_theoretical"),
        default="pl_marius",
        help="Which P/L overlay to use (default: pl_marius).",
    )
    parser.add_argument(
        "--calibration",
        action="store_true",
        help="Show P4 calibration mismatch report instead of backtest stats.",
    )
    args = parser.parse_args(argv)
    # Force UTF-8 on stdout: the report contains diacritics ("Clară", "Slabă")
    # and a console codepage like cp1252 would crash on those.
    try:
        sys.stdout.reconfigure(encoding="utf-8")  # type: ignore[attr-defined]
    except (AttributeError, OSError):
        pass
    if args.calibration:
        cal = compute_calibration(args.csv)
        sys.stdout.write(render_calibration(cal))
    else:
        stats = compute_stats(args.csv, overlay=args.overlay)
        sys.stdout.write(render_stats(stats, args.overlay))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
# Ensure the canonical CSV schema is importable from one place — fail fast if
# someone removes append_row.CSV_COLUMNS that this module depends on.
assert CSV_COLUMNS is not None
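The matching predicate behind `compute_calibration` is small enough to sketch standalone. This mirrors `_calibration_match` above; `NUMERIC_FIELDS` here is a stand-in for the module's `NUMERIC_CALIBRATION_FIELDS` constant, which is not shown in this hunk:

```python
# Calibration matching rule: numeric fields compare within an absolute
# tolerance of 0.01; everything else uses stripped-string equality.
NUMERIC_FIELDS = {"entry", "sl", "tp0", "tp1", "tp2"}


def calibration_match(field: str, m_val: str, v_val: str, tol: float = 0.01) -> bool:
    if field in NUMERIC_FIELDS:
        try:
            return abs(float(m_val) - float(v_val)) <= tol
        except ValueError:
            # Non-numeric content in a numeric column: exact-string fallback.
            return (m_val or "").strip() == (v_val or "").strip()
    return (m_val or "").strip() == (v_val or "").strip()


print(calibration_match("entry", "400", "400.0000"))   # numeric equivalence
print(calibration_match("entry", "400.00", "400.05"))  # outside the 0.01 tolerance
print(calibration_match("directie", "Buy", "Sell"))    # exact-string field
```

The tolerance is what lets `"400"` and `"400.0000"` pair up cleanly, so formatting differences between manual and vision legs never count against the P4 gate.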
View File
@@ -1,4 +1,5 @@
"""CSV-fixture tests for scripts.stats — compute_stats, render_stats,
compute_calibration, render_calibration, main()."""

from __future__ import annotations
@@ -12,24 +13,17 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from scripts.append_row import CSV_COLUMNS  # noqa: E402
from scripts.stats import (  # noqa: E402
    CORE_CALIBRATION_FIELDS,
    compute_calibration,
    compute_stats,
    main,
    render_calibration,
    render_stats,
)

# ---------------------------------------------------------------------------
# Fixture row builder
# ---------------------------------------------------------------------------
@@ -78,55 +72,61 @@ def _write_csv(path: Path, rows: list[dict[str, str]]) -> None:
        w.writerow({k: r.get(k, "") for k in CSV_COLUMNS})


# Outcome templates (P/L values) — match scripts.pl_calc tables.
_SL = {"outcome_path": "SL", "max_reached": "SL_first", "be_moved": "False",
       "pl_marius": "-1.0000", "pl_theoretical": "-1.0000"}
_TP0_SL_BE = {"outcome_path": "TP0→SL", "max_reached": "TP0", "be_moved": "True",
              "pl_marius": "0.2000", "pl_theoretical": "0.1330"}
_TP0_TP1 = {"outcome_path": "TP0→TP1", "max_reached": "TP1", "be_moved": "True",
            "pl_marius": "0.5000", "pl_theoretical": "0.3330"}
_TP0_TP2 = {"outcome_path": "TP0→TP2", "max_reached": "TP2", "be_moved": "True",
            "pl_marius": "0.5000", "pl_theoretical": "0.6670"}
_PENDING = {"outcome_path": "pending", "max_reached": "TP0", "be_moved": "False",
            "pl_marius": "", "pl_theoretical": "0.1330"}


def _synthetic_csv(tmp_path: Path) -> Path:
    """30-trade backtest fixture.

    Set distribution:
      A1: 8 rows (all closed; 3 SL, 2 TP0→SL, 2 TP0→TP1, 1 TP0→TP2)
      A2: 10 rows (all closed; 4 SL, 3 TP0→SL, 2 TP0→TP1, 1 TP0→TP2)
      B : 7 rows (2 pending, 5 closed; 2 SL, 2 TP0→TP1, 1 TP0→TP2)
      D : 5 rows (3 pending, 2 closed; 1 SL, 1 TP0→TP1)
    Totals: n_total=30, n_pending=5, n_closed=25.

    Wins by pl_marius (>0): all TP0→SL_BE + TP0→TP1 + TP0→TP2
      A1: 2 + 2 + 1 = 5 wins / 8
      A2: 3 + 2 + 1 = 6 wins / 10
      B : 0 + 2 + 1 = 3 wins / 5
      D : 0 + 1 + 0 = 1 win / 2
    Total wins = 15 / 25 = 60.0%.

    Calitate distribution: half "Clară", half "Slabă" (alternating).
    Directie distribution: 2/3 Buy, 1/3 Sell.
    """
    rows: list[dict[str, str]] = []
    rid = 0

    def add(set_label: str, outcomes: list[dict[str, str]]) -> None:
        nonlocal rid
        for i, outcome in enumerate(outcomes):
            rid += 1
            row = _base_row(
                id=rid,
                screenshot_file=f"{set_label.lower()}-{rid}.png",
                set=set_label,
                calitate="Clară" if rid % 2 == 0 else "Slabă",
                directie="Buy" if rid % 3 != 0 else "Sell",
            )
            row.update({k: str(v) for k, v in outcome.items()})
            rows.append(row)

    add("A1", [_SL] * 3 + [_TP0_SL_BE] * 2 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
    add("A2", [_SL] * 4 + [_TP0_SL_BE] * 3 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
    add("B", [_PENDING] * 2 + [_SL] * 2 + [_TP0_TP1] * 2 + [_TP0_TP2] * 1)
    add("D", [_PENDING] * 3 + [_SL] * 1 + [_TP0_TP1] * 1)

    path = tmp_path / "jurnal.csv"
    _write_csv(path, rows)
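As a cross-check on the fixture docstring, the closed-row and win counts reduce to simple arithmetic (the numbers below are copied from the distribution listed above):

```python
# Closed-trade and win counts per Set, as laid out in the fixture docstring.
closed = {"A1": 8, "A2": 10, "B": 5, "D": 2}           # pending rows excluded
wins = {"A1": 2 + 2 + 1, "A2": 3 + 2 + 1, "B": 2 + 1, "D": 1}

n_closed = sum(closed.values())
n_wins = sum(wins.values())
print(n_closed, n_wins, n_wins / n_closed)  # 25 15 0.6
```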
@@ -134,336 +134,314 @@ def _synthetic_csv(tmp_path: Path) -> Path:

# ---------------------------------------------------------------------------
# compute_stats — core
# ---------------------------------------------------------------------------


class TestComputeStats:
    def test_compute_stats_n_pending(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        assert s["n_total"] == 30
        assert s["n_pending"] == 5
        assert s["n_closed"] == 25

    def test_compute_stats_wr_correct(self, tmp_path: Path) -> None:
        """Manual win count: 15 / 25 = 60.0%."""
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        assert s["wr"] == pytest.approx(15 / 25)
        lo, hi = s["wr_ci_95"]
        assert 0.0 <= lo <= s["wr"] <= hi <= 1.0

    def test_compute_stats_per_set(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        a2 = s["per_set"]["A2"]
        assert a2["n"] == 10  # 10 closed A2 trades
        # A2 wins (pl_marius > 0): 3 BE + 2 TP1 + 1 TP2 = 6 / 10
        assert a2["wr"] == pytest.approx(0.60)

    def test_per_set_b_pending_excluded(self, tmp_path: Path) -> None:
        """Set B has 7 total rows (2 pending + 5 closed). n must be 5."""
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        assert s["per_set"]["B"]["n"] == 5
        # B wins: 0 BE + 2 TP1 + 1 TP2 = 3 / 5
        assert s["per_set"]["B"]["wr"] == pytest.approx(0.60)

    def test_per_directie_no_ci_keys(self, tmp_path: Path) -> None:
        """per_directie omits CI fields per spec (only n / wr / expectancy)."""
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        for k, d in s["per_directie"].items():
            assert set(d.keys()) == {"n", "wr", "expectancy"}, k

    def test_overlay_theoretical_vs_marius(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        s_m = compute_stats(path, overlay="pl_marius")
        s_t = compute_stats(path, overlay="pl_theoretical")
        # Same N, but different expectancy.
        assert s_m["n_closed"] == s_t["n_closed"]
        assert s_m["expectancy"] != s_t["expectancy"]

    def test_unknown_overlay_raises(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        with pytest.raises(ValueError):
            compute_stats(path, overlay="pl_imaginary")

    def test_empty_csv_no_crash(self, tmp_path: Path) -> None:
        path = tmp_path / "empty.csv"
        _write_csv(path, [])
        s = compute_stats(path)
        assert s["n_total"] == 0
        assert s["n_closed"] == 0
        assert s["per_set"] == {}
        assert s["wr"] == 0.0
        assert s["wr_ci_95"] == (0.0, 0.0)

    def test_missing_csv_no_crash(self, tmp_path: Path) -> None:
        # Nonexistent path: treat as empty, do not raise.
        s = compute_stats(tmp_path / "ghost.csv")
        assert s["n_total"] == 0

    def test_calibration_rows_excluded(self, tmp_path: Path) -> None:
        rows = [
            _base_row(id=1, source="vision", screenshot_file="v.png"),
            _base_row(id=2, source="manual_calibration", screenshot_file="c.png"),
            _base_row(id=3, source="vision_calibration", screenshot_file="c.png"),
        ]
        path = tmp_path / "j.csv"
        _write_csv(path, rows)
        s = compute_stats(path)
        assert s["n_total"] == 1  # calibration rows filtered out

# ---------------------------------------------------------------------------
# render_stats
# ---------------------------------------------------------------------------


class TestRenderStats:
    def test_render_stats_no_crash(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        s = compute_stats(path)
        out = render_stats(s, "pl_marius")
        assert isinstance(out, str)
        assert out  # non-empty
        assert "STOPPING RULE" in out

    def test_render_stats_contains_sections(self, tmp_path: Path) -> None:
        path = _synthetic_csv(tmp_path)
        out = render_stats(compute_stats(path), "pl_marius")
        for marker in (
            "Stats jurnal.csv",
            "Trade-uri totale",
            "GLOBAL",
            "PER SET:",
            "PER CALITATE",
            "PER DIRECȚIE",
            "DESCRIPTOR ONLY",
        ):
            assert marker in out, f"missing section: {marker!r}"

    def test_render_stats_flags_under_threshold(self, tmp_path: Path) -> None:
        """All Sets in synthetic fixture have N<40 → all should be flagged."""
        path = _synthetic_csv(tmp_path)
        out = render_stats(compute_stats(path), "pl_marius")
        for k in ("A1", "A2", "B", "D"):
            assert f"{k}: N=" in out
        assert "NEEDS MORE DATA" in out

    def test_render_stats_empty(self, tmp_path: Path) -> None:
        path = tmp_path / "empty.csv"
        _write_csv(path, [])
        out = render_stats(compute_stats(path), "pl_marius")
        assert "Trade-uri totale: 0" in out
        # No crash, no per-Set table for an empty dataset.
        assert "NEEDS MORE DATA" not in out
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Calibration mode: pairing + mismatch # compute_calibration
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
class TestCalibration: class TestComputeCalibration:
def test_pairs_and_zero_mismatch(self, tmp_path: Path) -> None: def test_compute_calibration_pairs(self, tmp_path: Path) -> None:
m = _base_row( rows: list[dict[str, str]] = []
id=1, source="manual_calibration", screenshot_file="cal-1.png" for i in range(5):
) f = f"cal-{i}.png"
v = _base_row( rows.append(_base_row(
id=2, source="vision_calibration", screenshot_file="cal-1.png" id=i * 2 + 1, source="manual_calibration", screenshot_file=f
) ))
p = tmp_path / "j.csv" rows.append(_base_row(
_write_csv(p, [m, v]) id=i * 2 + 2, source="vision_calibration", screenshot_file=f
trades = load_trades(p) ))
rep = calibration_mismatch(trades) path = tmp_path / "j.csv"
assert rep.pairs == 1 _write_csv(path, rows)
assert sum(rep.field_mismatches.values()) == 0 cal = compute_calibration(path)
assert rep.overall_mismatch_rate == 0.0 assert cal["n_pairs"] == 5
def test_one_field_mismatch(self, tmp_path: Path) -> None:
m = _base_row(
id=1, source="manual_calibration", screenshot_file="cal-1.png",
entry="400.0",
)
v = _base_row(
id=2, source="vision_calibration", screenshot_file="cal-1.png",
entry="400.10", # different entry
)
p = tmp_path / "j.csv"
_write_csv(p, [m, v])
trades = load_trades(p)
rep = calibration_mismatch(trades)
assert rep.pairs == 1
assert rep.field_mismatches["entry"] == 1
# all other core fields match
for fld in CORE_CALIBRATION_FIELDS: for fld in CORE_CALIBRATION_FIELDS:
if fld == "entry": assert fld in cal["fields"]
continue # All identical → 5 matches, 0 mismatches per field.
assert rep.field_mismatches[fld] == 0 assert cal["fields"][fld]["match"] == 5
# 1 mismatch / (1 pair * 8 fields) = 12.5% assert cal["fields"][fld]["mismatch"] == 0
assert rep.overall_mismatch_rate == pytest.approx(1.0 / len(CORE_CALIBRATION_FIELDS)) assert cal["fields"][fld]["match_rate"] == pytest.approx(1.0)
def test_unpaired_rows_ignored(self, tmp_path: Path) -> None: def test_compute_calibration_mismatch_examples(self, tmp_path: Path) -> None:
# Only a manual leg — no pair → 0 pairs. """Modify entry on 2 pairsmismatch_examples contains both."""
m = _base_row( rows: list[dict[str, str]] = []
id=1, source="manual_calibration", screenshot_file="lonely.png" for i in range(5):
) f = f"cal-{i}.png"
p = tmp_path / "j.csv" manual_entry = "400.0"
_write_csv(p, [m]) # First two pairs differ on entry; the rest match exactly.
trades = load_trades(p) vision_entry = "401.5" if i < 2 else "400.0"
rep = calibration_mismatch(trades) rows.append(_base_row(
assert rep.pairs == 0 id=i * 2 + 1, source="manual_calibration",
assert rep.total_comparisons == 0 screenshot_file=f, entry=manual_entry,
assert rep.overall_mismatch_rate == 0.0 ))
rows.append(_base_row(
id=i * 2 + 2, source="vision_calibration",
screenshot_file=f, entry=vision_entry,
))
path = tmp_path / "j.csv"
_write_csv(path, rows)
cal = compute_calibration(path)
assert cal["n_pairs"] == 5
entry = cal["fields"]["entry"]
assert entry["match"] == 3
assert entry["mismatch"] == 2
assert entry["match_rate"] == pytest.approx(3 / 5)
assert len(entry["mismatch_examples"]) == 2
for ex in entry["mismatch_examples"]:
assert "manual=" in ex and "vision=" in ex
def test_numeric_equivalence_tolerated(self, tmp_path: Path) -> None: def test_calibration_examples_capped_at_3(self, tmp_path: Path) -> None:
"""'400' and '400.0000' should NOT count as a mismatch on entry.""" """5 mismatches but mismatch_examples is capped at 3."""
m = _base_row( rows: list[dict[str, str]] = []
id=1, source="manual_calibration", screenshot_file="cal-1.png", for i in range(5):
entry="400", f = f"cal-{i}.png"
) rows.append(_base_row(
v = _base_row( id=i * 2 + 1, source="manual_calibration",
id=2, source="vision_calibration", screenshot_file="cal-1.png", screenshot_file=f, entry="400.0",
entry="400.0000", ))
) rows.append(_base_row(
p = tmp_path / "j.csv" id=i * 2 + 2, source="vision_calibration",
_write_csv(p, [m, v]) screenshot_file=f, entry="500.0",
rep = calibration_mismatch(load_trades(p)) ))
assert rep.field_mismatches["entry"] == 0 path = tmp_path / "j.csv"
_write_csv(path, rows)
cal = compute_calibration(path)
assert cal["fields"]["entry"]["mismatch"] == 5
assert len(cal["fields"]["entry"]["mismatch_examples"]) == 3
def test_calibration_numeric_tolerance(self, tmp_path: Path) -> None:
# --------------------------------------------------------------------------- """Floats within 0.01 must NOT count as a mismatch."""
# Report formatting + CLI
# ---------------------------------------------------------------------------
class TestReporting:
def test_format_report_contains_sections(self, tmp_path: Path) -> None:
out = format_report(
load_trades(_synthetic_30(tmp_path)),
bootstrap_iterations=200,
seed=0,
)
assert "M2D Backtest Stats" in out
assert "Overall" in out
assert "By Set" in out
assert "A1" in out and "A2" in out and "A3" in out
# calitate warning present
assert "descriptor only" in out.lower() or "biased" in out.lower()
def test_format_calibration_report(self, tmp_path: Path) -> None:
rows = [ rows = [
_base_row( _base_row(
id=1, source="manual_calibration", screenshot_file="cal-1.png" id=1, source="manual_calibration",
screenshot_file="cal-1.png", entry="400.005",
), ),
_base_row( _base_row(
id=2, source="vision_calibration", screenshot_file="cal-1.png", id=2, source="vision_calibration",
directie="Sell", # mismatch on directie screenshot_file="cal-1.png", entry="400.010",
entry="400.0", sl="401.0", tp0="399.5", tp1="399.0", tp2="398.0",
), ),
] ]
p = tmp_path / "j.csv" path = tmp_path / "j.csv"
_write_csv(p, rows) _write_csv(path, rows)
out = format_calibration_report(load_trades(p)) cal = compute_calibration(path)
assert "Paired screenshots" in out assert cal["fields"]["entry"]["match"] == 1
assert cal["fields"]["entry"]["mismatch"] == 0
def test_calibration_outside_tolerance(self, tmp_path: Path) -> None:
"""Floats > 0.01 apart DO count as a mismatch."""
rows = [
_base_row(
id=1, source="manual_calibration",
screenshot_file="cal-1.png", entry="400.00",
),
_base_row(
id=2, source="vision_calibration",
screenshot_file="cal-1.png", entry="400.05",
),
]
path = tmp_path / "j.csv"
_write_csv(path, rows)
cal = compute_calibration(path)
assert cal["fields"]["entry"]["mismatch"] == 1
def test_calibration_no_pairs(self, tmp_path: Path) -> None:
"""No paired screenshot → n_pairs=0, all rates 0.0."""
path = tmp_path / "j.csv"
_write_csv(path, [
_base_row(id=1, source="manual_calibration", screenshot_file="lonely.png"),
])
cal = compute_calibration(path)
assert cal["n_pairs"] == 0
for fld in CORE_CALIBRATION_FIELDS:
assert cal["fields"][fld]["match"] == 0
assert cal["fields"][fld]["mismatch"] == 0
def test_render_calibration_no_crash(self, tmp_path: Path) -> None:
rows = [
_base_row(id=1, source="manual_calibration",
screenshot_file="cal-1.png", directie="Buy"),
_base_row(id=2, source="vision_calibration",
screenshot_file="cal-1.png", directie="Sell",
entry="400.0", sl="401.0", tp0="399.5",
tp1="399.0", tp2="398.0"),
]
path = tmp_path / "j.csv"
_write_csv(path, rows)
out = render_calibration(compute_calibration(path))
assert "Calibration P4" in out
        assert "directie" in out
# 1 mismatch (directie) of 8 fields = 12.5% → FAIL P4 gate
assert "FAIL" in out
    def test_render_calibration_empty(self, tmp_path: Path) -> None:
        path = tmp_path / "empty.csv"
        _write_csv(path, [])
        out = render_calibration(compute_calibration(path))
        assert "0" in out
        assert "FAIL" not in out
        assert "PASS" not in out
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
class TestCLI:
    def test_main_stats(
        self, tmp_path: Path, capsys: pytest.CaptureFixture
    ) -> None:
        path = _synthetic_csv(tmp_path)
        rc = main(["--csv", str(path)])
        assert rc == 0
        assert "Stats jurnal.csv" in capsys.readouterr().out
    def test_main_overlay(
        self, tmp_path: Path, capsys: pytest.CaptureFixture
    ) -> None:
        path = _synthetic_csv(tmp_path)
        rc = main(["--csv", str(path), "--overlay", "pl_theoretical"])
        assert rc == 0
        assert "pl_theoretical" in capsys.readouterr().out
    def test_main_calibration(
        self, tmp_path: Path, capsys: pytest.CaptureFixture
    ) -> None:
        rows = [
            _base_row(id=1, source="manual_calibration",
                      screenshot_file="cal-1.png"),
            _base_row(id=2, source="vision_calibration",
                      screenshot_file="cal-1.png"),
        ]
        path = tmp_path / "j.csv"
        _write_csv(path, rows)
        rc = main(["--csv", str(path), "--calibration"])
        assert rc == 0
        out = capsys.readouterr().out
        assert "Calibration P4" in out
        assert "PASS" in out  # all fields match → PASS
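The calibration behaviour these tests pin down — pairing a `manual_calibration` row with a `vision_calibration` row per `screenshot_file`, a 0.01 numeric tolerance, and a PASS/FAIL gate on the mismatch rate — can be sketched as below. This is an illustration inferred from the asserts, not the real `scripts/stats.py`: the field list is truncated to the six fields visible in these tests (the real `CORE_CALIBRATION_FIELDS` has eight entries), and the 10% gate threshold is an assumption chosen so that 12.5% fails and 0% passes.

```python
# Hypothetical sketch of the P4 calibration gate implied by the tests above.
# Assumptions: 0.01 numeric tolerance, max 10% mismatch rate for PASS.
FIELDS = ["directie", "entry", "sl", "tp0", "tp1", "tp2"]  # real list has 8 fields

def field_matches(manual: str, vision: str) -> bool:
    """Numeric fields match within 0.01; everything else compares exactly."""
    try:
        return abs(float(manual) - float(vision)) <= 0.01
    except ValueError:
        return manual == vision

def p4_gate(pairs: list[tuple[dict, dict]]) -> str:
    """Mismatch rate over all (pair, field) checks, judged against the gate."""
    checked = mismatched = 0
    for manual, vision in pairs:
        for fld in FIELDS:
            checked += 1
            if not field_matches(manual.get(fld, ""), vision.get(fld, "")):
                mismatched += 1
    if checked == 0:
        return "no pairs"
    return "PASS" if mismatched / checked <= 0.10 else "FAIL"
```

Under these assumptions a single `directie` flip across six fields is a 16.7% mismatch rate, so the gate fails — consistent with the "1 mismatch of 8 fields = 12.5% → FAIL P4 gate" comment in `test_render_calibration_no_crash`.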

tests/test_stats_ci.py Normal file
@@ -0,0 +1,83 @@
"""Pure-math tests for stats CI primitives (no I/O)."""
from __future__ import annotations
import sys
from pathlib import Path
import pytest
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from scripts.stats import bootstrap_expectancy_ci, wilson_ci # noqa: E402
# ---------------------------------------------------------------------------
# Wilson CI
# ---------------------------------------------------------------------------
class TestWilsonCI:
def test_wilson_n_zero(self) -> None:
assert wilson_ci(0, 0) == (0.0, 0.0)
def test_wilson_perfect_winrate(self) -> None:
lo, hi = wilson_ci(10, 10)
assert lo > 0.65
assert hi == pytest.approx(1.0, abs=1e-12)
def test_wilson_reference_15_55(self) -> None:
"""wins=8, n=15 (WR≈53%) → CI approximately [29%, 76%] ±2%."""
lo, hi = wilson_ci(8, 15)
assert lo == pytest.approx(0.29, abs=0.02)
assert hi == pytest.approx(0.76, abs=0.02)
def test_wilson_all_losses(self) -> None:
lo, hi = wilson_ci(0, 10)
assert lo == pytest.approx(0.0, abs=1e-12)
assert hi < 0.35
def test_wilson_wins_out_of_range(self) -> None:
with pytest.raises(ValueError):
wilson_ci(11, 10)
with pytest.raises(ValueError):
wilson_ci(-1, 10)
def test_wilson_clamps_at_50pct_n40(self) -> None:
"""Reference at WR=50%, N=40: CI ≈ [35.2%, 64.8%]."""
lo, hi = wilson_ci(20, 40)
assert lo == pytest.approx(0.352, abs=0.005)
assert hi == pytest.approx(0.648, abs=0.005)
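The reference intervals asserted above follow from the 95% Wilson score interval (z = 1.96). A self-contained sketch of a `wilson_ci` consistent with these tests — the actual implementation lives in `scripts/stats.py` and may differ:

```python
import math

def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial win rate."""
    if n == 0:
        return (0.0, 0.0)
    if not 0 <= wins <= n:
        raise ValueError("wins must be within [0, n]")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1]; at p=1 the upper bound is exactly 1.0.
    return (max(0.0, centre - half), min(1.0, centre + half))
```

For example, `wilson_ci(8, 15)` gives roughly (0.30, 0.75) and `wilson_ci(20, 40)` gives roughly (0.352, 0.648), matching the tolerances asserted above.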
# ---------------------------------------------------------------------------
# Bootstrap CI
# ---------------------------------------------------------------------------
class TestBootstrap:
def test_bootstrap_deterministic(self) -> None:
values = [1.0, -0.5, 0.5, -1.0]
a = bootstrap_expectancy_ci(values, n_resamples=1000, seed=42)
b = bootstrap_expectancy_ci(values, n_resamples=1000, seed=42)
assert a == b
def test_bootstrap_different_seed_different_result(self) -> None:
values = [1.0, -0.5, 0.5, -1.0, 0.2, -0.3, 0.5]
a = bootstrap_expectancy_ci(values, n_resamples=1000, seed=1)
b = bootstrap_expectancy_ci(values, n_resamples=1000, seed=2)
assert a != b
def test_bootstrap_empty(self) -> None:
assert bootstrap_expectancy_ci([], n_resamples=100, seed=0) == (0.0, 0.0)
def test_bootstrap_single_value(self) -> None:
lo, hi = bootstrap_expectancy_ci([0.5], n_resamples=100, seed=0)
assert lo == pytest.approx(0.5, abs=1e-9)
assert hi == pytest.approx(0.5, abs=1e-9)
def test_bootstrap_brackets_the_mean(self) -> None:
values = [0.5, -1.0, 0.5, 0.5, -1.0, 0.2, -0.3, 0.5, -1.0, 0.5] * 5
mean = sum(values) / len(values)
lo, hi = bootstrap_expectancy_ci(values, n_resamples=1000, seed=7)
assert lo <= mean <= hi
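A matching sketch of `bootstrap_expectancy_ci` as a seeded percentile bootstrap of the mean, which satisfies the determinism, empty-input, and single-value contracts tested above — again an assumed shape, not the `scripts/stats.py` source:

```python
import random

def bootstrap_expectancy_ci(
    values: list[float], n_resamples: int = 1000, seed: int = 0
) -> tuple[float, float]:
    """Percentile-bootstrap 95% CI for the mean of `values`."""
    if not values:
        return (0.0, 0.0)
    rng = random.Random(seed)  # seeded RNG: identical seeds give identical CIs
    n = len(values)
    # Resample n values with replacement, take the mean, repeat n_resamples times.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    return (means[int(0.025 * (n_resamples - 1))],
            means[int(0.975 * (n_resamples - 1))])
```

A single-element input produces identical resample means, collapsing the interval to a point, which is why the tests can assert exact equality there.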