atm/docs/partitioned-honking-unicorn.md

# Plan: ATM — Automated Trading Monitor (M2D, Faza 1) — ENG-REVIEWED

**Source plan:** `/home/claude/.claude/plans/swirling-drifting-starfish.md`
**CEO plan artifact:** `~/.gstack/projects/romfast-workspace/ceo-plans/2026-04-15-atm-trading.md`
**Eng review mode:** FULL_REVIEW (4 decisions made, 0 unresolved)
**Design doc:** `~/.gstack/projects/romfast-workspace/claude-master-design-20260415-atm-trading.md` (APPROVED)
**Eng test plan:** `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`

---

## Context

User trades M2D strategy manually on DIA (TradeStation) with execution on TradeLocker US30 CFD (prop firm). Same strategy on GLD → XAUUSD. 4h/evening dual-screen monitoring. Faza 1 goal: bot auto-detects M2D trigger, sends Discord/Telegram notification with screenshot + SL/TP1/TP2 levels; user executes manually in TradeLocker. Faza 2 (auto-execution) deferred until prop firm TOS verified and Faza 1 proven over 20+ sessions.

**Review changed two things from the original plan:**
1. **State machine spec corrected.** Original "last 3 consecutive non-gray dots" is wrong. Actual M2D is phased: Phase 1 arming (turquoise → gray/dark-green) → Phase 2 trigger (light-green).
2. **Levels extraction corrected.** Original plan had levels.py extracting SL/TP at trigger. But those lines only appear on TradeStation chart *after* user enters trade in TradeLocker. Corrected to two-phase: spec-math at trigger, chart-scan after entry.

Plus 5 accepted expansions (labeled corpus, level fallback, layout canary, trade journal, TOS checklist).

---

## Approach: B (Structured Python service, dry-run, audit log) + CEO-reviewed additions

Runs on Windows machine alongside TradeStation. `mss` screenshots → ROI color-sample on M2D MAPS strip → phased state machine → Discord webhook + Telegram bot → JSONL audit + trade journal → dry-run replay against labeled corpus.

---

## State Machine Spec (corrected + exhaustive)

States:
- `IDLE`
- `ARMED_BUY` — turquoise seen
- `PRIMED_BUY` — turquoise + at least one dark-green seen
- `ARMED_SELL` — yellow seen
- `PRIMED_SELL` — yellow + at least one dark-red seen

**Default rule:** any (state, event) pair not listed below → stay in current state, no action, log as `noise`.

Transitions — BUY side:

| From | Event | To | Action |
|------|-------|-----|--------|
| IDLE | turquoise | ARMED_BUY | log arm_ts |
| IDLE | yellow | ARMED_SELL | log arm_ts (sell) |
| IDLE | dark-green / dark-red / light-green / light-red / gray | IDLE | noise (log phase-skip if light-green/light-red) |
| ARMED_BUY | gray | ARMED_BUY | persist |
| ARMED_BUY | turquoise | ARMED_BUY | refresh arm_ts |
| ARMED_BUY | dark-green | PRIMED_BUY | log prime_ts |
| ARMED_BUY | yellow | ARMED_SELL | opposite rearm |
| ARMED_BUY | dark-red | ARMED_BUY | ignore (minority noise) |
| ARMED_BUY | light-green | IDLE | **skip detected** — no FIRE, log phase_skip |
| ARMED_BUY | light-red | IDLE | skip detected, log |
| PRIMED_BUY | dark-green | PRIMED_BUY | accumulate |
| PRIMED_BUY | dark-red | PRIMED_BUY | ignore (minority noise) |
| PRIMED_BUY | **light-green** | IDLE | **FIRE BUY**, lockout(BUY)=4min |
| PRIMED_BUY | light-red | IDLE | skip detected (wrong trigger) |
| PRIMED_BUY | gray | IDLE | **COOLED** — signal dead, log |
| PRIMED_BUY | turquoise | ARMED_BUY | rearm fresh |
| PRIMED_BUY | yellow | ARMED_SELL | opposite rearm |

SELL side mirrors exactly: swap turquoise↔yellow, dark-green↔dark-red, light-green↔light-red, BUY↔SELL.

Notes:
- No time-based TTL on ARMED/PRIMED. State persists until trigger fires, cooled by gray after PRIMED, opposite-color rearm, or process restart (Windows Task Scheduler stops bot at session end → natural session-boundary reset).
- Cooling rule: "gray after dark-green" = signal racit (user's term). Gray during ARMED_BUY (before any dark-green) is OK.
- After FIRE: 4-minute lockout per-direction. BUY lockout doesn't block SELL and vice versa. Single timestamp per direction.
- Opposite-color-Phase-1 triggers rearm to opposite side (captures direction flip).
- Phase-skip (arming color → trigger color with no phase-2 step) → IDLE, no FIRE, logged. Would be legitimate only if indicator collapses phases, which it doesn't per observed behavior.

---

## Detection Details

- **Loop interval:** 5 seconds (36 cycles per 3-min bar; stays well inside notification-latency target).
- **Rightmost-dot detection:** scan ROI from right edge leftward, find first non-background pixel cluster → that's the rightmost dot. Don't hardcode x-pixel positions (chart scrolls; hardcoded positions drift).
- **Debounce:** configurable `debounce_depth` in config.toml (default `1` — single-read acceptance). Increase if future sessions show mid-bar color flicker. Screenshot-in-notification is the user's visual verification on top.
- **Rolling window:** keep last 20 classified dots with their detection timestamps. State machine consumes the newest *accepted* (post-debounce) dot per cycle.
- **Classification:** nearest-color match in RGB Euclidean distance, per-color tolerance from calibration. Report confidence = `1 - distance_nearest / distance_second_nearest`. Log confidence every cycle. If all distances > tolerance → `UNKNOWN`, state unchanged.

---

## Levels Extraction (two-phase, simplified)

**Phase A — at trigger (immediate alert to Discord + Telegram):**
- No entry-price compute. No spec-math SL/TP. User places a manual 0.6% SL in TradeLocker at entry; actual TP1/TP2/SL come in Phase B from the chart.
- Notification: `🟢 BUY signal DIA→US30 | 22:47:03` + annotated screenshot (detected dot highlighted).

**Phase B — after user trades (chart-scan confirmation):**
- After Phase A fires, detector keeps watching the chart ROI for horizontal colored lines (red=SL, green=TP1/TP2).
- When lines appear (user has entered trade in TradeLocker and TradeStation drew them) → scan y-pixels via Hough + color mask, convert via y-axis calibration → send second alert to both channels: `✅ Levels: SL=484.35 | TP1=485.20 | TP2=485.88`.
- If chart-line scan times out (no lines in 10 min) → silent (user didn't trade).
- If only 2 lines detected (user didn't set TP2 or line not rendered yet) → partial-result alert.
- Phase B overlap with next signal: guarded by per-direction lockout + Phase-B completion flag; a new FIRE cannot issue until prior Phase B closes (timeout or success).

---

## Dedup / Lockout

- Time-based lockout: after any FIRE, block re-fire for 4 minutes (one 3-min bar + 1 min safety).
- Tracked per-direction: BUY lockout doesn't block SELL.
- Stored as single timestamp per direction (not pixel-keyed).

---

## Observability

- **Heartbeat:** every 30 min to a separate Discord thread (not main alerts channel): `🟢 22:00 alive | 0 triggers | confidence avg 0.85 | chart OK`. Silence >35 min = watchdog concern (user notices).
- **Layout canary:** every 60 cycles (5 min), hash a stable reference region (axis labels, chart border). Stored baseline in config. On significant divergence (>threshold) → `⚠️ Layout changed — auto-paused, recalibrate` to alerts channel. Bot pauses detection until operator acknowledges (touch a pause-file or restart).
- **Low-confidence alert:** 3+ consecutive cycles with confidence below threshold → `⚠️ Bot lost sight` (already in original plan).
- **Window-lost alert:** TradeStation window not found for 60s → `⚠️ Cannot find chart`.
- **Audit JSONL:** per-cycle, daily rotation (`logs/YYYY-MM-DD.jsonl`), fields: `{ts, window_found, roi_ok, rightmost_dot_color, confidence, state, transition, trigger, notified, reason}`.

---

## Files to Create

- `/workspace/atm/pyproject.toml` — Python 3.11+ required. Deps: `mss`, `opencv-python`, `numpy`, `requests`, `pygetwindow`, `pywin32` (DPI + window capture), `rich` (CLI), `pillow` (screenshot annotation). **No `tomli` — use stdlib `tomllib`.**
- `/workspace/atm/config.toml` — populated by calibration tool (ROI coords, per-color RGB + tolerance, `debounce_depth`, y-axis scale, canary-region baseline hash, Discord webhook URL, Telegram bot token + chat_id)
- `/workspace/atm/src/atm/config.py` — **[ENG-REVIEW]** `@dataclass Config` with `Config.load(path)` that validates on load (RGB tuples, positive tolerances, both notifier credentials present, y-axis 2-point pair). Fail fast at startup.
- `/workspace/atm/src/atm/vision.py` — **[ENG-REVIEW]** shared primitives: ROI crop, perceptual hash, pixel-to-price linear interp, Hough line detection with color mask. Used by detector/canary/levels to avoid drift.
- `/workspace/atm/src/atm/detector.py` — screenshot loop, rightmost-dot scan, color classification, rolling window, debounce
- `/workspace/atm/src/atm/state_machine.py` — explicit phased state machine (spec above), exhaustive transition table
- `/workspace/atm/src/atm/levels.py` — Phase B chart-scan only (Phase A entry-price compute removed after ENG-REVIEW)
- `/workspace/atm/src/atm/canary.py` — layout fingerprint hash + drift check + auto-pause
- `/workspace/atm/src/atm/notifier/__init__.py` — abstract `Notifier` protocol: `send_alert()`, `send_heartbeat()`, `send_levels_confirm()`
- `/workspace/atm/src/atm/notifier/fanout.py` — **[ENG-REVIEW]** `FanoutNotifier` wraps N backends, each with its own worker thread + bounded queue (size 50, drop-oldest on overflow) + retry with exponential backoff + dead-letter file on total failure. Main loop never blocks.
- `/workspace/atm/src/atm/notifier/discord.py` — webhook POST, annotated screenshot upload (multipart)
- `/workspace/atm/src/atm/notifier/telegram.py` — **[ENG-REVIEW]** built in parallel with Discord (no longer deferred); bot API, photo upload
- `/workspace/atm/src/atm/audit.py` — JSONL logger with daily local-midnight rotation, line-buffered write for crash safety
- `/workspace/atm/src/atm/calibrate.py` — Tkinter: window pick → DPI check → ROI corners → per-color sample → y-axis scale → canary region → save versioned config
- `/workspace/atm/src/atm/labeler.py` — **[EXPANSION]** Tkinter label UI → `labels.json`
- `/workspace/atm/src/atm/dryrun.py` — replay with precision/recall/confusion matrix when labels present
- `/workspace/atm/src/atm/journal.py` — **[EXPANSION]** `atm journal` CLI → `trades.jsonl`
- `/workspace/atm/src/atm/report.py` — **[EXPANSION]** weekly aggregation
- `/workspace/atm/src/atm/main.py` — CLI: `atm calibrate`, `atm label <dir>`, `atm dryrun <dir>`, `atm run [--duration Xh]`, `atm journal`, `atm report [--week YYYY-WW]`
- `/workspace/atm/tests/` — **[ENG-REVIEW]** unit + E2E per test plan at `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`
- `/workspace/atm/samples/`, `/workspace/atm/logs/`
- `/workspace/atm/configs/` — versioned config archive. **[ENG-REVIEW]** No symlink (Windows admin-required); use `configs/current.txt` marker file storing the active filename. `Config.load()` reads the marker.
- `/workspace/atm/docs/phase2-prop-firm-audit.md` — structured TOS checklist
- `/workspace/atm/README.md` — setup, calibration workflow, per-session operating checklist, DPI/multi-monitor notes

---

## Build Order

1. **`pyproject.toml` + package scaffold** — Python 3.11+, `pip install -e .`, `atm --help` works.
2. **Standalone screenshot-dump script** — `mss` timer dumps to `samples/` every 5s during trading sessions. Build corpus in parallel.
3. **`config.py` + `vision.py`** — Config dataclass with validation; shared vision primitives. Ship with unit tests for config load + pixel-to-price interp.
4. **`calibrate.py`** — versioned config in `configs/YYYY-MM-DD-HHMM.toml`; `configs/current.txt` marker file points at active. DPI check + canary region capture.
5. **`labeler.py`** — once ~30 samples exist, tag them. `labels.json` is ground truth.
6. **`state_machine.py`** + **unit tests** (clean BUY, clean SELL, cooling, opposite-rearm, lockout per-direction, noise, phase-skip, all state×color pairs via parameterized test).
7. **`detector.py`** + **unit tests** (empty/background ROI, rightmost-cluster, rolling window FIFO, debounce depth=1, classification edges including UNKNOWN).
8. **`canary.py`** + **unit tests** (drift threshold, pause-file gating).
9. **`levels.py`** (Phase B only) + **unit tests** (Hough line detection with color mask, 2 vs 3 lines, 10-min timeout, pixel-to-price roundtrip).
10. **`notifier/fanout.py` + `discord.py` + `telegram.py`** + **unit tests** (queue overflow drop-oldest, 429 backoff, dead-letter on total failure, fanout: one backend down still delivers). Both channels built in parallel — fire together from day 1.
11. **`audit.py`** + **unit tests** (daily rotation at local midnight, line-buffered flush crash safety).
12. **`dryrun.py`** — replay on `samples/` against `labels.json`. **Acceptance gate before live: precision = 100%, recall ≥ 95%.**
13. **E2E replay test** — feed `samples/` through detector → state_machine → notifier-mock → in-memory audit; assert labels match FIREs.
14. **`journal.py`**, **`report.py`**, **`main.py`** (unified CLI).
15. **Windows Task Scheduler setup** — 16:30→18:30, 21:00→23:00. `atm run --duration 2h`. Manual DST check twice yearly.
16. **`docs/phase2-prop-firm-audit.md`** — TOS checklist template.

---

## Existing Utilities to Reuse

Greenfield Python project. No internal utilities. External libs: `mss` (screenshot), `pygetwindow` (window locate), `opencv-python` (line detection in Phase B), `numpy` (color math), `requests` (Discord webhook), `tomli` (config parsing), `pillow` (annotated screenshots).

---

## Verification

End-to-end, in build order:

1. **State machine unit tests:** `pytest tests/test_state_machine.py` — all scenarios (clean BUY, clean SELL, cooling, rearm, lockout, noise) pass.
2. **Calibration:** `atm calibrate` → step through → `config.toml` populated with plausible RGBs for described colors + y-axis scale sane + canary region picked.
3. **Labeled corpus:** ≥30 screenshots in `samples/`, `atm label ./samples` tags each.
4. **Dry-run with metrics:** `atm dryrun ./samples` → precision + recall + confusion matrix printed. **Acceptance gate:** precision = 100%, recall ≥ 95%. If not met → tune tolerances, re-run.
5. **Live test notification-only (2 sessions):** `atm run`. Verify:
   - Discord + Telegram notifications within 5s of trigger, both channels receive.
   - Phase A message: direction + timestamp + annotated screenshot.
   - Phase B levels-alert fires once TradeStation draws SL/TP lines; correct SL/TP1/TP2 prices.
   - Heartbeat messages every 30 min in thread.
   - Audit JSONL complete, state transitions visible.
   - Kill one notifier (e.g. wrong token) → other still delivers, dead-letter file for failed one.
6. **Canary test:** manually move TradeStation window during session → layout-changed alert within 5 min. Move back → restart bot → resumes.
7. **Scheduler test:** Windows Task Scheduler starts bot at 16:30, stops at 18:30 cleanly, log rotates at midnight.
8. **Journal test:** after real trade, `atm journal` → prompt flow complete → `trades.jsonl` entry present.
9. **Report test:** after 1 week of live use, `atm report --week 2026-16` → precision per color, slippage distribution, P&L summary.

---

## Risk Register

- **Prop firm TOS (Faza 2 blocker):** read TOS using `docs/phase2-prop-firm-audit.md` checklist before any auto-execution work. If EA/automation prohibited → Faza 2 dead, stay on Faza 1 permanently.
- **TradeStation layout change:** canary catches it within 5 min → auto-pause. Recalibrate. Losing a session to a layout change is acceptable cost.
- **Calibration drift over time:** versioned configs in `configs/` let you roll back to last-known-good if new calibration misfires.
- **DIA↔US30 price divergence:** accepted (user's judgment). Phase 1 journal captures slippage per signal, feeding Faza 2 go/no-go.
- **Screen sharing / RDP during trading:** overlay can break classification. Low prob, documented in README as operator hygiene.
- **Windows Task Scheduler DST transitions:** twice per year, schedule may misfire. Manual check first week of each DST change.

---

## Out of Scope (Faza 1)

- Any automated click in TradeLocker (Faza 2 work)
- Multi-symbol concurrent monitoring (single chart at a time; user switches manually between DIA and GLD)
- Backtesting on historical data (strategy already manually validated)
- Web UI / dashboard (headless + Discord/Telegram only)
- Ack feedback loop (react-on-notification labeling) — deferred to TODOS.md as `P2-ack-loop`: shipping baseline first, adding feedback once detection quality verified
- Telegram notifier — built only after Discord is stable 5+ sessions

---

## Accepted Expansions (CEO review, SELECTIVE mode)

1. ✅ **Labeled sample corpus + dry-run metrics** — `labeler.py`, `labels.json`, automated precision/recall in dryrun. Makes acceptance criteria ("false-positives = 0, false-negatives ≤ 5%") machine-checkable.
2. ✅ **Level-extractor fallback (spec-math)** — Phase A always uses spec-math; Phase B validates against chart. Redundancy on fragile piece.
3. ✅ **Layout canary + auto-pause** — `canary.py` hashes stable UI region, auto-pauses on drift. Catches silent classification-with-wrong-positions failure mode.
4. ✅ **Trade journal CLI** — `atm journal` + `trades.jsonl` + weekly report. Data for Faza 2 go/no-go decision.
5. ✅ **Prop-firm TOS audit checklist** — `docs/phase2-prop-firm-audit.md`. Structured Faza 2 evaluation framework shipped now.

## Deferred to TODOS.md

- **Ack feedback loop** — Discord reaction emojis feeding precision tuning. High value, operationally heavier (bot vs webhook). Add after Faza 1 baseline stable.

---

## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR (SELECTIVE EXPANSION) | 6 proposals, 5 accepted, 1 deferred; 2 arch corrections |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR (FULL_REVIEW) | 9 issues found, 0 critical gaps; 4 decisions made, 0 unresolved |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | SKIPPED (no UI scope — CLI + Discord/Telegram) |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | SKIPPED (personal tool, single user) |

**UNRESOLVED:** 0

**ENG REVIEW DECISIONS:**
1. **Bar flicker** → debounce depth=1 (configurable), rely on screenshot-in-notification for visual verification.
2. **Phase A entry price** → dropped. User places manual 0.6% SL in TradeLocker at entry. Phase A = direction + screenshot only. Phase B = real SL/TP1/TP2 from chart.
3. **Notifier blocking** → fire-and-forget worker threads per backend, bounded queue (size 50, drop-oldest), retry w/ backoff, dead-letter on total failure.
4. **Alert SPoF** → Discord + Telegram built in parallel from day 1, both fire together.

**ENG REVIEW OBVIOUS FIXES (stated, no decision):**
- Exhaustive state transition table (all state×color pairs, default-noise rule, SELL mirror explicit).
- Python 3.11+ pin, drop `tomli` dep, use stdlib `tomllib`.
- Windows symlink → `configs/current.txt` marker file.
- Shared `vision.py` module (ROI, hash, interp, Hough).
- `@dataclass Config` with fail-fast load-time validation.
- DPI check + multi-monitor note in calibrate + README.

**ENG REVIEW TEST SCOPE (accepted: FULL):** unit tests for every module (state_machine, detector, levels Phase B, canary, audit, notifier fanout/retry, calibrate roundtrip, config validate) + 1 E2E replay harness asserting labeled-corpus precision/recall. Test plan artifact: `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`.

**VERDICT:** CEO + ENG CLEARED — ready to implement. Run `/ship` after implementation. No further reviews required before build.