Files
atm/docs/partitioned-honking-unicorn.md
Claude Agent 9207197a56 initial: scaffold atm trading monitor (Faza 1)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:03:36 +00:00

259 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Plan: ATM — Automated Trading Monitor (M2D, Faza 1) — ENG-REVIEWED
**Source plan:** `/home/claude/.claude/plans/swirling-drifting-starfish.md`
**CEO plan artifact:** `~/.gstack/projects/romfast-workspace/ceo-plans/2026-04-15-atm-trading.md`
**Eng review mode:** FULL_REVIEW (4 decisions made, 0 unresolved)
**Design doc:** `~/.gstack/projects/romfast-workspace/claude-master-design-20260415-atm-trading.md` (APPROVED)
**Eng test plan:** `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`
---
## Context
User trades M2D strategy manually on DIA (TradeStation) with execution on TradeLocker US30 CFD (prop firm). Same strategy on GLD → XAUUSD. 4h/evening dual-screen monitoring. Faza 1 goal: bot auto-detects M2D trigger, sends Discord/Telegram notification with screenshot + SL/TP1/TP2 levels; user executes manually in TradeLocker. Faza 2 (auto-execution) deferred until prop firm TOS verified and Faza 1 proven over 20+ sessions.
**Review changed two things from the original plan:**
1. **State machine spec corrected.** Original "last 3 consecutive non-gray dots" is wrong. Actual M2D is phased: Phase 1 arming (turquoise → gray/dark-green) → Phase 2 trigger (light-green).
2. **Levels extraction corrected.** Original plan had levels.py extracting SL/TP at trigger. But those lines only appear on TradeStation chart *after* user enters trade in TradeLocker. Corrected to two-phase: spec-math at trigger, chart-scan after entry.
Plus 5 accepted expansions (labeled corpus, level fallback, layout canary, trade journal, TOS checklist).
---
## Approach: B (Structured Python service, dry-run, audit log) + CEO-reviewed additions
Runs on Windows machine alongside TradeStation. `mss` screenshots → ROI color-sample on M2D MAPS strip → phased state machine → Discord webhook + Telegram bot → JSONL audit + trade journal → dry-run replay against labeled corpus.
---
## State Machine Spec (corrected + exhaustive)
States:
- `IDLE`
- `ARMED_BUY` — turquoise seen
- `PRIMED_BUY` — turquoise + at least one dark-green seen
- `ARMED_SELL` — yellow seen
- `PRIMED_SELL` — yellow + at least one dark-red seen
**Default rule:** any (state, event) pair not listed below → stay in current state, no action, log as `noise`.
Transitions — BUY side:
| From | Event | To | Action |
|------|-------|-----|--------|
| IDLE | turquoise | ARMED_BUY | log arm_ts |
| IDLE | yellow | ARMED_SELL | log arm_ts (sell) |
| IDLE | dark-green / dark-red / light-green / light-red / gray | IDLE | noise (log phase-skip if light-green/light-red) |
| ARMED_BUY | gray | ARMED_BUY | persist |
| ARMED_BUY | turquoise | ARMED_BUY | refresh arm_ts |
| ARMED_BUY | dark-green | PRIMED_BUY | log prime_ts |
| ARMED_BUY | yellow | ARMED_SELL | opposite rearm |
| ARMED_BUY | dark-red | ARMED_BUY | ignore (minority noise) |
| ARMED_BUY | light-green | IDLE | **skip detected** — no FIRE, log phase_skip |
| ARMED_BUY | light-red | IDLE | skip detected, log |
| PRIMED_BUY | dark-green | PRIMED_BUY | accumulate |
| PRIMED_BUY | dark-red | PRIMED_BUY | ignore (minority noise) |
| PRIMED_BUY | **light-green** | IDLE | **FIRE BUY**, lockout(BUY)=4min |
| PRIMED_BUY | light-red | IDLE | skip detected (wrong trigger) |
| PRIMED_BUY | gray | IDLE | **COOLED** — signal dead, log |
| PRIMED_BUY | turquoise | ARMED_BUY | rearm fresh |
| PRIMED_BUY | yellow | ARMED_SELL | opposite rearm |
SELL side mirrors exactly: swap turquoise↔yellow, dark-green↔dark-red, light-green↔light-red, BUY↔SELL.
Notes:
- No time-based TTL on ARMED/PRIMED. State persists until trigger fires, cooled by gray after PRIMED, opposite-color rearm, or process restart (Windows Task Scheduler stops bot at session end → natural session-boundary reset).
- Cooling rule: "gray after dark-green" = signal racit (user's term). Gray during ARMED_BUY (before any dark-green) is OK.
- After FIRE: 4-minute lockout per-direction. BUY lockout doesn't block SELL and vice versa. Single timestamp per direction.
- Opposite-color-Phase-1 triggers rearm to opposite side (captures direction flip).
- Phase-skip (arming color → trigger color with no phase-2 step) → IDLE, no FIRE, logged. Would be legitimate only if indicator collapses phases, which it doesn't per observed behavior.
---
## Detection Details
- **Loop interval:** 5 seconds (36 cycles per 3-min bar; stays well inside notification-latency target).
- **Rightmost-dot detection:** scan ROI from right edge leftward, find first non-background pixel cluster → that's the rightmost dot. Don't hardcode x-pixel positions (chart scrolls; hardcoded positions drift).
- **Debounce:** configurable `debounce_depth` in config.toml (default `1` — single-read acceptance). Increase if future sessions show mid-bar color flicker. Screenshot-in-notification is the user's visual verification on top.
- **Rolling window:** keep last 20 classified dots with their detection timestamps. State machine consumes the newest *accepted* (post-debounce) dot per cycle.
- **Classification:** nearest-color match in RGB Euclidean distance, per-color tolerance from calibration. Report confidence = `1 - distance_nearest / distance_second_nearest`. Log confidence every cycle. If all distances > tolerance → `UNKNOWN`, state unchanged.
---
## Levels Extraction (two-phase, simplified)
**Phase A — at trigger (immediate alert to Discord + Telegram):**
- No entry-price compute. No spec-math SL/TP. User places a manual 0.6% SL in TradeLocker at entry; actual TP1/TP2/SL come in Phase B from the chart.
- Notification: `🟢 BUY signal DIA→US30 | 22:47:03` + annotated screenshot (detected dot highlighted).
**Phase B — after user trades (chart-scan confirmation):**
- After Phase A fires, detector keeps watching the chart ROI for horizontal colored lines (red=SL, green=TP1/TP2).
- When lines appear (user has entered trade in TradeLocker and TradeStation drew them) → scan y-pixels via Hough + color mask, convert via y-axis calibration → send second alert to both channels: `✅ Levels: SL=484.35 | TP1=485.20 | TP2=485.88`.
- If chart-line scan times out (no lines in 10 min) → silent (user didn't trade).
- If only 2 lines detected (user didn't set TP2 or line not rendered yet) → partial-result alert.
- Phase B overlap with next signal: guarded by per-direction lockout + Phase-B completion flag; a new FIRE cannot issue until prior Phase B closes (timeout or success).
---
## Dedup / Lockout
- Time-based lockout: after any FIRE, block re-fire for 4 minutes (one 3-min bar + 1 min safety).
- Tracked per-direction: BUY lockout doesn't block SELL.
- Stored as single timestamp per direction (not pixel-keyed).
---
## Observability
- **Heartbeat:** every 30 min to a separate Discord thread (not main alerts channel): `🟢 22:00 alive | 0 triggers | confidence avg 0.85 | chart OK`. Silence >35 min = watchdog concern (user notices).
- **Layout canary:** every 60 cycles (5 min), hash a stable reference region (axis labels, chart border). Stored baseline in config. On significant divergence (>threshold) → `⚠️ Layout changed — auto-paused, recalibrate` to alerts channel. Bot pauses detection until operator acknowledges (touch a pause-file or restart).
- **Low-confidence alert:** 3+ consecutive cycles with confidence below threshold → `⚠️ Bot lost sight` (already in original plan).
- **Window-lost alert:** TradeStation window not found for 60s → `⚠️ Cannot find chart`.
- **Audit JSONL:** per-cycle, daily rotation (`logs/YYYY-MM-DD.jsonl`), fields: `{ts, window_found, roi_ok, rightmost_dot_color, confidence, state, transition, trigger, notified, reason}`.
---
## Files to Create
- `/workspace/atm/pyproject.toml` — Python 3.11+ required. Deps: `mss`, `opencv-python`, `numpy`, `requests`, `pygetwindow`, `pywin32` (DPI + window capture), `rich` (CLI), `pillow` (screenshot annotation). **No `tomli` — use stdlib `tomllib`.**
- `/workspace/atm/config.toml` — populated by calibration tool (ROI coords, per-color RGB + tolerance, `debounce_depth`, y-axis scale, canary-region baseline hash, Discord webhook URL, Telegram bot token + chat_id)
- `/workspace/atm/src/atm/config.py`**[ENG-REVIEW]** `@dataclass Config` with `Config.load(path)` that validates on load (RGB tuples, positive tolerances, both notifier credentials present, y-axis 2-point pair). Fail fast at startup.
- `/workspace/atm/src/atm/vision.py`**[ENG-REVIEW]** shared primitives: ROI crop, perceptual hash, pixel-to-price linear interp, Hough line detection with color mask. Used by detector/canary/levels to avoid drift.
- `/workspace/atm/src/atm/detector.py` — screenshot loop, rightmost-dot scan, color classification, rolling window, debounce
- `/workspace/atm/src/atm/state_machine.py` — explicit phased state machine (spec above), exhaustive transition table
- `/workspace/atm/src/atm/levels.py` — Phase B chart-scan only (Phase A entry-price compute removed after ENG-REVIEW)
- `/workspace/atm/src/atm/canary.py` — layout fingerprint hash + drift check + auto-pause
- `/workspace/atm/src/atm/notifier/__init__.py` — abstract `Notifier` protocol: `send_alert()`, `send_heartbeat()`, `send_levels_confirm()`
- `/workspace/atm/src/atm/notifier/fanout.py`**[ENG-REVIEW]** `FanoutNotifier` wraps N backends, each with its own worker thread + bounded queue (size 50, drop-oldest on overflow) + retry with exponential backoff + dead-letter file on total failure. Main loop never blocks.
- `/workspace/atm/src/atm/notifier/discord.py` — webhook POST, annotated screenshot upload (multipart)
- `/workspace/atm/src/atm/notifier/telegram.py`**[ENG-REVIEW]** built in parallel with Discord (no longer deferred); bot API, photo upload
- `/workspace/atm/src/atm/audit.py` — JSONL logger with daily local-midnight rotation, line-buffered write for crash safety
- `/workspace/atm/src/atm/calibrate.py` — Tkinter: window pick → DPI check → ROI corners → per-color sample → y-axis scale → canary region → save versioned config
- `/workspace/atm/src/atm/labeler.py`**[EXPANSION]** Tkinter label UI → `labels.json`
- `/workspace/atm/src/atm/dryrun.py` — replay with precision/recall/confusion matrix when labels present
- `/workspace/atm/src/atm/journal.py`**[EXPANSION]** `atm journal` CLI → `trades.jsonl`
- `/workspace/atm/src/atm/report.py`**[EXPANSION]** weekly aggregation
- `/workspace/atm/src/atm/main.py` — CLI: `atm calibrate`, `atm label <dir>`, `atm dryrun <dir>`, `atm run [--duration Xh]`, `atm journal`, `atm report [--week YYYY-WW]`
- `/workspace/atm/tests/`**[ENG-REVIEW]** unit + E2E per test plan at `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`
- `/workspace/atm/samples/`, `/workspace/atm/logs/`
- `/workspace/atm/configs/` — versioned config archive. **[ENG-REVIEW]** No symlink (Windows admin-required); use `configs/current.txt` marker file storing the active filename. `Config.load()` reads the marker.
- `/workspace/atm/docs/phase2-prop-firm-audit.md` — structured TOS checklist
- `/workspace/atm/README.md` — setup, calibration workflow, per-session operating checklist, DPI/multi-monitor notes
---
## Build Order
1. **`pyproject.toml` + package scaffold** — Python 3.11+, `pip install -e .`, `atm --help` works.
2. **Standalone screenshot-dump script**`mss` timer dumps to `samples/` every 5s during trading sessions. Build corpus in parallel.
3. **`config.py` + `vision.py`** — Config dataclass with validation; shared vision primitives. Ship with unit tests for config load + pixel-to-price interp.
4. **`calibrate.py`** — versioned config in `configs/YYYY-MM-DD-HHMM.toml`; `configs/current.txt` marker file points at active. DPI check + canary region capture.
5. **`labeler.py`** — once ~30 samples exist, tag them. `labels.json` is ground truth.
6. **`state_machine.py`** + **unit tests** (clean BUY, clean SELL, cooling, opposite-rearm, lockout per-direction, noise, phase-skip, all state×color pairs via parameterized test).
7. **`detector.py`** + **unit tests** (empty/background ROI, rightmost-cluster, rolling window FIFO, debounce depth=1, classification edges including UNKNOWN).
8. **`canary.py`** + **unit tests** (drift threshold, pause-file gating).
9. **`levels.py`** (Phase B only) + **unit tests** (Hough line detection with color mask, 2 vs 3 lines, 10-min timeout, pixel-to-price roundtrip).
10. **`notifier/fanout.py` + `discord.py` + `telegram.py`** + **unit tests** (queue overflow drop-oldest, 429 backoff, dead-letter on total failure, fanout: one backend down still delivers). Both channels built in parallel — fire together from day 1.
11. **`audit.py`** + **unit tests** (daily rotation at local midnight, line-buffered flush crash safety).
12. **`dryrun.py`** — replay on `samples/` against `labels.json`. **Acceptance gate before live: precision = 100%, recall ≥ 95%.**
13. **E2E replay test** — feed `samples/` through detector → state_machine → notifier-mock → in-memory audit; assert labels match FIREs.
14. **`journal.py`**, **`report.py`**, **`main.py`** (unified CLI).
15. **Windows Task Scheduler setup** — 16:30→18:30, 21:00→23:00. `atm run --duration 2h`. Manual DST check twice yearly.
16. **`docs/phase2-prop-firm-audit.md`** — TOS checklist template.
---
## Existing Utilities to Reuse
Greenfield Python project. No internal utilities. External libs: `mss` (screenshot), `pygetwindow` (window locate), `opencv-python` (line detection in Phase B), `numpy` (color math), `requests` (Discord webhook), `tomli` (config parsing), `pillow` (annotated screenshots).
---
## Verification
End-to-end, in build order:
1. **State machine unit tests:** `pytest tests/test_state_machine.py` — all scenarios (clean BUY, clean SELL, cooling, rearm, lockout, noise) pass.
2. **Calibration:** `atm calibrate` → step through → `config.toml` populated with plausible RGBs for described colors + y-axis scale sane + canary region picked.
3. **Labeled corpus:** ≥30 screenshots in `samples/`, `atm label ./samples` tags each.
4. **Dry-run with metrics:** `atm dryrun ./samples` → precision + recall + confusion matrix printed. **Acceptance gate:** precision = 100%, recall ≥ 95%. If not met → tune tolerances, re-run.
5. **Live test notification-only (2 sessions):** `atm run`. Verify:
- Discord + Telegram notifications within 5s of trigger, both channels receive.
- Phase A message: direction + timestamp + annotated screenshot.
- Phase B levels-alert fires once TradeStation draws SL/TP lines; correct SL/TP1/TP2 prices.
- Heartbeat messages every 30 min in thread.
- Audit JSONL complete, state transitions visible.
- Kill one notifier (e.g. wrong token) → other still delivers, dead-letter file for failed one.
6. **Canary test:** manually move TradeStation window during session → layout-changed alert within 5 min. Move back → restart bot → resumes.
7. **Scheduler test:** Windows Task Scheduler starts bot at 16:30, stops at 18:30 cleanly, log rotates at midnight.
8. **Journal test:** after real trade, `atm journal` → prompt flow complete → `trades.jsonl` entry present.
9. **Report test:** after 1 week of live use, `atm report --week 2026-16` → precision per color, slippage distribution, P&L summary.
---
## Risk Register
- **Prop firm TOS (Faza 2 blocker):** read TOS using `docs/phase2-prop-firm-audit.md` checklist before any auto-execution work. If EA/automation prohibited → Faza 2 dead, stay on Faza 1 permanently.
- **TradeStation layout change:** canary catches it within 5 min → auto-pause. Recalibrate. Losing a session to a layout change is acceptable cost.
- **Calibration drift over time:** versioned configs in `configs/` let you roll back to last-known-good if new calibration misfires.
- **DIA↔US30 price divergence:** accepted (user's judgment). Phase 1 journal captures slippage per signal, feeding Faza 2 go/no-go.
- **Screen sharing / RDP during trading:** overlay can break classification. Low prob, documented in README as operator hygiene.
- **Windows Task Scheduler DST transitions:** twice per year, schedule may misfire. Manual check first week of each DST change.
---
## Out of Scope (Faza 1)
- Any automated click in TradeLocker (Faza 2 work)
- Multi-symbol concurrent monitoring (single chart at a time; user switches manually between DIA and GLD)
- Backtesting on historical data (strategy already manually validated)
- Web UI / dashboard (headless + Discord/Telegram only)
- Ack feedback loop (react-on-notification labeling) — deferred to TODOS.md as `P2-ack-loop`: shipping baseline first, adding feedback once detection quality verified
- Telegram notifier — built only after Discord is stable 5+ sessions
---
## Accepted Expansions (CEO review, SELECTIVE mode)
1.**Labeled sample corpus + dry-run metrics**`labeler.py`, `labels.json`, automated precision/recall in dryrun. Makes acceptance criteria ("false-positives = 0, false-negatives ≤ 5%") machine-checkable.
2.**Level-extractor fallback (spec-math)** — Phase A always uses spec-math; Phase B validates against chart. Redundancy on fragile piece.
3.**Layout canary + auto-pause**`canary.py` hashes stable UI region, auto-pauses on drift. Catches silent classification-with-wrong-positions failure mode.
4.**Trade journal CLI**`atm journal` + `trades.jsonl` + weekly report. Data for Faza 2 go/no-go decision.
5.**Prop-firm TOS audit checklist**`docs/phase2-prop-firm-audit.md`. Structured Faza 2 evaluation framework shipped now.
## Deferred to TODOS.md
- **Ack feedback loop** — Discord reaction emojis feeding precision tuning. High value, operationally heavier (bot vs webhook). Add after Faza 1 baseline stable.
---
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR (SELECTIVE EXPANSION) | 6 proposals, 5 accepted, 1 deferred; 2 arch corrections |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR (FULL_REVIEW) | 9 issues found, 0 critical gaps; 4 decisions made, 0 unresolved |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | SKIPPED (no UI scope — CLI + Discord/Telegram) |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | SKIPPED (personal tool, single user) |
**UNRESOLVED:** 0
**ENG REVIEW DECISIONS:**
1. **Bar flicker** → debounce depth=1 (configurable), rely on screenshot-in-notification for visual verification.
2. **Phase A entry price** → dropped. User places manual 0.6% SL in TradeLocker at entry. Phase A = direction + screenshot only. Phase B = real SL/TP1/TP2 from chart.
3. **Notifier blocking** → fire-and-forget worker threads per backend, bounded queue (size 50, drop-oldest), retry w/ backoff, dead-letter on total failure.
4. **Alert SPoF** → Discord + Telegram built in parallel from day 1, both fire together.
**ENG REVIEW OBVIOUS FIXES (stated, no decision):**
- Exhaustive state transition table (all state×color pairs, default-noise rule, SELL mirror explicit).
- Python 3.11+ pin, drop `tomli` dep, use stdlib `tomllib`.
- Windows symlink → `configs/current.txt` marker file.
- Shared `vision.py` module (ROI, hash, interp, Hough).
- `@dataclass Config` with fail-fast load-time validation.
- DPI check + multi-monitor note in calibrate + README.
**ENG REVIEW TEST SCOPE (accepted: FULL):** unit tests for every module (state_machine, detector, levels Phase B, canary, audit, notifier fanout/retry, calibrate roundtrip, config validate) + 1 E2E replay harness asserting labeled-corpus precision/recall. Test plan artifact: `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`.
**VERDICT:** CEO + ENG CLEARED — ready to implement. Run `/ship` after implementation. No further reviews required before build.