Plan: ATM — Automated Trading Monitor (M2D, Faza 1) — ENG-REVIEWED
Source plan: /home/claude/.claude/plans/swirling-drifting-starfish.md
CEO plan artifact: ~/.gstack/projects/romfast-workspace/ceo-plans/2026-04-15-atm-trading.md
Eng review mode: FULL_REVIEW (4 decisions made, 0 unresolved)
Design doc: ~/.gstack/projects/romfast-workspace/claude-master-design-20260415-atm-trading.md (APPROVED)
Eng test plan: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md
Context
User trades M2D strategy manually on DIA (TradeStation) with execution on TradeLocker US30 CFD (prop firm). Same strategy on GLD → XAUUSD. 4h/evening dual-screen monitoring. Faza 1 goal: bot auto-detects M2D trigger, sends Discord/Telegram notification with screenshot + SL/TP1/TP2 levels; user executes manually in TradeLocker. Faza 2 (auto-execution) deferred until prop firm TOS verified and Faza 1 proven over 20+ sessions.
Review changed two things from the original plan:
- State machine spec corrected. Original "last 3 consecutive non-gray dots" is wrong. Actual M2D is phased: Phase 1 arming (turquoise → gray/dark-green) → Phase 2 trigger (light-green).
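- Levels extraction corrected. The original plan had levels.py extracting SL/TP at trigger, but those lines only appear on the TradeStation chart after the user enters the trade in TradeLocker. Corrected to two-phase: spec-math at trigger, chart-scan after entry. The eng review then dropped the spec-math step, so Phase A now carries direction + screenshot only and Phase B reads the real levels from the chart.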
Plus 5 accepted expansions (labeled corpus, level fallback, layout canary, trade journal, TOS checklist).
Approach: B (Structured Python service, dry-run, audit log) + CEO-reviewed additions
Runs on Windows machine alongside TradeStation. mss screenshots → ROI color-sample on M2D MAPS strip → phased state machine → Discord webhook + Telegram bot → JSONL audit + trade journal → dry-run replay against labeled corpus.
State Machine Spec (corrected + exhaustive)
States:
- `IDLE`
- `ARMED_BUY`: turquoise seen
- `PRIMED_BUY`: turquoise + at least one dark-green seen
- `ARMED_SELL`: yellow seen
- `PRIMED_SELL`: yellow + at least one dark-red seen
Default rule: any (state, event) pair not listed below → stay in current state, no action, log as noise.
Transitions — BUY side:
| From | Event | To | Action |
|---|---|---|---|
| IDLE | turquoise | ARMED_BUY | log arm_ts |
| IDLE | yellow | ARMED_SELL | log arm_ts (sell) |
| IDLE | dark-green / dark-red / light-green / light-red / gray | IDLE | noise (log phase-skip if light-green/light-red) |
| ARMED_BUY | gray | ARMED_BUY | persist |
| ARMED_BUY | turquoise | ARMED_BUY | refresh arm_ts |
| ARMED_BUY | dark-green | PRIMED_BUY | log prime_ts |
| ARMED_BUY | yellow | ARMED_SELL | opposite rearm |
| ARMED_BUY | dark-red | ARMED_BUY | ignore (minority noise) |
| ARMED_BUY | light-green | IDLE | skip detected — no FIRE, log phase_skip |
| ARMED_BUY | light-red | IDLE | skip detected, log |
| PRIMED_BUY | dark-green | PRIMED_BUY | accumulate |
| PRIMED_BUY | dark-red | PRIMED_BUY | ignore (minority noise) |
| PRIMED_BUY | light-green | IDLE | FIRE BUY, lockout(BUY)=4min |
| PRIMED_BUY | light-red | IDLE | skip detected (wrong trigger) |
| PRIMED_BUY | gray | IDLE | COOLED — signal dead, log |
| PRIMED_BUY | turquoise | ARMED_BUY | rearm fresh |
| PRIMED_BUY | yellow | ARMED_SELL | opposite rearm |
SELL side mirrors exactly: swap turquoise↔yellow, dark-green↔dark-red, light-green↔light-red, BUY↔SELL.
Notes:
- No time-based TTL on ARMED/PRIMED. State persists until trigger fires, cooled by gray after PRIMED, opposite-color rearm, or process restart (Windows Task Scheduler stops bot at session end → natural session-boundary reset).
- Cooling rule: "gray after dark-green" = signal "racit" (the user's term; Romanian for "cooled"). Gray during ARMED_BUY (before any dark-green) is OK.
- After FIRE: 4-minute lockout per-direction. BUY lockout doesn't block SELL and vice versa. Single timestamp per direction.
- Opposite-color-Phase-1 triggers rearm to opposite side (captures direction flip).
- Phase-skip (a trigger color arriving before the priming step, i.e. light-green with no dark-green seen since arming) → IDLE, no FIRE, logged. It would be legitimate only if the indicator collapsed phases, which it has not done in observed behavior.
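The transition table and the mirror rule above can be sketched as a lookup dictionary. This is a minimal illustration, not the planned `state_machine.py`: the action names (`arm`, `prime`, `cooled`, and so on) are hypothetical labels, rows that stay in place ("persist", "ignore", "accumulate") are folded into the default noise rule, and lockout is deliberately left out.

```python
# (state, color) -> (next_state, action). BUY-side rows that change state or
# log something specific; every other pair falls through to the default rule.
BUY_ROWS = {
    ("IDLE", "turquoise"): ("ARMED_BUY", "arm"),
    ("IDLE", "light-green"): ("IDLE", "phase_skip"),
    ("ARMED_BUY", "turquoise"): ("ARMED_BUY", "refresh_arm"),
    ("ARMED_BUY", "dark-green"): ("PRIMED_BUY", "prime"),
    ("ARMED_BUY", "yellow"): ("ARMED_SELL", "opposite_rearm"),
    ("ARMED_BUY", "light-green"): ("IDLE", "phase_skip"),
    ("ARMED_BUY", "light-red"): ("IDLE", "phase_skip"),
    ("PRIMED_BUY", "light-green"): ("IDLE", "FIRE_BUY"),
    ("PRIMED_BUY", "light-red"): ("IDLE", "phase_skip"),
    ("PRIMED_BUY", "gray"): ("IDLE", "cooled"),
    ("PRIMED_BUY", "turquoise"): ("ARMED_BUY", "rearm"),
    ("PRIMED_BUY", "yellow"): ("ARMED_SELL", "opposite_rearm"),
}

# Color swaps for the SELL mirror; gray has no counterpart and maps to itself.
SWAP_COLOR = {
    "turquoise": "yellow", "yellow": "turquoise",
    "dark-green": "dark-red", "dark-red": "dark-green",
    "light-green": "light-red", "light-red": "light-green",
}


def _mirror(name):
    """Swap BUY and SELL inside a state or action name."""
    if "BUY" in name:
        return name.replace("BUY", "SELL")
    return name.replace("SELL", "BUY")


# Full table = BUY rows plus their SELL mirrors (this also yields IDLE + yellow).
TRANSITIONS = dict(BUY_ROWS)
for (state, color), (nxt, action) in BUY_ROWS.items():
    mirrored_key = (_mirror(state), SWAP_COLOR.get(color, color))
    TRANSITIONS[mirrored_key] = (_mirror(nxt), _mirror(action))


def step(state, color):
    """Default rule: unlisted (state, color) pairs keep the state, action 'noise'."""
    return TRANSITIONS.get((state, color), (state, "noise"))


def run(colors, state="IDLE"):
    """Feed a sequence of classified dot colors; return final state and actions."""
    actions = []
    for color in colors:
        state, action = step(state, color)
        actions.append(action)
    return state, actions
```

Deriving the SELL rows from the BUY rows keeps the two sides from drifting apart, which is the same property the parameterized state×color test is meant to guard.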
Detection Details
- Loop interval: 5 seconds (36 cycles per 3-min bar; stays well inside notification-latency target).
- Rightmost-dot detection: scan ROI from right edge leftward, find first non-background pixel cluster → that's the rightmost dot. Don't hardcode x-pixel positions (chart scrolls; hardcoded positions drift).
- Debounce: configurable `debounce_depth` in config.toml (default `1`: single-read acceptance). Increase if future sessions show mid-bar color flicker. The screenshot in the notification is the user's visual verification on top.
- Rolling window: keep last 20 classified dots with their detection timestamps. State machine consumes the newest accepted (post-debounce) dot per cycle.
- Classification: nearest-color match in RGB Euclidean distance, per-color tolerance from calibration. Report confidence = `1 - distance_nearest / distance_second_nearest`. Log confidence every cycle. If all distances > tolerance → `UNKNOWN`, state unchanged.
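The classification rule above can be sketched as follows. The palette RGB values and the tolerance of 40 are placeholders, not calibrated numbers, and the `UNKNOWN` check is one reading of the rule (the nearest color must fall inside its own tolerance).

```python
import math

# Hypothetical reference colors; real values come from the calibration tool.
PALETTE = {
    "turquoise": (64, 224, 208),
    "yellow": (255, 255, 0),
    "gray": (128, 128, 128),
    "dark-green": (0, 100, 0),
    "light-green": (144, 238, 144),
    "dark-red": (139, 0, 0),
    "light-red": (255, 102, 102),
}
TOLERANCES = {name: 40.0 for name in PALETTE}  # placeholder per-color tolerance


def classify(rgb, palette=PALETTE, tolerances=TOLERANCES):
    """Return (color_name, confidence) for a sampled RGB triple.

    confidence = 1 - distance_nearest / distance_second_nearest, so an exact
    match scores 1.0 and a pixel equidistant from two colors scores 0.0.
    """
    ranked = sorted((math.dist(rgb, ref), name) for name, ref in palette.items())
    (d_nearest, best), (d_second, _) = ranked[0], ranked[1]
    if d_nearest > tolerances[best]:
        return "UNKNOWN", 0.0
    confidence = 1.0 - d_nearest / d_second if d_second > 0 else 0.0
    return best, confidence
```

For example, `classify((60, 220, 205))` lands on turquoise with high confidence, while a black pixel `(0, 0, 0)` is farther than 40 from every reference and comes back as `UNKNOWN`.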
Levels Extraction (two-phase, simplified)
Phase A — at trigger (immediate alert to Discord + Telegram):
- No entry-price compute. No spec-math SL/TP. User places a manual 0.6% SL in TradeLocker at entry; actual TP1/TP2/SL come in Phase B from the chart.
- Notification: `🟢 BUY signal DIA→US30 | 22:47:03` + annotated screenshot (detected dot highlighted).
Phase B — after user trades (chart-scan confirmation):
- After Phase A fires, detector keeps watching the chart ROI for horizontal colored lines (red=SL, green=TP1/TP2).
- When lines appear (user has entered trade in TradeLocker and TradeStation drew them) → scan y-pixels via Hough + color mask, convert via y-axis calibration → send second alert to both channels: `✅ Levels: SL=484.35 | TP1=485.20 | TP2=485.88`.
- If chart-line scan times out (no lines in 10 min) → silent (user didn't trade).
- If only 2 lines detected (user didn't set TP2 or line not rendered yet) → partial-result alert.
- Phase B overlap with next signal: guarded by per-direction lockout + Phase-B completion flag; a new FIRE cannot issue until prior Phase B closes (timeout or success).
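The "convert via y-axis calibration" step in Phase B is a two-point linear interpolation. A minimal sketch, assuming the calibration stores two `(y_pixel, price)` pairs; the function name and the sample numbers are illustrative only.

```python
def make_pixel_to_price(cal_a, cal_b):
    """Build a y-pixel -> price converter from two (y_pixel, price) pairs.

    Screen y grows downward while price grows upward, so the slope is
    normally negative; the formula handles either sign.
    """
    (y1, p1), (y2, p2) = cal_a, cal_b
    if y1 == y2:
        raise ValueError("calibration points must have different y-pixels")
    slope = (p2 - p1) / (y2 - y1)

    def to_price(y_pixel):
        return p1 + slope * (y_pixel - y1)

    return to_price


# Hypothetical calibration: y=100 px is 486.00, y=500 px is 484.00.
to_price = make_pixel_to_price((100, 486.00), (500, 484.00))
sl_price = round(to_price(430), 2)  # a detected red line at y=430 px
```

With this calibration each pixel is worth 0.005 in price, so a one-pixel error in the Hough line position shifts the reported level by half a cent; a taller chart pane gives finer resolution.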
Dedup / Lockout
- Time-based lockout: after any FIRE, block re-fire for 4 minutes (one 3-min bar + 1 min safety).
- Tracked per-direction: BUY lockout doesn't block SELL.
- Stored as single timestamp per direction (not pixel-keyed).
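The per-direction lockout above needs only one timestamp per direction. A small sketch under that reading; the class name and the injectable clock are assumptions made for testability, not part of the plan.

```python
import time

LOCKOUT_SECONDS = 4 * 60  # one 3-min bar + 1 min safety


class Lockout:
    """Tracks the last FIRE time per direction ('BUY' or 'SELL')."""

    def __init__(self, seconds=LOCKOUT_SECONDS, clock=time.monotonic):
        self._seconds = seconds
        self._clock = clock          # injectable so tests can control time
        self._last_fire = {}         # direction -> timestamp of last FIRE

    def try_fire(self, direction):
        """Return True and record the FIRE if the direction is not locked out."""
        now = self._clock()
        last = self._last_fire.get(direction)
        if last is not None and now - last < self._seconds:
            return False
        self._last_fire[direction] = now
        return True
```

Using a monotonic clock avoids a wall-clock adjustment (for example a DST change) shortening or extending the lockout window.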
Observability
- Heartbeat: every 30 min to a separate Discord thread (not main alerts channel): `🟢 22:00 alive | 0 triggers | confidence avg 0.85 | chart OK`. Silence >35 min = watchdog concern (user notices).
- Layout canary: every 60 cycles (5 min), hash a stable reference region (axis labels, chart border). Stored baseline in config. On significant divergence (>threshold) → `⚠️ Layout changed: auto-paused, recalibrate` to alerts channel. Bot pauses detection until operator acknowledges (touch a pause-file or restart).
- Low-confidence alert: 3+ consecutive cycles with confidence below threshold → `⚠️ Bot lost sight` (already in original plan).
- Window-lost alert: TradeStation window not found for 60s → `⚠️ Cannot find chart`.
- Audit JSONL: per-cycle, daily rotation (`logs/YYYY-MM-DD.jsonl`), fields: `{ts, window_found, roi_ok, rightmost_dot_color, confidence, state, transition, trigger, notified, reason}`.
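The audit log described above can be sketched as a small writer that switches files when the local date changes. The class name and the `now` parameter are assumptions for illustration; the planned `audit.py` may differ.

```python
import json
from datetime import datetime
from pathlib import Path


class AuditLog:
    """Appends one JSON object per line to <log_dir>/YYYY-MM-DD.jsonl."""

    def __init__(self, log_dir):
        self._dir = Path(log_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._day = None
        self._fh = None

    def write(self, record, now=None):
        """Write one cycle record; 'now' is injectable for tests (local time)."""
        now = now or datetime.now()
        day = now.strftime("%Y-%m-%d")
        if day != self._day:
            # Local date changed (or first write): rotate to the new file.
            self.close()
            path = self._dir / f"{day}.jsonl"
            # buffering=1 is line-buffered in text mode, so each record is
            # handed to the OS as soon as its newline is written.
            self._fh = open(path, "a", buffering=1, encoding="utf-8")
            self._day = day
        line = {"ts": now.isoformat(timespec="seconds"), **record}
        self._fh.write(json.dumps(line) + "\n")

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

A cycle would then call something like `log.write({"window_found": True, "state": "ARMED_BUY", "confidence": 0.91})`, with the remaining fields from the list above filled in by the detector loop.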
Files to Create
- `/workspace/atm/pyproject.toml`: Python 3.11+ required. Deps: `mss`, `opencv-python`, `numpy`, `requests`, `pygetwindow`, `pywin32` (DPI + window capture), `rich` (CLI), `pillow` (screenshot annotation). No `tomli`; use stdlib `tomllib`.
- `/workspace/atm/config.toml`: populated by calibration tool (ROI coords, per-color RGB + tolerance, `debounce_depth`, y-axis scale, canary-region baseline hash, Discord webhook URL, Telegram bot token + chat_id)
- `/workspace/atm/src/atm/config.py`: [ENG-REVIEW] `@dataclass Config` with `Config.load(path)` that validates on load (RGB tuples, positive tolerances, both notifier credentials present, y-axis 2-point pair). Fail fast at startup.
- `/workspace/atm/src/atm/vision.py`: [ENG-REVIEW] shared primitives: ROI crop, perceptual hash, pixel-to-price linear interp, Hough line detection with color mask. Used by detector/canary/levels to avoid drift.
- `/workspace/atm/src/atm/detector.py`: screenshot loop, rightmost-dot scan, color classification, rolling window, debounce
- `/workspace/atm/src/atm/state_machine.py`: explicit phased state machine (spec above), exhaustive transition table
- `/workspace/atm/src/atm/levels.py`: Phase B chart-scan only (Phase A entry-price compute removed after ENG-REVIEW)
- `/workspace/atm/src/atm/canary.py`: layout fingerprint hash + drift check + auto-pause
- `/workspace/atm/src/atm/notifier/__init__.py`: abstract `Notifier` protocol: `send_alert()`, `send_heartbeat()`, `send_levels_confirm()`
- `/workspace/atm/src/atm/notifier/fanout.py`: [ENG-REVIEW] `FanoutNotifier` wraps N backends, each with its own worker thread + bounded queue (size 50, drop-oldest on overflow) + retry with exponential backoff + dead-letter file on total failure. Main loop never blocks.
- `/workspace/atm/src/atm/notifier/discord.py`: webhook POST, annotated screenshot upload (multipart)
- `/workspace/atm/src/atm/notifier/telegram.py`: [ENG-REVIEW] built in parallel with Discord (no longer deferred); bot API, photo upload
- `/workspace/atm/src/atm/audit.py`: JSONL logger with daily local-midnight rotation, line-buffered write for crash safety
- `/workspace/atm/src/atm/calibrate.py`: Tkinter: window pick → DPI check → ROI corners → per-color sample → y-axis scale → canary region → save versioned config
- `/workspace/atm/src/atm/labeler.py`: [EXPANSION] Tkinter label UI → `labels.json`
- `/workspace/atm/src/atm/dryrun.py`: replay with precision/recall/confusion matrix when labels present
- `/workspace/atm/src/atm/journal.py`: [EXPANSION] `atm journal` CLI → `trades.jsonl`
- `/workspace/atm/src/atm/report.py`: [EXPANSION] weekly aggregation
- `/workspace/atm/src/atm/main.py`: CLI: `atm calibrate`, `atm label <dir>`, `atm dryrun <dir>`, `atm run [--duration Xh]`, `atm journal`, `atm report [--week YYYY-WW]`
- `/workspace/atm/tests/`: [ENG-REVIEW] unit + E2E per test plan at `~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md`
- `/workspace/atm/samples/`, `/workspace/atm/logs/`
- `/workspace/atm/configs/`: versioned config archive. [ENG-REVIEW] No symlink (Windows admin-required); use `configs/current.txt` marker file storing the active filename. `Config.load()` reads the marker.
- `/workspace/atm/docs/phase2-prop-firm-audit.md`: structured TOS checklist
- `/workspace/atm/README.md`: setup, calibration workflow, per-session operating checklist, DPI/multi-monitor notes
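The fanout design listed for `notifier/fanout.py` (one worker thread and one bounded drop-oldest queue per backend, retry with exponential backoff, dead-letter file) could look roughly like this. `BackendWorker`, its parameters, and the dead-letter record shape are hypothetical; only `FanoutNotifier` and `send_alert()` are names from the plan.

```python
import json
import threading
import time
from collections import deque


class BackendWorker:
    """One notifier backend with its own queue and worker thread."""

    def __init__(self, name, send, dead_letter_path, maxsize=50,
                 max_attempts=4, base_delay=1.0, autostart=True):
        self.name = name
        self._send = send                    # callable(message); raises on failure
        self._dead_letter_path = dead_letter_path
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._queue = deque(maxlen=maxsize)  # a full deque discards its oldest item
        self._cond = threading.Condition()
        if autostart:
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, message):
        """Enqueue and return immediately; the caller never waits on the network."""
        with self._cond:
            self._queue.append(message)
            self._cond.notify()

    def pending(self):
        with self._cond:
            return list(self._queue)

    def _run(self):
        while True:
            with self._cond:
                while not self._queue:
                    self._cond.wait()
                message = self._queue.popleft()
            self._deliver(message)

    def _deliver(self, message):
        for attempt in range(self._max_attempts):
            try:
                self._send(message)
                return
            except Exception:
                if attempt + 1 < self._max_attempts:
                    time.sleep(self._base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        # Every attempt failed: keep the message on disk instead of losing it.
        with open(self._dead_letter_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"backend": self.name, "message": message}) + "\n")


class FanoutNotifier:
    """Hands each alert to every backend worker."""

    def __init__(self, workers):
        self._workers = list(workers)

    def send_alert(self, message):
        for worker in self._workers:
            worker.submit(message)
```

Because each backend retries on its own thread, a Telegram outage that spends several seconds in backoff does not delay the Discord delivery or the 5-second detection loop.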
Build Order
1. `pyproject.toml` + package scaffold: Python 3.11+, `pip install -e .`, `atm --help` works.
2. Standalone screenshot-dump script: `mss` timer dumps to `samples/` every 5s during trading sessions. Build corpus in parallel.
3. `config.py` + `vision.py`: Config dataclass with validation; shared vision primitives. Ship with unit tests for config load + pixel-to-price interp.
4. `calibrate.py`: versioned config in `configs/YYYY-MM-DD-HHMM.toml`; `configs/current.txt` marker file points at active. DPI check + canary region capture.
5. `labeler.py`: once ~30 samples exist, tag them. `labels.json` is ground truth.
6. `state_machine.py` + unit tests (clean BUY, clean SELL, cooling, opposite-rearm, lockout per-direction, noise, phase-skip, all state×color pairs via parameterized test).
7. `detector.py` + unit tests (empty/background ROI, rightmost-cluster, rolling window FIFO, debounce depth=1, classification edges including UNKNOWN).
8. `canary.py` + unit tests (drift threshold, pause-file gating).
9. `levels.py` (Phase B only) + unit tests (Hough line detection with color mask, 2 vs 3 lines, 10-min timeout, pixel-to-price roundtrip).
10. `notifier/fanout.py` + `discord.py` + `telegram.py` + unit tests (queue overflow drop-oldest, 429 backoff, dead-letter on total failure, fanout: one backend down still delivers). Both channels built in parallel and fire together from day 1.
11. `audit.py` + unit tests (daily rotation at local midnight, line-buffered flush crash safety).
12. `dryrun.py`: replay on `samples/` against `labels.json`. Acceptance gate before live: precision = 100%, recall ≥ 95%.
13. E2E replay test: feed `samples/` through detector → state_machine → notifier-mock → in-memory audit; assert labels match FIREs.
14. `journal.py`, `report.py`, `main.py` (unified CLI).
15. Windows Task Scheduler setup: 16:30→18:30, 21:00→23:00. `atm run --duration 2h`. Manual DST check twice yearly.
16. `docs/phase2-prop-firm-audit.md`: TOS checklist template.
Existing Utilities to Reuse
Greenfield Python project. No internal utilities. External libs: mss (screenshot), pygetwindow (window locate), pywin32 (DPI + window capture), opencv-python (line detection in Phase B), numpy (color math), requests (Discord webhook), rich (CLI), pillow (annotated screenshots). Config parsing uses stdlib tomllib (Python 3.11+), not tomli.
Verification
End-to-end, in build order:
- State machine unit tests: `pytest tests/test_state_machine.py`: all scenarios (clean BUY, clean SELL, cooling, rearm, lockout, noise) pass.
- Calibration: `atm calibrate` → step through → `config.toml` populated with plausible RGBs for described colors + y-axis scale sane + canary region picked.
- Labeled corpus: ≥30 screenshots in `samples/`, `atm label ./samples` tags each.
- Dry-run with metrics: `atm dryrun ./samples` → precision + recall + confusion matrix printed. Acceptance gate: precision = 100%, recall ≥ 95%. If not met → tune tolerances, re-run.
- Live test notification-only (2 sessions): `atm run`. Verify:
  - Discord + Telegram notifications within 5s of trigger, both channels receive.
  - Phase A message: direction + timestamp + annotated screenshot.
  - Phase B levels-alert fires once TradeStation draws SL/TP lines; correct SL/TP1/TP2 prices.
  - Heartbeat messages every 30 min in thread.
  - Audit JSONL complete, state transitions visible.
  - Kill one notifier (e.g. wrong token) → other still delivers, dead-letter file for failed one.
- Canary test: manually move TradeStation window during session → layout-changed alert within 5 min. Move back → restart bot → resumes.
- Scheduler test: Windows Task Scheduler starts bot at 16:30, stops at 18:30 cleanly, log rotates at midnight.
- Journal test: after real trade, `atm journal` → prompt flow complete → `trades.jsonl` entry present.
- Report test: after 1 week of live use, `atm report --week 2026-16` → precision per color, slippage distribution, P&L summary.
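The dry-run acceptance gate (precision = 100%, recall ≥ 95%) reduces to set arithmetic over detected and labeled triggers. A minimal sketch, assuming each trigger is identified by a `(sample_id, direction)` pair; the real `labels.json` schema is not specified in this plan.

```python
def score(predicted, labeled):
    """Return (precision, recall) for two sets of (sample_id, direction) pairs."""
    true_pos = len(predicted & labeled)
    false_pos = len(predicted - labeled)   # bot fired, label says no trigger
    false_neg = len(labeled - predicted)   # labeled trigger the bot missed
    precision = true_pos / (true_pos + false_pos) if predicted else 1.0
    recall = true_pos / (true_pos + false_neg) if labeled else 1.0
    return precision, recall


def passes_gate(precision, recall):
    """Acceptance gate before going live: no false positives, at most 5% missed."""
    return precision == 1.0 and recall >= 0.95


# Hypothetical corpus: 20 labeled triggers, the replay finds 19 of them.
labeled = {(f"shot_{i:03d}.png", "BUY") for i in range(20)}
predicted = {(f"shot_{i:03d}.png", "BUY") for i in range(19)}
precision, recall = score(predicted, labeled)
```

In this example one missed trigger out of 20 still passes (recall 0.95), whereas a single false FIRE fails the gate regardless of recall, which matches the "false-positives = 0" acceptance criterion.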
Risk Register
- Prop firm TOS (Faza 2 blocker): read TOS using `docs/phase2-prop-firm-audit.md` checklist before any auto-execution work. If EA/automation prohibited → Faza 2 dead, stay on Faza 1 permanently.
- TradeStation layout change: canary catches it within 5 min → auto-pause. Recalibrate. Losing a session to a layout change is acceptable cost.
- Calibration drift over time: versioned configs in `configs/` let you roll back to last-known-good if new calibration misfires.
- DIA↔US30 price divergence: accepted (user's judgment). Faza 1 journal captures slippage per signal, feeding Faza 2 go/no-go.
- Screen sharing / RDP during trading: overlay can break classification. Low prob, documented in README as operator hygiene.
- Windows Task Scheduler DST transitions: twice per year, schedule may misfire. Manual check first week of each DST change.
Out of Scope (Faza 1)
- Any automated click in TradeLocker (Faza 2 work)
- Multi-symbol concurrent monitoring (single chart at a time; user switches manually between DIA and GLD)
- Backtesting on historical data (strategy already manually validated)
- Web UI / dashboard (headless + Discord/Telegram only)
- Ack feedback loop (react-on-notification labeling): deferred to TODOS.md as `P2-ack-loop`; shipping baseline first, adding feedback once detection quality verified
Accepted Expansions (CEO review, SELECTIVE mode)
- ✅ Labeled sample corpus + dry-run metrics: `labeler.py`, `labels.json`, automated precision/recall in dryrun. Makes acceptance criteria ("false-positives = 0, false-negatives ≤ 5%") machine-checkable.
- ✅ Level-extractor fallback (spec-math): as accepted, Phase A always uses spec-math and Phase B validates against chart (redundancy on fragile piece). The eng review later dropped the Phase A spec-math compute, so Phase B chart-scan is now the only level source.
- ✅ Layout canary + auto-pause: `canary.py` hashes stable UI region, auto-pauses on drift. Catches silent classification-with-wrong-positions failure mode.
- ✅ Trade journal CLI: `atm journal` + `trades.jsonl` + weekly report. Data for Faza 2 go/no-go decision.
- ✅ Prop-firm TOS audit checklist: `docs/phase2-prop-firm-audit.md`. Structured Faza 2 evaluation framework shipped now.
Deferred to TODOS.md
- Ack feedback loop — Discord reaction emojis feeding precision tuning. High value, operationally heavier (bot vs webhook). Add after Faza 1 baseline stable.
GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR (SELECTIVE EXPANSION) | 6 proposals, 5 accepted, 1 deferred; 2 arch corrections |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR (FULL_REVIEW) | 9 issues found, 0 critical gaps; 4 decisions made, 0 unresolved |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | SKIPPED (no UI scope: CLI + Discord/Telegram) |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | SKIPPED (personal tool, single user) |
UNRESOLVED: 0
ENG REVIEW DECISIONS:
- Bar flicker → debounce depth=1 (configurable), rely on screenshot-in-notification for visual verification.
- Phase A entry price → dropped. User places manual 0.6% SL in TradeLocker at entry. Phase A = direction + screenshot only. Phase B = real SL/TP1/TP2 from chart.
- Notifier blocking → fire-and-forget worker threads per backend, bounded queue (size 50, drop-oldest), retry w/ backoff, dead-letter on total failure.
- Alert SPoF → Discord + Telegram built in parallel from day 1, both fire together.
ENG REVIEW OBVIOUS FIXES (stated, no decision):
- Exhaustive state transition table (all state×color pairs, default-noise rule, SELL mirror explicit).
- Python 3.11+ pin, drop `tomli` dep, use stdlib `tomllib`.
- Windows symlink → `configs/current.txt` marker file.
- Shared `vision.py` module (ROI, hash, interp, Hough).
- `@dataclass Config` with fail-fast load-time validation.
- DPI check + multi-monitor note in calibrate + README.
ENG REVIEW TEST SCOPE (accepted: FULL): unit tests for every module (state_machine, detector, levels Phase B, canary, audit, notifier fanout/retry, calibrate roundtrip, config validate) + 1 E2E replay harness asserting labeled-corpus precision/recall. Test plan artifact: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md.
VERDICT: CEO + ENG CLEARED — ready to implement. Run /ship after implementation. No further reviews required before build.