atm/docs/partitioned-honking-unicorn.md
Claude Agent 9207197a56 initial: scaffold atm trading monitor (Faza 1)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:03:36 +00:00

Plan: ATM — Automated Trading Monitor (M2D, Faza 1) — ENG-REVIEWED

  • Source plan: /home/claude/.claude/plans/swirling-drifting-starfish.md
  • CEO plan artifact: ~/.gstack/projects/romfast-workspace/ceo-plans/2026-04-15-atm-trading.md
  • Eng review mode: FULL_REVIEW (4 decisions made, 0 unresolved)
  • Design doc: ~/.gstack/projects/romfast-workspace/claude-master-design-20260415-atm-trading.md (APPROVED)
  • Eng test plan: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md


Context

User trades M2D strategy manually on DIA (TradeStation) with execution on TradeLocker US30 CFD (prop firm). Same strategy on GLD → XAUUSD. 4h/evening dual-screen monitoring. Faza 1 goal: bot auto-detects M2D trigger, sends Discord/Telegram notification with screenshot + SL/TP1/TP2 levels; user executes manually in TradeLocker. Faza 2 (auto-execution) deferred until prop firm TOS verified and Faza 1 proven over 20+ sessions.

Review changed two things from the original plan:

  1. State machine spec corrected. Original "last 3 consecutive non-gray dots" is wrong. Actual M2D is phased: Phase 1 arming (turquoise → gray/dark-green) → Phase 2 trigger (light-green).
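  2. Levels extraction corrected. The original plan had levels.py extracting SL/TP at trigger, but those lines only appear on the TradeStation chart after the user enters the trade in TradeLocker. Corrected to two-phase: spec-math at trigger, chart-scan after entry. (The Eng review later dropped the spec-math step, so Phase A is now direction + screenshot only; see Levels Extraction and Eng review decision 2.)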

Plus 5 accepted expansions (labeled corpus, level fallback, layout canary, trade journal, TOS checklist).


Approach: B (Structured Python service, dry-run, audit log) + CEO-reviewed additions

Runs on Windows machine alongside TradeStation. mss screenshots → ROI color-sample on M2D MAPS strip → phased state machine → Discord webhook + Telegram bot → JSONL audit + trade journal → dry-run replay against labeled corpus.


State Machine Spec (corrected + exhaustive)

States:

  • IDLE
  • ARMED_BUY — turquoise seen
  • PRIMED_BUY — turquoise + at least one dark-green seen
  • ARMED_SELL — yellow seen
  • PRIMED_SELL — yellow + at least one dark-red seen

Default rule: any (state, event) pair not listed below → stay in current state, no action, log as noise.

Transitions — BUY side:

| From | Event | To | Action |
| --- | --- | --- | --- |
| IDLE | turquoise | ARMED_BUY | log arm_ts |
| IDLE | yellow | ARMED_SELL | log arm_ts (sell) |
| IDLE | dark-green / dark-red / light-green / light-red / gray | IDLE | noise (log phase-skip if light-green/light-red) |
| ARMED_BUY | gray | ARMED_BUY | persist |
| ARMED_BUY | turquoise | ARMED_BUY | refresh arm_ts |
| ARMED_BUY | dark-green | PRIMED_BUY | log prime_ts |
| ARMED_BUY | yellow | ARMED_SELL | opposite rearm |
| ARMED_BUY | dark-red | ARMED_BUY | ignore (minority noise) |
| ARMED_BUY | light-green | IDLE | skip detected: no FIRE, log phase_skip |
| ARMED_BUY | light-red | IDLE | skip detected, log |
| PRIMED_BUY | dark-green | PRIMED_BUY | accumulate |
| PRIMED_BUY | dark-red | PRIMED_BUY | ignore (minority noise) |
| PRIMED_BUY | light-green | IDLE | FIRE BUY, lockout(BUY)=4min |
| PRIMED_BUY | light-red | IDLE | skip detected (wrong trigger) |
| PRIMED_BUY | gray | IDLE | COOLED: signal dead, log |
| PRIMED_BUY | turquoise | ARMED_BUY | rearm fresh |
| PRIMED_BUY | yellow | ARMED_SELL | opposite rearm |

SELL side mirrors exactly: swap turquoise↔yellow, dark-green↔dark-red, light-green↔light-red, BUY↔SELL.

Notes:

  • No time-based TTL on ARMED/PRIMED. State persists until trigger fires, cooled by gray after PRIMED, opposite-color rearm, or process restart (Windows Task Scheduler stops bot at session end → natural session-boundary reset).
  • Cooling rule: "gray after dark-green" = signal "răcit" (Romanian for "cooled"; the user's term). Gray during ARMED_BUY (before any dark-green) is OK.
  • After FIRE: 4-minute lockout per-direction. BUY lockout doesn't block SELL and vice versa. Single timestamp per direction.
  • Opposite-color-Phase-1 triggers rearm to opposite side (captures direction flip).
  • Phase-skip (arming color → trigger color with no intervening dark-green/dark-red priming dot) → IDLE, no FIRE, logged. A skip would be legitimate only if the indicator collapsed phases, which it does not in observed behavior.
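
The transition table and notes above can be sketched as a small table-driven machine. This is a minimal illustration, not the planned state_machine.py: the action names, the `Machine` class, and the choice to return to IDLE when a FIRE is suppressed by lockout are assumptions made for the example. Rows that stay in the same state (persist, accumulate, ignore) fall through to the default-noise rule.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    ARMED_BUY = "ARMED_BUY"
    PRIMED_BUY = "PRIMED_BUY"
    ARMED_SELL = "ARMED_SELL"
    PRIMED_SELL = "PRIMED_SELL"


LOCKOUT_S = 240.0  # 4-minute lockout, tracked per direction

# BUY-side rows that change state or carry a distinct action.
BUY_TABLE = {
    (State.IDLE, "turquoise"): (State.ARMED_BUY, "arm"),
    (State.ARMED_BUY, "turquoise"): (State.ARMED_BUY, "refresh_arm"),
    (State.ARMED_BUY, "dark-green"): (State.PRIMED_BUY, "prime"),
    (State.ARMED_BUY, "yellow"): (State.ARMED_SELL, "opposite_rearm"),
    (State.ARMED_BUY, "light-green"): (State.IDLE, "phase_skip"),
    (State.ARMED_BUY, "light-red"): (State.IDLE, "phase_skip"),
    (State.PRIMED_BUY, "light-green"): (State.IDLE, "fire"),
    (State.PRIMED_BUY, "light-red"): (State.IDLE, "phase_skip"),
    (State.PRIMED_BUY, "gray"): (State.IDLE, "cooled"),
    (State.PRIMED_BUY, "turquoise"): (State.ARMED_BUY, "rearm"),
    (State.PRIMED_BUY, "yellow"): (State.ARMED_SELL, "opposite_rearm"),
}

SWAP_COLOR = {
    "turquoise": "yellow", "yellow": "turquoise",
    "dark-green": "dark-red", "dark-red": "dark-green",
    "light-green": "light-red", "light-red": "light-green",
    "gray": "gray",
}
SWAP_STATE = {
    State.IDLE: State.IDLE,
    State.ARMED_BUY: State.ARMED_SELL, State.ARMED_SELL: State.ARMED_BUY,
    State.PRIMED_BUY: State.PRIMED_SELL, State.PRIMED_SELL: State.PRIMED_BUY,
}

# SELL side is the exact mirror of the BUY side.
TABLE = dict(BUY_TABLE)
for (s, c), (n, a) in BUY_TABLE.items():
    TABLE[(SWAP_STATE[s], SWAP_COLOR[c])] = (SWAP_STATE[n], a)


@dataclass
class Machine:
    state: State = State.IDLE
    last_fire: dict = field(default_factory=dict)  # direction -> timestamp (s)

    def step(self, color, now):
        """Consume one accepted dot; return (action, direction or None)."""
        # Default rule: unlisted (state, color) pairs stay put and log as noise.
        nxt, action = TABLE.get((self.state, color), (self.state, "noise"))
        direction = None
        if action == "fire":
            direction = "BUY" if self.state is State.PRIMED_BUY else "SELL"
            if now - self.last_fire.get(direction, float("-inf")) < LOCKOUT_S:
                action = "locked_out"  # assumption: still return to IDLE
            else:
                self.last_fire[direction] = now
        self.state = nxt
        return action, direction
```

Keeping the mirror as a generated table rather than hand-written rows means the parameterized state×color test only has to trust one set of rows.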

Detection Details

  • Loop interval: 5 seconds (36 cycles per 3-min bar; stays well inside notification-latency target).
  • Rightmost-dot detection: scan ROI from right edge leftward, find first non-background pixel cluster → that's the rightmost dot. Don't hardcode x-pixel positions (chart scrolls; hardcoded positions drift).
  • Debounce: configurable debounce_depth in config.toml (default 1 — single-read acceptance). Increase if future sessions show mid-bar color flicker. Screenshot-in-notification is the user's visual verification on top.
  • Rolling window: keep last 20 classified dots with their detection timestamps. State machine consumes the newest accepted (post-debounce) dot per cycle.
  • Classification: nearest-color match in RGB Euclidean distance, per-color tolerance from calibration. Report confidence = 1 - distance_nearest / distance_second_nearest. Log confidence every cycle. If all distances > tolerance → UNKNOWN, state unchanged.
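
The classification rule can be sketched with the standard library alone. This is an illustrative reading of the bullet, not the planned detector.py: the function name `classify`, the palette layout, and the sample RGB values are assumptions, and the real values come from calibration. Where per-color tolerances differ, the sketch picks the nearest color among those within their own tolerance.

```python
import math


def classify(rgb, palette, tolerances):
    """Return (label, confidence) for one sampled dot color.

    palette:    {color_name: (r, g, b)} with at least two entries
    tolerances: {color_name: max Euclidean RGB distance to accept}
    """
    ranked = sorted((math.dist(rgb, ref), name) for name, ref in palette.items())
    d_nearest, d_second = ranked[0][0], ranked[1][0]
    # confidence = 1 - distance_nearest / distance_second_nearest
    confidence = 1.0 - d_nearest / d_second if d_second > 0 else 0.0
    # Only colors within their own tolerance are candidates.
    candidates = [(d, name) for d, name in ranked if d <= tolerances[name]]
    if not candidates:
        return "UNKNOWN", confidence  # caller leaves the state unchanged
    return candidates[0][1], confidence


# Hypothetical calibration values, for illustration only.
PALETTE = {
    "turquoise": (64, 224, 208),
    "gray": (128, 128, 128),
    "light-green": (144, 238, 144),
}
TOLERANCES = {name: 40.0 for name in PALETTE}

label, conf = classify((66, 222, 206), PALETTE, TOLERANCES)
```

A confidence near 1 means the nearest reference color is much closer than the runner-up; values near 0 mean two reference colors are almost equally plausible, which is the case the low-confidence alert is meant to surface.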

Levels Extraction (two-phase, simplified)

Phase A — at trigger (immediate alert to Discord + Telegram):

  • No entry-price compute. No spec-math SL/TP. User places a manual 0.6% SL in TradeLocker at entry; actual TP1/TP2/SL come in Phase B from the chart.
  • Notification: 🟢 BUY signal DIA→US30 | 22:47:03 + annotated screenshot (detected dot highlighted).

Phase B — after user trades (chart-scan confirmation):

  • After Phase A fires, detector keeps watching the chart ROI for horizontal colored lines (red=SL, green=TP1/TP2).
  • When lines appear (user has entered trade in TradeLocker and TradeStation drew them) → scan y-pixels via Hough + color mask, convert via y-axis calibration → send second alert to both channels: ✅ Levels: SL=484.35 | TP1=485.20 | TP2=485.88.
  • If chart-line scan times out (no lines in 10 min) → silent (user didn't trade).
  • If only 2 lines detected (user didn't set TP2 or line not rendered yet) → partial-result alert.
  • Phase B overlap with next signal: guarded by per-direction lockout + Phase-B completion flag; a new FIRE cannot issue until prior Phase B closes (timeout or success).
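
The y-pixel to price conversion in Phase B is a linear interpolation through the two calibration points on the y-axis. A minimal sketch, assuming a linear (non-log) price axis; the function name and the sample numbers are hypothetical:

```python
def make_pixel_to_price(y1, price1, y2, price2):
    """Build a y-pixel -> price converter from a 2-point y-axis calibration."""
    if y1 == y2:
        raise ValueError("calibration points must have different y pixels")
    slope = (price2 - price1) / (y2 - y1)  # price per pixel (negative: y grows downward)

    def to_price(y):
        return price1 + (y - y1) * slope

    return to_price


# Hypothetical calibration: pixel row 100 is 486.00, pixel row 500 is 484.00.
to_price = make_pixel_to_price(100, 486.0, 500, 484.0)
sl_price = round(to_price(430), 2)
```

Validating the pair at config load (distinct y values, distinct prices) keeps a bad calibration from silently producing wrong levels in the second alert.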

Dedup / Lockout

  • Time-based lockout: after any FIRE, block re-fire for 4 minutes (one 3-min bar + 1 min safety).
  • Tracked per-direction: BUY lockout doesn't block SELL.
  • Stored as single timestamp per direction (not pixel-keyed).

Observability

  • Heartbeat: every 30 min to a separate Discord thread (not main alerts channel): 🟢 22:00 alive | 0 triggers | confidence avg 0.85 | chart OK. Silence >35 min = watchdog concern (user notices).
  • Layout canary: every 60 cycles (5 min), hash a stable reference region (axis labels, chart border). Stored baseline in config. On significant divergence (>threshold) → ⚠️ Layout changed — auto-paused, recalibrate to alerts channel. Bot pauses detection until operator acknowledges (touch a pause-file or restart).
  • Low-confidence alert: 3+ consecutive cycles with confidence below threshold → ⚠️ Bot lost sight (already in original plan).
  • Window-lost alert: TradeStation window not found for 60s → ⚠️ Cannot find chart.
  • Audit JSONL: per-cycle, daily rotation (logs/YYYY-MM-DD.jsonl), fields: {ts, window_found, roi_ok, rightmost_dot_color, confidence, state, transition, trigger, notified, reason}.
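
The per-cycle audit log with daily rotation can be sketched as follows. This is an assumption-laden illustration rather than the planned audit.py: the `AuditLog` class name and its `write`/`close` methods are invented for the example, and rotation is keyed on the local calendar date of each record.

```python
import json
from datetime import datetime
from pathlib import Path


class AuditLog:
    """Append one JSON object per cycle to logs/YYYY-MM-DD.jsonl."""

    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self._day = None
        self._fh = None

    def write(self, record, now=None):
        now = now or datetime.now()  # local time, so rotation is at local midnight
        day = now.strftime("%Y-%m-%d")
        if day != self._day:
            self.close()
            # buffering=1 is line-buffered in text mode: each record is
            # flushed as soon as its newline is written.
            self._fh = open(self.log_dir / f"{day}.jsonl", "a",
                            buffering=1, encoding="utf-8")
            self._day = day
        self._fh.write(json.dumps({"ts": now.isoformat(), **record}) + "\n")

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None
            self._day = None
```

A call such as `log.write({"window_found": True, "rightmost_dot_color": "gray", "confidence": 0.91, "state": "ARMED_BUY"})` would append one line to the current day's file; the remaining fields from the list above are passed the same way.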

Files to Create

  • /workspace/atm/pyproject.toml — Python 3.11+ required. Deps: mss, opencv-python, numpy, requests, pygetwindow, pywin32 (DPI + window capture), rich (CLI), pillow (screenshot annotation). No tomli — use stdlib tomllib.
  • /workspace/atm/config.toml — populated by calibration tool (ROI coords, per-color RGB + tolerance, debounce_depth, y-axis scale, canary-region baseline hash, Discord webhook URL, Telegram bot token + chat_id)
  • /workspace/atm/src/atm/config.py: [ENG-REVIEW] @dataclass Config with Config.load(path) that validates on load (RGB tuples, positive tolerances, both notifier credentials present, y-axis 2-point pair). Fail fast at startup.
  • /workspace/atm/src/atm/vision.py: [ENG-REVIEW] shared primitives: ROI crop, perceptual hash, pixel-to-price linear interp, Hough line detection with color mask. Used by detector/canary/levels to avoid drift.
  • /workspace/atm/src/atm/detector.py — screenshot loop, rightmost-dot scan, color classification, rolling window, debounce
  • /workspace/atm/src/atm/state_machine.py — explicit phased state machine (spec above), exhaustive transition table
  • /workspace/atm/src/atm/levels.py — Phase B chart-scan only (Phase A entry-price compute removed after ENG-REVIEW)
  • /workspace/atm/src/atm/canary.py — layout fingerprint hash + drift check + auto-pause
  • /workspace/atm/src/atm/notifier/__init__.py — abstract Notifier protocol: send_alert(), send_heartbeat(), send_levels_confirm()
  • /workspace/atm/src/atm/notifier/fanout.py: [ENG-REVIEW] FanoutNotifier wraps N backends, each with its own worker thread + bounded queue (size 50, drop-oldest on overflow) + retry with exponential backoff + dead-letter file on total failure. Main loop never blocks.
  • /workspace/atm/src/atm/notifier/discord.py — webhook POST, annotated screenshot upload (multipart)
  • /workspace/atm/src/atm/notifier/telegram.py: [ENG-REVIEW] built in parallel with Discord (no longer deferred); bot API, photo upload
  • /workspace/atm/src/atm/audit.py — JSONL logger with daily local-midnight rotation, line-buffered write for crash safety
  • /workspace/atm/src/atm/calibrate.py — Tkinter: window pick → DPI check → ROI corners → per-color sample → y-axis scale → canary region → save versioned config
  • /workspace/atm/src/atm/labeler.py: [EXPANSION] Tkinter label UI → labels.json
  • /workspace/atm/src/atm/dryrun.py — replay with precision/recall/confusion matrix when labels present
  • /workspace/atm/src/atm/journal.py: [EXPANSION] atm journal CLI → trades.jsonl
  • /workspace/atm/src/atm/report.py: [EXPANSION] weekly aggregation
  • /workspace/atm/src/atm/main.py — CLI: atm calibrate, atm label <dir>, atm dryrun <dir>, atm run [--duration Xh], atm journal, atm report [--week YYYY-WW]
  • /workspace/atm/tests/: [ENG-REVIEW] unit + E2E per test plan at ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md
  • /workspace/atm/samples/, /workspace/atm/logs/
  • /workspace/atm/configs/ — versioned config archive. [ENG-REVIEW] No symlink (Windows admin-required); use configs/current.txt marker file storing the active filename. Config.load() reads the marker.
  • /workspace/atm/docs/phase2-prop-firm-audit.md — structured TOS checklist
  • /workspace/atm/README.md — setup, calibration workflow, per-session operating checklist, DPI/multi-monitor notes
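
The fan-out design described for notifier/fanout.py (one worker thread and one bounded drop-oldest queue per backend, retry with exponential backoff, dead-letter on total failure) might look roughly like the sketch below. It is not the planned implementation: `BackendWorker`, its parameters, and the in-memory `dead_letters` list (standing in for the dead-letter file) are assumptions for illustration.

```python
import queue
import threading
import time


class BackendWorker:
    """One notifier backend: bounded queue, drop-oldest, retry, dead-letter."""

    def __init__(self, send, maxsize=50, retries=3, base_delay=0.5):
        self._send = send              # callable(msg); raises on failure
        self._q = queue.Queue(maxsize=maxsize)
        self._retries = retries
        self._base_delay = base_delay
        self.dead_letters = []         # stand-in for the dead-letter file
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, msg):
        """Never blocks: on overflow, discard the oldest queued message."""
        while True:
            try:
                self._q.put_nowait(msg)
                return
            except queue.Full:
                try:
                    self._q.get_nowait()
                except queue.Empty:
                    pass

    def stop(self):
        self.submit(None)              # sentinel: worker exits once it reaches it
        self._thread.join()

    def _run(self):
        while True:
            msg = self._q.get()
            if msg is None:
                return
            for attempt in range(self._retries):
                try:
                    self._send(msg)
                    break
                except Exception:
                    if attempt + 1 < self._retries:
                        time.sleep(self._base_delay * 2 ** attempt)
            else:
                self.dead_letters.append(msg)  # every retry failed


class FanoutNotifier:
    """Hand each message to every backend; the caller never waits on I/O."""

    def __init__(self, workers):
        self._workers = list(workers)

    def send_alert(self, msg):
        for worker in self._workers:
            worker.submit(msg)
```

Because each backend owns its queue and thread, a Telegram outage only fills and drains Telegram's queue; Discord delivery and the 5-second detection loop are unaffected.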

Build Order

  1. pyproject.toml + package scaffold — Python 3.11+, pip install -e ., atm --help works.
  2. Standalone screenshot-dump script: an mss timer dumps to samples/ every 5s during trading sessions. Build the corpus in parallel.
  3. config.py + vision.py — Config dataclass with validation; shared vision primitives. Ship with unit tests for config load + pixel-to-price interp.
  4. calibrate.py — versioned config in configs/YYYY-MM-DD-HHMM.toml; configs/current.txt marker file points at active. DPI check + canary region capture.
  5. labeler.py — once ~30 samples exist, tag them. labels.json is ground truth.
  6. state_machine.py + unit tests (clean BUY, clean SELL, cooling, opposite-rearm, lockout per-direction, noise, phase-skip, all state×color pairs via parameterized test).
  7. detector.py + unit tests (empty/background ROI, rightmost-cluster, rolling window FIFO, debounce depth=1, classification edges including UNKNOWN).
  8. canary.py + unit tests (drift threshold, pause-file gating).
  9. levels.py (Phase B only) + unit tests (Hough line detection with color mask, 2 vs 3 lines, 10-min timeout, pixel-to-price roundtrip).
  10. notifier/fanout.py + discord.py + telegram.py + unit tests (queue overflow drop-oldest, 429 backoff, dead-letter on total failure, fanout: one backend down still delivers). Both channels built in parallel — fire together from day 1.
  11. audit.py + unit tests (daily rotation at local midnight, line-buffered flush crash safety).
  12. dryrun.py — replay on samples/ against labels.json. Acceptance gate before live: precision = 100%, recall ≥ 95%.
  13. E2E replay test — feed samples/ through detector → state_machine → notifier-mock → in-memory audit; assert labels match FIREs.
  14. journal.py, report.py, main.py (unified CLI).
  15. Windows Task Scheduler setup — 16:30→18:30, 21:00→23:00. atm run --duration 2h. Manual DST check twice yearly.
  16. docs/phase2-prop-firm-audit.md — TOS checklist template.
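
The acceptance gate in step 12 (precision = 100%, recall ≥ 95%) reduces to set arithmetic over sample identifiers. A minimal sketch, assuming the replay yields the set of samples where the bot fired and labels.json yields the set labeled as true triggers; the function names and the empty-set conventions are assumptions:

```python
def score(fired, labeled):
    """Return (precision, recall) for FIRE predictions vs. ground-truth labels."""
    tp = len(fired & labeled)   # fired and labeled as a trigger
    fp = len(fired - labeled)   # fired but not labeled (false positive)
    fn = len(labeled - fired)   # labeled but missed (false negative)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # assumption: no fires = no false positives
    recall = tp / (tp + fn) if (tp + fn) else 1.0     # assumption: no labels = nothing missed
    return precision, recall


def passes_gate(precision, recall):
    """Gate before live use: zero false positives, at most 5% missed triggers."""
    return precision == 1.0 and recall >= 0.95


# Hypothetical corpus: 20 labeled triggers, the replay catches 19 of them.
labeled = {f"sample_{i:03d}.png" for i in range(20)}
fired = labeled - {"sample_007.png"}
precision, recall = score(fired, labeled)
```

With roughly 30 samples, a single false positive fails the gate outright, while one missed trigger in 20 still passes; the corpus size therefore sets how fine-grained the recall check can be.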

Existing Utilities to Reuse

Greenfield Python project. No internal utilities. External libs: mss (screenshot), pygetwindow (window locate), opencv-python (line detection in Phase B), numpy (color math), requests (Discord webhook), pillow (annotated screenshots). Config parsing uses the stdlib tomllib module (Python 3.11+), not tomli.


Verification

End-to-end, in build order:

  1. State machine unit tests: pytest tests/test_state_machine.py; all scenarios (clean BUY, clean SELL, cooling, opposite-rearm, lockout, noise, phase-skip) pass.
  2. Calibration: atm calibrate → step through → config.toml populated with plausible RGBs for described colors + y-axis scale sane + canary region picked.
  3. Labeled corpus: ≥30 screenshots in samples/, atm label ./samples tags each.
  4. Dry-run with metrics: atm dryrun ./samples → precision + recall + confusion matrix printed. Acceptance gate: precision = 100%, recall ≥ 95%. If not met → tune tolerances, re-run.
  5. Live test notification-only (2 sessions): atm run. Verify:
    • Discord + Telegram notifications within 5s of trigger, both channels receive.
    • Phase A message: direction + timestamp + annotated screenshot.
    • Phase B levels-alert fires once TradeStation draws SL/TP lines; correct SL/TP1/TP2 prices.
    • Heartbeat messages every 30 min in thread.
    • Audit JSONL complete, state transitions visible.
    • Kill one notifier (e.g. wrong token) → other still delivers, dead-letter file for failed one.
  6. Canary test: manually move TradeStation window during session → layout-changed alert within 5 min. Move back → restart bot → resumes.
  7. Scheduler test: Windows Task Scheduler starts bot at 16:30, stops at 18:30 cleanly, log rotates at midnight.
  8. Journal test: after real trade, atm journal → prompt flow complete → trades.jsonl entry present.
  9. Report test: after 1 week of live use, atm report --week 2026-16 → precision per color, slippage distribution, P&L summary.

Risk Register

  • Prop firm TOS (Faza 2 blocker): read TOS using docs/phase2-prop-firm-audit.md checklist before any auto-execution work. If EA/automation prohibited → Faza 2 dead, stay on Faza 1 permanently.
  • TradeStation layout change: canary catches it within 5 min → auto-pause. Recalibrate. Losing a session to a layout change is acceptable cost.
  • Calibration drift over time: versioned configs in configs/ let you roll back to last-known-good if new calibration misfires.
  • DIA↔US30 price divergence: accepted (user's judgment). Phase 1 journal captures slippage per signal, feeding Faza 2 go/no-go.
  • Screen sharing / RDP during trading: overlay can break classification. Low prob, documented in README as operator hygiene.
  • Windows Task Scheduler DST transitions: twice per year, schedule may misfire. Manual check first week of each DST change.

Out of Scope (Faza 1)

  • Any automated click in TradeLocker (Faza 2 work)
  • Multi-symbol concurrent monitoring (single chart at a time; user switches manually between DIA and GLD)
  • Backtesting on historical data (strategy already manually validated)
  • Web UI / dashboard (headless + Discord/Telegram only)
  • Ack feedback loop (react-on-notification labeling) — deferred to TODOS.md as P2-ack-loop: shipping baseline first, adding feedback once detection quality verified
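  • Telegram notifier deferral: no longer applies. The Eng review moved Telegram into Faza 1 scope, built in parallel with Discord from day 1 (see Eng review decision 4).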

Accepted Expansions (CEO review, SELECTIVE mode)

  1. Labeled sample corpus + dry-run metrics: labeler.py, labels.json, automated precision/recall in dryrun. Makes acceptance criteria ("false-positives = 0, false-negatives ≤ 5%") machine-checkable.
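  2. Level-extractor fallback (spec-math): as accepted, Phase A always uses spec-math and Phase B validates against the chart, giving redundancy on the fragile piece. (Superseded by Eng review decision 2: the Phase A spec-math step was dropped and levels.py is Phase B only.)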
  3. Layout canary + auto-pause: canary.py hashes stable UI region, auto-pauses on drift. Catches silent classification-with-wrong-positions failure mode.
  4. Trade journal CLI: atm journal + trades.jsonl + weekly report. Data for Faza 2 go/no-go decision.
  5. Prop-firm TOS audit checklist: docs/phase2-prop-firm-audit.md. Structured Faza 2 evaluation framework shipped now.

Deferred to TODOS.md

  • Ack feedback loop — Discord reaction emojis feeding precision tuning. High value, operationally heavier (bot vs webhook). Add after Faza 1 baseline stable.

GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
| --- | --- | --- | --- | --- | --- |
| CEO Review | /plan-ceo-review | Scope & strategy | 1 | CLEAR (SELECTIVE EXPANSION) | 6 proposals, 5 accepted, 1 deferred; 2 arch corrections |
| Codex Review | /codex review | Independent 2nd opinion | 0 | | |
| Eng Review | /plan-eng-review | Architecture & tests (required) | 1 | CLEAR (FULL_REVIEW) | 9 issues found, 0 critical gaps; 4 decisions made, 0 unresolved |
| Design Review | /plan-design-review | UI/UX gaps | 0 | SKIPPED (no UI scope: CLI + Discord/Telegram) | |
| DX Review | /plan-devex-review | Developer experience gaps | 0 | SKIPPED (personal tool, single user) | |

UNRESOLVED: 0

ENG REVIEW DECISIONS:

  1. Bar flicker → debounce depth=1 (configurable), rely on screenshot-in-notification for visual verification.
  2. Phase A entry price → dropped. User places manual 0.6% SL in TradeLocker at entry. Phase A = direction + screenshot only. Phase B = real SL/TP1/TP2 from chart.
  3. Notifier blocking → fire-and-forget worker threads per backend, bounded queue (size 50, drop-oldest), retry w/ backoff, dead-letter on total failure.
  4. Alert SPoF → Discord + Telegram built in parallel from day 1, both fire together.

ENG REVIEW OBVIOUS FIXES (stated, no decision):

  • Exhaustive state transition table (all state×color pairs, default-noise rule, SELL mirror explicit).
  • Python 3.11+ pin, drop tomli dep, use stdlib tomllib.
  • Windows symlink → configs/current.txt marker file.
  • Shared vision.py module (ROI, hash, interp, Hough).
  • @dataclass Config with fail-fast load-time validation.
  • DPI check + multi-monitor note in calibrate + README.

ENG REVIEW TEST SCOPE (accepted: FULL): unit tests for every module (state_machine, detector, levels Phase B, canary, audit, notifier fanout/retry, calibrate roundtrip, config validate) + 1 E2E replay harness asserting labeled-corpus precision/recall. Test plan artifact: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md.

VERDICT: CEO + ENG CLEARED — ready to implement. Run /ship after implementation. No further reviews required before build.