
Claude Agent 9207197a56 initial: scaffold atm trading monitor (Faza 1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-15 22:03:36 +00:00


Plan: ATM — Automated Trading Monitor (M2D, Faza 1) — ENG-REVIEWED

Source plan: /home/claude/.claude/plans/swirling-drifting-starfish.md
CEO plan artifact: ~/.gstack/projects/romfast-workspace/ceo-plans/2026-04-15-atm-trading.md
Eng review mode: FULL_REVIEW (4 decisions made, 0 unresolved)
Design doc: ~/.gstack/projects/romfast-workspace/claude-master-design-20260415-atm-trading.md (APPROVED)
Eng test plan: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md

Context

User trades M2D strategy manually on DIA (TradeStation) with execution on TradeLocker US30 CFD (prop firm). Same strategy on GLD → XAUUSD. 4h/evening dual-screen monitoring. Faza 1 goal: bot auto-detects M2D trigger, sends Discord/Telegram notification with screenshot + SL/TP1/TP2 levels; user executes manually in TradeLocker. Faza 2 (auto-execution) deferred until prop firm TOS verified and Faza 1 proven over 20+ sessions.

Review changed two things from the original plan:

State machine spec corrected. Original "last 3 consecutive non-gray dots" is wrong. Actual M2D is phased: Phase 1 arming (turquoise → gray/dark-green) → Phase 2 trigger (light-green).
Levels extraction corrected. Original plan had levels.py extracting SL/TP at trigger. But those lines only appear on TradeStation chart after user enters trade in TradeLocker. Corrected to two-phase: spec-math at trigger, chart-scan after entry. (The ENG review later dropped the spec-math step at trigger; see Levels Extraction and ENG REVIEW DECISIONS.)

Plus 5 accepted expansions (labeled corpus, level fallback, layout canary, trade journal, TOS checklist).

Approach: B (Structured Python service, dry-run, audit log) + CEO-reviewed additions

Runs on Windows machine alongside TradeStation. mss screenshots → ROI color-sample on M2D MAPS strip → phased state machine → Discord webhook + Telegram bot → JSONL audit + trade journal → dry-run replay against labeled corpus.

State Machine Spec (corrected + exhaustive)

States:

IDLE
ARMED_BUY — turquoise seen
PRIMED_BUY — turquoise + at least one dark-green seen
ARMED_SELL — yellow seen
PRIMED_SELL — yellow + at least one dark-red seen

Default rule: any (state, event) pair not listed below → stay in current state, no action, log as noise.

Transitions — BUY side:

From	Event	To	Action
IDLE	turquoise	ARMED_BUY	log arm_ts
IDLE	yellow	ARMED_SELL	log arm_ts (sell)
IDLE	dark-green / dark-red / light-green / light-red / gray	IDLE	noise (log phase-skip if light-green/light-red)
ARMED_BUY	gray	ARMED_BUY	persist
ARMED_BUY	turquoise	ARMED_BUY	refresh arm_ts
ARMED_BUY	dark-green	PRIMED_BUY	log prime_ts
ARMED_BUY	yellow	ARMED_SELL	opposite rearm
ARMED_BUY	dark-red	ARMED_BUY	ignore (minority noise)
ARMED_BUY	light-green	IDLE	skip detected — no FIRE, log phase_skip
ARMED_BUY	light-red	IDLE	skip detected, log
PRIMED_BUY	dark-green	PRIMED_BUY	accumulate
PRIMED_BUY	dark-red	PRIMED_BUY	ignore (minority noise)
PRIMED_BUY	light-green	IDLE	FIRE BUY, lockout(BUY)=4min
PRIMED_BUY	light-red	IDLE	skip detected (wrong trigger)
PRIMED_BUY	gray	IDLE	COOLED — signal dead, log
PRIMED_BUY	turquoise	ARMED_BUY	rearm fresh
PRIMED_BUY	yellow	ARMED_SELL	opposite rearm

SELL side mirrors exactly: swap turquoise↔yellow, dark-green↔dark-red, light-green↔light-red, BUY↔SELL.

Notes:

No time-based TTL on ARMED/PRIMED. State persists until trigger fires, cooled by gray after PRIMED, opposite-color rearm, or process restart (Windows Task Scheduler stops bot at session end → natural session-boundary reset).
Cooling rule: "gray after dark-green" = signal "răcit" (Romanian for "cooled"; user's term). Gray during ARMED_BUY (before any dark-green) is OK.
After FIRE: 4-minute lockout per-direction. BUY lockout doesn't block SELL and vice versa. Single timestamp per direction.
Opposite-color-Phase-1 triggers rearm to opposite side (captures direction flip).
Phase-skip (arming color → trigger color with no intermediate dark-green/dark-red priming step) → IDLE, no FIRE, logged. Would be legitimate only if indicator collapses phases, which it doesn't per observed behavior.
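
The table and default rule above can be sketched as a lookup keyed by (state, color). This is a minimal illustration, not the planned state_machine.py: the action labels (arm, prime, fire_buy, and so on) and the step helper are hypothetical names, rows that stay in place fall through to the default noise rule, and the per-direction lockout is not modeled here.

```python
# BUY-side rows that change state or carry a non-noise action.
# Rows that stay in place (persist / ignore / accumulate) use the default rule.
BUY_ROWS = {
    ("IDLE", "turquoise"): ("ARMED_BUY", "arm"),
    ("ARMED_BUY", "turquoise"): ("ARMED_BUY", "refresh_arm"),
    ("ARMED_BUY", "dark-green"): ("PRIMED_BUY", "prime"),
    ("ARMED_BUY", "yellow"): ("ARMED_SELL", "opposite_rearm"),
    ("ARMED_BUY", "light-green"): ("IDLE", "phase_skip"),
    ("ARMED_BUY", "light-red"): ("IDLE", "phase_skip"),
    ("PRIMED_BUY", "light-green"): ("IDLE", "fire_buy"),
    ("PRIMED_BUY", "light-red"): ("IDLE", "phase_skip"),
    ("PRIMED_BUY", "gray"): ("IDLE", "cooled"),
    ("PRIMED_BUY", "turquoise"): ("ARMED_BUY", "rearm"),
    ("PRIMED_BUY", "yellow"): ("ARMED_SELL", "opposite_rearm"),
}

SWAP_COLOR = {
    "turquoise": "yellow", "yellow": "turquoise",
    "dark-green": "dark-red", "dark-red": "dark-green",
    "light-green": "light-red", "light-red": "light-green",
    "gray": "gray",
}


def mirror_state(state):
    """Swap the BUY/SELL suffix; IDLE has no suffix and is unchanged."""
    if state.endswith("_BUY"):
        return state[:-3] + "SELL"
    if state.endswith("_SELL"):
        return state[:-4] + "BUY"
    return state


def mirror_action(action):
    return {"fire_buy": "fire_sell"}.get(action, action)


# The SELL side is generated from the BUY side, so the two cannot drift apart.
TABLE = dict(BUY_ROWS)
for (state, color), (nxt, action) in BUY_ROWS.items():
    TABLE[(mirror_state(state), SWAP_COLOR[color])] = (
        mirror_state(nxt), mirror_action(action))


def step(state, color):
    """Return (next_state, action); unlisted pairs stay put and log as noise."""
    return TABLE.get((state, color), (state, "noise"))


def run(colors, state="IDLE"):
    """Feed a sequence of classified dots and collect the actions taken."""
    actions = []
    for color in colors:
        state, action = step(state, color)
        actions.append(action)
    return state, actions
```

For example, run(["turquoise", "gray", "dark-green", "light-green"]) ends in IDLE with fire_buy as its last action, while run(["turquoise", "light-green"]) ends in IDLE with phase_skip and no FIRE.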

Detection Details

Loop interval: 5 seconds (36 cycles per 3-min bar; stays well inside notification-latency target).
Rightmost-dot detection: scan ROI from right edge leftward, find first non-background pixel cluster → that's the rightmost dot. Don't hardcode x-pixel positions (chart scrolls; hardcoded positions drift).
Debounce: configurable debounce_depth in config.toml (default 1 — single-read acceptance). Increase if future sessions show mid-bar color flicker. Screenshot-in-notification is the user's visual verification on top.
Rolling window: keep last 20 classified dots with their detection timestamps. State machine consumes the newest accepted (post-debounce) dot per cycle.
Classification: nearest-color match in RGB Euclidean distance, per-color tolerance from calibration. Report confidence = 1 - distance_nearest / distance_second_nearest. Log confidence every cycle. If all distances > tolerance → UNKNOWN, state unchanged.
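
The classification rule above can be sketched as follows. This is an illustration under assumptions: the classify helper, the palette layout, and the RGB values in the usage note are hypothetical (real values come from calibration), and "all distances > tolerance" is read literally as "no color is within its own tolerance".

```python
import math


def classify(rgb, palette, tolerances):
    """Nearest-color match in RGB Euclidean distance.

    palette: color name -> reference (r, g, b) from calibration.
    tolerances: color name -> maximum accepted distance for that color.
    Returns (name, confidence); name is "UNKNOWN" when every distance
    exceeds its color's tolerance. Needs at least two palette entries.
    """
    dists = sorted((math.dist(rgb, ref), name) for name, ref in palette.items())
    (d_nearest, nearest), (d_second, _) = dists[0], dists[1]
    # confidence = 1 - distance_nearest / distance_second_nearest
    confidence = 1.0 - d_nearest / d_second if d_second > 0 else 0.0
    if all(d > tolerances[name] for d, name in dists):
        return "UNKNOWN", confidence
    return nearest, confidence
```

With a hypothetical palette {"turquoise": (64, 224, 208), "gray": (128, 128, 128), "dark-green": (0, 100, 0)} and a tolerance of 40 for each color, an exact turquoise pixel classifies as turquoise with confidence 1.0, and a magenta pixel (255, 0, 255) comes back as UNKNOWN, leaving the state unchanged.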

Levels Extraction (two-phase, simplified)

Phase A — at trigger (immediate alert to Discord + Telegram):

No entry-price compute. No spec-math SL/TP. User places a manual 0.6% SL in TradeLocker at entry; actual TP1/TP2/SL come in Phase B from the chart.
Notification: 🟢 BUY signal DIA→US30 | 22:47:03 + annotated screenshot (detected dot highlighted).

Phase B — after user trades (chart-scan confirmation):

After Phase A fires, detector keeps watching the chart ROI for horizontal colored lines (red=SL, green=TP1/TP2).
When lines appear (user has entered trade in TradeLocker and TradeStation drew them) → scan y-pixels via Hough + color mask, convert via y-axis calibration → send second alert to both channels: ✅ Levels: SL=484.35 | TP1=485.20 | TP2=485.88.
If chart-line scan times out (no lines in 10 min) → silent (user didn't trade).
If only 2 lines detected (user didn't set TP2 or line not rendered yet) → partial-result alert.
Phase B overlap with next signal: guarded by per-direction lockout + Phase-B completion flag; a new FIRE cannot issue until prior Phase B closes (timeout or success).
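
The y-axis calibration step in Phase B amounts to a linear interpolation between two known (pixel, price) points. A minimal sketch, assuming a linear price axis; the make_pixel_to_price helper and the calibration numbers in the usage note are hypothetical:

```python
def make_pixel_to_price(point_a, point_b):
    """Build a y-pixel -> price converter from two calibration points.

    Each point is (y_pixel, price). Screen y grows downward while price
    grows upward, so the slope is normally negative.
    """
    (y_a, price_a), (y_b, price_b) = point_a, point_b
    if y_a == y_b:
        raise ValueError("calibration points must have different y pixels")
    slope = (price_b - price_a) / (y_b - y_a)

    def to_price(y_pixel):
        return price_a + slope * (y_pixel - y_a)

    return to_price
```

With hypothetical calibration points (100, 486.00) and (500, 484.00), a detected line at y=300 maps to 485.00, which is the kind of value that would be formatted into the levels alert.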

Dedup / Lockout

Time-based lockout: after any FIRE, block re-fire for 4 minutes (one 3-min bar + 1 min safety).
Tracked per-direction: BUY lockout doesn't block SELL.
Stored as single timestamp per direction (not pixel-keyed).
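
The lockout rules above fit in a few lines. This is a sketch only; the Lockout class and try_fire method are hypothetical names, and timestamps are plain seconds supplied by the caller so the logic stays testable.

```python
class Lockout:
    """Per-direction FIRE lockout: one timestamp per direction."""

    def __init__(self, seconds=240.0):  # 4 min = one 3-min bar + 1 min safety
        self.seconds = seconds
        self._last_fire = {}  # direction ("BUY" / "SELL") -> last fire time

    def try_fire(self, direction, now):
        """Return True and record the fire if the direction is not locked out."""
        last = self._last_fire.get(direction)
        if last is not None and now - last < self.seconds:
            return False
        self._last_fire[direction] = now
        return True
```

In the main loop, now would likely come from time.monotonic() so that wall-clock adjustments cannot shorten or extend the lockout.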

Observability

Heartbeat: every 30 min to a separate Discord thread (not main alerts channel): 🟢 22:00 alive | 0 triggers | confidence avg 0.85 | chart OK. Silence >35 min = watchdog concern (user notices).
Layout canary: every 60 cycles (5 min), hash a stable reference region (axis labels, chart border). Stored baseline in config. On significant divergence (>threshold) → ⚠️ Layout changed — auto-paused, recalibrate to alerts channel. Bot pauses detection until operator acknowledges (touch a pause-file or restart).
Low-confidence alert: 3+ consecutive cycles with confidence below threshold → ⚠️ Bot lost sight (already in original plan).
Window-lost alert: TradeStation window not found for 60s → ⚠️ Cannot find chart.
Audit JSONL: per-cycle, daily rotation (logs/YYYY-MM-DD.jsonl), fields: {ts, window_found, roi_ok, rightmost_dot_color, confidence, state, transition, trigger, notified, reason}.
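
A per-cycle JSONL writer with daily rotation could look like the sketch below. The AuditLog class is a hypothetical shape for audit.py; it opens the file line-buffered so each record reaches the OS as soon as its newline is written, and it switches files when the local date changes.

```python
import json
from datetime import datetime
from pathlib import Path


class AuditLog:
    """Append one JSON object per cycle to logs/YYYY-MM-DD.jsonl."""

    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self._day = None
        self._fh = None

    def write(self, record, now=None):
        now = now or datetime.now()  # local time, so rotation is at local midnight
        day = now.strftime("%Y-%m-%d")
        if day != self._day:
            self.close()
            # buffering=1 selects line buffering in text mode
            self._fh = open(self.log_dir / f"{day}.jsonl", "a",
                            buffering=1, encoding="utf-8")
            self._day = day
        self._fh.write(json.dumps({"ts": now.isoformat(), **record}) + "\n")

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

A cycle would then call something like log.write({"window_found": True, "roi_ok": True, "rightmost_dot_color": "gray", "confidence": 0.91, "state": "ARMED_BUY", "transition": None, "trigger": None, "notified": False, "reason": "persist"}), using the field names listed above.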

Files to Create

/workspace/atm/pyproject.toml — Python 3.11+ required. Deps: mss, opencv-python, numpy, requests, pygetwindow, pywin32 (DPI + window capture), rich (CLI), pillow (screenshot annotation). No tomli — use stdlib tomllib.
/workspace/atm/config.toml — populated by calibration tool (ROI coords, per-color RGB + tolerance, debounce_depth, y-axis scale, canary-region baseline hash, Discord webhook URL, Telegram bot token + chat_id)
/workspace/atm/src/atm/config.py — [ENG-REVIEW] @dataclass Config with Config.load(path) that validates on load (RGB tuples, positive tolerances, both notifier credentials present, y-axis 2-point pair). Fail fast at startup.
/workspace/atm/src/atm/vision.py — [ENG-REVIEW] shared primitives: ROI crop, perceptual hash, pixel-to-price linear interp, Hough line detection with color mask. Used by detector/canary/levels to avoid drift.
/workspace/atm/src/atm/detector.py — screenshot loop, rightmost-dot scan, color classification, rolling window, debounce
/workspace/atm/src/atm/state_machine.py — explicit phased state machine (spec above), exhaustive transition table
/workspace/atm/src/atm/levels.py — Phase B chart-scan only (Phase A entry-price compute removed after ENG-REVIEW)
/workspace/atm/src/atm/canary.py — layout fingerprint hash + drift check + auto-pause
/workspace/atm/src/atm/notifier/__init__.py — abstract Notifier protocol: send_alert(), send_heartbeat(), send_levels_confirm()
/workspace/atm/src/atm/notifier/fanout.py — [ENG-REVIEW] FanoutNotifier wraps N backends, each with its own worker thread + bounded queue (size 50, drop-oldest on overflow) + retry with exponential backoff + dead-letter file on total failure. Main loop never blocks.
/workspace/atm/src/atm/notifier/discord.py — webhook POST, annotated screenshot upload (multipart)
/workspace/atm/src/atm/notifier/telegram.py — [ENG-REVIEW] built in parallel with Discord (no longer deferred); bot API, photo upload
/workspace/atm/src/atm/audit.py — JSONL logger with daily local-midnight rotation, line-buffered write for crash safety
/workspace/atm/src/atm/calibrate.py — Tkinter: window pick → DPI check → ROI corners → per-color sample → y-axis scale → canary region → save versioned config
/workspace/atm/src/atm/labeler.py — [EXPANSION] Tkinter label UI → labels.json
/workspace/atm/src/atm/dryrun.py — replay with precision/recall/confusion matrix when labels present
/workspace/atm/src/atm/journal.py — [EXPANSION] atm journal CLI → trades.jsonl
/workspace/atm/src/atm/report.py — [EXPANSION] weekly aggregation
/workspace/atm/src/atm/main.py — CLI: atm calibrate, atm label <dir>, atm dryrun <dir>, atm run [--duration Xh], atm journal, atm report [--week YYYY-WW]
/workspace/atm/tests/ — [ENG-REVIEW] unit + E2E per test plan at ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md
/workspace/atm/samples/, /workspace/atm/logs/
/workspace/atm/configs/ — versioned config archive. [ENG-REVIEW] No symlink (Windows admin-required); use configs/current.txt marker file storing the active filename. Config.load() reads the marker.
/workspace/atm/docs/phase2-prop-firm-audit.md — structured TOS checklist
/workspace/atm/README.md — setup, calibration workflow, per-session operating checklist, DPI/multi-monitor notes

Build Order

pyproject.toml + package scaffold — Python 3.11+, pip install -e ., atm --help works.
Standalone screenshot-dump script — mss timer dumps to samples/ every 5s during trading sessions. Build corpus in parallel.
config.py + vision.py — Config dataclass with validation; shared vision primitives. Ship with unit tests for config load + pixel-to-price interp.
calibrate.py — versioned config in configs/YYYY-MM-DD-HHMM.toml; configs/current.txt marker file points at active. DPI check + canary region capture.
labeler.py — once ~30 samples exist, tag them. labels.json is ground truth.
state_machine.py + unit tests (clean BUY, clean SELL, cooling, opposite-rearm, lockout per-direction, noise, phase-skip, all state×color pairs via parameterized test).
detector.py + unit tests (empty/background ROI, rightmost-cluster, rolling window FIFO, debounce depth=1, classification edges including UNKNOWN).
canary.py + unit tests (drift threshold, pause-file gating).
levels.py (Phase B only) + unit tests (Hough line detection with color mask, 2 vs 3 lines, 10-min timeout, pixel-to-price roundtrip).
notifier/fanout.py + discord.py + telegram.py + unit tests (queue overflow drop-oldest, 429 backoff, dead-letter on total failure, fanout: one backend down still delivers). Both channels built in parallel — fire together from day 1.
audit.py + unit tests (daily rotation at local midnight, line-buffered flush crash safety).
dryrun.py — replay on samples/ against labels.json. Acceptance gate before live: precision = 100%, recall ≥ 95%.
E2E replay test — feed samples/ through detector → state_machine → notifier-mock → in-memory audit; assert labels match FIREs.
journal.py, report.py, main.py (unified CLI).
Windows Task Scheduler setup — 16:30→18:30, 21:00→23:00. atm run --duration 2h. Manual DST check twice yearly.
docs/phase2-prop-firm-audit.md — TOS checklist template.

Existing Utilities to Reuse

Greenfield Python project. No internal utilities. External libs: mss (screenshot), pygetwindow (window locate), opencv-python (line detection in Phase B), numpy (color math), requests (Discord webhook), pillow (annotated screenshots). Config parsing uses stdlib tomllib (Python 3.11+), not tomli.

Verification

End-to-end, in build order:

State machine unit tests: pytest tests/test_state_machine.py — all scenarios (clean BUY, clean SELL, cooling, rearm, lockout, noise, phase-skip) pass.
Calibration: atm calibrate → step through → config.toml populated with plausible RGBs for described colors + y-axis scale sane + canary region picked.
Labeled corpus: ≥30 screenshots in samples/, atm label ./samples tags each.
Dry-run with metrics: atm dryrun ./samples → precision + recall + confusion matrix printed. Acceptance gate: precision = 100%, recall ≥ 95%. If not met → tune tolerances, re-run.
Live test notification-only (2 sessions): atm run. Verify:
- Discord + Telegram notifications within 5s of trigger, both channels receive.
- Phase A message: direction + timestamp + annotated screenshot.
- Phase B levels-alert fires once TradeStation draws SL/TP lines; correct SL/TP1/TP2 prices.
- Heartbeat messages every 30 min in thread.
- Audit JSONL complete, state transitions visible.
- Kill one notifier (e.g. wrong token) → other still delivers, dead-letter file for failed one.
Canary test: manually move TradeStation window during session → layout-changed alert within 5 min. Move back → restart bot → resumes.
Scheduler test: Windows Task Scheduler starts bot at 16:30, stops at 18:30 cleanly, log rotates at midnight.
Journal test: after real trade, atm journal → prompt flow complete → trades.jsonl entry present.
Report test: after 1 week of live use, atm report --week 2026-16 → precision per color, slippage distribution, P&L summary.
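
The dry-run acceptance gate (precision = 100%, recall ≥ 95%) can be made machine-checkable along these lines. This is a sketch under assumptions: the per-sample label format (a direction string or None) and both function names are hypothetical, and a FIRE in the wrong direction is counted as both a false positive and a false negative.

```python
def precision_recall(labels, fires):
    """Compare detector FIREs against ground-truth labels.

    labels: sample id -> "BUY", "SELL", or None (no trigger expected).
    fires:  sample id -> "BUY" or "SELL" for samples where the bot fired.
    """
    tp = fp = fn = 0
    for sample_id, expected in labels.items():
        got = fires.get(sample_id)
        if got is not None and got == expected:
            tp += 1
            continue
        if got is not None:
            fp += 1  # fired where nothing (or the other side) was expected
        if expected is not None:
            fn += 1  # expected trigger was missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def passes_gate(precision, recall):
    """Acceptance gate before going live."""
    return precision == 1.0 and recall >= 0.95
```

If the gate fails, the plan's remedy is to tune tolerances and re-run the dry-run on the same labeled corpus.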

Risk Register

Prop firm TOS (Faza 2 blocker): read TOS using docs/phase2-prop-firm-audit.md checklist before any auto-execution work. If EA/automation prohibited → Faza 2 dead, stay on Faza 1 permanently.
TradeStation layout change: canary catches it within 5 min → auto-pause. Recalibrate. Losing a session to a layout change is acceptable cost.
Calibration drift over time: versioned configs in configs/ let you roll back to last-known-good if new calibration misfires.
DIA↔US30 price divergence: accepted (user's judgment). Faza 1 journal captures slippage per signal, feeding Faza 2 go/no-go.
Screen sharing / RDP during trading: overlay can break classification. Low prob, documented in README as operator hygiene.
Windows Task Scheduler DST transitions: twice per year, schedule may misfire. Manual check first week of each DST change.

Out of Scope (Faza 1)

Any automated click in TradeLocker (Faza 2 work)
Multi-symbol concurrent monitoring (single chart at a time; user switches manually between DIA and GLD)
Backtesting on historical data (strategy already manually validated)
Web UI / dashboard (headless + Discord/Telegram only)
Ack feedback loop (react-on-notification labeling) — deferred to TODOS.md as P2-ack-loop: shipping baseline first, adding feedback once detection quality verified
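Telegram notifier — originally deferred until Discord was stable for 5+ sessions; the ENG review moved it into Faza 1 scope, built in parallel with Discord (see ENG REVIEW DECISIONS).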

Accepted Expansions (CEO review, SELECTIVE mode)

✅ Labeled sample corpus + dry-run metrics — labeler.py, labels.json, automated precision/recall in dryrun. Makes acceptance criteria ("false-positives = 0, false-negatives ≤ 5%") machine-checkable.
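✅ Level-extractor fallback (spec-math) — as accepted, Phase A used spec-math and Phase B validated against the chart. The ENG review later dropped the Phase A spec-math, so only the Phase B chart scan remains (see ENG REVIEW DECISIONS).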
✅ Layout canary + auto-pause — canary.py hashes stable UI region, auto-pauses on drift. Catches silent classification-with-wrong-positions failure mode.
✅ Trade journal CLI — atm journal + trades.jsonl + weekly report. Data for Faza 2 go/no-go decision.
✅ Prop-firm TOS audit checklist — docs/phase2-prop-firm-audit.md. Structured Faza 2 evaluation framework shipped now.

Deferred to TODOS.md

Ack feedback loop — Discord reaction emojis feeding precision tuning. High value, operationally heavier (bot vs webhook). Add after Faza 1 baseline stable.

GSTACK REVIEW REPORT

Review	Trigger	Why	Runs	Status	Findings
CEO Review	`/plan-ceo-review`	Scope & strategy	1	CLEAR (SELECTIVE EXPANSION)	6 proposals, 5 accepted, 1 deferred; 2 arch corrections
Codex Review	`/codex review`	Independent 2nd opinion	0	—	—
Eng Review	`/plan-eng-review`	Architecture & tests (required)	1	CLEAR (FULL_REVIEW)	9 issues found, 0 critical gaps; 4 decisions made, 0 unresolved
Design Review	`/plan-design-review`	UI/UX gaps	0	—	SKIPPED (no UI scope — CLI + Discord/Telegram)
DX Review	`/plan-devex-review`	Developer experience gaps	0	—	SKIPPED (personal tool, single user)

UNRESOLVED: 0

ENG REVIEW DECISIONS:

Bar flicker → debounce depth=1 (configurable), rely on screenshot-in-notification for visual verification.
Phase A entry price → dropped. User places manual 0.6% SL in TradeLocker at entry. Phase A = direction + screenshot only. Phase B = real SL/TP1/TP2 from chart.
Notifier blocking → fire-and-forget worker threads per backend, bounded queue (size 50, drop-oldest), retry w/ backoff, dead-letter on total failure.
Alert SPoF → Discord + Telegram built in parallel from day 1, both fire together.
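
The notifier-blocking decision above can be sketched with one worker thread per backend. This is an illustration, not the planned fanout.py: BackendWorker and its parameters are hypothetical names, the send callable stands in for the Discord or Telegram client, and an in-memory list stands in for the dead-letter file. It relies on collections.deque with maxlen, which discards the oldest item when a new one is appended to a full deque.

```python
import threading
import time
from collections import deque


class BackendWorker:
    """One backend: bounded drop-oldest queue, retry with backoff, dead-letter."""

    def __init__(self, send, maxsize=50, retries=3, base_delay=0.5):
        self._send = send
        self._queue = deque(maxlen=maxsize)  # full deque drops the oldest item
        self._cond = threading.Condition()
        self._retries = retries
        self._base_delay = base_delay
        self._closed = False
        self.dead_letters = []  # stand-in for the dead-letter file
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, message):
        """Called from the main loop; never blocks on network I/O."""
        with self._cond:
            self._queue.append(message)
            self._cond.notify()

    def close(self):
        """Drain whatever is queued, then stop the worker."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._queue and not self._closed:
                    self._cond.wait()
                if not self._queue:
                    return  # closed and drained
                message = self._queue.popleft()
            self._deliver(message)

    def _deliver(self, message):
        for attempt in range(self._retries):
            try:
                self._send(message)
                return
            except Exception:
                if attempt < self._retries - 1:
                    time.sleep(self._base_delay * 2 ** attempt)  # exponential backoff
        self.dead_letters.append(message)
```

A FanoutNotifier would hold one such worker per backend and call submit on each, so a failing Telegram token fills only Telegram's dead-letter store while Discord keeps delivering.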

ENG REVIEW OBVIOUS FIXES (stated, no decision):

Exhaustive state transition table (all state×color pairs, default-noise rule, SELL mirror explicit).
Python 3.11+ pin, drop tomli dep, use stdlib tomllib.
Windows symlink → configs/current.txt marker file.
Shared vision.py module (ROI, hash, interp, Hough).
@dataclass Config with fail-fast load-time validation.
DPI check + multi-monitor note in calibrate + README.

ENG REVIEW TEST SCOPE (accepted: FULL): unit tests for every module (state_machine, detector, levels Phase B, canary, audit, notifier fanout/retry, calibrate roundtrip, config validate) + 1 E2E replay harness asserting labeled-corpus precision/recall. Test plan artifact: ~/.gstack/projects/romfast-workspace/claude-master-eng-review-test-plan-20260415-212932.md.

VERDICT: CEO + ENG CLEARED — ready to implement. Run /ship after implementation. No further reviews required before build.
