Design: Story-Based Execution (Ralph-Style Decomposition)

Status: Approved, ready for implementation.
Date: 2026-02-08
Approved by: Ryan Carson

Problem

Today, Antfarm's feature-dev workflow hands the entire task to a developer agent in one shot. For non-trivial features, this fails because:

  1. Context window limits — large tasks exhaust the agent's context before completion
  2. No incremental progress — if the agent fails partway, everything is lost
  3. No checkpoint/resume — can't pick up where we left off
  4. Monolithic commits — one giant change vs. small, reviewable increments

Ralph (github.com/snarktank/ralph) solves this by breaking work into small user stories and spawning a fresh session per story. Each story is scoped to fit in one context window. We adopt this pattern.

Design Decisions (Final)

| Decision | Choice | Notes |
| --- | --- | --- |
| Planner model | Same as other agents (Opus 4.6) | No special model |
| Cross-session memory | File-based (progress.txt, AGENTS.md, MEMORY.md) | Not DB-only |
| Verification cadence | Verify after EACH story | Review only at the end |
| Failure handling | Verify/review failures pass back to developer | Existing retry mechanism |
| Progress archiving | Archive progress.txt at run completion | Keep history accessible |
| Cron frequency | 5 minutes (down from 15) | Configurable per-workflow |
| Max stories | 20 per run | Planner enforces this |
| Progress sharing | Inject via template variable {{progress}} | Other agents don't read the file directly |

Architecture

Pipeline Flow

[planner] → [developer ⟳ verify] → [test] → [pr] → [review]
                    ↑______|
                (loop per story)
  1. Plan: Planner reads the task + codebase, produces ordered user stories
  2. Implement + Verify loop, repeated for each story:
     a. Developer implements the story (fresh session)
     b. Verifier checks it (fresh session)
     c. If verify fails → back to developer for that story
     d. If verify passes → next story
  3. Test: Full test suite after all stories complete
  4. PR: Developer creates pull request
  5. Review: Reviewer checks the PR; if changes needed → back to developer

Database Changes

New table: stories

CREATE TABLE IF NOT EXISTS stories (
  id TEXT PRIMARY KEY,
  run_id TEXT NOT NULL REFERENCES runs(id),
  story_index INTEGER NOT NULL,
  story_id TEXT NOT NULL,           -- e.g. "US-001"
  title TEXT NOT NULL,
  description TEXT NOT NULL,
  acceptance_criteria TEXT NOT NULL, -- JSON array of strings
  status TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
  output TEXT,
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 2,
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
);

Altered table: steps

Add two columns (with defaults for backwards compat):

-- type: 'single' (default, current behavior) or 'loop'
ALTER TABLE steps ADD COLUMN type TEXT NOT NULL DEFAULT 'single';
-- loop_config: JSON blob, nullable. Only set when type='loop'.
ALTER TABLE steps ADD COLUMN loop_config TEXT;
-- current_story_id: tracks which story a loop step is currently working on
ALTER TABLE steps ADD COLUMN current_story_id TEXT;

Since we use node:sqlite (DatabaseSync), the migration approach is: check if columns exist, add if missing. Same pattern as existing migrate() in db.ts.
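As a minimal sketch of the "check, then add" approach, the helper below takes the column names already present on a table (for example, as read from `PRAGMA table_info(steps)`) and returns only the ALTER statements still needed. The names `ColumnSpec`, `STEP_COLUMNS`, and `pendingAlterStatements` are illustrative assumptions, not identifiers from db.ts.

```typescript
// Hypothetical helper: none of these names come from the existing db.ts.
type ColumnSpec = { name: string; ddl: string };

// Columns this design adds to the steps table.
const STEP_COLUMNS: ColumnSpec[] = [
  { name: "type", ddl: "TEXT NOT NULL DEFAULT 'single'" },
  { name: "loop_config", ddl: "TEXT" },
  { name: "current_story_id", ddl: "TEXT" },
];

// Given the column names that already exist, return the ALTER statements
// that still have to run. An up-to-date table yields an empty list.
function pendingAlterStatements(
  table: string,
  existing: string[],
  wanted: ColumnSpec[],
): string[] {
  const have = new Set(existing);
  return wanted
    .filter((col) => !have.has(col.name))
    .map((col) => `ALTER TABLE ${table} ADD COLUMN ${col.name} ${col.ddl};`);
}
```

Because the function is pure, re-running the migration on an already migrated database is a no-op, which is the backwards-compatibility property T1 asks for.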


Type Changes (types.ts)

// Add to existing types:

export type LoopConfig = {
  over: "stories";
  completion: "all_done";
  freshSession?: boolean;     // default true
  verifyEach?: boolean;       // default false
  verifyStep?: string;        // step id to run after each iteration
};

export type WorkflowStep = {
  id: string;
  agent: string;
  type?: "single" | "loop";   // NEW, default "single"
  loop?: LoopConfig;           // NEW, only when type="loop"
  input: string;
  expects: string;
  max_retries?: number;
  on_fail?: WorkflowStepFailure;
};

export type Story = {
  id: string;
  runId: string;
  storyIndex: number;
  storyId: string;        // "US-001"
  title: string;
  description: string;
  acceptanceCriteria: string[];
  status: "pending" | "running" | "done" | "failed";
  output?: string;
  retryCount: number;
  maxRetries: number;
};

Step Operations Changes (step-ops.ts)

This is the core of the implementation. All loop logic lives here — agents don't know they're in a loop.

claimStep(agentId) — Updated

1. Find pending step for this agent (existing logic)
2. If step.type === 'loop':
   a. Parse loop_config JSON
   b. If loop_config.over === 'stories':
      - Query stories table: next story with status='pending' for this run
      - If no pending story found:
        * Mark step as 'done'
        * Advance pipeline (existing logic)
        * Return { found: false }
      - Claim the story: set story status='running', set step.current_story_id
      - Build extra template vars:
        * {{current_story}} — formatted story block (id, title, desc, acceptance criteria)
        * {{current_story_id}} — "US-001"
        * {{current_story_title}} — "Add status field"
        * {{completed_stories}} — summary of done stories
        * {{stories_remaining}} — count of pending stories
        * {{verify_feedback}} — from run context (set by verifier on failure)
        * {{progress}} — contents of progress.txt from developer workspace
      - Merge extra vars into context, resolve template, return
3. If step.type === 'single': existing logic unchanged
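The template-variable part of the claim flow could look roughly like the sketch below. `formatStoryForTemplate` is the helper named later in task T6; `StoryView` and `resolveTemplate` are stand-ins for illustration, and the existing template resolver may treat unknown placeholders differently.

```typescript
// Minimal view of a story; field names follow the Story type in types.ts.
type StoryView = {
  storyId: string;
  title: string;
  description: string;
  acceptanceCriteria: string[];
};

// Format one story as the text block injected as {{current_story}}.
function formatStoryForTemplate(story: StoryView): string {
  const criteria = story.acceptanceCriteria.map((c) => `- ${c}`).join("\n");
  return [
    `${story.storyId}: ${story.title}`,
    story.description,
    "Acceptance criteria:",
    criteria,
  ].join("\n");
}

// Replace {{name}} placeholders. Assumption: unknown names are left as-is.
function resolveTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = vars[key];
    return value === undefined ? match : value;
  });
}
```

With this shape, claimStep() only has to merge the loop-specific variables into the run context before calling the resolver, so single steps keep using the same code path.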

completeStep(stepId, output) — Updated

1. Existing: save output, merge KEY:VALUE pairs into context
2. NEW — Detect STORIES_JSON in output:
   - Find the line starting with "STORIES_JSON:" 
   - Everything after that prefix (possibly multi-line) is JSON
   - Parse the array and INSERT into stories table
   - Each story gets: run_id from the step, sequential story_index, status='pending'
3. If step is a loop step (type='loop'):
   a. Mark current story as 'done', save output to story
   b. Clear step.current_story_id
   c. Check loop_config.verifyEach (camelCase, since loop_config stores the parsed LoopConfig):
      - If true: set the verify step (by loop_config.verifyStep) to 'pending'
        * Also save {{changes}} etc. in run context so verifier can see them
        * The loop step stays 'running' (not 'pending' yet — waiting for verify)
      - If false: check for more pending stories
        * More stories → set step back to 'pending' (next poll picks up next story)
        * No more stories → mark step 'done', advance pipeline
4. If step is a single step: existing advance logic
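Step 3c above boils down to a three-way decision. The sketch below expresses it as a pure function so it can be tested without a database; `LoopAction` and `afterLoopIteration` are hypothetical names, and the real completeStep() would apply the returned action to the steps table.

```typescript
// What should happen to the loop step after one story iteration completes.
type LoopAction =
  | { stepStatus: "running"; wakeVerifyStep: string } // wait for the verifier
  | { stepStatus: "pending" }                         // next poll claims the next story
  | { stepStatus: "done"; advancePipeline: true };    // loop finished

function afterLoopIteration(
  loop: { verifyEach?: boolean; verifyStep?: string },
  pendingStories: number,
): LoopAction {
  // Workflow validation is assumed to guarantee verifyStep when verifyEach is set.
  if (loop.verifyEach && loop.verifyStep) {
    return { stepStatus: "running", wakeVerifyStep: loop.verifyStep };
  }
  return pendingStories > 0
    ? { stepStatus: "pending" }
    : { stepStatus: "done", advancePipeline: true };
}
```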

Verify step completion (new behavior for verify-each)

When the verify step completes and it was triggered by a loop step's verify_each:

1. If verify STATUS=done:
   - Check if more pending stories remain
   - If yes: set the loop step back to 'pending' (developer picks up next story)
   - If no: mark loop step 'done', advance pipeline past verify to next step
   - Clear verify_feedback from context
2. If verify STATUS=retry (failure):
   - Set the current story back to 'pending'
   - Store verify ISSUES in context as {{verify_feedback}}
   - Set the loop step back to 'pending' (developer retries the story)
   - Increment story retry_count
   - If story retry_count >= max_retries: fail the story, fail the step, fail the run
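The pass and retry branches above can be sketched as one pure transition function. `VerifyOutcome` and `applyVerifyResult` are illustrative names; the comparison follows the text literally (increment first, then fail once retry_count >= max_retries).

```typescript
// Result of one verify completion, as seen by the loop step and its story.
type VerifyOutcome =
  | { story: "done"; loopStep: "pending" | "done"; clearFeedback: true }
  | { story: "pending"; loopStep: "pending"; retryCount: number; feedback: string }
  | { story: "failed"; loopStep: "failed"; run: "failed"; retryCount: number };

function applyVerifyResult(
  verifyStatus: "done" | "retry",
  story: { retryCount: number; maxRetries: number },
  pendingStories: number,
  issues: string,
): VerifyOutcome {
  if (verifyStatus === "done") {
    // More stories left: the developer picks up the next one. Otherwise the loop ends.
    return {
      story: "done",
      loopStep: pendingStories > 0 ? "pending" : "done",
      clearFeedback: true,
    };
  }
  const retryCount = story.retryCount + 1;
  if (retryCount >= story.maxRetries) {
    return { story: "failed", loopStep: "failed", run: "failed", retryCount };
  }
  // ISSUES from the verifier become {{verify_feedback}} for the retry.
  return { story: "pending", loopStep: "pending", retryCount, feedback: issues };
}
```

Read this way, the default max_retries of 2 gives each story two attempts in total. If two retries after the first attempt are intended, the comparison would need to be `>` instead.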

How to detect "this verify completion was triggered by verify_each":

  • Check if the verify step's run has a loop step with verify_each: true and verify_step matching the current step's step_id
  • Or: add a triggered_by field to the step record when setting it to pending

Recommendation: add a triggered_by_loop TEXT column to steps table (nullable). When verify-each sets the verify step to pending, it writes the loop step's ID here. On verify completion, check this field.

Actually simpler: just check if there's a loop step in this run with verify_step pointing to this step's step_id and the loop step is in 'running' status. No extra column needed.

failStep(stepId, error) — Updated

1. If step is a loop step:
   a. Fail the current story (increment retry_count)
   b. If story retries remain: story → 'pending', step → 'pending' (developer retries the story)
   c. If story retries exhausted: story → 'failed', step → 'failed', run → 'failed'
2. If step is a single step: existing logic

New: getStories(runId)

function getStories(runId: string): Story[] {
  // Return all stories for a run, ordered by story_index
}
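getStories() will need to turn snake_case rows into the camelCase `Story` type. A possible mapper is sketched below; `StoryRow` and `rowToStory` are assumed names, and the row shape simply mirrors the `stories` schema above.

```typescript
// Shape of a row from the stories table (snake_case, as in the schema).
type StoryRow = {
  id: string;
  run_id: string;
  story_index: number;
  story_id: string;
  title: string;
  description: string;
  acceptance_criteria: string; // JSON array of strings
  status: string;
  output: string | null;
  retry_count: number;
  max_retries: number;
};

type Story = {
  id: string;
  runId: string;
  storyIndex: number;
  storyId: string;
  title: string;
  description: string;
  acceptanceCriteria: string[];
  status: "pending" | "running" | "done" | "failed";
  output?: string;
  retryCount: number;
  maxRetries: number;
};

// Convert one database row into the Story type used by the rest of the code.
function rowToStory(row: StoryRow): Story {
  const criteria: unknown = JSON.parse(row.acceptance_criteria);
  return {
    id: row.id,
    runId: row.run_id,
    storyIndex: row.story_index,
    storyId: row.story_id,
    title: row.title,
    description: row.description,
    // Fall back to an empty list if the stored JSON is not an array.
    acceptanceCriteria: Array.isArray(criteria) ? criteria.map(String) : [],
    status: row.status as Story["status"],
    output: row.output ?? undefined,
    retryCount: row.retry_count,
    maxRetries: row.max_retries,
  };
}
```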

New: getCurrentStory(stepId)

function getCurrentStory(stepId: string): Story | null {
  // Get the story currently being worked on by a loop step
  // Uses step.current_story_id
}

Run Creation Changes (run.ts)

When inserting steps, persist the new fields:

const stepType = step.type ?? "single";
const loopConfig = step.loop ? JSON.stringify(step.loop) : null;
// Add to INSERT: type, loop_config columns

Workflow Spec Changes (workflow-spec.ts)

Parsing

Read type and loop from YAML step definitions. Validate:

  • If type: loop, loop must be present
  • loop.over must be "stories" (only supported value for now)
  • loop.completion must be "all_done"
  • If loop.verifyEach, loop.verifyStep must reference a valid step id
  • The referenced verify step must exist in the steps list

YAML field mapping

# In workflow.yml
type: loop              → step.type = "loop"
loop:
  over: stories         → loopConfig.over = "stories"
  completion: all_done  → loopConfig.completion = "all_done"
  verify_each: true     → loopConfig.verifyEach = true
  verify_step: verify   → loopConfig.verifyStep = "verify"

Note: YAML uses snake_case, TypeScript uses camelCase. Convert during parsing.
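A rough sketch of the conversion and validation, assuming the YAML has already been parsed into a plain object. `parseLoopConfig` and its parameters are hypothetical, and the `fresh_session` key is an assumed spelling, since the mapping above does not list it.

```typescript
type LoopConfig = {
  over: "stories";
  completion: "all_done";
  freshSession?: boolean;
  verifyEach?: boolean;
  verifyStep?: string;
};

// `raw` is the parsed `loop:` mapping from workflow.yml (snake_case keys);
// `stepIds` lists every step id in the workflow.
function parseLoopConfig(raw: Record<string, unknown>, stepIds: string[]): LoopConfig {
  if (raw["over"] !== "stories") throw new Error('loop.over must be "stories"');
  if (raw["completion"] !== "all_done") throw new Error('loop.completion must be "all_done"');

  const config: LoopConfig = { over: "stories", completion: "all_done" };
  const freshSession = raw["fresh_session"]; // assumed YAML key name
  const verifyEach = raw["verify_each"];
  const verifyStep = raw["verify_step"];
  if (typeof freshSession === "boolean") config.freshSession = freshSession;
  if (typeof verifyEach === "boolean") config.verifyEach = verifyEach;
  if (typeof verifyStep === "string") config.verifyStep = verifyStep;

  // verify_each only makes sense when verify_step names a real step.
  if (config.verifyEach) {
    if (!config.verifyStep || !stepIds.includes(config.verifyStep)) {
      throw new Error("loop.verify_step must reference an existing step id");
    }
  }
  return config;
}
```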


Agent Cron Changes (agent-cron.ts)

Frequency

Change EVERY_MS from 900_000 (15 min) to 300_000 (5 min).

Make it configurable per-workflow:

# In workflow.yml (optional)
cron:
  interval_ms: 300000   # 5 minutes

If not specified, default to 300_000.
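Reading the optional override could be as small as the sketch below. `resolveCronIntervalMs` is a hypothetical helper; rejecting non-numeric or non-positive values is an assumption the design does not spell out.

```typescript
const DEFAULT_INTERVAL_MS = 300_000; // 5 minutes

// `workflow` is the parsed workflow.yml; only the optional cron block matters here.
function resolveCronIntervalMs(workflow: { cron?: { interval_ms?: unknown } }): number {
  const value = workflow.cron?.interval_ms;
  // Fall back to the default for missing, non-numeric, or non-positive values.
  return typeof value === "number" && Number.isFinite(value) && value > 0
    ? value
    : DEFAULT_INTERVAL_MS;
}
```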

Prompt

No changes needed to the agent cron prompt. The step claim / step complete / step fail CLI commands handle all the loop logic server-side. The agent doesn't know it's in a loop — it just claims work, does it, reports completion. Same prompt works for single and loop steps.


CLI Changes (cli.ts)

New command: antfarm step stories <run-id>

Lists all stories for a run:

$ antfarm step stories abc123
US-001 [done]    Add status field to database
US-002 [done]    Display status badge on task cards  
US-003 [running] Add status toggle to task list rows
US-004 [pending] Filter tasks by status
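One way to produce the aligned listing above is to pad the status badge to the width of the longest one. `formatStoryLine` is an illustrative name, not an existing CLI function.

```typescript
// Render one line of `antfarm step stories` output.
function formatStoryLine(story: { storyId: string; status: string; title: string }): string {
  // "[running]" and "[pending]" are the widest badges at 9 characters.
  const badge = `[${story.status}]`.padEnd(9);
  return `${story.storyId} ${badge} ${story.title}`;
}
```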

Updated: antfarm workflow status

Include story progress in status output when stories exist.


Cross-Session Memory: File-Based

Where files live

Developer agent workspace: /Users/scout/.openclaw/workspaces/workflows/feature-dev/agents/developer/

This directory contains: AGENTS.md, SOUL.md, IDENTITY.md, TOOLS.md, USER.md, HEARTBEAT.md

We add: progress.txt (created by the developer agent on first story), MEMORY.md (optional, created if agent finds it useful), archive/ (created on run completion).

progress.txt

Created by: Developer agent during first story implementation.
Location: Developer agent workspace directory.
Lifecycle: Created fresh per run. Archived on run completion.

Format:

# Progress Log
Run: <run-id>
Task: <task description>
Started: <timestamp>

## Codebase Patterns
- Pattern 1 discovered during implementation
- Pattern 2
(consolidated reusable patterns — updated by developer after each story)

---

## <timestamp> - US-001: <title>
- What was implemented
- Files changed
- **Learnings:** What was discovered about the codebase
---

## <timestamp> - US-002: <title>
- What was implemented  
- Files changed
- **Learnings:** ...
---

How other agents access progress

The developer agent writes progress.txt to its own workspace. Other agents (verifier, tester) need to see it.

Solution: When claimStep() resolves template variables for any step in a run that has stories, it reads the developer workspace's progress.txt and injects its contents as {{progress}}. This way the verifier/tester prompt can include:

input: |
  ...
  PROGRESS LOG:
  {{progress}}

The claimStep() function needs to know the developer workspace path. It can derive this from:

  • The loop step's agent_id → workflow agent config → workspace path
  • Or: store the developer workspace path in run context during planning

Recommendation: The planner step outputs REPO: /path/to/repo. The developer's workspace path is deterministic from the workflow config. Have claimStep() look up the workspace path from the agent config for the loop step's agent.

Actually simpler: the developer agent writes progress.txt in its workspace. The workspace path is known from the OpenClaw config (agents.list[].workspace). Add a helper getAgentWorkspace(agentId) that reads the config and returns the path.

Even simpler: store the progress.txt path in run context. When the loop step first claims a story, set context.progress_file = "<workspace>/progress.txt". Then claimStep() reads that file for {{progress}}.

Final approach: Add a resolveProgressFile(runId) helper that:

  1. Finds the loop step for this run
  2. Gets its agent_id
  3. Looks up that agent's workspace from the OpenClaw config
  4. Returns <workspace>/progress.txt

Then in claimStep() for any step (not just loops), if the run has stories, inject {{progress}} by reading that file.

AGENTS.md updates

Developer agent updates its own AGENTS.md with structural codebase knowledge. This persists across runs. Guidance for what to add:

  • Project stack/framework info
  • How to run tests
  • Key file locations and patterns
  • Gotchas and non-obvious dependencies

These go in a ## Codebase Knowledge section that the agent appends to.

MEMORY.md

Optional. If the developer agent creates one, OpenClaw auto-loads it on each session. Could be used for longer-term memory across multiple runs. Not required for the loop mechanism to work.

Archiving

When a run completes (final step done → run status = 'completed'):

The completeStep() function, after marking a run as completed, should trigger archiving:

  1. Find the developer workspace for this run's workflow
  2. If progress.txt exists:
    • Create archive/<run-id>/
    • Copy progress.txt → archive/<run-id>/progress.txt
    • Truncate progress.txt (or delete it — next run creates a fresh one)

This can be a separate function archiveRunProgress(runId) called from completeStep() when runCompleted: true.
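The file handling inside archiveRunProgress(runId) might look like the sketch below. To stay runnable without the database, it takes the workspace path directly; `archiveProgressFile` is a hypothetical inner helper, and it truncates rather than deletes, which is one of the two options listed above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copy <workspace>/progress.txt to <workspace>/archive/<runId>/progress.txt,
// then empty the original. Returns the archive path, or null if there was
// nothing to archive.
function archiveProgressFile(workspace: string, runId: string): string | null {
  const source = path.join(workspace, "progress.txt");
  if (!fs.existsSync(source)) return null;

  const archiveDir = path.join(workspace, "archive", runId);
  fs.mkdirSync(archiveDir, { recursive: true });

  const target = path.join(archiveDir, "progress.txt");
  fs.copyFileSync(source, target);
  fs.truncateSync(source, 0); // the next run starts from an empty file
  return target;
}
```

archiveRunProgress(runId) would then only need to resolve the developer workspace for the run and delegate to this helper.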


New Agent: Planner

Files to create

workflows/feature-dev/agents/planner/AGENTS.md
workflows/feature-dev/agents/planner/SOUL.md
workflows/feature-dev/agents/planner/IDENTITY.md

AGENTS.md (Planner)

Should contain:

  • Role: decompose tasks into user stories
  • Story sizing rules (must fit in one context window)
  • Ordering rules (dependencies first: schema → backend → frontend)
  • Acceptance criteria rules (must be verifiable, always include "Typecheck passes")
  • Output format (STATUS, REPO, BRANCH, STORIES_JSON)
  • Max 20 stories rule
  • Examples of well-sized vs too-big stories
  • Instructions to explore the codebase before decomposing

Key content to borrow from Ralph's PRD skill (/tmp/ralph/skills/ralph/SKILL.md):

  • Story sizing section ("Right-sized stories" vs "Too big")
  • Acceptance criteria section ("Must Be Verifiable")
  • Story ordering section ("Dependencies First")

SOUL.md (Planner)

Analytical, thorough. Takes time to understand the codebase before decomposing. Not a coder — a planner. Thinks in terms of dependencies, risk, and incremental delivery.

IDENTITY.md (Planner)

# Identity
Name: Planner
Role: Decomposes tasks into user stories

Updated Agent: Developer

AGENTS.md changes

Add sections:

## Story-Based Execution

You work on ONE user story per session. A fresh session is started for each story.

### Each Session

1. Read `progress.txt` — especially the Codebase Patterns section at the top
2. Check the branch, pull latest
3. Implement the story described in your task input
4. Run quality checks
5. Commit: `feat: <story-id> - <story-title>`
6. Append to progress.txt (see format below)
7. Update Codebase Patterns in progress.txt if you found reusable patterns
8. Update AGENTS.md if you learned something structural about the codebase

### progress.txt Format

Append this after completing a story:

## <date/time> - <story-id>: <title>
- What was implemented
- Files changed
- **Learnings:** codebase patterns, gotchas, useful context
---

### Codebase Patterns

If you discover a reusable pattern, add it to the `## Codebase Patterns` section at the TOP of progress.txt. Only add patterns that are general and reusable, not story-specific.

### AGENTS.md Updates

If you discover something structural (not story-specific), add it to your AGENTS.md:
- Project stack/framework
- How to run tests
- Key file locations
- Dependencies between modules
- Gotchas

Updated Agent: Verifier

AGENTS.md changes

Update to reflect per-story verification:

## Per-Story Verification

You verify ONE story at a time, immediately after the developer completes it.

### What to Check

1. Code exists and is not just TODOs or placeholders
2. Each acceptance criterion for the story is met
3. No obvious incomplete work
4. Typecheck passes
5. If the story has "Verify in browser" criterion, do that

### Context Available

- The story details (in your task input)
- What the developer changed (in your task input)
- The progress log (in your task input as {{progress}})
- The actual code (in the repo on the branch)

### Output

Pass: STATUS: done + VERIFIED: what you confirmed
Fail: STATUS: retry + ISSUES: what's missing/broken (this goes back to the developer)

Updated Workflow YAML

The full workflow.yml for feature-dev v4 is in the Architecture section above. Key changes from v3:

  1. Added planner agent and plan step
  2. Changed implement step to type: loop with verify_each: true
  3. Added {{progress}} injection to verify/test steps
  4. Max stories: 20 (in planner instructions)
  5. Removed the pr step's TESTS: context dependency (tester output goes to context, PR reads progress.txt)

SKILL.md Updates

Update ~/.openclaw/skills/antfarm-workflows/SKILL.md:

  • Document the new pipeline (plan → implement loop → test → pr → review)
  • Note that the planner handles decomposition automatically
  • Update the example interaction
  • Add antfarm step stories <run-id> to CLI reference
  • Note 5-minute cron cycles
  • Update "Manually Triggering Agents" to mention the planner

Dashboard Changes

Stories panel

On the run detail view, add a stories section showing:

  • Each story with status (pending/running/done/failed)
  • Story title and acceptance criteria
  • Retry count
  • Output snippet (collapsible)

API endpoints

  • GET /api/runs/:id/stories — returns stories for a run

Implementation Tasks

All tasks are in the antfarm repo: ~/.openclaw/workspace/antfarm/

Phase 1: Core Engine (do these first, in order)

  • T1: DB migration (src/db.ts)

    • Add stories table (schema above)
    • Add type, loop_config, current_story_id columns to steps table
    • Use ALTER TABLE with existence checks for backwards compat
    • Test: existing counter-test workflow still works after migration
  • T2: Types (src/installer/types.ts)

    • Add LoopConfig type
    • Add Story type
    • Update WorkflowStep with optional type and loop fields
    • No runtime impact, just type definitions
  • T3: Workflow spec parsing (src/installer/workflow-spec.ts)

    • Parse type and loop fields from YAML
    • Convert snake_case YAML to camelCase TypeScript (verify_each → verifyEach, etc.)
    • Validate: if type=loop, loop config must be present and valid
    • Validate: verify_step must reference existing step id
    • Test: parse the new feature-dev v4 workflow.yml successfully
  • T4: Run creation (src/installer/run.ts)

    • Persist type and loop_config when inserting steps
    • Add type and loop_config to the INSERT statement
    • Test: create a run with the new workflow, verify steps have correct type/loop_config in DB
  • T5: Step operations, story parsing (src/installer/step-ops.ts)

    • In completeStep(): detect STORIES_JSON: in output
    • Parse the JSON array (handle multi-line — everything from STORIES_JSON: to end of output, or to next KEY: line)
    • Insert parsed stories into stories table
    • Test: complete a plan step with STORIES_JSON output, verify stories appear in DB
  • T6: Step operations, loop claim (src/installer/step-ops.ts)

    • In claimStep(): when step.type='loop', find next pending story
    • Mark story as 'running', set step.current_story_id
    • Build dynamic template vars (current_story, completed_stories, stories_remaining, etc.)
    • Read progress.txt from developer workspace and inject as {{progress}}
    • If no pending stories, mark step done and advance
    • Helper: getAgentWorkspacePath(agentId) — reads OpenClaw config to find workspace
    • Helper: formatStoryForTemplate(story) — formats story as readable text block
    • Helper: formatCompletedStories(stories) — formats done stories as summary
    • Test: claim a loop step, verify correct story is returned with resolved template
  • T7: Step operations, loop complete (src/installer/step-ops.ts)

    • In completeStep() for loop steps: mark story done (not step)
    • Save output to story record
    • If verify_each: set verify step to 'pending', loop step stays 'running'
    • If not verify_each: check for more stories, set step pending or done
    • Test: complete a loop step iteration, verify story marked done and step stays pending
  • T8: Step operations, verify-each flow (src/installer/step-ops.ts)

    • In completeStep() for verify step: detect if triggered by verify-each
    • Detection: check if run has a loop step with verifyStep matching this step's step_id and loop step status='running'
    • On verify pass: set loop step to 'pending' (next story), or 'done' if no more stories
    • On verify fail (STATUS: retry): set story back to 'pending', store ISSUES as verify_feedback in context, set loop step to 'pending', increment story retry_count
    • If story retries exhausted: fail story, fail step, fail run
    • Test: full mini-loop — dev completes → verify passes → dev gets next story. Dev completes → verify fails → dev retries same story.
  • T9: Step operations, loop fail (src/installer/step-ops.ts)

    • In failStep() for loop steps: fail current story, not step
    • Per-story retry logic
    • Test: fail a story, verify retry. Exhaust retries, verify run fails.

Phase 2: Agent Files

  • T10: Planner agent files (workflows/feature-dev/agents/planner/)

    • Create AGENTS.md with decomposition instructions (borrow from Ralph's PRD skill for story sizing, ordering, acceptance criteria guidance)
    • Create SOUL.md — analytical, thorough planner personality
    • Create IDENTITY.md — name and role
    • Reference: /tmp/ralph/skills/ralph/SKILL.md for story sizing rules (clone ralph if needed: gh repo clone snarktank/ralph /tmp/ralph)
  • T11: Developer agent AGENTS.md update (workflows/feature-dev/agents/developer/AGENTS.md)

    • Add "Story-Based Execution" section
    • Document progress.txt format and when to write to it
    • Document Codebase Patterns section maintenance
    • Document when to update AGENTS.md (structural knowledge only)
  • T12: Verifier agent AGENTS.md update (workflows/feature-dev/agents/verifier/AGENTS.md)

    • Update for per-story verification model
    • Document what to check per story
    • Document pass/fail output format
  • T13: Workflow YAML (workflows/feature-dev/workflow.yml)

    • Bump to version 4
    • Add planner agent definition
    • Add plan step
    • Change implement step to type: loop with verify_each
    • Update all step input templates
    • Add {{progress}} to verify/test/tester inputs

Phase 3: Infrastructure

  • T14: Cron frequency (src/installer/agent-cron.ts)

    • Change EVERY_MS from 900_000 to 300_000 (5 min)
    • Make configurable: read cron.interval_ms from workflow.yml if present
    • Pass interval to setupAgentCrons()
  • T15: Progress archiving (src/installer/step-ops.ts, or a new file)

    • New function: archiveRunProgress(runId)
    • Called from completeStep() when run completes
    • Finds developer workspace, creates archive/<run-id>/, copies progress.txt, truncates original
    • Needs getAgentWorkspacePath() helper (same as T6)
  • T16: CLI, stories command (src/cli/cli.ts)

    • Add antfarm step stories <run-id> command
    • Pretty-print stories with status, title, retry count
    • Also update antfarm workflow status to show story progress
  • T17: Dashboard, stories view (src/server/dashboard.ts + src/server/index.html)

    • Add /api/runs/:id/stories endpoint
    • Add stories panel to run detail in the HTML
    • Show status, title, acceptance criteria, output
  • T18: SKILL.md update (~/.openclaw/skills/antfarm-workflows/SKILL.md)

    • Document new pipeline
    • Update CLI reference
    • Update example interaction
    • Note 5-min cron cycles

Phase 4: Install & Test

  • T19: Reinstall workflow

    • Run antfarm workflow uninstall feature-dev then antfarm workflow install feature-dev
    • Verify: new planner agent appears in OpenClaw config
    • Verify: cron jobs recreated at 5-min intervals
    • Verify: counter-test still works (backwards compat)
  • T20: Build

    • Run npm run build (or tsc)
    • Fix any type errors
  • T21: End-to-end test

    • Run a real feature-dev workflow with a small task
    • Verify: planner produces stories, developer loops through them, verifier checks each one
    • Verify: progress.txt is created and appended to
    • Verify: archiving works on completion
    • Check dashboard shows stories
  • T22: Commit and push

    • Commit all changes with clear message
    • Push to main

Key Files Reference

For the implementor — here's every file you'll touch and where it is:

| File | Path | What to do |
| --- | --- | --- |
| DB migration | src/db.ts | Add stories table, alter steps table |
| Types | src/installer/types.ts | Add LoopConfig, Story, update WorkflowStep |
| Step operations | src/installer/step-ops.ts | Loop claim/complete/fail, story parsing, verify-each |
| Run creation | src/installer/run.ts | Persist type/loop_config on step insert |
| Workflow spec | src/installer/workflow-spec.ts | Parse type/loop from YAML, validate |
| CLI | src/cli/cli.ts | Add step stories command |
| Agent cron | src/installer/agent-cron.ts | Change to 5min, make configurable |
| Dashboard server | src/server/dashboard.ts | Add stories API endpoint |
| Dashboard HTML | src/server/index.html | Add stories panel |
| Planner AGENTS.md | workflows/feature-dev/agents/planner/AGENTS.md | Create (new file) |
| Planner SOUL.md | workflows/feature-dev/agents/planner/SOUL.md | Create (new file) |
| Planner IDENTITY.md | workflows/feature-dev/agents/planner/IDENTITY.md | Create (new file) |
| Developer AGENTS.md | workflows/feature-dev/agents/developer/AGENTS.md | Add story-based execution section |
| Verifier AGENTS.md | workflows/feature-dev/agents/verifier/AGENTS.md | Update for per-story verification |
| Workflow YAML | workflows/feature-dev/workflow.yml | v4 with planner + loop steps |
| Antfarm skill | (installed at ~/.openclaw/skills/antfarm-workflows/SKILL.md) | Update docs |

STORIES_JSON Parsing Details

The planner outputs stories as a JSON array after STORIES_JSON:. This needs careful parsing because the agent output has KEY: VALUE lines mixed with the JSON.

Parsing algorithm

1. Split output into lines
2. Find the line starting with "STORIES_JSON:"
3. Take everything after "STORIES_JSON:" on that line, plus all subsequent lines
   until we hit a line that matches /^[A-Z_]+:/ (next KEY: line) or end of output
4. Join those lines and JSON.parse()
5. Validate: must be an array, each element must have id, title, description, acceptanceCriteria

Edge cases

  • STORIES_JSON might be on one line (small stories list) or many lines
  • The JSON might contain colons (which look like KEY: VALUE lines) — only break on lines matching ^[A-Z_]+:\s at the start
  • Handle JSON parse failures gracefully — fail the step with a clear error

Validation

  • Max 20 stories (reject if more)
  • Each story must have: id (string), title (string), description (string), acceptanceCriteria (string[])
  • Story IDs should be unique within the run
  • acceptanceCriteria must be non-empty array
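Putting the algorithm, edge cases, and validation rules together, a parser could look like the following sketch. `parseStoriesJson` and `PlannedStory` are hypothetical names; the function throws on malformed input so that completeStep() can fail the step with the error message.

```typescript
type PlannedStory = {
  id: string;
  title: string;
  description: string;
  acceptanceCriteria: string[];
};

const MAX_STORIES = 20;
const KEY_LINE = /^[A-Z_]+:\s/; // a new KEY: VALUE line ends the JSON payload

// Returns null when the output has no STORIES_JSON line; throws on invalid data.
function parseStoriesJson(output: string): PlannedStory[] | null {
  const lines = output.split("\n");
  const start = lines.findIndex((line) => line.startsWith("STORIES_JSON:"));
  if (start === -1) return null;

  // Rest of the marker line, plus following lines up to the next KEY: line.
  const collected = [lines[start].slice("STORIES_JSON:".length)];
  for (const line of lines.slice(start + 1)) {
    if (KEY_LINE.test(line)) break;
    collected.push(line);
  }

  const parsed: unknown = JSON.parse(collected.join("\n")); // throws on bad JSON
  if (!Array.isArray(parsed)) throw new Error("STORIES_JSON must be an array");
  if (parsed.length > MAX_STORIES) {
    throw new Error(`STORIES_JSON has ${parsed.length} stories (max ${MAX_STORIES})`);
  }

  const seen = new Set<string>();
  return parsed.map((item, index) => {
    const { id, title, description, acceptanceCriteria } =
      (item ?? {}) as Record<string, unknown>;
    if (
      typeof id !== "string" ||
      typeof title !== "string" ||
      typeof description !== "string" ||
      !Array.isArray(acceptanceCriteria) ||
      acceptanceCriteria.length === 0 ||
      !acceptanceCriteria.every((c) => typeof c === "string")
    ) {
      throw new Error(`story at index ${index} is missing required fields`);
    }
    if (seen.has(id)) throw new Error(`duplicate story id: ${id}`);
    seen.add(id);
    return { id, title, description, acceptanceCriteria: acceptanceCriteria as string[] };
  });
}
```

Indented JSON lines never match `KEY_LINE`, so colons inside story text do not end the payload early. Note that a bare `KEY:` line with no trailing space or value would not terminate the payload under this pattern.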

Verify-Each State Machine

Detailed state transitions for the implement→verify mini-loop:

INITIAL STATE (after planner completes):
  implement step: pending
  verify step: waiting
  stories: US-001=pending, US-002=pending, US-003=pending

DEVELOPER CLAIMS (claimStep for developer agent):
  implement step: running, current_story_id=US-001
  US-001: running

DEVELOPER COMPLETES (completeStep for implement):
  implement step: running (stays running, waiting for verify)
  verify step: pending
  US-001: done (output saved)
  context: { changes: "...", verify_feedback: "" }

VERIFIER CLAIMS (claimStep for verifier agent):
  verify step: running

VERIFIER PASSES (completeStep for verify, STATUS=done):
  verify step: waiting (reset for next story)
  implement step: pending (ready for next story)
  US-001: done (confirmed)

DEVELOPER CLAIMS NEXT (claimStep for developer agent):
  implement step: running, current_story_id=US-002
  US-002: running

... (repeat until all stories done) ...

LAST STORY VERIFIED:
  verify step: done
  implement step: done
  → advance to test step

--- FAILURE PATH ---

VERIFIER FAILS (completeStep for verify, STATUS=retry):
  verify step: waiting (reset)
  implement step: pending (developer retries)
  US-001: pending (retry_count incremented)
  context: { verify_feedback: "ISSUES: ..." }

Note: the verify step transitions between waiting and pending/running during the loop. After the loop completes, it should be marked done (even though it was never "pending→running→done" in a linear sense). The step ran N times successfully. Mark it done when the loop step completes.


Progress.txt Path Resolution

Helper function needed in step-ops.ts:

import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumes `db` is the DatabaseSync handle already used by step-ops.ts.
function resolveProgressFilePath(runId: string): string | null {
  // 1. Find the loop step for this run
  const loopStep = db.prepare(
    "SELECT agent_id FROM steps WHERE run_id = ? AND type = 'loop' LIMIT 1"
  ).get(runId) as { agent_id: string } | undefined;
  if (!loopStep) return null;

  // 2. Get the agent's workspace path from OpenClaw config
  const workspace = getAgentWorkspacePath(loopStep.agent_id);
  if (!workspace) return null;

  // 3. Return progress.txt path
  return path.join(workspace, "progress.txt");
}

function getAgentWorkspacePath(agentId: string): string | null {
  // Read ~/.openclaw/openclaw.json, find the agent in agents.list by id,
  // and return its workspace path. A missing or unreadable config yields
  // null, so callers fall back to "(no progress file)" instead of throwing.
  const configPath = path.join(os.homedir(), ".openclaw", "openclaw.json");
  try {
    const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
    const agent = config.agents?.list?.find((a: { id?: string }) => a.id === agentId);
    return agent?.workspace ?? null;
  } catch {
    return null;
  }
}

function readProgressFile(runId: string): string {
  const filePath = resolveProgressFilePath(runId);
  if (!filePath) return "(no progress file)";
  try {
    return fs.readFileSync(filePath, "utf-8");
  } catch {
    return "(no progress yet)";
  }
}

This is used by claimStep() to inject {{progress}} into any step's template.