Skip to content

refactor(regen): split into resumable single-step invocations for local models#5650

Open
MarkusNeusinger wants to merge 12 commits intomainfrom
claude/simplify-regen-command-Qnnk1
Open

refactor(regen): split into resumable single-step invocations for local models#5650
MarkusNeusinger wants to merge 12 commits intomainfrom
claude/simplify-regen-command-Qnnk1

Conversation

@MarkusNeusinger
Copy link
Copy Markdown
Owner

Summary

Rewrite /regen as a resumable single-step command so it runs reliably on local models with limited context windows (e.g. Gemma 3 26B at 256k), where the previous parallel-agent flow blew past the budget.

Motivation

The previous /regen:

  • spawned N parallel general-purpose agents (one per library) via experimental agent teams,
  • pulled in the full ~730-line update.md library-agent prompt,
  • aggregated all agent reports + all shipping output in the lead's working context,
  • did everything (read → modify → score → worktree → PR) for all libraries in a single invocation.

That's fine on hosted Sonnet but consistently aborts when pointed at a local Ollama endpoint with smaller context.

New design

/regen is now idempotent and resumable. State lives in .regen-plan.md at repo root. Each invocation does exactly one thing, auto-detected from the state file:

State file Action
Missing Plan mode — pick oldest spec, write checklist, exit
Has unchecked - [ ] items Execute mode — do the next library, tick it off, exit
All - [x] / - [!] Done mode — archive plan to .regen-history/, exit

Per spec, expect ~10 invocations (1 plan + 1 per library + 1 finalize). The lead does the work directly — no agent teams, no parallel agents, no CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS flag.

Resilience

  • Each library invocation ships its own PR (with quality:{N} + ai-approved/quality-poor labels) before exiting, so a crash mid-flight loses at most the in-flight library — everything before is already a merged-or-pending PR.
  • Failed libraries are marked - [!] and skipped; the user can edit them back to - [ ] to retry.
  • cat .regen-plan.md shows progress; rm .regen-plan.md abandons the current plan.

Compatibility

  • No change to the downstream pipeline: PRs still get the same quality:{N} + ai-approved/quality-poor labels, and impl-merge.yml still handles squash-merge, GCS staging→production, and impl:{lib}:done issue labelling.
  • No change to the cloud-bypass property: impl-review.yml and impl-repair.yml are still never dispatched.
  • Model-agnostic — works identically with default Sonnet and any local OpenAI/Anthropic-compatible endpoint.

Files

  • agentic/commands/regen.md — full rewrite (~420 lines, but each invocation only exercises the relevant section).
  • .gitignore — add .regen-plan.md, .regen-preview/, .regen-history/ next to existing .update-preview/ and .worktrees/ entries.

Test Plan

  • Run /regen in a clean working tree → confirm .regen-plan.md is created with the oldest spec and the correct library list, and the command exits without touching any implementation files.
  • Run /regen again → confirm it picks the first unchecked library, ships a PR with quality:{N} + the right approval label, ticks off the box, and exits.
  • Repeat for all libraries → confirm each invocation's working context stays small enough for the local model.
  • After the last library, run /regen once more → confirm .regen-plan.md is moved to .regen-history/{spec_id}-{ts}.md and the user is told to re-run for the next spec.
  • Verify gh run list --workflow=impl-review.yml --limit 5 shows no new runs for the regen'd specs (cloud bypass intact).
  • Force a failure (e.g. revoke gsutil auth) → confirm the library is marked - [!], logged, worktree cleaned up, and subsequent invocations skip it.

https://claude.ai/code/session_01Tuiif6dcyttLrc1KoKCmqR


Generated by Claude Code

Replace the parallel agent-team flow with an idempotent state-file driven
loop so /regen runs reliably on local models with limited context (e.g.
Gemma 3 26B at 256k):

- Mode auto-detected from .regen-plan.md at repo root
  - missing → plan mode (pick oldest spec, write checklist, exit)
  - has unchecked items → execute mode (do one library, tick off, exit)
  - all checked → done mode (archive plan, exit)
- No agent teams, no parallel agents — lead does the work sequentially
- Each invocation processes exactly one library and ships its own PR with
  quality:{N} + ai-approved/quality-poor labels (impl-merge.yml unchanged)
- Restart-safe: kill Claude Code anytime, next /regen resumes from the
  state file

.gitignore: add .regen-plan.md, .regen-preview/, .regen-history/.
Copilot AI review requested due to automatic review settings May 2, 2026 18:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR rewrites the /regen command into a resumable, single-step workflow aimed at keeping local-model context usage low while preserving the existing implementation-PR pipeline.

Changes:

  • Replaced the old parallel-agent /regen flow with a plan/execute/done state machine stored in .regen-plan.md.
  • Documented a per-library regeneration flow that generates, scores, stages, and ships one PR per invocation.
  • Added gitignore entries for regen state/history/preview artifacts.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File Description
agentic/commands/regen.md Rewrites the /regen command docs and operational flow around a resumable state file.
.gitignore Ignores regen-specific local artifacts and history files.

Comment thread agentic/commands/regen.md Outdated

1. Print final summary: counts of done / failed, list of PR URLs from the log.
2. Move `.regen-plan.md` to `.regen-history/{SPEC_ID}-{YYYYMMDD-HHMMSS}.md` (create directory if needed).
3. Tell the user: `Spec {SPEC_ID} complete. Run /regen to start the next oldest spec.`
Comment thread agentic/commands/regen.md Outdated
### 2j. Worktree → commit → push → PR

```bash
WORKTREE=".worktrees/{SPEC_ID}-{LIBRARY}"
…view)

Two related staleness bugs flagged by Copilot on #5650:

1. Plan mode picked the oldest spec from local metadata only. After a
   previous /regen merged its PRs remotely, re-running on the same
   checkout would re-pick the just-completed spec. Add step 1a that
   fetches origin/main and fast-forwards local main if checked out.

2. Step 2j based each new worktree on local `main`, which can lag the
   remote tip after impl-merge.yml lands prior libraries' PRs. Push
   rejection or merge conflicts result when two regen'd libraries touch
   shared spec files. Fetch origin/main and base the worktree on
   origin/main directly.

Renumber subsequent plan-mode steps (1b-1f).
Copilot AI review requested due to automatic review settings May 2, 2026 21:44
…ort workaround, GCS path)

The previous /regen flow produced a single plot.png and metadata with legacy
preview_url/preview_html, neither of which match what impl-generate.yml /
impl-merge.yml actually do. The merge step requires both plot-light.png AND
plot-dark.png and errors out otherwise; metadata uses preview_url_light /
preview_url_dark / preview_html_light / preview_html_dark; and the GCS
staging path is staging/{spec}/python/{lib}/ (note the python/ segment).

Also documents the self-import collision: implementation files share names
with their library (e.g. altair.py), so running them as scripts puts the
script's directory on sys.path and `import altair` resolves to the local
file, not the package. Sidestep by copying the impl into the preview dir
under run_impl.py and running that.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

Comment thread agentic/commands/regen.md Outdated
Comment on lines +58 to +60
If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked
files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local
`main`), so staleness can't poison the PR base either.
Comment thread agentic/commands/regen.md Outdated
Comment on lines +214 to +221
PREVIEW_DIR="plots/{SPEC_ID}/implementations/python/.regen-preview/{LIBRARY}"
mkdir -p "$PREVIEW_DIR"
cp "plots/{SPEC_ID}/implementations/python/{LIBRARY}.py" "$PREVIEW_DIR/run_impl.py"

# Render both themes — required by impl-generate.yml / impl-merge.yml
for THEME in light dark; do
(cd "$PREVIEW_DIR" && MPLBACKEND=Agg ANYPLOT_THEME="$THEME" uv run python run_impl.py)
done
Comment thread agentic/commands/regen.md Outdated
Comment on lines +356 to +365
PR_URL=$(gh pr create \
--title "regen({SPEC_ID}): {LIBRARY}" \
--body "$(cat <<EOF
Locally regenerated **{LIBRARY}** for **{SPEC_ID}**.

Quality: {SCORE}/100
- VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10

Generated by \`/regen\` (local model, no Cloud AI review dispatched).
EOF
MarkusNeusinger added a commit that referenced this pull request May 2, 2026
…0) (#5654)

## Summary

PR #5653 (regen of altair sparkline-basic) merged with `quality_score:
null` and a stale `review` block still referencing the old `#306998`
Python Blue palette and `pyplots.ai` title. The Cloud AI review pipeline
(`impl-review.yml`) is not dispatched on regen PRs (the regen flow sets
`ai-approved` directly), so this metadata never got populated
automatically.

This PR fills it in by locally evaluating both rendered themes against
`prompts/quality-criteria.md`.

## Score: 90/100 — APPROVED

| Category | Score |
|----------|-------|
| Visual Quality | 30/30 |
| Design Excellence | 13/20 |
| Spec Compliance | 15/15 |
| Data Quality | 14/15 |
| Code Quality | 10/10 |
| Library Mastery | 8/10 |

DE/LM ceilings reflect structural limits: spec is \`basic\` (annotations
forbidden, capping DE-03) and Altair's layered composition is idiomatic
but not deeply distinctive (LM-02).

## What's populated

- `quality_score: 90` (was `null`)
- `review.image_description` — fresh description of the regenerated plot
(both themes)
- `review.strengths` / `review.weaknesses` — current state, no longer
references retired colors
- `review.criteria_checklist` — full breakdown across all 6 categories
using the current quality-criteria schema (VQ-01…VQ-07, DE-01…DE-03,
SC-01…SC-04, DQ-01…DQ-03, CQ-01…CQ-05, LM-01…LM-02). Old file used a
stale schema without the Design Excellence category.
- `review.verdict: APPROVED` (review-1 threshold ≥ 90)
- `impl_tags.styling` += `okabe-ito`, `theme-aware-chrome`
- `impl_tags.techniques` += `theme-adaptive`

## Going forward

`/regen` (PR #5650) now requires the local flow to view both light and
dark renders and write `quality_score` + `review` on every iteration —
so this kind of follow-up shouldn't be needed again.

## Test plan

- [ ] CI green (lint, tests, codeql)
- [ ] sync-postgres run after merge picks up new quality_score and
review

Generated by `/regen` follow-up (no Cloud AI review dispatched).
…runs)

Previous regen.md set quality_score to null and kept the prior review
block unchanged. But /regen sets ai-approved directly — impl-review.yml
never runs, so that data never gets filled in. The merged metadata ends
up with stale evaluation data, breaking Postgres sync, the catalog UI,
and the next regen's "previous review" lookup.

Step 2f now requires opening BOTH plot-light.png and plot-dark.png and
producing a full structured evaluation (image_description, strengths,
weaknesses, per-criterion checklist across all 6 quality-criteria
categories, verdict) — same shape as the Cloud AI review.

Step 2g now writes the integer quality_score and full review block into
the metadata YAML.
Old template had `Quality: /100` literally — the score never made it into
the impl header. Step 2h now requires {SCORE} and explicitly notes it must
match the metadata's quality_score.
MarkusNeusinger added a commit that referenced this pull request May 2, 2026
## Summary

Follow-up to #5653 / #5654. The `/regen` flow blanked the `Quality:`
line in the docstring header — it now reads `Quality: /100`, but should
match the metadata's `quality_score: 90` populated in #5654.

```diff
-Quality: /100 | Updated: 2026-05-02
+Quality: 90/100 | Updated: 2026-05-02
```

The `regen.md` template (PR #5650) is being updated in parallel to
require `{SCORE}/100`, not blank, so future regens won't drop the score.

## Test plan

- [ ] CI green
- [ ] header matches metadata `quality_score: 90`
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

Comment thread agentic/commands/regen.md Outdated
Comment on lines +58 to +60
If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked
files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local
`main`), so staleness can't poison the PR base either.
Comment thread agentic/commands/regen.md Outdated
Comment on lines +395 to +405
PR_URL=$(gh pr create \
--title "regen({SPEC_ID}): {LIBRARY}" \
--body "$(cat <<EOF
Locally regenerated **{LIBRARY}** for **{SPEC_ID}**.

Quality: {SCORE}/100
- VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10

Generated by \`/regen\` (local model, no Cloud AI review dispatched).
EOF
)")
…p, parent issue marker

Three real concerns from Copilot on PR #5650:

1. Picker read working tree, so on a feature branch behind main it could
   reselect a spec already regenerated upstream. Replaced Path('plots')
   walk with `git ls-tree`/`git show origin/main:...` so the selection is
   always against the current main, regardless of checked-out branch.

2. PREVIEW_DIR was reused across retries. If a prior render produced
   plot-light.png and the current code only emits plot-dark.png, the
   existence checks would still pass with stale art. Now `rm -rf`'d
   before each fresh render.

3. PR body lacked `**Parent Issue:** #N`, which impl-merge.yml parses to
   comment on the spec issue and add impl:{lib}:done. Now read from
   plots/{SPEC_ID}/specification.yaml and inject the marker when present.

Two earlier comments (re: Done mode + worktree main refresh) were already
addressed by the prior `git fetch origin main` calls — skipped.
Copilot AI review requested due to automatic review settings May 2, 2026 22:29
…n package

The regen.md slash command had grown to ~570 lines of inline shell + Python:
a 60-line picker, 25+ lines of YAML metadata structure, ~70-line worktree+PR
orchestration. Each invocation re-derived this code from markdown — exactly
where AI-driven typos crept in (wrong field names, mismatched score totals,
drifted bash quoting).

Now mirrors the core/images.py pattern: a module-as-CLI dispatcher with
named, testable functions for every clear step. regen.md shrinks to ~370
lines, each step a one-line `uv run python -m agentic.workflows.modules.regen
<subcommand>` call.

New package agentic/workflows/modules/regen/:
  __init__.py     — PlanSpec + QualityEval dataclasses (JSON-serializable)
  __main__.py     — CLI dispatcher
  picker.py       — pick_oldest() / validate_spec() reading origin/main via
                    git ls-tree + git show (independent of checked-out branch)
  plan.py         — Plan I/O: write/parse/next-unchecked/mark-done/mark-failed
                    /archive + worktree+preview cleanup
  render.py       — copy impl→run_impl.py (sidesteps the self-import collision),
                    run both ANYPLOT_THEME=light and =dark, assert both PNGs
  metadata.py     — build_metadata_dict + write_metadata + update_impl_header.
                    Theme-aware preview URLs under /python/ segment, full
                    review.criteria_checklist, preserves created/issue/impl_tags
                    on regenerations
  staging.py      — invokes core.images process+responsive for both themes,
                    bundle-uploads to gs://anyplot-images/staging/{spec}/python/{lib}/
  pr_create.py    — worktree from origin/main → commit → push → gh pr create
                    with **Parent Issue:** marker → label via REST → cleanup

User-facing change: /regen now accepts an optional spec id.
  /regen              # picks the oldest spec
  /regen <spec-id>    # uses the given spec (must exist on origin/main)

29 unit tests cover all modules end-to-end. Full suite: 1368 passed.

Out of scope (tracked separately): lifting metadata.py + staging.py to core/
so impl-generate.yml can reuse them too.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.

# ---------------------------------------------------------------------------


def create_regen_pr(
shutil.rmtree(preview)
preview.mkdir(parents=True)

shutil.copyfile(impl, preview / "run_impl.py")
Comment on lines +106 to +114
branch = f"implementation/{spec_id}/{library}"
worktree = worktrees_root / f"{spec_id}-{library}"

# Fetch and (re-)create the worktree from current origin/main
subprocess.run(["git", "fetch", "origin", "main"], check=True)
if worktree.exists():
subprocess.run(["git", "worktree", "remove", str(worktree), "--force"], check=False)
subprocess.run(["git", "worktree", "prune"], check=False)
subprocess.run(["git", "worktree", "add", "-b", branch, str(worktree), "origin/main"], check=True)
subprocess.run(["gsutil", "acl", "ch", "-u", "AllUsers:R", f"{staging_path}/plot-{theme}.html"], check=False)


def stage_images_to_gcs(spec_id: str, library: str, plots_root: Path = Path("plots")) -> str:
@codecov
Copy link
Copy Markdown

codecov Bot commented May 2, 2026

…tries, untested orchestration

Three real issues from Copilot's review of the extraction:

1. render.py was copying the impl into .regen-preview/{lib}/run_impl.py
   to sidestep the self-import collision. But that changes __file__,
   breaking highcharts/pygal impls that use Path(__file__).parents[...]
   to locate node_modules and other relative assets. Fix: invoke
   `uv run python -P <impl>` instead — Python ≥3.11's `-P` skips
   prepending the script's directory to sys.path, so `import altair`
   resolves to the installed package without copying the file. __file__
   stays at the original impl path.

2. pr_create.py created branch implementation/{spec}/{lib} via
   `git worktree add -b` but never deleted the local branch on cleanup.
   A second regen attempt for the same library failed with
   "branch already exists". Fix: `git branch -D` both before the
   worktree add (in case a stale branch survives from a crashed run) and
   in the finally cleanup. The branch lives on the remote after push.

3. pr_create.py and staging.py had no test coverage despite shelling out
   to git, gh, gsutil, and core.images. Added 15 mocked subprocess tests
   covering: orchestration sequence, cleanup-on-failure path, REST-based
   label add, theme-bundle GCS path math, html upload for interactive
   libs, and missing-render error paths.

Plus regression tests in test_render.py:
- __file__ stays at the original impl path (the highcharts/pygal bug)
- import altair in altair.py resolves to the package, not the local file

44/44 regen tests pass. Lint clean.
Copilot AI review requested due to automatic review settings May 2, 2026 22:47
…ion test

CI's unit test job doesn't install the [plotting] extra, so `import altair`
raised ModuleNotFoundError and the test failed even though the bug-under-test
(library shadowing) was correctly fixed. Stdlib `json` is always available
and the shadowing behavior is identical: an impl named json.py would shadow
json without -P just like altair.py shadows altair.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 2 comments.

Comment on lines +61 to +68
for path in meta_paths:
try:
data = yaml.safe_load(_git_show(path, ref)) or {}
except (subprocess.CalledProcessError, yaml.YAMLError):
continue
ts = data.get("updated") or data.get("created")
if ts and (latest is None or str(ts) > str(latest)):
latest = str(ts)
Comment on lines +121 to +127
# Copy the regenerated files into the worktree (impl + metadata at minimum)
for rel in (
f"plots/{spec_id}/implementations/python/{library}.py",
f"plots/{spec_id}/metadata/python/{library}.yaml",
):
src = Path(rel)
dst = worktree / rel
Copilot review on #5650: `_latest_timestamp` was using lexicographic
string comparison, which can flip ordering when the same instant is
written as `...Z` vs `...+00:00` (both formats appear under
`plots/**/metadata`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 23:14
MarkusNeusinger and others added 2 commits May 3, 2026 01:15
- Emphasize that each library implementation is generated in isolation
- Prohibit reading sibling-library implementations for consistency
- Outline allowed and forbidden inputs for library implementations
- Encourage unique interpretations and choices within the spec's framework
Mirrors the rule just added to prompts/plot-generator.md and the
impl-generate / impl-repair workflow prompts. Each library's regen
invocation must read only its own files; cross-reading another
library's .py or .yaml leads to cloned data scenarios and identical
charts across the catalog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.

Comment on lines +121 to +125
# Copy the regenerated files into the worktree (impl + metadata at minimum)
for rel in (
f"plots/{spec_id}/implementations/python/{library}.py",
f"plots/{spec_id}/metadata/python/{library}.yaml",
):
Comment on lines +16 to +39
def test_list_spec_dirs_returns_known_specs():
specs = _list_spec_dirs()
# Specs known to be on main at the time of writing this test
assert "scatter-basic" in specs
assert "sparkline-basic" in specs
# No path separators or hidden entries
assert all("/" not in s for s in specs)
assert all(not s.startswith(".") for s in specs)


def test_list_metadata_files_for_known_spec():
files = _list_metadata_files("sparkline-basic")
assert any(p.endswith("altair.yaml") for p in files)


def test_pick_oldest_returns_real_spec():
spec_id, latest = pick_oldest()
assert spec_id # non-empty
assert latest # non-empty ISO


def test_validate_spec_known_id():
spec_id, latest = validate_spec("sparkline-basic")
assert spec_id == "sparkline-basic"
Comment on lines +96 to +99
_id, latest = validate_spec(spec_id)
title = spec_title(spec_id)
libs = list_libraries(spec_id)
if not libs:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants