refactor(regen): split into resumable single-step invocations for local models by MarkusNeusinger · Pull Request #5650 · MarkusNeusinger/anyplot

MarkusNeusinger · 2026-05-02T18:08:05Z

Summary

Rewrite /regen as a resumable single-step command so it runs reliably on local models with limited context windows (e.g. Gemma 3 26B at 256k), where the previous parallel-agent flow blew past the budget.

Motivation

The previous /regen:

spawned N parallel general-purpose agents (one per library) via experimental agent teams,
pulled in the full ~730-line update.md library-agent prompt,
aggregated all agent reports + all shipping output in the lead's working context,
did everything (read → modify → score → worktree → PR) for all libraries in a single invocation.

That's fine on hosted Sonnet but consistently aborts when pointed at a local Ollama endpoint with smaller context.

New design

/regen is now idempotent and resumable. State lives in .regen-plan.md at repo root. Each invocation does exactly one thing, auto-detected from the state file:

State file	Action
Missing	Plan mode — pick oldest spec, write checklist, exit
Has unchecked `- [ ]` items	Execute mode — do the next library, tick it off, exit
All `- [x]` / `- [!]`	Done mode — archive plan to `.regen-history/`, exit

Per spec, expect ~10 invocations (1 plan + 1 per library + 1 finalize). The lead does the work directly — no agent teams, no parallel agents, no CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS flag.

Resilience

Each library invocation ships its own PR (with quality:{N} + ai-approved/quality-poor labels) before exiting, so a crash mid-flight loses at most the in-flight library — everything before is already a merged-or-pending PR.
Failed libraries are marked - [!] and skipped; the user can edit them back to - [ ] to retry.
cat .regen-plan.md shows progress; rm .regen-plan.md abandons the current plan.

Compatibility

No change to the downstream pipeline: PRs still get the same quality:{N} + ai-approved/quality-poor labels, and impl-merge.yml still handles squash-merge, GCS staging→production, and impl:{lib}:done issue labelling.
No change to the cloud-bypass property: impl-review.yml and impl-repair.yml are still never dispatched.
Model-agnostic — works identically with default Sonnet and any local OpenAI/Anthropic-compatible endpoint.

Files

agentic/commands/regen.md — full rewrite (~420 lines, but each invocation only exercises the relevant section).
.gitignore — add .regen-plan.md, .regen-preview/, .regen-history/ next to existing .update-preview/ and .worktrees/ entries.

Test Plan

Run /regen in a clean working tree → confirm .regen-plan.md is created with the oldest spec and the correct library list, and the command exits without touching any implementation files.
Run /regen again → confirm it picks the first unchecked library, ships a PR with quality:{N} + the right approval label, ticks off the box, and exits.
Repeat for all libraries → confirm each invocation's working context stays small enough for the local model.
After the last library, run /regen once more → confirm .regen-plan.md is moved to .regen-history/{spec_id}-{ts}.md and the user is told to re-run for the next spec.
Verify gh run list --workflow=impl-review.yml --limit 5 shows no new runs for the regen'd specs (cloud bypass intact).
Force a failure (e.g. revoke gsutil auth) → confirm the library is marked - [!], logged, worktree cleaned up, and subsequent invocations skip it.

https://claude.ai/code/session_01Tuiif6dcyttLrc1KoKCmqR

Generated by Claude Code

Replace the parallel agent-team flow with an idempotent state-file driven loop so /regen runs reliably on local models with limited context (e.g. Gemma 3 26B at 256k): - Mode auto-detected from .regen-plan.md at repo root - missing → plan mode (pick oldest spec, write checklist, exit) - has unchecked items → execute mode (do one library, tick off, exit) - all checked → done mode (archive plan, exit) - No agent teams, no parallel agents — lead does the work sequentially - Each invocation processes exactly one library and ships its own PR with quality:{N} + ai-approved/quality-poor labels (impl-merge.yml unchanged) - Restart-safe: kill Claude Code anytime, next /regen resumes from the state file .gitignore: add .regen-plan.md, .regen-preview/, .regen-history/.

Copilot

Pull request overview

This PR rewrites the /regen command into a resumable, single-step workflow aimed at keeping local-model context usage low while preserving the existing implementation-PR pipeline.

Changes:

Replaced the old parallel-agent /regen flow with a plan/execute/done state machine stored in .regen-plan.md.
Documented a per-library regeneration flow that generates, scores, stages, and ships one PR per invocation.
Added gitignore entries for regen state/history/preview artifacts.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`agentic/commands/regen.md`	Rewrites the `/regen` command docs and operational flow around a resumable state file.
`.gitignore`	Ignores regen-specific local artifacts and history files.

+
+1. Print final summary: counts of done / failed, list of PR URLs from the log.
+2. Move `.regen-plan.md` to `.regen-history/{SPEC_ID}-{YYYYMMDD-HHMMSS}.md` (create directory if needed).
+3. Tell the user: `Spec {SPEC_ID} complete. Run /regen to start the next oldest spec.`


+### 2j. Worktree → commit → push → PR
+
+```bash
+WORKTREE=".worktrees/{SPEC_ID}-{LIBRARY}"


…view) Two related staleness bugs flagged by Copilot on #5650: 1. Plan mode picked the oldest spec from local metadata only. After a previous /regen merged its PRs remotely, re-running on the same checkout would re-pick the just-completed spec. Add step 1a that fetches origin/main and fast-forwards local main if checked out. 2. Step 2j based each new worktree on local `main`, which can lag the remote tip after impl-merge.yml lands prior libraries' PRs. Push rejection or merge conflicts result when two regen'd libraries touch shared spec files. Fetch origin/main and base the worktree on origin/main directly. Renumber subsequent plan-mode steps (1b-1f).

…ort workaround, GCS path) The previous /regen flow produced a single plot.png and metadata with legacy preview_url/preview_html, neither of which match what impl-generate.yml / impl-merge.yml actually do. The merge step requires both plot-light.png AND plot-dark.png and errors out otherwise; metadata uses preview_url_light / preview_url_dark / preview_html_light / preview_html_dark; and the GCS staging path is staging/{spec}/python/{lib}/ (note the python/ segment). Also documents the self-import collision: implementation files share names with their library (e.g. altair.py), so running them as scripts puts the script's directory on sys.path and `import altair` resolves to the local file, not the package. Sidestep by copying the impl into the preview dir under run_impl.py and running that.

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

+If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked
+files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local
+`main`), so staleness can't poison the PR base either.


+PREVIEW_DIR="plots/{SPEC_ID}/implementations/python/.regen-preview/{LIBRARY}"
+mkdir -p "$PREVIEW_DIR"
+cp "plots/{SPEC_ID}/implementations/python/{LIBRARY}.py" "$PREVIEW_DIR/run_impl.py"
+
+# Render both themes — required by impl-generate.yml / impl-merge.yml
+for THEME in light dark; do
+  (cd "$PREVIEW_DIR" && MPLBACKEND=Agg ANYPLOT_THEME="$THEME" uv run python run_impl.py)
+done


+PR_URL=$(gh pr create \
+  --title "regen({SPEC_ID}): {LIBRARY}" \
+  --body "$(cat <<EOF
+Locally regenerated **{LIBRARY}** for **{SPEC_ID}**.
+
+Quality: {SCORE}/100
+- VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10
+
+Generated by \`/regen\` (local model, no Cloud AI review dispatched).
+EOF


…0) (#5654) ## Summary PR #5653 (regen of altair sparkline-basic) merged with `quality_score: null` and a stale `review` block still referencing the old `#306998` Python Blue palette and `pyplots.ai` title. The Cloud AI review pipeline (`impl-review.yml`) is not dispatched on regen PRs (the regen flow sets `ai-approved` directly), so this metadata never got populated automatically. This PR fills it in by locally evaluating both rendered themes against `prompts/quality-criteria.md`. ## Score: 90/100 — APPROVED | Category | Score | |----------|-------| | Visual Quality | 30/30 | | Design Excellence | 13/20 | | Spec Compliance | 15/15 | | Data Quality | 14/15 | | Code Quality | 10/10 | | Library Mastery | 8/10 | DE/LM ceilings reflect structural limits: spec is \`basic\` (annotations forbidden, capping DE-03) and Altair's layered composition is idiomatic but not deeply distinctive (LM-02). ## What's populated - `quality_score: 90` (was `null`) - `review.image_description` — fresh description of the regenerated plot (both themes) - `review.strengths` / `review.weaknesses` — current state, no longer references retired colors - `review.criteria_checklist` — full breakdown across all 6 categories using the current quality-criteria schema (VQ-01…VQ-07, DE-01…DE-03, SC-01…SC-04, DQ-01…DQ-03, CQ-01…CQ-05, LM-01…LM-02). Old file used a stale schema without the Design Excellence category. - `review.verdict: APPROVED` (review-1 threshold ≥ 90) - `impl_tags.styling` += `okabe-ito`, `theme-aware-chrome` - `impl_tags.techniques` += `theme-adaptive` ## Going forward `/regen` (PR #5650) now requires the local flow to view both light and dark renders and write `quality_score` + `review` on every iteration — so this kind of follow-up shouldn't be needed again. ## Test plan - [ ] CI green (lint, tests, codeql) - [ ] sync-postgres run after merge picks up new quality_score and review Generated by `/regen` follow-up (no Cloud AI review dispatched).

…runs) Previous regen.md set quality_score to null and kept the prior review block unchanged. But /regen sets ai-approved directly — impl-review.yml never runs, so that data never gets filled in. The merged metadata ends up with stale evaluation data, breaking Postgres sync, the catalog UI, and the next regen's "previous review" lookup. Step 2f now requires opening BOTH plot-light.png and plot-dark.png and producing a full structured evaluation (image_description, strengths, weaknesses, per-criterion checklist across all 6 quality-criteria categories, verdict) — same shape as the Cloud AI review. Step 2g now writes the integer quality_score and full review block into the metadata YAML.

Old template had `Quality: /100` literally — the score never made it into the impl header. Step 2h now requires {SCORE} and explicitly notes it must match the metadata's quality_score.

## Summary Follow-up to #5653 / #5654. The `/regen` flow blanked the `Quality:` line in the docstring header — it now reads `Quality: /100`, but should match the metadata's `quality_score: 90` populated in #5654. ```diff -Quality: /100 | Updated: 2026-05-02 +Quality: 90/100 | Updated: 2026-05-02 ``` The `regen.md` template (PR #5650) is being updated in parallel to require `{SCORE}/100`, not blank, so future regens won't drop the score. ## Test plan - [ ] CI green - [ ] header matches metadata `quality_score: 90`

Copilot

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

+If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked
+files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local
+`main`), so staleness can't poison the PR base either.


+PR_URL=$(gh pr create \
+  --title "regen({SPEC_ID}): {LIBRARY}" \
+  --body "$(cat <<EOF
+Locally regenerated **{LIBRARY}** for **{SPEC_ID}**.
+
+Quality: {SCORE}/100
+- VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10
+
+Generated by \`/regen\` (local model, no Cloud AI review dispatched).
+EOF
+)")


…p, parent issue marker Three real concerns from Copilot on PR #5650: 1. Picker read working tree, so on a feature branch behind main it could reselect a spec already regenerated upstream. Replaced Path('plots') walk with `git ls-tree`/`git show origin/main:...` so the selection is always against the current main, regardless of checked-out branch. 2. PREVIEW_DIR was reused across retries. If a prior render produced plot-light.png and the current code only emits plot-dark.png, the existence checks would still pass with stale art. Now `rm -rf`'d before each fresh render. 3. PR body lacked `**Parent Issue:** #N`, which impl-merge.yml parses to comment on the spec issue and add impl:{lib}:done. Now read from plots/{SPEC_ID}/specification.yaml and inject the marker when present. Two earlier comments (re: Done mode + worktree main refresh) were already addressed by the prior `git fetch origin main` calls — skipped.

…n package The regen.md slash command had grown to ~570 lines of inline shell + Python: a 60-line picker, 25+ lines of YAML metadata structure, ~70-line worktree+PR orchestration. Each invocation re-derived this code from markdown — exactly where AI-driven typos crept in (wrong field names, mismatched score totals, drifted bash quoting). Now mirrors the core/images.py pattern: a module-as-CLI dispatcher with named, testable functions for every clear step. regen.md shrinks to ~370 lines, each step a one-line `uv run python -m agentic.workflows.modules.regen <subcommand>` call. New package agentic/workflows/modules/regen/: __init__.py — PlanSpec + QualityEval dataclasses (JSON-serializable) __main__.py — CLI dispatcher picker.py — pick_oldest() / validate_spec() reading origin/main via git ls-tree + git show (independent of checked-out branch) plan.py — Plan I/O: write/parse/next-unchecked/mark-done/mark-failed /archive + worktree+preview cleanup render.py — copy impl→run_impl.py (sidesteps the self-import collision), run both ANYPLOT_THEME=light and =dark, assert both PNGs metadata.py — build_metadata_dict + write_metadata + update_impl_header. Theme-aware preview URLs under /python/ segment, full review.criteria_checklist, preserves created/issue/impl_tags on regenerations staging.py — invokes core.images process+responsive for both themes, bundle-uploads to gs://anyplot-images/staging/{spec}/python/{lib}/ pr_create.py — worktree from origin/main → commit → push → gh pr create with **Parent Issue:** marker → label via REST → cleanup User-facing change: /regen now accepts an optional spec id. /regen # picks the oldest spec /regen <spec-id> # uses the given spec (must exist on origin/main) 29 unit tests cover all modules end-to-end. Full suite: 1368 passed. Out of scope (tracked separately): lifting metadata.py + staging.py to core/ so impl-generate.yml can reuse them too.

Copilot

Pull request overview

Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.

+# ---------------------------------------------------------------------------
+
+
+def create_regen_pr(


+        shutil.rmtree(preview)
+    preview.mkdir(parents=True)
+
+    shutil.copyfile(impl, preview / "run_impl.py")


+    branch = f"implementation/{spec_id}/{library}"
+    worktree = worktrees_root / f"{spec_id}-{library}"
+
+    # Fetch and (re-)create the worktree from current origin/main
+    subprocess.run(["git", "fetch", "origin", "main"], check=True)
+    if worktree.exists():
+        subprocess.run(["git", "worktree", "remove", str(worktree), "--force"], check=False)
+        subprocess.run(["git", "worktree", "prune"], check=False)
+    subprocess.run(["git", "worktree", "add", "-b", branch, str(worktree), "origin/main"], check=True)


+        subprocess.run(["gsutil", "acl", "ch", "-u", "AllUsers:R", f"{staging_path}/plot-{theme}.html"], check=False)
+
+
+def stage_images_to_gcs(spec_id: str, library: str, plots_root: Path = Path("plots")) -> str:


codecov · 2026-05-02T22:34:31Z

Codecov Report

❌ Patch coverage is 70.91222% with 169 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
agentic/workflows/modules/regen/__main__.py	0.00%	136 Missing ⚠️
agentic/workflows/modules/regen/picker.py	82.89%	13 Missing ⚠️
agentic/workflows/modules/regen/plan.py	89.81%	11 Missing ⚠️
agentic/workflows/modules/regen/metadata.py	94.11%	4 Missing ⚠️
agentic/workflows/modules/regen/pr_create.py	97.18%	2 Missing ⚠️
agentic/workflows/modules/regen/staging.py	95.55%	2 Missing ⚠️
agentic/workflows/modules/regen/render.py	97.91%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

…tries, untested orchestration Three real issues from Copilot's review of the extraction: 1. render.py was copying the impl into .regen-preview/{lib}/run_impl.py to sidestep the self-import collision. But that changes __file__, breaking highcharts/pygal impls that use Path(__file__).parents[...] to locate node_modules and other relative assets. Fix: invoke `uv run python -P <impl>` instead — Python ≥3.11's `-P` skips prepending the script's directory to sys.path, so `import altair` resolves to the installed package without copying the file. __file__ stays at the original impl path. 2. pr_create.py created branch implementation/{spec}/{lib} via `git worktree add -b` but never deleted the local branch on cleanup. A second regen attempt for the same library failed with "branch already exists". Fix: `git branch -D` both before the worktree add (in case a stale branch survives from a crashed run) and in the finally cleanup. The branch lives on the remote after push. 3. pr_create.py and staging.py had no test coverage despite shelling out to git, gh, gsutil, and core.images. Added 15 mocked subprocess tests covering: orchestration sequence, cleanup-on-failure path, REST-based label add, theme-bundle GCS path math, html upload for interactive libs, and missing-render error paths. Plus regression tests in test_render.py: - __file__ stays at the original impl path (the highcharts/pygal bug) - import altair in altair.py resolves to the package, not the local file 44/44 regen tests pass. Lint clean.

…ion test CI's unit test job doesn't install the [plotting] extra, so `import altair` raised ModuleNotFoundError and the test failed even though the bug-under-test (library shadowing) was correctly fixed. Stdlib `json` is always available and the shadowing behavior is identical: an impl named json.py would shadow json without -P just like altair.py shadows altair.

Copilot

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 2 comments.

+    for path in meta_paths:
+        try:
+            data = yaml.safe_load(_git_show(path, ref)) or {}
+        except (subprocess.CalledProcessError, yaml.YAMLError):
+            continue
+        ts = data.get("updated") or data.get("created")
+        if ts and (latest is None or str(ts) > str(latest)):
+            latest = str(ts)


+        # Copy the regenerated files into the worktree (impl + metadata at minimum)
+        for rel in (
+            f"plots/{spec_id}/implementations/python/{library}.py",
+            f"plots/{spec_id}/metadata/python/{library}.yaml",
+        ):
+            src = Path(rel)
+            dst = worktree / rel


Copilot review on #5650: `_latest_timestamp` was using lexicographic string comparison, which can flip ordering when the same instant is written as `...Z` vs `...+00:00` (both formats appear under `plots/**/metadata`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Emphasize that each library implementation is generated in isolation - Prohibit reading sibling-library implementations for consistency - Outline allowed and forbidden inputs for library implementations - Encourage unique interpretations and choices within the spec's framework

Mirrors the rule just added to prompts/plot-generator.md and the impl-generate / impl-repair workflow prompts. Each library's regen invocation must read only its own files; cross-reading another library's .py or .yaml leads to cloned data scenarios and identical charts across the catalog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 3 comments.

+        # Copy the regenerated files into the worktree (impl + metadata at minimum)
+        for rel in (
+            f"plots/{spec_id}/implementations/python/{library}.py",
+            f"plots/{spec_id}/metadata/python/{library}.yaml",
+        ):


+def test_list_spec_dirs_returns_known_specs():
+    specs = _list_spec_dirs()
+    # Specs known to be on main at the time of writing this test
+    assert "scatter-basic" in specs
+    assert "sparkline-basic" in specs
+    # No path separators or hidden entries
+    assert all("/" not in s for s in specs)
+    assert all(not s.startswith(".") for s in specs)
+
+
+def test_list_metadata_files_for_known_spec():
+    files = _list_metadata_files("sparkline-basic")
+    assert any(p.endswith("altair.yaml") for p in files)
+
+
+def test_pick_oldest_returns_real_spec():
+    spec_id, latest = pick_oldest()
+    assert spec_id  # non-empty
+    assert latest  # non-empty ISO
+
+
+def test_validate_spec_known_id():
+    spec_id, latest = validate_spec("sparkline-basic")
+    assert spec_id == "sparkline-basic"


+        _id, latest = validate_spec(spec_id)
+        title = spec_title(spec_id)
+        libs = list_libraries(spec_id)
+        if not libs:


Copilot AI review requested due to automatic review settings May 2, 2026 18:08

Copilot started reviewing on behalf of MarkusNeusinger May 2, 2026 18:08 View session