refactor(regen): split into resumable single-step invocations for local models#5650
Open
MarkusNeusinger wants to merge 12 commits intomainfrom
Open
refactor(regen): split into resumable single-step invocations for local models#5650MarkusNeusinger wants to merge 12 commits intomainfrom
MarkusNeusinger wants to merge 12 commits intomainfrom
Conversation
Replace the parallel agent-team flow with an idempotent state-file driven
loop so /regen runs reliably on local models with limited context (e.g.
Gemma 3 26B at 256k):
- Mode auto-detected from .regen-plan.md at repo root
- missing → plan mode (pick oldest spec, write checklist, exit)
- has unchecked items → execute mode (do one library, tick off, exit)
- all checked → done mode (archive plan, exit)
- No agent teams, no parallel agents — lead does the work sequentially
- Each invocation processes exactly one library and ships its own PR with
quality:{N} + ai-approved/quality-poor labels (impl-merge.yml unchanged)
- Restart-safe: kill Claude Code anytime, next /regen resumes from the
state file
.gitignore: add .regen-plan.md, .regen-preview/, .regen-history/.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR rewrites the /regen command into a resumable, single-step workflow aimed at keeping local-model context usage low while preserving the existing implementation-PR pipeline.
Changes:
- Replaced the old parallel-agent
/regenflow with a plan/execute/done state machine stored in.regen-plan.md. - Documented a per-library regeneration flow that generates, scores, stages, and ships one PR per invocation.
- Added gitignore entries for regen state/history/preview artifacts.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
agentic/commands/regen.md |
Rewrites the /regen command docs and operational flow around a resumable state file. |
.gitignore |
Ignores regen-specific local artifacts and history files. |
|
|
||
| 1. Print final summary: counts of done / failed, list of PR URLs from the log. | ||
| 2. Move `.regen-plan.md` to `.regen-history/{SPEC_ID}-{YYYYMMDD-HHMMSS}.md` (create directory if needed). | ||
| 3. Tell the user: `Spec {SPEC_ID} complete. Run /regen to start the next oldest spec.` |
| ### 2j. Worktree → commit → push → PR | ||
|
|
||
| ```bash | ||
| WORKTREE=".worktrees/{SPEC_ID}-{LIBRARY}" |
…view) Two related staleness bugs flagged by Copilot on #5650: 1. Plan mode picked the oldest spec from local metadata only. After a previous /regen merged its PRs remotely, re-running on the same checkout would re-pick the just-completed spec. Add step 1a that fetches origin/main and fast-forwards local main if checked out. 2. Step 2j based each new worktree on local `main`, which can lag the remote tip after impl-merge.yml lands prior libraries' PRs. Push rejection or merge conflicts result when two regen'd libraries touch shared spec files. Fetch origin/main and base the worktree on origin/main directly. Renumber subsequent plan-mode steps (1b-1f).
…ort workaround, GCS path)
The previous /regen flow produced a single plot.png and metadata with legacy
preview_url/preview_html, neither of which match what impl-generate.yml /
impl-merge.yml actually do. The merge step requires both plot-light.png AND
plot-dark.png and errors out otherwise; metadata uses preview_url_light /
preview_url_dark / preview_html_light / preview_html_dark; and the GCS
staging path is staging/{spec}/python/{lib}/ (note the python/ segment).
Also documents the self-import collision: implementation files share names
with their library (e.g. altair.py), so running them as scripts puts the
script's directory on sys.path and `import altair` resolves to the local
file, not the package. Sidestep by copying the impl into the preview dir
under run_impl.py and running that.
Comment on lines
+58
to
+60
| If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked | ||
| files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local | ||
| `main`), so staleness can't poison the PR base either. |
Comment on lines
+214
to
+221
| PREVIEW_DIR="plots/{SPEC_ID}/implementations/python/.regen-preview/{LIBRARY}" | ||
| mkdir -p "$PREVIEW_DIR" | ||
| cp "plots/{SPEC_ID}/implementations/python/{LIBRARY}.py" "$PREVIEW_DIR/run_impl.py" | ||
|
|
||
| # Render both themes — required by impl-generate.yml / impl-merge.yml | ||
| for THEME in light dark; do | ||
| (cd "$PREVIEW_DIR" && MPLBACKEND=Agg ANYPLOT_THEME="$THEME" uv run python run_impl.py) | ||
| done |
Comment on lines
+356
to
+365
| PR_URL=$(gh pr create \ | ||
| --title "regen({SPEC_ID}): {LIBRARY}" \ | ||
| --body "$(cat <<EOF | ||
| Locally regenerated **{LIBRARY}** for **{SPEC_ID}**. | ||
|
|
||
| Quality: {SCORE}/100 | ||
| - VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10 | ||
|
|
||
| Generated by \`/regen\` (local model, no Cloud AI review dispatched). | ||
| EOF |
2 tasks
MarkusNeusinger
added a commit
that referenced
this pull request
May 2, 2026
…0) (#5654) ## Summary PR #5653 (regen of altair sparkline-basic) merged with `quality_score: null` and a stale `review` block still referencing the old `#306998` Python Blue palette and `pyplots.ai` title. The Cloud AI review pipeline (`impl-review.yml`) is not dispatched on regen PRs (the regen flow sets `ai-approved` directly), so this metadata never got populated automatically. This PR fills it in by locally evaluating both rendered themes against `prompts/quality-criteria.md`. ## Score: 90/100 — APPROVED | Category | Score | |----------|-------| | Visual Quality | 30/30 | | Design Excellence | 13/20 | | Spec Compliance | 15/15 | | Data Quality | 14/15 | | Code Quality | 10/10 | | Library Mastery | 8/10 | DE/LM ceilings reflect structural limits: spec is \`basic\` (annotations forbidden, capping DE-03) and Altair's layered composition is idiomatic but not deeply distinctive (LM-02). ## What's populated - `quality_score: 90` (was `null`) - `review.image_description` — fresh description of the regenerated plot (both themes) - `review.strengths` / `review.weaknesses` — current state, no longer references retired colors - `review.criteria_checklist` — full breakdown across all 6 categories using the current quality-criteria schema (VQ-01…VQ-07, DE-01…DE-03, SC-01…SC-04, DQ-01…DQ-03, CQ-01…CQ-05, LM-01…LM-02). Old file used a stale schema without the Design Excellence category. - `review.verdict: APPROVED` (review-1 threshold ≥ 90) - `impl_tags.styling` += `okabe-ito`, `theme-aware-chrome` - `impl_tags.techniques` += `theme-adaptive` ## Going forward `/regen` (PR #5650) now requires the local flow to view both light and dark renders and write `quality_score` + `review` on every iteration — so this kind of follow-up shouldn't be needed again. ## Test plan - [ ] CI green (lint, tests, codeql) - [ ] sync-postgres run after merge picks up new quality_score and review Generated by `/regen` follow-up (no Cloud AI review dispatched).
…runs) Previous regen.md set quality_score to null and kept the prior review block unchanged. But /regen sets ai-approved directly — impl-review.yml never runs, so that data never gets filled in. The merged metadata ends up with stale evaluation data, breaking Postgres sync, the catalog UI, and the next regen's "previous review" lookup. Step 2f now requires opening BOTH plot-light.png and plot-dark.png and producing a full structured evaluation (image_description, strengths, weaknesses, per-criterion checklist across all 6 quality-criteria categories, verdict) — same shape as the Cloud AI review. Step 2g now writes the integer quality_score and full review block into the metadata YAML.
2 tasks
Old template had `Quality: /100` literally — the score never made it into
the impl header. Step 2h now requires {SCORE} and explicitly notes it must
match the metadata's quality_score.
MarkusNeusinger
added a commit
that referenced
this pull request
May 2, 2026
## Summary Follow-up to #5653 / #5654. The `/regen` flow blanked the `Quality:` line in the docstring header — it now reads `Quality: /100`, but should match the metadata's `quality_score: 90` populated in #5654. ```diff -Quality: /100 | Updated: 2026-05-02 +Quality: 90/100 | Updated: 2026-05-02 ``` The `regen.md` template (PR #5650) is being updated in parallel to require `{SCORE}/100`, not blank, so future regens won't drop the score. ## Test plan - [ ] CI green - [ ] header matches metadata `quality_score: 90`
Comment on lines
+58
to
+60
| If the user is on a non-`main` branch, that's fine — the candidate scan below works against `origin/main`-tracked | ||
| files as long as they're checked out, and step 2j explicitly bases each worktree on `origin/main` (not local | ||
| `main`), so staleness can't poison the PR base either. |
Comment on lines
+395
to
+405
| PR_URL=$(gh pr create \ | ||
| --title "regen({SPEC_ID}): {LIBRARY}" \ | ||
| --body "$(cat <<EOF | ||
| Locally regenerated **{LIBRARY}** for **{SPEC_ID}**. | ||
|
|
||
| Quality: {SCORE}/100 | ||
| - VQ: {vq}/30 | DE: {de}/20 | SC: {sc}/15 | DQ: {dq}/15 | CQ: {cq}/10 | LM: {lm}/10 | ||
|
|
||
| Generated by \`/regen\` (local model, no Cloud AI review dispatched). | ||
| EOF | ||
| )") |
…p, parent issue marker Three real concerns from Copilot on PR #5650: 1. Picker read working tree, so on a feature branch behind main it could reselect a spec already regenerated upstream. Replaced Path('plots') walk with `git ls-tree`/`git show origin/main:...` so the selection is always against the current main, regardless of checked-out branch. 2. PREVIEW_DIR was reused across retries. If a prior render produced plot-light.png and the current code only emits plot-dark.png, the existence checks would still pass with stale art. Now `rm -rf`'d before each fresh render. 3. PR body lacked `**Parent Issue:** #N`, which impl-merge.yml parses to comment on the spec issue and add impl:{lib}:done. Now read from plots/{SPEC_ID}/specification.yaml and inject the marker when present. Two earlier comments (re: Done mode + worktree main refresh) were already addressed by the prior `git fetch origin main` calls — skipped.
…n package
The regen.md slash command had grown to ~570 lines of inline shell + Python:
a 60-line picker, 25+ lines of YAML metadata structure, ~70-line worktree+PR
orchestration. Each invocation re-derived this code from markdown — exactly
where AI-driven typos crept in (wrong field names, mismatched score totals,
drifted bash quoting).
Now mirrors the core/images.py pattern: a module-as-CLI dispatcher with
named, testable functions for every clear step. regen.md shrinks to ~370
lines, each step a one-line `uv run python -m agentic.workflows.modules.regen
<subcommand>` call.
New package agentic/workflows/modules/regen/:
__init__.py — PlanSpec + QualityEval dataclasses (JSON-serializable)
__main__.py — CLI dispatcher
picker.py — pick_oldest() / validate_spec() reading origin/main via
git ls-tree + git show (independent of checked-out branch)
plan.py — Plan I/O: write/parse/next-unchecked/mark-done/mark-failed
/archive + worktree+preview cleanup
render.py — copy impl→run_impl.py (sidesteps the self-import collision),
run both ANYPLOT_THEME=light and =dark, assert both PNGs
metadata.py — build_metadata_dict + write_metadata + update_impl_header.
Theme-aware preview URLs under /python/ segment, full
review.criteria_checklist, preserves created/issue/impl_tags
on regenerations
staging.py — invokes core.images process+responsive for both themes,
bundle-uploads to gs://anyplot-images/staging/{spec}/python/{lib}/
pr_create.py — worktree from origin/main → commit → push → gh pr create
with **Parent Issue:** marker → label via REST → cleanup
User-facing change: /regen now accepts an optional spec id.
/regen # picks the oldest spec
/regen <spec-id> # uses the given spec (must exist on origin/main)
29 unit tests cover all modules end-to-end. Full suite: 1368 passed.
Out of scope (tracked separately): lifting metadata.py + staging.py to core/
so impl-generate.yml can reuse them too.
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| def create_regen_pr( |
| shutil.rmtree(preview) | ||
| preview.mkdir(parents=True) | ||
|
|
||
| shutil.copyfile(impl, preview / "run_impl.py") |
Comment on lines
+106
to
+114
| branch = f"implementation/{spec_id}/{library}" | ||
| worktree = worktrees_root / f"{spec_id}-{library}" | ||
|
|
||
| # Fetch and (re-)create the worktree from current origin/main | ||
| subprocess.run(["git", "fetch", "origin", "main"], check=True) | ||
| if worktree.exists(): | ||
| subprocess.run(["git", "worktree", "remove", str(worktree), "--force"], check=False) | ||
| subprocess.run(["git", "worktree", "prune"], check=False) | ||
| subprocess.run(["git", "worktree", "add", "-b", branch, str(worktree), "origin/main"], check=True) |
| subprocess.run(["gsutil", "acl", "ch", "-u", "AllUsers:R", f"{staging_path}/plot-{theme}.html"], check=False) | ||
|
|
||
|
|
||
| def stage_images_to_gcs(spec_id: str, library: str, plots_root: Path = Path("plots")) -> str: |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
6 tasks
…tries, untested orchestration
Three real issues from Copilot's review of the extraction:
1. render.py was copying the impl into .regen-preview/{lib}/run_impl.py
to sidestep the self-import collision. But that changes __file__,
breaking highcharts/pygal impls that use Path(__file__).parents[...]
to locate node_modules and other relative assets. Fix: invoke
`uv run python -P <impl>` instead — Python ≥3.11's `-P` skips
prepending the script's directory to sys.path, so `import altair`
resolves to the installed package without copying the file. __file__
stays at the original impl path.
2. pr_create.py created branch implementation/{spec}/{lib} via
`git worktree add -b` but never deleted the local branch on cleanup.
A second regen attempt for the same library failed with
"branch already exists". Fix: `git branch -D` both before the
worktree add (in case a stale branch survives from a crashed run) and
in the finally cleanup. The branch lives on the remote after push.
3. pr_create.py and staging.py had no test coverage despite shelling out
to git, gh, gsutil, and core.images. Added 15 mocked subprocess tests
covering: orchestration sequence, cleanup-on-failure path, REST-based
label add, theme-bundle GCS path math, html upload for interactive
libs, and missing-render error paths.
Plus regression tests in test_render.py:
- __file__ stays at the original impl path (the highcharts/pygal bug)
- import altair in altair.py resolves to the package, not the local file
44/44 regen tests pass. Lint clean.
…ion test CI's unit test job doesn't install the [plotting] extra, so `import altair` raised ModuleNotFoundError and the test failed even though the bug-under-test (library shadowing) was correctly fixed. Stdlib `json` is always available and the shadowing behavior is identical: an impl named json.py would shadow json without -P just like altair.py shadows altair.
Comment on lines
+61
to
+68
| for path in meta_paths: | ||
| try: | ||
| data = yaml.safe_load(_git_show(path, ref)) or {} | ||
| except (subprocess.CalledProcessError, yaml.YAMLError): | ||
| continue | ||
| ts = data.get("updated") or data.get("created") | ||
| if ts and (latest is None or str(ts) > str(latest)): | ||
| latest = str(ts) |
Comment on lines
+121
to
+127
| # Copy the regenerated files into the worktree (impl + metadata at minimum) | ||
| for rel in ( | ||
| f"plots/{spec_id}/implementations/python/{library}.py", | ||
| f"plots/{spec_id}/metadata/python/{library}.yaml", | ||
| ): | ||
| src = Path(rel) | ||
| dst = worktree / rel |
Copilot review on #5650: `_latest_timestamp` was using lexicographic string comparison, which can flip ordering when the same instant is written as `...Z` vs `...+00:00` (both formats appear under `plots/**/metadata`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Emphasize that each library implementation is generated in isolation - Prohibit reading sibling-library implementations for consistency - Outline allowed and forbidden inputs for library implementations - Encourage unique interpretations and choices within the spec's framework
Mirrors the rule just added to prompts/plot-generator.md and the impl-generate / impl-repair workflow prompts. Each library's regen invocation must read only its own files; cross-reading another library's .py or .yaml leads to cloned data scenarios and identical charts across the catalog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines
+121
to
+125
| # Copy the regenerated files into the worktree (impl + metadata at minimum) | ||
| for rel in ( | ||
| f"plots/{spec_id}/implementations/python/{library}.py", | ||
| f"plots/{spec_id}/metadata/python/{library}.yaml", | ||
| ): |
Comment on lines
+16
to
+39
| def test_list_spec_dirs_returns_known_specs(): | ||
| specs = _list_spec_dirs() | ||
| # Specs known to be on main at the time of writing this test | ||
| assert "scatter-basic" in specs | ||
| assert "sparkline-basic" in specs | ||
| # No path separators or hidden entries | ||
| assert all("/" not in s for s in specs) | ||
| assert all(not s.startswith(".") for s in specs) | ||
|
|
||
|
|
||
| def test_list_metadata_files_for_known_spec(): | ||
| files = _list_metadata_files("sparkline-basic") | ||
| assert any(p.endswith("altair.yaml") for p in files) | ||
|
|
||
|
|
||
| def test_pick_oldest_returns_real_spec(): | ||
| spec_id, latest = pick_oldest() | ||
| assert spec_id # non-empty | ||
| assert latest # non-empty ISO | ||
|
|
||
|
|
||
| def test_validate_spec_known_id(): | ||
| spec_id, latest = validate_spec("sparkline-basic") | ||
| assert spec_id == "sparkline-basic" |
Comment on lines
+96
to
+99
| _id, latest = validate_spec(spec_id) | ||
| title = spec_title(spec_id) | ||
| libs = list_libraries(spec_id) | ||
| if not libs: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rewrite
/regenas a resumable single-step command so it runs reliably on local models with limited context windows (e.g. Gemma 3 26B at 256k), where the previous parallel-agent flow blew past the budget.Motivation
The previous
/regen:general-purposeagents (one per library) via experimental agent teams,update.mdlibrary-agent prompt,That's fine on hosted Sonnet but consistently aborts when pointed at a local Ollama endpoint with smaller context.
New design
/regenis now idempotent and resumable. State lives in.regen-plan.mdat repo root. Each invocation does exactly one thing, auto-detected from the state file:- [ ]items- [x]/- [!].regen-history/, exitPer spec, expect ~10 invocations (1 plan + 1 per library + 1 finalize). The lead does the work directly — no agent teams, no parallel agents, no
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMSflag.Resilience
quality:{N}+ai-approved/quality-poorlabels) before exiting, so a crash mid-flight loses at most the in-flight library — everything before is already a merged-or-pending PR.- [!]and skipped; the user can edit them back to- [ ]to retry.cat .regen-plan.mdshows progress;rm .regen-plan.mdabandons the current plan.Compatibility
quality:{N}+ai-approved/quality-poorlabels, andimpl-merge.ymlstill handles squash-merge, GCS staging→production, andimpl:{lib}:doneissue labelling.impl-review.ymlandimpl-repair.ymlare still never dispatched.Files
agentic/commands/regen.md— full rewrite (~420 lines, but each invocation only exercises the relevant section)..gitignore— add.regen-plan.md,.regen-preview/,.regen-history/next to existing.update-preview/and.worktrees/entries.Test Plan
/regenin a clean working tree → confirm.regen-plan.mdis created with the oldest spec and the correct library list, and the command exits without touching any implementation files./regenagain → confirm it picks the first unchecked library, ships a PR withquality:{N}+ the right approval label, ticks off the box, and exits./regenonce more → confirm.regen-plan.mdis moved to.regen-history/{spec_id}-{ts}.mdand the user is told to re-run for the next spec.gh run list --workflow=impl-review.yml --limit 5shows no new runs for the regen'd specs (cloud bypass intact).gsutilauth) → confirm the library is marked- [!], logged, worktree cleaned up, and subsequent invocations skip it.https://claude.ai/code/session_01Tuiif6dcyttLrc1KoKCmqR
Generated by Claude Code