Cap skill turns/timeout + sync local HEAD after skill pushes by rdimitrov · Pull Request #783 · stacklok/docs-website

rdimitrov · 2026-04-22T10:17:35Z

Three defensive additions driven by findings from PR #780's e2e test run.

1. Sync local HEAD after skill pushes (fixes two bugs)

Root cause: `claude-code-action` works in a sibling scratch checkout (the `.claude-pr/` dir we gitignore) and pushes its commits to origin without advancing the outer workflow checkout's HEAD. Every subsequent step that read local git state therefore saw a stale HEAD.

Symptoms on PR #780's run (run 24769641826):

- `skill_commits` reported 0 despite commit `d4b3c7c` existing on the PR branch → triggered the silent-run `[!NOTE]` on a run that produced real content.
- `autofix` early-exited with "No skill-touched files in scope" because its `git diff BASELINE_SHA..HEAD` returned nothing → 41 unformatted files landed on CI and failed the Lint and format checks job.

Fix: new step between `skill_review` and `skill_commits`:

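```yaml
- name: Sync local branch with skill-pushed commits
  run: |
    # Fetch the PR branch tip that the skill pushed to origin.
    git fetch origin "$HEAD_REF" --quiet
    # Fast-forward local HEAD to it. `|| true` tolerates a failed merge here;
    # a follow-up commit on this PR drops it.
    git merge --ff-only "origin/$HEAD_REF" || true
```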

Local HEAD now reflects whatever the skill pushed, so downstream steps see the truth. Fixes both symptoms with one change.
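To make the stale-HEAD failure mode concrete, here is a minimal sketch of what a commit-counting step could look like. The step body is an assumption: the real `skill_commits` implementation is not shown in this PR description, and only the step name and `BASELINE_SHA` come from the text above.

```yaml
# Hypothetical sketch; the actual skill_commits step is not shown in this PR.
- name: Count skill commits
  id: skill_commits
  run: |
    # Counts commits reachable from local HEAD but not from BASELINE_SHA.
    # If local HEAD was never advanced past BASELINE_SHA, this yields 0
    # even when origin already holds the skill's commits.
    count=$(git rev-list --count "$BASELINE_SHA..HEAD")
    echo "count=$count" >> "$GITHUB_OUTPUT"
```

Any step of this shape only sees what the local checkout sees, which is why the sync has to run before it.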

2. `--max-turns` per skill invocation

| Session | Cap (turns) | Baseline (turns) | Rationale |
| --- | --- | --- | --- |
| `skill_gen` | 500 | 89 (silent) / 397 (full rebuild) | ~25% headroom over worst legitimate run |
| `skill_review` | 30 | 4-5 every run | 6x buffer; cheap safety net |

Hitting a cap fails the step loudly. If a release genuinely needs more, we raise deliberately.
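A rough sketch of how the turn cap might be wired into the `skill_gen` step. The commit message says the caps are passed via `claude_args`; the action reference, version tag, step name, and omitted inputs below are assumptions, not copied from the workflow file.

```yaml
# Sketch only: action org/version and step name are assumptions.
- name: Generate docs (skill_gen)
  id: skill_gen
  uses: anthropics/claude-code-action@v1
  with:
    # prompt, API key, and other inputs omitted for brevity
    claude_args: |
      --max-turns 500
```

The `skill_review` step would carry the same argument with a value of 30.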

3. `timeout-minutes` per skill step

| Session | Cap | Baseline |
| --- | --- | --- |
| `skill_gen` | 45 min | 15-22 min observed |
| `skill_review` | 10 min | 1-2 min observed |

Kills a stuck process before it burns the full 90-min job budget. Step-level, independent of turn count.
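As an illustration of how the step-level and job-level limits relate, the fragment below uses the standard `timeout-minutes` key at both levels. The job name, runner, and action reference are placeholders, and it assumes the 90-minute job budget is itself set through `timeout-minutes`.

```yaml
# Sketch only: job name, runner, and action reference are assumptions.
jobs:
  upstream-release-docs:
    runs-on: ubuntu-latest
    timeout-minutes: 90          # overall job budget
    steps:
      - id: skill_gen
        timeout-minutes: 45      # step is cancelled after 45 min wall-clock
        uses: anthropics/claude-code-action@v1
      - id: skill_review
        timeout-minutes: 10      # step is cancelled after 10 min wall-clock
        uses: anthropics/claude-code-action@v1
```

A step that exceeds its own limit fails on its own, well before the job-level limit is reached.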

Worst-case cost ceiling once capped

- Gen hits 500 turns → ~$34 (extrapolating from PR #780's $26.95 at 397 turns)
- Review hits 30 turns → ~$2
- Per-run ceiling: ~$36
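The arithmetic behind these figures can be reproduced with a simple linear extrapolation. This assumes cost scales proportionally with turn count and that a review turn costs about the same as a gen turn; neither assumption is stated in the PR, but both reproduce the rounded numbers above.

```math
\begin{aligned}
\text{cost per turn} &\approx \frac{26.95}{397} \approx 0.0679\ \text{USD} \\
\text{gen at 500 turns} &\approx 500 \times 0.0679 \approx 33.94\ \text{USD} \\
\text{review at 30 turns} &\approx 30 \times 0.0679 \approx 2.04\ \text{USD} \\
\text{per-run ceiling} &\approx 33.94 + 2.04 \approx 35.98\ \text{USD}
\end{aligned}
```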

What this does NOT cover

- Legitimate expensive runs (PR #780 at $27 was real work; it would still cost $27).
- Upstream Anthropic pricing changes.
- High-frequency triggering (e.g. 20 retries in a row).

For those, set a monthly spend cap at the Anthropic console; that is the ultimate ceiling, independent of any workflow change.

Validation

Next `workflow_dispatch` on rolled-back content should produce:

- `skill_commits` count matching the actual skill commit count (non-zero when content was produced).
- Autofix step finding and formatting skill-touched files before they hit CI.
- No silent-run NOTE when commits exist.
- If a runaway ever occurs, clean failure at cap with a clear error message instead of an unbounded spend.

Three defensive additions to the upstream-release-docs pipeline.

## Sync local HEAD with skill-pushed commits

claude-code-action works in a sibling scratch checkout (the `.claude-pr/` dir we gitignore) and pushes its commits to origin without advancing the outer workflow checkout's HEAD. Every subsequent step that reads local git state saw stale HEAD at the pre-skill SHA.

Symptoms on PR #780:

- `skill_commits` counter reported 0 despite commit `d4b3c7c` existing on the PR branch -- triggered the silent-run NOTE on a run that wasn't actually silent.
- `autofix` early-exited with "No skill-touched files in scope" because its `git diff BASELINE_SHA..HEAD` returned nothing. 41 unformatted files from the skill landed on CI and failed the Lint and format checks job.

Fix: new step between `skill_review` and `skill_commits` that runs `git fetch origin $HEAD_REF && git merge --ff-only`. Local HEAD now reflects whatever the skill pushed, so downstream steps see the truth.

## --max-turns caps

Per-session turn ceilings via claude_args:

- skill_gen: 500 (baselines: 89 silent → 397 full rebuild)
- skill_review: 30 (baseline: 4-5 turns every run)

Clips genuine runaway loops without interfering with legitimate complex runs. Hitting a cap fails loudly; we raise deliberately if a release genuinely needs more.

## timeout-minutes per step

Wall-clock ceilings:

- skill_gen: 45 min (observed: 15-22 min)
- skill_review: 10 min (observed: 1-2 min)

Kills a stuck process before it burns the full 90-min job budget.

## What this does NOT cover

- Legitimate-but-expensive runs (PR #780 at $27 is real work).
- Upstream Anthropic pricing changes.
- High-frequency workflow triggering.

For those: set a monthly spend cap at the Anthropic console.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-04-22T10:17:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs-website | Ready | Preview, Comment | Apr 22, 2026 10:22am |

Copilot

Pull request overview

Adds defensive safeguards to the upstream-release-docs GitHub Actions workflow to prevent runaway Claude sessions and to ensure downstream steps operate on the updated branch tip after the skill pushes commits.

Changes:

- Add per-step `timeout-minutes` for `skill_gen` and `skill_review`.
- Add `--max-turns` caps for `skill_gen` and `skill_review` invocations.
- Sync the workflow checkout’s local HEAD with the remote PR branch after the skill runs/pushes commits.

Drops the `|| true` after `git merge --ff-only`.

Silently continuing on a failed sync reintroduces the exact skill_commits=0 + autofix-skips-files bugs this step prevents. A no-op fast-forward (local already at origin) still exits 0 cleanly, so the common case is unaffected. Only genuine divergence or merge errors now fail the step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
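With `|| true` removed, the sync step presumably reads as follows. This is reconstructed from the step shown in the PR description, not copied from the merged workflow file.

```yaml
- name: Sync local branch with skill-pushed commits
  run: |
    git fetch origin "$HEAD_REF" --quiet
    # No `|| true`: divergence or a merge error now fails the step,
    # while an already-up-to-date fast-forward still exits 0.
    git merge --ff-only "origin/$HEAD_REF"
```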

Copilot AI review requested due to automatic review settings April 22, 2026 10:17

vercel Bot deployed to Preview April 22, 2026 10:18 View deployment

rdimitrov enabled auto-merge (squash) April 22, 2026 10:20

Copilot AI reviewed Apr 22, 2026

Comment thread .github/workflows/upstream-release-docs.yml Outdated

rdimitrov disabled auto-merge April 22, 2026 10:21

vercel Bot deployed to Preview April 22, 2026 10:22 View deployment

rdimitrov enabled auto-merge (squash) April 22, 2026 10:23

JAORMX approved these changes Apr 22, 2026

rdimitrov merged commit 5517fcd into main Apr 22, 2026
3 checks passed

rdimitrov deleted the robustness-limits-and-sync branch April 22, 2026 11:02

rdimitrov mentioned this pull request Apr 22, 2026

Fix stats parse + add Summary of changes section #785

Merged

rdimitrov merged 2 commits into main from robustness-limits-and-sync

3 participants
