Output tokens per run: ~6.9K avg (tiny — the agent is almost entirely orchestrating bash, not generating text)

**Root cause:** The workflow sequentially clones 8 repos and runs 18 test projects one-by-one, making a separate LLM request after each bash call. This compounds context: each request receives the full system prompt (~40K tokens) plus all accumulated bash output from prior steps. By request 15, the context has grown to ~43K tokens/request on average, totaling 621K input tokens for a workflow that produces only ~7K tokens of useful output.
## Recommendations

### 1. Script-First Execution Pattern

**Estimated savings: ~460K tokens/run (~74%)**
Currently the agent executes 18 test projects interactively — one bash call per step, with each result appended to context. With 13–16 round trips, the rolling context alone accounts for 600K+ input tokens.
**Fix:** Restructure the prompt to have the agent write a single comprehensive bash script, execute it once, then summarize from log files. This collapses ~14 round trips into 3.
Replace the current task-by-task prompt structure with:
## Instructions
Write a single bash script `/tmp/run-all-tests.sh` that:
1. Runs all 8 ecosystem setups and tests
2. Captures all output to `/tmp/test-results/<ecosystem>.log`
3. Exits 0 regardless of individual test failures
Execute the script with `bash /tmp/run-all-tests.sh`, then read each log file
and post the combined summary table as a PR comment.
## Setup

**Bun:**
```bash
curl -fsSL (bun.sh/redacted) | bash
export BUN_INSTALL="$HOME/.bun" PATH="$BUN_INSTALL/bin:$PATH"
gh repo clone Mossaka/gh-aw-firewall-test-bun /tmp/test-bun
```

Test: `cd /tmp/test-bun/elysia && bun install && bun test; cd /tmp/test-bun/hono && bun install && bun test`

... (same pattern for each ecosystem)
**Expected request pattern after change:**
1. Agent writes `/tmp/run-all-tests.sh` (~2K output tokens)
2. Agent runs it (1 bash call, stdout captured in logs)
3. Agent reads log files and posts PR comment
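As a sketch of step 1's output, the script the agent writes might look like this. This is a hypothetical illustration, not the workflow's actual script: `run_suite` is an invented helper, and only the Bun suites from the setup example above are shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of /tmp/run-all-tests.sh (Rec 1): run each test
# suite, capture all output to a per-ecosystem log, never abort the batch.
mkdir -p /tmp/test-results

run_suite() {
  local name="$1"; shift
  # Run the suite in a subshell so a failing cd/test cannot kill the
  # script; all stdout/stderr goes to the log, only PASS/FAIL remains.
  if ( "$@" ) >"/tmp/test-results/${name}.log" 2>&1; then
    echo "PASS ${name}"
  else
    echo "FAIL ${name}"
  fi
}

run_suite bun-elysia bash -c 'cd /tmp/test-bun/elysia && bun install && bun test'
run_suite bun-hono   bash -c 'cd /tmp/test-bun/hono   && bun install && bun test'
# ... one run_suite line per remaining project (18 total).
# A real script would end with `exit 0` so individual suite failures
# do not fail the workflow step.
```

The key property is that the agent's context only ever sees one short PASS/FAIL line per suite; the verbose build output stays in the log files.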
**Token projection:**
| Metric | Current | Projected |
|--------|---------|-----------|
| LLM requests/run | ~14 | ~3 |
| Input tokens | ~614K | ~150K |
| Output tokens | ~7K | ~7K |
| **Total tokens** | **~621K** | **~157K** |
The cache hit rate should remain high (~90%) since the stable system prompt portion doesn't change.
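For step 3, assembling the combined summary table can itself be mechanical. A hedged sketch, assuming the per-ecosystem log layout from the script above; `build_summary` is a hypothetical helper, and the fail heuristic (scan the log tail for "fail"/"error") is an assumption:

```shell
# Build a markdown results table from per-ecosystem logs (hypothetical).
# A suite counts as failed if its final lines mention "fail" or "error".
build_summary() {
  local dir="$1"
  echo '| Ecosystem | Result |'
  echo '|-----------|--------|'
  local log name
  for log in "$dir"/*.log; do
    [ -e "$log" ] || continue            # directory may be empty
    name="$(basename "$log" .log)"
    if tail -30 "$log" | grep -qiE 'fail|error'; then
      echo "| ${name} | fail |"
    else
      echo "| ${name} | pass |"
    fi
  done
}
# The agent would then post the table as the PR comment via safeoutputs:
#   build_summary /tmp/test-results > /tmp/summary.md
```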
---
### 2. Restrict GitHub MCP Toolset
**Estimated savings: ~154K tokens/run (~25%) at current request count; ~33K/run after Rec 1**
The `github:` tool in `build-test.md` loads without a `toolsets:` restriction. The default loads ~22 tool schemas into every LLM context. The workflow only needs:
- `create_pull_request_review_comment` or `add_issue_comment` (but these are handled by `safeoutputs`)
- Likely just `list_pull_requests` for PR context
**Fix:** Add `toolsets` restriction in `build-test.md`:
```diff
 tools:
   bash:
     - "*"
   github:
     github-token: "${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}"
+    toolsets: [pull_requests]
```
This reduces loaded tools from ~22 → ~4, saving ~11K tokens per request.
At current 14 requests/run: 14 × 11K = 154K tokens/run saved (-25%)
After Rec 1 (3 requests/run): 3 × 11K = 33K tokens/run saved (-21% of projected 157K)
---

### 3. Truncate Bash Output in Context

**Estimated savings: ~40K tokens/run (~7%)** — applies after Rec 1 to reduce log-reading overhead
Even with the script-first approach, when the agent reads log files, verbose build output (cargo compile warnings, Maven download progress, npm install trees) gets added to context. Each ecosystem's full log can be 3–5K tokens.
**Fix:** Add an output-truncation instruction to the prompt:
> When reading log files, use `tail -30` to capture only the final lines
> (test results and exit codes). If a test fails, re-read the full log to
> get error details.
This reduces per-log read from ~4K → ~0.5K tokens × 8 ecosystems = ~28K tokens saved.
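Followed mechanically, the truncation rule could look like the sketch below. The log paths are assumed from Rec 1, and `summarize_logs` is a hypothetical helper, not part of the workflow:

```shell
# Hypothetical log summarizer per Rec 3: show only the last lines of each
# ecosystem log; flag failures so the agent knows to re-read the full file.
summarize_logs() {
  local dir="${1:-/tmp/test-results}"
  local log
  for log in "$dir"/*.log; do
    [ -e "$log" ] || continue            # no logs yet: nothing to do
    echo "== $(basename "$log" .log) =="
    tail -30 "$log"                      # cheap summary: final lines only
    if tail -30 "$log" | grep -qiE 'fail|error'; then
      echo "-- failure detected; re-read full log: $log --"
    fi
  done
}
```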
---

### 4. Compress Prompt Body

**Estimated savings: ~10K tokens/run (~1.6%)**
The current 7,640-char prompt contains redundant elements that inflate the system context:
- 18-row output table template (~1,200 chars) — the agent doesn't need a pre-filled table to follow; describe the format in one line
- 9 `CRITICAL:` annotations (~600 chars) — consolidate into a single error-handling section
- Maven settings.xml block (~400 chars) — move to a pre-agent step (see Rec 5)
- Repetitive "Clone Repository / CRITICAL: If clone fails" pattern for all 8 ecosystems — abstract into a single rule

Estimated compressed size: ~4,000 chars (-48%). This saves ~4K tokens × 14 requests × 7% uncached ≈ ~4K uncached tokens (minor absolute impact, but it improves clarity).

---

### 5. Pre-Agent Step: Maven Settings

**Estimated savings: ~400 chars of prompt; removes agent error surface**

The Maven proxy settings are deterministic. Move them to a `steps:` block, then remove the entire "Configure Maven Proxy" section from the agent prompt. This eliminates a common agent error source (the agent sometimes forgets the XML or formats it incorrectly).
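The body of such a pre-agent step might look like the following sketch. The proxy host and port below are placeholders, not values taken from the workflow:

```shell
# Create ~/.m2/settings.xml deterministically before the agent starts.
# PROXY_HOST and the port are placeholders for the real firewall proxy.
mkdir -p "$HOME/.m2"
cat > "$HOME/.m2/settings.xml" <<'XML'
<settings>
  <proxies>
    <proxy>
      <id>awf</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>PROXY_HOST</host>
      <port>3128</port>
    </proxy>
  </proxies>
</settings>
XML
```

Because the step runs before the agent, the file is guaranteed to exist with valid XML regardless of what the model does.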
---

## Expected Impact

| Metric | Current | Projected | Savings |
|--------|---------|-----------|---------|
| Total tokens/run | ~621K | ~120K | -81% |
| Effective tokens/run | ~697K | ~140K | -80% |
| LLM requests/run | ~14 | ~3–4 | -75% |
| Uncached input/run | ~53K | ~15K | -72% |
| Session duration | ~107s | ~35s (est.) | -67% |
| Cache hit rate | 91% | 90% | ≈same |
Projections assume Rec 1 (script-first) + Rec 2 (toolset restriction) are both applied.
Individual contribution: Rec 1 = -74%; Rec 2 = -21% on top (of reduced baseline); Rec 3 = -7% on top.
---

## Target Workflow

- Workflow: `build-test.md`
- Source report: #1981
- Estimated cost per run: N/A (api-proxy `estimated_cost` not populated; token counts from AWF firewall logs)
- Total tokens per run: ~621K (avg of 5 complete runs; report avg 443K including 2 cancelled)
- Effective tokens per run: ~697K
- Cache hit rate: 91% (cache_read / total_input)
- LLM requests per run: 13.8 avg (range: 11–16)
- Model: `claude-sonnet-4.6`

## Current Configuration

- Tools: `bash: ["*"]`; `github:` (no toolset restriction — loads ~22 tools)
- Safe outputs: GitHub PR comment/label via `safeoutputs`
- Network allowlist: `defaults`, `github`, `node`, `go`, `rust`, `crates.io`, `java`, `dotnet`, `bun.sh`, `deno.land`, `jsr.io`, `dl.deno.land` (12 groups)
## Implementation Checklist

- [ ] Restructure the `build-test.md` prompt to the script-first pattern (write → execute → summarize)
- [ ] Add `toolsets: [pull_requests]` under `github:` in `build-test.md`
- [ ] Add the `tail -30` instruction for log reading in the prompt
- [ ] Consolidate `CRITICAL:` blocks into one error-handling section
- [ ] Add a `steps:` pre-agent step to create `~/.m2/settings.xml`; remove it from the prompt
- [ ] Recompile: `gh aw compile .github/workflows/build-test.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`

## Data Sources

- `/tmp/gh-aw/token-audit/copilot-logs.json` (7 runs analyzed; 5 with full token data)
- `.github/workflows/build-test.md` (8,592 bytes)