⚡ Copilot Token Optimization 2026-04-15 — build-test.md #1982

@github-actions

## Description


Target Workflow: build-test.md

Source report: #1981
Estimated cost per run: N/A (api-proxy estimated_cost not populated; token counts from AWF firewall logs)
Total tokens per run: ~621K (avg of 5 complete runs; report avg 443K including 2 cancelled)
Effective tokens per run: ~697K
Cache hit rate: 91% (cache_read / total_input)
LLM requests per run: 13.8 avg (range: 11–16)
Model: claude-sonnet-4.6


## Current Configuration

| Setting | Value |
|---------|-------|
| Tools loaded | `bash: ["*"]`, `github:` (no toolset restriction — loads ~22 tools) |
| Tools actually used | bash; github PR comment/label via `safeoutputs` |
| Network groups | defaults, github, node, go, rust, crates.io, java, dotnet, bun.sh, deno.land, jsr.io, dl.deno.land (12 groups) |
| Pre-agent steps | No |
| Post-agent steps | No |
| Prompt size | 7,640 chars (~8,500 tokens in system context) |
| Output tokens/run | ~6.9K avg (tiny — agent is almost entirely orchestrating bash, not generating text) |

**Root cause:** The workflow sequentially clones 8 repos and runs 18 test projects one by one, making a separate LLM request after each bash call. This compounds context: each request receives the full system prompt (~40K tokens) plus all accumulated bash output from prior steps. By request 15 the context has grown to ~43K tokens/request on average, totaling 621K input tokens for a workflow that produces only ~7K tokens of useful output.
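The compounding can be sketched with a toy model. The ~40K base and the per-step growth rate below are assumptions chosen to roughly reproduce the report's totals, not measured values:

```shell
# Toy model of compounding context: each LLM request re-sends the full
# system prompt plus all prior accumulated bash output.
# Assumed: ~40K-token prompt, ~400 retained tokens per step, 14 requests.
base=40000 growth=400 total=0
for i in $(seq 1 14); do
  total=$((total + base + i * growth))
done
echo "modelled input tokens: $total"   # lands in the ~600K range
```

The fixed prompt alone accounts for 14 × 40K = 560K of that total, which is why cutting the request count (Rec 1) dominates every other optimization here.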


## Recommendations

### 1. Script-First Execution Pattern

**Estimated savings: ~460K tokens/run (~74%)**

Currently the agent executes 18 test projects interactively — one bash call per step, with each result appended to context. With 11–16 round trips, the rolling context alone accounts for 600K+ input tokens.

**Fix:** Restructure the prompt so the agent writes a single comprehensive bash script, executes it once, then summarizes from log files. This collapses 14 round trips into 3.

Replace the current task-by-task prompt structure with:

````markdown
## Instructions

Write a single bash script `/tmp/run-all-tests.sh` that:
1. Runs all 8 ecosystem setups and tests
2. Captures all output to `/tmp/test-results/<ecosystem>.log`
3. Exits 0 regardless of individual test failures

Execute the script with `bash /tmp/run-all-tests.sh`, then read each log file
and post the combined summary table as a PR comment.

## Setup

**Bun:**
```bash
curl -fsSL (bun.sh/redacted) | bash
export BUN_INSTALL="$HOME/.bun" PATH="$BUN_INSTALL/bin:$PATH"
gh repo clone Mossaka/gh-aw-firewall-test-bun /tmp/test-bun
```

Test: `cd /tmp/test-bun/elysia && bun install && bun test; cd /tmp/test-bun/hono && bun install && bun test`

... (same pattern for each ecosystem)
````


**Expected request pattern after change:**
1. Agent writes `/tmp/run-all-tests.sh` (~2K output tokens)
2. Agent runs it (1 bash call, stdout captured in logs)
3. Agent reads log files and posts PR comment
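A minimal sketch of what the agent-written script could look like. The repo paths and the `exit=<code>` log convention are illustrative assumptions; the real script would cover all 8 ecosystems:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Rec 1's single-script pattern.
mkdir -p /tmp/test-results

run_suite() {
  # Run one suite's command, capture all output to its log, and record
  # the exit code instead of letting a failure stop the loop.
  local name="$1"; shift
  { "$@"; echo "exit=$?"; } > "/tmp/test-results/${name}.log" 2>&1
}

run_suite bun  bash -c 'cd /tmp/test-bun/elysia && bun install && bun test'
run_suite hono bash -c 'cd /tmp/test-bun/hono && bun install && bun test'
# ... one run_suite call per ecosystem/project ...
```

Because every suite is wrapped the same way, the script always exits cleanly and the only thing re-entering the LLM context afterwards is whatever the agent chooses to read back from the logs.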

**Token projection:**
| | Current | Projected |
|--|---------|-----------|
| LLM requests/run | ~14 | ~3 |
| Input tokens | ~614K | ~150K |
| Output tokens | ~7K | ~7K |
| **Total tokens** | **~621K** | **~157K** |

The cache hit rate should remain high (~90%) since the stable system prompt portion doesn't change.

---

### 2. Restrict GitHub MCP Toolset

**Estimated savings: ~154K tokens/run (~25%) at current request count; ~33K/run after Rec 1**

The `github:` tool in `build-test.md` loads without a `toolsets:` restriction. The default loads ~22 tool schemas into every LLM context. The workflow only needs:
- `create_pull_request_review_comment` or `add_issue_comment` (but these are handled by `safeoutputs`)
- Likely just `list_pull_requests` for PR context

**Fix:** Add `toolsets` restriction in `build-test.md`:

```diff
 tools:
   bash:
     - "*"
   github:
     github-token: "${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}"
+    toolsets: [pull_requests]
```

This reduces loaded tools from ~22 → ~4, saving ~11K tokens per request.

- At current 14 requests/run: 14 × 11K ≈ 154K tokens/run saved (-25%)
- After Rec 1 (3 requests/run): 3 × 11K ≈ 33K tokens/run saved (-21% of the projected 157K)


### 3. Truncate Bash Output in Context

**Estimated savings: ~40K tokens/run (~7%)** — applies after Rec 1, reducing log-reading overhead

Even with the script-first approach, when the agent reads log files, verbose build output (cargo compile warnings, Maven download progress, npm install trees) gets added to context. Each ecosystem's full log can be 3–5K tokens.

**Fix:** Add an output-truncation instruction to the prompt:

```
When reading log files, use `tail -30` to capture only the final lines
(test results and exit codes). If a test fails, re-read the full log to
get error details.
```

This reduces per-log read from ~4K → ~0.5K tokens × 8 ecosystems = ~28K tokens saved.
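The intended reading pattern could look like the following sketch, assuming each suite log ends with an `exit=<code>` line as in Rec 1 (the sample log written here is purely for illustration):

```shell
#!/usr/bin/env bash
# Sketch of Rec 3's truncated-read pattern.
mkdir -p /tmp/test-results
# Sample log standing in for a real suite's output:
printf 'lots of build output...\ntests passed\nexit=0\n' > /tmp/test-results/demo.log

for log in /tmp/test-results/*.log; do
  tail -30 "$log"                                   # summary + exit code only
  tail -1 "$log" | grep -q 'exit=0' || cat "$log"   # on failure, read the full log
done
```

The happy path costs ~30 lines per ecosystem; the full log is pulled into context only when a failure actually needs diagnosing.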


### 4. Compress Prompt Body

**Estimated savings: ~10K tokens/run (~1.6%)**

The current 7,640-char prompt contains redundant elements that inflate the system context:

- 18-row output table template (~1,200 chars) — the agent doesn't need a pre-filled table to follow; describe the format in one line
- 9 `CRITICAL:` annotations (~600 chars) — consolidate into a single error-handling section
- Maven `settings.xml` block (~400 chars) — move to a pre-agent step (see Rec 5)
- Repetitive "Clone Repository / CRITICAL: If clone fails" pattern for all 8 ecosystems — abstract into a single rule

Estimated compressed size: ~4,000 chars (-48%). This saves ~4K tokens × 14 requests × 7% uncached = ~4K uncached tokens (minor absolute impact but improves clarity).


### 5. Pre-Agent Step: Maven Settings

**Estimated savings: ~400 chars of prompt; removes an agent error surface**

The Maven proxy settings are deterministic. Move them to a `steps:` block:

```diff
+steps:
+  - name: Configure Maven proxy
+    run: |
+      mkdir -p ~/.m2
+      cat > ~/.m2/settings.xml << 'EOF'
+      <settings><proxies>
+        <proxy><id>awf-http</id><active>true</active><protocol>http</protocol>
+          <host>squid-proxy</host><port>3128</port></proxy>
+        <proxy><id>awf-https</id><active>true</active><protocol>https</protocol>
+          <host>squid-proxy</host><port>3128</port></proxy>
+      </proxies></settings>
+      EOF
```

Then remove the entire "Configure Maven Proxy" section from the agent prompt. This eliminates a common agent error source (the agent sometimes forgets the step or formats the XML incorrectly).


## Expected Impact

| Metric | Current | Projected | Savings |
|--------|---------|-----------|---------|
| Total tokens/run | ~621K | ~120K | -81% |
| Effective tokens/run | ~697K | ~140K | -80% |
| LLM requests/run | ~14 | ~3–4 | -75% |
| Uncached input/run | ~53K | ~15K | -72% |
| Session duration | ~107s | ~35s (est.) | -67% |
| Cache hit rate | 91% | 90% | ≈ same |

Projections assume Rec 1 (script-first) + Rec 2 (toolset restriction) are both applied.
Individual contribution: Rec 1 = -74%; Rec 2 = -21% on top (of reduced baseline); Rec 3 = -7% on top.
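As a quick consistency check on the combined figure, using the report's own rounded estimates (all values in K tokens/run):

```shell
# Stack Rec 1's projected total with Rec 2's additional savings.
current=621
after_rec1=157                    # Rec 1: script-first pattern
after_both=$((after_rec1 - 33))   # Rec 2: toolset restriction saves ~33K more
saved_pct=$(( (current - after_both) * 100 / current ))
echo "projected: ~${after_both}K/run (~${saved_pct}% reduction)"
```

This lands within rounding distance of the table's ~120K / -81%; the residual gap is covered by Rec 3's log-truncation savings.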


## Implementation Checklist

- [ ] Rec 1: Restructure `build-test.md` prompt to the script-first pattern (write → execute → summarize)
- [ ] Rec 2: Add `toolsets: [pull_requests]` under `github:` in `build-test.md`
- [ ] Rec 3: Add the `tail -30` instruction for log reading to the prompt
- [ ] Rec 4: Remove the 18-row table template; consolidate the 9 `CRITICAL:` blocks into one error-handling section
- [ ] Rec 5: Add a `steps:` pre-agent step to create `~/.m2/settings.xml`; remove it from the prompt
- [ ] Recompile: `gh aw compile .github/workflows/build-test.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Verify CI passes on the PR
- [ ] Compare token usage on a new run vs this baseline (~621K → target ~120K)
## Data Sources

- Report: 📊 Copilot Token Usage Report 2026-04-14 #1981 (2026-04-14, period 2026-04-13T23:31Z – 2026-04-14T22:56Z)
- Run token data from `/tmp/gh-aw/token-audit/copilot-logs.json` (7 runs analyzed; 5 with full token data)
- Workflow source: `.github/workflows/build-test.md` (8,592 bytes)

Generated by Daily Copilot Token Optimization Advisor · 579.1K tokens
