
Add Gemma 4 text-decoder export to CoreML #19253

Open
john-rocky wants to merge 1 commit into pytorch:main from john-rocky:coreml/gemma4-text-decoder

Conversation

@john-rocky

Summary

The Gemma 4 text decoder that ships in examples/models/gemma4/text_decoder/
already implements hybrid sliding/full attention, partial RoPE,
per-layer head_dim (256 for sliding layers, 512 for full layers), MQA,
and YOCO KV sharing in plain PyTorch.

I verified that this implementation already lowers cleanly through
torch.export and CoreMLPartitioner today: for the synthetic
10-layer Gemma 4 used in the new test, the lowered edge program
contains only executorch_call_delegate and getitem at the top
level (1,186 MIL ops, fully delegated), with no portable fallbacks and no
unsupported ops.
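The hybrid layout above can be sketched in plain Python. This is an illustrative convention only: the `layer_layout` helper, the `(i + 1) % pattern` rule, and the default pattern of 5 are assumptions for the sketch, not code taken from the Gemma 4 sources.

```python
def layer_layout(num_layers, sliding_window_pattern=5):
    """Illustrative hybrid sliding/full layout (assumed convention):
    every Nth layer uses full attention with head_dim=512, the rest
    use sliding-window attention with head_dim=256."""
    layout = []
    for i in range(num_layers):
        if (i + 1) % sliding_window_pattern == 0:
            layout.append(("full", 512))
        else:
            layout.append(("sliding", 256))
    return layout

# The 10-layer synthetic config repeats "4 sliding + 1 full" twice:
print(layer_layout(10))
```

Under this convention, layers 4 and 9 (0-indexed) come out as full-attention layers, matching the "4 sliding + 1 full × 2" pattern the test builds.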

So the missing piece is not new modeling code — it is the small amount
of glue that turns "exportable in principle" into "exportable from one
shell command". This PR adds that glue:

  • examples/apple/coreml/gemma4/export_gemma4_text_decoder_coreml.py,
    with sensible CoreML defaults: iOS18+ deployment target so the
    YOCO KV caches can be taken over as stateful tensors,
    compute_unit=CPU_AND_NE, fp16 by default (the ANE requires fp16).
  • A --random_weights mode for smoke-testing the export pipeline
    without a HuggingFace checkpoint, plus --config_json,
    --sliding_window, --sliding_window_pattern overrides.
  • A readme.md documenting the flags and the "everything delegates"
    property.
  • A BUCK target so the script is buildable in fbcode the same way
    the existing CoreML llama scripts are.
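For context on what the --sliding_window override controls, here is a minimal sketch of a causal sliding-window attention mask in plain Python. The `sliding_causal_mask` helper is hypothetical and not part of the PR; it just illustrates the masking rule sliding layers follow.

```python
def sliding_causal_mask(seq_len, window):
    # True where query position q may attend key position k:
    # causal (k <= q) and within the sliding window (q - k < window).
    return [[k <= q and q - k < window for k in range(seq_len)]
            for q in range(seq_len)]

mask = sliding_causal_mask(5, 3)
# Query 4 can see keys 2..4 but not keys 0..1 (outside the window).
```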

The audio and vision encoders are intentionally out of scope — the
existing ATen pipeline in examples/models/gemma4 is more appropriate
for those.

Test plan

examples/apple/coreml/gemma4/test.py builds a 10-layer synthetic
Gemma 4 (4 sliding + 1 full × 2) — same hybrid pattern as Gemma 4 E2B,
just at smaller dimensions — and runs the full export pipeline,
asserting the resulting .pte is non-empty.

$ python -m pytest examples/apple/coreml/gemma4/test.py -v
test.py::TestGemma4CoreMLExport::test_eager_forward_runs PASSED
test.py::TestGemma4CoreMLExport::test_full_export_pipeline_lowers_to_coreml PASSED
============================== 2 passed in 15.32s ==============================

I also ran the export by hand and confirmed the resulting edge program
is fully delegated.
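The "everything delegates" property could be expressed as a small predicate like the following. This is illustrative only: the real check walks the exported edge program's graph nodes, and `is_fully_delegated` is a hypothetical helper, not an ExecuTorch API.

```python
def is_fully_delegated(top_level_ops):
    # Fully delegated means the only ops left at the top level of the
    # edge program are the delegate call and the getitem nodes that
    # unpack its outputs.
    allowed = {"executorch_call_delegate", "getitem"}
    return bool(top_level_ops) and all(op in allowed for op in top_level_ops)

assert is_fully_delegated(["executorch_call_delegate", "getitem", "getitem"])
assert not is_fully_delegated(["executorch_call_delegate", "aten.add.Tensor"])
```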

Authored with Claude.
@john-rocky john-rocky requested a review from metascroy as a code owner May 1, 2026 06:03
@pytorch-bot

pytorch-bot Bot commented May 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19253

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 1, 2026
@github-actions

github-actions Bot commented May 1, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
