Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] Guard/document single parameter feature for grouped linear 2.15.0
#2955 opened May 1, 2026 by ksivaman Member
6 of 13 tasks
[PyTorch][Core] Fix CUBLAS GGEMM when weight dims are not divisible by 128
#2954 opened May 1, 2026 by vthumbe1503 Collaborator
13 tasks
[JAX] Remove xla deterministic arg for MNIST test to not timeout L2_jax_unittest CI
#2952 opened May 1, 2026 by tdophung Collaborator
6 of 13 tasks
[Common, PyTorch] Add Triton MLA attention kernels for SM80 community-contribution (PRs from external contributors outside the core maintainers, representing community-driven work)
#2950 opened Apr 30, 2026 by bzantium
[All] Remove max512 backend
#2949 opened Apr 30, 2026 by cyanguwa Collaborator
13 tasks
Add NVFP4 1x64 Local Encode Recipe
#2941 opened Apr 29, 2026 by cael-ling Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member Draft
13 tasks
Implement per-token NVFP4 fprop recipe
#2931 opened Apr 27, 2026 by zianglih Contributor
8 of 13 tasks
[Common/PyTorch] Add MXFP8 cast-and-transpose op community-contribution
#2930 opened Apr 26, 2026 by jeweldave
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928 opened Apr 25, 2026 by eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925 opened Apr 25, 2026 by ksivaman Member
7 of 13 tasks
Make TE Sequential Grouped linear Op CUDA graphable
#2923 opened Apr 24, 2026 by vthumbe1503 Collaborator
13 tasks
[PyTorch] Add distributed Muon optimizer 2.16.0
#2920 opened Apr 23, 2026 by vcherepanov-nv Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919 opened Apr 23, 2026 by CarlosGomes98 Contributor
1 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing community-contribution
#2911 opened Apr 21, 2026 by NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute community-contribution
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906 opened Apr 21, 2026 by yaox12 Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900 opened Apr 18, 2026 by timmoon10 Collaborator
9 of 13 tasks