Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Guard/document single parameter feature for grouped linear
2.15.0
#2955
opened May 1, 2026 by
ksivaman
Member
6 of 13 tasks
[PyTorch][Core] Fix CUBLAS GGEMM when weight dims are not divisible by 128
#2954
opened May 1, 2026 by
vthumbe1503
Collaborator
13 tasks
[JAX] Remove xla deterministic arg for MNIST test to not timeout L2_jax_unittest CI
#2952
opened May 1, 2026 by
tdophung
Collaborator
6 of 13 tasks
MXFP8 + FSDP2 checkpoint resume crashes in reset_sharded_param - add mxfp8 recipe to fully shard
#2951
opened May 1, 2026 by
savitha-eng
[Common, PyTorch] Add Triton MLA attention kernels for SM80
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2950
opened Apr 30, 2026 by
bzantium
Add NVFP4 1x64 Local Encode Recipe
#2941
opened Apr 29, 2026 by
cael-ling
Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938
opened Apr 28, 2026 by
hxbai
Contributor
13 tasks
Implement per-token NVFP4 fprop recipe
#2931
opened Apr 27, 2026 by
zianglih
Contributor
8 of 13 tasks
[Common/PyTorch] Add MXFP8 cast-and-transpose op
community-contribution
#2930
opened Apr 26, 2026 by
jeweldave
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928
opened Apr 25, 2026 by
eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925
opened Apr 25, 2026 by
ksivaman
Member
7 of 13 tasks
Make TE Sequential Grouped linear Op CUDA graphable
#2923
opened Apr 24, 2026 by
vthumbe1503
Collaborator
13 tasks
[PyTorch] Add distributed Muon optimizer
2.16.0
#2920
opened Apr 23, 2026 by
vcherepanov-nv
Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919
opened Apr 23, 2026 by
CarlosGomes98
Contributor
1 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916
opened Apr 22, 2026 by
sudhakarsingh27
Collaborator
Draft
1 of 3 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing
community-contribution
#2911
opened Apr 21, 2026 by
NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute
community-contribution
#2907
opened Apr 21, 2026 by
jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906
opened Apr 21, 2026 by
yaox12
Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900
opened Apr 18, 2026 by
timmoon10
Collaborator
9 of 13 tasks