ci[gpu]: run cuda micro-benchmarks with codspeed #7696
Merged
CodSpeed HQ / CodSpeed Performance Analysis: succeeded Apr 29, 2026 in 0s
Performance Gate Passed
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
✅ 1130 untouched benchmarks
🆕 68 new benchmarks
⏩ 33 skipped benchmarks¹
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | WallTime | alp_decode[10%] | N/A | 4.5 ms | N/A |
| 🆕 | WallTime | for[10M_u64] | N/A | 349 µs | N/A |
| 🆕 | WallTime | alp_decode[0%] | N/A | 3.3 ms | N/A |
| 🆕 | WallTime | dict[10M_u64_values_u8_codes] | N/A | 218.1 µs | N/A |
| 🆕 | WallTime | runend[100M_i32_runlen_100000] | N/A | 829.4 µs | N/A |
| 🆕 | WallTime | alp_decode[0%] | N/A | 2.5 ms | N/A |
| 🆕 | WallTime | dict[10M_u32_values_u16_codes] | N/A | 150.3 µs | N/A |
| 🆕 | WallTime | for[10M_u64] | N/A | 202 µs | N/A |
| 🆕 | WallTime | alp_decode[10%] | N/A | 6.9 ms | N/A |
| 🆕 | WallTime | for[10M_u16] | N/A | 94.8 µs | N/A |
| 🆕 | WallTime | for[10M_u8] | N/A | 73.1 µs | N/A |
| 🆕 | WallTime | mix[0%_in/100%_out] | N/A | 226.3 µs | N/A |
| 🆕 | WallTime | runend[100M_i32_runlen_10] | N/A | 1.4 ms | N/A |
| 🆕 | WallTime | alp_decode[1%] | N/A | 6.6 ms | N/A |
| 🆕 | WallTime | mix[100%_in/0%_out] | N/A | 449.5 µs | N/A |
| 🆕 | WallTime | for[10M_u16] | N/A | 72.1 µs | N/A |
| 🆕 | WallTime | for[10M_u32] | N/A | 114.2 µs | N/A |
| 🆕 | WallTime | runend[10M_i32_runlen_10] | N/A | 160.6 µs | N/A |
| 🆕 | WallTime | dict[10M_u32_values_u8_codes] | N/A | 129.5 µs | N/A |
| 🆕 | WallTime | runend[10M_i32_runlen_100000] | N/A | 90.1 µs | N/A |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing ad/gpu-codspeed (1fdbfb8) with develop (d2d79f0)
Footnotes
1. 33 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.