42 changes: 42 additions & 0 deletions .github/workflows/cd.yml
@@ -102,6 +102,48 @@ jobs:
with:
verbose: true

upload-gpu-test-asset:
name: Upload gpu_test binary to release
needs: [release-please, pypi-publish]
if: ${{ always() && (needs.release-please.outputs.release_created || (github.event_name == 'workflow_dispatch' && inputs.force_publish == 'true')) }}
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Checkout code
uses: actions/checkout@v5
with:
ref: ${{ needs.release-please.outputs.tag_name }}

- name: Login to Docker Hub (optional)
if: ${{ vars.DOCKERHUB_USERNAME != '' }}
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Compile gpu_test binary
run: |
cd build_tools
./compile_gpu_test.sh
cd ..
test -f runpod/serverless/binaries/gpu_test

- name: Generate sha256 checksum
working-directory: runpod/serverless/binaries
run: |
sha256sum gpu_test > gpu_test.sha256
cat gpu_test.sha256

- name: Upload binary to release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh release upload "${{ needs.release-please.outputs.tag_name }}" \
runpod/serverless/binaries/gpu_test \
runpod/serverless/binaries/gpu_test.sha256 \
--clobber

# TODO: Re-enable after optimizing (17 parallel jobs each sleeping 5min is wasteful).
# Consider a single job that sleeps once then dispatches sequentially.
# notify-workers:
3 changes: 3 additions & 0 deletions .gitignore
@@ -142,3 +142,6 @@ runpod/_version.py

*.lock
benchmark_results/

# Locally-compiled CUDA test binary — CI compiles per-release
runpod/serverless/binaries/gpu_test
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
# Changelog

## Unreleased

### Changed

- **gpu_test binary no longer bundled in the PyPI wheel.** Fixes installs on
Nix and other non-glibc platforms ([#498](https://github.com/runpod/runpod-python/issues/498)).
Runtime falls back to an `nvidia-smi`-based availability check when the
binary is missing. Runpod GPU workers should add
`RUN runpod install-gpu-test` after `pip install runpod` to restore the
native CUDA memory-allocation test.

### Added

- `runpod install-gpu-test` CLI command — downloads the `gpu_test` binary
from the GitHub release matching the installed runpod version, verifies
sha256, and installs it into the package's `serverless/binaries/` directory.

## [1.9.0](https://github.com/runpod/runpod-python/compare/v1.8.2...v1.9.0) (2026-04-08)


2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,4 +1,4 @@
include runpod/serverless/binaries/gpu_test
include runpod/serverless/binaries/README.md
include build_tools/gpu_test.c
include build_tools/compile_gpu_test.sh
exclude runpod/serverless/binaries/gpu_test
19 changes: 18 additions & 1 deletion docs/serverless/gpu_binary_compilation.md
@@ -4,13 +4,30 @@ This document explains how to rebuild the `gpu_test` binary for GPU health check

## When to Rebuild

You typically **do not need to rebuild** the binary. A pre-compiled version is included in the runpod-python package and works across most GPU environments. Rebuild only when:
You typically **do not need to rebuild** the binary. A pre-compiled version is published as a GitHub release asset and can be installed with `runpod install-gpu-test` (see next section). Rebuild only when:

- You need to modify the GPU test logic (in `build_tools/gpu_test.c`)
- Targeting specific new CUDA versions
- Adding support for new GPU architectures
- Fixing compilation issues for your specific environment

## Installing from a release

As of v1.10.0, the `gpu_test` binary is **not bundled** in the PyPI wheel so the package stays platform-agnostic (fixes [#498](https://github.com/runpod/runpod-python/issues/498) — Nix / non-glibc builds).

Runpod GPU workers that want the native CUDA memory-allocation test back should run:

```bash
pip install runpod
runpod install-gpu-test
```

This downloads `gpu_test` from the GitHub release matching the installed runpod version, verifies its sha256, and places it at `runpod/serverless/binaries/gpu_test` inside the installed package.

If the binary is missing, the runtime falls back to an `nvidia-smi`-based availability check (no memory-allocation test).

Advanced users can override the binary path with the `RUNPOD_BINARY_GPU_TEST_PATH` environment variable.
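The lookup order above can be sketched as follows. This is an illustrative reconstruction, not runpod's actual code; the helper name `resolve_gpu_test` is invented for the example:

```python
import os
import shutil
from pathlib import Path


def resolve_gpu_test(package_dir: Path) -> str:
    """Sketch of the resolution order: env override, installed binary, fallback."""
    override = os.environ.get("RUNPOD_BINARY_GPU_TEST_PATH")
    if override and Path(override).is_file():
        return override  # explicit override wins
    installed = package_dir / "serverless" / "binaries" / "gpu_test"
    if installed.is_file():
        return str(installed)  # placed by `runpod install-gpu-test`
    if shutil.which("nvidia-smi"):
        return "nvidia-smi"  # degraded check: availability only, no memory-allocation test
    return ""  # no GPU tooling found; skip the check
```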

## Prerequisites

You need Docker installed to build the binary:
13 changes: 11 additions & 2 deletions docs/serverless/worker_fitness_checks.md
@@ -169,10 +169,19 @@ GPU workers automatically run a built-in fitness check that validates GPU memory
The check:
- Tests actual GPU memory allocation (cudaMalloc) to ensure GPUs are accessible
- Enumerates all detected GPUs and validates each one
- Uses a native CUDA binary for comprehensive testing
- Falls back to Python-based checks if the binary is unavailable
- Uses a native CUDA binary for comprehensive testing (opt-in; see below)
- Falls back to an `nvidia-smi` availability check if the binary is unavailable
- Skips silently on CPU-only workers (allows same code for CPU/GPU)

**Installing the native binary**: as of v1.10.0 the `gpu_test` binary is not
bundled in the PyPI wheel. Runpod GPU worker Dockerfiles should add:

```dockerfile
RUN pip install runpod && runpod install-gpu-test
```

See [GPU Binary Compilation](./gpu_binary_compilation.md) for details.

```python
import runpod

1 change: 0 additions & 1 deletion pyproject.toml
@@ -49,7 +49,6 @@ include-package-data = true

[tool.setuptools.package-data]
runpod = [
"serverless/binaries/gpu_test",
"serverless/binaries/README.md",
]

2 changes: 2 additions & 0 deletions runpod/cli/entry.py
@@ -8,6 +8,7 @@

from .groups.config.commands import config_wizard
from .groups.exec.commands import exec_cli
from .groups.install.commands import install_gpu_test_cli
from .groups.pod.commands import pod_cli
from .groups.project.commands import project_cli
from .groups.ssh.commands import ssh_cli
@@ -24,3 +25,4 @@ def runpod_cli():
runpod_cli.add_command(pod_cli) # runpod pod
runpod_cli.add_command(exec_cli) # runpod exec
runpod_cli.add_command(project_cli) # runpod project
runpod_cli.add_command(install_gpu_test_cli) # runpod install-gpu-test
1 change: 1 addition & 0 deletions runpod/cli/groups/install/__init__.py
@@ -0,0 +1 @@
"""GPU test binary installer CLI."""
65 changes: 65 additions & 0 deletions runpod/cli/groups/install/commands.py
@@ -0,0 +1,65 @@
"""
CLI commands for installing optional runpod binaries.
"""

from __future__ import annotations

import sys
from pathlib import Path

import click

import runpod
from runpod.version import get_version

from .functions import (
BinaryChecksumMismatch,
BinaryDownloadError,
download_gpu_test_binary,
)


def _default_install_path() -> Path:
"""Package-local binaries dir — the same path _binary_helpers checks."""
return Path(runpod.__file__).parent / "serverless" / "binaries" / "gpu_test"


@click.command(
"install-gpu-test",
help=(
"Download the optional gpu_test CUDA health-check binary from the "
"GitHub release matching the installed runpod version. "
"Runpod GPU workers only — no-op on CPU-only environments."
),
Comment on lines +29 to +33 (Copilot AI, Apr 17, 2026):
The command help text says this is a "no-op on CPU-only environments", but install_gpu_test_cli always attempts the download regardless of GPU availability. Either remove that claim from the help text or implement an explicit CPU-only short-circuit so behavior matches the CLI help.
)
@click.option(
"--version",
"version",
default=None,
help="Release tag to download (defaults to installed runpod version).",
)
@click.option(
"--dest",
"dest",
type=click.Path(dir_okay=False, writable=True, path_type=Path),
default=None,
help="Override destination path. Defaults to the package's binaries dir.",
)
def install_gpu_test_cli(version: str | None, dest: Path | None) -> None:
version = version or get_version()
if version == "unknown":
click.echo(
"Cannot determine installed runpod version; pass --version explicitly.",
err=True,
)
sys.exit(1)

target = dest or _default_install_path()

try:
installed_at = download_gpu_test_binary(version=version, dest=target)
except (BinaryDownloadError, BinaryChecksumMismatch) as exc:
click.echo(f"Failed to install gpu_test: {exc}", err=True)
sys.exit(1)

click.echo(f"Installed gpu_test at {installed_at}")
108 changes: 108 additions & 0 deletions runpod/cli/groups/install/functions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
"""
Download and install the optional gpu_test binary from a GitHub release.

The binary is NOT bundled in PyPI wheels to keep them universal
(py3-none-any). Runpod GPU workers that want the native CUDA memory
allocation test can fetch it from the GitHub release matching their
installed runpod version.

See docs/serverless/gpu_binary_compilation.md for usage.
"""

from __future__ import annotations

import hashlib
import os
import tempfile
import urllib.error
import urllib.request
from dataclasses import dataclass
from pathlib import Path

GITHUB_REPO = "runpod/runpod-python"
DOWNLOAD_TIMEOUT_SECONDS = 60


@dataclass(frozen=True)
class ReleaseAssetUrls:
binary: str
checksum: str


class BinaryDownloadError(RuntimeError):
"""Raised when the binary or checksum cannot be fetched."""


class BinaryChecksumMismatch(RuntimeError):
"""Raised when the downloaded binary's sha256 does not match the expected value."""


def release_asset_urls(version: str) -> ReleaseAssetUrls:
"""Build release-asset URLs for a given runpod version.

Accepts either '1.9.0' or 'v1.9.0' — the leading 'v' is optional.
"""
clean = version.lstrip("v")
Comment (Copilot AI, Apr 17, 2026):
version.lstrip("v") strips all leading v characters (e.g., "vv1.2.3" becomes "1.2.3"), which is broader than intended. Prefer removing only a single leading "v" (e.g., removeprefix("v") or version[1:] if version.startswith("v") else version) to avoid surprising tag construction.

Suggested change:
-    clean = version.lstrip("v")
+    clean = version[1:] if version.startswith("v") else version
base = f"https://github.com/{GITHUB_REPO}/releases/download/v{clean}/gpu_test"
return ReleaseAssetUrls(binary=base, checksum=f"{base}.sha256")


def _fetch(url: str) -> bytes:
try:
with urllib.request.urlopen(url, timeout=DOWNLOAD_TIMEOUT_SECONDS) as response:
return response.read()
except urllib.error.HTTPError as exc:
raise BinaryDownloadError(
f"HTTP {exc.code} fetching {url}: {exc.reason}"
) from exc
except urllib.error.URLError as exc:
raise BinaryDownloadError(
f"Network error fetching {url}: {exc.reason!r}"
) from exc


def _parse_sha256(checksum_body: bytes) -> str:
"""Extract the hex digest from a 'sha256 filename' line."""
text = checksum_body.decode("utf-8", errors="replace").strip()
first_token = text.split()[0] if text else ""
if len(first_token) != 64:
raise BinaryDownloadError(
f"checksum file did not contain a sha256 digest: {text!r}"
)
return first_token.lower()
Comment on lines +64 to +72 (Copilot AI, Apr 17, 2026):
_parse_sha256() only checks token length, not that it is valid hex. A 64-character non-hex token will incorrectly pass parsing and then fail later as BinaryChecksumMismatch, which misclassifies the problem. Validate with a hex regex / string.hexdigits check and raise BinaryDownloadError when the checksum file content is malformed.
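A validation along those lines could look like this. It is an illustrative sketch: it raises a plain `ValueError` so the snippet stays self-contained, where the real code would raise `BinaryDownloadError`:

```python
import string


def parse_sha256(checksum_body: bytes) -> str:
    """Extract a sha256 hex digest, rejecting malformed checksum files."""
    text = checksum_body.decode("utf-8", errors="replace").strip()
    first_token = text.split()[0] if text else ""
    # Require 64 chars AND all hex: a non-hex 64-char token is a malformed
    # checksum file, not a checksum mismatch.
    if len(first_token) != 64 or not all(c in string.hexdigits for c in first_token):
        raise ValueError(f"checksum file did not contain a sha256 digest: {text!r}")
    return first_token.lower()
```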


def download_gpu_test_binary(version: str, dest: Path) -> Path:
"""Download gpu_test from the matching GitHub release and install it at dest.

Verifies sha256 before writing to the final destination. On checksum
mismatch or HTTP failure, no partial file is left at dest.

Returns the destination path on success.
"""
urls = release_asset_urls(version)

checksum_body = _fetch(urls.checksum)
expected_sha = _parse_sha256(checksum_body)

binary_body = _fetch(urls.binary)
actual_sha = hashlib.sha256(binary_body).hexdigest()
if actual_sha != expected_sha:
raise BinaryChecksumMismatch(
f"sha256 mismatch for {urls.binary} "
f"({len(binary_body)} bytes): "
f"expected {expected_sha}, got {actual_sha}"
)

dest.parent.mkdir(parents=True, exist_ok=True)
with tempfile.NamedTemporaryFile(dir=dest.parent, delete=False) as tmp:
tmp.write(binary_body)
tmp_path = Path(tmp.name)

try:
os.chmod(tmp_path, 0o750)

Check failure (Code scanning / CodeQL): Overly permissive file permissions (High). Overly permissive mask in chmod sets file to group readable.
os.replace(tmp_path, dest)
except OSError:
tmp_path.unlink(missing_ok=True)
raise
return dest
21 changes: 19 additions & 2 deletions runpod/serverless/binaries/README.md
@@ -4,7 +4,24 @@ Pre-compiled GPU health check binary for Linux x86_64.

## Files

- `gpu_test` - Compiled binary for CUDA GPU memory allocation testing
- `gpu_test` - Compiled binary for CUDA GPU memory allocation testing (not
bundled in the PyPI wheel; see below)

## Availability

As of runpod v1.10.0 this binary is **not included** in the PyPI wheel. The
universal `py3-none-any` wheel would otherwise advertise itself as
platform-agnostic while shipping a Linux x86_64 ELF, which breaks Nix and
other strict packagers (see [#498](https://github.com/runpod/runpod-python/issues/498)).

Runpod GPU workers can download the matching binary with:

```bash
runpod install-gpu-test
```

This fetches the asset from the GitHub release matching the installed runpod
version and verifies its sha256.
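The checksum asset uses the standard `sha256sum` format, so the verification can also be reproduced by hand. A minimal illustration, using a placeholder file in place of the real downloaded binary:

```bash
# Stand-in for the downloaded asset; on a worker this is the release binary.
printf 'placeholder' > gpu_test
# CI generates the companion checksum file like this...
sha256sum gpu_test > gpu_test.sha256
# ...and verification is a one-liner (prints "gpu_test: OK" on success).
sha256sum -c gpu_test.sha256
```

`runpod install-gpu-test` performs the equivalent comparison in Python rather than shelling out.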

## Compatibility

@@ -29,7 +46,7 @@ GPU 0 memory allocation test passed.

## Building

See `build_tools/compile_gpu_test.sh` and `docs/serverless/gpu_binary_compilation.md` for compilation instructions.
See `build_tools/compile_gpu_test.sh` and `docs/serverless/gpu_binary_compilation.md`.

## License

Binary file removed runpod/serverless/binaries/gpu_test
Binary file not shown.
1 change: 0 additions & 1 deletion setup.py
@@ -60,7 +60,6 @@
include_package_data=True,
package_data={
"runpod": [
"serverless/binaries/gpu_test",
"serverless/binaries/README.md",
]
},
1 change: 1 addition & 0 deletions tests/test_cli/test_install/__init__.py
@@ -0,0 +1 @@
