feat: add CodeArtifact support for ModelTrainer and FrameworkProcessor requirements.txt installation by humanzz · Pull Request #5772 · aws/sagemaker-python-sdk

humanzz · 2026-04-17T10:55:13Z

SDK v3's ModelTrainer and FrameworkProcessor override the container entrypoint with SDK-generated scripts (sm_train.sh, runproc.sh). This bypasses the container's own entrypoint, in which sagemaker-training-toolkit handled CA_REPOSITORY_ARN-based CodeArtifact authentication. As a result, CodeArtifact support broke for both training (Bug 4) and processing (Bug 3), as reported in #5765.

This is the stopgap solution proposed in this comment: a self-contained install_requirements.py script that the SDK uploads to the container alongside its generated entrypoint scripts.

- Add install_requirements.py in sagemaker-core: reads CA_REPOSITORY_ARN from the container environment; CodeArtifact setup is a no-op if it is unset
- Try boto3 first (matching sagemaker-training-toolkit), fall back to the AWS CLI, hard-fail if neither is available
- Wire into ModelTrainer: copy the script into sm_drivers/scripts/ and update the INSTALL_REQUIREMENTS templates to call it instead of bare pip install
- Wire into FrameworkProcessor: upload the script as a sibling file alongside runproc.sh and update the generated script to call it

Issue #, if available:

#5765

Description of changes:

When the SDK overrides a container's entrypoint — as ModelTrainer does for training jobs (Bug 4) and FrameworkProcessor does for processing jobs (Bug 3) — the container's native sagemaker-training-toolkit is bypassed. This toolkit handled CA_REPOSITORY_ARN-based CodeArtifact authentication for requirements.txt installation via boto3. Without it, pip install -r requirements.txt runs against public PyPI, failing in VPC-isolated environments or when packages are only available in a private CodeArtifact repository.

See #5765 and the detailed analysis comment for full context.

Solution: Stopgap `install_requirements.py`

A self-contained Python script in sagemaker-core that handles CodeArtifact authentication before installing requirements. It:

- Reads CA_REPOSITORY_ARN from the container environment; if it is not set, does a normal pip install
- Tries boto3 first (matching sagemaker-training-toolkit's approach) to build an authenticated pip index URL
- Falls back to AWS CLI (aws codeartifact login --tool pip) if boto3 is unavailable
- Hard-fails with a clear error if CA_REPOSITORY_ARN is set but neither boto3 nor AWS CLI is available
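For illustration, the boto3 path described above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the helper names `RepoRef`, `parse_repository_arn`, `build_index_url`, and `index_url_from_env` are hypothetical, and the sketch assumes the ARN has the `repository/DOMAIN/REPO` shape used elsewhere in this PR. The boto3 calls are the standard CodeArtifact client operations `get_authorization_token` and `get_repository_endpoint`.

```python
import os
from typing import NamedTuple, Optional


class RepoRef(NamedTuple):
    """Pieces of a CodeArtifact repository ARN (hypothetical helper type)."""
    region: str
    account: str
    domain: str
    repository: str


def parse_repository_arn(arn: str) -> RepoRef:
    """Split arn:aws:codeartifact:REGION:ACCOUNT:repository/DOMAIN/REPO."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "codeartifact":
        raise ValueError(f"Not a CodeArtifact ARN: {arn!r}")
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "repository":
        raise ValueError(f"Not a CodeArtifact repository ARN: {arn!r}")
    return RepoRef(parts[3], parts[4], resource[1], resource[2])


def build_index_url(endpoint: str, token: str) -> str:
    """Embed the token as basic-auth credentials and point at the /simple/ index."""
    scheme, _, rest = endpoint.partition("://")
    return f"{scheme}://aws:{token}@{rest.rstrip('/')}/simple/"


def index_url_from_env() -> Optional[str]:
    """Return an authenticated index URL, or None when CA_REPOSITORY_ARN is unset."""
    arn = os.environ.get("CA_REPOSITORY_ARN")
    if not arn:
        return None  # caller falls back to a plain pip install
    ref = parse_repository_arn(arn)

    import boto3  # imported lazily so the unset case works without boto3

    client = boto3.client("codeartifact", region_name=ref.region)
    token = client.get_authorization_token(
        domain=ref.domain, domainOwner=ref.account
    )["authorizationToken"]
    endpoint = client.get_repository_endpoint(
        domain=ref.domain,
        domainOwner=ref.account,
        repository=ref.repository,
        format="pypi",
    )["repositoryEndpoint"]
    return build_index_url(endpoint, token)
```

The resulting URL could then be handed to pip, for example through `pip install --index-url`. The AWS CLI fallback is not shown here.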

The script can be used as:

- A standalone script: python install_requirements.py requirements.txt (used by bash-based entrypoints)
- An importable module: from sagemaker.core.utils.install_requirements import configure_pip, install_requirements (for Python-native callers like @remote or ModelBuilder)

Changes

| File | Change |
| --- | --- |
| `sagemaker-core/.../utils/install_requirements.py` | New module with `configure_pip()`, `install_requirements()`, `main()`, and `CodeArtifactAuthMethod` enum |
| `sagemaker-core/tests/unit/test_install_requirements.py` | 22 unit tests covering all auth methods, fallback chains, error propagation |
| `sagemaker-train/.../templates.py` | `INSTALL_REQUIREMENTS` and `INSTALL_AUTO_REQUIREMENTS` now call `install_requirements.py` instead of bare `pip install` |
| `sagemaker-train/.../model_trainer.py` | Copy `install_requirements.py` from sagemaker-core into `sm_drivers/scripts/` at runtime |
| `sagemaker-core/.../processing.py` | Upload `install_requirements.py` as sibling file alongside `runproc.sh` and `sourcedir.tar.gz`; update generated script to call it |
| `sagemaker-core/tests/unit/test_processing.py` | Verify `install_requirements.py` is uploaded and referenced in generated script |
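To make the template change concrete, here is a rough sketch of how an entrypoint template might render the install step. It is an assumption-laden illustration, not the actual `INSTALL_REQUIREMENTS` string from `templates.py`: the names `INSTALL_REQUIREMENTS_SKETCH` and `render_install_step` are hypothetical, and the default scripts directory is taken from the path that appears in the training job logs for this PR.

```python
import shlex
from string import Template

# Hypothetical stand-in for the real template; the actual string may differ.
INSTALL_REQUIREMENTS_SKETCH = Template(
    'echo "Installing requirements"\n'
    "$python $scripts_dir/install_requirements.py $requirements\n"
)


def render_install_step(
    requirements: str,
    python: str = "python3",
    scripts_dir: str = "/opt/ml/input/data/sm_drivers/scripts",
) -> str:
    """Render the bash lines that call the installer instead of bare pip."""
    return INSTALL_REQUIREMENTS_SKETCH.substitute(
        python=python,
        scripts_dir=scripts_dir,
        # Quote the file name so paths with spaces survive the shell.
        requirements=shlex.quote(requirements),
    )


print(render_install_step("requirements.txt"))
```

Delegating to the script keeps the bash template free of authentication logic, so the same installer can serve both the training and processing entrypoints.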

What this covers

| Job Type | Class | CodeArtifact with this PR |
| --- | --- | --- |
| Training | `ModelTrainer` | ✅ Fixed: `install_requirements.py` in `sm_drivers/scripts/` |
| Processing | `FrameworkProcessor` | ✅ Fixed: `install_requirements.py` uploaded as sibling file |
| Tuning | `Tuner` | ✅ Already works: Tuner uses container's native toolkit (not affected by this PR) |
| Inference | `ModelBuilder` | ✅ Already works: SDK doesn't override inference entrypoints |

What this does NOT cover

| Path | Status | Notes |
| --- | --- | --- |
| `@remote` function (`runtime_environment_manager.py`) | ❌ Not wired | Has its own `_install_requirements_txt()` that does bare `pip install`. Could use `configure_pip()` via import. |
| `sagemaker-serve` (`requirements_manager.py`) | ❌ Not wired | Same: bare `pip install` in-process. Could import `configure_pip()`. |
| `sagemaker-core/modules` (`templates.py`) | ❌ Not wired | Duplicate of `sagemaker-train/templates.py` without `INSTALL_REQUIREMENTS`. Lower priority. |

These are follow-up opportunities — the module is available for them to import.

Known risks

- Tuning jobs depend on the container's toolkit: The CreateHyperParameterTuningJob API uses HyperParameterAlgorithmSpecification which lacks ContainerEntrypoint, so the Tuner cannot use sm_train.sh. If future containers drop sagemaker-training-toolkit, tuning jobs will lose CodeArtifact support with no SDK-side fix possible until the API adds entrypoint support.
- boto3 availability in future containers: Current PyTorch training containers (2.7–2.9) include boto3. New DLC base images on the main branch do not. The script's fallback to AWS CLI mitigates this, but if neither is available, the script hard-fails. The long-term solution is a shared package with boto3 as a declared dependency (see analysis).
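The availability check behind the second risk can be sketched as below. This is only an illustration under assumptions: the function names and the string return values are hypothetical and are not the `CodeArtifactAuthMethod` enum from this PR. The sketch probes for boto3 with `importlib.util.find_spec` and for the AWS CLI with `shutil.which`.

```python
import importlib.util
import shutil
from typing import Callable, Optional


def boto3_available() -> bool:
    """True if boto3 can be imported in this interpreter."""
    return importlib.util.find_spec("boto3") is not None


def aws_cli_available() -> bool:
    """True if an `aws` executable is on PATH."""
    return shutil.which("aws") is not None


def pick_auth_method(
    ca_repository_arn: Optional[str],
    has_boto3: Callable[[], bool] = boto3_available,
    has_cli: Callable[[], bool] = aws_cli_available,
) -> str:
    """Choose how to authenticate; the probes are injectable for testing."""
    if not ca_repository_arn:
        return "none"  # no CodeArtifact configured: plain pip install
    if has_boto3():
        return "boto3"  # preferred, matches sagemaker-training-toolkit
    if has_cli():
        return "aws_cli"  # fallback: aws codeartifact login --tool pip
    raise RuntimeError(
        "CA_REPOSITORY_ARN is set but neither boto3 nor the AWS CLI is available"
    )
```

Failing loudly in the last branch avoids silently installing from public PyPI when a private repository was requested.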

Long-term solution

This PR is a stopgap that works within the SDK alone. The long-term solution requires coordination between the SDK and DLC to ensure that both the container's default entrypoint and any SDK-overridden entrypoint have access to the same CodeArtifact-aware installer — ideally a shared package with boto3 as a declared dependency, installed in all SageMaker containers. See the proposed ideal solution for details.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

humanzz · 2026-04-17T11:07:11Z

I've also tested this with my own code, as an integration test, to verify the behaviour of both job types.

Training Job (`ModelTrainer`)

from sagemaker.train import ModelTrainer
from sagemaker.core.training.configs import Compute, SourceCode

# `sm_session` and `inputs` (the input data config) are assumed to be defined earlier.
trainer = ModelTrainer(
    training_image="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.10-cpu-py313",
    role="arn:aws:iam::123456789:role/MyRole",
    source_code=SourceCode(
        entry_script="train.py", source_dir="src", requirements="requirements.txt"
    ),
    compute=Compute(instance_type="ml.m5.xlarge", instance_count=1),
    sagemaker_session=sm_session,
    environment={"CA_REPOSITORY_ARN": "arn:aws:codeartifact:us-west-2:ACCOUNT:repository/DOMAIN/REPO"},
)
trainer.train(input_data_config=inputs, wait=False)

CloudWatch logs — install_requirements.py ran from sm_drivers/scripts/, authenticated via boto3, pip resolved from CodeArtifact:

Installing requirements
++ /usr/local/bin/python3 /opt/ml/input/data/sm_drivers/scripts/install_requirements.py requirements.txt
Looking in indexes: https://aws:****@amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/
  Downloading https://amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/pyarrow/20.0.0/pyarrow-20.0.0-cp313-cp313-manylinux_2_28_x86_64.whl (42.3 MB)
  Downloading https://amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/sentence-transformers/5.4.1/sentence_transformers-5.4.1-py3-none-any.whl (571 kB)

Training job completed successfully. ✅

Processing Job (`FrameworkProcessor`)

from sagemaker.core.processing import FrameworkProcessor

# `sm_session` is assumed to be defined earlier.
processor = FrameworkProcessor(
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.10-gpu-py313",
    command=["python3"],
    role="arn:aws:iam::123456789:role/MyRole",
    instance_count=1,
    instance_type="ml.g6.4xlarge",
    sagemaker_session=sm_session,
    env={"CA_REPOSITORY_ARN": "arn:aws:codeartifact:us-west-2:ACCOUNT:repository/DOMAIN/REPO"},
)
processor.run(code="my_script.py", source_dir="src", wait=False)

CloudWatch logs — install_requirements.py uploaded as sibling file, authenticated via boto3:

Files in /opt/ml/processing/input/code/ before extraction:
-rw-r--r-- 1 root root  6652 Apr 17 10:22 install_requirements.py
-rw-r--r-- 1 root root   685 Apr 17 10:22 runproc.sh
-rw-r--r-- 1 root root 81582 Apr 17 10:22 sourcedir.tar.gz

Looking in indexes: https://aws:****@amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/
  Downloading https://amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/pyarrow/20.0.0/pyarrow-20.0.0-cp313-cp313-manylinux_2_28_x86_64.whl (42.3 MB)
  Downloading https://amazon-ACCOUNT.d.codeartifact.us-west-2.amazonaws.com/pypi/REPO/simple/sentence-transformers/5.4.1/sentence_transformers-5.4.1-py3-none-any.whl (571 kB)

Processing job completed successfully. ✅

humanzz requested a deployment to manual-approval April 17, 2026 10:55 — with GitHub Actions Waiting

humanzz temporarily deployed to manual-approval April 17, 2026 10:55 — with GitHub Actions Inactive

aviruthen approved these changes Apr 17, 2026

zhaoqizqwang approved these changes Apr 17, 2026

humanzz wants to merge 1 commit into aws:master from humanzz:codeartifact-fix
