fix: wire FrameworkProcessor code_location into code upload paths #5773

Open
humanzz wants to merge 1 commit into aws:master from humanzz:frameworkprocessor-code-location

humanzz commented Apr 17, 2026

FrameworkProcessor.__init__ accepts code_location, and the docstring states that it controls where code is uploaded. However, _package_code and _create_and_upload_runproc always built their upload paths from default_bucket() and ignored the setting (Bug 2 reported in #5765). This PR makes the following changes:

  • Add _s3_code_prefix() helper that returns code_location when set, falling back to default_bucket()/default_bucket_prefix
  • Use _s3_code_prefix() in _package_code for sourcedir.tar.gz upload
  • Use _s3_code_prefix() in _create_and_upload_runproc pipeline path for runproc.sh upload
  • Non-pipeline runproc.sh and _patch_inputs_with_payload already derive their URIs from _package_code's output, so they inherit the fix

Issue #, if available:

#5765 (Bug 2)

Problem

processor = FrameworkProcessor(
    ...,
    code_location="s3://my-custom-bucket",  # ← accepted but ignored
)
processor.run(code="my_script.py", source_dir="src", wait=False)
# Code uploads to s3://sagemaker-us-west-2-123456789/... instead of s3://my-custom-bucket/...

Fix

A single _s3_code_prefix() method that all code upload paths use:

def _s3_code_prefix(self):
    if self.code_location:
        return self.code_location
    return s3.s3_path_join(
        "s3://", self.sagemaker_session.default_bucket(),
        self.sagemaker_session.default_bucket_prefix or "",
    )

All S3 URIs for code artifacts (sourcedir.tar.gz, runproc.sh, pipeline-hashed paths) now flow through this method.
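
To make the fallback behaviour easier to try out without the SDK installed, here is a minimal, self-contained sketch of the same resolution logic. `FakeSession`, `ProcessorSketch`, and the local `s3_path_join` are hypothetical stand-ins written for this illustration; they are not the SDK's classes, and the real `s3.s3_path_join` may handle more edge cases than this simplified version.

```python
def s3_path_join(*parts):
    """Simplified stand-in for the SDK's s3.s3_path_join (assumption)."""
    scheme = ""
    cleaned = []
    for part in parts:
        if part.startswith("s3://"):
            scheme = "s3://"
            part = part[len("s3://"):]
        part = part.strip("/")
        if part:  # skip empty segments such as a missing bucket prefix
            cleaned.append(part)
    return scheme + "/".join(cleaned)


class FakeSession:
    """Hypothetical session exposing only what the helper reads."""

    def __init__(self, bucket, prefix=None):
        self._bucket = bucket
        self.default_bucket_prefix = prefix

    def default_bucket(self):
        return self._bucket


class ProcessorSketch:
    """Hypothetical processor carrying only the prefix logic."""

    def __init__(self, sagemaker_session, code_location=None):
        self.sagemaker_session = sagemaker_session
        # Per the PR description, __init__ strips a trailing slash.
        self.code_location = code_location.rstrip("/") if code_location else None

    def _s3_code_prefix(self):
        # An explicit code_location wins; otherwise use the session defaults.
        if self.code_location:
            return self.code_location
        return s3_path_join(
            "s3://",
            self.sagemaker_session.default_bucket(),
            self.sagemaker_session.default_bucket_prefix or "",
        )


session = FakeSession("test-bucket", "sagemaker")
print(ProcessorSketch(session)._s3_code_prefix())
print(ProcessorSketch(session, "s3://my-custom-bucket/")._s3_code_prefix())
```

With these stand-ins, the first call resolves to the session's default bucket and prefix, and the second returns the custom location with its trailing slash removed.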

Changes

| File | Change |
| --- | --- |
| sagemaker-core/.../processing.py | Add _s3_code_prefix(), use it in _package_code and _create_and_upload_runproc |
| sagemaker-core/tests/unit/test_processing.py | 3 new tests: custom code_location, default bucket fallback, trailing slash handling |

Testing

  • 83 processing unit tests pass (80 existing + 3 new)
  • New tests verify:
    • code_location="s3://my-custom-bucket/my-prefix" → upload URI starts with that prefix
    • No code_location → upload URI uses mock session's default_bucket (test-bucket/sagemaker)
    • code_location="s3://bucket/" (trailing slash) → stripped by __init__, works correctly
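
For readers who want a feel for the three cases, the following is a rough sketch of how such checks could look. It is not the test code from this PR: `resolve_code_prefix` and `make_session` are hypothetical names, the function is a stand-in for `_s3_code_prefix()`, and it folds the trailing-slash stripping (done by `__init__` in the SDK) into the function for brevity.

```python
from unittest.mock import Mock


def resolve_code_prefix(code_location, session):
    """Hypothetical stand-in for _s3_code_prefix(); not the SDK code."""
    if code_location:
        # The SDK strips the trailing slash in __init__; done here for brevity.
        return code_location.rstrip("/")
    parts = [session.default_bucket(), session.default_bucket_prefix or ""]
    return "s3://" + "/".join(p.strip("/") for p in parts if p)


def make_session():
    """Mock session mirroring the defaults named in the test list."""
    session = Mock()
    session.default_bucket.return_value = "test-bucket"
    session.default_bucket_prefix = "sagemaker"
    return session


def test_custom_code_location():
    session = make_session()
    prefix = resolve_code_prefix("s3://my-custom-bucket/my-prefix", session)
    assert prefix.startswith("s3://my-custom-bucket/my-prefix")
    # The default bucket should not be consulted when code_location is set.
    session.default_bucket.assert_not_called()


def test_default_bucket_fallback():
    assert resolve_code_prefix(None, make_session()) == "s3://test-bucket/sagemaker"


def test_trailing_slash():
    assert resolve_code_prefix("s3://bucket/", make_session()) == "s3://bucket"
```

The functions follow pytest naming conventions, so they can be collected by pytest or called directly.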

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
