fix: handle BaseException in trace spans to prevent span leaks on Key… #2068

Open
Di-Is wants to merge 3 commits into strands-agents:main from Di-Is:fix/base-exception-span-handling
Conversation

@Di-Is
Contributor

@Di-Is Di-Is commented Apr 5, 2026

Description

PR #2032 switched spans from end_on_exit=True to explicit span.end() calls, fixing span timing. However, the existing except Exception handlers don't catch BaseException subclasses like KeyboardInterrupt and asyncio.CancelledError, so spans leak when these occur.

This PR ensures spans are properly closed on all exit paths, including BaseException interruptions and premature async generator teardown.
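The gap described above can be reproduced outside the SDK. The following is a minimal sketch, not the Strands tracer: `FakeSpan` and both runner functions are hypothetical names introduced only for illustration. It shows that an `except Exception` handler never runs for `KeyboardInterrupt` or `asyncio.CancelledError`, so a span ended only in that handler stays open.

```python
import asyncio


class FakeSpan:
    """Hypothetical stand-in for a trace span; not the Strands tracer API."""

    def __init__(self) -> None:
        self.ended = False

    def end(self) -> None:
        self.ended = True


def run_with_exception_handler(span: FakeSpan, exc: BaseException) -> None:
    """Ends the span only for Exception subclasses (the pre-fix behaviour)."""
    try:
        raise exc
    except Exception:
        span.end()
        raise


def run_with_base_exception_handler(span: FakeSpan, exc: BaseException) -> None:
    """Ends the span for every exception, including interruptions."""
    try:
        raise exc
    except BaseException:
        span.end()
        raise


def span_ended(runner, exc: BaseException) -> bool:
    """Run one of the functions above and report whether the span was ended."""
    span = FakeSpan()
    try:
        runner(span, exc)
    except BaseException:
        pass  # swallow the re-raised exception; only the span state matters here
    return span.ended


if __name__ == "__main__":
    for exc in (ValueError("boom"), KeyboardInterrupt(), asyncio.CancelledError()):
        print(
            type(exc).__name__,
            span_ended(run_with_exception_handler, exc),
            span_ended(run_with_base_exception_handler, exc),
        )
```

Only the `ValueError` case ends the span under the narrower handler; the other two rely on the `BaseException` handler.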

Follow-up to #2032 (which resolved #1930), split from #1939 per reviewer request.

Public API Changes

No public API changes. Internal tracer method signatures are widened from Exception to BaseException (backward-compatible).

Related Issues

Documentation PR

N/A

Type of Change

Bug fix

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…boardInterrupt

Trace spans were not properly closed when BaseException (e.g. KeyboardInterrupt,
asyncio.CancelledError) was raised. Add explicit BaseException handlers to close
spans and aclose() calls to ensure async generators are cleaned up.
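As a rough illustration of why the explicit `aclose()` calls matter, the sketch below nests two async generators. The names `inner`, `outer_with_aclose`, and `cleanup_log` are hypothetical and unrelated to the SDK's real functions. When the consumer stops early and closes the outer generator, the outer `finally` block closes the inner generator so that its cleanup runs immediately rather than whenever the garbage collector or event loop finalizes it.

```python
import asyncio

# Records the order in which cleanup code runs (illustration only).
cleanup_log: list[str] = []


async def inner():
    """Inner stream whose finally block stands in for closing a span."""
    try:
        for i in range(3):
            yield i
    finally:
        cleanup_log.append("inner closed")


async def outer_with_aclose():
    """Relays events from inner() and closes it explicitly on every exit path."""
    stream = inner()
    try:
        async for item in stream:
            yield item
    finally:
        await stream.aclose()
        cleanup_log.append("outer closed")


async def consume_first() -> list[str]:
    """Take a single event, then tear the outer generator down prematurely."""
    cleanup_log.clear()
    gen = outer_with_aclose()
    async for _ in gen:
        break
    await gen.aclose()
    return list(cleanup_log)


if __name__ == "__main__":
    print(asyncio.run(consume_first()))
```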
@github-actions github-actions Bot added the size/m label Apr 5, 2026
@Di-Is Di-Is requested a deployment to manual-approval April 5, 2026 12:59 — with GitHub Actions Waiting
@Di-Is
Contributor Author

Di-Is commented Apr 5, 2026

@zastrowm Spun out the BaseException handling commit from #1939 as this standalone PR. :)

@codecov

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/strands/event_loop/event_loop.py | 85.71% | 3 Missing ⚠️

):
yield event
)
try:
Contributor

nit: would it be better to move the try above streamed_events = stream_messages(...)?

Contributor Author

Fixed.

    raise
except BaseException as e:
    tracer.end_span_with_error(cycle_span, str(e), e)
    raise

Issue: The except Exception and except BaseException handlers here have identical bodies — both call tracer.end_span_with_error and re-raise. Since Exception is a subclass of BaseException, the except BaseException handler alone would catch both.

Suggestion: Consolidate into a single handler:

except BaseException as e:
    tracer.end_span_with_error(cycle_span, str(e), e)
    raise

Note: The other two locations (lines 233-250 and 398-430) are correctly separated since the except Exception handlers do additional work (wrapping in EventLoopException, retry logic, etc.) that should not apply to BaseException subclasses.
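For contrast with the consolidated handler, here is a small sketch of the case the note describes, where the two handlers must stay separate because ordinary failures get extra treatment. `EventLoopError`, `RecordingTracer`, `run_cycle`, and `outcome` are hypothetical stand-ins, not the SDK's classes; only the method name `end_span_with_error` is taken from the snippet under review.

```python
class EventLoopError(Exception):
    """Hypothetical wrapper standing in for the EventLoopException named above."""


class RecordingTracer:
    """Hypothetical tracer that only records which errors closed a span."""

    def __init__(self) -> None:
        self.closed_with: list[str] = []

    def end_span_with_error(self, span: str, message: str, exc: BaseException) -> None:
        self.closed_with.append(type(exc).__name__)


def run_cycle(tracer: RecordingTracer, exc: BaseException) -> None:
    try:
        raise exc
    except Exception as e:
        # Ordinary failures: close the span, then wrap the error for the caller.
        tracer.end_span_with_error("cycle", str(e), e)
        raise EventLoopError(str(e)) from e
    except BaseException as e:
        # Interruptions: close the span, but re-raise untouched (no wrapping).
        tracer.end_span_with_error("cycle", str(e), e)
        raise


def outcome(exc: BaseException) -> tuple[str, list[str]]:
    """Return the name of the exception that escaped and what the tracer saw."""
    tracer = RecordingTracer()
    try:
        run_cycle(tracer, exc)
    except BaseException as raised:
        return type(raised).__name__, tracer.closed_with
    return "no exception", tracer.closed_with
```

In both branches the span is closed exactly once, but only the `Exception` branch changes the type of the error the caller sees.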

    async for event in streamed_events:
        yield event
finally:
    await streamed_events.aclose()

Issue: streamed_events is initialized to None (line 336), but the finally block unconditionally calls await streamed_events.aclose(). If stream_messages() raises before returning the generator object (e.g., due to a future signature change or dynamic error), this would raise AttributeError: 'NoneType' object has no attribute 'aclose' — masking the original exception.

Currently this is safe because stream_messages is an async generator function (calling it just creates the object without executing any code), but the code is fragile.

Suggestion: Add a guard:

finally:
    if streamed_events is not None:
        await streamed_events.aclose()

The same pattern at line 162 for model_events has the same concern, though it's slightly safer because model_events is assigned before the try block.
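Both halves of this comment can be checked with a short sketch. The names `stream_messages_sketch`, `relay`, `broken_factory`, and `collect` are hypothetical and do not mirror the SDK's signatures. It first shows that calling an async generator function only creates the generator object without running its body, then shows that with the `None` guard in place a factory that fails early surfaces its own error instead of an `AttributeError` from the `finally` block.

```python
import asyncio

# Filled in only once the generator body actually starts executing.
started: list[str] = []


async def stream_messages_sketch():
    """Hypothetical async generator; the body runs on first iteration, not on call."""
    started.append("body ran")
    yield "event"


async def relay(make_stream):
    """Relay events, closing the stream only if it was actually created."""
    stream = None
    try:
        stream = make_stream()
        async for event in stream:
            yield event
    finally:
        if stream is not None:
            await stream.aclose()


def broken_factory():
    """Stands in for a factory that fails before returning a generator."""
    raise TypeError("bad arguments")


async def collect(make_stream) -> list[str]:
    return [event async for event in relay(make_stream)]
```

Without the guard, the `broken_factory` case would raise `AttributeError` from the `finally` block and hide the `TypeError` as mere context.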

@github-actions

Assessment: Comment

Solid bug fix that closes a real span-leak gap for BaseException subclasses (KeyboardInterrupt, asyncio.CancelledError). The approach is correct and the type widening in the tracer is backward-compatible. Tests are thorough, covering the key scenarios (KeyboardInterrupt, task cancellation, and premature aclose).

Review Details
  • Code duplication: One location (lines 166-171) has identical except Exception / except BaseException handlers that can be consolidated into a single except BaseException block.
  • Defensive coding: The finally: await streamed_events.aclose() pattern could fail if the generator variable is still None, warranting a guard check.

@mkmeral
Contributor

mkmeral commented Apr 22, 2026

@Di-Is can you also merge from main? there are some conflicts

@github-actions github-actions Bot added size/m and removed size/m labels Apr 26, 2026
@Di-Is Di-Is requested a deployment to manual-approval April 26, 2026 11:15 — with GitHub Actions Waiting
@Di-Is
Contributor Author

Di-Is commented Apr 26, 2026

@mkmeral

The conflict has been resolved.

Development

Successfully merging this pull request may close these issues.

[BUG] Regression: event_loop_cycle span duration becomes cumulative across recursive cycles

3 participants