feat(conversation_manager): add token-aware context management#2147
Open
srbhsrkr wants to merge 3 commits into strands-agents:main from
Conversation
Add token-budget awareness to `SlidingWindowConversationManager` and `SummarizingConversationManager` so context reduction can be driven by estimated token counts, not just message counts. Key changes:

- New `_token_utils.py` with `estimate_tokens` (chars/4 heuristic) and `TokenCounter` type alias, handling all `ContentBlock` types (text, toolResult, toolUse, image, document, video, reasoningContent, etc.)
- `SlidingWindowConversationManager`: new `max_context_tokens`, `token_counter`, and `compactable_after_messages` parameters; proactive token-budget enforcement via `BeforeModelCallEvent` hook; micro-compaction of stale tool results with `_last_compacted_index` tracking
- `SummarizingConversationManager`: new `max_context_tokens`, `proactive_threshold`, and `token_counter` parameters; proactive summarization via hook when the token threshold is exceeded
- Always uses the heuristic estimator (never the stale model-reported `latest_context_size`) to prevent over-reduction spirals
- 55 new tests covering token estimation, budget enforcement, micro-compaction, parameter validation, integration flows, and `_model_call_count` semantics regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
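The chars/4 heuristic described above can be sketched in a few lines. This is a minimal illustration, not the PR's actual `_token_utils.py`: the real module handles all `ContentBlock` types, and the `IMAGE_CHAR_ESTIMATE` value and message shapes below are assumptions.

```python
from typing import Any, Callable

# Type alias for pluggable token counting; the exact signature in the PR
# may differ — this is an assumption based on the description above.
TokenCounter = Callable[[list[dict[str, Any]]], int]

# Illustrative per-block character estimate for image content
# (not the PR's actual constant).
IMAGE_CHAR_ESTIMATE = 800

def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    """Estimate token count with the chars/4 heuristic over content blocks."""
    chars = 0
    for message in messages:
        for block in message.get("content", []):
            if "text" in block:
                chars += len(block["text"])
            elif "toolUse" in block or "toolResult" in block:
                chars += len(str(block))  # serialize structured blocks
            elif "image" in block:
                chars += IMAGE_CHAR_ESTIMATE  # flat estimate for binary content
            else:
                chars += len(str(block))  # fallback for other block types
    return chars // 4

msgs = [{"role": "user", "content": [{"text": "a" * 400}]}]
print(estimate_tokens(msgs))  # 100
```

The point of the heuristic is that it is cheap, deterministic, and never stale, which is what makes it safe to call on every model invocation.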
Add `read_only`, `destructive`, and `requires_confirmation` boolean parameters to the `@tool` decorator and corresponding properties on `AgentTool`, `ToolSpec`, and `MCPAgentTool`. This enables hook-based permission policies to reason about tool safety without hardcoding tool-name mappings.

Closes strands-agents#2154

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
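The safety-annotation idea from this commit can be sketched with a toy decorator. This is not the real strands-agents `@tool` API (whose signature may differ); only the three parameter names come from the commit message above.

```python
from typing import Any, Callable

def tool(read_only: bool = False, destructive: bool = False,
         requires_confirmation: bool = False) -> Callable:
    """Toy stand-in for the @tool decorator: attach safety flags to a function."""
    def wrap(fn: Callable) -> Callable:
        fn.tool_spec = {  # hypothetical attribute name for illustration
            "name": fn.__name__,
            "read_only": read_only,
            "destructive": destructive,
            "requires_confirmation": requires_confirmation,
        }
        return fn
    return wrap

@tool(read_only=True)
def list_files(path: str) -> list:
    return []

@tool(destructive=True, requires_confirmation=True)
def delete_file(path: str) -> None:
    pass

def needs_approval(fn: Any) -> bool:
    """A permission policy can now reason about safety generically,
    with no hardcoded tool-name mapping."""
    spec = fn.tool_spec
    return spec["destructive"] or spec["requires_confirmation"]

print(needs_approval(list_files), needs_approval(delete_file))  # False True
```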
…mmarization

- Loop `reduce_context` in `apply_management` until the token budget is satisfied or no further progress can be made, fixing cases where a single `reduce_context` call was insufficient for large messages under `window_size`
- Prevent double `apply_management` calls in `_on_before_model_call` by unifying token-budget and `per_turn` triggers into a single dispatch
- Fix `_micro_compact` image reclaimed accounting to use `IMAGE_CHAR_ESTIMATE` instead of a hardcoded 200, and subtract stub length from `reclaimed_chars`
- Add a `_do_proactive_summarization` guard in `SummarizingConversationManager` to prevent the hook and `apply_management` from both triggering summarization in the same agent cycle
- Make `SummarizingConversationManager.apply_management` honor the token budget contract instead of being a silent no-op
- Rename `_IMAGE_CHAR_ESTIMATE` to `IMAGE_CHAR_ESTIMATE` (cross-module usage)
- Use `len(messages)` as the loop bound instead of a fixed constant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
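The first and last bullets above describe the same fix: reduce in a loop, bounded by `len(messages)`. A self-contained toy of that loop structure, where `DummyManager`/`DummyAgent` and the simplified estimator are stand-ins for the PR's real classes:

```python
def estimate_tokens(messages):
    # simplified chars/4 heuristic over text blocks
    return sum(len(b.get("text", "")) for m in messages for b in m["content"]) // 4

class DummyManager:
    def reduce_context(self, agent):
        if agent.messages:
            agent.messages.pop(0)  # evict the oldest message

class DummyAgent:
    def __init__(self, messages):
        self.messages = messages

def apply_management(manager, agent, max_context_tokens):
    # len(messages) as the loop bound guarantees termination even when
    # each reduce_context pass removes only a single message.
    for _ in range(len(agent.messages)):
        if estimate_tokens(agent.messages) <= max_context_tokens:
            break  # token budget satisfied
        before = len(agent.messages)
        manager.reduce_context(agent)
        if len(agent.messages) >= before:
            break  # no further progress can be made

agent = DummyAgent([{"content": [{"text": "x" * 80}]} for _ in range(5)])
apply_management(DummyManager(), agent, max_context_tokens=40)
print(len(agent.messages))  # 2
```

The progress check matters: without it, a reduction pass that cannot shrink the conversation any further would spin until the bound is exhausted instead of exiting immediately.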
Summary
Closes #2146
Addresses #1294, #555, #298
Related to #1295, #1678, #1296, #2048
- New `_token_utils.py` with `estimate_tokens` (chars/4 heuristic) covering all `ContentBlock` types, and a `TokenCounter` type alias for pluggable token counting ([CONTEXT] [FEATURE] Token Estimation API #1294)
- Add `max_context_tokens`, `token_counter`, and `compactable_after_messages` to `SlidingWindowConversationManager` for token-budget enforcement and micro-compaction of stale tool results ([FEATURE] Proactive Context Compression #555, [FEATURE] In-envent-loop cycle context management #298)
- Add `max_context_tokens`, `proactive_threshold`, and `token_counter` to `SummarizingConversationManager` for proactive summarization when the token threshold is exceeded ([FEATURE] Proactive Context Compression #555)
- Always use the heuristic estimator (never the stale model-reported `latest_context_size`) to prevent over-reduction spirals
- The hook calls `apply_management()` (not `reduce_context()` directly) to ensure micro-compaction runs before trimming

How this relates to existing issues
- Token Estimation API (#1294): `estimate_tokens()` + `TokenCounter` type on conversation managers. Complementary to a future `Model.estimate_tokens()`: ours is the lightweight heuristic, theirs would be model-specific.
- Proactive Context Compression (#555): `max_context_tokens` + the `BeforeModelCallEvent` hook trigger reduction before `ContextWindowOverflowException` is raised.
- In-event-loop context management (#298): `per_turn` + `compactable_after_messages` + hook-based token budget checks enable within-cycle management.
- If `model.context_limit` ships, it could auto-configure `max_context_tokens`.
- `apply_management()` → `reduce_context()`, but this PR doesn't fire a dedicated event.

Design Notes
- `_model_call_count` only increments when `per_turn` is enabled (preserves existing per-turn semantics)
- `SummarizingConversationManager.apply_management` is an intentional no-op: proactive summarization runs exclusively via the hook to prevent double-summarization with the agent's finally block
- `_last_compacted_index` tracks compaction progress to avoid re-scanning already-processed messages
- Token-budget enforcement only activates when `max_context_tokens is not None`

Test plan
- 55 new tests in `test_token_aware_context_management.py` covering:
  - budget enforcement via `apply_management` and the `BeforeModelCallEvent` hook
  - parameter validation (`max_context_tokens`, `compactable_after_messages`)
  - `_model_call_count` semantics regression (not incremented when `per_turn=False`)
  - the `apply_management` → `reduce_context` full pipeline
  - `_last_compacted_index` adjustment after message trimming
- Lint clean (`ruff check`), type clean (`mypy`)
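As a rough end-to-end illustration, the proactive flow this PR describes (a budget check in a before-model-call hook, trimming oldest messages first) can be simulated in a few lines. All class and method names here are illustrative stand-ins, not the SDK's actual API:

```python
def estimate_tokens(messages):
    # simplified chars/4 heuristic over text blocks
    return sum(len(b.get("text", "")) for m in messages for b in m["content"]) // 4

class TokenBudgetHook:
    """Toy hook: enforce a token budget before every model call."""

    def __init__(self, max_context_tokens):
        self.max_context_tokens = max_context_tokens

    def on_before_model_call(self, messages):
        # Drop the oldest messages until the estimated size fits the
        # budget, always keeping at least the most recent message.
        while len(messages) > 1 and estimate_tokens(messages) > self.max_context_tokens:
            messages.pop(0)
        return messages

hook = TokenBudgetHook(max_context_tokens=50)
msgs = [{"content": [{"text": "x" * 100}]} for _ in range(5)]
trimmed = hook.on_before_model_call(msgs)
print(len(trimmed), estimate_tokens(trimmed))  # 2 50
```

Because the check runs before the call rather than after a failure, the model never sees a request that the heuristic predicts will overflow, which is the core difference from reactive `ContextWindowOverflowException` handling.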