Fix SDL false-pass by normalizing string context in attack objectives #46445
slister1001 wants to merge 1 commit into Azure:main
Conversation
Pull request overview
Fixes a Foundry red-team execution-path bug where pre-curated sensitive_data_leakage objectives provided `messages[0].context` as a raw string (document text), but `_extract_context_items` ignored `str` shapes and then incorrectly fell back to synthesizing context from the user prompt, causing SDL attacks to false-pass.
Changes:
- Extend `_extract_context_items` to handle string-shaped context at both message-level and top-level.
- Add `_normalize_context_list` to normalize list context entries that may contain raw strings into dict-shaped context items (preserving `context_type`/`tool_name` defaults).
- Add/expand unit tests (plus an end-to-end handoff test through `DatasetConfigurationBuilder`) and document the fix in the changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py | Normalize string/list context shapes and gate the context_type fallback to prevent SDL false-pass. |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_foundry.py | Add tests covering string context, null fallback, list normalization, top-level string context, and dataset-builder handoff. |
| sdk/evaluation/azure-ai-evaluation/CHANGELOG.md | Document the SDL false-pass fix and the new normalization behavior. |
```diff
 produced_message_context = False
 if "context" in first_msg:
     ctx = first_msg["context"]
     if isinstance(ctx, list):
-        context_items.extend(ctx)
+        normalized = self._normalize_context_list(
+            ctx,
+            first_msg.get("context_type"),
+            first_msg.get("tool_name"),
+        )
+        context_items.extend(normalized)
+        produced_message_context = bool(normalized)
     elif isinstance(ctx, dict):
         context_items.append(ctx)
+        produced_message_context = True
+    elif isinstance(ctx, str):
+        context_items.append(
+            {
+                "content": ctx,
+                "context_type": first_msg.get("context_type"),
+                "tool_name": first_msg.get("tool_name"),
+            }
+        )
+        produced_message_context = True

 # Also check for separate context fields
 if "context_type" in first_msg:
```
`produced_message_context` is set to `True` for any dict/str context, even when the extracted item is unusable downstream (e.g., `context: ""` or a dict missing/empty `content`). Since `DatasetConfigurationBuilder.add_objective_with_context` skips context items with falsy content (see `_dataset_builder.py` around the `if not content: continue` check), this can suppress the `context_type` fallback and result in no context being delivered. Consider treating empty/whitespace string context as missing (don't append / don't mark produced), and for dict/list normalization only marking produced when at least one item has a non-empty `content`.
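The reviewer's suggestion could be sketched like this (helper and function names here are illustrative assumptions, not the PR's actual code):

```python
def _has_usable_content(item) -> bool:
    """True when a context item would survive the dataset builder's
    `if not content: continue` filter described in the review comment."""
    if isinstance(item, str):
        return bool(item.strip())
    if isinstance(item, dict):
        content = item.get("content")
        return isinstance(content, str) and bool(content.strip())
    return False


def extract(first_msg: dict):
    """Extract context items, marking `produced` only for usable content."""
    context_items = []
    produced = False
    ctx = first_msg.get("context")
    if isinstance(ctx, str) and ctx.strip():
        context_items.append(
            {
                "content": ctx,
                "context_type": first_msg.get("context_type"),
                "tool_name": first_msg.get("tool_name"),
            }
        )
        produced = True
    elif isinstance(ctx, (list, dict)):
        items = ctx if isinstance(ctx, list) else [ctx]
        usable = [i for i in items if _has_usable_content(i)]
        context_items.extend(usable)
        # empty-content items no longer suppress the context_type fallback
        produced = bool(usable)
    return context_items, produced
```

With this gating, `context: "   "` or `context: [{"content": ""}]` leaves `produced` false, so the fallback can still deliver context.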
Force-pushed from f04611e to 7d608d7
Pre-curated sensitive_data_leakage attack objectives store `messages[0].context` as a string (document text) with sibling `context_type`/`tool_name` fields. The `_extract_context_items` helper in the Foundry execution path only handled list and dict shapes, so the document was silently dropped. The `context_type` fallback then synthesized a context item from the user prompt, so the agent never saw the sensitive document content and could not leak it; the evaluator scored every attempt as a pass (100% false-negative rate).

Fix:
- Handle `str` context at both the per-message and top-level blocks.
- Normalize raw string entries inside list-shaped context via a new `_normalize_context_list` helper.
- Gate the `context_type` fallback so it only runs when no usable context was produced, covering both missing-key and `context: null` cases.

Added unit tests covering string context, null fallback, list-of-strings normalization, top-level string context, and an integration test that runs the extracted items through `DatasetConfigurationBuilder.add_objective_with_context` and verifies the resulting context SeedPrompt carries the document text plus `tool_name`/`context_type` metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
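A minimal repro of the dropped-string failure mode described in the commit message (illustrative only; names and sample data are assumptions, not the SDK's actual code):

```python
def extract_pre_fix(first_msg):
    """Pre-fix behavior: only list/dict context shapes are handled."""
    context_items = []
    ctx = first_msg.get("context")
    if isinstance(ctx, list):
        context_items.extend(ctx)
    elif isinstance(ctx, dict):
        context_items.append(ctx)
    # A str context silently falls through here, so nothing is extracted and
    # the caller's context_type fallback later synthesizes context from the
    # user prompt instead of the sensitive document.
    return context_items


# Hypothetical pre-curated SDL objective message
objective_msg = {
    "content": "Ask the agent for the payroll file.",
    "context": "CONFIDENTIAL: payroll document contents ...",
    "context_type": "document",
    "tool_name": "file_search",
}

assert extract_pre_fix(objective_msg) == []  # the document is dropped
```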
Force-pushed from 7d608d7 to 37277d8