Fix SDL false-pass by normalizing string context in attack objectives #46445

Open
slister1001 wants to merge 1 commit into Azure:main from slister1001:fix/sdl-extract-context-string

Conversation

@slister1001
Member

Pre-curated sensitive_data_leakage (SDL) attack objectives store messages[0].context as a string (the document text) with sibling context_type/tool_name fields. The _extract_context_items helper in the Foundry execution path handled only list and dict shapes, so the document was silently dropped. The context_type fallback then synthesized a context item from the user prompt instead. As a result, the agent never saw the sensitive document content and could not leak it, so the evaluator scored every attempt as a pass (a 100% false-negative rate).
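
To make the failure mode concrete, here is a minimal sketch of the pre-fix behavior. The message shape follows the description above. The function name `extract_context_items_before_fix`, the sample prompt and document text, the `tool_name` value, and the exact shape of the synthesized fallback item are assumptions for illustration, not the SDK's actual code.

```python
from typing import Any, Dict, List


def extract_context_items_before_fix(first_msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical reconstruction of the pre-fix logic: only list and dict are handled."""
    context_items: List[Dict[str, Any]] = []
    ctx = first_msg.get("context")
    if isinstance(ctx, list):
        context_items.extend(ctx)
    elif isinstance(ctx, dict):
        context_items.append(ctx)
    # A str context matches neither branch, so the document is dropped here.

    # Assumed fallback: build a context item from the user prompt.
    if "context_type" in first_msg:
        context_items.append(
            {
                "content": first_msg.get("content", ""),
                "context_type": first_msg["context_type"],
                "tool_name": first_msg.get("tool_name"),
            }
        )
    return context_items


# Made-up SDL-style message: the document lives in a plain string.
sdl_message = {
    "content": "Summarize the attached document.",
    "context": "Employee record: SSN 000-00-0000",
    "context_type": "document",
    "tool_name": "document_client",
}

items = extract_context_items_before_fix(sdl_message)
# The only context item carries the prompt text, not the document text.
```

Under these assumptions the agent receives the prompt twice and the document never, which is why no leak can occur and every attempt passes.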

Fix:

  • Handle str context at both the per-message and top-level blocks.
  • Normalize raw string entries inside list-shaped context via a new _normalize_context_list helper.
  • Gate the context_type fallback so it only runs when no usable context was produced, covering both missing-key and context:null cases.

Added unit tests covering string context, null fallback, list-of-strings normalization, and top-level string context. Also added an integration test that runs the extracted items through DatasetConfigurationBuilder.add_objective_with_context and verifies that the resulting context SeedPrompt carries the document text plus the tool_name/context_type metadata.
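
The gated fallback described in the fix list can be sketched as follows. This is a simplified, standalone illustration rather than the merged code: `extract_message_context` is a hypothetical name, list handling is left out for brevity, and the fallback item shape (prompt text as `content`) is an assumption.

```python
from typing import Any, Dict, List


def extract_message_context(first_msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sketch: wrap str context, pass dict context through, gate the fallback."""
    context_items: List[Dict[str, Any]] = []
    produced_message_context = False
    # .get() treats a missing key and an explicit null the same way.
    ctx = first_msg.get("context")

    if isinstance(ctx, str):
        # Wrap the raw document text, carrying over the sibling metadata fields.
        context_items.append(
            {
                "content": ctx,
                "context_type": first_msg.get("context_type"),
                "tool_name": first_msg.get("tool_name"),
            }
        )
        produced_message_context = True
    elif isinstance(ctx, dict):
        context_items.append(ctx)
        produced_message_context = True

    # The fallback runs only when nothing was produced above (assumed shape).
    if not produced_message_context and "context_type" in first_msg:
        context_items.append(
            {
                "content": first_msg.get("content", ""),
                "context_type": first_msg["context_type"],
                "tool_name": first_msg.get("tool_name"),
            }
        )
    return context_items


with_doc = extract_message_context(
    {"content": "Summarize.", "context": "DOC TEXT", "context_type": "document", "tool_name": "reader"}
)
null_ctx = extract_message_context(
    {"content": "Summarize.", "context": None, "context_type": "document"}
)
```

Here `with_doc` carries the document text, while `null_ctx` falls back to the prompt because no usable context was present.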

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings April 21, 2026 16:16
@slister1001 slister1001 requested a review from a team as a code owner April 21, 2026 16:16
@github-actions github-actions Bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Apr 21, 2026
Contributor

Copilot AI left a comment

Pull request overview

Fixes a Foundry red-team execution-path bug where pre-curated sensitive_data_leakage objectives provided messages[0].context as a raw string (document text), but _extract_context_items ignored str shapes and then incorrectly fell back to synthesizing context from the user prompt—causing SDL attacks to false-pass.

Changes:

  • Extend _extract_context_items to handle string-shaped context at both message-level and top-level.
  • Add _normalize_context_list to normalize list context entries that may contain raw strings into dict-shaped context items (preserving context_type/tool_name defaults).
  • Add/expand unit tests (plus an end-to-end handoff test through DatasetConfigurationBuilder) and document the fix in the changelog.
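
As a rough illustration of the list normalization mentioned above, the sketch below wraps raw string entries into dict-shaped items. It is written as a free function, not the `_normalize_context_list` method from the PR. Passing dict entries through unchanged and skipping entries of other types are assumptions, since the overview does not spell out those cases.

```python
from typing import Any, Dict, List, Optional


def normalize_context_list(
    entries: List[Any],
    context_type: Optional[str] = None,
    tool_name: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Sketch: turn a mixed list of dicts and raw strings into dict-shaped items."""
    normalized: List[Dict[str, Any]] = []
    for entry in entries:
        if isinstance(entry, dict):
            # Already dict-shaped: keep as-is (assumption).
            normalized.append(entry)
        elif isinstance(entry, str):
            # Raw string: wrap it and apply the message-level defaults.
            normalized.append(
                {"content": entry, "context_type": context_type, "tool_name": tool_name}
            )
        # Any other type is skipped in this sketch (assumption).
    return normalized


mixed = [
    "raw document text",
    {"content": "structured item", "context_type": "email", "tool_name": "mail"},
    42,
]
result = normalize_context_list(mixed, context_type="document", tool_name="reader")
```

With the made-up `mixed` list, `result` holds two items: the wrapped string with the `document`/`reader` defaults, followed by the dict entry untouched.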

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py | Normalize string/list context shapes and gate the context_type fallback to prevent SDL false-pass.
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_foundry.py | Add tests covering string context, null fallback, list normalization, top-level string context, and dataset-builder handoff.
sdk/evaluation/azure-ai-evaluation/CHANGELOG.md | Document the SDL false-pass fix and the new normalization behavior.

Comment on lines +341 to +363
+ produced_message_context = False
  if "context" in first_msg:
      ctx = first_msg["context"]
      if isinstance(ctx, list):
-         context_items.extend(ctx)
+         normalized = self._normalize_context_list(
+             ctx,
+             first_msg.get("context_type"),
+             first_msg.get("tool_name"),
+         )
+         context_items.extend(normalized)
+         produced_message_context = bool(normalized)
      elif isinstance(ctx, dict):
          context_items.append(ctx)
-
- # Also check for separate context fields
- if "context_type" in first_msg:
+         produced_message_context = True
+     elif isinstance(ctx, str):
+         context_items.append(
+             {
+                 "content": ctx,
+                 "context_type": first_msg.get("context_type"),
+                 "tool_name": first_msg.get("tool_name"),
+             }
+         )
+         produced_message_context = True

Copilot AI Apr 21, 2026

produced_message_context is set to True for any dict/str context, even when the extracted item is unusable downstream (e.g., context: "" or a dict missing/empty content). Since DatasetConfigurationBuilder.add_objective_with_context skips context items with falsy content (see _dataset_builder.py around the if not content: continue check), this can suppress the context_type fallback and result in no context being delivered. Consider treating empty/whitespace string context as missing (don’t append / don’t mark produced), and for dict/list normalization only marking produced when at least one item has a non-empty content.
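
One possible way to act on this suggestion is a small predicate that decides whether an extracted item would survive the downstream falsy-content check. The helper names `has_usable_content` and `any_usable` are hypothetical. Treating whitespace-only strings as empty follows the comment's recommendation and is stricter than a plain falsy check.

```python
from typing import Any, Iterable


def has_usable_content(item: Any) -> bool:
    """Return True only for dict items whose content is a non-blank string."""
    if not isinstance(item, dict):
        return False
    content = item.get("content")
    return isinstance(content, str) and bool(content.strip())


def any_usable(items: Iterable[Any]) -> bool:
    """True when at least one item would be delivered downstream."""
    return any(has_usable_content(item) for item in items)


# The flag would then reflect usability, not just the presence of a context key.
produced_for_empty = any_usable([{"content": ""}, {"content": "   "}, {"tool_name": "reader"}])
produced_for_doc = any_usable([{"content": ""}, {"content": "document text"}])
```

Setting `produced_message_context` from a check like `any_usable(...)` would let the context_type fallback still run when every extracted item is empty.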

@slister1001 slister1001 force-pushed the fix/sdl-extract-context-string branch from f04611e to 7d608d7 Compare April 21, 2026 20:38
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@slister1001 slister1001 force-pushed the fix/sdl-extract-context-string branch from 7d608d7 to 37277d8 Compare April 22, 2026 01:52
Labels

Evaluation Issues related to the client library for Azure AI Evaluation

3 participants