
Update Tool Call Accuracy to output unified format#46319

Draft
m7md7sien wants to merge 13 commits into main from mohessie/unify_output/tool_call_accuracy

Conversation

Contributor

@m7md7sien m7md7sien commented Apr 14, 2026

Description

Update Tool Call Accuracy to output unified format

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@github-actions github-actions Bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Apr 14, 2026
```diff
 # Check for intermediate response
 if _is_intermediate_response(eval_input.get("response")):
-    return self._not_applicable_result(
+    return self._return_not_applicable_result(
```
Member
I am not very sure this is n/a; it feels like the caller did something wrong in preparing the data. To me it is more like invalid input rather than a not-applicable case, where a not-applicable case is one in which the inputs are perfectly normal but the evaluator cannot score for some reason.

Contributor Author
This case can happen if the user provided a normal response id or normal data, but the response is not complete yet because it is waiting for input from the user to approve a tool call. I am not sure we should return invalid input when given a response id.

```diff
-    f"gpt_{self._result_key}": score,
+    f"{self._result_key}_score": score,
+    f"{self._result_key}_result": score_result,
+    f"{self._result_key}_passed": score_result == "pass",
```
Member

we shouldn't need to compute pass here. Elaine's PR will handle pass/fail based on result


@aprilk-ms some evaluators already return a `_passed` bool. The SDK currently does multiple checks on the evaluator response, like `_result == "passed"`, `_label == "passed"`, and `_passed == True`; we want to standardize so we only have to check `_passed`.

The evaluator itself is already calculating pass/fail, because the threshold setting (e.g. score > threshold -> pass vs. score < threshold -> pass) is stored in the evaluator class, so it makes sense to just propagate from there.

Contributor Author

I updated the dependencies in azure-ai-evaluation on `_result` to use `_passed` in #46436. However, I think we should keep both for now until we deploy the service changes, to avoid breaking changes.
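A consumer-side sketch of the migration-window behavior described above: prefer the unified `_passed` bool, but fall back to the legacy string fields while both are emitted. The helper name is hypothetical; the field names and compared values come from the discussion:

```python
def is_passed(result, key):
    """Hypothetical back-compat check: prefer the unified `_passed` bool,
    falling back to the legacy `_result`/`_label` string fields."""
    if f"{key}_passed" in result:
        return result[f"{key}_passed"] is True
    for legacy in (f"{key}_result", f"{key}_label"):
        if legacy in result:
            # Older evaluators used either "pass" or "passed" as the value.
            return result[legacy] in ("pass", "passed")
    return False
```

Keeping both fields during the rollout means old consumers checking `_result` keep working while new consumers switch to the single `_passed` check.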

Copilot AI and others added 7 commits April 16, 2026 22:10
…ed properties handling (#46355)

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/89b3b528-f2ac-4284-88fb-c484d4c0cce1

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/8ab1c161-c24f-4272-95ff-c8e595089e22

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
…outputs (#46449)

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/77f12326-0743-466c-9fda-8e4906364d4f

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
