Skip to content

[Lean Squad] feat(fv): Task 2 — informal spec for ResponseFileHelper.SplitCommandLine#7899

Merged
Evangelink merged 1 commit intomainfrom
lean-squad/task2-splitcommandline-2026-04-28-af0bc0029033d7ce
Apr 29, 2026
Merged

[Lean Squad] feat(fv): Task 2 — informal spec for ResponseFileHelper.SplitCommandLine#7899
Evangelink merged 1 commit intomainfrom
lean-squad/task2-splitcommandline-2026-04-28-af0bc0029033d7ce

Conversation

@Evangelink
Copy link
Copy Markdown
Member

Extracts a precise informal specification for the SplitCommandLine tokeniser in ResponseFileHelper. Documents:

  • State machine with two variables (seeking, seekingQuote)
  • 23 named properties across 5 groups (trivial, whitespace, quotes, structural invariants, composition)
  • Edge case analysis including unclosed-quote behaviour (Issue 2/3)
  • Asymmetry: unclosed opening quote silently discards content; unclosed mid-word quote is included in output token
  • Adjacent quoted segments emit two separate tokens (Issue 4)
  • No escape sequence support (Issue 1)
  • No existing unit tests (Issue 5)
  • Approximations for Lean 4 model (List Char, decidable isWhitespace)

Advances ResponseFileHelper.SplitCommandLine from phase 1 to phase 2.

Fixes #7880

Extracts a precise informal specification for the SplitCommandLine
tokeniser in ResponseFileHelper. Documents:

- State machine with two variables (seeking, seekingQuote)
- 23 named properties across 5 groups (trivial, whitespace, quotes,
  structural invariants, composition)
- Edge case analysis including unclosed-quote behaviour (Issue 2/3)
- Asymmetry: unclosed opening quote silently discards content;
  unclosed mid-word quote is included in output token
- Adjacent quoted segments emit two separate tokens (Issue 4)
- No escape sequence support (Issue 1)
- No existing unit tests (Issue 5)
- Approximations for Lean 4 model (List Char, decidable isWhitespace)

Advances ResponseFileHelper.SplitCommandLine from phase 1 to phase 2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 28, 2026 07:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds formal-verification documentation that captures the current behavior of ResponseFileHelper.SplitCommandLine as an informal (human-readable) specification, and updates the FV target tracking to reflect progress.

Changes:

  • Adds an informal spec document describing the tokenizer’s state machine, properties, and edge cases.
  • Updates FV target phase/status for ResponseFileHelper.SplitCommandLine from Phase 1 to Phase 2.
Show a summary per file
File Description
formal-verification/specs/responsefilehelper_splitcommandline_informal.md New informal spec for SplitCommandLine including transitions, properties, and edge-case analysis.
formal-verification/TARGETS.md Marks ResponseFileHelper.SplitCommandLine as Phase 2 (“Informal spec extracted”).

Copilot's findings

Comments suppressed due to low confidence (2)

formal-verification/specs/responsefilehelper_splitcommandline_informal.md:113

  • Property 13 title says "Adjacent quoted segments merge" but the example/output and later Issue 4 indicate they do not merge; they emit two separate tokens (e.g., ""a""b"" → ["a", "b"]). Please rename/reword this property to match the described behavior to avoid a self-contradicting spec.
13. **Adjacent quoted segments merge**: `SplitCommandLine("\"a\"\"b\"") = ["a", "b"]` — NOTE: separate yields because closing `"` triggers a yield when `seeking == TokenStart`

formal-verification/specs/responsefilehelper_splitcommandline_informal.md:149

  • This edge-case row appears to show the closed quoted input ("abc") but labels it as unclosed ("abc) and expects []. The unclosed-quote-at-start case in the later analysis is SplitCommandLine("\"abc") (no trailing quote). Please fix the input representation so the table matches the described behavior.
| `"\"abc\""` (unclosed: `"abc`) | `[]` | Unclosed quote, `seeking==TokenStart` at end → no token emitted! |
| `"abc\""` (unclosed: `abc"`) | `["abc"]` | Word started before quote; toggle to QuoteEnd mid-word; end yields word |
  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Workflow: Expert Code Reviewer
Date: 2026-04-28
Repository: microsoft/testfx
PR: #7899 — Informal spec for ResponseFileHelper.SplitCommandLine


This PR adds only .md files (a phase-status update in TARGETS.md and a new informal specification). The spec itself is the deliverable, so accuracy is the primary concern.

I verified all state-machine traces in the spec against the actual implementation in ResponseFileHelper.cs. The overall structure, state-machine description, transition tables, and most edge cases are correct. Two inaccuracies were found that would corrupt the Lean formalisation if carried forward.

Key Findings

  1. [Correctness] Line 113 — Property 13 heading contradicts its own output
    Property 13 is named "Adjacent quoted segments merge", but the expected output ["a", "b"] shows they produce separate tokens, not a merged one. The inline note acknowledges separation, making the heading directly misleading. In a proof assistant, this property name risks encoding the wrong invariant.

  2. [Correctness] Line 148 — Unclosed-quote edge-case example string is wrong
    The notation `"\"abc\""` is the C# string literal for "abc" (5 chars, properly closed), which yields ["abc"] — not []. The intended input is the 4-char unclosed string "abc, whose C# literal is "\"abc". The parenthetical "(unclosed: "abc)" clarifies intent but the table cell itself is incorrect, which is what a tool or prover would consume.

Positive Observations

  • The state-machine description accurately reflects the implementation's two-variable approach (seeking / seekingQuote).
  • All other traced edge cases (empty string, whitespace-only, "", " ", abc"def"ghi, "abc"def) produce the correct expected outputs.
  • The asymmetry between Issue 2 (unclosed opening quote → silent discard) and Issue 3 (unclosed mid-word quote → content included) is correctly identified and explained.
  • The note about "foo""bar" → two tokens (Issue 4) is correct and well-explained.
  • The Lean modelling approximations section is sound.

Recommendations

  1. Rename Property 13 to reflect actual behavior, e.g. "Adjacent quoted segments are NOT concatenated — each emits a separate token".
  2. Fix line 148 edge-case input to use `"abc` (raw 4-char value) or the correct C# literal "\"abc" (no trailing \").

Generated by Expert Code Reviewer 🧠

🧠 Reviewed by Expert Code Reviewer 🧠

@Evangelink Evangelink merged commit 6774645 into main Apr 29, 2026
29 checks passed
@Evangelink Evangelink deleted the lean-squad/task2-splitcommandline-2026-04-28-af0bc0029033d7ce branch April 29, 2026 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Lean Squad] feat(fv): Task 2 — informal spec for ResponseFileHelper.SplitCommandLine

3 participants