[BUG] Windows: unhandled read ENOTCONN in child_process.spawn corrupts TUI alt-screen #2864

@nschwerzler

Environment

Copilot CLI: v1.0.32 (host also has v1.0.34 installed; repro observed on 1.0.32)
Node: v24.11.1
OS: Windows 11 Enterprise 10.0.26200 x64
Terminal: Windows Terminal, PowerShell host, alt-screen enabled (ALT_SCREEN: true feature flag on in every session)

Summary

copilot.exe raises an unhandled Error: read ENOTCONN from deep inside child_process.spawn → createSocket → tryReadStart on Windows. The CLI's own uncaught-exception handler catches it and logs it three times (once as Uncaught Exception, twice as (suppressed, within error cooldown)), but Node has already written the default stack dump to the TTY while the CLI is in alt-screen TUI mode. The alt-screen is never restored (ESC [?1049l never sent), leaving the terminal in a half-painted state. Any subsequent user input echoes against the stale alt-screen buffer as visible garbage — the user perceives this as "CLI crashed and left weird characters on the screen."

The process itself survives (telemetry continues after the exception) but the socket/child-process state is undefined, and some time later the session ends abruptly mid-turn.

User-visible symptom

"Copilot CLI keeps crashing. It seems to just end and I get weird characters showing up in the CLI window."

Repro observed

Two independent fatal events on the same day, 2h 23m apart, in two different long-running sessions. Byte-identical stack traces.

#   UTC            PID     Context
1   10:44:45.953Z  5388    ~1h into session, during AI request group
2   13:07:45.705Z  93012   ~1h into session, during AI request group

Stack trace (identical on both hits)

Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)

Followed immediately by:

[ERROR] Uncaught Exception: read ENOTCONN
[ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
[ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN

Root cause (our reading)

read ENOTCONN raised on a freshly-constructed net.Socket inside child_process.spawn is a known Windows-specific race: the child's IPC named pipe half-closes between when the parent allocates its Socket wrapper and when Node asks libuv to tryReadStart. It manifests most often on hosts that spawn many short-lived child processes and is orthogonal to any single child — it fired from Copilot's own execFile call for a hook/tool.

Related upstream Node.js threads: nodejs/node#27097 and the broader tryReadStart / ENOTCONN discussion. Any TUI CLI using the execa / cross-spawn family on Windows is exposed.

Two problems for copilot-cli to fix

1. Terminal is left corrupted ("weird characters")

Because alt-screen is active and Node's default stack dump goes directly to stderr/TTY, the restore sequence never runs. The uncaught-exception handler should, before logging/suppressing:

process.stderr.write('\x1b[?1049l\x1b[?25h\x1b[0m');

to pop the alt-screen, re-show the cursor, and reset SGR. Without that, the user's terminal stays in the TUI buffer with partial paint damage.
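
A minimal sketch of that restore step, under stated assumptions: restoreTerminal, installCrashHandler, and the altScreenActive option are hypothetical names for illustration, not Copilot CLI internals. Only the escape sequence itself comes from this report. The point is the ordering: the terminal is restored before anything is logged.

```javascript
// Sketch only: function and option names are hypothetical, not Copilot CLI internals.
const TTY_RESTORE =
  '\x1b[?1049l' + // leave the alternate screen buffer
  '\x1b[?25h' +   // show the cursor again
  '\x1b[0m';      // reset SGR (colors and text attributes)

// Write the restore sequence if the stream is a TTY and alt-screen is in use.
// Returns true when the sequence was written, false otherwise.
function restoreTerminal(stream = process.stderr, { altScreenActive = true } = {}) {
  if (!altScreenActive || !stream || !stream.isTTY) return false;
  try {
    stream.write(TTY_RESTORE);
    return true;
  } catch {
    return false; // cleanup must never throw from inside a crash handler
  }
}

// Register a handler that restores the terminal BEFORE it logs the error.
// `log` is whatever logging function the application already uses (assumption).
function installCrashHandler(log, { proc = process, stream = process.stderr } = {}) {
  proc.on('uncaughtException', (err) => {
    restoreTerminal(stream);
    log(err);
  });
}
```

Calling installCrashHandler(console.error) once at startup would be enough for this sketch. A real TUI would probably track whether alt-screen is currently active and pass that through, rather than assuming it is.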

2. Handler swallows an unrecoverable error

The cooldown-suppression pattern keeps the process running, but read ENOTCONN on createSocket means the IPC pipe is already gone, so there is no state to recover. Options, in decreasing order of preference:

  • Retry the failed spawn once; if it fails again, surface the failure to the caller (hook/tool wrapper) and let the turn fail cleanly.
  • Or log, restore the TTY, and process.exit(1) so the outer shell can re-launch.

Either way, do not silently keep the session alive with an undefined child-process state. That is what produces the "it kept going but eventually died mid-request" symptom.
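
The retry-once option could look roughly like the sketch below. It assumes the failure reaches the caller either as the execFile callback error, as a synchronous throw, or as an 'error' event on one of the child's stdio streams; which of these applies in Copilot CLI is not confirmed here. execFileOnce, execFileWithRetry, and the injectable impl parameter are hypothetical names.

```javascript
const { execFile } = require('node:child_process');

// True for the error class described in this report.
function isEnotconn(err) {
  return Boolean(err) && err.code === 'ENOTCONN';
}

// Run execFile once and turn every failure path into a promise rejection.
// `impl` defaults to Node's execFile; it is injectable only to make testing easy.
function execFileOnce(file, args, options, impl = execFile) {
  return new Promise((resolve, reject) => {
    let child;
    try {
      child = impl(file, args, options, (err, stdout, stderr) => {
        if (err) reject(err);
        else resolve({ stdout, stderr });
      });
    } catch (err) {
      reject(err); // synchronous spawn failure
      return;
    }
    // Without listeners, an 'error' event on a stdio stream becomes an
    // uncaught exception. Rejecting here keeps it inside the caller's turn.
    for (const stream of [child.stdin, child.stdout, child.stderr]) {
      if (stream) stream.on('error', reject);
    }
  });
}

// Retry exactly once on ENOTCONN; any other error, or a second ENOTCONN,
// is surfaced to the caller so the turn can fail cleanly.
async function execFileWithRetry(file, args = [], options = {}, impl = execFile) {
  try {
    return await execFileOnce(file, args, options, impl);
  } catch (err) {
    if (!isEnotconn(err)) throw err;
    return execFileOnce(file, args, options, impl);
  }
}
```

A hook or tool wrapper would then await execFileWithRetry(file, args) and report a rejected promise as a failed tool call. A single retry is a judgment call: it covers a transient race without hiding a persistent failure.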

Ask

  1. Repro read ENOTCONN at child_process.spawn → createSocket → tryReadStart on Windows + Node 24.11.x with a long-lived TUI session doing frequent hook/tool execFile calls. Two independent hits on one machine in a single day with the same stack.
  2. In the uncaught-exception handler, restore the TTY (\x1b[?1049l\x1b[?25h\x1b[0m) before logging/suppressing when alt-screen is active.
  3. Decide whether read ENOTCONN from child_process internals should be retried-once or hard-fail the turn rather than suppressed with a cooldown.
  4. Optional: wrap execFile / spawn with a small Windows-aware retry for ENOTCONN, and/or document the race in the FAQ.
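
For item 3, one way to separate this error class from ordinary suppressible errors is sketched below. It assumes the error object carries code and syscall properties, as Node's system errors normally do, and that the stack contains a node:internal/child_process frame as in the traces in this report. isSpawnPipeError, decideAction, and the cooldownMs parameter are hypothetical; the CLI's real cooldown logic is not known here.

```javascript
// Does this error look like the spawn-time pipe failure from this report?
function isSpawnPipeError(err) {
  if (!err || err.code !== 'ENOTCONN' || err.syscall !== 'read') return false;
  const stack = typeof err.stack === 'string' ? err.stack : '';
  return stack.includes('node:internal/child_process');
}

// Hypothetical policy for an uncaught-exception handler:
//   'fail'     -> restore the TTY, log, and fail the turn or exit
//   'suppress' -> within the cooldown window, log quietly
//   'log'      -> log normally and start a new cooldown window
function decideAction(err, now, lastLoggedAt, cooldownMs) {
  if (isSpawnPipeError(err)) return 'fail'; // never suppress this class
  if (lastLoggedAt !== null && now - lastLoggedAt < cooldownMs) return 'suppress';
  return 'log';
}
```

Matching on the stack string is fragile across Node versions, so this is a stopgap for triage rather than a long-term contract.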

What this is not

Evidence — raw log snippets

Hit #1 — PID 5388 @ 10:44:45.953Z (exception handler output verbatim)

10:44:45.953Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
10:44:47.834Z [INFO] [Telemetry] cli.tool_call:
{
  "tool_name": "glob",
  "result_type": "SUCCESS",
  "duration_ms": 2986,
  ...
}

Notable: 1.88 seconds after the three-line swallow, a glob tool call completes SUCCESS in the same process — proving the handler kept the process alive with undefined spawn state rather than failing the turn.

Hit #2 — PID 93012 @ 13:07:45.705Z (byte-identical stack, different session)

13:07:45.705Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
13:07:45.705Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
13:07:45.705Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
13:08:19.939Z [INFO] [Telemetry] cli.telemetry:
{
  "kind": "assistant_usage",
  "properties": {
    "initiator": "agent",
    ...
  }
}

Notable: 34 seconds after the swallow, assistant_usage telemetry fires in the same PID — further confirming the process survived the fatal exception. Session ended abruptly later mid-turn.

Alt-screen feature flag is on (from the same session's telemetry earlier in the log)

"features": {
  ...
  "ALT_SCREEN": true,
  ...
}

This flag is what makes the unrestored TUI buffer visible to the user as "weird characters" after the Node stack dump lands on the TTY.


Suggested labels

bug · platform:windows · tui · child_process

Labels

area:platform-windows (Windows-specific: PowerShell, cmd, Git Bash, WSL, Windows Terminal) · area:terminal-rendering (Display and rendering: flickering, scrolling, line wrapping, output formatting)
