[BUG] Windows: unhandled read ENOTCONN in child_process.spawn corrupts TUI alt-screen
Environment
| Component | Details |
| --- | --- |
| Copilot CLI | v1.0.32 (host also has v1.0.34 installed; repro observed on 1.0.32) |
| Node | v24.11.1 |
| OS | Windows 11 Enterprise 10.0.26200 x64 |
| Terminal | Windows Terminal, PowerShell host, alt-screen enabled (ALT_SCREEN: true feature flag on in every session) |
Summary
copilot.exe raises an unhandled Error: read ENOTCONN from deep inside child_process.spawn → createSocket → tryReadStart on Windows. The CLI's own uncaught-exception handler catches it and logs it three times (once as Uncaught Exception, twice as (suppressed, within error cooldown)), but Node has already written the default stack dump to the TTY while the CLI is in alt-screen TUI mode. The alt-screen is never restored (ESC [?1049l never sent), leaving the terminal in a half-painted state. Any subsequent user input echoes against the stale alt-screen buffer as visible garbage — the user perceives this as "CLI crashed and left weird characters on the screen."
The process itself survives (telemetry continues after the exception) but the socket/child-process state is undefined, and some time later the session ends abruptly mid-turn.
User-visible symptom
"Copilot CLI keeps crashing. It seems to just end and I get weird characters showing up in the CLI window."
Repro observed
Two independent fatal events on the same day, 2h 23m apart, in two different long-running sessions. Byte-identical stack traces.
| # | UTC | PID | Context |
| --- | --- | --- | --- |
| 1 | 10:44:45.953Z | 5388 | ~1h into session, during AI request group |
| 2 | 13:07:45.705Z | 93012 | ~1h into session, during AI request group |
Stack trace (identical on both hits)
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
at ChildProcess.spawn (node:internal/child_process:451:23)
at spawn (node:child_process:796:9)
at execFile (node:child_process:349:17)
Followed immediately by:
[ERROR] Uncaught Exception: read ENOTCONN
[ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
[ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Root cause (our reading)
read ENOTCONN raised on a freshly-constructed net.Socket inside child_process.spawn is a known Windows-specific race: the child's IPC named pipe half-closes between when the parent allocates its Socket wrapper and when Node asks libuv to tryReadStart. It manifests most often on hosts that spawn many short-lived child processes and is orthogonal to any single child — it fired from Copilot's own execFile call for a hook/tool.
Related upstream Node.js threads: nodejs/node#27097 and the broader tryReadStart / ENOTCONN discussion. Any TUI CLI using the execa / cross-spawn family on Windows is exposed.
Two problems for copilot-cli to fix
1. Terminal is left corrupted ("weird characters")
Because alt-screen is active and Node's default stack dump goes directly to stderr/TTY, the restore sequence never runs. The uncaught-exception handler should, before logging/suppressing:
process.stderr.write('\x1b[?1049l\x1b[?25h\x1b[0m');
to pop the alt-screen, re-show the cursor, and reset SGR. Without that, the user's terminal stays in the TUI buffer with partial paint damage.
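A minimal sketch of that ordering, assuming the handler can see the output stream and some flag that records whether alt-screen is active. `restoreTerminal`, `installCrashHandler`, and `isAltScreenActive` are hypothetical names for illustration, not Copilot CLI internals.

```javascript
// Hypothetical helpers; names are illustrative, not Copilot CLI internals.
// Sequence: leave alt-screen, show cursor, reset SGR attributes.
const RESTORE_SEQUENCE = '\x1b[?1049l\x1b[?25h\x1b[0m';

function restoreTerminal(stream, altScreenActive) {
  // Only emit escape sequences when they can take effect on a real terminal.
  if (!altScreenActive || !stream || !stream.isTTY) return false;
  stream.write(RESTORE_SEQUENCE);
  return true;
}

function installCrashHandler(stream, isAltScreenActive, log) {
  const handler = (err) => {
    // Restore first, so anything logged afterwards lands on the normal screen buffer.
    restoreTerminal(stream, isAltScreenActive());
    log(`Uncaught Exception: ${err && err.message}`);
  };
  process.on('uncaughtException', handler);
  return handler;
}
```

In a real CLI this would be wired up as something like `installCrashHandler(process.stderr, () => tui.altScreen, logger.error)`, where `tui` and `logger` stand in for whatever the CLI already has.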
2. Handler swallows an unrecoverable error
The cooldown-suppression pattern keeps the process running, but read ENOTCONN on createSocket means the IPC pipe is already gone — there is no state to recover. Options, in decreasing order of preference:
- Retry the failed spawn once; if it fails again, surface the failure to the caller (hook/tool wrapper) and let the turn fail cleanly.
- Or log, restore the TTY, and process.exit(1) so the outer shell can re-launch.
- Do not silently keep the session alive with an undefined child-process state; this is what produces the "it kept going but eventually died mid-request" symptom.
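One way to keep the cooldown for ordinary errors while treating this one differently is to classify it before deciding. The sketch below assumes the error carries Node's usual `code` and `syscall` properties and that a `node:internal/child_process` frame in the stack is a good enough signal. `isChildProcessEnotconn` and `decideAction` are hypothetical names.

```javascript
// Hypothetical classifier: true only for `read ENOTCONN` raised under child_process internals.
function isChildProcessEnotconn(err) {
  return Boolean(
    err &&
    err.code === 'ENOTCONN' &&
    err.syscall === 'read' &&
    typeof err.stack === 'string' &&
    err.stack.includes('node:internal/child_process')
  );
}

// Hypothetical policy: retry while attempts remain, then fail the turn; everything
// else keeps the existing suppress-with-cooldown behaviour.
function decideAction(err, retriesLeft) {
  if (!isChildProcessEnotconn(err)) return 'suppress';
  return retriesLeft > 0 ? 'retry' : 'fail-turn';
}
```

Matching on the stack string is brittle across Node versions, so treat it as a heuristic rather than a contract.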
Ask
- Repro read ENOTCONN at child_process.spawn → createSocket → tryReadStart on Windows + Node 24.11.x with a long-lived TUI session doing frequent hook/tool execFile calls. Two independent hits on one machine in a single day with the same stack.
- In the uncaught-exception handler, restore the TTY (\x1b[?1049l\x1b[?25h\x1b[0m) before logging/suppressing when alt-screen is active.
- Decide whether read ENOTCONN from child_process internals should be retried-once or hard-fail the turn rather than suppressed with a cooldown.
- Optional: wrap execFile / spawn with a small Windows-aware retry for ENOTCONN, and/or document the race in the FAQ.
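The optional wrapper could look roughly like the sketch below. It assumes the race surfaces as an `'error'` event on one of the child's stdio streams (which the `tryReadStart` top frame suggests) rather than as a synchronous throw, and handles both to be safe. `execFileWithRetry` and its `deps` parameter are hypothetical; `run` and `platform` are injectable only so the sketch can be exercised without spawning real processes.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical wrapper: retries execFile when it fails with ENOTCONN on Windows.
function execFileWithRetry(file, args = [], options = {}, deps = {}) {
  const { run = execFile, platform = process.platform, retries = 1 } = deps;

  return new Promise((resolve, reject) => {
    const attempt = (remaining) => {
      let settled = false; // each attempt settles at most once
      let child;

      const fail = (err) => {
        if (settled) return;
        settled = true;
        const retryable = platform === 'win32' && err && err.code === 'ENOTCONN';
        if (retryable && remaining > 0) {
          // Best effort: avoid leaving the first child running alongside the retry.
          if (child && typeof child.kill === 'function') child.kill();
          attempt(remaining - 1);
        } else {
          reject(err);
        }
      };

      try {
        child = run(file, args, options, (err, stdout, stderr) => {
          if (err) return fail(err);
          if (settled) return;
          settled = true;
          resolve({ stdout, stderr });
        });
      } catch (err) {
        fail(err); // synchronous spawn failures
        return;
      }
      if (!child) return;

      // Assumption: the race arrives as an 'error' event on a stdio stream;
      // with no listener attached, that event becomes an uncaught exception.
      for (const stream of [child.stdin, child.stdout, child.stderr]) {
        if (stream) stream.on('error', fail);
      }
    };

    attempt(retries);
  });
}
```

Retrying is only safe when the hook or tool command is idempotent, since the first child may already have started doing work before its pipe failed.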
What this is not
Evidence — raw log snippets
Hit #1 — PID 5388 @ 10:44:45.953Z (exception handler output verbatim)
10:44:45.953Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
at ChildProcess.spawn (node:internal/child_process:451:23)
at spawn (node:child_process:796:9)
at execFile (node:child_process:349:17)
10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
at ChildProcess.spawn (node:internal/child_process:451:23)
at spawn (node:child_process:796:9)
at execFile (node:child_process:349:17)
10:44:47.834Z [INFO] [Telemetry] cli.tool_call:
{
"tool_name": "glob",
"result_type": "SUCCESS",
"duration_ms": 2986,
...
}
Notable: 1.88 seconds after the three-line swallow, a glob tool call completes SUCCESS in the same process — proving the handler kept the process alive with undefined spawn state rather than failing the turn.
Hit #2 — PID 93012 @ 13:07:45.705Z (byte-identical stack, different session)
13:07:45.705Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
at ChildProcess.spawn (node:internal/child_process:451:23)
at spawn (node:child_process:796:9)
at execFile (node:child_process:349:17)
13:07:45.705Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
13:07:45.705Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
at tryReadStart (node:net:716:20)
at Socket._read (node:net:731:5)
at Readable.read (node:internal/streams/readable:737:12)
at Socket.read (node:net:785:39)
at new Socket (node:net:494:12)
at Object.Socket (node:net:363:41)
at createSocket (node:internal/child_process:336:14)
at ChildProcess.spawn (node:internal/child_process:451:23)
at spawn (node:child_process:796:9)
at execFile (node:child_process:349:17)
13:08:19.939Z [INFO] [Telemetry] cli.telemetry:
{
"kind": "assistant_usage",
"properties": {
"initiator": "agent",
...
}
}
Notable: 34 seconds after the swallow, assistant_usage telemetry fires in the same PID — further confirming the process survived the fatal exception. Session ended abruptly later mid-turn.
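The gaps quoted in the two "Notable" lines can be rechecked from the log timestamps alone. A small sketch, assuming every stamp has the `HH:MM:SS.mmmZ` shape shown in the snippets and that both stamps fall on the same UTC day; `msOfDay` and `gapMs` are hypothetical helpers.

```javascript
// Convert an HH:MM:SS.mmmZ stamp into milliseconds since midnight UTC.
function msOfDay(stamp) {
  const m = /^(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z$/.exec(stamp);
  if (!m) throw new Error(`unrecognised timestamp: ${stamp}`);
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Difference in milliseconds between two same-day stamps.
function gapMs(from, to) {
  return msOfDay(to) - msOfDay(from);
}

console.log(gapMs('10:44:45.953Z', '10:44:47.834Z')); // 1881    (hit #1: exception to glob SUCCESS)
console.log(gapMs('13:07:45.705Z', '13:08:19.939Z')); // 34234   (hit #2: exception to assistant_usage)
console.log(gapMs('10:44:45.953Z', '13:07:45.705Z')); // 8579752 (between the two hits, about 2h 23m)
```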
Alt-screen feature flag is on (from the same session's telemetry earlier in the log)
"features": {
...
"ALT_SCREEN": true,
...
}
This flag is what makes the unrestored TUI buffer visible to the user as "weird characters" after the Node stack dump lands on the TTY.
Suggested labels
bug · platform:windows · tui · child_process