
[BUG] Windows: uncaught-exception handler leaves alt-screen TUI in corrupted state (no TTY reset before suppress) #2865

@nschwerzler


Environment

Copilot CLI: v1.0.32 (host also has v1.0.34 installed; repro observed on 1.0.32)
Node: v24.11.1
OS: Windows 11 Enterprise 10.0.26200 x64
Terminal: Windows Terminal, PowerShell host, alt-screen enabled (ALT_SCREEN: true feature flag on in every session)

Summary

This issue is about how copilot-cli's uncaught-exception handler behaves when an exception fires while the TUI is in alt-screen mode — not about what raised the exception. In our case the underlying race is #2864 (read ENOTCONN in child_process.spawn), but the user-visible symptom — "weird characters on the screen" — is caused entirely by the handler, not by the race. The same suppression path will corrupt the terminal for any uncaught exception raised while alt-screen is active.

When an uncaught exception fires:

  1. Node writes its default stack dump directly to stderr, which is the TTY. Because the TTY is in alt-screen mode, the dump is painted into the alt-screen buffer instead of appearing as readable output on the main screen.
  2. The CLI's own handler logs the exception once as Uncaught Exception, then twice more as Uncaught Exception (suppressed, within error cooldown), and does not exit.
  3. The handler never emits the alt-screen-pop sequence (ESC [?1049l), never re-shows the cursor (ESC [?25h), never resets SGR (ESC [0m).
  4. The process survives the exception with undefined internal state (in our repro, the IPC pipe for the child is gone). Subsequent prompt draws, user keystrokes, and terminal echo land on the damaged alt-screen buffer.

Result: the user sees garbled characters, thinks the CLI "crashed and left weird characters," and some time later the session ends abruptly mid-turn (because the internal state really was unrecoverable, even though the handler pretended otherwise).

This is a distinct defect from the underlying race. Fixing #2864 (stop raising ENOTCONN) fixes that one race; fixing this issue (restore the TTY before suppressing, or exit instead of suppressing unrecoverable errors) prevents corrupted-terminal UX from every other uncaught exception too.

User-visible symptom

"Copilot CLI keeps crashing. It seems to just end and I get weird characters showing up in the CLI window."

The "weird characters" are the tail of Node's default stack dump plus the user's subsequent keystrokes, echoed against a stale alt-screen buffer.

Repro signature

Any uncaught exception raised while ALT_SCREEN: true is active. In our telemetry the trigger is read ENOTCONN from child_process.spawn → createSocket → tryReadStart (see #2864 for the race itself). Two hits on the same host 2h 23m apart produced byte-identical corruption.

#  UTC                       PID    Behavior after handler fired
1  2026-04-20T10:44:45.953Z  5388   Suppressed ×2, telemetry continues ~N minutes, session ends without session.shutdown marker
2  2026-04-20T13:07:45.705Z  93012  Same pattern

Handler output (verbatim from pid-5388-uncaught-context.log)

2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    ...
2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    ...

The key observation: the handler logs the stack three times in the same millisecond and does nothing about the TTY. Anything after this point in the log is coming from a session that believes it is still healthy.
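The triple-log pattern can be checked mechanically. The following is a minimal sketch, assuming only the line format visible in the excerpt above; `summarizeHandlerLines` and `hasTtyReset` are hypothetical helper names, not part of the CLI.

```javascript
// Matches the handler's log lines as they appear in the excerpt above.
const HANDLER_LINE =
  /^(\S+) \[ERROR\] Uncaught Exception( \(suppressed, within error cooldown\))?: (.*)$/;

// Hypothetical helper: groups handler lines by timestamp and counts how many
// were logged normally versus marked as suppressed.
function summarizeHandlerLines(logText) {
  const byTimestamp = new Map();
  for (const line of logText.split(/\r?\n/)) {
    const match = HANDLER_LINE.exec(line);
    if (!match) continue; // stack frames and unrelated lines
    const [, timestamp, suppressedTag, message] = match;
    const entry = byTimestamp.get(timestamp) ?? { logged: 0, suppressed: 0, message };
    if (suppressedTag) entry.suppressed += 1;
    else entry.logged += 1;
    byTimestamp.set(timestamp, entry);
  }
  return byTimestamp;
}

// Hypothetical helper: true if the alt-screen-pop sequence appears anywhere in the text.
function hasTtyReset(text) {
  return text.includes('\x1b[?1049l');
}

const sample = [
  '2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception: read ENOTCONN',
  'Error: read ENOTCONN',
  '    at tryReadStart (node:net:716:20)',
  '2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN',
  'Error: read ENOTCONN',
  '2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN',
  'Error: read ENOTCONN',
].join('\n');

const summary = summarizeHandlerLines(sample);
console.log(summary.get('2026-04-20T10:44:45.953Z'), hasTtyReset(sample));
```

On the sample, the single timestamp maps to one normal entry and two suppressed entries, and no reset sequence is found.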

Two problems

1. TTY never restored before logging/suppressing

The alt-screen-pop, cursor, and SGR reset sequences are never written. They need to go to process.stderr (the TTY, not the structured log sink) before the handler's own log lines. Otherwise Node's raw stack dump is the last thing painted, and it lands in the alt-screen buffer rather than on the user's main screen.

Minimal fix:

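// At the top of the uncaught-exception handler, before any logging.
// altScreenIsActive is a placeholder for whatever state the TUI tracks internally.
if (process.stderr.isTTY && altScreenIsActive) {
    // ESC[?1049l leaves the alt screen, ESC[?25h shows the cursor, ESC[0m resets SGR.
    process.stderr.write('\x1b[?1049l\x1b[?25h\x1b[0m');
}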

This alone eliminates the "weird characters" symptom for every class of uncaught exception, regardless of origin.
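For illustration, the same fix can be written as a small helper that takes the stream as a parameter, which makes it testable without a real terminal. This is a sketch under assumptions: `restoreTerminal` and its `altScreenActive` argument are hypothetical names, since the CLI's actual state tracking is not visible from the logs.

```javascript
// The three sequences named in this issue.
const LEAVE_ALT_SCREEN = '\x1b[?1049l'; // switch back to the main screen buffer
const SHOW_CURSOR = '\x1b[?25h';        // make the cursor visible again
const RESET_SGR = '\x1b[0m';            // clear colors and text attributes

// Hypothetical helper. Returns true if the reset sequences were written.
function restoreTerminal(stream, altScreenActive) {
  if (!stream.isTTY || !altScreenActive) return false;
  try {
    stream.write(LEAVE_ALT_SCREEN + SHOW_CURSOR + RESET_SGR);
    return true;
  } catch {
    return false; // a failing write must not throw from inside the exception handler
  }
}

// Usage with a fake stream standing in for process.stderr.
const written = [];
const fakeTty = {
  isTTY: true,
  write: (chunk) => { written.push(chunk); return true; },
};
const restored = restoreTerminal(fakeTty, true);
console.log(restored, JSON.stringify(written.join('')));
```

In the real handler the call would presumably be `restoreTerminal(process.stderr, altScreenIsActive)`, placed before the first log call.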

2. Cooldown suppression applied to unrecoverable errors

The current handler pattern (log once as Uncaught Exception, log subsequent copies as (suppressed, within error cooldown), keep running) is reasonable for recoverable defects like a transient write. It is not safe for errors that mean internal state is already gone — read ENOTCONN on a freshly constructed net.Socket inside child_process.spawn means the IPC pipe is lost and any subsequent use of the child handle is undefined behavior.

Suggested policy:

  • Classify the uncaught exception. Errors originating in node:internal/child_process, node:net, or any socket/pipe syscall should be treated as unrecoverable.
  • Unrecoverable: restore the TTY (the fix from problem 1 above), log, flush logs, process.exit(1). The outer shell can re-launch.
  • Recoverable (e.g. JSON parse, transient network): keep the current cooldown-suppression behavior, but still restore the TTY from alt-screen first so a stray stack dump doesn't corrupt the display.

The current "suppress everything and keep going" path is what produces the "it kept working for a while and then died mid-request" symptom. Exit once, cleanly, instead.
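The suggested policy could be sketched roughly as follows. Everything here is an assumption for illustration: `isUnrecoverable`, `handleUncaught`, and the `deps` object are hypothetical names, and the classifier looks only at stack-frame origins. The "any socket/pipe syscall" part of the policy is left out because it needs a decision about which syscalls count.

```javascript
// Module prefixes that the policy above treats as unrecoverable when they
// appear in the error's stack trace.
const UNRECOVERABLE_ORIGINS = ['node:internal/child_process:', 'node:net:'];

// Hypothetical classifier: inspects stack frames only.
function isUnrecoverable(err) {
  const stack = err && typeof err.stack === 'string' ? err.stack : '';
  return UNRECOVERABLE_ORIGINS.some((origin) => stack.includes(origin));
}

// Hypothetical handler body. `deps` bundles placeholders for the CLI's real
// functions so the ordering can be exercised without a real process.
function handleUncaught(err, deps) {
  deps.restoreTerminal(); // always first, for both classes of error
  deps.log(err);
  if (isUnrecoverable(err)) {
    deps.flushLogs();
    deps.exit(1);
    return 'exit';
  }
  return 'suppress'; // the existing cooldown path would continue from here
}

// Usage with recording stubs.
const calls = [];
const stubs = {
  restoreTerminal: () => calls.push('restoreTerminal'),
  log: () => calls.push('log'),
  flushLogs: () => calls.push('flushLogs'),
  exit: (code) => calls.push(`exit(${code})`),
};
const socketError = new Error('read ENOTCONN');
socketError.stack = 'Error: read ENOTCONN\n    at tryReadStart (node:net:716:20)';
const decision = handleUncaught(socketError, stubs);
console.log(decision, calls);
```

Injecting the side effects keeps the ordering (restore, log, flush, exit) visible and checkable. A real implementation would also need to make sure the log flush completes before the process exits.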

Ask

  1. In the uncaught-exception handler, write \x1b[?1049l\x1b[?25h\x1b[0m to process.stderr before the existing log calls when alt-screen is active. This fix is independent of the rest of this issue and is strictly a UX improvement for every uncaught exception path.
  2. Decide whether Uncaught Exception (suppressed, within error cooldown) is the right policy for errors from child_process / net internals. Current behavior silently corrupts session state.
  3. Confirm whether the alt-screen feature flag (ALT_SCREEN: true is present in every session's startup log on this host) has a documented disable switch users can flip as a workaround while this is being addressed.

What this is not

Evidence

  • pid-5388-uncaught-context.log and pid-93012-uncaught-context.log: two independent hits, each showing the same pattern of three handler log entries sharing one millisecond timestamp, with no TTY reset sequences before or after.
  • Session telemetry for both PIDs continues after the handler fires (process survives the exception).
  • Neither session ends with a session.shutdown marker in the process log — the "still alive after unrecoverable error" path terminates abruptly later.

Filing alongside #2864. Both together explain the end-to-end "weird characters then crash" symptom on Windows; either one in isolation only explains half.

Labels: area:platform-windows, area:terminal-rendering
