[BUG] Windows: uncaught-exception handler leaves alt-screen TUI in corrupted state (no TTY reset before suppress)

# [BUG] Windows: uncaught-exception handler leaves alt-screen TUI in corrupted state (no TTY reset before suppress)

## Environment

| | |
|---|---|
| **Copilot CLI** | `v1.0.32` (host also has `v1.0.34` installed; repro observed on 1.0.32) |
| **Node** | `v24.11.1` |
| **OS** | Windows 11 Enterprise 10.0.26200 x64 |
| **Terminal** | Windows Terminal, PowerShell host, alt-screen enabled (`ALT_SCREEN: true` feature flag on in every session) |

## Summary

This issue is about **how copilot-cli's uncaught-exception handler behaves when an exception fires while the TUI is in alt-screen mode** — not about what raised the exception. In our case the underlying race is [#2864](https://github.com/github/copilot-cli/issues/2864) (`read ENOTCONN` in `child_process.spawn`), but the user-visible symptom — "weird characters on the screen" — is caused entirely by the handler, not by the race. The same suppression path will corrupt the terminal for *any* uncaught exception raised while alt-screen is active.

When an uncaught exception fires:

1. Node writes its default stack dump directly to stderr/TTY. The TTY is in alt-screen mode, so the dump is painted *into* the alt-screen buffer, where the user cannot read it cleanly.
2. The CLI's own handler logs the exception once as `Uncaught Exception`, then twice more as `Uncaught Exception (suppressed, within error cooldown)`, and **does not exit**.
3. The handler never emits the alt-screen-pop sequence (`ESC [?1049l`), never re-shows the cursor (`ESC [?25h`), never resets SGR (`ESC [0m`).
4. The process survives the exception with undefined internal state (in our repro, the IPC pipe for the child is gone). Subsequent prompt draws, user keystrokes, and terminal echo land on the damaged alt-screen buffer.

Result: the user sees garbled characters, thinks the CLI "crashed and left weird characters," and some time later the session ends abruptly mid-turn (because the internal state really *was* unrecoverable, even though the handler pretended otherwise).

**This is a distinct defect from the underlying race.** Fixing #2864 (stop raising ENOTCONN) fixes that one race; fixing this issue (restore the TTY before suppressing, or exit instead of suppressing unrecoverable errors) prevents corrupted-terminal UX from *every other* uncaught exception too.

## User-visible symptom

> "Copilot CLI keeps crashing. It seems to just end and I get weird characters showing up in the CLI window."

The "weird characters" are the tail of Node's default stack dump plus the user's subsequent keystrokes, echoed against a stale alt-screen buffer.

## Repro signature

Any uncaught exception raised while `ALT_SCREEN: true` is active. In our telemetry the trigger is `read ENOTCONN` from `child_process.spawn → createSocket → tryReadStart` (see [#2864](https://github.com/github/copilot-cli/issues/2864) for the race itself). Two hits on the same host 2h 23m apart produced byte-identical corruption.

| # | UTC | PID | Behavior after handler fired |
|---|-----|-----|------------------------------|
| 1 | `2026-04-20T10:44:45.953Z` | 5388  | Suppressed ×2, telemetry continues ~N minutes, session ends without `session.shutdown` marker |
| 2 | `2026-04-20T13:07:45.705Z` | 93012 | Same pattern |

## Handler output (verbatim from `pid-5388-uncaught-context.log`)

```
2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception: read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    at Socket._read (node:net:731:5)
    at Readable.read (node:internal/streams/readable:737:12)
    at Socket.read (node:net:785:39)
    at new Socket (node:net:494:12)
    at Object.Socket (node:net:363:41)
    at createSocket (node:internal/child_process:336:14)
    at ChildProcess.spawn (node:internal/child_process:451:23)
    at spawn (node:child_process:796:9)
    at execFile (node:child_process:349:17)
2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    at tryReadStart (node:net:716:20)
    ...
2026-04-20T10:44:45.953Z [ERROR] Uncaught Exception (suppressed, within error cooldown): read ENOTCONN
Error: read ENOTCONN
    ...
```

The key observation: the handler logs the stack three times in the same millisecond and does nothing about the TTY. Anything after this point in the log is coming from a session that believes it is still healthy.
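To check other captured logs for the same signature, the handler entries can be grouped by timestamp. The following is a rough sketch, not tooling from copilot-cli: `countHandlerEntries` is a hypothetical helper, and the line format it matches is inferred only from the excerpt above.

```js
// Sketch only: `countHandlerEntries` is a hypothetical helper. The pattern
// below is inferred from the log excerpt in this issue, not from a spec.
const ENTRY =
    /^(\S+) \[ERROR\] Uncaught Exception( \(suppressed, within error cooldown\))?: (.*)$/;

function countHandlerEntries(logText) {
    // Map of timestamp -> { first, suppressed } counts of handler log lines.
    const byTimestamp = new Map();
    for (const line of logText.split(/\r?\n/)) {
        const match = ENTRY.exec(line);
        if (!match) continue; // stack frames and unrelated lines are skipped
        const [, timestamp, suppressed] = match;
        const entry = byTimestamp.get(timestamp) ?? { first: 0, suppressed: 0 };
        if (suppressed) {
            entry.suppressed += 1;
        } else {
            entry.first += 1;
        }
        byTimestamp.set(timestamp, entry);
    }
    return byTimestamp;
}
```

A timestamp with one first entry and two suppressed entries corresponds to the triple-log pattern shown above.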

## Two problems

### 1. TTY never restored before logging/suppressing

The alt-screen-pop, cursor-show, and SGR-reset sequences are never written. They need to go to `process.stderr` (the TTY, not the structured log sink) **before** the handler's own log lines; otherwise the last thing the user sees is Node's raw stack dump painted into the alt-screen buffer.

Minimal fix:

```js
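// at the top of the uncaught-exception handler, before any logging
// (`altScreenIsActive` stands in for however the TUI tracks alt-screen state)
if (process.stderr.isTTY && altScreenIsActive) {
    // leave alt-screen, re-show the cursor, reset SGR attributes
    process.stderr.write('\x1b[?1049l\x1b[?25h\x1b[0m');
}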
```

This alone should eliminate the "weird characters" symptom for every class of uncaught exception, regardless of origin.
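A slightly more defensive variant of the same idea is sketched below, so the restore step cannot itself throw from inside the handler and can be exercised against a fake stream. `restoreTty` and its `altScreenIsActive` parameter are hypothetical names, not copilot-cli internals; only the three escape sequences come from this issue.

```js
// Sketch only: `restoreTty` and `altScreenIsActive` are hypothetical names,
// not copilot-cli internals.
const ALT_SCREEN_POP = '\x1b[?1049l'; // leave the alternate screen buffer
const SHOW_CURSOR = '\x1b[?25h';      // make the cursor visible again
const SGR_RESET = '\x1b[0m';          // clear colors and text attributes

function restoreTty(stream, altScreenIsActive) {
    // Only touch a real terminal that is actually in alt-screen mode.
    if (!stream || !stream.isTTY || !altScreenIsActive) {
        return false;
    }
    try {
        stream.write(ALT_SCREEN_POP + SHOW_CURSOR + SGR_RESET);
        return true;
    } catch {
        // The restore path must never throw from inside the handler.
        return false;
    }
}
```

In the handler this would be called as `restoreTty(process.stderr, altScreenIsActive)` before the first log call. Taking the stream as a parameter is a design choice for testability; it is not required by the fix.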

### 2. Cooldown suppression applied to unrecoverable errors

The current handler pattern (log once as `Uncaught Exception`, log subsequent copies as `(suppressed, within error cooldown)`, keep running) is reasonable for *recoverable* defects like a transient write failure. It is not safe for errors that mean internal state is already gone: `read ENOTCONN` on a freshly constructed `net.Socket` inside `child_process.spawn` means the IPC pipe is lost, and any subsequent use of the child handle is undefined behavior.

Suggested policy:

- **Classify** the uncaught exception. Errors originating in `node:internal/child_process`, `node:net`, or any `socket`/`pipe` syscall should be treated as unrecoverable.
- **Unrecoverable:** restore the TTY (from fix #1), log, flush logs, `process.exit(1)`. The outer shell can re-launch.
- **Recoverable (e.g. JSON parse, transient network):** keep the current cooldown-suppression behavior, but still restore the TTY from alt-screen first so a stray stack dump doesn't corrupt the display.

The current "suppress everything and keep going" path is what produces the "it kept working for a while and then died mid-request" symptom. Exit once, cleanly, instead.
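One way the suggested policy could look is sketched below. This is an illustration under assumptions, not copilot-cli's handler: `classifyUncaught`, `handleUncaught`, and the injected `restoreTty`, `log`, `flushLogs`, and `exit` callbacks are all hypothetical, and classification here relies only on stack-frame markers taken from the trace above.

```js
// Sketch only: all names here are hypothetical. The frame markers are taken
// from the stack trace in this issue; a real classifier may need more rules.
const UNRECOVERABLE_FRAME_MARKERS = [
    'node:internal/child_process:',
    'node:net:',
];

function classifyUncaught(err) {
    const stack = err && typeof err.stack === 'string' ? err.stack : '';
    const fromInternals = UNRECOVERABLE_FRAME_MARKERS.some((marker) =>
        stack.includes(marker));
    return fromInternals ? 'unrecoverable' : 'recoverable';
}

function handleUncaught(err, { restoreTty, log, flushLogs, exit }) {
    restoreTty(); // both branches restore the terminal before logging
    log(err);
    if (classifyUncaught(err) === 'unrecoverable') {
        flushLogs();
        exit(1); // in production this would be process.exit
        return 'exited';
    }
    return 'suppressed'; // existing cooldown behavior would continue here
}
```

Injecting `exit` and the other callbacks keeps the sketch testable without terminating the process; a real handler could pass `process.exit` directly.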

## Ask

1. In the uncaught-exception handler, write `\x1b[?1049l\x1b[?25h\x1b[0m` to `process.stderr` **before** the existing log calls when alt-screen is active. This fix is independent of the rest of this issue and is strictly a UX improvement for every uncaught exception path.
2. Decide whether `Uncaught Exception (suppressed, within error cooldown)` is the right policy for errors from `child_process` / `net` internals. Current behavior silently corrupts session state.
3. Confirm whether the alt-screen feature flag (`ALT_SCREEN: true` is present in every session's startup log on this host) has a documented disable switch users can flip as a workaround while this is being addressed.

## What this is not

- **Not the underlying race.** The race that raises `read ENOTCONN` is covered in [#2864](https://github.com/github/copilot-cli/issues/2864). This issue is solely about how the handler treats *any* uncaught exception when alt-screen is active.
- **Not a duplicate of #2639.** #2639 is the session-shutdown `write EPIPE` pattern on extension close; this handler path fires mid-session, not on shutdown.

## Evidence

- `pid-5388-uncaught-context.log` and `pid-93012-uncaught-context.log`: two independent hits, each showing the same pattern of three handler log entries within a single millisecond, with no TTY reset sequences before or after.
- Session telemetry for both PIDs continues after the handler fires (process survives the exception).
- Neither session ends with a `session.shutdown` marker in the process log — the "still alive after unrecoverable error" path terminates abruptly later.

---

*Filing alongside [#2864](https://github.com/github/copilot-cli/issues/2864). Both together explain the end-to-end "weird characters then crash" symptom on Windows; either one in isolation only explains half.*

[BUG] Windows: uncaught-exception handler leaves alt-screen TUI in corrupted state (no TTY reset before suppress) #2865
