# Blocking socket during handshake allows denial of service via partial TCP message

## Summary

The clickhouse-cpp client uses blocking sockets with no receive timeout during the TCP handshake phase. A malicious server, man-in-the-middle, or network fault that delivers a partial handshake response can cause the client thread to hang **indefinitely** on `recv()`, effectively denying service.

## Environment

- **Library**: clickhouse-cpp (all current versions)
- **Affected platforms**: All (Linux, macOS, Windows)
- **Affected component**: `Client::Impl::Handshake()` → `ReceiveHello()`

## Root Cause

Three factors combine to create the vulnerability:

1. **`connection_recv_timeout` defaults to 0** ([client.h, line 120](https://github.com/ClickHouse/clickhouse-cpp/blob/master/clickhouse/client.h#L120)), which means `SO_RCVTIMEO` is never effectively set — `recv()` blocks forever.

2. **Socket is switched to blocking mode after connect** ([socket.cpp, `SocketConnect()`](https://github.com/ClickHouse/clickhouse-cpp/blob/master/clickhouse/base/socket.cpp)): `SetNonBlock(*s, false)` is called once the connection succeeds.

3. **`ReceiveHello()` performs multiple sequential blocking reads** ([client.cpp, `ReceiveHello()`](https://github.com/ClickHouse/clickhouse-cpp/blob/master/clickhouse/client.cpp)): 8 `WireFormat::Read*()` calls, each of which can individually block on `recv()` if data stops arriving mid-message.

## Why `SetConnectionRecvTimeout` is not a viable workaround

`SO_RCVTIMEO` is set **once** on the socket during `SocketConnect()` and applies to **all** `recv()` calls for the socket's entire lifetime. Setting it to a short value (e.g. 10s) would protect the handshake but would also kill long-running queries: a `SELECT` that takes minutes or hours to compute on the server side would time out on the client before any data arrives.

There is currently **no way to set a different timeout for the handshake vs. query execution**.

## Concrete attack / failure scenarios

Any of the following byte sequences sent by a rogue server (or injected by a MITM) after the client sends its `Hello` packet will hang the client thread forever:

### Scenario 1: Single continuation byte (1 byte total)

```
0x80
```

The `ReadVarint64()` loop reads byte `0x80`: continuation bit is set (meaning "more bytes follow"), value bits = 0. It loops to call `ReadByte()` again → `recv()` → **blocks forever**.
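
To make the continuation-bit behaviour concrete, here is a minimal sketch of a varint decoder that works on the bytes received so far. `TryDecodeVarint` is a hypothetical, buffer-based stand-in written for this issue, not the library's stream-based `ReadVarint64()`, and it assumes the usual 7-bits-per-byte, little-endian varint layout. It returns an empty optional where the real reader would go back to `recv()` for another byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not clickhouse-cpp code): decodes one varint from the
// bytes received so far. Returns std::nullopt when the input ends while the
// continuation bit is still set, i.e. where a blocking reader would wait.
std::optional<std::uint64_t> TryDecodeVarint(const std::vector<std::uint8_t>& buf) {
    std::uint64_t value = 0;
    // A 64-bit value needs at most 10 groups of 7 bits.
    for (std::size_t i = 0; i < buf.size() && i < 10; ++i) {
        const std::uint8_t byte = buf[i];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << (7 * i);
        if ((byte & 0x80) == 0) {
            return value;  // high bit clear: this was the final byte
        }
    }
    return std::nullopt;  // ran out of bytes with the continuation bit set
}
```

With this sketch, the single byte `0x80` yields no value, while `0x00` and `0x0A` decode immediately, which matches the packet type and string length bytes used in the scenarios below.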

### Scenario 2: Valid packet type + truncated string length (2 bytes total)

```
0x00 0x85
```

- `0x00` = varint 0 = `ServerCodes::Hello` ✓
- `0x85` = start of string length varint, continuation bit set → `ReadByte()` → `recv()` → **blocks forever**

### Scenario 3: Valid header + truncated string body (7 bytes total)

```
0x00 0x0A 'C' 'l' 'i' 'c' 'k'
```

- `0x00` = `ServerCodes::Hello`
- `0x0A` = string length 10
- 5 of 10 expected bytes arrive

`ReadAll()` reads the 5 available bytes, loops back for the remaining 5 → `recv()` → **blocks forever**.

### Scenario 4: Network fault mid-handshake

The server sends a valid Hello response, but the TCP connection drops (RST lost, half-open state) between any two of the 8 read operations in `ReceiveHello()`. With no `SO_RCVTIMEO`, the client hangs indefinitely. TCP keepalive (disabled by default, 75s detection time when enabled) can detect a dead peer, but it does not help against a server that keeps the connection open and simply stops sending.

## Affected code path

```
Client::Impl::ResetConnection()
└─ Handshake()                              # client.cpp
   ├─ SendHello()                           # succeeds
   └─ ReceiveHello()                        # client.cpp:1065
      ├─ ReadVarint64 (packet_type)         ← BLOCK POINT 1
      ├─ ReadString   (server_name)         ← BLOCK POINT 2 (varint len + body)
      ├─ ReadUInt64   (version_major)       ← BLOCK POINT 3
      ├─ ReadUInt64   (version_minor)       ← BLOCK POINT 4
      ├─ ReadUInt64   (revision)            ← BLOCK POINT 5
      ├─ ReadString   (timezone)            ← BLOCK POINT 6
      ├─ ReadString   (display_name)        ← BLOCK POINT 7
      └─ ReadUInt64   (version_patch)       ← BLOCK POINT 8
```

Each `Read*` call eventually reaches:
```
WireFormat::ReadVarint64 / ReadAll
  → BufferedInput::DoNext
    → SocketInput::DoRead
      → ::recv(fd, buf, len, 0)   // blocks forever when SO_RCVTIMEO=0
```
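
The effect of a non-zero `SO_RCVTIMEO` on that final `recv()` can be checked in isolation. The sketch below is POSIX-only (Linux, macOS) and uses a local `socketpair()` as a stand-in for a server that connects and then stays silent. `SetRecvTimeout` and `RecvTimesOutOnSilentPeer` are hypothetical names introduced for illustration and are not part of clickhouse-cpp. On Windows the option takes a `DWORD` in milliseconds instead of a `timeval`, which this sketch does not cover.

```cpp
#include <cerrno>
#include <chrono>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: sets SO_RCVTIMEO on fd. A zero timeout means
// "block forever", which is the default behaviour described above.
bool SetRecvTimeout(int fd, std::chrono::milliseconds timeout) {
    timeval tv{};
    tv.tv_sec = static_cast<time_t>(timeout.count() / 1000);
    tv.tv_usec = static_cast<suseconds_t>((timeout.count() % 1000) * 1000);
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Simulates a peer that is connected but never sends anything. Returns true
// if recv() gave up with a timeout error instead of blocking indefinitely.
bool RecvTimesOutOnSilentPeer(std::chrono::milliseconds timeout) {
    int fds[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return false;
    }
    bool timed_out = false;
    if (SetRecvTimeout(fds[0], timeout)) {
        char byte = 0;
        const ssize_t n = ::recv(fds[0], &byte, 1, 0);
        timed_out = (n == -1) && (errno == EAGAIN || errno == EWOULDBLOCK);
    }
    ::close(fds[0]);
    ::close(fds[1]);
    return timed_out;
}
```

Calling `RecvTimesOutOnSilentPeer(std::chrono::milliseconds(50))` should return `true` after roughly 50 ms. With a zero timeout the same `recv()` would never return, so that case is deliberately not exercised here.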

## Proposed fix

Add a **handshake-scoped timeout** that is independent of the query recv timeout. Two options:

### Option A: New `connection_handshake_timeout` option (recommended)

Add a dedicated option with a safe default:

```cpp
DECLARE_FIELD(connection_handshake_timeout, std::chrono::milliseconds,
              SetConnectionHandshakeTimeout, std::chrono::seconds(30));
```

In `ResetConnection()`, temporarily apply it to the socket before `Handshake()`, then restore the user's `connection_recv_timeout` after:

```cpp
void Client::Impl::ResetConnection() {
    auto socket = socket_factory_->connect(options_, current_endpoint_.value());

    // Capture the descriptor now: `socket` is moved from below and must not
    // be dereferenced afterwards.
    const auto fd = socket->fd();

    // Apply handshake timeout
    SetSocketTimeout(fd, options_.connection_handshake_timeout);

    InitializeStreams(std::move(socket));
    inserting_ = false;

    if (!Handshake()) {
        throw ProtocolError("fail to connect to " + options_.host);
    }

    // Restore user's recv timeout (0 = infinite for long queries)
    SetSocketTimeout(fd, options_.connection_recv_timeout);
}
```
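
One possible way to make the apply-then-restore step harder to get wrong is an RAII guard, so the previous timeout is restored on every exit path, including when `Handshake()` throws. The following is a POSIX-only sketch under that assumption. `ScopedRecvTimeout` and `CurrentRecvTimeoutMs` are hypothetical names and not part of clickhouse-cpp, and the guard restores whatever value was on the socket before rather than reading `connection_recv_timeout` from the options.

```cpp
#include <chrono>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical RAII guard: applies a temporary SO_RCVTIMEO to fd and puts
// the previous value back when the guard goes out of scope.
class ScopedRecvTimeout {
public:
    ScopedRecvTimeout(int fd, std::chrono::milliseconds timeout) : fd_(fd) {
        socklen_t len = sizeof(saved_);
        have_saved_ =
            ::getsockopt(fd_, SOL_SOCKET, SO_RCVTIMEO, &saved_, &len) == 0;

        timeval tv{};
        tv.tv_sec = static_cast<time_t>(timeout.count() / 1000);
        tv.tv_usec = static_cast<suseconds_t>((timeout.count() % 1000) * 1000);
        ::setsockopt(fd_, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    }

    ~ScopedRecvTimeout() {
        if (have_saved_) {
            ::setsockopt(fd_, SOL_SOCKET, SO_RCVTIMEO, &saved_, sizeof(saved_));
        }
    }

    ScopedRecvTimeout(const ScopedRecvTimeout&) = delete;
    ScopedRecvTimeout& operator=(const ScopedRecvTimeout&) = delete;

private:
    int fd_;
    timeval saved_{};
    bool have_saved_ = false;
};

// Reads the current SO_RCVTIMEO in milliseconds, or -1 on error.
long long CurrentRecvTimeoutMs(int fd) {
    timeval tv{};
    socklen_t len = sizeof(tv);
    if (::getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &len) != 0) {
        return -1;
    }
    return static_cast<long long>(tv.tv_sec) * 1000 + tv.tv_usec / 1000;
}
```

In a change along the lines of Option A, such a guard would be constructed just before `Handshake()` and destroyed right after it, which removes the need for the explicit second `SetSocketTimeout` call.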

### Option B: Reuse `connection_connect_timeout` for handshake

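Apply the existing `connection_connect_timeout` (default 5s) as a temporary `SO_RCVTIMEO` during `Handshake()` only. No new option is needed, but this conflates two different operations (establishing the TCP connection and completing the protocol handshake) under a single setting.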

## Impact

- **Severity**: Medium-High (denial of service / thread hang)
- **Attack vector**: Network (MITM, rogue server, network fault)
- **User impact**: Client thread hangs permanently, no recovery possible without process kill
- **Workaround**: None that doesn't also break long-running queries

Blocking socket during handshake allows denial of service via partial TCP message #486
