Blocking socket during handshake allows denial of service via partial TCP message #486

@jyberg

Summary

The clickhouse-cpp client uses blocking sockets with no receive timeout during the TCP handshake phase. A malicious server, man-in-the-middle, or network fault that delivers a partial handshake response can cause the client thread to hang indefinitely on recv(), effectively denying service.

Environment

  • Library: clickhouse-cpp (all current versions)
  • Affected platforms: All (Linux, macOS, Windows)
  • Affected component: Client::Impl::Handshake() → ReceiveHello()

Root Cause

Three factors combine to create the vulnerability:

  1. connection_recv_timeout defaults to 0 (client.h, line 120). A zero SO_RCVTIMEO means "no timeout", so recv() can block forever.

  2. Socket is switched to blocking mode after connect (socket.cpp, SocketConnect()): SetNonBlock(*s, false) is called once the connection succeeds.

  3. ReceiveHello() performs multiple sequential blocking reads (client.cpp, ReceiveHello()): 8 WireFormat::Read*() calls, each of which can individually block on recv() if data stops arriving mid-message.

Why SetConnectionRecvTimeout is not a viable workaround

SO_RCVTIMEO is set once on the socket during SocketConnect() and applies to all recv() calls for the socket's entire lifetime. Setting it to a short value (e.g. 10s) would protect the handshake, but would also kill long-running queries — a SELECT that takes minutes or hours to compute on the server side would timeout on the client before any data arrives.

There is currently no way to set a different timeout for the handshake vs. query execution.

Concrete attack / failure scenarios

Any of the following byte sequences sent by a rogue server (or injected by a MITM) after the client sends its Hello packet will hang the client thread forever:

Scenario 1: Single continuation byte (1 byte total)

0x80

The ReadVarint64() loop reads byte 0x80: the continuation bit is set (meaning "more bytes follow") and the value bits are 0. It loops to call ReadByte() again → recv() blocks forever.
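
To make the continuation-bit behaviour concrete, here is a minimal sketch of an unsigned LEB128 decoder that reports "need more bytes" instead of reading from a socket. TryDecodeVarint64, VarintStatus and VarintResult are hypothetical names for illustration only, and the 10-byte cap is an assumption of this sketch, not a statement about how clickhouse-cpp's ReadVarint64() is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Outcome of trying to decode one varint from the bytes received so far.
enum class VarintStatus { Complete, NeedMoreBytes, TooLong };

struct VarintResult {
    VarintStatus status;
    std::uint64_t value;   // meaningful only when status == Complete
    std::size_t consumed;  // bytes used, only when status == Complete
};

// Hypothetical helper, not part of clickhouse-cpp: decodes an unsigned
// LEB128 varint (7 value bits per byte, high bit = "more bytes follow").
inline VarintResult TryDecodeVarint64(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    constexpr std::size_t kMaxBytes = 10;  // enough for any 64-bit value
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size && i < kMaxBytes; ++i) {
        value |= static_cast<std::uint64_t>(data[i] & 0x7F) << (7 * i);
        if ((data[i] & 0x80) == 0) {
            return {VarintStatus::Complete, value, i + 1};
        }
    }
    // Every byte seen so far had the continuation bit set.
    if (size >= kMaxBytes) {
        return {VarintStatus::TooLong, 0, 0};
    }
    return {VarintStatus::NeedMoreBytes, 0, 0};
}
```

Fed the single byte 0x80 from this scenario, the decoder returns NeedMoreBytes. A blocking reader in the same state has nothing to return and simply waits in recv() for a byte that never arrives.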

Scenario 2: Valid packet type + truncated string length (2 bytes total)

0x00 0x85
  • 0x00 = varint 0 = ServerCodes::Hello
  • 0x85 = start of string length varint, continuation bit set → ReadByte() → recv() blocks forever

Scenario 3: Valid header + truncated string body (7 bytes total)

0x00 0x0A 'C' 'l' 'i' 'c' 'k'
  • 0x00 = ServerCodes::Hello
  • 0x0A = string length 10
  • 5 of 10 expected bytes arrive

ReadAll() reads the 5 available bytes, loops back for the remaining 5 → recv() blocks forever.
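
For comparison, a read-all loop can be bounded by a total time budget rather than waiting on each recv() without limit. The sketch below is POSIX-only and uses poll() before every recv(). ReadAllWithDeadline is a hypothetical helper written for this illustration, not an existing clickhouse-cpp function, and it is a different technique from the SO_RCVTIMEO approach proposed in this issue.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <climits>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (POSIX only), not part of clickhouse-cpp.
// Reads exactly `len` bytes, or returns false once `budget` has elapsed
// in total, the peer closes the connection, or an error occurs.
inline bool ReadAllWithDeadline(int fd, void* buf, std::size_t len,
                                std::chrono::milliseconds budget) {
    assert(buf != nullptr || len == 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + budget;
    auto* out = static_cast<char*>(buf);
    std::size_t got = 0;

    while (got < len) {
        const auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
        if (left.count() <= 0) return false;  // total budget exhausted

        // poll() takes an int timeout in milliseconds; clamp large budgets.
        const int wait_ms =
            left.count() > INT_MAX ? INT_MAX : static_cast<int>(left.count());

        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;
        const int ready = ::poll(&pfd, 1, wait_ms);
        if (ready < 0 && errno == EINTR) continue;  // interrupted: re-check budget
        if (ready <= 0) return false;               // timed out or poll() failed

        const ssize_t n = ::recv(fd, out + got, len - got, 0);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;  // peer closed (0) or recv() error
        got += static_cast<std::size_t>(n);
    }
    return true;
}
```

With the 5-of-10-byte input from this scenario, such a loop would return false when the budget runs out instead of hanging. One trade-off: the budget covers the whole read, whereas SO_RCVTIMEO limits each recv() call separately.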

Scenario 4: Network fault mid-handshake

Server sends a valid Hello response, but TCP connection drops (RST lost, half-open state) between any two of the 8 read operations in ReceiveHello(). With no SO_RCVTIMEO, the client hangs indefinitely. Even TCP keepalive (disabled by default, 75s detection time when enabled) doesn't help against a server that keeps the connection open but stops sending.

Affected code path

Client::Impl::ResetConnection()
└─ Handshake()                              # client.cpp
   ├─ SendHello()                           # succeeds
   └─ ReceiveHello()                        # client.cpp:1065
      ├─ ReadVarint64 (packet_type)         ← BLOCK POINT 1
      ├─ ReadString   (server_name)         ← BLOCK POINT 2 (varint len + body)
      ├─ ReadUInt64   (version_major)       ← BLOCK POINT 3
      ├─ ReadUInt64   (version_minor)       ← BLOCK POINT 4
      ├─ ReadUInt64   (revision)            ← BLOCK POINT 5
      ├─ ReadString   (timezone)            ← BLOCK POINT 6
      ├─ ReadString   (display_name)        ← BLOCK POINT 7
      └─ ReadUInt64   (version_patch)       ← BLOCK POINT 8

Each Read* call eventually reaches:

WireFormat::ReadVarint64 / ReadAll
  → BufferedInput::DoNext
    → SocketInput::DoRead
      → ::recv(fd, buf, len, 0)   // blocks forever when SO_RCVTIMEO=0

Proposed fix

Add a handshake-scoped timeout that is independent of the query recv timeout. Two options:

Option A: New connection_handshake_timeout option (recommended)

Add a dedicated option with a safe default:

DECLARE_FIELD(connection_handshake_timeout, std::chrono::milliseconds,
              SetConnectionHandshakeTimeout, std::chrono::seconds(30));

In ResetConnection(), temporarily apply it to the socket before Handshake(), then restore the user's connection_recv_timeout after:

void Client::Impl::ResetConnection() {
    auto socket = socket_factory_->connect(options_, current_endpoint_.value());

    // Capture the descriptor now: `socket` is moved from below and must not
    // be dereferenced afterwards.
    const auto fd = socket->fd();

    // Apply handshake timeout
    SetSocketTimeout(fd, options_.connection_handshake_timeout);

    InitializeStreams(std::move(socket));
    inserting_ = false;

    if (!Handshake()) {
        throw ProtocolError("fail to connect to " + options_.host);
    }

    // Restore user's recv timeout (0 = infinite for long queries)
    SetSocketTimeout(fd, options_.connection_recv_timeout);
}
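
One way the "apply, then restore" step could be written is as a small RAII guard, so the restore also runs on early return or exception. This is a POSIX-only sketch: ToTimeval, SetRecvTimeout and ScopedRecvTimeout are hypothetical names, not existing clickhouse-cpp API. A Windows build would need a different option value, since Winsock expects SO_RCVTIMEO as a DWORD in milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helpers (POSIX only); names are illustrative, not library API.
inline timeval ToTimeval(std::chrono::milliseconds ms) {
    timeval tv{};
    tv.tv_sec = static_cast<decltype(tv.tv_sec)>(ms.count() / 1000);
    tv.tv_usec = static_cast<decltype(tv.tv_usec)>((ms.count() % 1000) * 1000);
    return tv;
}

// Sets SO_RCVTIMEO on a live socket; a zero duration means "no timeout".
inline bool SetRecvTimeout(int fd, std::chrono::milliseconds ms) {
    assert(fd >= 0);
    const timeval tv = ToTimeval(ms);
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Applies `during` for the lifetime of the guard, then switches to `after`.
class ScopedRecvTimeout {
public:
    ScopedRecvTimeout(int fd, std::chrono::milliseconds during,
                      std::chrono::milliseconds after)
        : fd_(fd), after_(after), ok_(SetRecvTimeout(fd, during)) {}

    ~ScopedRecvTimeout() { SetRecvTimeout(fd_, after_); }

    ScopedRecvTimeout(const ScopedRecvTimeout&) = delete;
    ScopedRecvTimeout& operator=(const ScopedRecvTimeout&) = delete;

    // False if the handshake timeout could not be applied.
    bool ok() const { return ok_; }

private:
    int fd_;
    std::chrono::milliseconds after_;
    bool ok_;
};
```

In ResetConnection(), the guard would be constructed with the handshake timeout as `during` and connection_recv_timeout as `after`, and would wrap only the Handshake() call.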

Option B: Reuse connection_connect_timeout for handshake

Apply the existing connection_connect_timeout (default 5s) as a temporary SO_RCVTIMEO during Handshake() only. No new options needed, but conflates two different operations.

Impact

  • Severity: Medium-High (denial of service / thread hang)
  • Attack vector: Network (MITM, rogue server, network fault)
  • User impact: Client thread hangs permanently, no recovery possible without process kill
  • Workaround: None that doesn't also break long-running queries
