Skip to content

Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489

Open
iliaal wants to merge 1 commit intoClickHouse:masterfrom
iliaal:fix-empty-stringview-memcpy-ub
Open

Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489
iliaal wants to merge 1 commit intoClickHouse:masterfrom
iliaal:fix-empty-stringview-memcpy-ub

Conversation

@iliaal
Copy link
Copy Markdown

@iliaal iliaal commented Apr 25, 2026

Summary

ColumnStringBlock::AppendUnsafe calls memcpy(pos, str.data(), str.size()) without short-circuiting on str.size() == 0. When the input string_view was built from an empty std::string, str.data() can be nullptr, and glibc declares memcpy's source pointer with __attribute__((nonnull)) regardless of the size argument. UBSan flags every empty-string append:

runtime error: null pointer passed as argument 2, which is declared to never be null
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clickhouse/columns/string.cpp:139:21

The bug is benign in practice because libc no-ops memcpy(_, NULL, 0), but the false positive fires on any workload that round-trips empty values through String or LowCardinality(String) columns. That makes -fsanitize=undefined builds noisy and hides real findings.

Patch

One-line guard around the memcpy. Behavior is unchanged for non-empty inputs, and the return value matches the previous shape: a string_view pointing at pos with length 0 for empty inputs.

Repro

I hit this in a downstream PHP extension that wraps the library (iliaal/php_clickhouse). Inserting an empty value into a LowCardinality(String) column under -fsanitize=address,undefined reproduces it on every read path. Happy to add a unit test in ut/ if you'd prefer it gated by make test rather than relying on existing coverage.

memcpy's source pointer is declared nonnull regardless of the size
argument. When AppendUnsafe is called with an empty string_view that
was constructed from an empty std::string, str.data() can be null
(the standard allows it), so UBSan flags every empty-string append:

  runtime error: null pointer passed as argument 2,
    which is declared to never be null
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
    clickhouse/columns/string.cpp:139:21

Guard the memcpy with a zero-size short-circuit. Behavior is
unchanged for non-empty inputs and the returned string_view still
points at pos with length 0 for empty inputs.
@iliaal iliaal requested review from mzitnik and slabko as code owners April 25, 2026 21:06
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 25, 2026

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants