add some basic doc re: kopia incremental backup sizes #2161

Open
weshayutin wants to merge 1 commit into openshift:oadp-dev from weshayutin:kopia_incremental


@weshayutin
Contributor

@weshayutin weshayutin commented Apr 13, 2026

Why the changes were made

We need some written guidance on what to expect regarding backup size and incremental backups.

How to test the changes made

Read the doc, then test with data mover (dm) backups and check the results with oc get dataupload.
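
One way to make the `oc get dataupload` check repeatable is to read the JSON output and tabulate the reported progress for each DataUpload. The sketch below is a minimal illustration, not part of the PR. It assumes the objects live in the `openshift-adp` namespace and that progress is reported under `status.progress.bytesDone` and `status.progress.totalBytes`; both are assumptions to verify against your cluster. The function names are hypothetical.

```python
import json
import subprocess


def summarize_datauploads(doc):
    """Return (name, phase, bytes_done, total_bytes) for each item in a DataUpload list."""
    rows = []
    for item in doc.get("items", []):
        status = item.get("status", {})
        # Assumed field names: status.progress.bytesDone / totalBytes.
        progress = status.get("progress", {})
        rows.append((
            item["metadata"]["name"],
            status.get("phase", "Unknown"),
            int(progress.get("bytesDone", 0)),
            int(progress.get("totalBytes", 0)),
        ))
    return rows


def fetch_datauploads(namespace="openshift-adp"):
    """Run `oc get dataupload -o json` and parse the result (namespace is an assumption)."""
    result = subprocess.run(
        ["oc", "get", "dataupload", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `summarize_datauploads(fetch_datauploads())` after each backup gives comparable rows across runs. The sketch only reads the reported values and makes no claim about how they map to bytes actually sent to object storage.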

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide explaining Kopia's incremental backup mechanism, including scenarios such as unchanged files (no transfer), partial edits (efficient transfer), full rewrites, and duplicate file deduplication.
    • Updated troubleshooting documentation with link to the new incremental backups guide.
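
The four scenarios in the summary above (unchanged file, partial edit, full rewrite, duplicate file) can be reproduced with a toy model. This is a sketch under stated assumptions: the chunker is a simple gear-style rolling hash with small chunk sizes, standing in for Kopia's Buzhash-based splitter, and `Repo` is a hypothetical in-memory set of chunk hashes rather than a real repository. It only illustrates why each scenario transfers what it does.

```python
import hashlib
import random

_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # toy byte -> 32-bit table


def chunk(data, avg_bits=10, min_size=256, max_size=8192):
    """Toy content-defined chunker (gear-style hash, not Kopia's Buzhash)."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the top bits of the rolling hash are zero, within size bounds.
        if (size >= min_size and (h >> (32 - avg_bits)) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class Repo:
    """Hypothetical content-addressed store that remembers chunk hashes."""

    def __init__(self):
        self.seen = set()

    def backup(self, files):
        """Store every file; return the bytes of chunks that were not already present."""
        new_bytes = 0
        for data in files.values():
            for c in chunk(data):
                cid = hashlib.blake2b(c, digest_size=16).digest()
                if cid not in self.seen:
                    self.seen.add(cid)
                    new_bytes += len(c)
        return new_bytes


def rand_bytes(n, seed):
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


base = rand_bytes(200_000, 1)
repo = Repo()
first = repo.backup({"a.bin": base})                         # initial backup: all new
unchanged = repo.backup({"a.bin": base})                     # same content again
edited = bytearray(base)
edited[100_000] ^= 0xFF                                      # flip one byte in place
partial = repo.backup({"a.bin": bytes(edited)})
rewritten = repo.backup({"a.bin": rand_bytes(200_000, 2)})   # entirely new content
duplicate = repo.backup({"a.bin": base, "copy.bin": base})   # copies of known data

for label, n in [("initial", first), ("unchanged", unchanged), ("partial edit", partial),
                 ("full rewrite", rewritten), ("duplicate", duplicate)]:
    print(f"{label:>12}: {n} new bytes")
```

In this model the unchanged and duplicate cases add nothing, the partial edit adds only the chunks around the changed byte, and the full rewrite adds the whole file. Real numbers depend on Kopia's splitter, policy, and the data itself.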

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
@openshift-ci openshift-ci bot requested review from mpryc and sseago April 13, 2026 14:48
@openshift-ci

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weshayutin
Once this PR has been reviewed and has the lgtm label, please assign shawn-hurley for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

Walkthrough

Added comprehensive documentation explaining Kopia's library-level incremental backup mechanism, covering content-defined chunking with Buzhash algorithm, chunk deduplication via content-addressed storage, and user-facing backup scenarios. Updated troubleshooting guide with a reference link to the new documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Incremental Backup Documentation: docs/kopia-incremental.md | New comprehensive documentation describing content-defined chunking (CDC) using DYNAMIC-4M-BUZHASH, chunk identification via BLAKE2B-256-128 hashing, deduplication logic, and user scenarios (unchanged files, partial edits, full rewrites, duplicates). Includes object model overview of library layers. |
| Updated Reference Documentation: docs/kopia_troubleshooting.md | Added reference link to new incremental backup documentation in the "Documentation" section. |
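
The chunk identification and deduplication logic mentioned in the summary can be sketched in a few lines. This is an illustration only, not Kopia's code: it reads BLAKE2B-256-128 as a keyed 256-bit BLAKE2b digest truncated to 128 bits, which is an assumption to check against the pinned Kopia source, and `ContentStore` is a hypothetical in-memory stand-in for the pack and index layers.

```python
import hashlib


def content_id(chunk, key):
    """Assumed reading of BLAKE2B-256-128: keyed BLAKE2b-256, first 128 bits, hex-encoded."""
    return hashlib.blake2b(chunk, digest_size=32, key=key).digest()[:16].hex()


class ContentStore:
    """Hypothetical content-addressed store: a chunk is written only if its ID is new."""

    def __init__(self, key):
        self.key = key
        self.index = {}          # content ID -> chunk bytes
        self.bytes_written = 0   # bytes that actually had to be stored

    def write(self, chunk):
        cid = content_id(chunk, self.key)
        if cid not in self.index:      # already known: nothing to upload
            self.index[cid] = chunk
            self.bytes_written += len(chunk)
        return cid


store = ContentStore(key=b"example-repository-key")
first_id = store.write(b"x" * 4096)
second_id = store.write(b"x" * 4096)   # identical content, deduplicated
print(first_id == second_id, store.bytes_written)
```

Because the ID is derived from the content and a repository key, identical chunks map to the same ID regardless of which file or snapshot they come from, which is what makes the duplicate-file scenario cheap.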

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title concisely describes the main change: adding basic documentation about Kopia incremental backup sizes, which matches the changeset of adding a new doc file and updating references. |
| Description check | ✅ Passed | The description follows the required template with both sections completed: it explains the motivation (guidance on backup sizes and incremental behavior) and provides testing instructions (read docs, test with dm backups and oc get dataupload). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only markdown documentation files and does not add, modify, or remove any Ginkgo test files or test code. |
| Test Structure And Quality | ✅ Passed | PR only modifies documentation files; no Ginkgo test code to review against quality criteria. |
| Microshift Test Compatibility | ✅ Passed | This pull request adds only documentation files with no new Ginkgo e2e tests being introduced. The check passes because there are no test definitions to evaluate. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR contains only documentation changes with no new Ginkgo e2e tests added, so the custom check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | The PR contains only documentation additions with no deployment manifests, operator code, or controller modifications. |
| Ote Binary Stdout Contract | ✅ Passed | The OTE Binary Stdout Contract check applies to executable process-level code that could corrupt JSON communication with openshift-tests. This pull request contains only markdown documentation files with no executable code or stdout-writing logic, so the check passes. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR contains only documentation file additions; no Ginkgo e2e test code patterns found, so check applicability condition is not met. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
docs/kopia-incremental.md (2)

18-35: Pin code references to immutable upstream permalinks.

repo/content/content_manager.go and inline snippets may drift over time. Add a commit-pinned GitHub permalink (or explicit Kopia version note) so readers can verify behavior against a stable source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kopia-incremental.md` around lines 18 - 35, The documentation references
live source snippets (e.g., IDFromHash, bm.hashData,
bm.getContentInfoReadLocked, bm.addToPackUnlocked and variables like hashOutput
and bi) which can drift; update the doc to include an immutable upstream
reference by adding a commit-pinned GitHub permalink (or explicit Kopia version
tag) pointing to the exact commit/tag in repo/content/content_manager.go that
corresponds to the shown code, and annotate the snippet with that permalink or
version note so readers can verify behavior against the stable source.

7-12: Use probabilistic and version-dependent language for CDC behavior and expected transfer sizes.

Lines 7–12 and 59–65 currently state CDC behavior and transfer outcomes as absolute facts (e.g., "The default algorithm is DYNAMIC-4M-BUZHASH, which means," "~4–8 MB transferred," "re-synchronizes within ~64 bytes"). However, official Kopia documentation contains no explicit guarantees for exact boundary behavior or incremental transfer sizes. CDC outcomes depend on content layout, policy version, and configuration; these should be described as "typical," "common," or "expected" rather than deterministic.

Consider softening the language in both locations:

  • Replace "The default algorithm is…which means" with "typical defaults often use…which typically means"
  • Replace absolute transfer-size claims with ranges like "commonly around one to a few chunks, not necessarily the full file"
  • Soften technical specifics (e.g., "re-synchronizes within ~64 bytes") to "boundaries stabilize quickly after the modified region"
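
The point about boundaries stabilizing after a modified region can be seen by inserting a single byte into a file and comparing fixed-size chunking with content-defined chunking. The sketch below uses a toy gear-style rolling hash with small chunk sizes as a stand-in for Kopia's Buzhash splitter, so the numbers it prints are illustrative only. All function names and parameters are hypothetical.

```python
import hashlib
import random

_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # toy byte -> 32-bit table


def cdc_chunks(data, avg_bits=10, min_size=256, max_size=8192):
    """Toy content-defined chunker: cut where the rolling hash's top bits are zero."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h >> (32 - avg_bits)) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data, size=1024):
    """Baseline: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_bytes(old_chunks, new_chunks):
    """Bytes in new_chunks whose hash does not appear among old_chunks."""
    known = {hashlib.blake2b(c, digest_size=16).digest() for c in old_chunks}
    return sum(len(c) for c in new_chunks
               if hashlib.blake2b(c, digest_size=16).digest() not in known)


rng = random.Random(42)
old = bytes(rng.getrandbits(8) for _ in range(300_000))
new = old[:100_000] + b"\x00" + old[100_000:]   # insert one byte, shifting the tail

fixed_new = new_bytes(fixed_chunks(old), fixed_chunks(new))
cdc_new = new_bytes(cdc_chunks(old), cdc_chunks(new))
print(f"fixed-size chunking: {fixed_new} new bytes")
print(f"content-defined:     {cdc_new} new bytes")
```

With fixed-size chunks every chunk after the insertion point shifts and looks new, while content-defined boundaries follow the data and realign after the edit. How quickly they realign depends on the content and the minimum and maximum chunk sizes, which is why hedged wording suits the documentation better than exact figures.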
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kopia-incremental.md` around lines 7 - 12, The docs currently state
absolute guarantees (e.g., "DYNAMIC-4M-BUZHASH", "Buzhash32", "~4–8 MB
transferred", "re-synchronizes within ~64 bytes"); change these to
probabilistic, version-dependent wording: replace "The default algorithm is
DYNAMIC-4M-BUZHASH" with language like "typical defaults often use
DYNAMIC-4M-BUZHASH," replace exact transfer-size claims such as "~4–8 MB
transferred" with hedged ranges such as "commonly around one to a few chunks,"
note the dependence on content, policy version, and configuration, and soften
deterministic statements such as "re-synchronizes within ~64 bytes" to
"boundaries typically stabilize quickly after the modified region" while
preserving the mentions of Buzhash32 and rolling-window behavior for context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f0653f5a-e2a8-41bd-a884-349408195c12

📥 Commits

Reviewing files that changed from the base of the PR and between 7e53a85 and d17bb58.

📒 Files selected for processing (2)
  • docs/kopia-incremental.md
  • docs/kopia_troubleshooting.md

@openshift-ci

openshift-ci bot commented Apr 13, 2026

@weshayutin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.23-e2e-test-aws | d17bb58 | link | false | /test 4.23-e2e-test-aws |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
