add some basic doc re: kopia incremental backup sizes#2161
weshayutin wants to merge 1 commit into openshift:oadp-dev
Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: weshayutin. It still needs approval from an approver in each of the affected files.
Walkthrough
Added comprehensive documentation explaining Kopia's library-level incremental backup mechanism, covering content-defined chunking with the Buzhash algorithm, chunk deduplication via content-addressed storage, and user-facing backup scenarios. Updated the troubleshooting guide with a reference link to the new documentation.
Estimated code review effort: 1 (Trivial), ~3 minutes. Pre-merge checks: 10 passed.
🧹 Nitpick comments (2)
docs/kopia-incremental.md (2)
18-35: Pin code references to immutable upstream permalinks.
`repo/content/content_manager.go` and inline snippets may drift over time. Add a commit-pinned GitHub permalink (or explicit Kopia version note) so readers can verify behavior against a stable source.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/kopia-incremental.md` around lines 18 - 35, The documentation references live source snippets (e.g., IDFromHash, bm.hashData, bm.getContentInfoReadLocked, bm.addToPackUnlocked and variables like hashOutput and bi) which can drift; update the doc to include an immutable upstream reference by adding a commit-pinned GitHub permalink (or explicit Kopia version tag) pointing to the exact commit/tag in repo/content/content_manager.go that corresponds to the shown code, and annotate the snippet with that permalink or version note so readers can verify behavior against the stable source.
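For readers following this comment without the doc open, the deduplication path the referenced snippet describes (hash the data, look up the resulting content ID, write only when it is missing) can be sketched roughly as below. This is a minimal Python illustration, not Kopia's Go code: `ContentStore`, `write`, and `bytes_written` are hypothetical names, plain SHA-256 is a stand-in for whatever hash the repository is configured with, and locking and pack handling are not modeled.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: blobs are keyed by the hash of their bytes."""

    def __init__(self):
        self.blobs = {}         # content ID -> stored bytes
        self.bytes_written = 0  # bytes actually stored (stand-in for upload volume)

    def write(self, data: bytes) -> str:
        # Hypothetical equivalent of "hash, then look up the content ID".
        content_id = hashlib.sha256(data).hexdigest()
        if content_id not in self.blobs:
            # Unseen content: store it and count the bytes.
            self.blobs[content_id] = data
            self.bytes_written += len(data)
        # Already-known content falls through and only returns the existing ID.
        return content_id


store = ContentStore()
first = store.write(b"chunk-A")
second = store.write(b"chunk-A")  # identical bytes: nothing new is stored
third = store.write(b"chunk-B")
```

In this sketch the second write of `b"chunk-A"` returns the same ID and adds nothing to `bytes_written`, which is the property that keeps an incremental backup of mostly unchanged data small.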
7-12: Use probabilistic and version-dependent language for CDC behavior and expected transfer sizes.
Lines 7–12 and 59–65 currently state CDC behavior and transfer outcomes as absolute facts (e.g., "The default algorithm is DYNAMIC-4M-BUZHASH, which means," "~4–8 MB transferred," "re-synchronizes within ~64 bytes"). However, official Kopia documentation contains no explicit guarantees for exact boundary behavior or incremental transfer sizes. CDC outcomes depend on content layout, policy version, and configuration; these should be described as "typical," "common," or "expected" rather than deterministic.
Consider softening the language in both locations:
- Replace "The default algorithm is…which means" with "typical defaults often use…which typically means"
- Replace absolute transfer-size claims with ranges like "commonly around one to a few chunks, not necessarily the full file"
- Soften technical specifics (e.g., "re-synchronizes within ~64 bytes") to "boundaries stabilize quickly after the modified region"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/kopia-incremental.md` around lines 7 - 12 and 59 - 65, the docs currently state CDC behavior as absolute guarantees (e.g., "The default algorithm is DYNAMIC-4M-BUZHASH", "~4–8 MB transferred", "re-synchronizes within ~64 bytes"); change these to probabilistic, version-dependent wording: replace "The default algorithm is DYNAMIC-4M-BUZHASH" with language like "typical defaults often use DYNAMIC-4M-BUZHASH," replace exact transfer-size claims such as "~4–8 MB transferred" with hedged ranges such as "commonly around one to a few chunks" and note the dependence on content, policy version, and configuration, and soften deterministic statements such as "re-synchronizes within ~64 bytes" to "boundaries typically stabilize quickly after the modified region" while preserving the mentions of Buzhash32 and rolling-window behavior for context.
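To make the "boundaries typically stabilize after the modified region" wording concrete, here is a toy content-defined chunker in Python. It is only a sketch under stated assumptions: the Buzhash-style rolling hash, the 48-byte window, the roughly 1 KiB average chunk size, and the names `split`, `TABLE`, `MASK`, `MIN_SIZE`, and `MAX_SIZE` are all illustrative choices, not Kopia's DYNAMIC-4M-BUZHASH splitter or its parameters.

```python
import random

WINDOW = 48                     # rolling-window length in bytes (illustrative)
MASK = (1 << 10) - 1            # cut when the low 10 bits are zero (~1 KiB average)
MIN_SIZE, MAX_SIZE = 256, 4096  # illustrative chunk size limits

_table_rng = random.Random(0)
TABLE = [_table_rng.getrandbits(32) for _ in range(256)]  # per-byte random values


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def split(data):
    """Split data into chunks at positions chosen by a rolling hash."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = rotl32(h, 1) ^ TABLE[b]
        if i - start >= WINDOW:
            # Remove the byte that just left the window.
            h ^= rotl32(TABLE[data[i - WINDOW]], WINDOW)
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(200_000))
edited = original[:1000] + b"INSERTED" + original[1000:]  # small insertion near the start

old_chunks = split(original)
new_chunks = split(edited)
known = set(old_chunks)
new_bytes = sum(len(c) for c in new_chunks if c not in known)
print(f"{len(new_chunks)} chunks after edit, {new_bytes} bytes not already stored")
```

Because cut points depend only on the bytes inside the window, chunks after the insertion usually line up with the originals again within a few chunks, so only a small share of the data is new. How small depends on the content and the parameters, which is why hedged wording fits better than a fixed number.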
📒 Files selected for processing (2)
docs/kopia-incremental.md
docs/kopia_troubleshooting.md
@weshayutin: The following test failed, say
Why the changes were made
We need some written guidance on what to expect re: backup size and incremental backups.
How to test the changes made
Read the new doc, then test with data mover (dm) backups and `oc get dataupload`.