Skip to content

Backfill fixes and edges#2088

Open
newren wants to merge 3 commits intogitgitgadget:masterfrom
newren:backfill-fixes-and-edges
Open

Backfill fixes and edges#2088
newren wants to merge 3 commits intogitgitgadget:masterfrom
newren:backfill-fixes-and-edges

Conversation

@newren
Copy link
Copy Markdown

@newren newren commented Apr 15, 2026

This topic fixes a few minor issues in git backfill (from ds/backfill-revs this cycle), although some might see the third patch as more feature than fix, and the first two patches are pretty minor and probably do not merit consideration before the release this late in the cycle.

Overview:

  • Patch 1: As a wise man once said, "Sending arbitrary command-line arguments to setup_revisions() creates an opportunity for behavior you are not expecting. For instance, can users...supply --first-parent? What happens if they add an --author filter?" ;-) I think these particular cases might work, but other rev-list options don't make sense, so let's error on ones that don't.
  • Patch 2: Making documentation more consistent with other commands
  • Patch 3: Tweak the ranges so we actually prevent on-demand blob downloading better with a new --[no-]include-edges flag.

cc: Derrick Stolee stolee@gmail.com

newren added 3 commits April 15, 2026 11:33
Some rev-list options accepted by setup_revisions() are silently
ignored or actively counterproductive when used with 'git backfill',
because the path-walk API has its own tree-walking logic that bypasses
the mechanisms these options rely on:

  * -S/-G (pickaxe) and --diff-filter work by computing per-commit
    diffs in get_revision_1() and filtering commits whose diffs don't
    match.  Since backfill's goal is to download all blobs reachable
    from commits in the range, filtering out commits based on diff
    content would silently skip blobs -- the opposite of what users
    want.

  * --follow disables path pruning (revs->prune) and only makes
    sense for tracking a single file through renames in log output.
    It has no useful interaction with backfill.

  * -L (line-log) computes line-level diffs to track the evolution
    of a function or line range.  Like pickaxe, it filters commits
    based on diff content, which would cause blobs to be silently
    skipped.

  * --diff-merges controls how merge commit diffs are displayed.
    The path-walk API walks trees directly and never computes
    per-commit diffs, so this option would be silently ignored.

  * --filter (object filtering, e.g. --filter=blob:none) is used by
    the list-objects traversal but is completely ignored by the
    path-walk API, so it would silently do nothing.

Rather than letting users think these options are being honored,
reject them with a clear error message.

Signed-off-by: Elijah Newren <newren@gmail.com>
302aff0 (backfill: accept revision arguments, 2026-03-26) added
support for passing revision arguments to 'git backfill' but documented
them only with a prose sentence:

    You may also specify the commit limiting options from
    git-rev-list(1).

No other command that accepts revision arguments documents them this
way.  Commands like log, shortlog, and replay define a formal
<revision-range> entry and include rev-list-options.adoc.  Commands like
bundle, fast-export, and filter-branch, which pass arguments through to
the revision machinery without including the full options file, still
define a formal <git-rev-list-args> entry explaining what is accepted.

Add a formal <revision-range> entry in the synopsis and OPTIONS section,
following the convention used by other commands, and mention that
commit-limiting options from git-rev-list(1) are also accepted.

Signed-off-by: Elijah Newren <newren@gmail.com>
Commit 302aff0 (backfill: accept revision arguments, 2026-03-26) added
support for accepting revision arguments to backfill.  This allows users
to do things like

   git backfill --remotes ^v2.3.0

and then run many commands without triggering on-demand downloads of
blobs.  However, if they have topics based on v2.3.0, they will likely
still trigger on-demand downloads.  Consider, for example, the command

   git log -p v2.3.0..topic

This would still trigger on-demand blob loadings after the backfill
command above, because the commit(s) with A as a parent will need to
diff against the blobs in A.  In fact, multiple commands need blobs from
the lower boundary of the revision range:

   * git log -p A..B                # After backfill A..B
   * git replay --onto TARGET A..B  # After backfill TARGET^! A..B
   * git checkout A && git merge B  # After backfill A...B

Add an extra --[no-]include-edges flag to allow grabbing blobs from
edge commits.  Since the point of backfill is to prevent on-demand blob
loading and these are common commands, default to --include-edges.

Signed-off-by: Elijah Newren <newren@gmail.com>
@newren
Copy link
Copy Markdown
Author

newren commented Apr 15, 2026

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Apr 15, 2026

Submitted as pull.2088.git.1776297482.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2088/newren/backfill-fixes-and-edges-v1

To fetch this version to local tag pr-2088/newren/backfill-fixes-and-edges-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2088/newren/backfill-fixes-and-edges-v1

Comment thread builtin/backfill.c
@@ -78,6 +78,28 @@ static int fill_missing_blobs(const char *path UNUSED,
return 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Some rev-list options accepted by setup_revisions() are silently
> ignored or actively counterproductive when used with 'git backfill',
> because the path-walk API has its own tree-walking logic that bypasses
> the mechanisms these options rely on:
> 
>   * -S/-G (pickaxe) and --diff-filter work by computing per-commit
>     diffs in get_revision_1() and filtering commits whose diffs don't
>     match.  Since backfill's goal is to download all blobs reachable
>     from commits in the range, filtering out commits based on diff
>     content would silently skip blobs -- the opposite of what users
>     want.
> 
>   * --follow disables path pruning (revs->prune) and only makes
>     sense for tracking a single file through renames in log output.
>     It has no useful interaction with backfill.
> 
>   * -L (line-log) computes line-level diffs to track the evolution
>     of a function or line range.  Like pickaxe, it filters commits
>     based on diff content, which would cause blobs to be silently
>     skipped.

I think these make a lot of sense, especially because these
computations require downloading missing blobs in order to find
the diffs that justify some of the choices of commit filtering.
 
>   * --diff-merges controls how merge commit diffs are displayed.
>     The path-walk API walks trees directly and never computes
>     per-commit diffs, so this option would be silently ignored.

I think there are a few other "format" based options that were
silently ignored on purpose, because there's no output. Perhaps
we should change the use of options like this to a warning instead
of a failure?

>   * --filter (object filtering, e.g. --filter=blob:none) is used by
>     the list-objects traversal but is completely ignored by the
>     path-walk API, so it would silently do nothing.

This is correct to remove because while it doesn't work with
path-walk right now, it might in the future. We don't want the
filter to mess with the functionality of 'git backfill' that sets
its own scope for which blobs to download.

> Rather than letting users think these options are being honored,
> reject them with a clear error message.

I agree that the majority of these should be hard failures. As
mentioned, some could be soft warnings. That could be an
adjustment to make in the future, so is not blocking for this
patch.

> +static void reject_unsupported_rev_list_options(struct rev_info *revs)
> +{
> +	if (revs->diffopt.pickaxe)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    (revs->diffopt.pickaxe_opts & DIFF_PICKAXE_REGEX) ? "-G" : "-S");
> +	if (revs->diffopt.filter || revs->diffopt.filter_not)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--diff-filter");
> +	if (revs->diffopt.flags.follow_renames)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--follow");
> +	if (revs->line_level_traverse)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "-L");
> +	if (revs->explicit_diff_merges)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--diff-merges");
> +	if (revs->filter.choice)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--filter");
> +}
> +

My only nit-pick suggestion is to make the translated string a
macro so it can be more obvious that it is repeated exactly.

Thanks,
-Stolee

@@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone
SYNOPSIS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> 302aff09223f (backfill: accept revision arguments, 2026-03-26) added
> support for passing revision arguments to 'git backfill' but documented
> them only with a prose sentence:
> 
>     You may also specify the commit limiting options from
>     git-rev-list(1).
> 
> No other command that accepts revision arguments documents them this
> way.  Commands like log, shortlog, and replay define a formal
> <revision-range> entry and include rev-list-options.adoc.  Commands like
> bundle, fast-export, and filter-branch, which pass arguments through to
> the revision machinery without including the full options file, still
> define a formal <git-rev-list-args> entry explaining what is accepted.
> 
> Add a formal <revision-range> entry in the synopsis and OPTIONS section,
> following the convention used by other commands, and mention that
> commit-limiting options from git-rev-list(1) are also accepted.

Thanks for your attention to detail here. I like this version.

-Stolee

@@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone
SYNOPSIS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>

> Add an extra --[no-]include-edges flag to allow grabbing blobs from
> edge commits.  Since the point of backfill is to prevent on-demand blob
> loading and these are common commands, default to --include-edges.

I like this option and your motivation for including it.

> @@ -116,6 +117,8 @@ static int do_backfill(struct backfill_context *ctx)
>  	/* Walk from HEAD if otherwise unspecified. */
>  	if (!ctx->revs.pending.nr)
>  		add_head_to_pending(&ctx->revs);
> +	if (ctx->include_edges)
> +		ctx->revs.edge_hint = 1;

This would still work if...

>  		.revs = REV_INFO_INIT,
> +		.include_edges = 1,

...this was initialized to -1 to allow for "no user option".

We don't need this change unless we were deciding to make a
config option that specified a different default. That seems
like overkill right now, so this doesn't need a change. Just
something that I like to think about.

I also like how your tests don't just verify the backfill
behavior but the ultimate behavior of 'git log' and friends
after the fact.

Thanks,
-Stolee

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget bot commented Apr 16, 2026

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:57 PM, Elijah Newren via GitGitGadget wrote:
> This topic fixes a few minor issues in git backfill (from ds/backfill-revs
> this cycle), although some might see the third patch as more feature than
> fix, and the first two patches are pretty minor and probably do not merit
> consideration before the release this late in the cycle.
> 
> Overview:
> 
>  * Patch 1: As a wise man once said, "Sending arbitrary command-line
>    arguments to setup_revisions() creates an opportunity for behavior you
>    are not expecting. For instance, can users...supply --first-parent? What
>    happens if they add an --author filter?" ;-) I think these particular
>    cases might work, but other rev-list options don't make sense, so let's
>    error on ones that don't.

I know that --first-parent was one of the options I _did_ want to
include as a potential option (it helps focus the set to a "core" of
commits and we can get more on-demand off the core if needed). Yes,
--author is a little silly, but it didn't seem necessary to block it.

I agree with the reasons you gave to block _most_ of the options you
blocked. The output-formatting options don't need to be a hard failure,
but that could be a later improvement. For now, I think your change is
entirely positive so doesn't need change.

>  * Patch 2: Making documentation more consistent with other commands
>  * Patch 3: Tweak the ranges so we actually prevent on-demand blob
>    downloading better with a new --[no-]include-edges flag.

I gave notes on every patch, but no meaningful changes are required.

Thanks for helping to polish this feature!
-Stolee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant