perf(push_down_filter): skip filter revalidation #21667
kumarUjjawal wants to merge 2 commits into apache:main
@kumarUjjawal
Thanks for working on this.
I have one blocking concern around making the unchecked Filter constructor public, plus one small cleanup suggestion.
```rust
/// that are already known to be valid filter expressions. Like
/// [`Self::try_new`], this removes nested aliases from the predicate.
#[doc(hidden)]
pub fn new_unchecked(predicate: Expr, input: Arc<LogicalPlan>) -> Self {
```
I am a bit concerned about making new_unchecked public here. #[doc(hidden)] keeps it out of the generated docs, but it is still part of the public API and can be called by external users.
Even though Filter has public fields today, adding a named public unchecked constructor makes it much easier and more discoverable to bypass the boolean-type validation. Could this stay pub(crate), or use another crate-private visibility, so the unchecked path remains scoped to optimizer internals?
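A minimal sketch of the visibility pattern under discussion, using simplified stand-in types: the `DataType`, `Expr`, and `Filter` below are hypothetical mocks rather than DataFusion's definitions, and `strip_aliases` is a made-up helper that only peels aliases wrapping the top of the predicate. The public `try_new` keeps the boolean check, while `new_unchecked` is `pub(crate)` and still strips aliases. Note that `pub(crate)` makes an item visible only inside the crate that defines it, so it works for callers in that same crate but not for callers in another crate of the workspace.

```rust
// Simplified, hypothetical stand-ins for DataFusion's `Expr` and `Filter`;
// only the constructor-visibility pattern is the point here.
mod logical_plan {
    #[derive(Debug, Clone, PartialEq)]
    pub enum DataType {
        Boolean,
        Int64,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Column(String, DataType),
        Alias(Box<Expr>, String),
    }

    impl Expr {
        /// Type of the expression; aliases do not change it.
        pub fn data_type(&self) -> &DataType {
            match self {
                Expr::Column(_, data_type) => data_type,
                Expr::Alias(inner, _) => inner.data_type(),
            }
        }

        /// Hypothetical helper: peel off every alias wrapping the expression.
        pub fn strip_aliases(self) -> Expr {
            match self {
                Expr::Alias(inner, _) => (*inner).strip_aliases(),
                other => other,
            }
        }
    }

    #[derive(Debug)]
    pub struct Filter {
        pub predicate: Expr,
    }

    impl Filter {
        /// Public constructor: rejects non-boolean predicates.
        pub fn try_new(predicate: Expr) -> Result<Self, String> {
            if *predicate.data_type() != DataType::Boolean {
                return Err(format!(
                    "filter predicate must be Boolean, got {:?}",
                    predicate.data_type()
                ));
            }
            Ok(Self::new_unchecked(predicate))
        }

        /// Crate-private: skips the type check but, like `try_new`,
        /// still strips aliases. Code outside this crate cannot call it.
        pub(crate) fn new_unchecked(predicate: Expr) -> Self {
            Filter {
                predicate: predicate.strip_aliases(),
            }
        }
    }
}
```

If a `Filter` like this were exported from a library crate, downstream crates could still call `try_new` but would get a visibility error on `new_unchecked`.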
```diff
-pub fn make_filter(predicate: Expr, input: Arc<LogicalPlan>) -> Result<LogicalPlan> {
-    Filter::try_new(predicate, input).map(LogicalPlan::Filter)
+/// Creates a filter node without re-validating predicate type.
+fn make_filter(predicate: Expr, input: Arc<LogicalPlan>) -> Result<LogicalPlan> {
```
make_filter() no longer seems to have a failure path, but it still returns Result<LogicalPlan>. That makes the call sites read as though validation or another fallible operation is still happening here.
Would it be clearer for this helper to return LogicalPlan, or maybe Filter, directly and keep Result only around the rewrite steps that can actually fail?
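A rough sketch of the suggested shape, with hypothetical mock types: the real helper takes `Arc<LogicalPlan>` and builds a `Filter`, while this mock uses `Box` and a toy plan enum. `check_columns` and `merge_filters` are made-up rewrite steps chosen only because one of them can genuinely fail. The infallible builder returns the plan directly, and `Result` appears only where an error can actually arise.

```rust
// Hypothetical stand-ins, not DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan(String),
    Filter { predicate: Expr, input: Box<LogicalPlan> },
}

/// Infallible: the predicate is already known to be valid, so there is no
/// error path and no `Result` in the signature.
fn make_filter(predicate: Expr, input: LogicalPlan) -> LogicalPlan {
    LogicalPlan::Filter {
        predicate,
        input: Box::new(input),
    }
}

/// A step that really can fail (here: an unknown column) keeps `Result`.
fn check_columns(expr: &Expr, schema: &[&str]) -> Result<(), String> {
    match expr {
        Expr::Column(name) if schema.contains(&name.as_str()) => Ok(()),
        Expr::Column(name) => Err(format!("unknown column {name}")),
        Expr::And(left, right) => {
            check_columns(left, schema)?;
            check_columns(right, schema)
        }
    }
}

/// Merge two stacked filter predicates. `?` marks the only fallible step;
/// the final `make_filter` call needs no error handling.
fn merge_filters(
    outer: Expr,
    inner: Expr,
    input: LogicalPlan,
    schema: &[&str],
) -> Result<LogicalPlan, String> {
    let merged = Expr::And(Box::new(outer), Box::new(inner));
    check_columns(&merged, schema)?;
    Ok(make_filter(merged, input))
}
```

With this split, a reader can tell from the signatures alone which calls may fail.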
Thanks @kosiew, both points look good; I have made the changes. Could you run the benchmark for this?
run benchmark sql_planner
🤖 Criterion benchmark running (GKE). Comparing fix/slow_push_down_filter (84b9621) to cc4717a (merge-base).
run benchmark push_down_filter
🤖 Criterion benchmark running (GKE). Comparing fix/slow_push_down_filter (84b9621) to cc4717a (merge-base).
🤖 Criterion benchmark completed (GKE).
show benchmark queue
Hi @kosiew, you asked to view the benchmark queue (#21667 (comment)).
🤖 Criterion benchmark completed (GKE).
@kumarUjjawal, can you investigate further?
Thanks @kosiew, I will look into it.
I don't have bandwidth for this at the moment, so I'm going to mark it as a draft for now.
Which issue does this PR close?
Rationale for this change
`PushDownFilter` rebuilds many `Filter` nodes on very large plans. Each rebuild used `Filter::try_new`, which re-ran predicate type validation. On the `sql_planner_extended` benchmark, that repeated work was a large part of planning time.

This change adds an unchecked filter constructor for internal optimizer use and uses it inside `PushDownFilter` when the predicate is already known to be a valid filter expression.

What changes are included in this PR?
- Add `Filter::new_unchecked` for internal optimizer use
- `Filter::try_new`
- Change `PushDownFilter` to use the unchecked path when rebuilding filters
behavior
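The approach described above, checking a predicate once and then rebuilding through an unchecked constructor, can be sketched with mock types. Everything here (`DataType`, `Expr`, `Filter`, `rebuild`, and the thread-local counter) is hypothetical, and the mock's check is trivially cheap; the sketch only shows how the number of checks changes, not the real cost inside DataFusion.

```rust
use std::cell::Cell;

// Hypothetical mock: the counter records how many times the type check runs.
thread_local! {
    static VALIDATIONS: Cell<usize> = Cell::new(0);
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Boolean,
    Utf8,
}

#[derive(Debug, Clone)]
pub struct Expr {
    pub data_type: DataType,
}

#[derive(Debug)]
pub struct Filter {
    pub predicate: Expr,
    /// How many times this filter has been rebuilt.
    pub rebuilds: usize,
}

impl Filter {
    /// Validating constructor: bumps the counter, then checks the type.
    pub fn try_new(predicate: Expr, rebuilds: usize) -> Result<Self, String> {
        VALIDATIONS.with(|count| count.set(count.get() + 1));
        if predicate.data_type != DataType::Boolean {
            return Err(format!("predicate type {:?} is not Boolean", predicate.data_type));
        }
        Ok(Filter { predicate, rebuilds })
    }

    /// Unchecked constructor for predicates that already passed `try_new`.
    pub(crate) fn new_unchecked(predicate: Expr, rebuilds: usize) -> Self {
        Filter { predicate, rebuilds }
    }
}

/// Validate once on entry, then rebuild the filter `times` times, as a
/// push-down pass might while moving it through several plan nodes.
pub fn rebuild(predicate: Expr, times: usize, skip_revalidation: bool) -> Result<Filter, String> {
    let mut filter = Filter::try_new(predicate, 0)?;
    for n in 1..=times {
        let predicate = filter.predicate.clone();
        filter = if skip_revalidation {
            Filter::new_unchecked(predicate, n)
        } else {
            Filter::try_new(predicate, n)?
        };
    }
    Ok(filter)
}

/// Reads the current thread's validation count.
pub fn validations() -> usize {
    VALIDATIONS.with(|count| count.get())
}

/// Resets the current thread's validation count to zero.
pub fn reset_validations() {
    VALIDATIONS.with(|count| count.set(0));
}
```

With ten rebuilds, the validating path runs the check 11 times (once on entry, once per rebuild), while the unchecked path runs it once; an invalid predicate is still rejected at the entry check.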
Are these changes tested?
Yes
Are there any user-facing changes?
No