
Add lambda support and array_transform udf#21679

Open
gstvg wants to merge 82 commits into apache:main from gstvg:lambda_and_array_transform

Conversation

@gstvg
Contributor

@gstvg gstvg commented Apr 16, 2026

This is a clean version of #18921 to make it easier to review.

This is a breaking change, due to adding a variant to the Expr enum, new methods on the traits Session, FunctionRegistry, and ContextProvider, and a new argument on TaskContext::new.

This PR adds support for lambdas with column capture and the array_transform function used to test the lambda implementation.

Example usage:

SELECT array_transform([2, 3], v -> v != 2);

[false, true]

-- arbitrarily nested lambdas are also supported
SELECT array_transform([[[2, 3]]], m -> array_transform(m, l -> array_transform(l, v -> v*2)));

[[[4, 6]]]

Note: column capture has been removed for now and will be added in a follow-on PR, see #21172

Some comments in the code snippets below show the value each struct, variant, or field would hold after planning the first example above. Some literals are simplified pseudo code.

Three new Expr variants are added: HigherOrderFunction, which owns a new trait HigherOrderUDF, a ScalarFunction/ScalarUDFImpl analogue with support for lambdas; Lambda, for the lambda body and its parameter names; and LambdaVariable, which is like Column but for lambda parameters.

Their logical representations:

enum Expr {
    // array_transform([2, 3], v -> v != 2)
    HigherOrderFunction(HigherOrderFunction),
    // v -> v != 2
    Lambda(Lambda),
    // v, of the lambda body: v != 2
    LambdaVariable(LambdaVariable),
   ...
}

// array_transform([2, 3], v -> v != 2)
struct HigherOrderFunction {
    // global instance of array_transform
    pub func: Arc<dyn HigherOrderUDF>,
    // [Expr::ScalarValue([2, 3]), Expr::Lambda(v -> v != 2)]
    pub args: Vec<Expr>,
}

// v -> v != 2
struct Lambda {
    // ["v"]
    pub params: Vec<String>,
    // v != 2
    pub body: Box<Expr>,
}

// v, of the lambda body: v != 2
struct LambdaVariable {
    // "v"
    pub name: String,
    // Field::new("", DataType::Int32, false) 
    // Note: a follow-on PR will make this field optional
    // to free expr_api from specifying it beforehand, 
    // and add resolve_lambda_variables method to Expr,
    // similar to Expr::Placeholder, see #21172
    pub field: FieldRef, 
    pub spans: Spans,
}

The example would be planned into a tree like this:

HigherOrderFunctionExpression
  name: array_transform
  children:
    1. ListExpression [2,3]
    2. LambdaExpression
         parameters: ["v"]
         body:
            BinaryExpression (!=)
              left:
                 LambdaVariableExpression("v", Field::new("", Int32, false))
              right:
                 LiteralExpression("2")

The physical counterparts definition:

struct HigherOrderFunctionExpr {
    // global instance of array_transform
    fun: Arc<dyn HigherOrderUDF>,
    // "array_transform"
    name: String,
    // [LiteralExpr([2, 3]), LambdaExpr("v -> v != 2")]
    args: Vec<Arc<dyn PhysicalExpr>>,
    // [1], the positions at args that contains lambdas
    lambda_positions: Vec<usize>,
    // Field::new("", DataType::new_list(DataType::Boolean, false), false)
    return_field: FieldRef,
    config_options: Arc<ConfigOptions>, 
}


struct LambdaExpr {
    // ["v"]
    params: Vec<String>,
    // v != 2
    body: Arc<dyn PhysicalExpr>,
}

struct LambdaVariable {
    // Field::new("v", DataType::Int32, false)
    field: FieldRef,
    // 0, the first and only parameter, "v"
    index: usize,
}
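
The lambda_positions field can be derived mechanically from args. A minimal self-contained sketch of that derivation, with a hypothetical Arg enum standing in for the real check against Arc<dyn PhysicalExpr> (the actual code would downcast to LambdaExpr instead):

```rust
// Hypothetical stand-in for the physical argument list; the real code
// would inspect Arc<dyn PhysicalExpr> values instead of this enum.
enum Arg {
    Value(&'static str),
    Lambda(&'static str),
}

// Collect the indexes of the arguments that are lambdas, which is what
// HigherOrderFunctionExpr.lambda_positions stores.
fn lambda_positions(args: &[Arg]) -> Vec<usize> {
    args.iter()
        .enumerate()
        .filter_map(|(i, arg)| matches!(arg, Arg::Lambda(_)).then_some(i))
        .collect()
}

fn main() {
    // array_transform([2, 3], v -> v != 2): the lambda is argument 1
    let args = [Arg::Value("[2, 3]"), Arg::Lambda("v -> v != 2")];
    assert_eq!(lambda_positions(&args), vec![1]);
}
```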

Note: if you primarily want to check whether this lambda implementation supports your use case and don't want to spend much time here, it's okay to skip most collapsed blocks, as they serve mostly to help code reviewers. The exceptions are HigherOrderUDF and the array_transform implementations of the relevant HigherOrderUDF methods, which are collapsed only due to their size.

The added HigherOrderUDF trait is almost a clone of ScalarUDFImpl, with the exception of:

  1. return_field_from_args and invoke_with_args, where args.args is now a list of enums with two variants, Value or Lambda, instead of a list of values
  2. the addition of lambda_parameters, which returns a Field for each parameter supported by every lambda argument, based on the Fields of the non-lambda arguments
  3. the removal of return_field and of the deprecated is_nullable and display_name
  4. it does not yet include analogues of the methods preimage, placement, evaluate_bounds, propagate_constraints, output_ordering, and preserves_lex_ordering
HigherOrderUDF
trait HigherOrderUDF {
    /// Return the Field of every parameter supported by every supported lambda of this function,
    /// based on the Fields of the value arguments. If a lambda supports multiple parameters, or if
    /// multiple lambdas are supported and some are optional, all should be returned,
    /// regardless of whether they are used in a particular invocation
    ///
    /// Tip: If you have a [`HigherOrderFunction`] invocation, you can call the helper
    /// [`HigherOrderFunction::lambda_parameters`] instead of this method directly
    ///
    /// [`HigherOrderFunction`]: crate::expr::HigherOrderFunction
    /// [`HigherOrderFunction::lambda_parameters`]: crate::expr::HigherOrderFunction::lambda_parameters
    ///
    /// Example for array_transform:
    ///
    /// `array_transform([2.0, 8.0], v -> v > 4.0)`
    ///
    /// ```ignore
    /// let lambda_parameters = array_transform.lambda_parameters(&[
    ///      Arc::new(Field::new("", DataType::new_list(DataType::Float32, false), false)), // the Field of the literal `[2.0, 8.0]`
    /// ])?;
    ///
    /// assert_eq!(
    ///      lambda_parameters,
    ///      vec![
    ///         // the lambda supported parameters, regardless of how many are actually used
    ///         vec![
    ///             // the value being transformed
    ///             Field::new("", DataType::Float32, false),
    ///         ]
    ///      ]
    /// )
    /// ```
    ///
    /// The implementation can assume that some other part of the code has coerced
    /// the actual argument types to match [`Self::signature`].
    fn lambda_parameters(&self, value_fields: &[FieldRef]) -> Result<Vec<Vec<Field>>>;
    fn return_field_from_args(&self, args: LambdaReturnFieldArgs) -> Result<FieldRef>;
    fn invoke_with_args(&self, args: HigherOrderFunctionArgs) -> Result<ColumnarValue>;
   // ... omitted methods that are similar in ScalarUDFImpl
}

/// An argument to a higher-order function
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ValueOrLambda<V, L> {
    /// A value with associated data
    Value(V),
    /// A lambda with associated data
    Lambda(L),
}

/// Information about arguments passed to the function
///
/// This structure contains metadata about how the function was called,
/// such as the types of the arguments, any scalar arguments, and whether
/// the arguments can (ever) be null
///
/// See [`HigherOrderUDF::return_field_from_args`] for more information
#[derive(Clone, Debug)]
pub struct LambdaReturnFieldArgs<'a> {
    /// The data types of the arguments to the function
    ///
    /// If argument `i` to the function is a lambda, it will be the field of the result of the
    /// lambda if evaluated with the parameters returned from [`HigherOrderUDF::lambda_parameters`]
    ///
    /// For example, with `array_transform([1], v -> v == 5)`
    /// this field will be `[
    ///     ValueOrLambda::Value(Field::new("", DataType::List(DataType::Int32), false)),
    ///     ValueOrLambda::Lambda(Field::new("", DataType::Boolean, false))
    /// ]`
    pub arg_fields: &'a [ValueOrLambda<FieldRef, FieldRef>],
    /// Is argument `i` to the function a scalar (constant)?
    ///
    /// If the argument `i` is not a scalar, it will be None
    ///
    /// For example, if a function is called like `array_transform([1], v -> v == 5)`
    /// this field will be `[Some(ScalarValue::List(...)), None]`
    pub scalar_arguments: &'a [Option<&'a ScalarValue>],
}

/// Arguments passed to [`HigherOrderUDF::invoke_with_args`] when invoking a
/// higher-order function.
#[derive(Debug, Clone)]
pub struct HigherOrderFunctionArgs {
    /// The evaluated arguments and lambdas to the function
    pub args: Vec<ValueOrLambda<ColumnarValue, LambdaArgument>>,
    /// Field associated with each arg, if it exists
    /// For lambdas, it will be the field of the result of
    /// the lambda if evaluated with the parameters
    /// returned from [`HigherOrderUDF::lambda_parameters`]
    pub arg_fields: Vec<ValueOrLambda<FieldRef, FieldRef>>,
    /// The number of rows in the record batch being evaluated
    pub number_rows: usize,
    /// The return field of the function, as returned
    /// from `return_field_from_args` when creating the
    /// physical expression from the logical expression
    pub return_field: FieldRef,
    /// The config options at execution time
    pub config_options: Arc<ConfigOptions>,
}

/// A lambda argument to a HigherOrderFunction
#[derive(Clone, Debug)]
pub struct LambdaArgument {
    /// The parameters defined in this lambda
    ///
    /// For example, for `array_transform([2], v -> -v)`,
    /// this will be `vec![Field::new("v", DataType::Int32, true)]`
    params: Vec<FieldRef>,
    /// The body of the lambda
    ///
    /// For example, for `array_transform([2], v -> -v)`,
    /// this will be the physical expression of `-v`
    body: Arc<dyn PhysicalExpr>,
}

impl LambdaArgument {
    /// Evaluate this lambda
    /// `args` should evaluate to the value of each parameter
    /// of the corresponding lambda returned in [HigherOrderUDF::lambda_parameters].
    pub fn evaluate(
        &self,
        args: &[&dyn Fn() -> Result<ArrayRef>],
    ) -> Result<ColumnarValue> {
        let columns = args
            .iter()
            .take(self.params.len())
            .map(|arg| arg())
            .collect::<Result<_>>()?;

        let schema = Arc::new(Schema::new(self.params.clone()));

        let batch = RecordBatch::try_new(schema, columns)?;

        self.body.evaluate(&batch)
    }
}
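
The closure-slice signature of evaluate means a HigherOrderUDF only pays for the parameter columns a lambda actually declares: thunks beyond self.params.len() are never invoked. A self-contained sketch of that laziness, with Vec<i32> standing in for ArrayRef and String for DataFusionError (evaluate_lambda is a hypothetical mini-version of LambdaArgument::evaluate):

```rust
// Stand-in types: a "column" is just a Vec<i32> and errors are Strings.
type Column = Vec<i32>;

// Hypothetical mini-version of LambdaArgument::evaluate: the lambda
// declared `n_params` parameters, so only the first `n_params` thunks
// are invoked; extra optional parameters are never materialized.
fn evaluate_lambda(
    n_params: usize,
    args: &[&dyn Fn() -> Result<Column, String>],
    body: impl Fn(&[Column]) -> Column,
) -> Result<Column, String> {
    let columns: Vec<Column> = args
        .iter()
        .take(n_params) // laziness: skip thunks beyond the declared params
        .map(|thunk| thunk())
        .collect::<Result<_, _>>()?;
    Ok(body(&columns))
}

fn main() {
    let values = || -> Result<Column, String> { Ok(vec![2, 3]) };
    // An expensive optional parameter (e.g. element indexes) that this
    // lambda does not declare, so it must never run.
    let indexes = || -> Result<Column, String> { panic!("must not be called") };

    // v -> v * 2 over [2, 3]
    let out = evaluate_lambda(1, &[&values, &indexes], |cols| {
        cols[0].iter().map(|v| v * 2).collect()
    })
    .unwrap();
    assert_eq!(out, vec![4, 6]);
}
```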
array_transform lambda_parameters implementation
impl HigherOrderUDF for ArrayTransform {
    fn lambda_parameters(&self, value_fields: &[FieldRef]) -> Result<Vec<Vec<Field>>> {
        let list = if value_fields.len() == 1 {
            &value_fields[0]
        } else {
            return plan_err!(
                "{} function requires 1 value argument, got {}",
                self.name(),
                value_fields.len()
            );
        };

        let field = match list.data_type() {
            DataType::List(field) => field,
            DataType::LargeList(field) => field,
            DataType::FixedSizeList(field, _) => field,
            _ => return plan_err!("expected list, got {list}"),
        };

        // we don't need to check whether the lambda declares more parameters than supported,
        // e.g. array_transform([], (v, i, j) -> v+i+j), as datafusion will do that for us
        let value = Field::new("", field.data_type().clone(), field.is_nullable())
            .with_metadata(field.metadata().clone());

        Ok(vec![vec![value]])
    }
}
array_transform return_field_from_args implementation
fn value_lambda_pair<'a, V: Debug, L: Debug>(
    name: &str,
    args: &'a [ValueOrLambda<V, L>],
) -> Result<(&'a V, &'a L)> {
    let [value, lambda] = take_function_args(name, args)?;

    let (ValueOrLambda::Value(value), ValueOrLambda::Lambda(lambda)) = (value, lambda)
    else {
        return plan_err!(
            "{name} expects a value followed by a lambda, got {value:?} and {lambda:?}"
        );
    };

    Ok((value, lambda))
}
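
A runnable standalone version of this helper, with strings standing in for the real payload types (FieldRef, ColumnarValue, LambdaArgument) and String errors in place of DataFusionError:

```rust
// Simplified stand-in for the PR's ValueOrLambda<V, L>: here the
// payloads are plain strings instead of Fields or arrays.
#[derive(Debug, PartialEq)]
enum ValueOrLambda<V, L> {
    Value(V),
    Lambda(L),
}

// Mirror of the value_lambda_pair helper: expect exactly one value
// followed by exactly one lambda, in that order.
fn value_lambda_pair<'a, V, L>(
    name: &str,
    args: &'a [ValueOrLambda<V, L>],
) -> Result<(&'a V, &'a L), String> {
    match args {
        [ValueOrLambda::Value(v), ValueOrLambda::Lambda(l)] => Ok((v, l)),
        _ => Err(format!("{name} expects a value followed by a lambda")),
    }
}

fn main() {
    let args = [
        ValueOrLambda::Value("[2, 3]"),
        ValueOrLambda::Lambda("v -> v != 2"),
    ];
    let (value, lambda) = value_lambda_pair("array_transform", &args).unwrap();
    assert_eq!(*value, "[2, 3]");
    assert_eq!(*lambda, "v -> v != 2");

    // Wrong arity or order is rejected
    let bad: [ValueOrLambda<&str, &str>; 1] = [ValueOrLambda::Value("[2, 3]")];
    assert!(value_lambda_pair("array_transform", &bad).is_err());
}
```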

impl HigherOrderUDF for ArrayTransform {
    fn return_field_from_args(
        &self,
        args: LambdaReturnFieldArgs,
    ) -> Result<Arc<Field>> {
        let (list, lambda) = value_lambda_pair(self.name(), args.arg_fields)?;

        // lambda is the resulting field of executing the lambda body
        // with the parameters returned in lambda_parameters
        let field = Arc::new(Field::new(
            Field::LIST_FIELD_DEFAULT_NAME,
            lambda.data_type().clone(),
            lambda.is_nullable(),
        ));

        let return_type = match list.data_type() {
            DataType::List(_) => DataType::List(field),
            DataType::LargeList(_) => DataType::LargeList(field),
            DataType::FixedSizeList(_, size) => DataType::FixedSizeList(field, *size),
            other => plan_err!("expected list, got {other}")?,
        };

        Ok(Arc::new(Field::new("", return_type, list.is_nullable())))
    }
}
array_transform invoke_with_args implementation
impl HigherOrderUDF for ArrayTransform {
    fn invoke_with_args(&self, args: HigherOrderFunctionArgs) -> Result<ColumnarValue> {
        let (list, lambda) = value_lambda_pair(self.name(), &args.args)?;

        let list_array = list.to_array(args.number_rows)?;

        // Fast path for fully null input array and also the only way to safely work with
        // a fully null fixed size list array as it can't be handled by remove_list_null_values below
        if list_array.null_count() == list_array.len() {
            return Ok(ColumnarValue::Array(new_null_array(
                args.return_type(),
                list_array.len(),
            )));
        }

        // as per list_values docs, if list_array is sliced, list_values will be sliced too,
        // so before constructing the transformed array below, we must adjust the list offsets with
        // adjust_offsets_for_slice
        let list_values = list_values(&list_array)?;

        // by passing closures, lambda.evaluate can evaluate only those actually needed
        let values_param = || Ok(Arc::clone(&list_values));

        // call the transforming lambda
        let transformed_values = lambda
            .evaluate(&[&values_param])?
            .into_array(list_values.len())?;

        let field = match args.return_field.data_type() {
            DataType::List(field)
            | DataType::LargeList(field)
            | DataType::FixedSizeList(field, _) => Arc::clone(field),
            _ => {
                return exec_err!(
                    "{} expected ScalarFunctionArgs.return_field to be a list, got {}",
                    self.name(),
                    args.return_field
                );
            }
        };

        let transformed_list = match list_array.data_type() {
            DataType::List(_) => {
                let list = list_array.as_list();

                // since we called list_values above which would return sliced values for
                // a sliced list, we must adjust the offsets here as otherwise they would be invalid
                let adjusted_offsets = adjust_offsets_for_slice(list);

                Arc::new(ListArray::new(
                    field,
                    adjusted_offsets,
                    transformed_values,
                    list.nulls().cloned(),
                )) as ArrayRef
            }
            DataType::LargeList(_) => {
                let large_list = list_array.as_list();

                // since we called list_values above which would return sliced values for
                // a sliced list, we must adjust the offsets here as otherwise they would be invalid
                let adjusted_offsets = adjust_offsets_for_slice(large_list);

                Arc::new(LargeListArray::new(
                    field,
                    adjusted_offsets,
                    transformed_values,
                    large_list.nulls().cloned(),
                ))
            }
            DataType::FixedSizeList(_, value_length) => {
                Arc::new(FixedSizeListArray::new(
                    field,
                    *value_length,
                    transformed_values,
                    list_array.as_fixed_size_list().nulls().cloned(),
                ))
            }
            other => exec_err!("expected list, got {other}")?,
        };

        Ok(ColumnarValue::Array(transformed_list))
    }
}
How relevant HigherOrderUDF methods would be called and what they would return during planning and evaluation of the example
// this is called at sql planning
let lambda_parameters = lambda_udf.lambda_parameters(&[
    Field::new("", DataType::new_list(DataType::Int32, false), false), // the Field of the [2, 3] literal
])?;

assert_eq!(
    lambda_parameters,
    vec![
        // the parameters that *can* be declared on the lambda, not only
        // those actually declared: the implementation doesn't need to
        // care about that
        vec![
            Field::new("", DataType::Int32, false), // the list inner value
        ],
    ]
);



// this is called every time an ExprSchemable method is called on a HigherOrderFunction
let return_field = array_transform.return_field_from_args(LambdaReturnFieldArgs {
    arg_fields: &[
        ValueOrLambda::Value(Field::new("", DataType::new_list(DataType::Int32, false), false)),
        ValueOrLambda::Lambda(Field::new("", DataType::Boolean, false)), // the return_field of the expression "v != 2" when "v" is of the type returned in lambda_parameters
    ],
    scalar_arguments // irrelevant
})?;

assert_eq!(return_field, Field::new("", DataType::new_list(DataType::Boolean, false), false));



let value = array_transform.invoke_with_args(HigherOrderFunctionArgs {
    args: vec![
        ValueOrLambda::Value(List([2, 3])),
        ValueOrLambda::Lambda(LambdaArgument of `v -> v != 2`),
    ],
    arg_fields, // same as above
    number_rows: 1,
    return_field, // same as above
    config_options, // irrelevant
})?;

assert_eq!(value, List([[false, true]])) // pseudo: a single-row list array


A HigherOrderUDF/HigherOrderUDFImpl pair, analogous to ScalarUDF/ScalarUDFImpl, was not used because such pairs exist only to maintain backwards compatibility with the older API, see #8045


Why LambdaVariable and not Column:

Existing tree traversals that operate on columns would break if some Column nodes referred to a lambda parameter rather than a real column. In the example query, projection pushdown would try to push down the lambda parameter "v", which won't exist in table "t".

Example of code of another traversal that would break:

fn minimize_join_filter(expr: Arc<dyn PhysicalExpr>, ...) -> JoinFilter {
    let mut used_columns = HashSet::new();
    expr.apply(|expr| {
        if let Some(col) = expr.as_any().downcast_ref::<Column>() {
            // if this is a lambda column, this function will break
            used_columns.insert(col.index());
        }
        Ok(TreeNodeRecursion::Continue)
    });
    ...
}

Furthermore, the implementation of ExprSchemable and PhysicalExpr::return_field for Column expects that the schema it receives as an argument contains an entry for its name, which is not the case for lambda parameters.

By including a FieldRef on LambdaVariable, resolved at construction time in the SQL planner, ExprSchemable and PhysicalExpr::return_field simply return its own Field:

LambdaVariable ExprSchemable and PhysicalExpr::return_field implementation
impl ExprSchemable for Expr {
   fn to_field(
        &self,
        schema: &dyn ExprSchema,
    ) -> Result<(Option<TableReference>, Arc<Field>)> {
        let (relation, schema_name) = self.qualified_name();
        let field = match self {
           Expr::LambdaVariable(l) => Ok(Arc::clone(&l.field)),
           ...
        }?;

        Ok((
            relation,
            Arc::new(field.as_ref().clone().with_name(schema_name)),
        ))
    }
    ...
}

impl PhysicalExpr for LambdaVariable {
    fn return_field(&self, _input_schema: &Schema) -> Result<FieldRef> {
        Ok(Arc::clone(&self.field))
    }
    ...
}

Possible alternatives, discarded due to complexity, required downstream changes, and implementation size:
  1. Add a new set of TreeNode methods that provides the set of lambda parameter names seen during the traversal, so Column nodes can be tested for whether they refer to a regular column or to a lambda parameter. Any downstream user that wants to support lambdas would need to use those methods instead of the existing ones. This also would add 1k+ lines to the PR.
impl Expr {
    pub fn transform_with_lambdas_params<
        F: FnMut(Self, &HashSet<String>) -> Result<Transformed<Self>>,
    >(
        self,
        mut f: F,
    ) -> Result<Transformed<Self>> {}
}

How minimize_join_filter would look:

fn minimize_join_filter(expr: Arc<dyn PhysicalExpr>, ...) -> JoinFilter {
    let mut used_columns = HashSet::new();
    expr.apply_with_lambdas_params(|expr, lambdas_params| {
        if let Some(col) = expr.as_any().downcast_ref::<Column>() {
            // don't include lambda parameters
            if !lambdas_params.contains(col.name()) {
                used_columns.insert(col.index());
            }
        }
        Ok(TreeNodeRecursion::Continue)
    })
    ...
}
  2. Add a flag to the Column node indicating whether it refers to a lambda parameter. This still requires checking for it in existing tree traversals that work on Columns (30+) and also downstream.
//logical
struct Column {
    pub relation: Option<TableReference>,
    pub name: String,
    pub spans: Spans,
    pub is_lambda_parameter: bool,
}

//physical
struct Column {
    name: String,
    index: usize,
    is_lambda_parameter: bool,
}

How minimize_join_filter would look:

fn minimize_join_filter(expr: Arc<dyn PhysicalExpr>, ...) -> JoinFilter {
    let mut used_columns = HashSet::new();
    expr.apply(|expr| {
        if let Some(col) = expr.as_any().downcast_ref::<Column>() {
            // don't include lambda parameters
            if !col.is_lambda_parameter {
                used_columns.insert(col.index());
            }
        }
        Ok(TreeNodeRecursion::Continue)
    })
    ...
}
  3. Add a new set of TreeNode methods that provides a schema including the lambda parameters in scope for the node being visited/transformed:
impl Expr {
    pub fn transform_with_schema<
        F: FnMut(Self, &DFSchema) -> Result<Transformed<Self>>,
    >(
        self,
        schema: &DFSchema,
        f: F,
    ) -> Result<Transformed<Self>> { ... }
    ... other methods
}

For any HigherOrderFunction found during the traversal, a new schema containing its parameters, as returned from HigherOrderUDF::lambda_parameters, is created for each lambda argument.
How it would look:

pub fn infer_placeholder_types(self, schema: &DFSchema) -> Result<(Expr, bool)> {
        let mut has_placeholder = false;
        // Provide the schema as the first argument. 
        // Transforming closure receive an adjusted_schema as argument
        self.transform_with_schema(schema, |mut expr, adjusted_schema| {
            match &mut expr {
                // Default to assuming the arguments are the same type
                Expr::BinaryExpr(BinaryExpr { left, op: _, right }) => {
                    // use adjusted_schema and not schema. These expressions may contain
                    // columns referring to a lambda parameter, whose Field would only be
                    // available in adjusted_schema and not in schema
                    rewrite_placeholder(left.as_mut(), right.as_ref(), adjusted_schema)?;
                    rewrite_placeholder(right.as_mut(), left.as_ref(), adjusted_schema)?;
                }
    ....
  4. Make available through LogicalPlan and ExecutionPlan nodes a schema that includes all lambda parameters from all expressions owned by the node, and use this schema for tree traversals. For nodes that don't own any expressions, the regular schema can be returned:
impl LogicalPlan {
    fn lambda_extended_schema(&self) -> &DFSchema;
}

trait ExecutionPlan {
    fn lambda_extended_schema(&self) -> &DFSchema;
}

//usage
impl LogicalPlan {
    pub fn replace_params_with_values(
            self,
            param_values: &ParamValues,
        ) -> Result<LogicalPlan> {
            self.transform_up_with_subqueries(|plan| {
                // use plan.lambda_extended_schema(), which contains the lambda parameters,
                // instead of plan.schema(), which won't
                let lambda_extended_schema = Arc::clone(plan.lambda_extended_schema());
                let name_preserver = NamePreserver::new(&plan);
                plan.map_expressions(|e| {
                // if this expression is a child of a lambda and contains columns referring
                // to its parameters, lambda_extended_schema already contains them
                    let (e, has_placeholder) = e.infer_placeholder_types(&lambda_extended_schema)?;
    ....

@rluvaton
Member

rluvaton commented Apr 24, 2026

@gstvg do you want to write a blog post for this with internal implementation detail, explaining the benefits and so on and we will publish it in datafusion website

Cc @comphead

@comphead
Contributor

> @gstvg do you want to write a blog post for this with internal implementation detail, explaining the benefits and so on and we will publish it in datafusion website
>
> Cc @comphead

+1 for the blogpost

@comphead
Contributor

@gstvg please rebase the PR, I'll do some quick checks from datafusion-spark crate and Comet to see how it flows downstream

@gstvg
Contributor Author

gstvg commented Apr 24, 2026

Many thanks to you all, @rluvaton @comphead @LiaCastaneda @pepijnve @martin-g. This branch is now up to date. I still must merge from my progressive_lambda_parameters branch; there's a breaking change there, so please don't merge this yet.
Also, there are two breaking changes from #21172, a new argument to LambdaArgument::evaluate and the LambdaVariable field becoming an Option, but I will pull only those here instead: add the argument but don't use it, and make field an Option that is always Some. I'm working on those right now.

Finally, I don't like to rush my own PRs, but I do think it's very important that the first 4 major features of #21172 land in the same release as this; they are all small compared to this one.

I would really love to write a blog post, but sometimes I get blocked when writing; I will try my best. For now I will focus on the 4 major features cited above.

> I would advise not to support fixed size lists due to the performance implication of copying valid values

I will remove it for now, we can re-add it later if it makes sense

@comphead
Contributor

run benchmark tpch tcpds

@comphead
Contributor

run benchmark tpch tpcds

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4316887499-1834-85pn4 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing lambda_and_array_transform (bc3f873) to 7d5ddca (merge-base) diff using: tpch
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4316887499-1835-djtp7 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing lambda_and_array_transform (bc3f873) to 7d5ddca (merge-base) diff
BENCH_NAME=tcpds
BENCH_COMMAND=cargo bench --features=parquet --bench tcpds
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
  Downloaded deranged v0.5.8
  Downloaded clap_derive v4.6.1
  Downloaded ciborium-ll v0.2.2
  Downloaded ciborium-io v0.2.2
  Downloaded base64-simd v0.8.0
  Downloaded atomic-waker v1.1.2
  Downloaded atoi v2.0.0
  Downloaded cmov v0.5.3
  Downloaded cfg_aliases v0.2.1
  Downloaded async-stream v0.3.6
  Downloaded async-recursion v1.1.1
  Downloaded async-ffi v0.5.0
  Downloaded aws-smithy-runtime-api-macros v1.0.0
  Downloaded aws-smithy-async v1.2.14
  Downloaded allocator-api2 v0.2.21
  Downloaded aws-smithy-observability v0.2.6
    Blocking waiting for file lock on package cache
error: no bench target named `tcpds` in default-run packages

help: a target with a similar name exists: `gcd`

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4316889113-1836-dnrvx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu): same as above

Comparing lambda_and_array_transform (bc3f873) to 7d5ddca (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4316889113-1837-l7khc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu): same as above

Comparing lambda_and_array_transform (bc3f873) to 7d5ddca (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu): same as above
Details

Comparing HEAD and lambda_and_array_transform
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     lambda_and_array_transform ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 40.07 / 41.75 ±1.89 / 45.19 ms │ 40.70 / 41.39 ±1.19 / 43.76 ms │ no change │
│ QQuery 2  │ 20.97 / 21.35 ±0.28 / 21.62 ms │ 20.99 / 21.22 ±0.20 / 21.50 ms │ no change │
│ QQuery 3  │ 37.94 / 39.92 ±2.03 / 43.27 ms │ 38.34 / 39.46 ±1.26 / 41.91 ms │ no change │
│ QQuery 4  │ 18.11 / 18.31 ±0.16 / 18.56 ms │ 18.39 / 19.01 ±0.72 / 20.30 ms │ no change │
│ QQuery 5  │ 47.74 / 50.15 ±1.27 / 51.48 ms │ 48.13 / 51.24 ±1.72 / 53.17 ms │ no change │
│ QQuery 6  │ 17.20 / 17.51 ±0.27 / 17.89 ms │ 17.67 / 18.33 ±1.04 / 20.41 ms │ no change │
│ QQuery 7  │ 53.40 / 55.44 ±1.31 / 57.31 ms │ 54.23 / 55.62 ±1.13 / 57.23 ms │ no change │
│ QQuery 8  │ 47.95 / 48.06 ±0.08 / 48.17 ms │ 48.15 / 48.62 ±0.24 / 48.82 ms │ no change │
│ QQuery 9  │ 53.32 / 54.02 ±0.69 / 55.06 ms │ 53.83 / 54.70 ±0.68 / 55.82 ms │ no change │
│ QQuery 10 │ 65.23 / 65.78 ±0.68 / 67.12 ms │ 66.18 / 67.14 ±1.04 / 68.72 ms │ no change │
│ QQuery 11 │ 14.10 / 14.41 ±0.47 / 15.32 ms │ 14.23 / 14.36 ±0.12 / 14.54 ms │ no change │
│ QQuery 12 │ 27.34 / 27.71 ±0.31 / 28.27 ms │ 27.55 / 27.82 ±0.22 / 28.09 ms │ no change │
│ QQuery 13 │ 37.79 / 38.58 ±1.02 / 40.52 ms │ 37.03 / 38.47 ±0.80 / 39.17 ms │ no change │
│ QQuery 14 │ 27.58 / 27.85 ±0.16 / 28.07 ms │ 28.15 / 28.77 ±1.14 / 31.05 ms │ no change │
│ QQuery 15 │ 33.41 / 33.94 ±0.79 / 35.48 ms │ 33.78 / 34.61 ±1.25 / 37.06 ms │ no change │
│ QQuery 16 │ 15.06 / 15.24 ±0.12 / 15.44 ms │ 15.39 / 15.52 ±0.10 / 15.69 ms │ no change │
│ QQuery 17 │ 78.12 / 79.70 ±1.62 / 82.08 ms │ 79.67 / 81.73 ±1.81 / 84.66 ms │ no change │
│ QQuery 18 │ 74.54 / 75.89 ±0.77 / 76.63 ms │ 75.19 / 76.80 ±1.35 / 78.70 ms │ no change │
│ QQuery 19 │ 37.35 / 37.55 ±0.25 / 37.96 ms │ 37.49 / 37.92 ±0.31 / 38.40 ms │ no change │
│ QQuery 20 │ 39.35 / 39.72 ±0.34 / 40.13 ms │ 39.80 / 40.15 ±0.30 / 40.55 ms │ no change │
│ QQuery 21 │ 63.35 / 64.42 ±0.62 / 65.26 ms │ 62.08 / 64.43 ±1.18 / 65.23 ms │ no change │
│ QQuery 22 │ 17.04 / 17.46 ±0.30 / 17.80 ms │ 17.02 / 17.17 ±0.22 / 17.61 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 884.75ms │
│ Total Time (lambda_and_array_transform)   │ 894.47ms │
│ Average Time (HEAD)                       │  40.22ms │
│ Average Time (lambda_and_array_transform) │  40.66ms │
│ Queries Faster                            │        0 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       22 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric       Value
Wall time    5.0s
Peak memory  5.6 GiB
Avg memory   5.0 GiB
CPU user     33.5s
CPU sys      2.5s
Peak spill   0 B

tpch — branch

Metric       Value
Wall time    5.0s
Peak memory  5.6 GiB
Avg memory   5.0 GiB
CPU user     34.0s
CPU sys      2.3s
Peak spill   0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu): same as above
Details

Comparing HEAD and lambda_and_array_transform
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     lambda_and_array_transform ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 40.55 / 41.43 ±1.06 / 43.14 ms │ 40.25 / 41.49 ±1.29 / 43.07 ms │ no change │
│ QQuery 2  │ 20.93 / 21.09 ±0.26 / 21.60 ms │ 20.82 / 21.34 ±0.49 / 22.24 ms │ no change │
│ QQuery 3  │ 41.32 / 41.64 ±0.22 / 41.96 ms │ 37.74 / 40.10 ±1.64 / 41.78 ms │ no change │
│ QQuery 4  │ 18.14 / 18.32 ±0.18 / 18.63 ms │ 18.12 / 18.37 ±0.15 / 18.52 ms │ no change │
│ QQuery 5  │ 48.94 / 50.10 ±1.10 / 51.89 ms │ 50.57 / 51.35 ±0.64 / 52.46 ms │ no change │
│ QQuery 6  │ 17.38 / 17.54 ±0.19 / 17.88 ms │ 17.14 / 18.10 ±1.19 / 20.43 ms │ no change │
│ QQuery 7  │ 54.33 / 55.55 ±0.90 / 56.90 ms │ 54.91 / 56.60 ±1.74 / 59.97 ms │ no change │
│ QQuery 8  │ 47.75 / 48.35 ±0.36 / 48.81 ms │ 47.79 / 48.35 ±0.34 / 48.72 ms │ no change │
│ QQuery 9  │ 53.48 / 54.68 ±1.23 / 56.58 ms │ 53.71 / 55.38 ±2.46 / 60.28 ms │ no change │
│ QQuery 10 │ 65.40 / 65.72 ±0.33 / 66.21 ms │ 65.16 / 66.69 ±1.70 / 69.70 ms │ no change │
│ QQuery 11 │ 13.88 / 14.33 ±0.31 / 14.85 ms │ 13.97 / 14.31 ±0.23 / 14.68 ms │ no change │
│ QQuery 12 │ 27.12 / 27.72 ±0.55 / 28.75 ms │ 27.49 / 27.98 ±0.53 / 28.83 ms │ no change │
│ QQuery 13 │ 37.36 / 38.38 ±0.96 / 40.06 ms │ 37.18 / 38.19 ±0.75 / 39.12 ms │ no change │
│ QQuery 14 │ 27.87 / 28.01 ±0.08 / 28.09 ms │ 28.02 / 28.09 ±0.07 / 28.22 ms │ no change │
│ QQuery 15 │ 33.64 / 34.08 ±0.77 / 35.62 ms │ 33.53 / 34.56 ±1.27 / 37.06 ms │ no change │
│ QQuery 16 │ 15.09 / 15.28 ±0.14 / 15.52 ms │ 15.18 / 15.34 ±0.13 / 15.53 ms │ no change │
│ QQuery 17 │ 78.12 / 78.86 ±0.51 / 79.40 ms │ 78.19 / 79.78 ±1.47 / 82.29 ms │ no change │
│ QQuery 18 │ 75.65 / 77.27 ±1.64 / 80.39 ms │ 74.56 / 75.32 ±0.62 / 76.03 ms │ no change │
│ QQuery 19 │ 37.34 / 37.61 ±0.26 / 38.02 ms │ 37.19 / 37.45 ±0.27 / 37.91 ms │ no change │
│ QQuery 20 │ 39.36 / 40.72 ±2.49 / 45.70 ms │ 39.37 / 40.05 ±0.78 / 41.57 ms │ no change │
│ QQuery 21 │ 62.51 / 63.49 ±1.14 / 65.44 ms │ 63.44 / 64.78 ±0.73 / 65.37 ms │ no change │
│ QQuery 22 │ 16.91 / 17.21 ±0.28 / 17.71 ms │ 16.86 / 17.11 ±0.23 / 17.55 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 887.38ms │
│ Total Time (lambda_and_array_transform)   │ 890.72ms │
│ Average Time (HEAD)                       │  40.34ms │
│ Average Time (lambda_and_array_transform) │  40.49ms │
│ Queries Faster                            │        0 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       22 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric       Value
Wall time    5.0s
Peak memory  5.6 GiB
Avg memory   5.0 GiB
CPU user     33.6s
CPU sys      2.5s
Peak spill   0 B

tpch — branch

Metric       Value
Wall time    5.0s
Peak memory  5.6 GiB
Avg memory   5.0 GiB
CPU user     33.6s
CPU sys      2.4s
Peak spill   0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu): same as above
Details

Comparing HEAD and lambda_and_array_transform
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃               lambda_and_array_transform ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.89 / 7.36 ±0.82 / 8.99 ms │              6.99 / 7.47 ±0.90 / 9.26 ms │     no change │
│ QQuery 2  │        143.68 / 144.17 ±0.52 / 144.92 ms │        144.33 / 144.70 ±0.37 / 145.38 ms │     no change │
│ QQuery 3  │        113.14 / 113.67 ±0.40 / 114.31 ms │        112.66 / 113.58 ±0.93 / 115.19 ms │     no change │
│ QQuery 4  │     1226.37 / 1231.33 ±3.71 / 1236.79 ms │    1256.91 / 1338.21 ±66.72 / 1455.62 ms │  1.09x slower │
│ QQuery 5  │        173.56 / 175.14 ±1.99 / 179.04 ms │        171.59 / 175.84 ±2.61 / 179.44 ms │     no change │
│ QQuery 6  │       792.15 / 828.11 ±21.81 / 859.82 ms │       828.99 / 893.15 ±46.51 / 973.42 ms │  1.08x slower │
│ QQuery 7  │        335.41 / 337.37 ±1.22 / 339.13 ms │        337.42 / 339.54 ±1.34 / 341.52 ms │     no change │
│ QQuery 8  │        113.20 / 113.86 ±0.53 / 114.67 ms │        113.57 / 114.05 ±0.40 / 114.54 ms │     no change │
│ QQuery 9  │         99.90 / 103.25 ±2.54 / 105.61 ms │        103.33 / 109.45 ±7.50 / 123.72 ms │  1.06x slower │
│ QQuery 10 │        101.51 / 102.39 ±0.64 / 103.38 ms │        103.15 / 104.27 ±0.75 / 105.39 ms │     no change │
│ QQuery 11 │       823.95 / 842.13 ±13.57 / 865.21 ms │        890.10 / 899.40 ±5.24 / 904.86 ms │  1.07x slower │
│ QQuery 12 │           42.83 / 43.71 ±0.60 / 44.54 ms │           43.29 / 43.61 ±0.24 / 43.93 ms │     no change │
│ QQuery 13 │        385.97 / 389.71 ±2.66 / 393.15 ms │        389.54 / 391.79 ±2.04 / 395.05 ms │     no change │
│ QQuery 14 │        984.70 / 986.55 ±1.35 / 988.33 ms │        975.55 / 985.71 ±6.16 / 991.76 ms │     no change │
│ QQuery 15 │           14.62 / 15.01 ±0.34 / 15.65 ms │           14.69 / 14.98 ±0.34 / 15.51 ms │     no change │
│ QQuery 16 │              7.26 / 7.37 ±0.13 / 7.63 ms │              7.56 / 7.62 ±0.07 / 7.76 ms │     no change │
│ QQuery 17 │        220.93 / 223.45 ±1.73 / 226.04 ms │        221.38 / 223.59 ±2.05 / 227.13 ms │     no change │
│ QQuery 18 │        124.70 / 125.50 ±0.75 / 126.85 ms │        126.28 / 127.48 ±1.21 / 129.25 ms │     no change │
│ QQuery 19 │        153.63 / 155.04 ±1.32 / 157.48 ms │        153.71 / 154.71 ±0.53 / 155.23 ms │     no change │
│ QQuery 20 │           12.74 / 13.07 ±0.17 / 13.22 ms │           12.98 / 13.19 ±0.22 / 13.56 ms │     no change │
│ QQuery 21 │           19.05 / 19.17 ±0.08 / 19.28 ms │           18.78 / 19.29 ±0.36 / 19.81 ms │     no change │
│ QQuery 22 │        478.16 / 482.34 ±2.94 / 487.36 ms │        472.14 / 475.26 ±1.90 / 478.07 ms │     no change │
│ QQuery 23 │       809.49 / 822.86 ±10.25 / 836.14 ms │        811.75 / 814.26 ±1.81 / 817.11 ms │     no change │
│ QQuery 24 │        373.33 / 377.86 ±5.91 / 388.87 ms │        371.39 / 374.88 ±2.38 / 377.38 ms │     no change │
│ QQuery 25 │        332.25 / 333.25 ±1.08 / 335.31 ms │        330.28 / 332.52 ±1.52 / 335.05 ms │     no change │
│ QQuery 26 │           76.64 / 77.72 ±1.30 / 80.26 ms │           76.88 / 77.27 ±0.59 / 78.45 ms │     no change │
│ QQuery 27 │              6.89 / 6.98 ±0.12 / 7.21 ms │              7.04 / 7.13 ±0.12 / 7.36 ms │     no change │
│ QQuery 28 │        147.22 / 148.52 ±1.17 / 150.65 ms │        147.80 / 148.91 ±1.09 / 150.79 ms │     no change │
│ QQuery 29 │        270.74 / 273.07 ±1.66 / 275.76 ms │        271.32 / 274.92 ±4.59 / 283.75 ms │     no change │
│ QQuery 30 │           41.27 / 41.74 ±0.26 / 41.96 ms │           41.09 / 41.76 ±0.61 / 42.79 ms │     no change │
│ QQuery 31 │        164.83 / 166.13 ±0.86 / 167.26 ms │        164.45 / 166.54 ±1.97 / 170.27 ms │     no change │
│ QQuery 32 │           13.17 / 13.31 ±0.20 / 13.69 ms │           13.16 / 13.40 ±0.37 / 14.14 ms │     no change │
│ QQuery 33 │        138.63 / 140.01 ±1.45 / 142.65 ms │        138.95 / 140.14 ±1.19 / 142.44 ms │     no change │
│ QQuery 34 │              6.82 / 6.96 ±0.19 / 7.33 ms │              6.78 / 7.02 ±0.23 / 7.45 ms │     no change │
│ QQuery 35 │         99.97 / 101.11 ±0.82 / 102.52 ms │        100.00 / 101.07 ±0.82 / 102.24 ms │     no change │
│ QQuery 36 │              6.56 / 6.72 ±0.14 / 6.97 ms │              6.67 / 6.82 ±0.14 / 7.06 ms │     no change │
│ QQuery 37 │              8.12 / 8.24 ±0.06 / 8.31 ms │              8.35 / 8.46 ±0.07 / 8.56 ms │     no change │
│ QQuery 38 │           86.51 / 88.69 ±3.04 / 94.71 ms │           86.85 / 89.05 ±3.20 / 95.36 ms │     no change │
│ QQuery 39 │        116.07 / 117.31 ±1.03 / 118.81 ms │        116.33 / 118.77 ±1.32 / 120.19 ms │     no change │
│ QQuery 40 │        102.42 / 107.71 ±4.35 / 112.71 ms │        103.26 / 108.06 ±4.23 / 115.91 ms │     no change │
│ QQuery 41 │           14.06 / 14.21 ±0.22 / 14.64 ms │           14.13 / 14.34 ±0.29 / 14.91 ms │     no change │
│ QQuery 42 │        106.42 / 107.01 ±0.49 / 107.47 ms │        105.98 / 107.73 ±2.10 / 111.60 ms │     no change │
│ QQuery 43 │              5.74 / 6.34 ±1.09 / 8.53 ms │              5.81 / 5.94 ±0.13 / 6.18 ms │ +1.07x faster │
│ QQuery 44 │           11.68 / 13.12 ±2.75 / 18.63 ms │           11.65 / 11.95 ±0.28 / 12.43 ms │ +1.10x faster │
│ QQuery 45 │           47.96 / 48.66 ±0.63 / 49.84 ms │           48.25 / 49.05 ±0.50 / 49.80 ms │     no change │
│ QQuery 46 │              8.48 / 8.62 ±0.17 / 8.94 ms │              8.33 / 8.59 ±0.18 / 8.88 ms │     no change │
│ QQuery 47 │        674.35 / 679.59 ±3.29 / 683.05 ms │        670.29 / 678.41 ±7.11 / 687.48 ms │     no change │
│ QQuery 48 │        271.08 / 275.37 ±3.23 / 281.08 ms │        272.36 / 276.55 ±3.53 / 282.52 ms │     no change │
│ QQuery 49 │        247.50 / 249.78 ±2.17 / 253.54 ms │        248.55 / 251.56 ±2.30 / 254.34 ms │     no change │
│ QQuery 50 │        201.72 / 204.44 ±2.83 / 209.47 ms │        201.08 / 206.66 ±6.97 / 220.32 ms │     no change │
│ QQuery 51 │        178.53 / 181.16 ±2.76 / 186.15 ms │        174.16 / 178.05 ±2.07 / 179.72 ms │     no change │
│ QQuery 52 │        105.96 / 107.33 ±0.80 / 108.44 ms │        106.75 / 107.04 ±0.22 / 107.27 ms │     no change │
│ QQuery 53 │        101.66 / 102.88 ±0.85 / 104.30 ms │        102.16 / 104.60 ±3.30 / 111.07 ms │     no change │
│ QQuery 54 │        144.23 / 144.94 ±0.85 / 146.60 ms │        145.28 / 146.89 ±1.63 / 149.96 ms │     no change │
│ QQuery 55 │        105.18 / 106.40 ±0.69 / 107.11 ms │        105.33 / 106.55 ±0.79 / 107.45 ms │     no change │
│ QQuery 56 │        138.97 / 139.77 ±0.61 / 140.74 ms │        140.85 / 141.84 ±0.81 / 142.98 ms │     no change │
│ QQuery 57 │        164.36 / 165.42 ±1.30 / 167.95 ms │        166.65 / 168.90 ±1.90 / 171.92 ms │     no change │
│ QQuery 58 │        310.38 / 311.19 ±0.76 / 312.38 ms │        311.42 / 312.39 ±0.75 / 313.72 ms │     no change │
│ QQuery 59 │        193.47 / 194.99 ±1.33 / 197.13 ms │        194.31 / 195.30 ±1.55 / 198.34 ms │     no change │
│ QQuery 60 │        140.33 / 141.11 ±0.41 / 141.50 ms │        142.77 / 144.68 ±1.32 / 146.59 ms │     no change │
│ QQuery 61 │           13.45 / 13.54 ±0.10 / 13.70 ms │           14.12 / 14.19 ±0.06 / 14.29 ms │     no change │
│ QQuery 62 │       845.74 / 859.54 ±10.69 / 874.21 ms │       916.49 / 944.12 ±15.99 / 958.77 ms │  1.10x slower │
│ QQuery 63 │        101.01 / 104.53 ±4.70 / 113.84 ms │        104.81 / 105.43 ±0.67 / 106.48 ms │     no change │
│ QQuery 64 │        654.92 / 659.70 ±4.02 / 665.29 ms │        698.55 / 710.80 ±6.47 / 717.85 ms │  1.08x slower │
│ QQuery 65 │        241.82 / 244.18 ±1.81 / 246.35 ms │        276.75 / 284.84 ±5.70 / 292.36 ms │  1.17x slower │
│ QQuery 66 │        214.74 / 223.09 ±7.99 / 232.98 ms │        233.96 / 242.35 ±7.03 / 251.18 ms │  1.09x slower │
│ QQuery 67 │        290.40 / 295.37 ±7.70 / 310.71 ms │        322.82 / 328.88 ±8.80 / 345.99 ms │  1.11x slower │
│ QQuery 68 │            8.52 / 10.33 ±3.21 / 16.73 ms │              9.43 / 9.60 ±0.11 / 9.72 ms │ +1.08x faster │
│ QQuery 69 │           97.39 / 98.19 ±0.60 / 99.09 ms │        101.50 / 102.65 ±0.85 / 103.86 ms │     no change │
│ QQuery 70 │        312.03 / 318.84 ±4.62 / 324.93 ms │        328.76 / 337.43 ±8.43 / 348.85 ms │  1.06x slower │
│ QQuery 71 │        132.36 / 133.88 ±0.78 / 134.53 ms │        136.45 / 139.29 ±3.01 / 144.92 ms │     no change │
│ QQuery 72 │       579.12 / 591.53 ±10.03 / 604.90 ms │        617.65 / 623.99 ±5.49 / 631.35 ms │  1.05x slower │
│ QQuery 73 │              6.68 / 6.81 ±0.21 / 7.23 ms │              7.13 / 7.24 ±0.13 / 7.49 ms │  1.06x slower │
│ QQuery 74 │        524.65 / 528.62 ±2.51 / 532.39 ms │       634.04 / 652.67 ±12.95 / 668.04 ms │  1.23x slower │
│ QQuery 75 │        266.51 / 268.75 ±2.35 / 273.19 ms │        270.91 / 274.57 ±3.74 / 281.76 ms │     no change │
│ QQuery 76 │        129.74 / 131.41 ±2.19 / 135.59 ms │        132.96 / 133.93 ±1.17 / 136.09 ms │     no change │
│ QQuery 77 │        186.12 / 187.83 ±1.51 / 190.49 ms │        189.06 / 190.74 ±1.71 / 193.04 ms │     no change │
│ QQuery 78 │        329.42 / 330.07 ±0.56 / 330.73 ms │        342.42 / 344.66 ±1.59 / 346.59 ms │     no change │
│ QQuery 79 │        225.48 / 227.46 ±1.62 / 229.35 ms │        244.44 / 249.51 ±5.78 / 258.66 ms │  1.10x slower │
│ QQuery 80 │        318.94 / 320.56 ±1.08 / 321.69 ms │        325.19 / 327.79 ±3.30 / 334.29 ms │     no change │
│ QQuery 81 │           25.58 / 26.28 ±0.88 / 28.01 ms │           26.67 / 26.98 ±0.28 / 27.36 ms │     no change │
│ QQuery 82 │           38.83 / 39.35 ±0.37 / 39.75 ms │           39.94 / 40.63 ±0.59 / 41.46 ms │     no change │
│ QQuery 83 │           36.99 / 37.23 ±0.15 / 37.44 ms │           38.07 / 38.12 ±0.06 / 38.24 ms │     no change │
│ QQuery 84 │           45.74 / 46.25 ±0.44 / 47.01 ms │           46.95 / 48.05 ±1.41 / 50.83 ms │     no change │
│ QQuery 85 │        140.00 / 141.17 ±1.49 / 144.03 ms │        142.54 / 144.50 ±2.00 / 148.35 ms │     no change │
│ QQuery 86 │           36.76 / 37.12 ±0.25 / 37.52 ms │           37.60 / 38.13 ±0.33 / 38.57 ms │     no change │
│ QQuery 87 │              3.47 / 3.53 ±0.07 / 3.67 ms │              3.57 / 3.68 ±0.15 / 3.96 ms │     no change │
│ QQuery 88 │         99.30 / 100.55 ±1.59 / 103.67 ms │        100.78 / 102.19 ±0.97 / 103.29 ms │     no change │
│ QQuery 89 │        114.69 / 115.81 ±0.94 / 117.27 ms │        117.81 / 121.35 ±5.58 / 132.46 ms │     no change │
│ QQuery 90 │           22.07 / 22.51 ±0.26 / 22.77 ms │           22.91 / 23.33 ±0.30 / 23.71 ms │     no change │
│ QQuery 91 │           57.55 / 58.25 ±1.08 / 60.39 ms │           59.81 / 60.51 ±0.73 / 61.73 ms │     no change │
│ QQuery 92 │           55.85 / 56.02 ±0.14 / 56.19 ms │           57.33 / 57.84 ±0.54 / 58.51 ms │     no change │
│ QQuery 93 │        180.48 / 181.73 ±1.37 / 183.91 ms │        186.35 / 189.67 ±3.38 / 195.55 ms │     no change │
│ QQuery 94 │           59.80 / 60.24 ±0.38 / 60.81 ms │           61.41 / 62.59 ±1.31 / 65.04 ms │     no change │
│ QQuery 95 │        125.34 / 126.06 ±0.59 / 127.05 ms │        127.18 / 127.82 ±0.40 / 128.42 ms │     no change │
│ QQuery 96 │           68.31 / 69.21 ±0.78 / 70.49 ms │           69.51 / 72.10 ±3.95 / 79.97 ms │     no change │
│ QQuery 97 │        116.25 / 117.30 ±0.84 / 118.76 ms │        119.42 / 121.20 ±1.53 / 123.33 ms │     no change │
│ QQuery 98 │        148.65 / 149.58 ±0.55 / 150.32 ms │        153.53 / 155.55 ±1.11 / 156.85 ms │     no change │
│ QQuery 99 │ 10688.93 / 10735.09 ±66.88 / 10867.69 ms │ 10898.46 / 11002.08 ±65.49 / 11073.42 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 30177.74ms │
│ Total Time (lambda_and_array_transform)   │ 31205.89ms │
│ Average Time (HEAD)                       │   304.83ms │
│ Average Time (lambda_and_array_transform) │   315.21ms │
│ Queries Faster                            │          3 │
│ Queries Slower                            │         14 │
│ Queries with No Change                    │         82 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 155.0s |
| Peak memory | 6.7 GiB |
| Avg memory | 5.7 GiB |
| CPU user | 253.0s |
| CPU sys | 7.9s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 160.0s |
| Peak memory | 6.3 GiB |
| Avg memory | 5.7 GiB |
| CPU user | 260.5s |
| CPU sys | 9.1s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@comphead
Contributor

Thanks @gstvg, I'm planning to test this PR with downstream tomorrow

@gstvg gstvg force-pushed the lambda_and_array_transform branch from 07e4125 to c6324c4 Compare April 26, 2026 21:35
Comment on lines +356 to +553
/// Return the fields of all the parameters supported by the lambdas in `fields`.
/// If a lambda supports multiple parameters, all of them should be returned, regardless of
/// whether they are used on a particular invocation.
///
/// Tip: If you have a [`HigherOrderFunction`] invocation, you can call the helper
/// [`HigherOrderFunction::lambda_parameters`] instead of this method directly
///
/// The names of the returned fields are ignored.
///
/// This function is repeatedly called until [LambdaParametersProgress::Complete] is returned, with
/// `step` increased by one at each invocation, starting at 0.
///
/// For functions whose lambda parameters depend only on the fields of their value arguments,
/// this can return [LambdaParametersProgress::Complete] at step 0. Take as an example a strict
/// array_reduce with the signature `(arr: [V], initial_value: I, (I, V) -> I, (I) -> O) -> O`, which
/// requires its initial value to be exactly the same type as its merge output, which is also the
/// parameter of its finish lambda. The expression
///
/// `array_reduce([1.2, 2.1], 0.0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)`
///
/// would result in this function being called as the following:
///
/// ```ignore
/// let lambda_parameters = array_reduce.lambda_parameters(
///     0,
///     &[
///         // the Field of the literal `[1.2, 2.1]`, the array being reduced
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
///         // the Field of the literal `0.0`, the initial value
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::Float32, true))),
///         // the Field of the output of the merge lambda, which is unknown at this point because it depends
///         // on the return of this call
///         ValueOrLambda::Lambda(None),
///         // the Field of the output of the finish lambda, unknown for the same reason as above
///         ValueOrLambda::Lambda(None),
///     ])?;
///
/// assert_eq!(
///     lambda_parameters,
///     LambdaParametersProgress::Complete(vec![
///         // the merge lambda's supported parameters, regardless of how many are actually used
///         vec![
///             // the accumulator, which is the field of the initial value
///             Arc::new(Field::new("ignored_name", DataType::Float32, true)),
///             // the array values being reduced
///             Arc::new(Field::new("", DataType::Float32, true)),
///         ],
///         // the finish lambda's supported parameters
///         vec![
///             // the reduced value, which is the field of the initial value
///             Arc::new(Field::new("ignored_name", DataType::Float32, true)),
///         ],
///     ])
/// );
/// ```
///
/// For functions whose lambda parameters depend on the output of other lambdas, or on their own lambda,
/// this can return [LambdaParametersProgress::Partial] until all dependencies are met. Note that for
/// lambdas with cyclic dependencies, you likely want to use [HigherOrderUDF::coerce_values_for_lambdas] too.
/// Take as an example a flexible array_reduce with the signature `(arr: [V], initial_value: I, (ACC, V) -> ACC, (ACC) -> O) -> O`.
/// It has a cyclic dependency in the merge lambda, a dependency of the finish lambda on the merge lambda,
/// and only requires the initial value to be *coercible* to the output of the merge lambda, which is defined by
/// its [HigherOrderUDF::coerce_values_for_lambdas] implementation. The expression
///
/// `array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)`
///
/// would result in this function being called as the following:
///
/// ```ignore
/// let lambda_parameters = array_reduce.lambda_parameters(
///     0,
///     &[
///         // the Field of the literal `[1.2, 2.1]`, the array being reduced
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
///         // the Field of the literal `0`, the initial value
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
///         // the Field of the output of the merge lambda, which is unknown at this point because it depends on
///         // the return of this call
///         ValueOrLambda::Lambda(None),
///         // the Field of the output of the finish lambda, unknown for the same reason as above
///         ValueOrLambda::Lambda(None),
///     ])?;
///
/// assert_eq!(
///     lambda_parameters,
///     LambdaParametersProgress::Partial(vec![
///         // the merge lambda's supported parameters, regardless of how many are actually used
///         Some(vec![
///             // at step 0, use the field of the initial value
///             Arc::new(Field::new("ignored_name", DataType::Int32, true)),
///             // the array values being reduced
///             Arc::new(Field::new("", DataType::Float32, true)),
///         ]),
///         // the finish lambda's supported parameters, unknown at this point due to the dependency on the merge output
///         None,
///     ])
/// );
///
/// let lambda_parameters = array_reduce.lambda_parameters(
///     1,
///     &[
///         // the Field of the literal `[1.2, 2.1]`, the array being reduced
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
///         // the Field of the literal `0`, the initial value
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
///         // the Field of the output of the merge lambda, which could be inferred to be a Float32 based on the
///         // returned values of the previous step
///         ValueOrLambda::Value(Arc::new(Field::new("", DataType::Float32, true))),
///         // the Field of the output of the finish lambda, which is unknown at this point because it depends
///         // on the return of this call
///         ValueOrLambda::Lambda(None),
///     ])?;
///
/// assert_eq!(
///     lambda_parameters,
///     LambdaParametersProgress::Complete(vec![
///         // the merge lambda's supported parameters, regardless of how many are actually used
///         vec![
///             // the merge lambda's own output, now used as its accumulator
///             Arc::new(Field::new("ignored_name", DataType::Float32, true)),
///             // the array values being reduced
///             Arc::new(Field::new("", DataType::Float32, true)),
///         ],
///         // the finish lambda's supported parameters, which is the output of the merge lambda
///         vec![
///             // the output of the merge lambda
///             Arc::new(Field::new("", DataType::Float32, true)),
///         ],
///     ])
/// );
///
/// let coerce_to = array_reduce.coerce_values_for_lambdas(&[
///     // the literal `[1.2, 2.1]` data type, the array being reduced
///     ValueOrLambda::Value(DataType::new_list(DataType::Float32, true)),
///     // the literal `0` data type, the initial value
///     ValueOrLambda::Value(DataType::Int32),
///     // the output data type of the merge lambda
///     ValueOrLambda::Lambda(DataType::Float32),
///     // the output data type of the finish lambda
///     ValueOrLambda::Lambda(DataType::Boolean),
/// ])?;
///
/// assert_eq!(
///     coerce_to,
///     vec![
///         // return the same type for the array being reduced
///         DataType::new_list(DataType::Float32, true),
///         // coerce the initial value to the output of the merge lambda
///         DataType::Float32,
///     ]
/// );
/// ```
///
/// Note this may also be called at step 0 with all lambda outputs already set, and in that case,
/// [LambdaParametersProgress::Complete] must be returned
///
/// The implementation can assume that some other part of the code has coerced
/// the actual argument types to match [`Self::signature`], except the coercion defined by
/// [Self::coerce_values_for_lambdas], if applicable.
///
/// [`HigherOrderFunction`]: crate::expr::HigherOrderFunction
/// [`HigherOrderFunction::lambda_parameters`]: crate::expr::HigherOrderFunction::lambda_parameters
fn lambda_parameters(
    &self,
    step: usize,
    fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>],
) -> Result<LambdaParametersProgress>;

/// Coerce the value arguments of a function call to types that the function can evaluate, also taking into
/// account the *output types of its lambdas*. This differs from [HigherOrderUDF::coerce_value_types],
/// which only has access to the types of its value arguments. For this method to be called, the
/// function must have its [HigherOrderSignature::coerce_values_for_lambdas] set to true.
///
/// See the [type coercion module](crate::type_coercion)
/// documentation for more details on type coercion
///
/// # Parameters
/// * `fields`: The argument types of the value arguments of this function, or the output type of lambdas
///
/// # Return value
/// A Vec with the same number of [ValueOrLambda::Value] in `fields`. DataFusion will `CAST` the
/// function call arguments to these specific types.
///
/// For example, a flexible array_reduce implementation (see [Self::lambda_parameters] docs), when working
/// with the expression below, may want to coerce its initial value argument, the *integer* `0`,
/// to match the output of its merge lambda, which is a *float*:
///
/// `array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 2.0)`
fn coerce_values_for_lambdas(
    &self,
    _fields: &[ValueOrLambda<DataType, DataType>],
) -> Result<Vec<DataType>> {
    not_impl_err!(
        "{} coerce_values_for_lambdas is not implemented",
        self.name()
    )
}
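To make the step protocol concrete, here is a minimal, self-contained sketch of how a single-lambda function like array_transform might implement `lambda_parameters`. All types (`Field`, `DataType`, `ValueOrLambda`, `LambdaParametersProgress`) are simplified stand-ins for the real DataFusion/Arrow types, reduced to what the example needs; the real trait method takes `FieldRef`s and returns a `Result`.

```rust
// Stand-in types: NOT the real DataFusion/Arrow definitions.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    List(Box<DataType>),
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

enum ValueOrLambda {
    Value(Field),
    Lambda(Option<Field>),
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum LambdaParametersProgress {
    Partial(Vec<Option<Vec<Field>>>),
    Complete(Vec<Vec<Field>>),
}

// array_transform(arr: [V], V -> O): the single lambda parameter is the element
// type of the first (value) argument, so everything resolves at step 0.
fn lambda_parameters(
    _step: usize,
    fields: &[ValueOrLambda],
) -> Result<LambdaParametersProgress, String> {
    let ValueOrLambda::Value(array_field) = &fields[0] else {
        return Err("first argument must be a value".into());
    };
    let DataType::List(element_type) = &array_field.data_type else {
        return Err("first argument must be a list".into());
    };
    // One entry per lambda; the names of the returned fields are ignored.
    Ok(LambdaParametersProgress::Complete(vec![vec![Field {
        name: String::new(),
        data_type: (**element_type).clone(),
    }]]))
}

fn main() {
    // array_transform([2, 3], v -> v != 2)
    let args = [
        ValueOrLambda::Value(Field {
            name: String::new(),
            data_type: DataType::List(Box::new(DataType::Int32)),
        }),
        ValueOrLambda::Lambda(None),
    ];
    let progress = lambda_parameters(0, &args).unwrap();
    assert_eq!(
        progress,
        LambdaParametersProgress::Complete(vec![vec![Field {
            name: String::new(),
            data_type: DataType::Int32,
        }]])
    );
}
```

Because the lambda parameter depends only on a value argument, the sketch never needs `Partial`; a function with lambda-on-lambda dependencies would return `Partial` with `None` in the unresolved slots, as in the array_reduce walkthrough above.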

@gstvg
Contributor Author

gstvg commented Apr 26, 2026

@rluvaton @comphead @LiaCastaneda @pepijnve I pushed 284b5d2 to support a flexible array_reduce signature, and pinged you all at the main entry point of the commit: #21679 (comment)

Also pulled in breaking changes from lambda capture (5b42e48) and from making the lambda variable field optional (e83107e), and removed direct support for fixed size lists in array_transform, coercing them to normal lists instead (7bc1f55)

Finally, I noted two minor but breaking improvements; I didn't want to delay this even more, but we can still do them:

  1. Make logical and physical lambdas able to store their fields, and not only their parameter names:
struct Lambda {
    pub params: LambdaParams,
    pub body: Box<Expr>,
}

pub enum LambdaParams {
    Bound(Vec<FieldRef>),
    // the SQL planner produces bound lambdas; a follow-on PR will add a helper to bind all lambdas of an expr tree and/or a logical plan
    Unbound(Vec<String>),
}

pub struct LambdaExpr {
    params: Vec<FieldRef>,
    body: Arc<dyn PhysicalExpr>,
}

This would eliminate the following two calls to lambda_parameters (it would only be called during SQL planning), as the lambda should already contain the fields (unbound lambdas raise an error during planning as an unbound lambda variable):

let mut lambda_parameters = match self.fun().lambda_parameters(0, &fields)? {
    LambdaParametersProgress::Partial(_) => {
        return plan_err!(
            "{} lambda_parameters returned a partial result when the return type of all it's lambdas were provided",
            self.name()
        );
    }
    LambdaParametersProgress::Complete(items) => items.into_iter(),
};

Expr::HigherOrderFunction(invocation @ HigherOrderFunction { func, args }) => {
    let num_lambdas = args
        .iter()
        .filter(|arg| matches!(arg, Expr::Lambda(_)))
        .count();
    let mut lambda_parameters =
        invocation.lambda_parameters(input_dfschema)?.into_iter();
    if num_lambdas > lambda_parameters.len() {
        return plan_err!(
            "{} lambda_parameters returned only {} values for {num_lambdas} lambdas",
            func.name(),
            lambda_parameters.len()
        );
    }
    let physical_args = args
        .iter()
        .map(|arg| match arg {
            Expr::Lambda(lambda) => {
                let lambda_parameters = lambda_parameters
                    .next()
                    .ok_or_else(|| {
                        internal_datafusion_err!(
                            "lambda_parameters len should have been checked above"
                        )
                    })?
                    .into_iter()
                    .zip(&lambda.params)
                    .map(|(field, name)| field.renamed(name.as_str()))
                    .collect();
                let lambda_schema = DFSchema::from_unqualified_fields(
                    lambda_parameters,
                    HashMap::new(),
                )?;
                create_physical_expr(arg, &lambda_schema, execution_props)
            }
            _ => create_physical_expr(arg, input_dfschema, execution_props),
        })
        .collect::<Result<_>>()?;
    let config_options = match execution_props.config_options.as_ref() {
        Some(config_options) => Arc::clone(config_options),
        None => Arc::new(ConfigOptions::default()),
    };
    Ok(Arc::new(HigherOrderFunctionExpr::try_new_with_schema(
        Arc::clone(func),
        physical_args,
        input_schema,
        config_options,
    )?))
}

For function implementors this won't change anything, but it would make it easier to write a custom physical expr planner, for example.

You can see how it would look in this commit: gstvg@3a8ad01

  2. Disambiguate the LambdaVariable name
enum LambdaVariableField {
    Unbound(String),
    Bound(FieldRef),
}

struct LambdaVariable {
    pub field: LambdaVariableField,
    pub spans: Spans,
}

This would remove the ambiguity between the name property and the FieldRef name
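To illustrate how the proposed `LambdaParams::Unbound` → `Bound` transition could work, here is a hypothetical, self-contained sketch of a binding helper: an unbound lambda stores only parameter names, and the helper pairs them with the fields resolved by `lambda_parameters`, renaming each field with the declared parameter name. All types are simplified stand-ins (`data_type` is a plain `String` here), and the `bind` method itself is an assumption, not part of the PR.

```rust
// Stand-in types; NOT the real DataFusion definitions.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String, // stand-in for the real DataType
}

#[derive(Debug, PartialEq)]
enum LambdaParams {
    Bound(Vec<Field>),
    Unbound(Vec<String>),
}

impl LambdaParams {
    // Hypothetical helper: rename the resolved fields with the lambda's
    // declared parameter names; already-bound params pass through unchanged.
    fn bind(self, resolved: &[Field]) -> Result<LambdaParams, String> {
        match self {
            bound @ LambdaParams::Bound(_) => Ok(bound),
            LambdaParams::Unbound(names) => {
                if names.len() > resolved.len() {
                    return Err(format!(
                        "lambda declares {} parameters but only {} are supported",
                        names.len(),
                        resolved.len()
                    ));
                }
                Ok(LambdaParams::Bound(
                    names
                        .into_iter()
                        .zip(resolved)
                        .map(|(name, field)| Field {
                            name,
                            data_type: field.data_type.clone(),
                        })
                        .collect(),
                ))
            }
        }
    }
}

fn main() {
    // v -> v != 2, with the element field resolved to Int32 by the function
    let params = LambdaParams::Unbound(vec!["v".to_string()]);
    let resolved = [Field { name: String::new(), data_type: "Int32".to_string() }];
    let bound = params.bind(&resolved).unwrap();
    assert_eq!(
        bound,
        LambdaParams::Bound(vec![Field {
            name: "v".to_string(),
            data_type: "Int32".to_string(),
        }])
    );
}
```

Under this scheme the error case (an unbound lambda reaching physical planning) surfaces at `bind` time rather than deep inside `create_physical_expr`.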

Comment on lines +211 to +215
/// A RecordBatch with the captured columns inside the lambda body, if any
///
/// For example, for `array_transform([2], v -> v + a + b)`,
/// this will be a `RecordBatch` with columns `a` and `b`
captures: Option<RecordBatch>,
Member

I don't think that we should have this for column capture, and I think the design of column capture itself would be a long review, so I'm fine with not including this preparation for column capture now and creating a breaking change in a different PR (I'm also referring to the rest of the breaking changes made to support column capture, like the added evaluate argument)

For example, I think we should have ProjectionExpr, use the existing logic for projecting, and not get all the arrays in the evaluate function:

pub fn evaluate(
    &self,
    args: &[&dyn Fn() -> Result<ArrayRef>],
    _adjust: impl FnOnce(&[ArrayRef]) -> Result<Vec<ArrayRef>>,
Member

I would revert this as well as explained before

Comment on lines +305 to +309
/// The return of [HigherOrderUDF::lambda_parameters]
pub enum LambdaParametersProgress {
Partial(Vec<Option<Vec<FieldRef>>>),
Complete(Vec<Vec<FieldRef>>),
}
Member

I think this needs more comments explaining what each variant is and when you should use each one (I know you added some comments regarding this on lambda_parameters, but some should also be here)

Lambda(L),
}

/// The return of [HigherOrderUDF::lambda_parameters]
Member

The comment should explain the enum and not the function that returns it, IMO

@rluvaton rluvaton dismissed their stale review April 27, 2026 08:54

have new important comments

Contributor

@LiaCastaneda LiaCastaneda left a comment

I agree we should move the column capture support to a follow-up PR, since it would involve more design discussions, and this PR already proves that most lambda functions can be implemented easily -- I don't think we should block on it. Left some comments on the multi-step type resolution you introduced in the last commits.

Comment on lines +518 to +524
if step > 256 {
return plan_err!(
"{} lambda_parameters called 256 times without completion",
fm.name()
);
}
};
Contributor

🤔 This limit feels arbitrary; even if I can't imagine a situation where a user has 256+ lambdas in a higher-order function each outputting a different type, in theory implementors could express this. Since implementors know beforehand how their higher-order function typing works, it might make sense to include something like max_lambda_parameter_steps: usize in HigherOrderSignature, defaulting to 1?
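The suggestion above could look something like this sketch; the struct shape and field names are hypothetical (the real `HigherOrderSignature` in the PR has other fields this sketch omits), it only shows where a per-function step bound would live:

```rust
// Hypothetical sketch of the suggested knob: let each function declare how
// many `lambda_parameters` steps it may need, instead of a hard-coded 256.
struct HigherOrderSignature {
    // Whether coerce_values_for_lambdas should be called for this function
    coerce_values_for_lambdas: bool,
    // How many lambda_parameters steps this function may need before it must
    // return Complete; most functions resolve everything in one step.
    max_lambda_parameter_steps: usize,
}

impl Default for HigherOrderSignature {
    fn default() -> Self {
        Self {
            coerce_values_for_lambdas: false,
            max_lambda_parameter_steps: 1,
        }
    }
}

fn main() {
    let sig = HigherOrderSignature::default();
    // The planner loop would then error out after sig.max_lambda_parameter_steps
    // iterations rather than after a fixed 256.
    assert_eq!(sig.max_lambda_parameter_steps, 1);
    assert!(!sig.coerce_values_for_lambdas);
}
```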

Comment on lines +545 to +548
fn coerce_values_for_lambdas(
&self,
_fields: &[ValueOrLambda<DataType, DataType>],
) -> Result<Vec<DataType>> {
Contributor

coerce_values_for_lambdas and coerce_value_types do the same thing, but one is called after the lambdas are planned right? iiuc, you added a separate function because you need to see both lambdas and values via ValueOrLambda. If that's the only reason for having a separate function, maybe we should just have one function that receives &[ValueOrLambda<DataType, DataType>]? In the other PR I suggested having only &[DataType] because I couldn't think of a near term use case that would actually need both.

Not sure what the cleaner approach is here -- having two APIs that do the same thing, or having one API with &[ValueOrLambda<DataType, DataType>] as the parameter.

Also, I see this function is only called if the coerce_values_for_lambdas flag is set to true in the signature, which can be easy for the implementor to forget.
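For reference, a self-contained sketch of what a `coerce_values_for_lambdas` implementation does for the flexible array_reduce discussed above (`(arr: [V], initial: I, (ACC, V) -> ACC, (ACC) -> O) -> O`): keep the array type, and coerce the initial value to the merge lambda's output type. `DataType` and `ValueOrLambda` are stand-ins for the real types.

```rust
// Stand-in types; NOT the real DataFusion/Arrow definitions.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Float32,
    Boolean,
    List(Box<DataType>),
}

enum ValueOrLambda {
    Value(DataType),
    Lambda(DataType), // lambda output type
}

// Returns one entry per Value in `fields`: the array keeps its type and the
// initial value is coerced to the output type of the merge lambda (fields[2]).
fn coerce_values_for_lambdas(fields: &[ValueOrLambda]) -> Result<Vec<DataType>, String> {
    let [ValueOrLambda::Value(arr), ValueOrLambda::Value(_initial), ValueOrLambda::Lambda(merge_out), ValueOrLambda::Lambda(_finish_out)] =
        fields
    else {
        return Err("expected (array, initial, merge lambda, finish lambda)".into());
    };
    Ok(vec![arr.clone(), merge_out.clone()])
}

fn main() {
    // array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 2.0)
    let coerced = coerce_values_for_lambdas(&[
        ValueOrLambda::Value(DataType::List(Box::new(DataType::Float32))),
        ValueOrLambda::Value(DataType::Int32),    // the integer initial value `0`
        ValueOrLambda::Lambda(DataType::Float32), // merge lambda output
        ValueOrLambda::Lambda(DataType::Boolean), // finish lambda output
    ])
    .unwrap();
    assert_eq!(
        coerced,
        vec![
            DataType::List(Box::new(DataType::Float32)),
            DataType::Float32, // the initial value is cast to the merge output
        ]
    );
}
```

Written this way, a single API taking `&[ValueOrLambda<DataType, DataType>]` can subsume `coerce_value_types`: an implementation that ignores the `Lambda` entries behaves exactly like the value-only variant.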

@comphead
Contributor

I tried to build the array_exists Spark function using the current approach: #21881

The good thing is it works. From the performance point of view there are some improvements that can be made for lambdas, but we can likely address them in a follow-up, @gstvg it is up to you

Loops — hoisting & fusion

L1. Three passes over self.args in one evaluate call. datafusion/physical-expr/src/higher_order_function.rs:247, :269, :309 each iterate self.args, and each of the first and third passes calls self.lambda_positions.contains(&i) (linear scan) inside the loop body. For N args with L lambdas, that's O(N·L) per batch just to classify args — the classification is already known at construction time.

- Fix: Replace lambda_positions: Vec<usize> with a per-arg enum cached on the struct:

      enum ArgSlot { Value(Arc<dyn PhysicalExpr>), Lambda(Arc<LambdaExpr>) }
      slots: Vec<ArgSlot>,

- The three passes then become one fused loop; O(N) per batch, zero allocation for the lookup, and the slow wrapped_lambda tree walk (:459-474) collapses to a cached Arc<LambdaExpr>.

L2. Pass 2 is pure reshaping of pass 1 (arg_fields → fields, cloning Arcs). Can be folded into pass 1 by building the Vec<ValueOrLambda<FieldRef, Option<FieldRef>>> directly.

---

Data structures — precompute / augment

D1. LambdaArgument::evaluate rebuilds the inner Schema on every call. datafusion/expr/src/higher_order_function.rs:256:

      let schema = Arc::new(Schema::new(self.params.clone()));

Schema::new builds a name→index HashMap internally. For a nested HOF (array_transform([[1,2]], a -> array_transform(a, b -> ...))) this runs once per outer sublist per batch. Build it once in LambdaArgument::new and store Arc<Schema>.

D2. self.fun.clear_null_values() is called once per non-lambda arg per batch (:349). It's a &self method returning a constant bool. Cache it as a bool on HigherOrderFunctionExpr at construction.

D3. wrapped_lambda walks the expr tree on every evaluate. :253 and :315 both call it. Pre-resolve the Arc<LambdaExpr> once at try_new_with_schema / with_new_children and store it — lambdas don't change mid-execution.

D4. HigherOrderFunctionArgs::args and ::arg_fields are always Vec even for 2-arg HOFs (array_transform, array_exists, array_filter — the vast majority). Switching to smallvec::SmallVec<[_; 2]> eliminates a heap alloc per batch for the common shape.

---

Logic — short-circuit / reorder / lazy

G1. lambda_parameters(0, &fields) runs unconditionally at :285 even when there are no lambdas. Guard with if self.lambda_positions.is_empty() { skip }.

G2. conditional_arguments is specced but unused by the evaluator. The trait exposes short_circuits() + conditional_arguments(args) to mark args as lazy, but HigherOrderFunctionExpr::evaluate still eagerly calls arg.evaluate(batch) for every non-lambda value (:346). A short-circuiting array_exists/array_any can't benefit from skipping remaining work today. Wire this up.

G3. The remove_list_null_values path (:349-355) is entered for every List/LargeList arg. The callee has a null_count == 0 fast path that returns list.clone(), but you still pay an outer Arc::new+match. Add an array.null_count() == 0 short-circuit at the caller.

G4. Constant-body lambdas aren't specialized. array_transform([..5], v -> 42) builds a RecordBatch and evaluates the body over all values despite the body not referencing v. Detect "no LambdaVariable in body" at construction; evaluate the body on an empty batch once and replicate.

---

Expressions — word/SIMD parallelism & CSE

E1. Per-row predicate scans are elementwise today (see my compute_exists in array_exists.rs and the array_transform pattern). Replace the for i in start..end { if predicate.value(i) … } loops with Arrow's bit-level operations on the predicate's underlying BooleanBuffer:

      let row_slice = predicate.slice(start, end - start);
      if row_slice.true_count() > 0      { TRUE }     // SIMD popcount
      else if row_slice.null_count() > 0 { NULL }
      else                               { FALSE }

The same trick applies to any boolean-reducing HOF (exists, forall, filter size estimation).

E2. self.lambda_positions.contains(&i) is a repeated CSE target (L1 above covers it).

---

Procedures — parallelism

P1. Flat lambda evaluation over list_values is embarrassingly parallel. For per-row-independent HOFs (transform, exists, filter, any, forall), list_values can be chunked and the lambda body evaluated in parallel per chunk, then concatenated. Reuses Arrow's kernels that already release the GIL equivalent. Gates on detecting side-effect-free lambdas (e.g. !is_volatile_node()).

P2. No HOF benchmarks exist (rg "HigherOrder" benches/ returns nothing). Before any of the above, add a Criterion bench for array_transform over (List, LargeList, FixedSizeList × 10K/1M rows × scalar/columnar lambda body). Without baseline numbers, "faster" is unmeasurable.

---

Suggested priority order

1. D3 + L1 + L2 (cache ArgSlot, fold loops) — one edit, removes the O(N·L) classification and two Vec reallocs per batch.
2. D1 (cache lambda Schema) — one-line fix, big win for nested HOFs.
3. E1 (SIMD popcount on predicate boolean buffer) — biggest per-element win on array_exists / filter / forall.
4. P2 (add Criterion benches) — required to validate any of the above.
5. G2 (wire up conditional_arguments) — needed before short-circuiting HOFs like exists can skip work.
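The L1 suggestion above can be sketched with a std-only toy: classify args once at construction into a per-arg `ArgSlot` so the evaluate loop never scans `lambda_positions`. `PhysicalExpr` and `LambdaExpr` are stand-ins, and `evaluate` here just counts the two kinds of slot to show the single fused pass; the real evaluator would do its per-slot work in the same loop.

```rust
// Stand-in types; NOT the real DataFusion definitions.
#[derive(Debug)]
struct PhysicalExpr(&'static str);
#[derive(Debug)]
struct LambdaExpr(&'static str);

// What the constructor receives, in argument order.
enum Arg {
    Value(PhysicalExpr),
    Lambda(LambdaExpr),
}

// Cached at construction: one classification per arg, O(1) lookup thereafter.
enum ArgSlot {
    Value(PhysicalExpr),
    Lambda(LambdaExpr),
}

struct HigherOrderFunctionExpr {
    slots: Vec<ArgSlot>,
}

impl HigherOrderFunctionExpr {
    fn new(args: Vec<Arg>) -> Self {
        // Classification happens exactly once, not once per batch.
        let slots = args
            .into_iter()
            .map(|arg| match arg {
                Arg::Value(e) => ArgSlot::Value(e),
                Arg::Lambda(l) => ArgSlot::Lambda(l),
            })
            .collect();
        Self { slots }
    }

    // One fused pass: no lambda_positions.contains(&i) linear scan per arg.
    fn evaluate(&self) -> (usize, usize) {
        let (mut values, mut lambdas) = (0, 0);
        for slot in &self.slots {
            match slot {
                ArgSlot::Value(_) => values += 1,
                ArgSlot::Lambda(_) => lambdas += 1,
            }
        }
        (values, lambdas)
    }
}

fn main() {
    // array_transform([2, 3], v -> v != 2): one value arg, one lambda arg
    let expr = HigherOrderFunctionExpr::new(vec![
        Arg::Value(PhysicalExpr("[2, 3]")),
        Arg::Lambda(LambdaExpr("v -> v != 2")),
    ]);
    assert_eq!(expr.evaluate(), (1, 1));
}
```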


Labels

- api change: Changes the API exposed to users of the crate
- catalog: Related to the catalog crate
- common: Related to common crate
- core: Core DataFusion crate
- datasource: Changes to the datasource crate
- documentation: Improvements or additions to documentation
- execution: Related to the execution crate
- ffi: Changes to the ffi crate
- functions: Changes to functions implementation
- logical-expr: Logical plan and expressions
- optimizer: Optimizer rules
- physical-expr: Changes to the physical-expr crates
- proto: Related to proto crate
- spark
- sql: SQL Planner
- sqllogictest: SQL Logic Tests (.slt)
- substrait: Changes to the substrait crate


8 participants