BFV Compute-on-Read for get_historical_features() in SparkOfflineStore #6345

Is your feature request related to a problem? Please describe.
When using a @batch_feature_view (BFV) with TransformationMode.PYTHON and the Spark offline store, calling store.get_historical_features() fails with UNRESOLVED_COLUMN errors. The point-in-time (PIT) join SQL template reads directly from the raw batch_source and expects the BFV's output columns (e.g., aggregated features) to already exist in the source data. The BFV's Python transformation is never invoked during offline reads; it runs only during feast materialize.
This forces users to either:

- Maintain a separate ETL pipeline that pre-computes the same features the BFV defines
- Use a plain FeatureView pointing at pre-computed data, duplicating the transformation logic
This breaks the "define once, use everywhere" promise of the feature store.
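To make the failure mode concrete, here is a toy, pure-Python model of a point-in-time join. It is not Feast's SQL template and ignores TTLs and other details; pit_join, the row dicts, and the column names (driver_id, trip_distance, total_distance) are hypothetical. Like the template, it selects the feature view's declared output columns from whatever table it is handed, so pointing it at the raw source fails on the missing aggregate column, the analogue of Spark's UNRESOLVED_COLUMN.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def pit_join(
    entity_rows: List[Row],
    feature_rows: List[Row],
    key: str,
    feature_cols: List[str],
    ts_col: str = "event_timestamp",
) -> List[Row]:
    """Toy point-in-time join: for each entity row, attach feature_cols from the
    latest feature row with the same key and a timestamp <= the entity's."""
    # Resolve columns up front, as Spark's analyzer does before running a query.
    available = set().union(*(r.keys() for r in feature_rows))
    missing = [c for c in feature_cols if c not in available]
    if missing:
        # Stand-in for Spark's UNRESOLVED_COLUMN analysis error.
        raise KeyError(f"unresolved column(s): {missing}")

    joined = []
    for ent in entity_rows:
        candidates = [
            r for r in feature_rows
            if r[key] == ent[key] and r[ts_col] <= ent[ts_col]
        ]
        latest: Optional[Row] = max(candidates, key=lambda r: r[ts_col], default=None)
        out = dict(ent)
        for col in feature_cols:
            out[col] = None if latest is None else latest[col]
        joined.append(out)
    return joined


# Raw batch source: per-trip rows, no aggregated column yet (hypothetical schema).
raw = [
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1), "trip_distance": 3.0},
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 2), "trip_distance": 5.0},
]
entities = [{"driver_id": 1, "event_timestamp": datetime(2024, 1, 3)}]

# Reading the raw source directly fails: total_distance only exists after the BFV runs.
try:
    pit_join(entities, raw, "driver_id", ["total_distance"])
except KeyError as err:
    print("offline read failed:", err)

# Once the transformation output is the join input, the column resolves.
transformed = [
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 2), "total_distance": 8.0},
]
print(pit_join(entities, transformed, "driver_id", ["total_distance"]))
```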

Describe the solution you'd like
In SparkOfflineStore.get_historical_features(), before building the PIT join SQL, detect BatchFeatureView instances that define a UDF. For each one:

1. Read the raw source into a Spark DataFrame.
2. Invoke the BFV's udf() function, as SparkTransformationNode.execute() does during materialization.
3. Register the transformed DataFrame as a Spark temp view.
4. Replace the table_subquery in the FeatureViewQueryContext with the temp view name.
This keeps the entire pipeline in distributed Spark: raw read -> transformation -> PIT join -> training data. No code duplication is required.
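The steps in the proposal can be sketched as a pre-pass over the query contexts. This is a minimal, Spark-free sketch under stated assumptions: QueryContext, FeatureViewStub, InMemorySession, apply_compute_on_read, and the feast_bfv_<name> view-naming scheme are all hypothetical stand-ins, not Feast or PySpark API. In a real implementation, InMemorySession.table and register_temp_view would correspond to PySpark's SparkSession.table(name) and DataFrame.createOrReplaceTempView(name), and QueryContext to Feast's FeatureViewQueryContext.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class QueryContext:
    # Hypothetical stand-in for FeatureViewQueryContext (only the fields used here).
    name: str
    table_subquery: str


@dataclass
class FeatureViewStub:
    # Hypothetical stand-in for a FeatureView / BatchFeatureView.
    name: str
    source_table: str
    udf: Optional[Callable[[Any], Any]] = None  # set only for BFVs with a transformation


class InMemorySession:
    """Toy session exposing the two operations the proposal relies on."""

    def __init__(self, tables: Dict[str, Any]):
        self.tables = dict(tables)

    def table(self, name: str) -> Any:  # ~ SparkSession.table(name)
        return self.tables[name]

    def register_temp_view(self, name: str, df: Any) -> None:  # ~ df.createOrReplaceTempView(name)
        self.tables[name] = df


def apply_compute_on_read(
    session: InMemorySession,
    feature_views: List[FeatureViewStub],
    contexts: List[QueryContext],
) -> List[QueryContext]:
    """Return new contexts whose BFV sources point at transformed temp views."""
    by_name = {fv.name: fv for fv in feature_views}
    rewritten = []
    for ctx in contexts:
        fv = by_name.get(ctx.name)
        if fv is None or fv.udf is None:
            rewritten.append(ctx)  # plain FeatureView: read the source as-is
            continue
        raw = session.table(fv.source_table)                     # 1. read the raw source
        transformed = fv.udf(raw)                                # 2. apply the BFV's udf
        view_name = f"feast_bfv_{fv.name}"                       # hypothetical naming scheme
        session.register_temp_view(view_name, transformed)       # 3. register a temp view
        rewritten.append(replace(ctx, table_subquery=view_name))  # 4. swap the subquery
    return rewritten


def total_distance(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Toy aggregation: per driver, sum trip_distance and keep the latest
    # event_timestamp so the PIT join still has a timestamp column to use.
    out: Dict[int, Dict[str, Any]] = {}
    for r in rows:
        agg = out.setdefault(
            r["driver_id"],
            {"driver_id": r["driver_id"], "event_timestamp": r["event_timestamp"], "total_distance": 0.0},
        )
        agg["total_distance"] += r["trip_distance"]
        agg["event_timestamp"] = max(agg["event_timestamp"], r["event_timestamp"])
    return list(out.values())


session = InMemorySession({
    "trips": [
        {"driver_id": 1, "event_timestamp": 1, "trip_distance": 3.0},
        {"driver_id": 1, "event_timestamp": 2, "trip_distance": 5.0},
    ],
    "drivers": [{"driver_id": 1, "event_timestamp": 1, "rating": 4.9}],
})
fvs = [
    FeatureViewStub("driver_totals", "trips", udf=total_distance),
    FeatureViewStub("driver_profile", "drivers"),
]
ctxs = [QueryContext("driver_totals", "trips"), QueryContext("driver_profile", "drivers")]
new_ctxs = apply_compute_on_read(session, fvs, ctxs)
print([c.table_subquery for c in new_ctxs])
```

Returning new contexts with dataclasses.replace, rather than mutating them in place, keeps the sketch free of side effects on its inputs; whether Feast's own FeatureViewQueryContext should be copied or mutated is left open here. Note that the transformed output still carries the join key and timestamp columns, since the PIT join needs both.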

Describe alternatives you've considered

- Pre-compute features via an external Spark job and use a plain FeatureView (works, but duplicates logic)
- Set offline=True on BFVs and rely on the materialized offline parquet (requires running feast materialize before training, which adds operational complexity)
- Use on_demand_feature_view for transformations (doesn't support Spark-native aggregations such as groupBy)
