BFV Compute-on-Read for get_historical_features() in SparkOfflineStore · Issue #6345 · feast-dev/feast · GitHub

Description

Is your feature request related to a problem? Please describe.
When using @batch_feature_view with TransformationMode.PYTHON and the Spark offline store, calling store.get_historical_features() fails with UNRESOLVED_COLUMN errors. The point-in-time (PIT) join SQL template reads directly from the raw batch_source and expects the BFV's output columns (e.g., aggregated features) to already exist in that source. The BFV's Python transformation is never invoked during offline reads; it runs only during feast materialize.
This forces users to either:

  • Maintain a separate ETL pipeline that pre-computes the same features the BFV defines, or
  • Use a plain FeatureView pointing at pre-computed data, duplicating the transformation logic

Either way, this breaks the "define once, use everywhere" promise of the feature store.
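To make the failure mode concrete, here is a toy, pure-Python sketch of a point-in-time join. It is not Feast's actual SQL template; it only mimics the behavior described above: feature columns are resolved against the source schema before any rows are joined, so reading the raw source fails, while reading the transformed rows succeeds. The running_total transformation, the user_total_amount column, and the sample rows are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def pit_join(entity_rows: List[Row], source_rows: List[Row], join_key: str,
             ts_col: str, feature_cols: List[str]) -> List[Row]:
    """Toy point-in-time join: for each entity row, take the latest source row
    with the same key whose timestamp is <= the entity timestamp."""
    # Resolve feature columns against the source schema up front, the way a
    # SQL analyzer does, so a missing column fails even if no rows would match.
    source_cols = set(source_rows[0]) if source_rows else set()
    missing = [c for c in feature_cols if c not in source_cols]
    if missing:
        raise KeyError(f"unresolved column(s): {missing}")

    joined = []
    for e in entity_rows:
        candidates = [s for s in source_rows
                      if s[join_key] == e[join_key] and s[ts_col] <= e[ts_col]]
        latest: Optional[Row] = max(candidates, key=lambda s: s[ts_col], default=None)
        out = dict(e)
        for col in feature_cols:
            out[col] = latest[col] if latest is not None else None
        joined.append(out)
    return joined


def running_total(rows: List[Row]) -> List[Row]:
    """Hypothetical BFV-style transformation: cumulative spend per user."""
    totals: Dict[object, int] = {}
    out = []
    for r in sorted(rows, key=lambda r: r["event_timestamp"]):
        totals[r["user_id"]] = totals.get(r["user_id"], 0) + r["amount"]
        out.append({**r, "user_total_amount": totals[r["user_id"]]})
    return out


raw = [
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 1), "amount": 10},
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 3), "amount": 5},
    {"user_id": 2, "event_timestamp": datetime(2024, 1, 2), "amount": 7},
]
entities = [
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 2)},
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 4)},
    {"user_id": 2, "event_timestamp": datetime(2024, 1, 1)},
]

try:
    # Reading the raw source: the aggregated column does not exist yet.
    pit_join(entities, raw, "user_id", "event_timestamp", ["user_total_amount"])
except KeyError as err:
    print("raw source failed:", err)

# Reading the transformed rows: the join resolves and respects event time.
features = pit_join(entities, running_total(raw), "user_id", "event_timestamp",
                    ["user_total_amount"])
print([r["user_total_amount"] for r in features])  # [10, 15, None]
```

The third entity row gets None because user 2's only event happens after the requested timestamp, which is exactly the leakage a PIT join is meant to prevent.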

Describe the solution you'd like
In SparkOfflineStore.get_historical_features(), before building the PIT join SQL, detect BatchFeatureView instances with a UDF. For each:

  1. Read the raw source into a Spark DataFrame
  2. Invoke the BFV's udf() function (the same call SparkTransformationNode.execute() makes during materialization)
  3. Register the transformed DataFrame as a Spark temp view
  4. Replace the table_subquery in the FeatureViewQueryContext with the temp view name

The entire pipeline then runs as distributed Spark (raw read -> transformation -> PIT join -> training data), with no duplicated transformation code.
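The four steps above can be sketched as a small rewrite pass over the query contexts. This is a hedged, self-contained sketch, not Feast code: QueryContext and ViewSpec are hypothetical simplified stand-ins (Feast's real FeatureViewQueryContext has many more fields), the temp-view naming scheme is invented, and Spark is abstracted behind two injected callables so the logic can run without a cluster. Against a real table-backed source, one could plausibly pass read_source=spark.table and register_temp_view=lambda df, name: df.createOrReplaceTempView(name).

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Optional


@dataclass
class QueryContext:
    """Hypothetical stand-in for FeatureViewQueryContext (only the fields used here)."""
    name: str
    table_subquery: str


@dataclass
class ViewSpec:
    """Hypothetical stand-in for a feature view; udf is None for a plain FeatureView."""
    name: str
    source_table: str
    udf: Optional[Callable[[Any], Any]] = None


def apply_compute_on_read(
    views: List[ViewSpec],
    contexts: List[QueryContext],
    read_source: Callable[[str], Any],
    register_temp_view: Callable[[Any, str], None],
) -> List[QueryContext]:
    """Return new contexts whose BFV entries point at transformed temp views."""
    by_name = {v.name: v for v in views}
    rewritten = []
    for ctx in contexts:
        view = by_name.get(ctx.name)
        if view is None or view.udf is None:
            rewritten.append(ctx)  # plain view: keep reading the raw source
            continue
        df = read_source(view.source_table)         # 1. read the raw source
        transformed = view.udf(df)                  # 2. apply the BFV's udf
        temp_view = f"feast_bfv_{view.name}"        # hypothetical naming scheme
        register_temp_view(transformed, temp_view)  # 3. register as temp view
        # 4. point the PIT join at the temp view; other fields carry over
        rewritten.append(replace(ctx, table_subquery=temp_view))
    return rewritten


# Usage with in-memory fakes standing in for Spark.
catalog = {"raw_txns": [1, 2, 3], "profiles": ["a", "b"]}
registered = {}
views = [
    ViewSpec("user_stats", "raw_txns", udf=lambda rows: [r * 10 for r in rows]),
    ViewSpec("user_profile", "profiles"),
]
contexts = [QueryContext("user_stats", "raw_txns"),
            QueryContext("user_profile", "profiles")]
out = apply_compute_on_read(
    views, contexts,
    read_source=catalog.__getitem__,
    register_temp_view=lambda df, name: registered.__setitem__(name, df),
)
print([c.table_subquery for c in out])  # ['feast_bfv_user_stats', 'profiles']
```

Returning new contexts via dataclasses.replace, rather than mutating the originals, keeps the rewrite local to one get_historical_features() call. Spark temp views are session-scoped, so in a real implementation the view names would also need to be unique per request if several queries share a SparkSession.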

Describe alternatives you've considered

  • Pre-compute features via an external Spark job and use a plain FeatureView (works, but duplicates logic)
  • Set offline=True on BFVs and rely on the materialized offline Parquet output (requires running feast materialize before training, which adds operational complexity)
  • Use on_demand_feature_view for transformations (does not support Spark-native aggregations such as groupBy)
