feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities#4530

franciscojavierarceo

What this PR does / why we need it:

This PR addresses some gaps in the FeatureViewProjection, BaseFeatureView, and OnDemandFeatureView classes. In particular, it allows the data source to be included in the BaseFeatureView, and all data sources to be included in a FeatureViewProjection. I assume that a single BaseFeatureView can have only one data source, which is a reasonable assumption.

This structure lets users express an explicit dependency graph in which each BaseFeatureView is ultimately tied to a single, foundational data source, which I believe is the right pattern.

Along with storing the data sources, we also store the relevant batch_source data in the projection.

More specifically, this PR updates the following:

FeatureViewProjection to include the underlying data sources for OnDemandFeatureView. This allows for much richer lineage when constructing metadata about an OnDemandFeatureView and its data sources.
OnDemandFeatureView to optionally support entities, which will be required to write an OnDemandFeatureView to the online store.
BaseFeatureView to optionally include a batch_source.
OnDemandFeatureView to include write_to_online_store, a boolean to be used in a follow-up PR (not used in this PR).
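The four changes above can be pictured with a stripped-down model. This is a minimal sketch, not Feast's actual classes: `DataSource`, the constructor signatures, `projection()`, and `upstream_batch_sources()` are simplified hypothetical stand-ins, and only the field names `batch_source`, `entities`, and `write_to_online_store` come from the PR description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSource:
    # Hypothetical stand-in for a batch data source.
    name: str


@dataclass
class FeatureViewProjection:
    name: str
    features: List[str]
    # In this model the projection also records the projected view's batch source.
    batch_source: Optional[DataSource] = None


@dataclass
class BaseFeatureView:
    name: str
    features: List[str]
    # Optional, and at most one foundational batch source per view.
    batch_source: Optional[DataSource] = None

    def projection(self) -> FeatureViewProjection:
        # Copy the batch source into the projection so lineage travels with it.
        return FeatureViewProjection(self.name, list(self.features), self.batch_source)


@dataclass
class OnDemandFeatureView(BaseFeatureView):
    sources: List[FeatureViewProjection] = field(default_factory=list)
    # Optional; only required when writing the ODFV to the online store.
    entities: Optional[List[str]] = None
    # Recorded but not acted on yet (reserved for a follow-up change).
    write_to_online_store: bool = False

    def upstream_batch_sources(self) -> List[str]:
        # Lineage: batch sources behind every source projection that has one.
        return [p.batch_source.name for p in self.sources if p.batch_source is not None]


driver_stats = BaseFeatureView("driver_stats", ["conv_rate"], DataSource("driver_parquet"))
odfv = OnDemandFeatureView(
    "transformed_conv_rate",
    ["conv_rate_plus_1"],
    sources=[driver_stats.projection()],
    entities=["driver"],
)
print(odfv.upstream_batch_sources())  # ['driver_parquet']
```

The view names, feature names, and the `driver` entity are illustrative. Because every new field has a default, existing definitions that never pass them keep working, which matches the "optional" framing in the description.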

Which issue(s) this PR fixes:

This is a step along the path of solving #4376

Misc

N/A

This is admittedly a hack, and I don't like it, but it should be refactored in 1.0.0.

My overall takeaway is that all FeatureViews should really derive from the same base object, sharing the same class parameters and making them optional.

This should be described more thoroughly for 1.0.0

tokoko

A few questions from me:

I make the assumption that a single BaseFeatureView can only have one datasource, which is a reasonable assumption.

I'm not sure this will actually hold true. BatchFeatureViews (or whatever we end up calling them), for example, will need to rely on multiple sources.

FeatureViewProjection to include the underlying data sources for OnDemandFeatureView. This allows for much richer lineage when constructing metadata about an OnDemandFeatureView and its data sources.

Can you elaborate on why we need this? My understanding is that FeatureViewProjection objects hold additional info (request-specific modifications) about a specific FeatureView. What's the point of copying some of those fields over here when they can already be accessed from the FeatureView objects anyway? It feels like unnecessary duplication to me.

OnDemandFeatureView to optionally support entities, which will be required to write an OnDemandFeatureView to the online store

I sort of understand the rationale behind this, but wouldn't it be better for us to try to keep OnDemandFeatureViews entity-agnostic? I have concerns similar to the FeatureViewProjection ones above. An odfv already depends on feature views and, as a result, on their entities. Relisting them here feels redundant and would break the single source of truth. What if a user applies an odfv whose entities do not match the entities of the relevant feature views?

My idea for odfv caching was not to make the outputs retrievable by an arbitrary set of entities, but rather by some sort of key that holds information about all of the input fields.
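One way to read the key-based caching idea: derive the cache key from every input field value, so identical inputs hit the same cached output no matter which entity produced them. This is a minimal sketch; `input_key`, `CachedTransform`, and the SHA-256-over-canonical-JSON encoding are illustrative choices, not anything that exists in Feast.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def input_key(inputs: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so field order does not change the key.
    payload = json.dumps(inputs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedTransform:
    """Wraps an ODFV-style transformation with a cache keyed on all of its inputs."""

    def __init__(self, fn: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self.fn = fn
        self.cache: Dict[str, Dict[str, Any]] = {}
        self.misses = 0

    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        key = input_key(inputs)
        if key not in self.cache:
            # Only compute when this exact combination of inputs is new.
            self.misses += 1
            self.cache[key] = self.fn(inputs)
        # Return a copy so callers cannot mutate the cached entry.
        return dict(self.cache[key])


# Illustrative field names; any transformation over a dict of inputs works.
transform = CachedTransform(
    lambda row: {"conv_rate_plus_val": row["conv_rate"] + row["val_to_add"]}
)
```

The trade-off versus entity keys: a content-derived key needs no entity declaration at all, but lookups require the caller to already have every input value in hand.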

franciscojavierarceo

ODFVs can still behave the old way today by simply not using the forthcoming optional parameter that configures writes. The benefit of storing this data here for batch is backfilling, plus being able to see the full lineage of the data sources used to compute the ODFV.

Put simply:

ODFV = f(Data source 1, data source 2, feature view 3)

In practice, combinations like this can occur, and this design gives us that flexibility along with the lineage. The write option essentially acts as a precomputation step for faster retrieval.
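The ODFV = f(data source 1, data source 2, feature view 3) framing and the write option as precomputation can be sketched together: the view records its inputs for lineage, and a `write_to_online_store` flag decides whether f runs once at write time or on every read. Everything here (`OdfvSketch`, the dict-backed online store, the input names) is a hypothetical simplification, not Feast's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class OdfvSketch:
    name: str
    inputs: List[str]  # lineage: names of the data sources / feature views feeding f
    fn: Callable[[Dict[str, Any]], Any]
    write_to_online_store: bool = False
    online: Dict[Tuple[str, Any], Any] = field(default_factory=dict)
    calls: int = 0  # how many times f actually ran

    def _compute(self, row: Dict[str, Any]) -> Any:
        self.calls += 1
        return self.fn({k: row[k] for k in self.inputs})

    def on_write(self, entity_key: Any, row: Dict[str, Any]) -> None:
        # Precompute path: run f once when new input data arrives.
        if self.write_to_online_store:
            self.online[(self.name, entity_key)] = self._compute(row)

    def on_read(self, entity_key: Any, row: Dict[str, Any]) -> Any:
        # Use the stored value if it was precomputed, else compute on demand.
        key = (self.name, entity_key)
        if key in self.online:
            return self.online[key]
        return self._compute(row)
```

With the flag on, f runs once per write and reads are plain lookups; with it off, behavior matches today's ODFVs and f runs on every read.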

franciscojavierarceo

I sort of understand the rationale behind this, but wouldn't it be better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result.

In order to write an ODFV that can be stored in the online store for retrieval, you need an entity.

This is also important for backfilling.
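To see why an entity is required: an online store is a key-value lookup, and the key is built from entity join-key values. Without entities there is nothing to key the precomputed ODFV output on, and a backfill is just replaying historical rows through that same write path. This is a minimal sketch; `online_key`, `backfill`, and the tuple key layout are hypothetical and are not Feast's actual key serialization.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def online_key(entities: Optional[List[str]], row: Dict[str, Any]) -> Tuple:
    # The online store is keyed by entity join-key values: no entities, no key.
    if not entities:
        raise ValueError("an ODFV needs entities to be written to the online store")
    return tuple((e, row[e]) for e in sorted(entities))


def backfill(
    entities: List[str],
    rows: Iterable[Dict[str, Any]],
    fn: Callable[[Dict[str, Any]], Any],
    store: Dict[Tuple, Any],
) -> int:
    # Replay historical rows through the transformation and materialize results.
    # Rows are assumed to be in event-time order, so later rows overwrite earlier ones.
    written = 0
    for row in rows:
        store[online_key(entities, row)] = fn(row)
        written += 1
    return written
```

The `driver_id` join key used when exercising this is illustrative.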

Relisting them here feels redundant and would break a single version of truth. What if a user applies an odfv whose entities do not match the entities of relevant feature views?

It's not redundant, as things can change at different points in time.

HaoXuAI

Instead of directly adding a DataSource to BaseFeatureView, is it possible to change OnDemandFeatureView to inherit from FeatureView, which already has the source attribute?
I'm not saying this is a better idea, since all concrete feature view types have the source attribute; I'm only raising it in case updating BaseFeatureView causes breaking changes.

franciscojavierarceo

It won't be a breaking change, as it's optional. I do think we should revisit all of these for the 1.0.0 release and make StreamFeatureView, FeatureView, and OnDemandFeatureView all inherit from BaseFeatureView with a fixed set of parameters, which would make things much cleaner. There is a lot of unnecessary duplicate code across them, but I'd rather do that full cleanup once this functionality is done. Wdyt?
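The proposed 1.0.0 cleanup, with every feature view type inheriting one fixed, optional set of parameters from BaseFeatureView, might look roughly like this. It is a hypothetical sketch only: the shared parameter list is assembled from fields mentioned in this thread, the per-class extras (`aggregations`, `transformation`) are illustrative, and nothing here describes what Feast 1.0.0 actually does.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class BaseFeatureView:
    # One shared, fixed parameter set; everything beyond the name is optional.
    name: str
    features: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    source: Optional[str] = None
    write_to_online_store: bool = False


@dataclass
class FeatureView(BaseFeatureView):
    pass  # nothing beyond the shared parameters


@dataclass
class StreamFeatureView(BaseFeatureView):
    # Illustrative stream-specific extra.
    aggregations: List[str] = field(default_factory=list)


@dataclass
class OnDemandFeatureView(BaseFeatureView):
    # Illustrative: the only extra piece an ODFV adds is its transformation.
    transformation: Optional[Callable[[Any], Any]] = None


def lineage(view: BaseFeatureView) -> str:
    # Shared fields mean one code path can handle every view type.
    return f"{type(view).__name__}:{view.name} <- {view.source or 'n/a'}"
```

The design point is that helpers like `lineage` stop needing per-class branches, which is where much of the duplicated code mentioned above tends to live.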

HaoXuAI

LGTM!

tokoko

Sorry, couldn't follow up sooner.

Not redundant as things can change at different points in time.

I think that's precisely my point: we are allowing essentially the same concepts to be defined twice, in unrelated places that can easily diverge from one another.

For example, after this PR, what's the difference between the entities field of an odfv and the list of entities retrieved from the underlying feature views:

    odfv = store.get_on_demand_feature_view('transformed_conv_rate')
    entities_1 = odfv.entities
    entities_2 = [store.registry.get_any_feature_view(fvp.name, store.project).entities for fvp in odfv.source_feature_view_projections.values()]

Is there any scenario where it makes sense for entities_1 and entities_2 not to be exactly the same? If not, why do we need them defined twice?

P.S. The same applies to FeatureViewProjection: those fields can easily be retrieved from the underlying FeatureView, imho.
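If both the declared `entities` and the source feature views are kept, one way to guard against the divergence raised here is to check them when the odfv is applied. This is a minimal sketch over plain dicts; `derived_entities` and `check_odfv_entities` are hypothetical helpers, not Feast functions. It encodes the position that the two should never differ; under the view that they may legitimately diverge over time, the strict equality check would need relaxing.

```python
from typing import Dict, Iterable, List, Optional, Set


def derived_entities(source_views: Iterable[str], fv_entities: Dict[str, List[str]]) -> Set[str]:
    # Union of the entities of every source feature view (the entities_2 idea).
    return {e for fv in source_views for e in fv_entities[fv]}


def check_odfv_entities(
    declared: Optional[List[str]],
    source_views: Iterable[str],
    fv_entities: Dict[str, List[str]],
) -> List[str]:
    # Return the entities to use, refusing declarations that disagree with the sources.
    derived = derived_entities(source_views, fv_entities)
    if declared is None:
        # Nothing declared: fall back to what the sources imply.
        return sorted(derived)
    if set(declared) != derived:
        raise ValueError(f"declared entities {sorted(declared)} != derived {sorted(derived)}")
    return sorted(declared)
```

The feature view and entity names used to exercise it are illustrative.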

franciscojavierarceo commented Sep 18, 2024


franciscojavierarceo added the ok-to-test label Sep 18, 2024

franciscojavierarceo changed the title feat: Updating protos for Projections to include more info feat: Updating OnDemandFeatureView to add Entities and batch_source Sep 18, 2024

franciscojavierarceo mentioned this pull request Sep 21, 2024

chore: Adding unit test to test feature view dummy entity serialization after apply() #4553

Merged

franciscojavierarceo changed the title feat: Updating OnDemandFeatureView to add Entities and batch_source feat: Updating FeatureViewProjection and OnDemandFeatureView to add batch_source and entities Sep 21, 2024

franciscojavierarceo added 16 commits September 21, 2024 11:31

feat: Updating protos for Projections to include more info

b791284

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

adding unit test

54ca376

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

adding type checking where batch source is already serialized into protobuf

cf613bd

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

almost got everything working and type validation behaving

44066d2

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

cleaned up and have tests behaving

2837c1c

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

removed comment

52d4253

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

updated FeatureViewProjection batch_source serialization

34f99b9

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

trying to debug a test

fcf2917

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

handling snowflake issue, cant confirm why it is happening so just going to put a workaround

3c7812b

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

linter

6c9a21f

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

trying to handle it correctly

1103af7

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

handling the else case for from_feature_view_definition

987c690

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

adding print

c5a7ea3

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

adding test of issue

0d489a9

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

think i got everything working now

a85854e

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

removing print

cf86862

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

franciscojavierarceo force-pushed the fv-projections branch from f50923f to cf86862 September 21, 2024 17:56

franciscojavierarceo marked this pull request as ready for review September 21, 2024 18:23

franciscojavierarceo requested review from a team, HaoXuAI, adchia, felixwang9817, shuchu and tokoko and removed request for a team September 21, 2024 18:23

HaoXuAI approved these changes Sep 23, 2024


HaoXuAI merged commit 0795496 into master Sep 23, 2024

nanohanno mentioned this pull request Dec 17, 2024

TypeError regression for feast plan with an existing registry #4816

Closed

+                              name_alias=None,
+                              features=base_feature_view.features,
+                              desired_features=[],
+                              timestamp_field=base_feature_view.batch_source.created_timestamp_column  # type:ignore[attr-defined]
