Sorry, something went wrong.
| name_alias=None, | ||
| features=base_feature_view.features, | ||
| desired_features=[], | ||
| timestamp_field=base_feature_view.batch_source.created_timestamp_column # type:ignore[attr-defined] |
There was a problem hiding this comment.
This is admittedly a hack and I don't like it but this should be refactored in 1.0.0
My overall learning from this is that FeatureViews should be really derived from the same object with the same class parameters and make them optionally instantiated.
This should be described more thoroughly for 1.0.0
Sorry, something went wrong.
|
A few questions from me: I make the assumption that a single BaseFeatureView can only have one datasource, which is a reasonable assumption. I'm not sure this will hold true actually... BatchFeatureViews or whatever we will call them will need to rely on multiple sources for example. FeatureViewProjection to include the underlying data sources for OnDemandFeatureView. This allows for much richer lineage when constructing metadata about an OnDemandFeatureViews and its data sources. Can you elaborate on why we need this? My understanding is that FeatureViewProjection objects hold additional info (request-specific modifications) about a specific FeatureView, what's the point of copying some of those field over here when they can already be accessed from FeatureView objects anyway. Feels like unnecessary duplication to me. OnDemandFeatureView to optionally support entities, which will be required to write an OnDemandFeatureView to the online store I sort of understand the rationale behind this, but wouldn't it better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result. Relisting them here feels redundant and would break a single of version of truth. What is a user applies an odfv whose entities do not match the entities of relevant feature views? My idea about a odfv caching was not to make them retrievable by any set of entities, but rather by some sort of key that would hold information about all the input fields. |
Sorry, something went wrong.
|
ODFVs today can behave the old way by not using the optional forthcoming optional parameter that configures writes. The benefit of storing the data here for batch will be the backfilling and seeing the full lineage of the data sources that's used to compute the ODFV. In a simple way: ODFV = f(Data source 1, data source 2, feature view 3) In practice, combinations like this may occur and we get this flexibility and lineage. Having the write option basically acts as a precomputation option for faster retrieval. |
Sorry, something went wrong.
|
I sort of understand the rationale behind this, but wouldn't it better for us to try to keep OnDemandFeatureViews entity-agnostic? I guess I have concerns similar to FeatureViewProjection ones above. odfv already has dependency on feature views and their entities as a result. In order to write an ODFV that can be stored in the online store for retrieval, you need an entity. This is also important for backfilling. Relisting them here feels redundant and would break a single of version of truth. What if a user applies an odfv whose entities do not match the entities of relevant feature views? Not redundant as things can change at different points in time. |
Sorry, something went wrong.
|
Instead of directly updating the BaseFeatureView with DataSource, is it possible to change OndemandFeatureView in inherit from FeatureView which has the source attribute already? |
Sorry, something went wrong.
|
It won't be a breaking change as it's optional. I do think we should revisit all of these for free 1.0.0 release and make StreamFeatureView, FeatureView, and OnDemandFeatureView all inherit from BaseFeatureView with a fixed set of parameters and make things much cleaner. They have lots of duplicate code all over the place and it's very unnecessary...but I'd rather do that full cleanup once we have this functionality done. Wdyt? |
Sorry, something went wrong.
There was a problem hiding this comment.
LGTM!
Sorry, something went wrong.
|
Sorry, couldn't follow up sooner. Not redundant as things can change at different points in time. I think that's precisely my point. we are allowing essentially same concepts to be defined twice in unrelated places that can easily diverge from one another. For example, after this PR what's the difference between entities field of an odfvs vs retrieving the list of entities from underlying feature views: odfv = store.get_on_demand_feature_view('transformed_conv_rate')
entities_1 = odfv.entities
entities_2 = [store.registry.get_any_feature_view(fvp.name).entities for fvp in odfv.source_feature_view_projections]
Is there any scenario where it makes sense for entities_1 and entities_2 not to be exactly the same? If not, why do we need them defined twice? P.S. the same applied to the FeatureViewProjection. Those fields can easily be retrieved from the underlying FeatureView imho. |
Sorry, something went wrong.
What this PR does / why we need it:
This PR addresses some gaps in the FeatureViewProjection, BaseFeatureView, and OnDemandFeatureview classes. In particular, this PR aims to allow the data source to be included in the BaseFeatureView and all data sources in a FeatureViewProjection. I make the assumption that a single BaseFeatureView can only have one datasource, which is a reasonable assumption.
This structure allows users to express an explicit dependency graph with the BaseFeatureView being ultimately tied to one single, foundational data source which I believe is the right pattern.
Along with storing the datasources, we also store relevant batch_source data in the Projection.
More specificially, this PR update the following:
Which issue(s) this PR fixes:
This is a step along the path of solving #4376
Misc
N/A