There was a problem hiding this comment.
Looking good initially, have some doubts.
Also needs to add tests.
Sorry, something went wrong.
| return pa.Table.from_pandas(df).schema | ||
|
|
||
|
|
||
| def _compute_non_entity_dates_ray( |
There was a problem hiding this comment.
I think we should have make a common utility function for this, so that it can be used in all stores without repeating the code.
wdyt ?
Sorry, something went wrong.
| return _filter_range | ||
|
|
||
|
|
||
| def _make_select_distinct_keys(join_keys: List[str]): |
There was a problem hiding this comment.
I think we should not drop rows with duplicate IDs, because there could be multiple transactions per ID and we need to choose the row based on timestamp while joining the colums from another table/view. I think this is the same case with your spark PR.
Please check the postgres implementation to understand the case.
Or Am I misreading this ?
Sorry, something went wrong.
There was a problem hiding this comment.
Testing the case after discussion
Sorry, something went wrong.
There was a problem hiding this comment.
Previously, when entity_df=None was passed to get_historical_features(), the Ray offline store would extract only distinct entity keys and assign a single fixed timestamp (end_date) to all entities. This broke point-in-time joins for cases where multiple transactions exist per entity ID in date-time range.
Now extracts distinct (entity_keys, event_timestamp) combinations, aligning with Postgres based offline store's behaviour.
Sorry, something went wrong.
What this PR does / why we need it:
Add support for entity_df=None in RayOfflineStore.get_historical_features with start_date/end_date.
-- Derives entity set by reading distinct join keys from each FeatureView source within the time window, applies field mappings and join_key_map, filters by timestamp, and unions aligned schemas.
-- Adds stable event_timestamp = end_date for PIT joins.
Signature change: get_historical_features accepts entity_df: Optional[Union[pd.DataFrame, str]] and **kwargs.
-- Why: Match base interface and support date-only retrieval.
Which issue(s) this PR fixes:
RHOAIENG-38643
Misc