Sorry, something went wrong.
There was a problem hiding this comment.
Adds support for on-demand feature sources, enforces cycle detection in FeatureView (de)serialization, standardizes the topological sort API, and refreshes related documentation.
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file| sdk/python/feast/on_demand_feature_view.py | Add OnDemandSourceType alias and clean up sources hints |
| sdk/python/feast/infra/compute_engines/feature_resolver.py | Rename resolver method from topo_sort to topological_sort |
| sdk/python/feast/infra/compute_engines/feature_builder.py | Update builder calls to topological_sort |
| sdk/python/feast/infra/compute_engines/algorithms/topo.py | Rename functions topo_sort[_multiple] to topological_sort[_multiple] |
| sdk/python/feast/infra/compute_engines/dag/README.md | Add high-level DAG documentation |
| sdk/python/feast/feature_view.py | Implement cycle detection in (de)serialization, update copy/eq |
| docs/reference/compute-engine/README.md | Refresh compute-engine table and links |
| docs/getting-started/concepts/batch-feature-view.md | New guide for BatchFeatureView |
docs/reference/compute-engine/README.md:22
Sorry, something went wrong.
| for feature in on_demand_feature_view_proto.spec.features | ||
| ], | ||
| sources=sources, | ||
| sources=cast(List[OnDemandSourceType], sources), |
There was a problem hiding this comment.
Somehow this was breaking mypy, so put a fix on this.
Sorry, something went wrong.
| ## ✅ Key Capabilities | ||
|
|
||
| - **Composable DAG of FeatureViews**: Supports defining a `BatchFeatureView` on top of one or more other `FeatureView`s. | ||
| - **Transformations**: Apply PySpark-based transformation logic (`feature_transformation` or `udf`) to raw data source, can also be used to deal with multiple data sources. |
There was a problem hiding this comment.
is this exclusively limited to PySpark?
Sorry, something went wrong.
There was a problem hiding this comment.
not exclusively to Pyspark, will update it
Sorry, something went wrong.
| *, | ||
| name: str, | ||
| source: Union[DataSource, FeatureView, List[FeatureView]], | ||
| sink_source: Optional[DataSource] = None, |
There was a problem hiding this comment.
for some reason i thought we agreed on naming it sink.
Sorry, something went wrong.
There was a problem hiding this comment.
Yeah, on a second thought, I think sink_source is more explicit for the user to know it is passing a data source to this config.
Sorry, something went wrong.
| Field(name="conv_rate", dtype=Float32), | ||
| ], | ||
| aggregations=[ | ||
| Aggregation(column="conv_rate", function="sum", time_window=timedelta(days=1)), |
There was a problem hiding this comment.
nit: maybe we should make it avg since summing a conversion rate is weird
Sorry, something went wrong.
| ## Feature resolver and builder | ||
| The `FeatureBuilder` initialize a `FeatureResolver` that extracts a DAG from the `FeatureView` definitions, resolving dependencies and ensuring correct execution order. \ | ||
| The FeatureView represents a logical data source, while DataSource represents the physical data source (e.g., BigQuery, Spark, etc.). \ | ||
| When defines the FeatureView, the source can be a physical DataSource, a derived FeatureView, or a list of FeatureViews. |
There was a problem hiding this comment.
missing a word here. ?
Sorry, something went wrong.
There was a problem hiding this comment.
lgtm, had some small suggestions that would be great to incorporate though. thanks for the docs!!!
Sorry, something went wrong.
What this PR does / why we need it:
Follow up on PR: #5482.
Which issue(s) this PR fixes:
Misc