If you haven't already, check out the quickstart guide on Feast's website (http://docs.feast.dev/quickstart), which
uses this repo. A quick view of what's in this repository's feature_repo/ directory:
- data/ contains raw demo parquet data
- feature_repo/example_repo.py contains demo feature definitions
- feature_repo/feature_store.yaml contains a demo setup configuring where data sources are
- feature_repo/test_workflow.py showcases how to run all key Feast commands, including defining, retrieving, and pushing features.
You can run the overall workflow with python test_workflow.py.
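As a rough illustration of the "retrieving" step mentioned above, here is a minimal sketch of fetching online features from Python. It assumes the demo definitions expose a feature view named driver_hourly_stats with conv_rate and acc_rate features keyed by driver_id, and that feast apply and a materialization run have already populated the online store. Those names and the two helper functions are assumptions for illustration, not something this README states, so check them against example_repo.py.

```python
def feature_refs(view_name, feature_names):
    """Build 'view:feature' reference strings as accepted by get_online_features."""
    return [f"{view_name}:{name}" for name in feature_names]


def get_driver_features(driver_ids, repo_path="."):
    """Fetch online features for the given driver IDs (hypothetical helper).

    Assumes repo_path points at the directory containing feature_store.yaml
    and that the assumed feature view and feature names exist there.
    """
    # Imported lazily so the helper above can be used without Feast installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    response = store.get_online_features(
        features=feature_refs("driver_hourly_stats", ["conv_rate", "acc_rate"]),
        entity_rows=[{"driver_id": driver_id} for driver_id in driver_ids],
    )
    # to_dict() returns a mapping of column name to a list of values.
    return response.to_dict()
```

Calling get_driver_features([1001, 1002]) from inside the feature repo directory should return a dictionary keyed by driver_id and the requested feature names, provided the assumed definitions match the ones in this repo.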
To move from this to a more production-ready workflow:
See more details in Running Feast in production
- First, start with a different Feast template that delegates to a more scalable offline store.
- For example, run feast init -t gcp, feast init -t aws, or feast init -t snowflake.
- Run feast init --help to see all available options.
- feature_store.yaml points to a local file as a registry. You'll want to set up a remote file (e.g. in S3/GCS) or a
SQL registry. See registry docs for more details.
- This example uses a file offline store to generate training data, which does not scale. We recommend using a data
warehouse such as BigQuery, Snowflake, or Redshift instead. There is experimental support for Spark as well.
- Set up CI/CD and separate dev, staging, and prod environments to automatically update the registry as you change Feast feature definitions. See docs.
- (optional) Schedule regular materialization (e.g. via Airflow) to power low-latency feature retrieval. See Batch data ingestion
for more details.
- (optional) Deploy feature server instances with feast serve to expose endpoints to retrieve online features.
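
For the last step, a client can query a running feature server over HTTP. The sketch below uses only the Python standard library and assumes feast serve is running locally on port 6566 with a get-online-features endpoint. The URL, the helper names, and the feature and entity names in the usage note are assumptions for illustration, so adjust them to your deployment.

```python
import json
import urllib.request

# Assumption: `feast serve` is running locally on port 6566.
FEATURE_SERVER_URL = "http://localhost:6566/get-online-features"


def build_request(features, entities, url=FEATURE_SERVER_URL):
    """Build (but do not send) a POST request for the feature server."""
    payload = {"features": list(features), "entities": entities}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_online_features(features, entities, url=FEATURE_SERVER_URL, timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(features, entities, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_online_features(["driver_hourly_stats:conv_rate"], {"driver_id": [1001, 1002]}) would request one feature for two drivers, assuming those names exist in your feature definitions. Keeping request construction separate from sending makes the payload easy to inspect without a server running.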