| @@ -0,0 +1,19 @@ | ||
| # Project | ||
|
|
||
| Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment (`dev`, `staging`, `prod`). | ||
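The namespacing idea above can be pictured with a minimal sketch. The helper `namespaced_table_name` is hypothetical and exists only for illustration; it is not Feast's internal implementation, and the exact naming scheme a given store uses is an assumption here.

```python
def namespaced_table_name(project: str, feature_view: str) -> str:
    """Return a table name prefixed with the owning project (illustrative only)."""
    return f"{project}_{feature_view}"


# The same feature view name in two projects maps to two distinct tables,
# so the projects never read or overwrite each other's data.
dev_table = namespaced_table_name("dev", "driver_hourly_stats")
prod_table = namespaced_table_name("prod", "driver_hourly_stats")
print(dev_table)
print(prod_table)
```

Because the project name is part of every resource name, one request can only ever address the tables of a single project.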
|
|
||
| .png>) | ||
|
|
||
| Users define one or more [feature views](feature-view.md) within a project. Each feature view contains one or more [features](feature-view.md#field). These features typically relate to one or more [entities](entity.md). A feature view must always have a [data source](data-ingestion.md), which in turn is used during the generation of training [datasets](feature-retrieval.md#dataset) and when materializing feature values into the online store. | ||
|
|
||
| The concept of a "project" provides the following benefits: | ||
|
|
||
| **Logical Grouping**: Projects group related features together, making it easier to manage and track them. | ||
|
|
||
| **Feature Definitions**: Within a project, you can define features, including their metadata, types, and sources. This helps standardize how features are created and consumed. | ||
|
|
||
| **Isolation**: Projects provide a way to isolate different environments, such as development, testing, and production, ensuring that changes in one project do not affect others. | ||
|
|
||
| **Collaboration**: By organizing features within projects, teams can collaborate more effectively, with clear boundaries around the features they are responsible for. | ||
|
|
||
| **Access Control**: Projects can implement permissions, allowing different users or teams to access only the features relevant to their work. |
| @@ -1,6 +1,29 @@ | ||
| # Quickstart | ||
|
|
||
| In this tutorial we will | ||
| ## What is Feast? | ||
|
|
||
| Feast (Feature Store) is an open-source feature store designed to facilitate the management and serving of machine learning features in a way that supports both batch and real-time applications. | ||
|
Comment thread
tmihalac marked this conversation as resolved.
|
||
|
|
||
| * *For Data Scientists*: Feast is a tool where you can easily define, store, and retrieve your features for both model development and model deployment. By using Feast, you can focus on what you do best: build features that power your AI/ML models and maximize the value of your data. | ||
|
|
||
| * *For MLOps Engineers*: Feast is a library that connects your existing infrastructure (e.g., online database, application server, microservice, analytical database, and orchestration tooling) so that your Data Scientists can ship features for their models to production using a friendly SDK, without having to deal with the software engineering challenges of serving real-time production systems. By using Feast, you can focus on maintaining a resilient system instead of implementing features for Data Scientists. | ||
|
|
||
| * *For Data Engineers*: Feast provides a centralized catalog for storing feature definitions, giving you a single source of truth for feature data. It provides an abstraction for reading and writing to many different types of offline and online data stores. Using either the provided Python SDK or the feature server service, users can write data to the online and/or offline stores and then read it back, either in low-latency online scenarios for model inference or in batch scenarios for model training. | ||
|
|
||
| For more information, refer to the [Introduction to Feast](../README.md). | ||
|
|
||
| ## Prerequisites | ||
| * Ensure that you have Python (3.9 or above) installed. | ||
| * It is recommended to create and work in a virtual environment: | ||
| ```sh | ||
| # create & activate a virtual environment | ||
| python -m venv venv/ | ||
| source venv/bin/activate | ||
| ``` | ||
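As a quick sanity check of the prerequisites above, the following sketch uses only the Python standard library. The helper names `meets_minimum` and `in_virtualenv` are hypothetical; the virtual environment test assumes the environment was created with `python -m venv`.

```python
import sys

MINIMUM_PYTHON = (3, 9)  # minimum version stated in the prerequisites


def meets_minimum(version_info=sys.version_info) -> bool:
    """Return True when the interpreter version is at least MINIMUM_PYTHON."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`."""
    return sys.prefix != sys.base_prefix


print("Python version OK:", meets_minimum())
print("Virtual environment active:", in_virtualenv())
```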
|
|
||
| ## Overview | ||
|
|
||
| In this tutorial we will: | ||
|
|
||
| 1. Deploy a local feature store with a **Parquet file offline store** and **SQLite online store**. | ||
| 2. Build a training dataset using our time series features from our **Parquet files**. | ||
| @@ -9,7 +32,9 @@ In this tutorial we will | ||
| 5. Read the latest features from the online store for real-time inference. | ||
| 6. Explore the (experimental) Feast UI. | ||
|
|
||
| ## Overview | ||
| ***Note*** - Feast provides a Python SDK as well as an optional [hosted service](../reference/feature-servers/python-feature-server.md) for reading and writing feature data to the online and offline data stores. The latter might be useful when non-Python languages are required. | ||
|
|
||
| For this tutorial, we will be using the Python SDK. | ||
|
|
||
| In this tutorial, we'll use Feast to generate training data and power online model inference for a | ||
| ride-sharing driver satisfaction prediction model. Feast solves several common issues in this flow: | ||
| @@ -279,7 +304,7 @@ There's an included `test_workflow.py` file which runs through a full sample wor | ||
| 7. Verify online features are updated / fresher | ||
|
|
||
| We'll walk through some snippets of code below and explain | ||
| ### Step 3a: Register feature definitions and deploy your feature store | ||
| ### Step 4: Register feature definitions and deploy your feature store | ||
|
|
||
| The `apply` command scans Python files in the current directory for feature view/entity definitions, registers the | ||
| objects, and deploys infrastructure. In this example, it reads `example_repo.py` and sets up SQLite online store tables. Note that we had specified SQLite as the default online store by | ||
| @@ -311,7 +336,7 @@ Created sqlite table my_project_driver_hourly_stats | ||
| {% endtab %} | ||
| {% endtabs %} | ||
|
|
||
| ### Step 3b: Generating training data or powering batch scoring models | ||
| ### Step 5: Generating training data or powering batch scoring models | ||
|
|
||
| To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels. | ||
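To make the mapping from features to labels concrete, here is a minimal sketch of a point-in-time join using only the standard library. It is not Feast's implementation: the rows, the `conv_rate` values, and the helper `latest_feature_as_of` are all hypothetical, and details a real retrieval may apply (such as a time-to-live on feature values) are ignored.

```python
from datetime import datetime
from typing import Optional

# Hypothetical feature rows: (driver_id, event_timestamp, conv_rate).
feature_rows = [
    (1001, datetime(2024, 4, 1, 10), 0.50),
    (1001, datetime(2024, 4, 1, 12), 0.75),
    (1002, datetime(2024, 4, 1, 11), 0.20),
]

# Hypothetical label rows: (driver_id, event_timestamp, satisfaction_label).
label_rows = [
    (1001, datetime(2024, 4, 1, 11), 1),
    (1001, datetime(2024, 4, 1, 13), 0),
    (1002, datetime(2024, 4, 1, 9), 1),
]


def latest_feature_as_of(driver_id: int, as_of: datetime) -> Optional[float]:
    """Return the newest feature value observed at or before `as_of`."""
    candidates = [
        (ts, value)
        for (fid, ts, value) in feature_rows
        if fid == driver_id and ts <= as_of
    ]
    if not candidates:
        return None  # no feature value was known yet at label time
    return max(candidates, key=lambda pair: pair[0])[1]


# Attach to each label the feature value that was known at that moment.
training_rows = [
    (driver_id, ts, latest_feature_as_of(driver_id, ts), label)
    for (driver_id, ts, label) in label_rows
]

for row in training_rows:
    print(row)
```

Each label only sees feature values recorded at or before its own timestamp, so later feature values cannot leak into the training set.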
|
|
||
| @@ -466,7 +491,7 @@ print(training_df.head()) | ||
| ``` | ||
| {% endtab %} | ||
| {% endtabs %} | ||
| ### Step 3c: Ingest batch features into your online store | ||
| ### Step 6: Ingest batch features into your online store | ||
|
|
||
| We now serialize the latest values of features since the beginning of time to prepare for serving (note: | ||
| `materialize-incremental` serializes all new features since the last `materialize` call). | ||
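The difference between a first load and an incremental one can be sketched with a toy in-memory store. `ToyOnlineStore` and its method are hypothetical stand-ins written for this illustration; they do not reflect how Feast or the SQLite online store are implemented.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, datetime, float]  # (driver_id, event_timestamp, conv_rate)


class ToyOnlineStore:
    """In-memory stand-in for an online store; not Feast's implementation."""

    def __init__(self) -> None:
        self.latest: Dict[int, Tuple[datetime, float]] = {}
        self.last_materialized: Optional[datetime] = None

    def materialize_incremental(self, rows: List[Row], end: datetime) -> int:
        """Load rows newer than the previous run and up to `end`; return rows written."""
        written = 0
        for driver_id, ts, value in sorted(rows, key=lambda r: r[1]):
            too_old = self.last_materialized is not None and ts <= self.last_materialized
            if too_old or ts > end:
                continue
            self.latest[driver_id] = (ts, value)  # a newer row overwrites an older one
            written += 1
        self.last_materialized = end
        return written


rows = [
    (1001, datetime(2024, 4, 1, 10), 0.50),
    (1001, datetime(2024, 4, 1, 12), 0.75),
    (1002, datetime(2024, 4, 1, 11), 0.20),
]

store = ToyOnlineStore()
first_run = store.materialize_incremental(rows, end=datetime(2024, 4, 1, 11))
second_run = store.materialize_incremental(rows, end=datetime(2024, 4, 1, 13))
print(first_run, second_run, store.latest[1001][1])
```

The first run starts from the beginning of time, while the second only picks up rows that arrived after the previous cut-off, which is the behavior the note above attributes to `materialize-incremental`.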
| @@ -499,7 +524,7 @@ Materializing 2 feature views to 2024-04-19 10:59:58-04:00 into the sqlite onlin | ||
| {% endtab %} | ||
| {% endtabs %} | ||
|
|
||
| ### Step 3d: Fetching feature vectors for inference | ||
| ### Step 7: Fetching feature vectors for inference | ||
|
|
||
| At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might | ||
| have existed only in batch sources) from the online feature store using `get_online_features()`. These feature | ||
| @@ -544,7 +569,7 @@ pprint(feature_vector) | ||
| {% endtab %} | ||
| {% endtabs %} | ||
|
|
||
| ### Step 3e: Using a feature service to fetch online features instead. | ||
| ### Step 8: Using a feature service to fetch online features instead. | ||
|
|
||
| You can also use feature services to manage multiple features, and decouple feature view definitions and the | ||
| features needed by end applications. The feature store can also be used to fetch either online or historical | ||
| @@ -594,7 +619,7 @@ pprint(feature_vector) | ||
| {% endtab %} | ||
| {% endtabs %} | ||
|
|
||
| ## Step 4: Browse your features with the Web UI (experimental) | ||
| ## Step 9: Browse your features with the Web UI (experimental) | ||
|
|
||
| View all registered features, data sources, entities, and feature services with the Web UI. | ||
|
|
||
| @@ -626,7 +651,7 @@ INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit) | ||
|
|
||
|  | ||
|
|
||
| ## Step 5: Re-examine `test_workflow.py` | ||
| ## Step 10: Re-examine `test_workflow.py` | ||
|
Comment thread
Member: I think the original numbering had a structure that outlined that some components were optional or sub-steps, so we should probably modify these to reflect that. For example, the previous Step 3 had a bunch of sub-steps to reflect that hierarchy.
Contributor (Author): I know what you are saying, but one of the comments (number 8) in #4593 asks for the changes that I made.
franciscojavierarceo reacted with thumbs up emoji
|
||
| Take a look at `test_workflow.py` again. It showcases many sample flows on how to interact with Feast. You'll see these | ||
| show up in the upcoming concepts + architecture + tutorial pages as well. | ||
|
|
||
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1,45 @@ | ||
| # Byte-compiled / optimized / DLL files | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *.pyo | ||
| *.pyd | ||
|
|
||
| # C extensions | ||
| *.so | ||
|
|
||
| # Distribution / packaging | ||
| .Python | ||
| env/ | ||
| venv/ | ||
| ENV/ | ||
| env.bak/ | ||
| venv.bak/ | ||
| *.egg-info/ | ||
| dist/ | ||
| build/ | ||
| .venv | ||
|
|
||
| # Pytest | ||
| .cache | ||
| *.cover | ||
| *.log | ||
| .coverage | ||
| nosetests.xml | ||
| coverage.xml | ||
| *.hypothesis/ | ||
| *.pytest_cache/ | ||
|
|
||
| # Jupyter Notebook | ||
| .ipynb_checkpoints | ||
|
|
||
| # IDEs and Editors | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
| *.sublime-workspace | ||
| *.sublime-project | ||
|
|
||
| # OS generated files | ||
| .DS_Store | ||
| Thumbs.db |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |
| @@ -0,0 +1 @@ | ||
| ../local/.gitignore |