← 返回首页
fix: Redundant feature materialization and premature incremental materialization timestamp updates by james-crabtree-sp · Pull Request #3789 · feast-dev/feast · GitHub
Skip to content

Navigation Menu

Toggle navigation
Sign in
Appearance settings
Search or jump to...

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Saved searches

Use saved searches to filter your results more quickly

Appearance settings
Resetting focus

fix: Redundant feature materialization and premature incremental materialization timestamp updates#3789

Merged
achals merged 10 commits into
feast-dev:masterfrom
sailpoint-core:master
Oct 25, 2023
Merged

fix: Redundant feature materialization and premature incremental materialization timestamp updates#3789
achals merged 10 commits into
feast-dev:masterfrom
sailpoint-core:master

Conversation

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

This PR fixes 2 issues encountered while attempting to run very large materializations (Several billion rows as part of an initial run of online materialization).

  1. Every pod created materializes every file in the dataset. This is redundant and costs a lot of time and money. To fix this, I updated the dataflow code to select only the single file corresponding to the running pod's JOB_COMPLETION_INDEX as input.
  2. The last materialized date used for incremental materialization was being updated before materialization had even run. This meant that if materialization had failed for some reason, many updates could be ignored, because the next run would continue from beyond the end date of the failed run. I solved this by implementing a synchronous mode for bytewax materialization that will wait for materialization of a feature to complete successfully before returning and updating the last materialized date
  3. Bytewax jobs can run infinitely if node crashes occur often enough that the job has to continuously reprocess the same pods after they are lost. I mitigated this by adding a configurable activeDeadlineSeconds and by adding an option to synchronous materialization to allow these jobs to be split into smaller batches so fewer pods have to be rerun in the event of a node failure

Which issue(s) this PR fixes:
Fixes #3786
Fixes #3787
Fixes #3788

james-crabtree-sp changed the title fix: redundant feature materialization and premature incremental materialization timestamp updates fix: Redundant feature materialization and premature incremental materialization timestamp updates Oct 9, 2023

Copy link
Copy Markdown
Collaborator

great, I have been working on the similar PR, but you did it first and way better 💯

Copy link
Copy Markdown
Collaborator

@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits

Copy link
Copy Markdown
Contributor Author

@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits

I think they are all signed off already. I see that the DCO task that checks this completed successfully https://github.com/feast-dev/feast/pull/3789/checks?check_run_id=17541354092

Copy link
Copy Markdown
Collaborator

@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits

I think they are all signed off already. I see that the DCO task that checks this completed successfully https://github.com/feast-dev/feast/pull/3789/checks?check_run_id=17541354092

Sorry for the confusion, I was referring to the requirements to merge, and linked to the wrong documentation.

This is the requirement that I was referring to:

Merging is blocked
The base branch requires all commits to be signed. Learn more about signing commits.

Copy link
Copy Markdown
Collaborator

hey @james-crabtree-sp this is a great PR, you need any hands to make it merged? I see there's a lot of commits in the past unverified

if you don't mind I can create another PR to mimic this one

james-crabtree-sp force-pushed the master branch 2 times, most recently from 0631425 to baf108c Compare October 23, 2023 16:37
james-crabtree-sp and others added 8 commits October 23, 2023 11:39
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
* SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

Copy link
Copy Markdown
Contributor Author

@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified

Copy link
Copy Markdown
Collaborator

@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified

exactly 🤣 looks good now but you need to run code format though

Copy link
Copy Markdown
Contributor Author

@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified

exactly 🤣 looks good now but you need to run code format though

Noticed that too. Weird rebase issue. Should be fixed now

…error Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

Copy link
Copy Markdown
Collaborator

@james-crabtree-sp I've been adopting your changes on my side

so far do we need to wait the job to be done, like forever without timeout?

Copy link
Copy Markdown
Contributor Author

@james-crabtree-sp I've been adopting your changes on my side

so far do we need to wait the job to be done, like forever without timeout?

Well, this PR adds the configurable active_deadline_seconds setting, which can be used to impose a timeout on the kubernetes jobs. Does that answer your question?

Copy link
Copy Markdown
Collaborator

@james-crabtree-sp I've been adopting your changes on my side
so far do we need to wait the job to be done, like forever without timeout?

Well, this PR adds the configurable active_deadline_seconds setting, which can be used to impose a timeout on the kubernetes jobs. Does that answer your question?

cool clear

achals merged commit 417b16b into feast-dev:master Oct 25, 2023
woop pushed a commit that referenced this pull request Jan 13, 2024
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6)) * Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf)) * Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775)) * Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737)) * improve parsing bytewax job status ([5983f40](5983f40)) * make bytewax settings unexposed ([ae1bb8b](ae1bb8b)) * Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7) * Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8)) * Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39)) * UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](5dc0b24)) * Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b)) ### Features * Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc)) * Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760) * add redis sentinel support ([3387a15](3387a15)) * add redis sentinel support ([4337c89](4337c89)) * add redis sentinel support format lint ([aad8718](aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65)) * Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709)) * Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96)) * Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
tokoko pushed a commit to tokoko/feast that referenced this pull request Feb 6, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
…rialization timestamp updates (feast-dev#3789) * SAASMLOPS-767 wait for jobs to complete Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-805 Stopgap change to fix duplicate materialization of data Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-805 save BYTEWAX_REPLICAS=1 Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-809 fix bytewax workers so they only process a single file (feast-dev#6) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * SAASMLOPS-833 add configurable job timeout (feast-dev#7) * SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * develop Run large materializations in batches of pods Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master Set job_batch_size at least equal to max_parallelism Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master clarity max_parallelism description Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master resolve bug that causes materialization to continue after job error Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> * master resolve bug causing pod logs to not be printed Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> --------- Signed-off-by: James Crabtree <james.crabtree@sailpoint.com> Signed-off-by: Attila Toth <hello@attilatoth.dev>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: Attila Toth <hello@attilatoth.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

4 participants

Footer

© 2026 GitHub, Inc.