Sorry, something went wrong.
| destination_path = storage.file_options.uri | ||
| if not destination_path.startswith(("s3://", "gs://", "hdfs://")): | ||
| if not destination_path.startswith( | ||
| ("s3://", "gs://", "hdfs://", "abfs://", "abfss://") |
There was a problem hiding this comment.
Can we define a constant for supported remote storage URI schemes at top and use it later at all three locations?
Sorry, something went wrong.
|
Also, let's update doc https://github.com/feast-dev/feast/blob/master/docs/reference/offline-stores/ray.md?plain=1#L30 |
Sorry, something went wrong.
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🔴 1 issue in files not directly in the diffThe PR adds Azure storage support (abfs://, abfss://) to the Ray offline store by introducing a REMOTE_STORAGE_SCHEMES constant, but fails to update the RayDAGRetrievalJob.persist() method in job.py which still uses a hardcoded tuple ("s3://", "gs://", "hdfs://").
Click to expandWhen users try to persist datasets to Azure storage using the Ray compute engine (via RayDAGRetrievalJob), the code will incorrectly:
This happens because Azure paths like abfss://container@account.dfs.core.windows.net/path are not recognized as remote storage.
ray.py:70 defines:
But job.py:208 still uses:
Recommendation: Import and use REMOTE_STORAGE_SCHEMES from feast.infra.offline_stores.contrib.ray_offline_store.ray or define a shared constant in a common module that both files can import.
View issue and 6 additional flags in Devin Review.
Sorry, something went wrong.
Ray compute engine job.py not updated to support Azure storage schemes (sdk/python/feast/infra/compute_engines/ray/job.py:208)The PR adds Azure storage support (abfs://, abfss://) to the Ray offline store by introducing a REMOTE_STORAGE_SCHEMES constant, but fails to update the RayDAGRetrievalJob.persist() method in job.py which still uses a hardcoded tuple ("s3://", "gs://", "hdfs://"). @jbauer12 I think this comment make sense, we need to update job.py as well |
Sorry, something went wrong.
There was a problem hiding this comment.
Devin Review found 2 new potential issues.
🔴 1 issue in files not directly in the diffThe PR adds Azure storage support (abfs://, abfss://) to the Ray offline store by introducing a REMOTE_STORAGE_SCHEMES constant, but fails to update the RayDAGRetrievalJob.persist() method in job.py which still uses a hardcoded tuple ("s3://", "gs://", "hdfs://").
Click to expandWhen users try to persist datasets to Azure storage using the Ray compute engine (via RayDAGRetrievalJob), the code will incorrectly:
This happens because Azure paths like abfss://container@account.dfs.core.windows.net/path are not recognized as remote storage.
ray.py:70 defines:
But job.py:208 still uses:
Recommendation: Import and use REMOTE_STORAGE_SCHEMES from feast.infra.offline_stores.contrib.ray_offline_store.ray or define a shared constant in a common module that both files can import.
View issues and 8 additional flags in Devin Review.
Sorry, something went wrong.
What this PR does / why we need it:
Adjustments to support ADLS Gen2 Azure Storage in Ray Offline Store.
Which issue(s) this PR fixes:
Fixes##5844