Sorry, something went wrong.
There was a problem hiding this comment.
This looks like a useful guard for the S3A event log edge case, and the focused tests help. One follow-up worth considering is whether some Feast users rely on credentials or endpoint details only through Spark/Hadoop config rather than environment variables. If so, a short note or test around that path could prevent surprises when the pre-create step runs before Spark fully applies the config.
Sorry, something went wrong.
| "spark.hadoop.fs.s3a.endpoint", | ||
| os.environ.get("FEAST_S3A_ENDPOINT", ""), | ||
| ) | ||
| access_key = os.environ.get("AWS_ACCESS_KEY_ID", "") |
There was a problem hiding this comment.
Sorry, something went wrong.
|
@abhijeet-dhumal Let's handle both comment from devin and @R-behera suggestion |
Sorry, something went wrong.
|
@abhijeet-dhumal Let's handle both comment from devin and @R-behera suggestion @ntkathole Addressed both your comments ✅ |
Sorry, something went wrong.
|
This looks like a useful guard for the S3A event log edge case, and the focused tests help. One follow-up worth considering is whether some Feast users rely on credentials or endpoint details only through Spark/Hadoop config rather than environment variables. If so, a short note or test around that path could prevent surprises when the pre-create step runs before Spark fully applies the config. @R-behera Good catch on the Spark/Hadoop config credentials path ✅ |
Sorry, something went wrong.
|
|
||
| endpoint = spark_config.get( | ||
| "spark.hadoop.fs.s3a.endpoint", | ||
| os.environ.get("FEAST_S3A_ENDPOINT", ""), |
There was a problem hiding this comment.
Wondering if this can be AWS_ENDPOINT_URL instead or atleast we need to document this new env var in our docs ?
Sorry, something went wrong.
There was a problem hiding this comment.
Good call — switched to AWS_ENDPOINT_URL . No custom env vars to document now. Spark config (spark.hadoop.fs.s3a.endpoint) still takes precedence when set.
Sorry, something went wrong.
| aws_access_key_id=access_key or None, | ||
| aws_secret_access_key=secret_key or None, | ||
| aws_session_token=session_token, | ||
| config=BotoConfig(signature_version="s3v4"), |
There was a problem hiding this comment.
Also, consider supporting minio or other path style
Sorry, something went wrong.
There was a problem hiding this comment.
Added .. - _ensure_s3a_event_log_dir now reads spark.hadoop.fs.s3a.path.style.access and passes addressing_style: "path" to BotoConfig when it's "true", otherwise defaults to "auto". Tests cover both paths
Sorry, something went wrong.
What this PR does / why we need it:
When spark.eventLog.enabled: "true" and spark.eventLog.dir points to an S3A path, feast materialize-incremental silently writes nothing to the online store and exits with code 0.
The failure chain:
S3 has no real directories. An empty prefix is indistinguishable from "does not exist", so Spark's pre-flight check always fails on a fresh bucket.
Which issue(s) this PR fixes:
In get_or_create_new_spark_session() (compute_engines/spark/utils.py), before building the SparkSession, call _ensure_s3a_event_log_dir() which:
No-ops for non-S3A paths (hdfs://, file://, etc.) and when event logging is disabled.
Checks
Testing Strategy
Misc