Sorry, something went wrong.
There was a problem hiding this comment.
Thanks for the contribution, @singhhimanshu0811! This directly addresses the gap we discussed around single-DC limitations in the Cassandra online store config. Overall the approach is solid — using per-DC execution profiles is the right driver-level mechanism. I have several items ranging from bugs to design suggestions:
This is a mutable class attribute shared across all instances. If multiple CassandraOnlineStore instances are created (e.g., in tests or multi-project setups), they'll all append to the same list. This same pattern issue exists for _prepared_statements in the original code, but introducing a new one should be avoided.
Fix: Initialize in _get_session or use a property pattern:
The docstring says:
Mutually exclusive with hosts and secure_bundle_path.
But there's no actual validation enforcing this. If a user sets both datacenters and hosts, Branch B runs (because it checks datacenters first) and silently ignores hosts. This will confuse users.
Fix: Add a Pydantic model_validator or an explicit check at the top of _get_session:
The PR creates named execution profiles and exposes them via _dc_execution_profiles, but none of the existing Feast operations (online_read, online_write_batch, update) pass execution_profile= to the driver. All queries will use EXEC_PROFILE_DEFAULT — effectively still single-DC behavior.
This means:
I understand this may be intended as a foundational PR with follow-up work to add target_dc parameters to the Feast API. If so, please mention this explicitly in the PR description and/or open a tracking issue for the follow-up. Otherwise, consider adding at minimum an optional execution_profile pass-through in _read_rows_by_entity_keys and _write_rows_concurrently.
These fields are documented as "informational" and Feast doesn't use them. This raises questions:
At minimum, add a log warning if these are set:
In the config structure, local_dc appears in two places with different semantics:
Consider renaming the per-DC field to just name or dc_name for clarity:
The field is called local_dc because that's what the driver's DCAwareRoundRobinPolicy parameter is named, but at the config level it reads as "this datacenter's local_dc" which is semantically odd.
What happens if datacenters: [] (empty list)? The truthiness check if online_store_config.datacenters: will be False, so it falls through to Branch A. But Branch A will then fail with E_CASSANDRA_NOT_CONFIGURED because hosts is also None. The error message will be misleading.
Fix: Validate that datacenters contains at least one entry if set:
There are no unit tests for the new code path. Please add tests covering:
After Branch B was inserted, the existing code still has:
This if not self._session: is now always True when reached (because we already returned early at line 294 if self._session: return). It's harmless but misleading — the original code had it as the single branch, now it's dead logic. Consider removing the if wrapper for clarity (keeping the body).
| Mutable class-level list | Bug | Must fix |
| Missing mutual exclusivity check | Bug | Must fix |
| Profiles unused by Feast ops | Design gap | Document plan or implement |
| Informational-only fields | Design | Consider removing or add warning |
| local_dc naming confusion | Suggestion | Consider rename |
| Empty list edge case | Suggestion | Add validation |
| No tests | Required | Add unit tests |
| Redundant if not self._session | Minor | Cleanup |
The core idea is good and aligns with a real production need. Looking forward to the next iteration!
Sorry, something went wrong.
|
Hey @ntkathole thanks for replying back. I'll revert back with updated pr in 1-2 days, meanwhile could you review these once |
Sorry, something went wrong.
|
@ntkathole for solving issue 3, basically incorporating profiles in read and write, i am thinking something like this online_store: basically along with datacenters, adding a routing header, so user can decide at start time, that in this feature store session, which dc they would like to read from or which dc they would like to write from. does this work? and of course there would be validation that routing header params are legal, i.e in datacenter header params |
Sorry, something went wrong.
|
Sorry, something went wrong.
##Which issue(s) this PR fixes
Fixes #6433
Summary
Adds a datacenters config list to CassandraOnlineStoreConfig as an alternative to the flat hosts field, enabling proper multi-datacenter support.
Each datacenter entry gets a named Cassandra execution profile (keyed by local_dc), so pipeline code can route reads/writes to a specific DC by passing execution_profile="<dc-name>" to the driver. The default profile is pinned to load_balancing.local_dc, or the first DC when load_balancing is absent.
Changes
Motivation
The flat hosts + single local_dc model forces all hosts to belong to the same datacenter. Users with multi-DC clusters had to create separate feature store projects per DC. With this change:
Backward compatibility
Fully backward compatible — datacenters is optional and the original code path is untouched.