Milvus Online Store Dimension Mismatch Error in Push API and Materialization · Issue #5551 · feast-dev/feast · GitHub
Milvus Online Store Dimension Mismatch Error in Push API and Materialization #5551


Description

Summary

Feast's Milvus online store integration has a critical dimension mismatch bug that affects both the push API and materialization approaches. When storing embeddings with correct dimensions (384), Feast internally transforms the data incorrectly, causing Milvus to reject the data with dimension errors.

Environment

  • Feast version: 0.51.0
  • Python version: 3.12.11
  • pymilvus version: 2.3.0+
  • OS: macOS (Darwin 24.5.0)
  • Milvus: milvus-lite (via path: data/online_store.db)

Bug Description

Error Message

ERROR:pymilvus.decorators:RPC error: [upsert_rows], <MilvusException: (code=65535, message=the length(7695) of float data should divide the dim(384): )>

Expected Behavior

  • Input: 5 embeddings × 384 dimensions = 1920 total elements
  • Feast should store these embeddings correctly in Milvus
  • Expected elements sent to Milvus: 1920

Actual Behavior

  • Input: 5 embeddings × 384 dimensions = 1920 total elements
  • Feast transforms this into 7695 elements (a factor of ~4x)
  • Milvus rejects the data because 7695 ÷ 384 ≈ 20.04, which is not an integer
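The arithmetic behind the rejection can be checked directly. This is a minimal sketch that uses only the figures quoted in this issue (5 rows, dimension 384, reported length 7695); the variable names are illustrative.

```python
# Figures taken from the error message and the issue text.
DIM = 384
ROWS = 5
REPORTED_LENGTH = 7695

expected_length = ROWS * DIM
quotient, remainder = divmod(REPORTED_LENGTH, DIM)
inflation = REPORTED_LENGTH / expected_length

print(f"expected flat length: {expected_length}")                  # 1920
print(f"reported length / dim: {quotient} remainder {remainder}")  # 20 remainder 15
print(f"inflation factor: {inflation:.3f}")                        # 4.008
```

A non-zero remainder is exactly the condition the Milvus error message complains about.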

Steps to Reproduce

1. Feature Store Configuration

# feast_feature_repo/feature_store.yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "L2"
offline_store:
  type: file
entity_key_serialization_version: 3
auth:
  type: no_auth

2. Feature Definitions

from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Array, Float32, String, Int64
from feast.value_type import ValueType
from datetime import timedelta

document = Entity(
    name="document_id",
    value_type=ValueType.STRING,
    description="Unique identifier for document chunks"
)

document_embeddings_source = FileSource(
    name="document_embeddings_source",
    path="data/document_embeddings.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

document_embeddings_push_source = PushSource(
    name="document_embeddings_push_source",
    batch_source=document_embeddings_source,
)

document_embeddings = FeatureView(
    name="document_embeddings",
    entities=[document],
    ttl=timedelta(days=365),
    schema=[
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
        Field(name="chunk_text", dtype=String),
        Field(name="document_title", dtype=String),
        Field(name="chunk_index", dtype=Int64),
        Field(name="file_path", dtype=String),
        Field(name="chunk_length", dtype=Int64),
    ],
    online=True,
    source=document_embeddings_push_source,
    tags={"team": "rag", "version": "v3"},
)

3. Reproduce with Push API

import pandas as pd
import numpy as np
from datetime import datetime
from sentence_transformers import SentenceTransformer
from feast import FeatureStore
from feast.data_source import PushMode

# Generate test embeddings (384 dimensions)
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    'Test document 1',
    'Test document 2',
    'Test document 3',
    'Test document 4',
    'Test document 5'
]
embeddings = model.encode(texts)  # Shape: (5, 384)

# Create DataFrame
feature_data = []
for i, (text, embedding) in enumerate(zip(texts, embeddings)):
    feature_data.append({
        "document_id": f"test_doc_{i}",
        "embedding": embedding.tolist(),  # Convert to list as per docs
        "chunk_text": text,
        "document_title": "test_document.md",
        "chunk_index": i,
        "file_path": "test_path",
        "chunk_length": len(text),
        "event_timestamp": pd.Timestamp.now(tz='UTC'),
        "created_timestamp": pd.Timestamp.now(tz='UTC')
    })

df = pd.DataFrame(feature_data)
print(f"Input data: {len(df)} rows, {len(df) * 384} total elements")

# Initialize Feast store
fs = FeatureStore(repo_path="feast_feature_repo")

# This will fail with dimension mismatch
fs.push(
    push_source_name="document_embeddings_push_source",
    df=df,
    to=PushMode.ONLINE_AND_OFFLINE
)

4. Reproduce with Materialization

# Save to parquet file
df.to_parquet('feast_feature_repo/data/document_embeddings.parquet', index=False)

# Try materialization
from datetime import timedelta

end_time = datetime.now()
start_time = end_time - timedelta(hours=1)

# This will also fail with same dimension mismatch
fs.materialize(
    start_date=start_time,
    end_date=end_time,
    feature_views=["document_embeddings"]
)

Investigation Results

Data Validation

Our debugging confirmed:

  • ✅ Input embeddings are exactly 384 dimensions each
  • ✅ DataFrame contains 5 rows × 384 = 1920 total elements
  • ✅ Embeddings converted to Python lists correctly
  • ✅ Data types are correct (Array(Float32))
  • ❌ Feast somehow transforms 1920 → 7695 elements internally
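The input-side checks listed above can be reproduced with a small helper. This is a sketch rather than the exact debugging code used for this report: `find_bad_embeddings` is a hypothetical name, and the rows here are stand-ins shaped like the DataFrame records from step 3.

```python
def find_bad_embeddings(rows, field="embedding", dim=384):
    """Return (row_index, actual_length) for every row whose vector length differs from dim."""
    bad = []
    for i, row in enumerate(rows):
        length = len(row[field])
        if length != dim:
            bad.append((i, length))
    return bad


# Stand-in rows with the same shape as the records pushed in step 3.
rows = [
    {"document_id": f"test_doc_{i}", "embedding": [0.0] * 384}
    for i in range(5)
]

bad_rows = find_bad_embeddings(rows)
total_elements = sum(len(row["embedding"]) for row in rows)
print(f"bad rows: {bad_rows}, total elements: {total_elements}")
```

With the real data, the same helper can be fed `df.to_dict("records")` before calling `fs.push(...)` to confirm the input is well-formed.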

Affected Methods

  1. Push API: FeatureStore.push() with PushMode.ONLINE_AND_OFFLINE
  2. Materialization: FeatureStore.materialize() from parquet files

Both fail with the identical dimension mismatch error.

Expected Fix

Feast should correctly handle Array(Float32) fields when:

  1. Pushing data via the push API
  2. Materializing data from parquet files

The dimension transformation logic needs to be debugged and fixed.

Potential Root Cause

The issue appears to be in Feast's internal serialization/transformation of Array(Float32) fields when interfacing with Milvus. The ~4x multiplication factor (1920 → 7695) suggests there might be:

  • Incorrect flattening of nested arrays
  • Multiple serialization passes
  • Data type conversion issues in the Milvus online store adapter
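One observation may help narrow these candidates: 7695 divides evenly by the 5 rows, giving 1539 per row, and 1539 = 4 × 384 + 3. That is the size of 384 float32 values in bytes plus 3 extra bytes. The sketch below checks this arithmetic. The interpretation is a hypothesis, not something confirmed in this report: 3 bytes is what a length-delimited protobuf field with a field number below 16 would add (one tag byte plus a two-byte varint length), which would be consistent with serialized bytes being counted as individual elements.

```python
import struct

DIM, ROWS, REPORTED_LENGTH = 384, 5, 7695

# Elements attributed to each row in the failing upsert.
per_row, leftover = divmod(REPORTED_LENGTH, ROWS)

# Raw size of one embedding stored as little-endian float32.
raw_bytes = len(struct.pack(f"<{DIM}f", *([0.0] * DIM)))
overhead = per_row - raw_bytes


def varint_len(n):
    """Number of bytes a base-128 varint (as used by protobuf) needs to encode n."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


# Hypothetical header: 1 tag byte + varint-encoded payload length.
header_bytes = 1 + varint_len(raw_bytes)

print(f"per row: {per_row} (leftover {leftover})")
print(f"raw float32 bytes: {raw_bytes}, overhead: {overhead}")
print(f"hypothetical protobuf header: {header_bytes} bytes")
```

If the hypothesis holds, the data type conversion path in the adapter would be the first place to look, since the float values themselves would not be duplicated or flattened incorrectly.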

Workaround

We are currently using pymilvus.MilvusClient directly, which accepts the same data without error. This suggests the issue lies within Feast's Milvus adapter.
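A minimal sketch of such a direct integration is shown below. It is not the exact code used here: the helper names, the collection name, and the database path are hypothetical, integer ids replace the string `document_id` for simplicity, and it assumes a pymilvus release that supports Milvus Lite through a local file path.

```python
def build_rows(texts, embeddings, dim=384):
    """Turn texts and vectors into row dicts, rejecting any vector of the wrong length."""
    rows = []
    for i, (text, vec) in enumerate(zip(texts, embeddings)):
        vec = [float(x) for x in vec]
        if len(vec) != dim:
            raise ValueError(f"row {i}: expected {dim} values, got {len(vec)}")
        rows.append({"id": i, "vector": vec, "chunk_text": text})
    return rows


def insert_direct(rows, db_path="data/direct_milvus.db",
                  collection="document_embeddings_direct", dim=384):
    """Insert rows with MilvusClient, bypassing Feast (path and collection name are hypothetical)."""
    from pymilvus import MilvusClient  # imported lazily so build_rows works without pymilvus

    client = MilvusClient(db_path)
    if not client.has_collection(collection_name=collection):
        # Quick-setup collection: integer "id" primary key and a "vector" field.
        client.create_collection(
            collection_name=collection,
            dimension=dim,
            metric_type="L2",
        )
    return client.insert(collection_name=collection, data=rows)
```

Calling `insert_direct(build_rows(texts, embeddings))` with the `texts` and `embeddings` from step 3 sends 5 vectors of 384 floats each, which is the payload Feast is expected to produce.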
