    query: Optional[List[float]] = None,
    query_string: Optional[str] = None,
    distance_metric: Optional[str] = "L2",
    query_image_bytes: Optional[bytes] = None,
Is this the right way to do this? I would have updated the proto values to support ImageBytes, like we support PDFBytes, and then just queried the embedding the standard way.
You can then enrich the image vector embeddings with text and semantic embeddings and allow a composite search of both. In that sense, you would still use one query, but search across multiple vectors.
Actually, I'm realizing my mistake here. ImageBytes is required for retrieval; what you have here is appropriate for image search when the image is passed in as the query.
Added IMAGE_BYTES as well 👍
    for output in outputs:
        feature_vector = output.numpy()
        normalized = normalize(feature_vector.reshape(1, -1), norm="l2")
Feels like the l2 norm should be configurable, but that's probably not needed yet.
Right, most of the use cases are covered by l2; we can expand as needed.
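For reference, a configurable norm could look roughly like the sketch below. This is not the PR's code: `normalize_vector` is a hypothetical helper written in plain Python, and the `l1` / `max` options are assumptions that simply mirror the norms `sklearn.preprocessing.normalize` accepts.

```python
import math
from typing import List, Sequence


def normalize_vector(vector: Sequence[float], norm: str = "l2") -> List[float]:
    """Scale `vector` to unit length under the chosen norm (hypothetical helper)."""
    if norm == "l2":
        length = math.sqrt(sum(x * x for x in vector))
    elif norm == "l1":
        length = sum(abs(x) for x in vector)
    elif norm == "max":
        length = max((abs(x) for x in vector), default=0.0)
    else:
        raise ValueError(f"Unsupported norm: {norm}. Supported norms: l1, l2, max")

    # Leave all-zero (or empty) vectors untouched rather than dividing by zero.
    if length == 0:
        return list(vector)
    return [x / length for x in vector]
```

Keeping `l2` as the default would preserve the current behavior while leaving room to expand later.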
    else:
        raise ValueError(
            f"Unknown combination strategy: {strategy}. "
            f"Supported strategies: weighted_sum, concatenate, average"
Small nit: technically you should just make the strategy an enum so the ValueError is easier to maintain.
Made the changes
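To illustrate the enum suggestion, here is a minimal sketch in which the error message is derived from the enum members, so it cannot drift out of date. The names `CombinationStrategy`, `combine_embeddings`, and `text_weight` are hypothetical, and the arithmetic for each strategy is an assumption about what `weighted_sum`, `concatenate`, and `average` mean, not the merged implementation.

```python
from enum import Enum
from typing import List, Sequence, Union


class CombinationStrategy(str, Enum):
    """Hypothetical enum for the three strategies named in the error message."""

    WEIGHTED_SUM = "weighted_sum"
    CONCATENATE = "concatenate"
    AVERAGE = "average"


def combine_embeddings(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    strategy: Union[str, CombinationStrategy],
    text_weight: float = 0.5,
) -> List[float]:
    """Combine a text and an image embedding (illustrative only)."""
    try:
        strategy = CombinationStrategy(strategy)
    except ValueError:
        # Build the list of supported values from the enum itself.
        supported = ", ".join(s.value for s in CombinationStrategy)
        raise ValueError(
            f"Unknown combination strategy: {strategy}. "
            f"Supported strategies: {supported}"
        ) from None

    if strategy is CombinationStrategy.CONCATENATE:
        return list(text_vec) + list(image_vec)

    # The element-wise strategies need vectors of equal length.
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image embeddings must have the same length")

    if strategy is CombinationStrategy.AVERAGE:
        return [(t + i) / 2 for t, i in zip(text_vec, image_vec)]

    # WEIGHTED_SUM: the image weight is the complement of the text weight.
    image_weight = 1.0 - text_weight
    return [text_weight * t + image_weight * i for t, i in zip(text_vec, image_vec)]
```

Adding a new strategy then only requires a new enum member and its branch; the supported-strategies message updates automatically.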
Some small nits, but this LGTM and I'd like to include it in the next release, as this is a killer new feature. Great work as always, @ntkathole!
What this PR does / why we need it:
Added support for image search combining text and image queries and fixed Milvus binary data handling with base64 encoding.
Users can now define Field(name="image_bytes", dtype=ImageBytes)
This PR also fixes the vector field inconsistency issue on online_write_batch with Milvus.
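As a rough illustration of the base64 handling mentioned above, raw image bytes can be turned into an ASCII string before being written to a string-typed field and decoded again on read. The helper names below are hypothetical, and whether the PR encodes values exactly this way is an assumption; only the standard-library `base64` calls are real.

```python
import base64


def encode_image_bytes(image_bytes: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image_bytes(encoded: str) -> bytes:
    """Reverse encode_image_bytes; validate=True rejects non-base64 characters."""
    return base64.b64decode(encoded, validate=True)


# Round trip: the decoded value matches the original bytes.
original = b"\x89PNG\r\n\x1a\n\x00\xff"
assert decode_image_bytes(encode_image_bytes(original)) == original
```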
Which issue(s) this PR fixes:
Fixes #5372 and Fixes #5551