Official PyTorch implementation of "MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering" [CVPR'26].
Responding efficiently and accurately to user questions over long, live video streams (in either third-person or first-person view) remains challenging, especially when the questions concern fine-grained details from the distant past. Existing sparse-sampling and sliding-window approaches often trade off visual detail for efficiency. Video KV-cache memory provides a good alternative, but per-frame caching both neglects information granularity and introduces heavy redundancy. We therefore propose MuKV, a multi-grained KV-cache compression approach designed to improve streaming VideoQA. We highlight the following:
Figure 1: A comparison with ReKV under different online-inference token counts and video lengths.
We provide a bash script that automatically sets up the exact dependencies in an isolated conda environment.
Activate the environment before proceeding:
The core scripts are adapted to run across several large vision-language models (e.g., LLaVA-OneVision).
We support the official LLaVA-OneVision weights on Hugging Face:
By default, the code points to the 0.5B checkpoint. The transformers library downloads the weights automatically the first time you run the server. You can also point to a pre-downloaded local copy with the --model_path argument.
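The default-versus-local behavior described above can be sketched as follows. This is only an illustration, not the repository's launcher: `DEFAULT_MODEL` is a placeholder rather than the real Hub ID, and `build_parser` and `describe_model_source` are hypothetical helper names. Only the `--model_path` flag comes from this README.

```python
import argparse
import os

# Placeholder only: substitute the Hub ID of the 0.5B checkpoint you actually use.
DEFAULT_MODEL = "placeholder-org/llava-onevision-0.5b"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing --model_path with a default checkpoint (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Model path resolution sketch")
    parser.add_argument(
        "--model_path",
        default=DEFAULT_MODEL,
        help="Hub ID (downloaded on first use) or a local checkpoint directory",
    )
    return parser


def describe_model_source(model_path: str) -> str:
    """Return 'local' if model_path is an existing directory, otherwise 'hub'."""
    return "local" if os.path.isdir(model_path) else "hub"


# Parse an empty argument list so the demo ignores the real command line.
args = build_parser().parse_args([])
print(args.model_path, "->", describe_model_source(args.model_path))
```

Passing `--model_path /some/local/dir` would make the same check report `local`, in which case no download should be needed.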
We conduct experiments primarily on RVS-Ego and RVS-Movie (MovieNet).
Structure the annotations (.json/.csv) and video tensors (.npy/.mp4) inside the data/ directory exactly as shown below:
The entry points are wrapped in simple run_mukv_<dataset>.py handlers inside the scripts/ folder. Run all Python commands from the root MuKV/ directory.
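Because the handlers expect to be launched from the repository root, a small guard can catch a wrong working directory early. The sketch below assumes only that the root contains a scripts/ folder, as stated above. `is_repo_root` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path


def is_repo_root(directory: str = ".") -> bool:
    """Return True if `directory` contains a scripts/ folder (assumed marker of the repo root)."""
    return (Path(directory) / "scripts").is_dir()


if not is_repo_root():
    # Warn only; the real handlers may perform their own checks.
    print("Warning: no scripts/ folder found here; change to the MuKV/ root before running.")
```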
Logs, prediction CSVs, and inference-time memory statistics snapshots are collected automatically under the generated results/mukv/ directory.
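To get a quick overview of what a run produced, the prediction CSVs can be enumerated with the standard library. This is a minimal sketch: it assumes each CSV starts with a header row and makes no assumption about column names, which this README does not specify. `summarize_predictions` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_predictions(results_dir: str = "results/mukv") -> dict:
    """Map each CSV (path relative to results_dir) to its number of data rows."""
    summary = {}
    base = Path(results_dir)
    if not base.is_dir():
        return summary  # nothing generated yet
    for csv_path in sorted(base.rglob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted as a prediction.
        summary[str(csv_path.relative_to(base))] = max(len(rows) - 1, 0)
    return summary


for name, n_rows in summarize_predictions().items():
    print(f"{name}: {n_rows} rows")
```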
To run preconfigured end-to-end evaluations without copying command-line arguments by hand, execute the ready-made shell scripts inside scripts/sh/:
Our method builds on the foundation laid by LLaVA-OneVision. We thank the authors for their open-source contributions.