SVC: Singular Value Calibration for Model Merging
Training-free and data-free singular value calibration for robust model merging across shared subspaces.
- 💥2026-05-01: Our paper is accepted by ICML'26. See you all in Seoul, Korea!
- 💥2026-03-20: We appreciate Anke Tang for including our work in fusion_bench!
- 💥2026-02-05: We have submitted our paper to arXiv.
Model merging combines multiple fine-tuned models into a single model by adding their weight updates, providing a lightweight alternative to retraining.
Existing methods primarily target resolving conflicts between task updates, leaving the failure mode of over-counting shared knowledge unaddressed.
We show that when tasks share aligned spectral directions (i.e., overlapping singular vectors), a simple linear combination repeatedly accumulates these directions, inflating the singular values and biasing the merged model toward shared subspaces.
To mitigate this issue, we propose Singular Value Calibration (SVC), a training-free and data-free post-processing method that quantifies subspace overlap and rescales inflated singular values to restore a balanced spectrum.
Across vision and language benchmarks, SVC consistently improves strong merging baselines and achieves state-of-the-art performance.
Furthermore, by modifying only the singular values, SVC improves the performance of Task Arithmetic by 13.0%.
- Training-free and data-free post-processing for model merging.
- Targets spectral over-counting in shared singular directions.
- Plug-and-play with common merging baselines (TA, TIES, STAR, TSV-M, Iso-*).
- Strong empirical gains across vision and language benchmarks.
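The over-counting effect described above can be reproduced on a toy example. The sketch below is not taken from the repository: it builds two synthetic task updates (`tau_1`, `tau_2`, names chosen here for illustration) that share one singular direction, and shows that adding them doubles the singular value along that direction while the task-specific directions keep their original strength.

```python
import numpy as np

rng = np.random.default_rng(0)


def rank_one(u, v, sigma):
    """Return the rank-one matrix sigma * u v^T."""
    return sigma * np.outer(u, v)


d = 8
# QR gives orthonormal columns, so the three directions do not mix.
U, _ = np.linalg.qr(rng.standard_normal((d, 3)))
V, _ = np.linalg.qr(rng.standard_normal((d, 3)))

# Column 0 is shared by both tasks; columns 1 and 2 are task-specific.
tau_1 = rank_one(U[:, 0], V[:, 0], 1.0) + rank_one(U[:, 1], V[:, 1], 1.0)
tau_2 = rank_one(U[:, 0], V[:, 0], 1.0) + rank_one(U[:, 2], V[:, 2], 1.0)

# Plain addition of the two updates, as in a simple linear combination.
merged = tau_1 + tau_2
singular_values = np.linalg.svd(merged, compute_uv=False)
print(np.round(singular_values[:3], 3))  # shared direction: 2, task-specific: 1 and 1
```

Every individual update has singular values of 1, yet the merged matrix has a leading singular value of 2 along the shared direction. This is the inflated spectrum that SVC is designed to rescale.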
🚀 Quick Start (5 Minutes)
# Create environment from environment.yml
conda env create -f environment.yml -n SVC
conda activate SVC
Download from Google Drive and organize:
checkpoints/
└── ViT-B-32/
    ├── zeroshot.pt          # Pre-trained CLIP
    ├── Cars/finetuned.pt
    ├── DTD/finetuned.pt
    └── ... (other tasks)
3️⃣ Run Your First Experiment
# Task Arithmetic (baseline)
python main.py --model ViT-B-32 --merge TA
# Task Arithmetic + SVC (calibrated)
python main.py --model ViT-B-32 --merge TA --c
# Try all methods
bash run.sh
✅ Results will appear in logs/ViT-B-32/log_*.txt
python main.py [OPTIONS]
| Option | Default | Description |
| --- | --- | --- |
| --model | ViT-B-32 | Model architecture (ViT-B-32, ViT-L-14, etc.) |
| --merge | TA | Merging method (TA, TSV-M, etc.) |
| --c | False | Enable Singular Value Calibration (SVC) |
| --alpha | 0.1 | Calibration weight parameter |
| --scaling_coef | 1.0 | Task vector scaling factor |
| --base_dir | . | Base directory for checkpoints |
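For readers who want to script around the CLI, the options above map naturally onto `argparse`. The following is a minimal sketch, assuming `--c` is a boolean switch (it is passed without a value in the examples and defaults to False); the actual definitions in src/args.py may differ, and `build_parser` is a name introduced here for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Model merging with optional SVC")
    parser.add_argument("--model", default="ViT-B-32", help="Model architecture")
    parser.add_argument("--merge", default="TA", help="Merging method")
    # Assumption: --c is a flag, so it is False unless present on the command line.
    parser.add_argument("--c", action="store_true", help="Enable SVC")
    parser.add_argument("--alpha", type=float, default=0.1, help="Calibration weight")
    parser.add_argument("--scaling_coef", type=float, default=1.0,
                        help="Task vector scaling factor")
    parser.add_argument("--base_dir", default=".", help="Base directory for checkpoints")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--merge", "TA", "--c", "--alpha", "0.5"])
print(args.model, args.merge, args.c, args.alpha, args.scaling_coef)
```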
Available Merging Methods
| Method | Description | Best For | Link |
| --- | --- | --- | --- |
| TA | Task Arithmetic (simple average) | Baseline, fast | link |
| TIES | Sign-alignment sparse merging | Conflict reduction | link |
| DARE | Randomized task vector selection | Regularization | link |
| TSV-M | Task singular vectors merging | Spectral alignment | link |
| Iso-C | Isolated common subspace | Common-space control | link |
| Iso-CTS | Common + task-specific subspaces | Fine-grained decomposition | link |
| STAR | Spectral task arithmetic | Non-uniform spectrum | link |
Example 1: Reproduce Baseline Results
python main.py --model ViT-B-32 --merge TA --scaling_coef 0.3
Example 2: Use SVC for Better Performance
python main.py --model ViT-B-32 --merge TA --c --alpha 0.5
Example 3: Compare Multiple Methods
for method in TA WA TIES STAR; do
    python main.py --model ViT-B-32 --merge "$method" --c
done
Example 4: Batch Processing (All Methods)
bash run.sh
SVC/
├── main.py                        # Main entry point
├── merge_func.py                  # All merging algorithm implementations
├── run.sh                         # Batch script to run all experiments
├── utils.py                       # Common utilities
│
├── clip/                          # CLIP model implementation
│   ├── clip.py
│   ├── model.py                   # ViT architecture
│   └── simple_tokenizer.py        # Tokenization
│
├── src/
│   ├── args.py                    # Argument parsing
│   ├── eval.py                    # Evaluation on multiple datasets
│   ├── modeling.py                # Model instantiation
│   ├── task_vectors.py            # TaskVector class - core data structure
│   ├── ties_merging_utils.py      # TIES algorithm utilities
│   │
│   └── datasets/                  # Dataset implementations
│       ├── registry.py            # Dataset registry & factory
│       ├── common.py              # Base dataset class
│       ├── cifar10.py
│       ├── cifar100.py
│       ├── imagenet.py
│       └── ... (20+ datasets)
│
├── logs/                          # Output logs (auto-created)
│   └── ViT-B-32/
│       └── log_*.txt
│
└── checkpoints/                   # Model checkpoints (external)
    └── ViT-B-32/
        ├── zeroshot.pt
        └── */finetuned.pt
Task Vector: The difference between fine-tuned and pre-trained weights:
$$\tau_i = \theta_{ft,i} - \theta_{pt}$$
Merging Task: Combine n task vectors into a single merged model while preserving knowledge:
$$\theta_{merged} = \theta_{pt} + \text{Merge}(\tau_1, \tau_2, ..., \tau_n)$$
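The two formulas above can be sketched in a few lines. This example is illustrative only: it uses plain dictionaries of NumPy arrays in place of real checkpoints, takes Merge to be a scaled sum (the Task Arithmetic case), and assumes `scaling_coef` multiplies the summed task vectors. The helper names `task_vector` and `merge_task_arithmetic` are introduced here and are not part of the repository.

```python
import numpy as np


def task_vector(finetuned, pretrained):
    """tau_i = theta_ft_i - theta_pt, computed per parameter tensor."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def merge_task_arithmetic(pretrained, task_vectors, scaling_coef=1.0):
    """theta_merged = theta_pt + scaling_coef * sum_i tau_i."""
    merged = {}
    for name in pretrained:
        total = sum(tv[name] for tv in task_vectors)
        merged[name] = pretrained[name] + scaling_coef * total
    return merged


# Toy "checkpoints" with a single two-element parameter.
theta_pt = {"w": np.array([1.0, 2.0])}
theta_ft_1 = {"w": np.array([1.5, 2.0])}
theta_ft_2 = {"w": np.array([1.0, 3.0])}

taus = [task_vector(theta_ft_1, theta_pt), task_vector(theta_ft_2, theta_pt)]
theta_merged = merge_task_arithmetic(theta_pt, taus, scaling_coef=0.5)
print(theta_merged["w"])
```

Here the task vectors are (0.5, 0) and (0, 1), so the merged parameter is (1, 2) + 0.5 * (0.5, 1) = (1.25, 2.5).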
Singular Value Calibration (SVC): Our proposed method quantifies the subspace overlap between task updates and rescales the inflated singular values to restore a balanced spectrum.
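SVC is described earlier as quantifying subspace overlap between task updates. As a rough illustration of what such a measurement can look like, the sketch below uses a standard linear-algebra quantity: the squared Frobenius norm of the product of two orthonormal bases, normalized by the subspace dimension. This is an assumption made for illustration and is not necessarily the overlap measure used by SVC; `subspace_overlap` is a hypothetical helper.

```python
import numpy as np


def subspace_overlap(a, b, k):
    """Overlap in [0, 1] between the top-k left singular subspaces of a and b.

    Equals the mean squared cosine of the principal angles between the subspaces.
    """
    ua = np.linalg.svd(a, full_matrices=False)[0][:, :k]
    ub = np.linalg.svd(b, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(ua.T @ ub, "fro") ** 2 / k)


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthonormal columns

shared = np.outer(Q[:, 0], Q[:, 1])
tau_a = 3.0 * shared                    # same direction, different strength
tau_b = 2.0 * shared
tau_c = np.outer(Q[:, 2], Q[:, 3])      # direction orthogonal to the shared one

overlap_same = subspace_overlap(tau_a, tau_b, k=1)
overlap_disjoint = subspace_overlap(tau_a, tau_c, k=1)
print(round(overlap_same, 3), round(overlap_disjoint, 3))
```

An overlap near 1 flags directions that a plain sum would count more than once; an overlap near 0 indicates task-specific directions that do not accumulate.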
- ✅ No Task ID Needed: Merged model works on all tasks without task-specific routing
- ✅ Parameter Efficient: Single model replaces N fine-tuned models
- ✅ Knowledge Preservation: Combines learned knowledge across tasks
- ⚠️ Challenge: Preventing negative transfer between tasks
Add new method to merge_func.py:
import torch

@torch.no_grad()
def MyMethod(task_vector_avg, task_vectors, config):
    """
    Args:
        task_vector_avg: Initial averaged task vector
        task_vectors: List[TaskVector] of all tasks
        config: Configuration object with hyperparameters
    Returns:
        TaskVector: Merged task vector
    """
    # Your implementation
    print(f"Processing {len(task_vectors)} task vectors...")
    for key in task_vector_avg.vector:
        # Modify task_vector_avg.vector[key]
        pass
    return task_vector_avg
Then register in main.py:
merge_methods = {
    'TA': TA,
    'MyMethod': MyMethod,
}
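To see the template and the registry working together, here is a self-contained toy run. Everything in it is a stand-in: `ToyTaskVector` only mimics the `.vector` dictionary used in the template, `ta_placeholder` is not the repository's TA implementation, and the element-wise median is just an example body for `MyMethod`.

```python
import numpy as np


class ToyTaskVector:
    """Hypothetical stand-in for TaskVector: a thin wrapper around a dict of arrays."""

    def __init__(self, vector):
        self.vector = vector


def ta_placeholder(task_vector_avg, task_vectors, config):
    """Placeholder baseline that returns the initial vector unchanged."""
    return task_vector_avg


def MyMethod(task_vector_avg, task_vectors, config):
    """Example body: replace each entry with the element-wise median across tasks."""
    for key in task_vector_avg.vector:
        stacked = np.stack([tv.vector[key] for tv in task_vectors])
        task_vector_avg.vector[key] = np.median(stacked, axis=0)
    return task_vector_avg


# Name-to-function registry, mirroring the dictionary shown above.
merge_methods = {"TA": ta_placeholder, "MyMethod": MyMethod}

tasks = [
    ToyTaskVector({"w": np.array([0.0, 1.0])}),
    ToyTaskVector({"w": np.array([2.0, 1.0])}),
    ToyTaskVector({"w": np.array([10.0, 4.0])}),
]
avg = ToyTaskVector({"w": np.mean([t.vector["w"] for t in tasks], axis=0)})

# Dispatch by name, as a --merge option would.
merged = merge_methods["MyMethod"](avg, tasks, config=None)
print(merged.vector["w"])  # element-wise medians: 2 and 1
```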
- Create new dataset class in src/datasets/:
from src.datasets.common import AbstractDataset

class MyDataset(AbstractDataset):
    def __init__(self, root, split='train'):
        # Load your dataset
        pass
- Register in src/datasets/registry.py
Typical evaluation metrics:
- Accuracy per dataset
- Average accuracy across all tasks
- Runtime and memory consumption
Results logged in: logs/ViT-B-32/log_YYYYMMDD_HHMMSS_mainV2.txt
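The first two metrics are simple to compute by hand when checking a log. The sketch below is an assumption about how they are defined (per-dataset accuracy as the fraction of correct predictions, and the average as an unweighted mean over datasets); the repository's eval.py may compute them differently, and the predictions here are made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def summarize(per_dataset):
    """Return per-dataset accuracy plus the unweighted average across datasets."""
    results = {name: accuracy(p, y) for name, (p, y) in per_dataset.items()}
    # The right-hand side is evaluated before the "average" key is added.
    results["average"] = sum(results.values()) / len(results)
    return results


# Toy predictions and labels for two of the tasks named in the checkpoint layout.
toy_outputs = {
    "Cars": ([0, 1, 1, 0], [0, 1, 0, 0]),
    "DTD": ([2, 2], [2, 1]),
}
summary = summarize(toy_outputs)
print(summary)
```

Because the average is taken over datasets rather than over samples, a small dataset such as the toy DTD entry weighs as much as a large one.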
Problem: Checkpoint Loading Fails
RuntimeError: Unable to load checkpoint
Solution: Use pickle instead of torch.load
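import pickle
# Load with pickle; the context manager closes the file handle afterwards.
with open('checkpoint.pt', 'rb') as f:
    ckpt = pickle.load(f)
# The checkpoint may be a full model object or a plain state dict.
state = ckpt.state_dict() if hasattr(ckpt, 'state_dict') else ckpt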
Solutions:
- Use GPU: ensure CUDA is available
- Reduce number of evaluation samples
- Use multiprocessing in eval.py
📚 Related Work & References
If this work helps your research, please cite:
@article{SVC2026,
title={When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging},
author={Li, Yayuan and Peng, Ze and Zhang, Jian and Guo, Jintao and Duan, Yue and Shi, Yinghuan},
journal={arXiv preprint arXiv:2602.05536},
year={2026}
}
MIT License - See LICENSE file for details
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
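One-line summary: SVC is a training-free, data-free singular value calibration method that improves the stability and generalization of multi-task model merging.
SVC is a model merging framework. Building on research on Task Vectors, this project proposes Singular Value Calibration (SVC) to merge multiple fine-tuned models stably and efficiently.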
# 1. Set up the environment
conda create -n svc python=3.10 -y
conda activate svc
pip install torch torchvision transformers scipy tqdm Pillow
# 2. Download the model checkpoints to checkpoints/ViT-B-32/
# 3. Run your first experiment
python main.py --model ViT-B-32 --merge TA
python main.py --model ViT-B-32 --merge TA --c # with SVC
bash run.sh # run all methods
python main.py [OPTIONS]
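Main options:
- --model: Model architecture (ViT-B-32, ViT-L-14, etc.)
- --merge: Merging method (TA, WA, SA, TIES, DARE, STAR, iso_c, iso_cts, etc.)
- --c: Enable Singular Value Calibration (SVC)
- --alpha: Calibration weight parameter
- --scaling_coef: Task vector scaling factor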
Task vector: the difference between the fine-tuned and pre-trained weights:
$$\tau_i = \theta_{ft,i} - \theta_{pt}$$
Merging objective: combine n task vectors into a single model while preserving the knowledge of all tasks:
$$\theta_{merged} = \theta_{pt} + \text{Merge}(\tau_1, \tau_2, ..., \tau_n)$$
New merging methods, datasets, and models can be added easily. See the comments in the source code for details.