EdgeRazor is a lightweight framework for edge AI, designed to train models that are smaller, faster, and deployable across diverse hardware, from mobile and edge endpoints to latency-sensitive clouds. It integrates model compression techniques into existing full-precision training pipelines with minimal code modification, preserving task performance while enabling low-cost, high-efficiency computation.
EdgeRazor currently focuses on low-bit LLM compression via configurable quantization-aware distillation. For quantization, EdgeRazor supports quantizing weights (including the embedding and lm_head layers), activations, and the KV cache. Supported weight bit-widths include uniform 1.58-bit and 4-bit, as well as matrix-wise mixed precision, such as 2.79-bit (50% 4-bit + 50% 1.58-bit) and 1.88-bit (12.5% 4-bit + 87.5% 1.58-bit). For distillation, EdgeRazor offers logits, feature, and attention distillation, all of which can be flexibly combined within a unified configuration interface.
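The mixed-precision bit-widths quoted above follow from a weighted average of the per-matrix bit-widths. The snippet below is a minimal sketch of that arithmetic only; `average_bit_width` is a hypothetical helper written for illustration, not part of the EdgeRazor API, and it assumes the percentages refer to the fraction of weights stored at each bit-width.

```python
# Average bits per weight for a matrix-wise mixed-precision scheme.
# `mix` maps a bit-width to the fraction of weights stored at that width
# (assumption: the percentages in the text are fractions of weights).

def average_bit_width(mix: dict[float, float]) -> float:
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(bits * frac for bits, frac in mix.items())


half_and_half = average_bit_width({4.0: 0.5, 1.58: 0.5})
mostly_ternary = average_bit_width({4.0: 0.125, 1.58: 0.875})

print(f"{half_and_half:.2f}-bit")   # 50% 4-bit + 50% 1.58-bit
print(f"{mostly_ternary:.2f}-bit")  # 12.5% 4-bit + 87.5% 1.58-bit
```

The second configuration evaluates to 1.8825 bits before rounding, which matches the 1.88-bit label used in this README.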
EdgeRazor achieves state-of-the-art performance across a range of models, including base LLMs, instruction-tuned LLMs, and multimodal LLMs. With 8-bit activations and KV cache (A8-KV8), Qwen3-0.6B-EdgeRazor attains average scores of 47.80 / 44.10 / 41.76 / 39.81 at 4-bit / 2.79-bit / 1.88-bit / 1.58-bit weights, corresponding to compression ratios of 3.94× / 5.05× / 6.40× / 7.03×, respectively. In comparison, the best prior methods achieve 45.74 / 37.38 / 30.49 at 4-bit / 3-bit / 2-bit with compression ratios of 2.21× / 2.47× / 2.78×.
Figure: The EdgeRazor framework with lightweight model training pipeline.
After installation, you can integrate EdgeRazor into your existing training pipeline to build lightweight models.
Seamlessly integrate EdgeRazor into your FULL-PRECISION model training pipeline!
Below are LLM examples of both high-level and low-level API usage.
Lightweight models are available from checkpoints trained with EdgeRazor. For example, you can convert Qwen3-EdgeRazor-4bit checkpoints to Q4_0 GGUF models. We also provide ready-to-use quantized models in our collection, including Qwen3-0.6B-EdgeRazor-GGUF and Qwen3-1.7B-EdgeRazor-GGUF.
EdgeRazor Playground is CPU-friendly! Enjoy low-bit LLMs from EdgeRazor on your edge devices!
Quantization-Aware Distillation (QAD):
Figure: Workflow of the EdgeRazor framework.
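To make the two ingredients of quantization-aware distillation concrete, here is a minimal, framework-free sketch of (1) ternary weight quantization and (2) a logits distillation loss. It is an illustration under stated assumptions, not EdgeRazor's implementation: the absmean ternary quantizer follows the scheme popularized by BitNet b1.58, the temperature-scaled KL loss is the standard knowledge-distillation formulation, and all function names are hypothetical. Real training would run inside an autograd framework, typically with a straight-through estimator for the rounding step.

```python
import math


def ternary_quantize(weights):
    """Absmean ternary quantization: each weight becomes {-1, 0, +1} * scale.

    Assumption: this is one common 1.58-bit quantizer, not necessarily
    the one EdgeRazor uses.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0.0] * len(weights), 0.0
    # Round to the nearest integer, then clamp to the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return [c * scale for c in codes], scale


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def logits_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Illustrative values only; they are not taken from any real model.
dequantized, scale = ternary_quantize([0.9, -0.05, -1.2, 0.3])
loss = logits_distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
print(dequantized, scale, loss)
```

In a full pipeline, the student's forward pass would use the quantized weights while the full-precision teacher supplies the target logits, so the loss pushes the low-bit model toward the teacher's output distribution.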
Average Performance (Avg.): the mean of the performance scores across the evaluated tasks, measured with lm-eval v0.4.9.1.
Hub Link: We provide the original quantized checkpoints. Where compatible, we also convert the checkpoints into GGUF (llama.cpp) and GPTQ (GPTQModel, work in progress) formats.
| Qwen3-0.6B | W16-A16-KV16 | - | 47.35 | Base |
| Qwen3-0.6B | W4-A8-KV8 | 256 | 47.80 | EdgeRazor, Q4_0 |
| Qwen3-0.6B | W2.79-A8-KV8 | 256 | 44.10 | EdgeRazor |
| Qwen3-0.6B | W1.88-A8-KV8 | 256 | 41.76 | EdgeRazor |
| Qwen3-0.6B | W1.58-A8-KV8 | 256 | 39.81 | EdgeRazor, TQ1_0, TQ2_0 |
| Qwen3-1.7B | W16-A16-KV16 | - | 58.65 | Base |
| Qwen3-1.7B | W4-A8-KV8 | 256 | 58.57 | EdgeRazor, Q4_0 |
| Qwen3-1.7B | W2.79-A8-KV8 | 256 | 53.00 | EdgeRazor |
| Qwen3-1.7B | W1.88-A8-KV8 | 256 | 47.14 | EdgeRazor |
| Qwen3-1.7B | W1.58-A8-KV8 | 256 | 43.91 | EdgeRazor, TQ1_0, TQ2_0 |
| MobileLLM-350M | W16-A16-KV16 | - | 41.18 | Base |
| MobileLLM-350M | W4-A8-KV8 | 64 | 41.86 | EdgeRazor |
| MobileLLM-350M | W2.79-A8-KV8 | 64 | 40.62 | EdgeRazor |
| MobileLLM-350M | W1.88-A8-KV8 | 64 | 39.32 | EdgeRazor |
| MobileLLM-350M | W1.58-A8-KV8 | 64 | 38.12 | EdgeRazor |
| Qwen2.5-Omni-7B | W16-A16-KV16 | - | 62.81 | 48.01 | Base |
| Qwen2.5-Omni-7B | W4-A16-KV16 | 32 | 62.22 | 48.82 | EdgeRazor |
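One way to read the tables is to express each quantized model's average score as a percentage of its full-precision baseline. The sketch below does this for the Qwen3-0.6B rows, using the scores exactly as listed above; `retention_percent` is a hypothetical helper introduced only for this illustration.

```python
# Average scores copied from the Qwen3-0.6B rows of the table.
BASE_AVG = 47.35  # W16-A16-KV16
EDGERAZOR_AVG = {
    "W4-A8-KV8": 47.80,
    "W2.79-A8-KV8": 44.10,
    "W1.88-A8-KV8": 41.76,
    "W1.58-A8-KV8": 39.81,
}


def retention_percent(score: float, base: float) -> float:
    """Quantized score as a percentage of the full-precision score."""
    return 100.0 * score / base


for precision, score in EDGERAZOR_AVG.items():
    pct = retention_percent(score, BASE_AVG)
    print(f"{precision}: {pct:.2f}% of the W16-A16-KV16 average")
```

The same calculation applies to the Qwen3-1.7B and MobileLLM-350M rows by swapping in their baseline and quantized averages.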
EdgeRazor is continuously evolving! Here's what's coming:
Have ideas or suggestions? We welcome and appreciate any contributions and collaborations! Please feel free to submit issues or pull requests! 🚀
The deployment demos utilize llama.cpp and ChatterUI.
If you find our paper and code useful in your research, please consider citing our paper ✏️:
This project was supported by LAMDA and Assistant Professor Shao-Qun Zhang. Shu-Hao Zhang is the core developer and maintainer of EdgeRazor-V1. Xiang-Sheng Deng and Le-Tong Huang jointly participated in the development of this project.