A dynamic quantized merging framework for large language models that integrates task-specific routing with 1-bit quantized task vectors. It leverages the observation that different task-specific models store knowledge in distinct layers (chat models in attention layers; math and code models in MLP layers) to balance performance against storage efficiency when merging domain-specific fine-tuned models.
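As a rough illustration of the 1-bit task-vector idea, a weight delta between a fine-tuned model and its base can be compressed to its per-element sign plus a single scalar scale. This is a minimal NumPy sketch of sign-plus-scale quantization under that assumption; function names are hypothetical and this is not the paper's implementation:

```python
import numpy as np

def quantize_task_vector_1bit(base, finetuned):
    """Compress a task vector (finetuned - base) to signs plus one scale."""
    delta = finetuned - base
    sign = np.sign(delta)             # +1/-1/0 per weight (1 bit once packed)
    scale = np.abs(delta).mean()      # per-tensor scale keeps average magnitude
    return sign, scale

def dequantize(sign, scale):
    """Reconstruct an approximate task vector from signs and scale."""
    return sign * scale

# Toy example: 4 weights of one layer.
base = np.array([0.5, -0.2, 0.1, 0.3])
ft   = np.array([0.7, -0.4, 0.1, 0.2])
sign, scale = quantize_task_vector_1bit(base, ft)
approx = base + dequantize(sign, scale)   # merged weights using the 1-bit delta
```

Storing only signs and a scale reduces each task vector to roughly 1 bit per parameter, which is what makes keeping several domain-specific deltas around for routing cheap.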


1bit-Merging: Dynamic Quantized Merging for Large Language Models

Paper: arXiv:2502.10743

Tags: model-merging, efficiency, nlp