"Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture." Proposes a dual-LoRA setup that attaches a reasoning adapter and an embedding adapter to a shared frozen backbone, with gradients detached at the interface between them to avoid the reasoning/embedding conflict that joint multi-task training usually introduces. A self-supervised routing gate then decides, per input, whether to spend chain-of-thought tokens before producing the embedding.
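
As a rough illustration of the idea described above, here is a toy pure-Python sketch of two low-rank adapters sharing one frozen weight matrix, with a gate that decides whether to run the reasoning path first. It is not the authors' implementation. The class names, the `needs_reasoning` heuristic, its threshold, and the way the reasoning output is folded back into the input are all assumptions made for illustration. The paper's gate is learned with self-supervision, whereas this one is a fixed stand-in.

```python
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class LoRAAdapter:
    """Low-rank update delta(x) = B(Ax). Hypothetical toy class.

    Both factors are randomly initialised here so the two adapters
    visibly differ; standard LoRA starts B at zero.
    """

    def __init__(self, dim, rank, rng):
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.B = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))

    def num_params(self):
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


class DualLoRALayer:
    """One frozen base matrix W shared by a reasoning and an embedding adapter."""

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.adapters = {
            "reasoning": LoRAAdapter(dim, rank, rng),
            "embedding": LoRAAdapter(dim, rank, rng),
        }

    def forward(self, x, mode):
        # W is only read, never updated; the selected adapter adds its delta.
        base = matvec(self.W, x)
        return [b + d for b, d in zip(base, self.adapters[mode].delta(x))]


def needs_reasoning(x, threshold=0.5):
    """Stand-in gate (assumption): route to reasoning when mean |x| is large."""
    return sum(abs(v) for v in x) / len(x) > threshold


def embed(layer, x):
    """Return (embedding, used_reasoning) for one input vector."""
    used = needs_reasoning(x)
    if used:
        thought = layer.forward(x, "reasoning")
        # In an autograd framework, `thought` would be detached at this point
        # so that embedding-loss gradients never reach the reasoning adapter.
        x = [xi + ti for xi, ti in zip(x, thought)]
    return layer.forward(x, "embedding"), used


layer = DualLoRALayer(dim=8, rank=1)
emb_easy, reasoned_easy = embed(layer, [0.1] * 8)  # gate skips reasoning
emb_hard, reasoned_hard = embed(layer, [1.0] * 8)  # gate triggers reasoning
```

In this sketch the saving comes from the `if used:` branch: inputs the gate judges easy go straight to the embedding adapter, so the reasoning pass is paid for only when the gate asks for it.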

Adds only 3–5% extra parameters on top of the backbone and uses up to 50% fewer reasoning tokens than a mode that always generates reasoning. Reports state-of-the-art results on the 78-task MMEB-V2 multimodal embedding benchmark. By Longxiang Zhang, Weilong Dai, Guanghao Zhang, Hao Jiang, and Pipei Huang (Alibaba Group).

Authors: Longxiang Zhang · Weilong Dai · Guanghao Zhang · Hao Jiang · Pipei Huang
multimodal · reasoning · embeddings · efficiency