MiniCPM4.1
Adds InfLLM-v2 trainable sparse attention to MiniCPM4, supporting both deep reasoning and non-reasoning modes, with a 3x decoding speedup on reasoning tasks. InfLLM-v2 enables dense-sparse switchable attention for seamless short-to-long context adaptation.
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
The trainable sparse attention mechanism powering MiniCPM4 and 4.1. In a 128K context, each token computes relevance against fewer than 5% of the other tokens.
arXiv: 2509.24663
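To make the "fewer than 5% of tokens" idea concrete, here is a minimal sketch of block-level sparse attention: keys are pooled into block representatives, each query scores the blocks, and full attention runs only over the top-scoring blocks. This is an illustrative simplification, not InfLLM-v2's actual algorithm or kernel; the function name, mean-pooling choice, and block/top-k sizes are assumptions for the toy example.

```python
import numpy as np

def sparse_block_attention(q, k, v, block_size=4, top_k=2):
    """Toy block-sparse attention (hypothetical sketch, not InfLLM-v2):
    each query attends only to its top_k most relevant key/value blocks."""
    n, d = k.shape
    n_blocks = n // block_size
    # Cheap block representative: mean-pool the keys inside each block.
    block_reps = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, qi in enumerate(q):
        # Score every block against the query, keep only the top_k blocks.
        block_scores = block_reps @ qi
        chosen = np.argsort(block_scores)[-top_k:]
        idx = np.concatenate(
            [np.arange(b * block_size, (b + 1) * block_size) for b in chosen]
        )
        # Dense softmax attention restricted to the selected tokens.
        scores = k[idx] @ qi / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ v[idx]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = sparse_block_attention(q, k, v)  # each query touches 8 of 16 tokens
```

With `block_size=4` and `top_k=2`, each query attends to 8 of 16 tokens (50%); at 128K context the same selection scheme scales to the sub-5% regime the paper targets, since the number of selected blocks stays fixed while the context grows.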