MiniCPM4.1
Adds InfLLM-v2 trainable sparse attention to MiniCPM4, supporting both deep reasoning and non-reasoning modes, with a 3x decoding speedup on reasoning tasks. InfLLM-v2 enables dense-sparse switchable attention for seamless short-to-long context adaptation.
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
The trainable sparse attention mechanism powering MiniCPM4 and 4.1. In a 128K context, each token computes relevance against fewer than 5% of the other tokens.
arXiv: 2509.24663
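To make the "fewer than 5% of tokens" idea concrete, here is a minimal sketch of block-level sparse attention: keys are pooled into block representatives, each query scores the blocks, and full attention runs only over the top-scoring blocks. This is an illustrative simplification, not InfLLM-v2's actual algorithm or kernel; the function name, mean-pooling choice, and block/top-k sizes are assumptions for the toy example.

```python
import numpy as np

def sparse_block_attention(q, k, v, block_size=4, top_k=2):
    """Toy block-sparse attention (hypothetical sketch, not InfLLM-v2):
    each query attends only to its top_k most relevant key/value blocks."""
    n, d = k.shape
    n_blocks = n // block_size
    # Cheap block representative: mean-pool the keys inside each block.
    block_reps = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, qi in enumerate(q):
        # Score every block against the query, keep only the top_k blocks.
        block_scores = block_reps @ qi
        chosen = np.argsort(block_scores)[-top_k:]
        idx = np.concatenate(
            [np.arange(b * block_size, (b + 1) * block_size) for b in chosen]
        )
        # Dense softmax attention restricted to the selected tokens.
        scores = k[idx] @ qi / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ v[idx]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = sparse_block_attention(q, k, v)  # each query touches 8 of 16 tokens
```

With `block_size=4` and `top_k=2`, each query attends to 8 of 16 tokens (50%); at 128K context the same selection scheme scales to the sub-5% regime the paper targets, since the number of selected blocks stays fixed while the context grows.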