Codestral Mamba
A 7.3B-parameter code model using the Mamba2 state-space architecture (not a transformer). Linear-time inference with theoretically unlimited context, tested up to 256K tokens. Co-developed with Mamba's original authors. Released under the Apache 2.0 license.
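The linear-time claim follows from the state-space formulation: generation carries a fixed-size recurrent state forward instead of attending over a key-value cache that grows with the context. A minimal sketch of a scalar, non-selective SSM recurrence, purely illustrative and not Codestral Mamba's actual parameterization:

```python
# Toy diagonal state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# Per-step memory is O(state size), independent of sequence length --
# unlike transformer attention, whose KV cache grows with every token.
# Constants a, b, c are arbitrary illustrative values.

def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    h = 0.0  # fixed-size recurrent state (a single scalar here)
    ys = []
    for x in xs:
        h = a * h + b * x   # state update: one multiply-add per token
        ys.append(c * h)    # readout
    return ys

# Impulse response decays geometrically; cost per token is constant.
print(ssm_scan([1.0, 0.0, 0.0]))
```

The same constant-state property is what lets a Mamba-style model be evaluated on very long inputs without the quadratic attention cost.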
Model Details
Architecture: Dense (Mamba2 state-space)
Parameters: 7.3B