Family of autoregressive code generation models from 350M to 16.1B parameters. Trained in a multi-stage paradigm: natural language (NL) → multilingual code (Multi) → monolingual Python specialization (Mono).

Competitive with OpenAI Codex at release. Introduced the Multi-Turn Programming Benchmark (MTPB). Published at ICLR 2023 by Nijkamp, Pang, Hayashi, et al. Released under Apache 2.0.
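
As a concrete illustration of the multi-turn setup, here is a minimal sketch using Hugging Face transformers and the published Salesforce/codegen-350M-mono checkpoint. The two-turn prompt mimics the MTPB interaction pattern (each turn's natural-language spec is appended as a comment to the running program); the specs themselves are illustrative, not taken from MTPB.

```python
# Minimal sketch: prompting a CodeGen checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # smallest Python-specialized variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion of `prompt`; returns prompt + generated code."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,  # CodeGen tokenizer has no pad token
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Turn 1: a natural-language spec as a comment; the model writes the code.
program = complete("# Return the n-th Fibonacci number\ndef fib(n):")

# Turn 2 (MTPB-style): append the next spec to the running program and continue.
program = complete(program + "\n\n# Print fib(10)\n")
print(program)
```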

Model Details

Architecture: Dense
Parameters: 16.1B (largest variant)

Variants

Name          Parameters  Notes
CodeGen-350M  350M        NL / Multi / Mono variants
CodeGen-2B    2B          NL / Multi / Mono variants
CodeGen-6B    6B          NL / Multi / Mono variants
CodeGen-16B   16.1B       NL / Multi / Mono variants
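
The released checkpoints follow a regular naming scheme on the Hugging Face Hub; the sketch below assumes the published Salesforce/codegen-* repositories and simply enumerates their IDs.

```python
# Construct Hub IDs for the released CodeGen checkpoints.
# Pattern (assumed from the published Salesforce repos):
#   Salesforce/codegen-{size}-{stage}
SIZES = ["350M", "2B", "6B", "16B"]
STAGES = ["nl", "multi", "mono"]  # natural language -> multilingual code -> Python

checkpoints = [f"Salesforce/codegen-{size}-{stage}" for size in SIZES for stage in STAGES]
print(checkpoints[-1])  # -> Salesforce/codegen-16B-mono
```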

Paper

Title: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

arXiv: 2203.13474

Venue: ICLR 2023

coding · open-weight