Family of autoregressive code generation models from 350M to 16.1B parameters. Trained in a multi-stage paradigm: natural language (NL) → multilingual code (Multi) → monolingual Python specialization (Mono).

Competitive with OpenAI Codex at release. Introduced the Multi-Turn Programming Benchmark (MTPB). Published at ICLR 2023 by Nijkamp, Pang, Hayashi, et al. Released under Apache 2.0.
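
As a concrete illustration of the multi-turn setup, here is a minimal sketch using Hugging Face transformers and the published Salesforce/codegen-350M-mono checkpoint. The two-turn prompt mimics the MTPB interaction pattern (each turn's natural-language spec is appended as a comment to the running program); the specs themselves are illustrative, not taken from MTPB.

```python
# Minimal sketch: prompting a CodeGen checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # smallest Python-specialized variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion of `prompt`; returns prompt + generated code."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,  # CodeGen tokenizer has no pad token
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Turn 1: a natural-language spec as a comment; the model writes the code.
program = complete("# Return the n-th Fibonacci number\ndef fib(n):")

# Turn 2 (MTPB-style): append the next spec to the running program and continue.
program = complete(program + "\n\n# Print fib(10)\n")
print(program)
```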

Model Details

Architecture: Dense
Parameters: 16.1B (largest variant)

Variants

Name          Parameters  Notes
CodeGen-350M  350M        NL / Multi / Mono variants
CodeGen-2B    2B          NL / Multi / Mono variants
CodeGen-6B    6B          NL / Multi / Mono variants
CodeGen-16B   16.1B       NL / Multi / Mono variants
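
The released checkpoints follow a regular naming scheme on the Hugging Face Hub; the sketch below assumes the published Salesforce/codegen-* repositories and simply enumerates their IDs.

```python
# Construct Hub IDs for the released CodeGen checkpoints.
# Pattern (assumed from the published Salesforce repos):
#   Salesforce/codegen-{size}-{stage}
SIZES = ["350M", "2B", "6B", "16B"]
STAGES = ["nl", "multi", "mono"]  # natural language -> multilingual code -> Python

checkpoints = [f"Salesforce/codegen-{size}-{stage}" for size in SIZES for stage in STAGES]
print(checkpoints[-1])  # -> Salesforce/codegen-16B-mono
```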

Paper

Title: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

arXiv: 2203.13474

Venue: ICLR 2023

coding · open-weight