The first any-to-any omnimodal model in the HyperCLOVA X family. It takes text, audio, and vision as both inputs and outputs, using unified next-token prediction over interleaved multimodal sequences. 8B parameters, open weight.
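
As a rough illustration of what "unified next-token prediction over interleaved multimodal sequences" can mean, the toy sketch below maps per-modality token ids into one shared vocabulary and builds (context, target) training pairs from the flattened sequence. Everything in it is an assumption made for illustration: the codebook sizes, the offset scheme, and the helper names (`interleave`, `next_token_pairs`, `modality_of`) are hypothetical and are not taken from the model or its paper.

```python
from typing import Dict, List, Tuple

# Hypothetical per-modality codebook sizes (illustrative only, not the model's).
VOCAB_SIZES: Dict[str, int] = {"text": 1000, "audio": 512, "vision": 2048}


def build_offsets(vocab_sizes: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Assign each modality a contiguous id range in one shared vocabulary."""
    offsets: Dict[str, int] = {}
    start = 0
    for name, size in vocab_sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, TOTAL_VOCAB = build_offsets(VOCAB_SIZES)


def interleave(segments: List[Tuple[str, List[int]]]) -> List[int]:
    """Flatten (modality, local_ids) segments into one shared-vocabulary sequence."""
    seq: List[int] = []
    for modality, ids in segments:
        size = VOCAB_SIZES[modality]
        for local_id in ids:
            if not 0 <= local_id < size:
                raise ValueError(f"{modality} id {local_id} out of range")
            seq.append(OFFSETS[modality] + local_id)
    return seq


def next_token_pairs(seq: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs; the target may belong to any modality."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]


def modality_of(token_id: int) -> str:
    """Recover which modality a shared-vocabulary id belongs to."""
    for name, start in OFFSETS.items():
        if start <= token_id < start + VOCAB_SIZES[name]:
            return name
    raise ValueError(f"token id {token_id} outside shared vocabulary")


if __name__ == "__main__":
    sequence = interleave([("text", [1, 2]), ("audio", [0]), ("vision", [5])])
    for context, target in next_token_pairs(sequence):
        print(len(context), "->", target, modality_of(target))
```

The point of the sketch is only that a single prediction objective can cover all modalities once their tokens share one id space; how the actual model tokenizes audio and vision is not described here.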

Model Details

Architecture: Dense
Parameters: 8B

Paper

arXiv: 2601.01792

multimodal, audio, open-weight

Related