M6 (MultiModality-to-MultiModality Multitask Mega-transformer), a family that includes the sparse M6-T and the 10-trillion-parameter M6-10T models. Predates the Qwen branding.
multimodal, sparse