"BitNet: Scaling 1-Bit Transformers for Large Language Models" (October 2023) replaces a Transformer's linear layers with BitLinear layers that use binary {-1, +1} weights. The follow-up BitNet b1.58 (February 2024, "The Era of 1-bit LLMs") moves to ternary {-1, 0, 1} weights, matching full-precision Transformer performance while dramatically reducing memory and compute.
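A minimal sketch of the two weight-quantization schemes, assuming the BitNet formulations: the original binary scheme takes the sign of the mean-centered weights, while b1.58's absmean scheme scales by the mean absolute value, then rounds and clips to {-1, 0, 1}. Function names are illustrative, not from the papers' code.

```python
import numpy as np

def binary_quantize(W, eps=1e-6):
    """BitNet-style binarization: sign of mean-centered weights,
    with a per-tensor scale beta = mean(|W|)."""
    beta = np.abs(W).mean()
    Wb = np.sign(W - W.mean())
    Wb[Wb == 0] = 1.0  # sign(0) maps to +1 so values stay in {-1, +1}
    return Wb, beta

def absmean_ternary(W, eps=1e-6):
    """BitNet b1.58-style ternarization: scale by mean(|W|),
    then round and clip to {-1, 0, 1}."""
    gamma = np.abs(W).mean()
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return Wq, gamma

W = np.array([[0.9, -0.05, -1.2],
              [0.3,  0.0,   2.0]])
Wb, beta = binary_quantize(W)     # values in {-1, +1}
Wq, gamma = absmean_ternary(W)    # values in {-1, 0, 1}
```

At inference the quantized matrix multiply reduces to additions and subtractions (plus skipping zeros in the ternary case), with the scalar beta or gamma applied afterward to rescale the output.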

BitNet opened a new research direction in extreme quantization from scratch: training in low precision rather than applying post-training quantization. By Wang, Ma, Dong et al. at Microsoft Research Asia.

Paper

arXiv: 2310.11453

foundational, efficiency