DynaBERT
Dynamic BERT model with adaptive width and depth, so that model size and latency can be adjusted at runtime. Training distills knowledge from the full-sized model into smaller sub-networks, after network rewiring that lets the more important attention heads be shared by more sub-networks. Published at NeurIPS 2020.
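The width-adaptive idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rewire_heads` and `select_heads`, the importance scores, and the width multipliers are all hypothetical and chosen only for illustration. The sketch assumes each attention head already has a scalar importance score. It reorders heads so that the most important come first, then keeps the leading fraction for a given width multiplier. Adaptive depth is not covered.

```python
def rewire_heads(importance):
    """Return head indices ordered from most to least important.

    `importance` is a list of per-head scores (hypothetical values here).
    """
    return sorted(range(len(importance)), key=lambda h: importance[h], reverse=True)


def select_heads(order, width_mult):
    """Keep the leading fraction of heads from a rewired ordering.

    `width_mult` is an assumed width multiplier in (0, 1].
    At least one head is always kept.
    """
    keep = max(1, int(len(order) * width_mult))
    return order[:keep]


# Hypothetical importance scores for a 4-head layer.
scores = [0.2, 0.9, 0.1, 0.5]
order = rewire_heads(scores)        # [1, 3, 0, 2]
half = select_heads(order, 0.5)     # [1, 3]
full = select_heads(order, 1.0)     # [1, 3, 0, 2]
```

Because every sub-network takes a prefix of the same ordering, the heads of a narrower sub-network are always a subset of those of a wider one, which is what allows the most important heads to be shared across sizes.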