TinyBERT
model paper
A compressed BERT model trained with a Transformer-specific knowledge distillation method. The 4-layer variant reaches 96.8% of BERT-base performance on GLUE while being 7.5x smaller and 9.4x faster at inference. It introduces a two-stage learning framework that distills from the teacher both during pre-training (general distillation) and during fine-tuning (task-specific distillation). One of the most influential model compression works in NLP.
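To make "Transformer distillation" concrete, here is a minimal sketch of the kind of layer-wise objective the method is built on: the student is trained to match the teacher's attention matrices and hidden states at mapped layers, and the teacher's output distribution at the prediction layer. This is an illustration in plain Python, not the authors' code. All function names are hypothetical, the uniform layer mapping and the learned projection (needed because the student's hidden size is smaller than the teacher's) are written as assumptions about the setup, and the embedding-layer loss is omitted.

```python
import math


def uniform_layer_map(student_layers, teacher_layers):
    """Map student layer m (1-indexed) to teacher layer m * (N / M).

    Assumes the teacher depth is a multiple of the student depth.
    """
    step = teacher_layers // student_layers
    return [step * (m + 1) for m in range(student_layers)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = [math.log(p) for p in softmax(student_logits, temperature)]
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))


def layer_distillation_loss(student_attn, teacher_attn,
                            student_hidden, teacher_hidden, projection):
    """Attention loss plus hidden-state loss for one mapped layer pair.

    `projection` is a (student_dim x teacher_dim) matrix that lifts the
    student's hidden states into the teacher's hidden size before comparison.
    """
    attn_loss = mse(student_attn, teacher_attn)
    hidden_loss = mse(matmul(student_hidden, projection), teacher_hidden)
    return attn_loss + hidden_loss


# A 4-layer student distilled from a 12-layer teacher.
layer_map = uniform_layer_map(4, 12)  # [3, 6, 9, 12]

# Toy tensors: 2 tokens, student hidden size 2, teacher hidden size 3.
student_attn = [[0.5, 0.5], [0.5, 0.5]]
teacher_attn = [[0.6, 0.4], [0.3, 0.7]]
student_hidden = [[1.0, 0.0], [0.0, 1.0]]
projection = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
teacher_hidden = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

loss = layer_distillation_loss(student_attn, teacher_attn,
                               student_hidden, teacher_hidden, projection)
```

In a real training loop these terms would be summed over every mapped layer pair and minimized with gradient descent in a tensor library. The two-stage framework would apply such an objective first on general-domain text and then again on task data.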