DistilBart-MNLI

distilbart-mnli is a distilled version of bart-large-mnli, created with the No Teacher Distillation technique that Huggingface proposed for BART summarization.

We simply copy alternating layers from bart-large-mnli and then fine-tune further on the same data.
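As a rough illustration of the layer-copying step, the sketch below picks evenly spaced teacher layer indices for a smaller student. The helper names `pick_layers_to_copy` and `copy_layers` are hypothetical, and the even-spacing rule is an assumption; the actual `create_student.py` script may select its layers differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pick_layers_to_copy(n_teacher: int, n_student: int) -> List[int]:
    """Return evenly spaced teacher layer indices (assumed selection rule)."""
    if not 1 <= n_student <= n_teacher:
        raise ValueError("n_student must be between 1 and n_teacher")
    if n_student == 1:
        return [0]  # a one-layer student keeps only the first teacher layer
    # Spread the student layers across the full depth of the teacher.
    return [round(i * (n_teacher - 1) / (n_student - 1)) for i in range(n_student)]


def copy_layers(teacher_layers: Sequence[T], n_student: int) -> List[T]:
    """Select the teacher layers that initialise the student.

    With real torch modules each selected layer would be deep-copied;
    plain Python objects stand in for layers here.
    """
    indices = pick_layers_to_copy(len(teacher_layers), n_student)
    return [teacher_layers[i] for i in indices]


# Stand-in for the 12 decoder layers of the teacher.
teacher_decoder = [f"decoder_layer_{i}" for i in range(12)]
student_decoder = copy_layers(teacher_decoder, 6)
print(pick_layers_to_copy(12, 6))
print(student_decoder)
```

The student starts from weights the teacher has already learned, which is why further fine-tuning on the same data can recover most of the accuracy.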

Model	matched acc	mismatched acc
1235321 (baseline, 12-12)	89.9	90.01
1236321	87.08	87.5
1237321	88.1	88.19
1238321	89.19	89.01
1239321	89.56	89.52

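This is a very simple and effective technique: as the table shows, the performance drop is small, ranging from about 0.3 to 2.8 points of matched accuracy relative to the baseline.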
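To make the size of the drop concrete, the short sketch below subtracts each row's accuracy from the baseline. The numbers and row labels are copied from the table above; the function name `accuracy_drops` is just an illustrative choice.

```python
from typing import Dict, Tuple

# (matched acc, mismatched acc) per row, copied from the table above.
BASELINE = "1235321"
RESULTS: Dict[str, Tuple[float, float]] = {
    "1235321": (89.9, 90.01),
    "1236321": (87.08, 87.5),
    "1237321": (88.1, 88.19),
    "1238321": (89.19, 89.01),
    "1239321": (89.56, 89.52),
}


def accuracy_drops(
    results: Dict[str, Tuple[float, float]], baseline: str
) -> Dict[str, Tuple[float, float]]:
    """Return baseline-minus-row accuracy, in points, for every non-baseline row."""
    base_matched, base_mismatched = results[baseline]
    return {
        name: (round(base_matched - m, 2), round(base_mismatched - mm, 2))
        for name, (m, mm) in results.items()
        if name != baseline
    }


for name, (d_matched, d_mismatched) in accuracy_drops(RESULTS, BASELINE).items():
    print(f"{name}: matched -{d_matched:.2f}, mismatched -{d_mismatched:.2f}")
```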

Detailed performance trade-offs will be posted in this sheet.

Fine-tuning

If you want to train these models yourself, clone the distillbart-mnli repo and follow the steps below.

Clone and install transformers from source

git clone https://github.com/huggingface/transformers.git
pip install -qqq -U ./transformers

Download the MNLI data

python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI

Create the student model

python create_student.py \
  --teacher_model_name_or_path facebook/bart-large-mnli \
  --student_encoder_layers 12 \
  --student_decoder_layers 6 \
  --save_path student-bart-mnli-12-6

Start fine-tuning

python run_glue.py args.json
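The contents of `args.json` are not shown here. As a sketch only, the snippet below builds one plausible file, assuming `run_glue.py` reads its arguments from JSON the way the transformers GLUE example did (through `HfArgumentParser.parse_json_file`). Every key and value is an illustrative assumption, not the author's configuration, and the script in the repo may expect different fields. The helper name `build_args` is hypothetical.

```python
import json
from typing import Any, Dict


def build_args(student_dir: str, data_dir: str, output_dir: str) -> Dict[str, Any]:
    """Assemble a hypothetical fine-tuning config; all values are placeholders."""
    return {
        "model_name_or_path": student_dir,  # student saved by create_student.py
        "task_name": "mnli",
        "data_dir": data_dir,               # where the MNLI data was downloaded
        "output_dir": output_dir,
        "do_train": True,
        "do_eval": True,
        "max_seq_length": 128,              # placeholder value
        "per_device_train_batch_size": 32,  # placeholder value
        "learning_rate": 3e-5,              # placeholder value
        "num_train_epochs": 3,              # placeholder value
    }


config = build_args("student-bart-mnli-12-6", "glue_data/MNLI", "distilbart-mnli-out")
# Save this text as args.json before launching run_glue.py.
print(json.dumps(config, indent=2))
```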

You can find the logs of these trained models in this wandb project.

Author:

Suraj Patil

Dataset size:

2.29 GB