Joint magnitude pruning, quantization and distillation on BERT-base/SST-2

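This model applies unstructured magnitude pruning, quantization and distillation simultaneously to BERT-base while fine-tuning it on the GLUE SST2 dataset. It achieves the following results on the evaluation set: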

Torch accuracy: 0.9128
OpenVINO IR accuracy: 0.9128
Sparsity in transformer block linear layers: 0.80
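
The sparsity figure above is the fraction of exactly-zero weights among the linear layers inside the transformer blocks. As a rough illustration of what unstructured magnitude pruning does, the following minimal sketch zeroes the smallest-magnitude 80% of entries in toy weight matrices and then measures the resulting sparsity. It uses plain Python lists rather than the NNCF implementation; the helper names `magnitude_prune` and `layer_sparsity` and the toy layer shapes are hypothetical, and pruning each layer separately is an assumption (the actual run may rank weights differently).

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight matrix with the smallest-magnitude
    `sparsity` fraction of entries set to zero (unstructured pruning)."""
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by the absolute value of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    num_pruned = round(rows * cols * sparsity)
    pruned = [row[:] for row in weights]
    for r, c in order[:num_pruned]:
        pruned[r][c] = 0.0
    return pruned


def layer_sparsity(layers):
    """Fraction of exactly-zero entries across a list of weight matrices."""
    total = sum(len(row) for w in layers for row in w)
    zeros = sum(1 for w in layers for row in w for v in row if v == 0.0)
    return zeros / total


# Toy stand-ins for linear-layer weights (hypothetical shapes, not BERT's).
random.seed(0)
toy_layers = [
    [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(10)],
    [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(20)],
]

pruned_layers = [magnitude_prune(w, sparsity=0.80) for w in toy_layers]
print(f"sparsity after pruning: {layer_sparsity(pruned_layers):.2f}")
```

The same counting logic could be applied to a real checkpoint by iterating over the weight tensors of the encoder's linear layers, although which layers are included is a detail of the NNCF configuration that is not given here.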

Setup

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install optimum[openvino,nncf]==1.7.0
pip install datasets sentencepiece scipy scikit-learn protobuf evaluate
pip install wandb # optional

Training script

See https://gist.github.com/yujiepan-work/5d7e513a47b353db89f6e1b512d7c080

Run

We use a single GPU for training.

NNCFCFG=/path/to/nncf_config/json
python run_glue.py \
  --lr_scheduler_type cosine_with_restarts \
  --cosine_lr_scheduler_cycles 11 6 \
  --record_best_model_after_epoch 9 \
  --load_best_model_at_end True \
  --metric_for_best_model accuracy \
  --model_name_or_path textattack/bert-base-uncased-SST-2 \
  --teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
  --distillation_temperature 2 \
  --task_name sst2 \
  --nncf_compression_config $NNCFCFG \
  --distillation_weight 0.95 \
  --output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80 \
  --overwrite_output_dir \
  --run_name bert-base-uncased-sst2-int8-unstructured80 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-05 \
  --optim adamw_torch \
  --num_train_epochs 17 \
  --logging_steps 1 \
  --evaluation_strategy steps \
  --eval_steps 250 \
  --save_strategy steps \
  --save_steps 250 \
  --save_total_limit 1 \
  --fp16 \
  --seed 1
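
The `--distillation_temperature 2` and `--distillation_weight 0.95` flags control how the teacher's predictions are blended with the ground-truth labels. The sketch below shows one common formulation of a knowledge-distillation loss: a temperature-softened KL term between teacher and student, scaled by the squared temperature, combined with ordinary cross-entropy. This is an assumption about the general technique, not a transcription of the linked training script, which may combine the terms differently; the function names `softmax` and `distillation_loss` and the example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, distillation_weight=0.95):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy.

    Assumed form: w * T^2 * KL(teacher_T || student_T) + (1 - w) * CE.
    """
    # Hard-label cross-entropy on the unsoftened student distribution.
    hard_loss = -math.log(softmax(student_logits)[label])

    # KL divergence between temperature-softened teacher and student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )

    return (distillation_weight * temperature ** 2 * soft_loss
            + (1.0 - distillation_weight) * hard_loss)


# Example with two-class logits, as in SST-2 (values are made up).
loss = distillation_loss(
    student_logits=[0.2, 1.1],
    teacher_logits=[-0.5, 2.3],
    label=1,
)
print(f"combined loss: {loss:.4f}")
```

With a weight of 0.95, the student is trained almost entirely to match the teacher's softened distribution, and the ground-truth labels contribute only 5% of the loss under this formulation.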

Framework versions

Transformers 4.26.0
PyTorch 1.13.1+cu116
Datasets 2.8.0
Tokenizers 0.13.2
Optimum 1.6.3
Optimum-intel 1.7.0
NNCF 2.4.0

Author:

OpenVINO Toolkit

Dataset size:

917.83 MB