Model: DTAI-KULeuven/robbertje-1-gb-non-shuffled
Task: fill-mask
Datasets: oscar, dbrd, lassy-ud, europarl-mono, conll2002
Language: nl
Preprint: arxiv:2101.05716
License: mit

RobBERTje is a collection of distilled models based on RobBERT. There are multiple models with different sizes and different training settings, which you can choose for your use case.
We are also continuously working on releasing better-performing models, so watch the repository for updates.
Model | Description | Parameters | Training size | Huggingface id |
---|---|---|---|---|
Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | this model |
Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | 1234321 |
Merged (p=0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | 1235321 |
BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (and 12 for RobBERT). | 46 M | 1 GB | 1236321 |
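Since these are fill-mask models, any of the checkpoints above can be loaded with the standard `transformers` pipeline. A minimal sketch using this model's id (the Dutch example sentence is our own and purely illustrative):

```python
from transformers import pipeline

# Load the non-shuffled 1 GB RobBERTje checkpoint for masked-token prediction.
unmasker = pipeline("fill-mask", model="DTAI-KULeuven/robbertje-1-gb-non-shuffled")

# RobBERT(je) uses a RoBERTa-style tokenizer, so the mask token is <mask>.
predictions = unmasker("Er staat een <mask> in mijn tuin.")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

Each prediction is a dict with the filled-in token (`token_str`) and the model's probability for it (`score`), sorted from most to least likely.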
We calculated the pseudo-perplexity (PPPL) following [cite], a built-in metric in our distillation library. This metric gives an indication of how well the model captures the input distribution.
Model | PPPL |
---|---|
RobBERT (teacher) | 7.76 |
Non-shuffled | 12.95 |
Shuffled | 18.74 |
Merged (p=0.5) | 17.10 |
BORT | 26.44 |
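The PPPL scores above follow the pseudo-log-likelihood formulation: mask each token in turn, score the true token with the masked LM, then exponentiate the negative mean log-probability. A minimal sketch of the aggregation step, assuming `token_logprobs` already holds the per-token masked-LM log-probabilities (the model calls themselves are out of scope here):

```python
import math

def pseudo_perplexity(token_logprobs):
    """Aggregate per-token masked-LM log-probabilities into PPPL.

    PPPL = exp(-(1/N) * sum_i log P(token_i | rest of sentence)),
    where each entry is the log-probability the masked LM assigns to
    the true token when only that position is masked.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Example with made-up log-probabilities:
print(round(pseudo_perplexity([-1.0, -2.0, -3.0]), 4))  # → 7.3891
```

Lower is better: a PPPL of 1.0 would mean the model assigns probability 1 to every held-out token, so the gap between the teacher (7.76) and BORT (26.44) quantifies how much distributional fidelity each distillation setting gives up.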
We also evaluated our models on several downstream tasks, just like the RobBERT teacher model. Since that evaluation, a Dutch NLI task named SICK-NL was also released, and we evaluated our models on it as well.
Model | DBRD | DIE-DAT | NER | POS | SICK-NL |
---|---|---|---|---|---|
RobBERT (teacher) | 94.4 | 99.2 | 89.1 | 96.4 | 84.2 |
Non-shuffled | 90.2 | 98.4 | 82.9 | 95.5 | 83.4 |
Shuffled | 92.5 | 98.2 | 82.7 | 95.6 | 83.4 |
Merged (p=0.5) | 92.9 | 96.5 | 81.8 | 95.2 | 82.8 |
BORT | 89.6 | 92.2 | 79.7 | 94.3 | 81.0 |