DanskBERT

This is DanskBERT, a Danish language model. Note that when using the model directly, you should not put a space before the mask token!
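
The note about the mask can be illustrated with a minimal sketch. It assumes the model is published on the Hugging Face Hub under the id `vesteinn/DanskBERT` and that the mask token is the RoBERTa-style `<mask>`; neither is stated in this card. The helper names and the Danish example sentence are hypothetical and only serve as illustration.

```python
MASK_TOKEN = "<mask>"  # assumption: RoBERTa-style mask token


def build_masked_input(prefix: str, suffix: str = "", mask_token: str = MASK_TOKEN) -> str:
    """Join prefix, mask and suffix so that no space directly precedes the mask."""
    return prefix.rstrip() + mask_token + suffix


def top_predictions(text: str, model_id: str = "vesteinn/DanskBERT", k: int = 5):
    """Return (token, score) pairs for a text containing exactly one mask token.

    The model id is an assumption. Calling this function downloads the model.
    """
    from transformers import pipeline  # imported lazily; requires `transformers`

    fill = pipeline("fill-mask", model=model_id)
    return [(r["token_str"], r["score"]) for r in fill(text, top_k=k)]


# Illustrative sentence: "Copenhagen is the capital of <mask>."
text = build_masked_input("København er hovedstaden i ", ".")
print(text)
```

Calling `top_predictions(text)` would then query the model. It is left out of the top-level code here because it needs network access and the `transformers` package.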

The model is the best performer on the ScandEval benchmark for Danish.

DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021).

DanskBERT was trained with fairseq using the RoBERTa-base configuration. Training used a batch size of 2k on 16 V100 cards and ran for roughly two weeks, reaching convergence at 500k steps.
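
To put these training figures in perspective, here is a rough back-of-the-envelope calculation. It assumes that "2k" means 2048 sequences per update and that "roughly two weeks" is exactly 14 days; both are assumptions, so the results are estimates only. The card does not say whether gradient accumulation was used, so the per-GPU figure is per update rather than per forward pass.

```python
BATCH_SIZE = 2048      # assumption: "2k" read as 2048 sequences per update
NUM_GPUS = 16          # from the card: 16 V100 cards
TOTAL_STEPS = 500_000  # from the card: 500k steps
TRAINING_DAYS = 14     # assumption: "roughly two weeks" read as 14 days


def training_budget(batch_size: int, num_gpus: int, total_steps: int, days: int) -> dict:
    """Derive rough throughput figures from the reported training setup."""
    seconds = days * 24 * 60 * 60
    return {
        "sequences_per_gpu_per_update": batch_size // num_gpus,
        "total_sequences_seen": batch_size * total_steps,
        "seconds_per_update": seconds / total_steps,
        "gpu_hours": num_gpus * days * 24,
    }


budget = training_budget(BATCH_SIZE, NUM_GPUS, TOTAL_STEPS, TRAINING_DAYS)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions each card handles 128 sequences per update, the model sees about one billion sequences in total, and one update takes a little under 2.5 seconds.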

If you find this model useful, please cite

@inproceedings{snaebjarnarson-etal-2023-transfer,
    title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese",
    author = "Snæbjarnarson, Vésteinn  and
      Simonsen, Annika  and
      Glavaš, Goran  and
      Vulić, Ivan",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = "may 22--24",
    year = "2023",
    address = "Tórshavn, Faroe Islands",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},
}

Author:

Vésteinn Snæbjarnarson

Dataset size:

955 MB