

This is DanskBERT, a Danish language model. Note that when using the model directly, you should not put a space before the mask token!
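The no-space rule is easy to break when building prompts by string concatenation. The reason is presumably that, in a RoBERTa-style subword vocabulary, the space before a word is encoded as part of that word's token, so the mask has to stand in for the space as well. The sketch below assumes the RoBERTa-style `<mask>` token, which the text above does not state. The helpers `make_masked_prompt` and `has_space_before_mask` are hypothetical illustrations, not part of any DanskBERT API.

```python
import re

# Assumption: DanskBERT uses the RoBERTa-style mask token "<mask>".
MASK_TOKEN = "<mask>"


def make_masked_prompt(left: str, right: str = "", mask_token: str = MASK_TOKEN) -> str:
    """Join left context, mask token, and right context with no space before the mask.

    Trailing whitespace on the left context is stripped, so the mask token
    directly follows the preceding word. The right context is kept unchanged.
    """
    return left.rstrip() + mask_token + right


def has_space_before_mask(text: str, mask_token: str = MASK_TOKEN) -> bool:
    """Return True if any occurrence of the mask token is preceded by whitespace."""
    return re.search(r"\s" + re.escape(mask_token), text) is not None


prompt = make_masked_prompt("København er ", " i Danmark.")
print(prompt)                         # København er<mask> i Danmark.
print(has_space_before_mask(prompt))  # False
```

You could then pass a string built this way to a Hugging Face `transformers` fill-mask pipeline. The Hub identifier for DanskBERT is not given here, so confirm it before use.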

The model achieves the best performance on the ScandEval benchmark for Danish.

DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021).

DanskBERT was trained in fairseq using the RoBERTa-base configuration. It was trained with a batch size of 2k on 16 V100 GPUs and converged after 500k steps, which took approximately two weeks.
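The training figures above support some back-of-the-envelope arithmetic. The sketch rests on several assumptions that are not stated in the text:

* "2k" is read as 2048 sequences per optimizer step.
* "Approximately two weeks" is taken as exactly 14 days.
* Training uses pure data parallelism with no gradient accumulation.
* The 512-token sequence length is the usual RoBERTa-base maximum. It is not a reported DanskBERT setting.

The `TrainingRun` class is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    batch_size: int  # global sequences per optimizer step (assumed 2048 for "2k")
    num_gpus: int
    steps: int
    days: float      # wall-clock duration (assumed 14 for "about two weeks")

    def per_gpu_batch(self) -> float:
        # Assumes pure data parallelism without gradient accumulation.
        return self.batch_size / self.num_gpus

    def total_sequences(self) -> int:
        # Sequences seen over the whole run.
        return self.batch_size * self.steps

    def seconds_per_step(self) -> float:
        # Average wall-clock time per optimizer step.
        return self.days * 24 * 3600 / self.steps

    def total_tokens_upper_bound(self, seq_len: int = 512) -> int:
        # Upper bound: assumes every sequence is filled to seq_len tokens.
        return self.total_sequences() * seq_len


run = TrainingRun(batch_size=2048, num_gpus=16, steps=500_000, days=14)
print(run.per_gpu_batch())               # 128.0
print(run.total_sequences())             # 1024000000
print(round(run.seconds_per_step(), 2))  # 2.42
```

Under these assumptions, each V100 processes 128 sequences per step, and the run averages roughly 2.4 seconds per step.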


    title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese",
    author = "Snæbjarnarson, Vésteinn  and
      Simonsen, Annika  and
      Glavaš, Goran  and
      Vulić, Ivan",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = "may 22--24",
    year = "2023",
    address = "Tórshavn, Faroe Islands",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},