
LaBerta

The paper Exploring Large Language Models for Classical Philology is the first systematic effort to provide state-of-the-art language models for classical philology. LaBerta is a monolingual, encoder-only variant of the same size as RoBERTa-base.

The model was trained on the Corpus Corporum.

Further information can be found in our paper or in our GitHub repository.

Usage

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the LaBerta tokenizer and the pre-trained masked-language-modelling head
tokenizer = AutoTokenizer.from_pretrained('bowphs/LaBerta')
model = AutoModelForMaskedLM.from_pretrained('bowphs/LaBerta')
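
As a quick sanity check, the masked-language-modelling head can also be queried through the fill-mask pipeline. This is a minimal sketch: the Latin example sentence is purely illustrative, and we assume the tokenizer uses RoBERTa's <mask> token, since LaBerta is a RoBERTa variant.

from transformers import pipeline

# Illustrative only: ask the model to fill in a masked Latin token
fill_mask = pipeline('fill-mask', model='bowphs/LaBerta')
predictions = fill_mask('Gallia est omnis divisa in partes <mask>.')

# Each prediction carries the proposed token and its probability
for p in predictions:
    print(p['token_str'], p['score'])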

Please refer to the excellent Hugging Face tutorials on how to fine-tune our models, for instance as sketched below.
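
As a hedged sketch of what such fine-tuning could look like for PoS tagging (the task evaluated below), the encoder can be loaded with a token-classification head. The label set here is a hypothetical placeholder, not part of the original card; training would then proceed with the standard Trainer API on token-aligned labels.

from transformers import AutoModelForTokenClassification

# Placeholder label subset for illustration only
labels = ['NOUN', 'VERB', 'ADJ', 'ADP', 'PUNCT']

# Attach a freshly initialized token-classification head to the encoder
model = AutoModelForTokenClassification.from_pretrained(
    'bowphs/LaBerta',
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)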

Evaluation Results

When fine-tuned on the PoS tagging data of EvaLatin 2022, LaBerta achieves the following results:

Task    Classical    Cross-genre    Cross-time
PoS     98.11        96.73          93.33

Contact

If you have any questions or problems, please feel free to reach out to us.

Citation

@incollection{riemenschneiderfrank:2023,
    address = "Toronto, Canada",
    author = "Riemenschneider, Frederick and Frank, Anette",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23)",
    note = "to appear",
    pubType = "incollection",
    publisher = "Association for Computational Linguistics",
    title = "Exploring Large Language Models for Classical Philology",
    url = "https://arxiv.org/abs/2305.13698",
    year = "2023",
    key = "riemenschneiderfrank:2023"
}