Model:
bowphs/LaBerta
The paper Exploring Large Language Models for Classical Philology is the first to systematically provide state-of-the-art language models for the field of Classical Philology. LaBerta is a monolingual, encoder-only variant at the RoBERTa-base model size.
The model was trained on Corpus Corporum.
Further information can be found in our paper or in our GitHub repository.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('bowphs/LaBerta')
model = AutoModelForMaskedLM.from_pretrained('bowphs/LaBerta')
```
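Once loaded, the model can be queried for masked-token predictions. A minimal sketch using the `transformers` fill-mask pipeline (the Latin example sentence is illustrative, not from the paper):

```python
from transformers import pipeline

# Build a fill-mask pipeline from the pretrained checkpoint.
fill_mask = pipeline('fill-mask', model='bowphs/LaBerta')

# Predict the masked token in a Latin sentence (illustrative example).
results = fill_mask(f"Gallia est omnis divisa in partes {fill_mask.tokenizer.mask_token}.")
for r in results:
    print(r['token_str'], round(r['score'], 4))
```

Each result is a candidate token for the masked position together with its score.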
Please refer to Hugging Face's excellent tutorials on how to fine-tune our models.
When fine-tuned on the PoS tagging data of EvaLatin 2022, LaBerta achieves the following results:
| Task | Classical | Cross-genre | Cross-time |
|---|---|---|---|
| PoS tagging | 98.11 | 96.73 | 93.33 |
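The fine-tuning procedure itself is covered in the tutorials linked above; as a rough, hypothetical sketch of how the encoder can be equipped with a token-level tagging head (`num_labels=17` is an assumption based on the Universal Dependencies UPOS tag set, not a detail from the paper):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Attach a randomly initialized token-classification head to the encoder.
# num_labels=17 is a placeholder (the UD UPOS tag set size); the actual
# label inventory depends on the EvaLatin 2022 data.
tokenizer = AutoTokenizer.from_pretrained('bowphs/LaBerta')
model = AutoModelForTokenClassification.from_pretrained('bowphs/LaBerta', num_labels=17)

# Forward pass on a single (illustrative) Latin sentence.
inputs = tokenizer("Arma virumque cano.", return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# One score per label for every token in the sequence.
print(logits.shape)
```

This head is untrained; actual fine-tuning would optimize it on labeled PoS data with a standard token-classification training loop.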
If you have any questions or run into problems, please don't hesitate to contact us.
```bibtex
@incollection{riemenschneiderfrank:2023,
  address   = "Toronto, Canada",
  author    = "Riemenschneider, Frederick and Frank, Anette",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23)",
  note      = "to appear",
  pubType   = "incollection",
  publisher = "Association for Computational Linguistics",
  title     = "Exploring Large Language Models for Classical Philology",
  url       = "https://arxiv.org/abs/2305.13698",
  year      = "2023",
  key       = "riemenschneiderfrank:2023"
}
```