dbmdz DistilBERT model

In this repository, the MDZ Digital Library team (dbmdz) at the Bavarian State Library open-sources a German Europeana DistilBERT model.

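German Europeana DistilBERT

We use the open source Europeana newspapers that were provided by The European Library. The final training corpus has a size of 51GB and consists of 8,035,986,369 tokens.

Detailed information about the data and pretraining steps can be found in this repository.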

Results

For results on historic named entity recognition (NER), please refer to this repository.

Usage

With Transformers >= 4.3, our German Europeana DistilBERT model can be loaded as follows:

from transformers import AutoModel, AutoTokenizer

model_name = "dbmdz/distilbert-base-german-europeana-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
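Once loaded, the model can be used as a feature extractor. The following is a minimal sketch, not part of the original model card, of one common way to turn the token-level outputs into a single vector per sentence by mean pooling over non-padding tokens. The helper names `mean_pool` and `embed_sentences` are hypothetical, the sketch assumes PyTorch is installed, and mean pooling is an illustrative choice rather than a method recommended by the authors.

```python
import torch

MODEL_NAME = "dbmdz/distilbert-base-german-europeana-cased"


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padding positions.

    Hypothetical helper for illustration only.
    last_hidden_state: (batch, seq_len, hidden) float tensor
    attention_mask:    (batch, seq_len) tensor of 0/1
    """
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden dim.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed_sentences(sentences, model_name=MODEL_NAME):
    """Return one mean-pooled vector per input sentence (hypothetical helper)."""
    # Imported here so that mean_pool can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Pass only the inputs DistilBERT expects.
        output = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        )
    return mean_pool(output.last_hidden_state, batch["attention_mask"])
```

Calling `embed_sentences(["Die Zeitung erschien im Jahr 1890."])` downloads the model on first use and returns a tensor with one row per input sentence. For downstream tasks such as historic NER, fine-tuning the model is usually preferable to using frozen pooled features.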

Huggingface model hub

All other German Europeana models are available on the Huggingface model hub.

Contact (Bugs, Feedback, Contribution and more)

For questions about our Europeana BERT, ELECTRA and ConvBERT models, just open a new discussion here.

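Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️

Thanks to the generous support from the Hugging Face team, it is possible to download both cased and uncased models from their S3 storage.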

Author:

Bayerische Staatsbibliothek

Dataset size:

613.87 MB