Model:

l3cube-pune/marathi-bert-scratch


MahaBERT-Scratch

MahaBERT is a Marathi BERT model. This variant is a base BERT model trained from scratch on L3Cube-MahaCorpus and other publicly available Marathi monolingual datasets. [dataset link](https://github.com/l3cube-pune/MarathiNLP)

More details on the dataset, models, and baseline results can be found in our [paper](https://arxiv.org/abs/2202.01159).
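Because MahaBERT-Scratch is a BERT model, one quick way to probe it is masked-token prediction. The sketch below is a minimal example, assuming the `transformers` library with a PyTorch backend is installed, that the Hub checkpoint ships a masked-language-modeling head, and that it can be downloaded. The helper names `build_masked_sentence` and `predict_masked` and the example sentence are hypothetical illustrations, not an official API of this model.

```python
"""Minimal fill-mask sketch for l3cube-pune/marathi-bert-scratch.

Assumption: `transformers` (plus a backend such as PyTorch) is installed and the
checkpoint, including a masked-LM head, can be fetched from the Hugging Face Hub.
"""
from __future__ import annotations

MODEL_ID = "l3cube-pune/marathi-bert-scratch"


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Hypothetical helper: replace the "{mask}" placeholder with the model's mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a {mask} placeholder")
    return template.replace("{mask}", mask_token)


def predict_masked(template: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Hypothetical helper: return the top_k (token, score) predictions for the masked slot."""
    # Imported lazily so build_masked_sentence works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    text = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]
```

For example, `predict_masked("मी आज {mask} जाणार आहे.")` ("I am going to {mask} today.") would return the five highest-scoring candidate tokens with their scores. Keeping the `transformers` import inside `predict_masked` lets the string helper be used and tested without downloading the model.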

A better version of this model is available here.

@InProceedings{joshi:2022:WILDRE6,
  author    = {Joshi, Raviraj},
  title     = {L3Cube-MahaCorpus and MahaBERT: Marathi Monolingual Corpus, Marathi BERT Language Models, and Resources},
  booktitle = {Proceedings of The WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {97--101}
}

Other models trained from scratch are listed below: Marathi-Scratch, Marathi-Tweets-Scratch, Hindi-Scratch, Dev-Scratch, Kannada-Scratch, Telugu-Scratch, Malayalam-Scratch, Gujarati-Scratch

Better versions of the monolingual Indic BERT models are listed below: Marathi BERT, Marathi RoBERTa, Marathi ALBERT

Hindi BERT, Hindi RoBERTa, Hindi ALBERT

Dev BERT, Dev RoBERTa, Dev ALBERT

Kannada BERT, Telugu BERT, Malayalam BERT, Tamil BERT, Gujarati BERT, Oriya BERT, Bengali BERT, Punjabi BERT