中文

MahaBERT

MahaBERT is a Marathi BERT model. It is a multilingual BERT (google/muril-base-cased) model fine-tuned on L3Cube-MahaCorpus and other publicly available Marathi monolingual datasets. [dataset link] ( https://github.com/l3cube-pune/MarathiNLP )

More details on the dataset, models, and baseline results can be found in our [paper] ( https://arxiv.org/abs/2202.01159 )

@inproceedings{joshi-2022-l3cube,
    title = "{L}3{C}ube-{M}aha{C}orpus and {M}aha{BERT}: {M}arathi Monolingual Corpus, {M}arathi {BERT} Language Models, and Resources",
    author = "Joshi, Raviraj",
    booktitle = "Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.wildre-1.17",
    pages = "97--101",
}

Other Monolingual Indic BERT models are listed below: Marathi BERT Marathi RoBERTa Marathi AlBERT

Hindi BERT Hindi RoBERTa Hindi AlBERT

Dev BERT Dev RoBERTa Dev AlBERT

Kannada BERT Telugu BERT Malayalam BERT Tamil BERT Gujarati BERT Oriya BERT Bengali BERT Punjabi BERT Assamese BERT