RobBERTje finetuned for sentiment analysis on DBRD

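This is a version of RobBERTje (merged) fine-tuned for sentiment analysis. We used DBRD, which consists of book reviews from hebban.nl, so our example sentences are all about books. We ran some limited experiments to test whether the model also works in other domains, but the results were not very good.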

We released a distilled model and a base-sized model. Both perform well, so the distilled model gives up only a little accuracy:

Model                        Identifier   Layers   Parameters   Accuracy (%)
RobBERT (v2)                 1233321      12       116M         93.3*
RobBERTje - Merged (p=0.5)   1234321      6        74M          92.9

* The RobBERT result differs from the one reported in the paper.
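To make the "slight performance tradeoff" concrete, the short calculation below uses only the figures from the table above (the RobBERT accuracy is the value reported in this card, not the one in the paper). It is a worked-arithmetic sketch, not part of the released code.

```python
# Figures copied from the table above.
models = {
    "RobBERT (v2)": {"layers": 12, "params_m": 116, "accuracy": 93.3},
    "RobBERTje - Merged (p=0.5)": {"layers": 6, "params_m": 74, "accuracy": 92.9},
}

base = models["RobBERT (v2)"]
distilled = models["RobBERTje - Merged (p=0.5)"]

# Absolute accuracy drop, in percentage points (rounded to hide float noise).
accuracy_drop = round(base["accuracy"] - distilled["accuracy"], 1)
# Fraction of parameters removed by distillation.
param_reduction = 1 - distilled["params_m"] / base["params_m"]
# Fraction of transformer layers kept.
layer_ratio = distilled["layers"] / base["layers"]

print(f"accuracy drop: {accuracy_drop} pp")            # accuracy drop: 0.4 pp
print(f"parameter reduction: {param_reduction:.1%}")   # parameter reduction: 36.2%
print(f"layers kept: {layer_ratio:.0%}")               # layers kept: 50%
```

In other words, halving the number of layers and removing roughly a third of the parameters costs about 0.4 accuracy points on this task.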

Training data and setup

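We used the Dutch Book Reviews Dataset (DBRD) from van der Burgh et al. (2019). The reviews originally had five-star ratings, which were converted to positive (⭐️⭐️⭐️⭐️ and ⭐️⭐️⭐️⭐️⭐️), neutral (⭐️⭐️⭐️) and negative (⭐️ and ⭐️⭐️). We used 19.5k reviews for training, 528 reviews for validation, and 2224 reviews to compute the final accuracy.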
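The star-to-label conversion described above can be sketched as a small mapping function. The label strings and the sample ratings are illustrative assumptions; the card does not state how the labels were encoded for training.

```python
from collections import Counter


def stars_to_label(stars: int) -> str:
    """Collapse a 1-5 star rating into the three classes used for training."""
    if stars not in range(1, 6):
        raise ValueError(f"expected a rating from 1 to 5, got {stars}")
    if stars >= 4:      # 4 or 5 stars
        return "positive"
    if stars == 3:      # 3 stars
        return "neutral"
    return "negative"   # 1 or 2 stars


# Made-up ratings, only to show the mapping.
ratings = [5, 4, 3, 2, 1, 4]
label_counts = Counter(stars_to_label(r) for r in ratings)
print(label_counts)
```

Raising on out-of-range input, rather than silently mapping it to a class, makes malformed ratings visible during preprocessing.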

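The validation set was used to evaluate a random hyperparameter search over the learning rate, weight decay and number of gradient accumulation steps. The full training details are stored in training_args.bin, a binary PyTorch file.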
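A random search over these three hyperparameters might look roughly like the sketch below. The search ranges, the log-uniform sampling of the learning rate, the trial count and the toy objective are all hypothetical; the actual settings are not stated in this card. In a real run, `evaluate` would fine-tune the model with the sampled configuration and return its accuracy on the 528 validation reviews.

```python
import math
import random

# HYPOTHETICAL search space: the ranges actually searched are not stated here.
SEARCH_SPACE = {
    "learning_rate": (1e-6, 1e-4),                # sampled log-uniformly
    "weight_decay": (0.0, 0.1),                   # sampled uniformly
    "gradient_accumulation_steps": [1, 2, 4, 8],  # sampled from a fixed set
}


def sample_config(rng: random.Random) -> dict:
    """Draw one random hyperparameter configuration from SEARCH_SPACE."""
    lr_low, lr_high = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lr_low), math.log10(lr_high)),
        "weight_decay": rng.uniform(*SEARCH_SPACE["weight_decay"]),
        "gradient_accumulation_steps": rng.choice(SEARCH_SPACE["gradient_accumulation_steps"]),
    }


def random_search(evaluate, n_trials: int, seed: int = 0):
    """Try n_trials random configs; return the best one and its validation score."""
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)  # in practice: fine-tune, then score on the validation set
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_score(config: dict) -> float:
    """HYPOTHETICAL stand-in objective that peaks near lr=3e-5 and weight_decay=0."""
    return -abs(math.log10(config["learning_rate"]) + 4.5) - config["weight_decay"]


best_config, best_score = random_search(toy_validation_score, n_trials=20)
print(best_config, best_score)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is a common choice for random search. To see the settings actually used, a training_args.bin written by the Hugging Face `Trainer` can usually be opened with `torch.load("training_args.bin", weights_only=False)` (with `transformers` installed), which returns a `TrainingArguments` object; this assumes the file was produced that way.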

Limitations and biases

The training data is limited to the domain of book reviews.
Most of the book reviews were written by women, which may have caused a difference in performance for reviews written by men and women.
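One way to check for such a gap is to compute accuracy separately per author group, as in the sketch below. It assumes each evaluation review comes with an author-gender annotation, which is hypothetical here; the card does not say where such labels would come from, and the example predictions are made up.

```python
from collections import defaultdict


def accuracy_by_group(examples):
    """examples: iterable of (group, gold_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, pred in examples:
        total[group] += 1
        correct[group] += int(gold == pred)
    return {group: correct[group] / total[group] for group in total}


# Made-up predictions, only to show the computation.
examples = [
    ("female", "positive", "positive"),
    ("female", "negative", "negative"),
    ("female", "neutral", "positive"),
    ("female", "positive", "positive"),
    ("male", "positive", "negative"),
    ("male", "negative", "negative"),
]
scores = accuracy_by_group(examples)
gap = scores["female"] - scores["male"]
print(scores, f"gap: {gap:.2f}")  # {'female': 0.75, 'male': 0.5} gap: 0.25
```

With realistic group sizes, a confidence interval or significance test on the gap would be needed before drawing conclusions, since small groups produce noisy accuracies.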

Credits and citation

This project was created by Pieter Delobelle, Thomas Winters and Bettina Berendt. If you would like to cite our paper or models, you can use the following BibTeX:

@article{Delobelle_Winters_Berendt_2021,
    title        = {RobBERTje: A Distilled Dutch BERT Model},
    author       = {Delobelle, Pieter and Winters, Thomas and Berendt, Bettina},
    year         = 2021,
    month        = {Dec.},
    journal      = {Computational Linguistics in the Netherlands Journal},
    volume       = 11,
    pages        = {125--140},
    url          = {https://www.clinjournal.org/clinj/article/view/131}
}


@inproceedings{delobelle2020robbert,
    title = "{R}ob{BERT}: a {D}utch {R}o{BERT}a-based {L}anguage {M}odel",
    author = "Delobelle, Pieter  and
      Winters, Thomas  and
      Berendt, Bettina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.292",
    doi = "10.18653/v1/2020.findings-emnlp.292",
    pages = "3255--3265"
}

Author:

DTAI Research Group, KU Leuven

Dataset size:

285.37 MB