
Toxicity Classification Model

This model was trained for the toxicity classification task. The training data merges the English portions of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020), about 2 million examples in total. We split the data into two parts and fine-tuned a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on it. The classifier performs well on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1 score of 0.76.
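
The reported numbers correspond to standard binary-classification metrics; a minimal sketch of how AUC-ROC and F1 would be computed from the classifier's predictions on a held-out test split (the labels, probabilities, and 0.5 threshold below are illustrative assumptions, not the actual evaluation data):

from sklearn.metrics import roc_auc_score, f1_score

# y_true: gold binary labels (1 = toxic); toxic_probs: predicted toxicity
# probabilities from the model. Both are placeholders for the Jigsaw test split.
y_true = [0, 1, 1, 0]
toxic_probs = [0.05, 0.92, 0.71, 0.30]

auc = roc_auc_score(y_true, toxic_probs)                    # threshold-free ranking quality
f1 = f1_score(y_true, [p > 0.5 for p in toxic_probs])       # F1 at a 0.5 decision threshold
print(f'AUC-ROC: {auc:.2f}, F1: {f1:.2f}')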

How to use

from transformers import RobertaTokenizer, RobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference
model(batch)
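
The call above returns raw logits. A minimal sketch for turning them into class probabilities, assuming label index 0 is the neutral class and index 1 the toxic class (check model.config.id2label to confirm the ordering):

import torch

# run the forward pass without tracking gradients
with torch.no_grad():
    logits = model(batch).logits          # shape: (1, 2)

# softmax converts logits into per-class probabilities
probs = torch.softmax(logits, dim=-1)
print({'neutral': probs[0, 0].item(), 'toxic': probs[0, 1].item()})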

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License