Model:
s-nlp/roberta_toxicity_classifier
This model was trained for the toxicity classification task. The training dataset was merged from the English parts of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020) and contains about 2 million examples. We split it into two parts and fine-tuned a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on it. The classifier performs well on the test set of the first Jigsaw competition, achieving an AUC-ROC of 0.98 and an F1 score of 0.76.
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference
model(batch)
```
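The snippet above discards the model output. As a minimal sketch of how it could be interpreted, the following converts the logits to class probabilities; the label mapping (index 0 = neutral, index 1 = toxic) is an assumption not stated in this card and should be checked against the model's `id2label` config.

```python
import torch

# run inference without tracking gradients
with torch.no_grad():
    logits = model(batch).logits

# convert logits to class probabilities
probs = torch.softmax(logits, dim=-1)

# assumption: index 0 = neutral, index 1 = toxic (verify via model.config.id2label)
toxic_prob = probs[0, 1].item()
print(f"toxic probability: {toxic_prob:.3f}")
```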
This model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.