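Dataset: sick
Tasks: Text Classification
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotations creators: crowdsourced
License: cc-by-nc-sa-3.0

Shared and internationally recognized benchmarks are fundamental to the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic, and semantic phenomena that CDSMs are expected to account for, but that do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are outside the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and the entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1 and is freely available for research purposes.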
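The two gold annotations described above can be modeled as a small validated record. The following is a minimal sketch, not part of the dataset's tooling: the class name `SickPair`, its field names, and the assumption that the 5-point scale runs from 1 to 5 are all introduced here for illustration. The values in the usage lines are copied from the example instance shown in this card.

```python
from dataclasses import dataclass

# The three gold entailment labels named in the description.
ENTAILMENT_LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class SickPair:
    """Hypothetical container for one annotated sentence pair."""

    sentence_a: str
    sentence_b: str
    relatedness: float  # gold relatedness score (assumed range: 1 to 5)
    entailment: str     # one of ENTAILMENT_LABELS

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1..5 range.
        if not 1.0 <= self.relatedness <= 5.0:
            raise ValueError(f"relatedness out of range: {self.relatedness}")
        # Reject labels other than the three gold labels.
        if self.entailment not in ENTAILMENT_LABELS:
            raise ValueError(f"unknown entailment label: {self.entailment}")


pair = SickPair(
    sentence_a="A group of kids is playing in a yard and an old man is standing in the background",
    sentence_b="A group of boys in a yard is playing and a man is standing in the background",
    relatedness=4.5,
    entailment="neutral",
)
print(pair.relatedness, pair.entailment)
```

Validating at construction time keeps malformed rows from propagating silently into an evaluation; whether that strictness is desirable depends on how the data is loaded.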
The dataset is in English.
Example instance:
{
  "entailment_AB": "A_neutral_B",
  "entailment_BA": "B_neutral_A",
  "label": 1,
  "id": "1",
  "relatedness_score": 4.5,
  "sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
  "sentence_A_dataset": "FLICKR",
  "sentence_A_original": "A group of children playing in a yard, a man in the background.",
  "sentence_B": "A group of boys in a yard is playing and a man is standing in the background",
  "sentence_B_dataset": "FLICKR",
  "sentence_B_original": "A group of children playing in a yard, a man in the background."
}
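As a rough illustration of how the fields in this instance might be read, the sketch below parses an abbreviated copy of the record with Python's standard `json` module and extracts the relation from the two directional tags. The helper `relation` and the assumption that every tag has the shape `X_relation_Y` are hypothetical; only the `A_neutral_B` and `B_neutral_A` values are confirmed by the instance above.

```python
import json

# Abbreviated copy of the example instance (only the fields used below).
RAW = """
{
  "entailment_AB": "A_neutral_B",
  "entailment_BA": "B_neutral_A",
  "label": 1,
  "id": "1",
  "relatedness_score": 4.5,
  "sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
  "sentence_B": "A group of boys in a yard is playing and a man is standing in the background"
}
"""


def relation(tag: str) -> str:
    """Return the middle token of a tag such as 'A_neutral_B'.

    Assumes the tag has exactly three underscore-separated parts.
    """
    _first, rel, _second = tag.split("_")
    return rel


record = json.loads(RAW)
summary = {
    "id": record["id"],
    "label": record["label"],
    "relatedness": record["relatedness_score"],
    "a_to_b": relation(record["entailment_AB"]),
    "b_to_a": relation(record["entailment_BA"]),
}
print(summary)
```

In this instance the integer `label` 1 coincides with a neutral relation in both directions; the integer codes for the other two labels are not shown in this card and should be checked against the dataset's own feature definition.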
Training set: 4,439 instances; trial set: 495; test set: 4,906.
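The split sizes can be checked against the "about 10,000 sentence pairs" figure with a few lines of arithmetic. This is a small sketch; the dictionary keys are labels chosen here and are not necessarily the split names used by any loader.

```python
# Split sizes as stated in this card.
SPLITS = {"train": 4439, "trial": 495, "test": 4906}

# Total number of sentence pairs across all splits.
total = sum(SPLITS.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in SPLITS.items()}

print(total)
print(shares)
```

The splits sum to 9,840 pairs, and the test set is slightly larger than the training set, which is unusual and worth keeping in mind when comparing against benchmarks with a conventional train-heavy split.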
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
@inproceedings{marelli-etal-2014-sick,
    title = "A {SICK} cure for the evaluation of compositional distributional semantic models",
    author = "Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto",
    booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)",
    month = may,
    year = "2014",
    address = "Reykjavik, Iceland",
    publisher = "European Language Resources Association (ELRA)",
    url = "http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf",
    pages = "216--223",
}
Thanks to @calpt for adding this dataset.