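Dataset: blimp
Task: text classification
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: machine-generated
Annotations creators: crowdsourced
Source datasets: original
License: cc-by-4.0

BLiMP is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs that isolate a specific contrast in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars.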
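A minimal sketch of how a minimal-pair set like this is commonly scored: a model counts as correct on a pair when it gives the acceptable sentence a higher score than the unacceptable one. The names `minimal_pair_accuracy` and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real language model's sentence log-probability, not part of the dataset.

```python
from typing import Callable, Iterable, Mapping


def minimal_pair_accuracy(
    pairs: Iterable[Mapping[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the good sentence outscores the bad one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(
        1 for p in pairs if score(p["sentence_good"]) > score(p["sentence_bad"])
    )
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    # Assumption for illustration only: a real evaluation would return
    # the language model's log-probability of the whole sentence.
    return 1.0 if " easy " in sentence else 0.0


pairs = [
    {
        "sentence_good": "Benjamin's tutor was easy to boast about.",
        "sentence_bad": "Benjamin's tutor was certain to boast about.",
    }
]
accuracy = minimal_pair_accuracy(pairs, toy_score)
```

Swapping `toy_score` for a function that returns a model's sentence log-probability gives per-sub-dataset accuracy when the function is run on one sub-dataset at a time.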
adjunct_island
An example of "train" looks as follows.
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics", "lexically_identical": false, "linguistics_term": "control_raising", "one_prefix_method": false, "pair_id": 2, "sentence_bad": "Benjamin's tutor was certain to boast about.", "sentence_good": "Benjamin's tutor was easy to boast about.", "simple_LM_method": true, "two_prefix_method": false }

anaphor_gender_agreement
An example of "train" looks as follows.
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics", "lexically_identical": false, "linguistics_term": "control_raising", "one_prefix_method": false, "pair_id": 2, "sentence_bad": "Benjamin's tutor was certain to boast about.", "sentence_good": "Benjamin's tutor was easy to boast about.", "simple_LM_method": true, "two_prefix_method": false }

anaphor_number_agreement
An example of "train" looks as follows.
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics", "lexically_identical": false, "linguistics_term": "control_raising", "one_prefix_method": false, "pair_id": 2, "sentence_bad": "Benjamin's tutor was certain to boast about.", "sentence_good": "Benjamin's tutor was easy to boast about.", "simple_LM_method": true, "two_prefix_method": false }

animate_subject_passive
An example of "train" looks as follows.
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics", "lexically_identical": false, "linguistics_term": "control_raising", "one_prefix_method": false, "pair_id": 2, "sentence_bad": "Benjamin's tutor was certain to boast about.", "sentence_good": "Benjamin's tutor was easy to boast about.", "simple_LM_method": true, "two_prefix_method": false }

animate_subject_trans
An example of "train" looks as follows.
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics", "lexically_identical": false, "linguistics_term": "control_raising", "one_prefix_method": false, "pair_id": 2, "sentence_bad": "Benjamin's tutor was certain to boast about.", "sentence_good": "Benjamin's tutor was easy to boast about.", "simple_LM_method": true, "two_prefix_method": false }
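To make the record layout concrete, here is a small sketch that parses the example record with the standard `json` module and extracts the word-level difference between the two sentences. The helper `contrast` is hypothetical and assumes both sentences have the same number of whitespace-separated tokens, which holds for this record but is not guaranteed for every pair.

```python
import json

# The example record from the card, copied verbatim.
record = json.loads('''
{ "UID": "tough_vs_raising_1", "field": "syntax_semantics",
  "lexically_identical": false, "linguistics_term": "control_raising",
  "one_prefix_method": false, "pair_id": 2,
  "sentence_bad": "Benjamin's tutor was certain to boast about.",
  "sentence_good": "Benjamin's tutor was easy to boast about.",
  "simple_LM_method": true, "two_prefix_method": false }
''')


def contrast(good: str, bad: str) -> list[tuple[str, str]]:
    """Return (good_token, bad_token) pairs at positions where the sentences differ.

    Assumes equal token counts; zip() silently truncates otherwise.
    """
    return [(g, b) for g, b in zip(good.split(), bad.split()) if g != b]


diff = contrast(record["sentence_good"], record["sentence_bad"])
print(diff)  # [('easy', 'certain')]
```

The single differing token is what makes the pair minimal: everything except `easy` versus `certain` is held constant.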
The data fields are the same among all splits.
name | train |
---|---|
adjunct_island | 1000 |
anaphor_gender_agreement | 1000 |
anaphor_number_agreement | 1000 |
animate_subject_passive | 1000 |
animate_subject_trans | 1000 |
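As a sketch of how one of the listed configurations might be loaded, the snippet below assumes the Hugging Face `datasets` library is installed and that the dataset is reachable under the identifier `blimp` given on this card (the identifier on the hub may differ). `TRAIN_SIZES` and `load_config` are hypothetical names, and the sizes are copied from the table above.

```python
# Config names and train sizes as listed in the table above
# (only the five configs shown on this page, not all 67).
TRAIN_SIZES = {
    "adjunct_island": 1000,
    "anaphor_gender_agreement": 1000,
    "anaphor_number_agreement": 1000,
    "animate_subject_passive": 1000,
    "animate_subject_trans": 1000,
}


def load_config(name: str):
    """Load the train split of one config (needs `datasets` and network access)."""
    if name not in TRAIN_SIZES:
        raise KeyError(f"config not listed on this page: {name}")
    # Imported lazily so the rest of the module works without the library.
    from datasets import load_dataset

    # Assumption: "blimp" is the dataset identifier, as stated on the card.
    return load_dataset("blimp", name, split="train")


listed_total = sum(TRAIN_SIZES.values())  # 5 configs x 1000 pairs each
```

Calling `load_config("adjunct_island")` should return the 1000 training pairs for that configuration if the assumptions above hold.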
@article{warstadt2019blimp,
  title={BLiMP: A Benchmark of Linguistic Minimal Pairs for English},
  author={Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1912.00582},
  year={2019}
}
Thanks to @lhoestq, @patrickvonplaten, and @thomwolf for adding this dataset.