Dataset:
TUKE-DeutscheTelekom/skquad
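SK-QuAD is the first question-answering dataset for the Slovak language. It is manually annotated, so it is free of the distortion that machine translation introduces. The dataset is thematically diverse: it does not overlap with SQuAD and brings new knowledge. It has passed a second round of annotation, in which each question and answer was seen by at least two annotators.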
This example was too long and was cropped: { "answers": { "answer_start": [94, 87, 94, 94], "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"] }, "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...", "id": "56ddde6b9a695914005b9629", "question": "When were the Normans in Normandy?", "title": "Normans" }
The data fields are the same among all splits.
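As a rough illustration of how the `answers` fields relate to `context`, the sketch below checks that each `answer_start` offset points at the matching `text`. The helper name `misaligned_answers` is hypothetical. The record is modelled on the cropped instance above: its context is cut off even earlier than the crop, and the leading quotation mark is dropped on the assumption that it is a display artifact.

```python
from typing import Dict, List


def misaligned_answers(record: Dict) -> List[int]:
    """Return the indices of answers whose text is not found at answer_start."""
    context = record["context"]
    answers = record["answers"]
    bad = []
    for i, (start, text) in enumerate(zip(answers["answer_start"], answers["text"])):
        # In the SQuAD-style layout, answer_start is a character offset into context.
        if context[start:start + len(text)] != text:
            bad.append(i)
    return bad


# Illustrative record: the context is truncated, not the full paragraph.
record = {
    "id": "56ddde6b9a695914005b9629",
    "title": "Normans",
    "question": "When were the Normans in Normandy?",
    "context": (
        "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
        "were the people who in the 10th and 11th centuries gave"
    ),
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": [
            "10th and 11th centuries",
            "in the 10th and 11th centuries",
            "10th and 11th centuries",
            "10th and 11th centuries",
        ],
    },
}

print(misaligned_answers(record))  # prints []
```

A record with empty `answer_start` and `text` lists also yields an empty result, which is how unanswerable questions would look if this dataset follows the SQuAD v2 convention.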
squad_v2

| | Train | Dev | Translated |
|---|---|---|---|
| Documents | 8,377 | 940 | 442 |
| Paragraphs | 22,062 | 2,568 | 18,931 |
| Questions | 81,582 | 9,583 | 120,239 |
| Answers | 65,839 | 7,822 | 79,978 |
| Unanswerable | 15,877 | 1,784 | 40,261 |
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Thanks to @github-username for adding this dataset.