Dataset: msr_sqa
Tasks:
Sub-tasks: extractive-qa
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License:
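Recent work in semantic parsing for question answering has focused on long and complicated questions, many of which would seem unnatural if asked in a normal conversation between two people. To study a conversational question-answering setting, we present a more realistic task: answering sequences of simple but inter-related questions.
We created the SQA dataset by asking crowdsourced workers to decompose 2,022 questions from the WikiTableQuestions (WTQ) dataset. Three workers decomposed each WTQ question, resulting in a dataset of 6,066 sequences that contain 17,553 questions in total. Each question is also associated with answers, given as cell locations in the table.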
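The sequence structure described above can be sketched as follows. This is a minimal illustration, assuming that each record carries `id`, `annotator`, and `position` fields (as in the data instance shown in this card) and that one sequence corresponds to one `(id, annotator)` pair ordered by `position`; that grouping key is an assumption, and `group_into_sequences` is a hypothetical helper. The bracketed question strings are placeholders, not dataset content.

```python
from collections import defaultdict


def group_into_sequences(records):
    """Group flat question records into ordered question sequences.

    Assumption: a sequence is identified by (id, annotator) and ordered by position.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec["id"], rec["annotator"])].append(rec)
    return {
        key: [r["question"] for r in sorted(recs, key=lambda r: r["position"])]
        for key, recs in buckets.items()
    }


# Toy records; only the first question text comes from the card's data instance.
records = [
    {"id": "nt-639", "annotator": 0, "position": 1, "question": "<second question>"},
    {"id": "nt-639", "annotator": 0, "position": 0, "question": "where are the players from?"},
    {"id": "nt-639", "annotator": 1, "position": 0, "question": "<another worker's first question>"},
]

sequences = group_into_sequences(records)
```

Here `sequences[("nt-639", 0)]` holds the two questions from annotator 0 in position order, independent of the order in which the records arrived.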
[More Information Needed]
English (en)
{'id': 'nt-639', 'annotator': 0, 'position': 0, 'question': 'where are the players from?', 'table_file': 'table_csv/203_149.csv', 'table_header': ['Pick', 'Player', 'Team', 'Position', 'School'], 'table_data': [['1', 'Ben McDonald', 'Baltimore Orioles', 'RHP', 'Louisiana State University'], ['2', 'Tyler Houston', 'Atlanta Braves', 'C', '"Valley HS (Las Vegas', ' NV)"'], ['3', 'Roger Salkeld', 'Seattle Mariners', 'RHP', 'Saugus (CA) HS'], ['4', 'Jeff Jackson', 'Philadelphia Phillies', 'OF', '"Simeon HS (Chicago', ' IL)"'], ['5', 'Donald Harris', 'Texas Rangers', 'OF', 'Texas Tech University'], ['6', 'Paul Coleman', 'Saint Louis Cardinals', 'OF', 'Frankston (TX) HS'], ['7', 'Frank Thomas', 'Chicago White Sox', '1B', 'Auburn University'], ['8', 'Earl Cunningham', 'Chicago Cubs', 'OF', 'Lancaster (SC) HS'], ['9', 'Kyle Abbott', 'California Angels', 'LHP', 'Long Beach State University'], ['10', 'Charles Johnson', 'Montreal Expos', 'C', '"Westwood HS (Fort Pierce', ' FL)"'], ['11', 'Calvin Murray', 'Cleveland Indians', '3B', '"W.T. White High School (Dallas', ' TX)"'], ['12', 'Jeff Juden', 'Houston Astros', 'RHP', 'Salem (MA) HS'], ['13', 'Brent Mayne', 'Kansas City Royals', 'C', 'Cal State Fullerton'], ['14', 'Steve Hosey', 'San Francisco Giants', 'OF', 'Fresno State University'], ['15', 'Kiki Jones', 'Los Angeles Dodgers', 'RHP', '"Hillsborough HS (Tampa', ' FL)"'], ['16', 'Greg Blosser', 'Boston Red Sox', 'OF', 'Sarasota (FL) HS'], ['17', 'Cal Eldred', 'Milwaukee Brewers', 'RHP', 'University of Iowa'], ['18', 'Willie Greene', 'Pittsburgh Pirates', 'SS', '"Jones County HS (Gray', ' GA)"'], ['19', 'Eddie Zosky', 'Toronto Blue Jays', 'SS', 'Fresno State University'], ['20', 'Scott Bryant', 'Cincinnati Reds', 'OF', 'University of Texas'], ['21', 'Greg Gohr', 'Detroit Tigers', 'RHP', 'Santa Clara University'], ['22', 'Tom Goodwin', 'Los Angeles Dodgers', 'OF', 'Fresno State University'], ['23', 'Mo Vaughn', 'Boston Red Sox', '1B', 'Seton Hall University'], ['24', 'Alan Zinter', 'New York Mets', 'C', 'University of Arizona'], ['25', 'Chuck Knoblauch', 'Minnesota Twins', '2B', 'Texas A&M University'], ['26', 'Scott Burrell', 'Seattle Mariners', 'RHP', 'Hamden (CT) HS']], 'answer_coordinates': {'row_index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 'column_index': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]}, 'answer_text': ['Louisiana State University', 'Valley HS (Las Vegas, NV)', 'Saugus (CA) HS', 'Simeon HS (Chicago, IL)', 'Texas Tech University', 'Frankston (TX) HS', 'Auburn University', 'Lancaster (SC) HS', 'Long Beach State University', 'Westwood HS (Fort Pierce, FL)', 'W.T. White High School (Dallas, TX)', 'Salem (MA) HS', 'Cal State Fullerton', 'Fresno State University', 'Hillsborough HS (Tampa, FL)', 'Sarasota (FL) HS', 'University of Iowa', 'Jones County HS (Gray, GA)', 'Fresno State University', 'University of Texas', 'Santa Clara University', 'Fresno State University', 'Seton Hall University', 'University of Arizona', 'Texas A&M University', 'Hamden (CT) HS']}
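In the instance above, `answer_coordinates` appears to address cells of `table_data` with zero-based row and column indices: 26 rows are indexed 0 to 25, and column 4 corresponds to `School` in `table_header`. A minimal sketch of that lookup, assuming this reading is correct; `cells_at` is a hypothetical helper, not part of the dataset or of any library, and the example is a trimmed copy of the first three rows only.

```python
def cells_at(table_data, answer_coordinates):
    """Return the cell strings addressed by the parallel row/column index lists.

    Assumption: indices are zero-based and rows exclude the header.
    """
    rows = answer_coordinates["row_index"]
    cols = answer_coordinates["column_index"]
    return [table_data[r][c] for r, c in zip(rows, cols)]


# Trimmed copy of the instance above (first three table rows).
example = {
    "table_header": ["Pick", "Player", "Team", "Position", "School"],
    "table_data": [
        ["1", "Ben McDonald", "Baltimore Orioles", "RHP", "Louisiana State University"],
        ["2", "Tyler Houston", "Atlanta Braves", "C", '"Valley HS (Las Vegas', ' NV)"'],
        ["3", "Roger Salkeld", "Seattle Mariners", "RHP", "Saugus (CA) HS"],
    ],
    "answer_coordinates": {"row_index": [0, 1, 2], "column_index": [4, 4, 4]},
}

looked_up = cells_at(example["table_data"], example["answer_coordinates"])
```

For the second row, the looked-up cell is the fragment `"Valley HS (Las Vegas` rather than the `answer_text` value `Valley HS (Las Vegas, NV)`, because that row's School field appears split at its embedded comma in `table_data`. Code that compares coordinates against `answer_text` may need to account for such rows.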
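Note that some text fields may contain tab or newline characters and therefore start with quotation marks. We recommend processing the data with a CSV parser such as Python's csv package.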
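The difference between naive splitting and a real CSV parser can be sketched with Python's standard `csv` module. The in-memory text below is an assumption: it is a two-line reconstruction based on the data instance in this card, not a verbatim excerpt of the dataset files.

```python
import csv
import io

# Hypothetical two-line CSV text; the School field is quoted because it contains a comma.
raw = (
    "Pick,Player,Team,Position,School\n"
    '2,Tyler Houston,Atlanta Braves,C,"Valley HS (Las Vegas, NV)"\n'
)

# Naive splitting breaks the quoted field into two pieces and keeps the quote characters.
naive = raw.splitlines()[1].split(",")

# csv.reader honours the quoting and returns the field intact.
parsed = list(csv.reader(io.StringIO(raw, newline="")))
header, row = parsed[0], parsed[1]
```

With the parser, `row` has the same number of fields as `header`, and `row[4]` is the complete school name. For real files, opening them with `newline=""` lets the `csv` module handle quoted fields that contain line breaks.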
| | train | test |
|---|---|---|
| N. examples | 14541 | 3012 |
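[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]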
Microsoft Research Data License Agreement.
@inproceedings{iyyer-etal-2017-search,
    title = "Search-based Neural Structured Learning for Sequential Question Answering",
    author = "Iyyer, Mohit and Yih, Wen-tau and Chang, Ming-Wei",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1167",
    doi = "10.18653/v1/P17-1167",
    pages = "1821--1831",
}
Thanks to @mattbui for adding this dataset.