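Datasets:
gem
Preprint:
arxiv:2102.01672
License:
other
GEM is a benchmark environment for natural language generation with a focus on its evaluation, both through human annotations and automated metrics.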
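GEM aims to be updated regularly and to encourage more inclusive practices in dataset development, by extending existing data or by developing datasets for additional languages.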
More complete information is available in the dataset card of each subset.
The subsets are organized by task:
{
  "summarization": {
    "mlsum": ["mlsum_de", "mlsum_es"],
    "wiki_lingua": ["wiki_lingua_es_en", "wiki_lingua_ru_en", "wiki_lingua_tr_en", "wiki_lingua_vi_en"],
    "xsum": ["xsum"],
  },
  "struct2text": {
    "common_gen": ["common_gen"],
    "cs_restaurants": ["cs_restaurants"],
    "dart": ["dart"],
    "e2e": ["e2e_nlg"],
    "totto": ["totto"],
    "web_nlg": ["web_nlg_en", "web_nlg_ru"],
  },
  "simplification": {
    "wiki_auto_asset_turk": ["wiki_auto_asset_turk"],
  },
  "dialog": {
    "schema_guided_dialog": ["schema_guided_dialog"],
  },
}
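The mapping above can be queried programmatically. The following is a minimal sketch that copies the mapping into a Python dictionary; the names `GEM_TASKS`, `subsets_for_task`, and `task_of_subset` are hypothetical helpers introduced for illustration and are not part of GEM.

```python
# Task -> dataset -> subset names, copied from the mapping in this card.
GEM_TASKS = {
    "summarization": {
        "mlsum": ["mlsum_de", "mlsum_es"],
        "wiki_lingua": [
            "wiki_lingua_es_en",
            "wiki_lingua_ru_en",
            "wiki_lingua_tr_en",
            "wiki_lingua_vi_en",
        ],
        "xsum": ["xsum"],
    },
    "struct2text": {
        "common_gen": ["common_gen"],
        "cs_restaurants": ["cs_restaurants"],
        "dart": ["dart"],
        "e2e": ["e2e_nlg"],
        "totto": ["totto"],
        "web_nlg": ["web_nlg_en", "web_nlg_ru"],
    },
    "simplification": {
        "wiki_auto_asset_turk": ["wiki_auto_asset_turk"],
    },
    "dialog": {
        "schema_guided_dialog": ["schema_guided_dialog"],
    },
}


def subsets_for_task(task):
    """Return every subset name listed under a task, in card order."""
    return [name for names in GEM_TASKS[task].values() for name in names]


def task_of_subset(subset):
    """Return the task a subset is listed under (hypothetical helper)."""
    for task, datasets in GEM_TASKS.items():
        for names in datasets.values():
            if subset in names:
                return task
    raise KeyError(f"unknown subset: {subset}")


if __name__ == "__main__":
    print(subsets_for_task("summarization"))
    print(task_of_subset("web_nlg_ru"))
```

Keeping the nesting as it appears in the card, rather than flattening it up front, preserves the dataset-level grouping (for example, both `web_nlg` languages stay under one key).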
Each example has a single target in the training set, and a set of references (one or more items) in the validation and test sets.
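Because training examples carry one `target` while validation and test examples carry a `references` list, evaluation code often wants a single accessor for both cases. The sketch below is one way to do that; `get_references` and `matches_any_reference` are hypothetical helpers, and the assumption that a training example has a missing or empty `references` field is not stated in this card.

```python
def get_references(example):
    """Return the acceptable outputs for one example as a list.

    Uses `references` when present and non-empty (validation/test).
    Otherwise falls back to the single `target` (training).
    Assumption: training examples have a missing or empty `references`.
    """
    refs = example.get("references") or []
    return list(refs) if refs else [example["target"]]


def matches_any_reference(prediction, example):
    """Exact-match check of a prediction against every reference."""
    return prediction.strip() in {ref.strip() for ref in get_references(example)}


if __name__ == "__main__":
    # Abbreviated from the common_gen validation example in this card.
    validation_example = {
        "gem_id": "common_gen-validation-0",
        "references": [
            "The player stood in the field looking at the batter.",
            "Someone stands, looking around the empty field.",
        ],
        "target": "The player stood in the field looking at the batter.",
    }
    print(get_references(validation_example))
    print(matches_any_reference("Someone stands, looking around the empty field.",
                                validation_example))
```

Exact match is used here only to keep the sketch short; it is not one of the metrics this card describes.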
common_gen
An example of the validation split looks as follows.
{'concept_set_id': 0, 'concepts': ['field', 'look', 'stand'], 'gem_id': 'common_gen-validation-0', 'references': ['The player stood in the field looking at the batter.', 'The coach stands along the field, looking at the goalkeeper.', 'I stood and looked across the field, peacefully.', 'Someone stands, looking around the empty field.'], 'target': 'The player stood in the field looking at the batter.'}
cs_restaurants
An example of the validation split looks as follows.
{'dialog_act': '?request(area)', 'dialog_act_delexicalized': '?request(area)', 'gem_id': 'cs_restaurants-validation-0', 'references': ['Jakou lokalitu hledáte ?'], 'target': 'Jakou lokalitu hledáte ?', 'target_delexicalized': 'Jakou lokalitu hledáte ?'}
dart
An example of the validation split looks as follows.
{'dart_id': 0, 'gem_id': 'dart-validation-0', 'references': ['A school from Mars Hill, North Carolina, joined in 1973.'], 'subtree_was_extended': True, 'target': 'A school from Mars Hill, North Carolina, joined in 1973.', 'target_sources': ['WikiSQL_decl_sents'], 'tripleset': [['Mars Hill College', 'JOINED', '1973'], ['Mars Hill College', 'LOCATION', 'Mars Hill, North Carolina']]}
e2e_nlg
An example of the validation split looks as follows.
{'gem_id': 'e2e_nlg-validation-0', 'meaning_representation': 'name[Alimentum], area[city centre], familyFriendly[no]', 'references': ['There is a place in the city centre, Alimentum, that is not family-friendly.'], 'target': 'There is a place in the city centre, Alimentum, that is not family-friendly.'}
mlsum_de
An example of the validation split looks as follows.
{'date': '00/04/2019', 'gem_id': 'mlsum_de-validation-0', 'references': ['In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.'], 'target': 'In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.', 'text': 'Kerzen und Blumen stehen vor dem Eingang eines Hauses, in dem eine 18-jährige Frau tot aufgefunden wurde. In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ...', 'title': 'Tod von 18-Jähriger auf Usedom: Zwei Festnahmen', 'topic': 'panorama', 'url': 'https://www.sueddeutsche.de/panorama/usedom-frau-tot-festnahme-verdaechtige-1.4412256'}
mlsum_es
An example of the validation split looks as follows.
{'date': '05/01/2019', 'gem_id': 'mlsum_es-validation-0', 'references': ['El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca'], 'target': 'El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca', 'text': 'Un oso de peluche marcándose un heelflip de monopatín es todo lo que Ralph Lauren necesitaba esta Navidad. Estampado en un jersey de lana azul marino, supone la guinda que corona ...', 'title': 'Ralph Lauren busca el secreto de la eterna juventud', 'topic': 'elpais estilo', 'url': 'http://elpais.com/elpais/2019/01/04/estilo/1546617396_933318.html'}
schema_guided_dialog
An example of the validation split looks as follows.
{'dialog_acts': [{'act': 2, 'slot': 'song_name', 'values': ['Carnivore']}, {'act': 2, 'slot': 'playback_device', 'values': ['TV']}], 'dialog_id': '10_00054', 'gem_id': 'schema_guided_dialog-validation-0', 'prompt': 'Yes, I would.', 'references': ['Please confirm the song Carnivore on tv.'], 'target': 'Please confirm the song Carnivore on tv.', 'turn_id': 15}
totto
An example of the validation split looks as follows.
{'example_id': '7391450717765563190', 'gem_id': 'totto-validation-0', 'highlighted_cells': [[3, 0], [3, 2], [3, 3]], 'overlap_subset': 'True', 'references': ['Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.', 'Daniel Henry Chamberlain was the 76th Governor of South Carolina, beginning in 1874.', 'Daniel Henry Chamberlain was the 76th Governor of South Carolina who took office in 1874.'], 'sentence_annotations': [{'final_sentence': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.', 'original_sentence': 'Daniel Henry Chamberlain (June 23, 1835 – April 13, 1907) was an American planter, lawyer, author and the 76th Governor of South Carolina ' 'from 1874 until 1877.', 'sentence_after_ambiguity': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.', 'sentence_after_deletion': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.'}, ... ], 'table': [[{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '#'}, {'column_span': 2, 'is_header': True, 'row_span': 1, 'value': 'Governor'}, {'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Took Office'}, {'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Left Office'}], [{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '74'}, {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '-'}, {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'Robert Kingston Scott'}, {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'July 6, 1868'}], ... ], 'table_page_title': 'List of Governors of South Carolina', 'table_section_text': 'Parties Democratic Republican', 'table_section_title': 'Governors under the Constitution of 1868', 'table_webpage_url': 'http://en.wikipedia.org/wiki/List_of_Governors_of_South_Carolina', 'target': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.', 'totto_id': 0}
web_nlg_en
An example of the validation split looks as follows.
{'category': 'Airport', 'gem_id': 'web_nlg_en-validation-0', 'input': ['Aarhus | leader | Jacob_Bundsgaard'], 'references': ['The leader of Aarhus is Jacob Bundsgaard.'], 'target': 'The leader of Aarhus is Jacob Bundsgaard.', 'webnlg_id': 'dev/Airport/1/Id1'}
web_nlg_ru
An example of the validation split looks as follows.
{'category': 'Airport', 'gem_id': 'web_nlg_ru-validation-0', 'input': ['Punjab,_Pakistan | leaderTitle | Provincial_Assembly_of_the_Punjab'], 'references': ['Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.', 'Пенджаб, Пакистан возглавляется Провинциальной ассамблеей Пенджаба.'], 'target': 'Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.', 'webnlg_id': 'dev/Airport/1/Id1'}
wiki_auto_asset_turk
An example of the validation split looks as follows.
{'gem_id': 'wiki_auto_asset_turk-validation-0', 'references': ['The Gandalf Awards honor excellent writing in in fantasy literature.'], 'source': 'The Gandalf Awards, honoring achievement in fantasy literature, were conferred by the World Science Fiction Society annually from 1974 to 1981.', 'source_id': '350_691837-1-0-0', 'target': 'The Gandalf Awards honor excellent writing in in fantasy literature.', 'target_id': '350_691837-0-0-0'}
wiki_lingua_es_en
An example of the validation split looks as follows.
'references': ["Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."], 'source': 'Muchas personas presentan problemas porque no cepillaron el pelaje de sus gatos en una etapa temprana de su vida, ya que no lo consideraban necesario. Sin embargo, a medida que...', 'target': "Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."}
wiki_lingua_ru_en
An example of the validation split looks as follows.
{'gem_id': 'wiki_lingua_ru_en-val-0', 'references': ['Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment ' 'options.'], 'source': 'И хотя, скорее всего, вам не о чем волноваться, следует незамедлительно обратиться к врачу, если вы подозреваете, что у вас возникло осложнение желчекаменной болезни. Это ...', 'target': 'Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment ' 'options.'}
wiki_lingua_tr_en
An example of the validation split looks as follows.
{'gem_id': 'wiki_lingua_tr_en-val-0', 'references': ['Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'], 'source': 'Instagram uygulamasının çok renkli kamera şeklindeki simgesine dokun. Daha önce giriş yaptıysan Instagram haber kaynağı açılır. Giriş yapmadıysan istendiğinde e-posta adresini ...', 'target': 'Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'}
wiki_lingua_vi_en
An example of the validation split looks as follows.
{'gem_id': 'wiki_lingua_vi_en-val-0', 'references': ['Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'], 'source': 'Bạn muốn cung cấp cho cây cơ hội tốt nhất để phát triển và sinh tồn. Trồng cây đúng thời điểm trong năm chính là yếu tố then chốt. Thời điểm sẽ thay đổi phụ thuộc vào loài cây ...', 'target': 'Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'}
xsum
An example of the validation split looks as follows.
{'document': 'Burberry reported pre-tax profits of £166m for the year to March. A year ago it made a loss of £16.1m, hit by charges at its Spanish operations.\n' 'In the past year it has opened 21 new stores and closed nine. It plans to open 20-30 stores this year worldwide.\n' 'The group has also focused on promoting the Burberry brand online...', 'gem_id': 'xsum-validation-0', 'references': ['Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing'], 'target': 'Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing', 'xsum_id': '10162122'}
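The `gem_id` values in the examples above appear to follow a `<subset>-<split>-<index>` pattern, with the split written as `validation` for most subsets and `val` for the wiki_lingua ones. The sketch below splits an identifier under that assumption; the pattern is inferred only from the examples shown here, and `parse_gem_id` is a hypothetical helper.

```python
def parse_gem_id(gem_id):
    """Split a gem_id into (subset, split_label, index).

    Assumption: the id has the form <subset>-<split>-<index>, as in the
    examples in this card. Splitting from the right keeps any hyphens
    or underscores inside the subset name intact.
    """
    subset, split_label, index = gem_id.rsplit("-", 2)
    return subset, split_label, int(index)


if __name__ == "__main__":
    for gem_id in ("common_gen-validation-0",
                   "wiki_lingua_ru_en-val-0",
                   "xsum-validation-0"):
        print(parse_gem_id(gem_id))
```

The split label is returned as written rather than normalized, since this card does not say whether `val` and `validation` are meant to be interchangeable.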
The data fields are the same among all splits.
subset | train | validation | test |
---|---|---|---|
common_gen | 67389 | 993 | 1497 |
cs_restaurants | 3569 | 781 | 842 |
dart | 62659 | 2768 | 6959 |
e2e_nlg | 33525 | 4299 | 4693 |
mlsum_de | 220748 | 11392 | 10695 |
mlsum_es | 259886 | 9977 | 13365 |
schema_guided_dialog | 164982 | 10000 | 10000 |
totto | 121153 | 7700 | 7700 |
web_nlg_en | 35426 | 1667 | 1779 |
web_nlg_ru | 14630 | 790 | 1102 |
wiki_auto_asset_turk | 373801 | 73249 | 359 (test_asset), 359 (test_turk) |
wiki_lingua_es_en | 79515 | 8835 | 19797 |
wiki_lingua_ru_en | 36898 | 4100 | 9094 |
wiki_lingua_tr_en | 3193 | 355 | 808 |
wiki_lingua_vi_en | 9206 | 1023 | 2167 |
xsum | 23206 | 1117 | 1166 |
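The split sizes above vary widely between subsets, so it can help to look at totals and proportions rather than raw counts. The following is a small sketch over three of the subsets, with the counts copied from this card; `SPLIT_SIZES` and `split_shares` are hypothetical names introduced for illustration.

```python
# Example counts copied from the split-size figures in this card.
SPLIT_SIZES = {
    "common_gen": {"train": 67389, "validation": 993, "test": 1497},
    "wiki_lingua_tr_en": {"train": 3193, "validation": 355, "test": 808},
    "xsum": {"train": 23206, "validation": 1117, "test": 1166},
}


def split_shares(sizes):
    """Return each split's fraction of the subset's total example count."""
    total = sum(sizes.values())
    return {split: count / total for split, count in sizes.items()}


if __name__ == "__main__":
    for name, sizes in SPLIT_SIZES.items():
        shares = split_shares(sizes)
        rounded = {split: round(share, 3) for split, share in shares.items()}
        print(name, sum(sizes.values()), rounded)
```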
CC-BY-SA-4.0
@article{gem_benchmark,
  author = {Sebastian Gehrmann and Tosin P. Adewumi and Karmanya Aggarwal and
            Pawan Sasanka Ammanamanchi and Aremu Anuoluwapo and Antoine Bosselut and
            Khyathi Raghavi Chandu and Miruna{-}Adriana Clinciu and Dipanjan Das and
            Kaustubh D. Dhole and Wanyu Du and Esin Durmus and Ondrej Dusek and
            Chris Emezue and Varun Gangal and Cristina Garbacea and
            Tatsunori Hashimoto and Yufang Hou and Yacine Jernite and Harsh Jhamtani and
            Yangfeng Ji and Shailza Jolly and Dhruv Kumar and Faisal Ladhak and
            Aman Madaan and Mounica Maddela and Khyati Mahajan and Saad Mahamood and
            Bodhisattwa Prasad Majumder and Pedro Henrique Martins and
            Angelina McMillan{-}Major and Simon Mille and Emiel van Miltenburg and
            Moin Nadeem and Shashi Narayan and Vitaly Nikolaev and
            Rubungo Andre Niyongabo and Salomey Osei and Ankur P. Parikh and
            Laura Perez{-}Beltrachini and Niranjan Ramesh Rao and Vikas Raunak and
            Juan Diego Rodriguez and Sashank Santhanam and Jo{\~{a}}o Sedoc and
            Thibault Sellam and Samira Shaikh and Anastasia Shimorina and
            Marco Antonio Sobrevilla Cabezudo and Hendrik Strobelt and
            Nishant Subramani and Wei Xu and Diyi Yang and Akhila Yerukola and
            Jiawei Zhou},
  title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and Metrics},
  journal = {CoRR},
  volume = {abs/2102.01672},
  year = {2021},
  url = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint = {2102.01672}
}
Thanks to @yjernite for adding this dataset.