Dataset:
Muennighoff/xwinograd
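A multilingual Winograd schema challenge, as used in Crosslingual Generalization through Multitask Finetuning.
The Winograd schemas in this dataset are assembled from the XWinograd dataset introduced by Tikhonov et al. Because that dataset contains only 16 Chinese schemas, we added 488 Chinese schemas from clue/cluewsc2020.
If you only want the original XWinograd Chinese schemas, do the following: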
from datasets import load_dataset
load_dataset("Muennighoff/xwinograd", "zh")["test"].select(range(16))  # first 16 rows = original XWinograd schemas
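The positional split described above can be sketched as a small helper. This is a minimal sketch, not part of the dataset's own tooling: it assumes the 16 original XWinograd schemas come first in the `zh` test split and the 488 schemas taken from clue/cluewsc2020 follow them. The names `split_original_and_added` and `load_zh_test_rows` are hypothetical and introduced only for illustration.

```python
from typing import Any, List, Sequence, Tuple

# Number of original XWinograd Chinese schemas, as stated in the description.
N_ORIGINAL_ZH = 16


def split_original_and_added(
    rows: Sequence[Any], n_original: int = N_ORIGINAL_ZH
) -> Tuple[List[Any], List[Any]]:
    """Split rows by position into (original XWinograd, added CLUEWSC2020).

    Assumption: the original schemas are stored first in the split.
    """
    rows = list(rows)
    return rows[:n_original], rows[n_original:]


def load_zh_test_rows() -> List[Any]:
    """Hypothetical helper: fetch the zh test split as a list of row dicts.

    Needs the `datasets` package and network access, so the import is lazy
    and nothing is downloaded until this function is called.
    """
    from datasets import load_dataset

    return list(load_dataset("Muennighoff/xwinograd", "zh")["test"])
```

Calling `split_original_and_added(load_zh_test_rows())` would then return the two groups as separate lists, so that results can be reported on the original schemas and on the added ones independently.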
@misc{muennighoff2022crosslingual,
  title={Crosslingual Generalization through Multitask Finetuning},
  author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
  year={2022},
  eprint={2211.01786},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{tikhonov2021heads,
  title={It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning},
  author={Alexey Tikhonov and Max Ryabinin},
  year={2021},
  eprint={2106.12066},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
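Like the original English Winograd schema challenge, this dataset is licensed under CC BY 4.0, so it can be used for commercial purposes, among others. :)
Thanks to Jordan Clive, @yongzx and @khalidalt for their support in adding Chinese.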