Dataset:
LLukas22/cqadupstack
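This is a preprocessed version of "cqadupstack", prepared so that it can be used easily with Hugging Face. The original dataset can be found here.
CQADupStack is a benchmark dataset for community question-answering (cQA) research. It contains threads from twelve StackExchange subforums, annotated with duplicate question information, and comes with predefined training, development, and test splits for both retrieval and classification experiments.
An example from 'train' looks as follows.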
{ "question": "Very often, when some unknown company is calling me, in couple of seconds I see its name and logo on standard ...", "answer": "You didn't explicitely mention it, but from the context I assume you're using a device with Android 4.4 (Kitkat). With that ...", "title": "Why Dialer shows contact name and image, when contact is not in my address book?", "forum_tag": "android" }
The data fields are the same across all splits.
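As a minimal sketch of how these fields might be consumed, the Python snippet below checks a record against the four field names shown in the 'train' example and filters records by `forum_tag`. The helper names (`load_cqadupstack`, `has_expected_fields`, `filter_by_forum`) are hypothetical and not part of the dataset. The loader assumes the `datasets` library is installed, that network access is available, and that the repository loads with its default configuration.

```python
# Field names taken from the 'train' example in this card.
EXPECTED_FIELDS = ("question", "answer", "title", "forum_tag")


def load_cqadupstack(split="train"):
    """Load one split from the Hugging Face Hub.

    Assumption: `pip install datasets` has been run, the Hub is reachable,
    and the repository loads with its default configuration.
    """
    # Imported lazily so the helpers below work without the library.
    from datasets import load_dataset

    return load_dataset("LLukas22/cqadupstack", split=split)


def has_expected_fields(record):
    """Return True if the record carries exactly the four documented fields."""
    return set(record) == set(EXPECTED_FIELDS)


def filter_by_forum(records, forum_tag):
    """Keep only the records that belong to the given subforum."""
    return [r for r in records if r["forum_tag"] == forum_tag]


# Shortened copy of the example record; the "..." truncation is from the card.
sample = {
    "question": "Very often, when some unknown company is calling me, ...",
    "answer": "You didn't explicitely mention it, but from the context ...",
    "title": "Why Dialer shows contact name and image, "
             "when contact is not in my address book?",
    "forum_tag": "android",
}

print(has_expected_fields(sample))
print(len(filter_by_forum([sample], "android")))
```

With the library installed, `ds = load_cqadupstack("train")` followed by `ds[0]["title"]` would be one way to inspect the first record. The helpers accept any iterable of dictionaries, so they can be tried on local data first.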
This dataset is distributed under the Apache 2.0 license.