Datasets:

pszemraj/qmsum-cleaned

Language:

en

Source datasets:

tau/scrolls

License:

apache-2.0

qmsum-cleaned

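Prefixes

Note that each "document" in the input is prefixed by a question/prompt stating what the model is supposed to do. You may want to handle this explicitly in some way, or prepend a similar prefix to the inputs of models trained on this dataset.

Most frequent "prefixes" in the train split, separated via sentence-splitter: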
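One way to handle the prefix explicitly is to separate it from the rest of the text before training or inference. The following is a minimal sketch, not the dataset author's method: it assumes the prefix is the first sentence and ends with `.`, `?` or `!` followed by whitespace. The helper name `split_prefix` and the example string are hypothetical.

```python
import re
from typing import Tuple

# Assumption: the prefix is the first sentence of the text and ends with
# '.', '?' or '!' followed by whitespace. Prefixes without terminal
# punctuation will NOT be detected by this heuristic.
_FIRST_SENTENCE = re.compile(r"\s*(.+?[.?!])\s+(.*)", re.DOTALL)


def split_prefix(text: str) -> Tuple[str, str]:
    """Return (prefix, body); prefix is '' when no sentence boundary is found."""
    match = _FIRST_SENTENCE.match(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()


# Hypothetical example input, made up for illustration.
example = "Summarize the whole meeting. Project Manager: Okay, let's get started."
prefix, body = split_prefix(example)
print(prefix)  # Summarize the whole meeting.
print(body)    # Project Manager: Okay, let's get started.
```

Keeping the prefix and body separate makes it easy to drop the prefix, replace it with a fixed instruction, or re-attach it in a different template.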

|   | Sentence | Count |
|---|----------|-------|
| 0 | Summarize the whole meeting. | 121 |
| 1 | Summarize the meeting | 25 |
| 2 | What did the team discuss about the product cost? | 4 |
| 3 | How did Marketing design the product evaluation? | 4 |
| 4 | Summarize the wrap up of the meeting. | 3 |
| 5 | What did the group discuss about user requirements of the new remote control? | 3 |
| 6 | What did the team discuss during the product evaluation? | 3 |
| 7 | Summarize the meeting. | 2 |
| 8 | Summarize what was said about digits form | 2 |
| 9 | What was discussed in the meeting? | 2 |
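A frequency table like this one can be approximated by taking the first sentence of every input and counting the results. The sketch below swaps in a simple regular expression for the sentence-splitter mentioned above, so its counts may differ from the ones reported here. The function names and the toy inputs are hypothetical.

```python
import re
from collections import Counter


def first_sentence(text):
    """Regex stand-in for a real sentence splitter (an assumption of this sketch)."""
    match = re.match(r"\s*(.+?[.?!])(?:\s|$)", text, re.DOTALL)
    # Fall back to the whole text when no sentence terminator is found.
    return match.group(1).strip() if match else text.strip()


def count_prefixes(inputs, top_n=10):
    """Return the top_n most frequent first sentences as (sentence, count) pairs."""
    return Counter(first_sentence(t) for t in inputs).most_common(top_n)


# Toy inputs made up for illustration; they are not rows from the dataset.
toy_inputs = [
    "Summarize the whole meeting. A: hello everyone",
    "Summarize the whole meeting. B: shall we begin",
    "What was discussed in the meeting? C: the budget",
]
for sentence, count in count_prefixes(toy_inputs):
    print(sentence, count)
# Summarize the whole meeting. 2
# What was discussed in the meeting? 1
```

In practice, `toy_inputs` would be replaced by the input texts of the train split.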

Wordcloud

Visualized as a wordcloud (train split):

Token counts