Dataset:
pszemraj/qmsum-cleaned
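Note that each "document" in the input is prefixed with a question or prompt telling the model what to do. You may want to handle this explicitly in some way, or prepend a similar prefix to the inputs of models trained on this dataset.
The most frequent "prefixes" in the train split, separated with sentence-splitter: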
| | Sentence | Count |
|---|---|---|
0 | Summarize the whole meeting. | 121 |
1 | Summarize the meeting | 25 |
2 | What did the team discuss about the product cost? | 4 |
3 | How did Marketing design the product evaluation? | 4 |
4 | Summarize the wrap up of the meeting. | 3 |
5 | What did the group discuss about user requirements of the new remote control? | 3 |
6 | What did the team discuss during the product evaluation? | 3 |
7 | Summarize the meeting. | 2 |
8 | Summarize what was said about digits form | 2 |
9 | What was discussed in the meeting? | 2 |
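A tally like the one above can be approximated with a short script. The following is a minimal sketch, not the procedure actually used for the table: it replaces sentence-splitter with a naive regular expression, and the helper names `first_sentence`, `count_prefixes`, and `strip_prefix` are hypothetical. It assumes the prompt is the first sentence of each input and ends in `.`, `?`, or `!`, so prefixes without terminal punctuation (such as "Summarize the meeting" in the table) would not be split cleanly.

```python
import re
from collections import Counter

# Naive stand-in for a real sentence splitter (assumption): take everything up to
# the first '.', '?' or '!' that is followed by whitespace or end of text.
_FIRST_SENTENCE = re.compile(r"^(.*?[.?!])(?:\s|$)", re.DOTALL)


def first_sentence(text):
    """Return the leading sentence of `text`, or the whole text if none is found."""
    text = text.strip()
    match = _FIRST_SENTENCE.match(text)
    return match.group(1) if match else text


def count_prefixes(inputs, top_n=10):
    """Count the leading sentence of every document and return the most common ones."""
    counts = Counter(first_sentence(doc) for doc in inputs)
    return counts.most_common(top_n)


def strip_prefix(text):
    """Split a document into (prefix, remaining text) so the two can be handled separately."""
    prefix = first_sentence(text)
    return prefix, text.strip()[len(prefix):].lstrip()


# Toy inputs for illustration only; these are not rows from the dataset.
docs = [
    "Summarize the whole meeting. A: Let's get started. B: Sure.",
    "Summarize the whole meeting. C: Welcome everyone.",
    "What was discussed in the meeting? D: The budget.",
]
print(count_prefixes(docs))
print(strip_prefix(docs[2]))
```

To run this on the real data, one would pass the input column of the train split (for example, loaded with the `datasets` library) to `count_prefixes`; the exact column name should be checked against the dataset viewer.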
Visualized as a word cloud (train split):