Pegasus-x-sumstew

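Model description

This model is a version of Pegasus-x-large fine-tuned on filtered subsets of the CNN-Dailymail, Samsum, Booksum and Laysum datasets. It generates abstractive summaries of long texts.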

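Intended uses and limitations

This model is intended for summarizing long English texts such as academic transcripts, meeting minutes, or literature. It is not suited to summarizing short texts such as tweets, headlines, or captions. If the input text contains factual errors, slang, or offensive language, the model may produce inaccurate or biased summaries.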

How to use

You can use this model with the pipeline function from the transformers library:

from transformers import pipeline

summarizer = pipeline("summarization", "joemgu/pegasus-x-sumstew")
text = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again."
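summary = summarizer(
    text,
    num_beams=8,                     # beam search with 8 beams
    repetition_penalty=3.5,          # penalize tokens that were already generated
    no_repeat_ngram_size=4,          # never repeat a 4-gram within the summary
    encoder_no_repeat_ngram_size=4,  # never copy a 4-gram verbatim from the input
)[0]["summary_text"]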
print(summary)

Output:

Alice is a bored and curious girl who follows a White Rabbit with a watch into a rabbit-hole. She enters a strange world where she has many adventures and meets many peculiar creatures.
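
If the same generation settings are reused across many documents, they can be collected in one place. The following is only a minimal sketch: the names GENERATION_KWARGS and summarize are hypothetical helpers and not part of transformers, and the summarizer argument is assumed to be any callable that behaves like the pipeline object created above.

```python
# Hypothetical helper: keeps the generation settings from the example in one place.
GENERATION_KWARGS = {
    "num_beams": 8,
    "repetition_penalty": 3.5,
    "no_repeat_ngram_size": 4,
    "encoder_no_repeat_ngram_size": 4,
}


def summarize(summarizer, text, **overrides):
    """Call a summarization pipeline-like callable and return the summary string.

    `summarizer` is assumed to return a list of dicts with a "summary_text" key,
    as the transformers summarization pipeline does in the example above.
    """
    kwargs = {**GENERATION_KWARGS, **overrides}  # per-call overrides take precedence
    return summarizer(text, **kwargs)[0]["summary_text"]


# Usage with the real pipeline (downloads the model, so it is left commented out):
# from transformers import pipeline
# summarizer = pipeline("summarization", "joemgu/pegasus-x-sumstew")
# print(summarize(summarizer, text))
# print(summarize(summarizer, text, num_beams=4))  # cheaper decoding
```

Passing the pipeline in as an argument keeps the helper independent of how the model is loaded, which also makes it easy to try other decoding settings on a single call.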

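Training data

The model was fine-tuned on filtered subsets of the CNN-Dailymail, Samsum, Booksum and Laysum datasets. These datasets contain texts of various types paired with abstractive summaries. The selected subset includes only texts longer than 1000 words whose summaries are shorter than 100 words. In total, the subset contains about 150k examples.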
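
The length filter described above could look roughly like the sketch below. This is an illustration, not the actual preprocessing script: it assumes lengths are counted by whitespace-separated words, and the field names "text" and "summary" as well as the function names are hypothetical.

```python
def word_count(text):
    """Count whitespace-separated words (an assumed way of measuring length)."""
    return len(text.split())


def keep_example(text, summary, min_text_words=1000, max_summary_words=100):
    """Keep an example only if the text is long and the summary is short."""
    return (word_count(text) > min_text_words
            and word_count(summary) < max_summary_words)


def filter_examples(examples):
    """Filter a list of dicts with hypothetical "text" and "summary" fields."""
    return [ex for ex in examples if keep_example(ex["text"], ex["summary"])]
```

Both comparisons are strict, so a text of exactly 1000 words or a summary of exactly 100 words is dropped under this reading of the criteria.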

Evaluation results

TODO

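Limitations and bias

The model may inherit limitations and biases from the pre-trained Pegasus-x-large model and from the fine-tuning datasets. Possible sources of bias include:

  • The pre-trained Pegasus-x-large model was trained on a large corpus of English text from various sources, which may not reflect the diversity and nuances of different languages and cultures.
  • The fine-tuning datasets were collected from different domains and genres, each of which may have its own stylistic conventions and perspectives on certain topics and events.
  • The fine-tuning datasets contain only abstractive summaries, which may not capture all the important information and nuances of the original texts.
  • The fine-tuning datasets cover texts from only certain time periods and sources, which may not reflect current events and trends.

Users should therefore keep these limitations and biases in mind when using this model, and evaluate its performance and suitability for their specific use case.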