
T5-base fine-tuned for News Summarization ✏️

All credit goes to Abhishek Kumar Mishra

Google's T5 base model, fine-tuned on the News Summary dataset for the downstream task of summarization.

Details of T5

The T5 model was presented in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Here is the abstract:

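Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.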
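As a rough illustration of the text-to-text framing described in the abstract, the sketch below casts different tasks into plain input strings by prepending a task prefix. The prefixes follow examples given in the T5 paper; the helper name `to_text_to_text` and the `TASK_PREFIXES` table are hypothetical and not part of any library. Whether this fine-tuned checkpoint expects a prefix is not stated in this card.

```python
# Hypothetical helper: illustrates the text-to-text idea, not a library API.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Turn a (task, input) pair into one input string for a text-to-text model."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    # Collapse whitespace so multi-line articles become a single input line.
    return TASK_PREFIXES[task] + " ".join(text.split())


if __name__ == "__main__":
    print(to_text_to_text("summarize", "A long news\narticle goes here."))
```

Every task then shares the same interface: a string goes in and a string comes out, which is what lets one model and one training objective cover summarization, translation, and classification.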

Details of the downstream task (Summarization) - Dataset

News Summary

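The dataset consists of 4515 examples, each containing the author name, headline, article URL, short text, and complete article. I gathered the summarized news from Inshorts and scraped the full news articles only from Hindu, Indian times and Guardian. The time period ranges from February to August 2017.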
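To make the record layout concrete, here is a minimal sketch that turns rows of such a dataset into (article, summary) training pairs. The column names `ctext` (complete article) and `text` (short text), as well as the inline sample rows, are assumptions made for illustration; check them against the actual CSV header before relying on them.

```python
import csv
import io

# Made-up sample rows; the column names are assumed, not taken from this card.
SAMPLE_CSV = """author,headlines,read_more,text,ctext
Jane Doe,Example headline,http://example.com/a,Short summary.,Full article body with more detail.
John Roe,No article,http://example.com/b,Another summary.,
"""


def load_pairs(csv_file):
    """Yield (article, summary) pairs, skipping rows that lack either field."""
    for row in csv.DictReader(csv_file):
        article = (row.get("ctext") or "").strip()
        summary = (row.get("text") or "").strip()
        if article and summary:
            yield article, summary


if __name__ == "__main__":
    for article, summary in load_pairs(io.StringIO(SAMPLE_CSV)):
        print(summary, "<-", article)
```

Skipping rows with an empty article matters here because the full articles were scraped separately from the summaries, so some rows may have no article text.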

Model fine-tuning

The training script is a slightly modified version of this Colab Notebook created by Abhishek Kumar Mishra, so all credit goes to him! I also trained the model for more epochs (6).
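For a sense of scale, the sketch below estimates how many optimizer steps 6 epochs over this dataset would take. The 4515 examples and the 6 epochs come from the text above; the 80% training split and the batch size of 2 are assumptions, since the actual values are not stated in this card.

```python
import math


def total_training_steps(num_examples, epochs, train_percent, batch_size):
    """Estimate optimizer steps, assuming one step per batch."""
    # Integer arithmetic avoids floating-point rounding in the split size.
    train_examples = num_examples * train_percent // 100
    steps_per_epoch = math.ceil(train_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # train_percent and batch_size are assumed values, not taken from the card.
    print(total_training_steps(4515, epochs=6, train_percent=80, batch_size=2))
```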

Model in Action

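from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5.
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

def summarize(text, max_length=150):
  # Tokenize the article into input IDs (a PyTorch tensor holding a batch of one).
  input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)

  # Beam search with a repetition penalty; max_length caps the summary length in tokens.
  generated_ids = model.generate(input_ids=input_ids, num_beams=2, max_length=max_length, repetition_penalty=2.5, length_penalty=1.0, early_stopping=True)

  # Decode each generated sequence back into text.
  preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]

  return preds[0]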
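The `generate` call above sets `repetition_penalty=2.5` and `length_penalty=1.0`. The following is a simplified, pure-Python sketch of what those two settings do during generation in `transformers`. The function names are illustrative only, and the real implementation operates on tensors. The repetition penalty rescales the logits of tokens that were already generated, and the length penalty normalizes a finished beam's summed log-probability by its length.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-generated tokens less likely.

    Positive logits are divided by the penalty and negative ones are
    multiplied by it, so both become less probable when penalty > 1.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score used to rank finished beams."""
    return sum_logprobs / (length ** length_penalty)


if __name__ == "__main__":
    logits = [5.0, -1.0, 2.0]
    # Tokens 0 and 1 were already generated; token 2 was not.
    print(apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=2.5))
    # With length_penalty=1.0 the score is the mean log-probability per token.
    print(beam_score(-6.0, length=4, length_penalty=1.0))
```

A penalty as high as 2.5 strongly discourages reusing tokens, which may help avoid repeated phrases in short news summaries but can also push the model away from repeating names it needs.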

Given the following article from NYT (2020/06/09) with the title George Floyd’s death energized a movement. He will be buried in Houston today:

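After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words, “I can’t breathe,” have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston. George Floyd, who was 46, will then be buried in a grave next to his mother’s. The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death, and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro, has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.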

summarize('After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words — “I can’t breathe” — have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston.George Floyd, who was 46, will then be buried in a grave next to his mother’s.The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death — and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro — has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.', 80)

We would obtain:

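at a private funeral in Houston. Floyd, who was 46 when he died, will be buried next to his mother's grave. A Minnesota police officer was caught on video pressing on Mr. Floyd's neck for nearly nine minutes before he died. The officer has been charged with second-degree manslaughter and his bail was set at $1.25 million.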

Created by Manuel Romero/@mrm8488 | LinkedIn

Made with ♥ in Spain