
T5-base fine-tuned for News Summarization ✏️

All credits go to Abhishek Kumar Mishra

Google's T5 base model fine-tuned on the News Summary dataset for the downstream task of summarization.


The T5 model was presented in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.



News Summary

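The dataset consists of 4515 examples, each containing the author name, headline, article URL, short text and complete article. I gathered the summarized news from Inshorts and scraped the full news articles only from the Hindu, Indian Times and Guardian. The time period ranges from February to August 2017.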
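The fields listed above could be turned into (article, summary) training pairs roughly as follows. This is only a sketch: the column names (`short_text`, `complete_article`, and so on), the sample rows and the `load_pairs` helper are hypothetical, and the real file may label or encode its fields differently.

```python
import csv
import io

# Hypothetical column names and made-up rows, for illustration only.
SAMPLE_CSV = """author,headline,url,short_text,complete_article
Jane Doe,Example headline,https://example.com/a,Short summary of the story.,Full text of the story with more detail.
John Roe,Another headline,https://example.com/b,Second summary.,
"""

def load_pairs(csv_file):
    """Return (article, summary) pairs, skipping rows with a missing field."""
    pairs = []
    for row in csv.DictReader(csv_file):
        article = (row.get("complete_article") or "").strip()
        summary = (row.get("short_text") or "").strip()
        if article and summary:
            pairs.append((article, summary))
    return pairs

pairs = load_pairs(io.StringIO(SAMPLE_CSV))
print(len(pairs))
```

Rows whose complete article is empty are dropped here, on the assumption that a summary without its source text is of no use for training.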


The training script is a slightly modified version of this Colab Notebook created by Abhishek Kumar Mishra, so all credits go to him! I also trained the model for more epochs (6).


from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

def summarize(text, max_length=150):
  # Tokenize the article into a batch holding a single sequence of input IDs
  input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)

  # Beam search; the repetition penalty discourages repeated phrases
  generated_ids = model.generate(input_ids=input_ids, num_beams=2, max_length=max_length, repetition_penalty=2.5, length_penalty=1.0, early_stopping=True)

  # Decode every generated sequence back into plain text
  preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]

  return preds[0]
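The `summarize` function above passes the whole article to the model without truncating it. T5 was pretrained on sequences of up to 512 tokens, so very long articles may need to be split first. One possible approach, sketched below, groups whole sentences into chunks and summarizes each chunk separately. The helper names `split_into_chunks` and `summarize_long`, the 350-word default and the use of word counts as a rough stand-in for token counts are all assumptions for illustration, not part of the original script.

```python
import re

def split_into_chunks(text, max_words=350):
    """Group whole sentences into chunks of roughly max_words words.

    A single sentence longer than max_words is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long(text, summarize_fn, max_words=350):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in split_into_chunks(text, max_words))
```

With the model loaded, this could be called as `summarize_long(article, summarize)`. Joining per-chunk summaries is a simple heuristic and may read less smoothly than a single-pass summary.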

Given the following article from NYT (2020/06/09) with the title George Floyd’s death energized a movement. He will be buried in Houston today:

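After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words — “I can’t breathe” — have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston. George Floyd, who was 46, will then be buried in a grave next to his mother’s. The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death — and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro — has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.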

summarize('After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words — “I can’t breathe” — have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston.George Floyd, who was 46, will then be buried in a grave next to his mother’s.The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death — and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro — has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.', 80)



Created by Manuel Romero/@mrm8488 | LinkedIn
