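Model Card for Model ID

This model is a version fine-tuned on the XLSum dataset for abstractive multilingual summarization.

It achieves the following results on the evaluation set: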

  • rouge-1: 18.2
  • rouge-2: 7.6
  • rouge-l: 14.9
  • rouge-lsum: 14.7
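
To make the metric names above concrete, here is a minimal sketch of a ROUGE-N F1 computation. It is a simplified illustration, not the scorer used to produce the numbers in this card: it assumes whitespace tokenization, lowercasing, no stemming, and a 0 to 100 scale. The function name `rouge_n_f1` is hypothetical.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 on whitespace tokens, scaled to 0-100.

    Assumption: no stemming or language-specific tokenization, so scores
    will differ from those of a full ROUGE implementation.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_f1(reference, candidate, n=1), 1))  # unigram overlap (rouge-1)
print(round(rouge_n_f1(reference, candidate, n=2), 1))  # bigram overlap (rouge-2)
```

rouge-l and rouge-lsum are based on the longest common subsequence rather than fixed-length n-grams, so they are not covered by this sketch.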

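Dataset description

The XLSum dataset is a comprehensive and diverse collection of 1.35 million professionally annotated article-summary pairs, extracted from the BBC using a set of carefully designed heuristics. The dataset covers 45 languages ranging from low-resource to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.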

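Languages

  • Amharic
  • Arabic
  • Azerbaijani
  • Bengali
  • Burmese
  • Chinese (Simplified)
  • Chinese (Traditional)
  • English
  • French
  • Gujarati
  • Hausa
  • Hindi
  • Igbo
  • Indonesian
  • Japanese
  • Kirundi
  • Korean
  • Kyrgyz
  • Marathi
  • Nepali
  • Oromo
  • Pashto
  • Persian
  • Scottish Gaelic
  • Serbian (Cyrillic)
  • Serbian (Latin)
  • Sinhala
  • Somali
  • Spanish
  • Swahili
  • Tamil
  • Telugu
  • Thai
  • Tigrinya
  • Turkish
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Welsh
  • Yoruba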

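Training hyperparameters

The model was trained on a p4d.24xlarge instance on AWS SageMaker with the following configuration:

  • Model: deltalm base
  • Batch size: 8
  • Learning rate: 1e-5
  • Number of epochs: 3
  • Warmup steps: 500
  • Weight decay: 0.01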
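
The configuration above can be captured in a small sketch that also shows how the warmup setting affects the learning rate early in training. The class name `TrainConfig` and the function `warmup_lr` are hypothetical, and the schedule shape (linear warmup, then a constant rate) is an assumption: the card states only the number of warmup steps and the peak learning rate, not the scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed in this card (field names are illustrative)."""
    model: str = "deltalm base"
    batch_size: int = 8          # the card does not say whether this is per device or global
    learning_rate: float = 1e-5  # peak learning rate
    num_epochs: int = 3
    warmup_steps: int = 500
    weight_decay: float = 0.01


def warmup_lr(step: int, cfg: TrainConfig) -> float:
    """Learning rate at a given step under an ASSUMED linear warmup.

    The rate rises linearly from 0 to cfg.learning_rate over
    cfg.warmup_steps steps and is held constant afterwards.
    """
    if step >= cfg.warmup_steps:
        return cfg.learning_rate
    return cfg.learning_rate * step / cfg.warmup_steps


cfg = TrainConfig()
for step in (0, 250, 500, 1000):
    print(step, warmup_lr(step, cfg))
```

If the actual run used a decaying schedule after warmup, only the part of `warmup_lr` past step 500 would differ.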

Inference example

# Download from https://huggingface.co/hhhhzy/deltalm-base-xlsum/blob/main/modeling_deltalm.py
from modeling_deltalm import DeltalmForConditionalGeneration
# Download from https://huggingface.co/hhhhzy/deltalm-base-xlsum/blob/main/configuration_deltalm.py
from configuration_deltalm import DeltalmConfig
from transformers import AutoTokenizer

model = DeltalmForConditionalGeneration.from_pretrained("hhhhzy/deltalm-base-xlsum")
tokenizer = AutoTokenizer.from_pretrained("hhhhzy/deltalm-base-xlsum")

text = "The USA’s biggest sports league, the NFL, has extended its partnership with Amazon Prime, granting the streaming platform an additional live game on ‘black Friday’, the day after Thanksgiving. The additional game, added from 2023, builds on Amazon Prime’s package of ‘Thursday night football’ live rights (secured in an 11-year deal).\\nOn the surface, the deal makes sense because it gives Amazon Prime additional game time during the holiday season. But there is a deeper motivation at play. Black Friday is also regarded as the starting point of the pre-Christmas shopping season. Amazon has worked hard to leverage its sports rights in a way that benefits its ecommerce platform, so the addition of this fixture will boost that strategic goal.\\nIt’s unusual for sports rights holders to utilise their inventory in such a granular way – but it does suggest a shift towards a more data-driven approach to negotiations. For NFL, the deal means it now has partnerships with NBC, CBS, Fox and Amazon across the Thanksgiving period. Amazon Prime is currently in the NFL’s good books, helping revitalise the Thursday night slot through its marketing support and onscreen investment. Around 10 million people in the US are watching live fixtures each week."
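# Truncate the input to at most 512 tokens; truncation=True makes this explicit.
inputs = tokenizer(text, max_length=512, truncation=True, return_tensors="pt")

# Generate a summary of 32 to 128 tokens, then decode it and print the text.
generate_ids = model.generate(inputs["input_ids"], min_length=32, max_length=128)
summary = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(summary)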