
T5 model for sentence splitting in English

Sentence splitting is the task of dividing a long sentence into multiple shorter sentences. For example:

Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.

can be split into

Mary likes to play football in her freetime whenever she meets with her friends.
Her friends are very nice people.

How to use it in your code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-v1_1-base-wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-v1_1-base-wikisplit")

complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
# Tokenize the input and return PyTorch tensors
sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")

# Generate the split sentences with beam search
answer = model.generate(sample_tokenized["input_ids"], attention_mask=sample_tokenized["attention_mask"], max_length=256, num_beams=5)
# Decode the top-ranked sequence back into text
gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
print(gene_sentence)

"""
Output:
This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
"""
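
The model returns all resulting sentences as a single string. If a list of individual sentences is needed, a minimal post-processing sketch could look like the following. The helper name `split_into_sentences` and the regex rule are assumptions for illustration, not part of the model or of `transformers`; the rule is naive and will mis-split abbreviations such as "Dr. Smith".

```python
import re


def split_into_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' when an uppercase letter follows.

    Hypothetical helper for illustration only; not part of the model card's code.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


# The generated output shown above, used here as a plain string
output = (
    "This comedy drama is produced by Tidy. She co-founded Tidy in 2008 "
    "with her husband David Peet, who is managing director."
)

for sentence in split_into_sentences(output):
    print(sentence)
```

For anything beyond quick inspection, a dedicated sentence tokenizer is likely a safer choice than this regex.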

Datasets:

Wiki_Split

The current baseline is from the paper

Our results:

Model    Exact    SARI     BLEU
1236321  17.93    67.5438  76.9
1237321  18.1207  67.4873  76.9478
1238321  11.3582  67.2685  73.1682
1239321  18.6632  68.0501  77.1881
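
The card does not say how the Exact column was computed. Assuming it denotes the percentage of predictions that are identical to their reference, a minimal sketch of such a score could be the following. The function name `exact_match` and the whitespace stripping are assumptions for illustration, and the example strings are made up rather than taken from the evaluation data.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions identical to their reference.

    Assumed definition for illustration; the original evaluation may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Compare after stripping surrounding whitespace (an assumption)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Made-up example pair lists, not real evaluation data
predictions = ["Mary likes football. She plays often.", "He left. He came back late."]
references = ["Mary likes football. She plays often.", "He left. He returned late."]

print(exact_match(predictions, references))
```

SARI and BLEU are more involved and are better taken from an existing metrics library than reimplemented by hand.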