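Model:
flax-community/t5-v1_1-base-wikisplit
Task:
Text2Text Generation
Preprint:
arxiv:1907.12461

Sentence splitting is the task of dividing a long sentence into multiple sentences. For example: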
Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.
can be split into
Mary likes to play football in her freetime whenever she meets with her friends.
Her friends are very nice people.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned seq2seq model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-v1_1-base-wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-v1_1-base-wikisplit")

complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."

# Tokenize the input and generate the split with beam search.
sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
answer = model.generate(
    sample_tokenized["input_ids"],
    attention_mask=sample_tokenized["attention_mask"],
    max_length=256,
    num_beams=5,
)

# Decode the best beam back to text.
gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
gene_sentence

"""
Output:
This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
"""
```
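
The model returns the split sentences as one string. If a list of sentences is more convenient, a minimal post-processing sketch could look like the following. The helper name `split_output` is hypothetical and not part of this model or of `transformers`; it assumes each sentence ends with `.`, `!` or `?` followed by whitespace, so abbreviations such as "Dr. Smith" would be split incorrectly.

```python
import re
from typing import List


def split_output(generated: str) -> List[str]:
    """Split a generated string into sentences (hypothetical helper).

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    """
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", generated.strip())
    return [part for part in parts if part]


generated = (
    "This comedy drama is produced by Tidy. She co-founded Tidy in 2008 "
    "with her husband David Peet, who is managing director."
)
sentences = split_output(generated)
```

Here `sentences` holds two entries, one per generated sentence.
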
Model | Exact | SARI | BLEU |
---|---|---|---|
1236321 | 17.93 | 67.5438 | 76.9 |
1237321 | 18.1207 | 67.4873 | 76.9478 |
1238321 | 11.3582 | 67.2685 | 73.1682 |
1239321 | 18.6632 | 68.0501 | 77.1881 |
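
This card does not state how the metrics were computed. As a rough illustration only, the sketch below assumes that "Exact" is the percentage of predictions that match their reference exactly after whitespace normalization. The function name `exact_match_percent` is hypothetical, and the actual evaluation script may differ.

```python
from typing import Sequence


def exact_match_percent(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference.

    Assumption: comparison is done after collapsing runs of whitespace;
    the evaluation behind the table above may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def normalize(text: str) -> str:
        return " ".join(text.split())

    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * matches / len(predictions)


score = exact_match_percent(
    ["Her friends are very nice  people.", "Mary likes football."],
    ["Her friends are very nice people.", "Mary likes to play football."],
)
```

In this toy call one of the two predictions matches its reference, so `score` is 50.0.
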