Model:
unikei/t5-base-split-and-rephrase
Split-and-rephrase is the task of splitting a complex input sentence into shorter sentences while preserving its meaning (Narayan et al., 2017).
For example:
Cystic Fibrosis (CF) is an autosomal recessive disorder that affects multiple organs, which is common in the Caucasian population, symptomatically affecting 1 in 2500 newborns in the UK, and more than 80,000 individuals globally.
can be split into:
Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs.
Cystic Fibrosis is common in the Caucasian population.
Cystic Fibrosis affects 1 in 2500 newborns in the UK.
Cystic Fibrosis affects more than 80,000 individuals globally.
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and model from the Hugging Face Hub.
checkpoint = "unikei/t5-base-split-and-rephrase"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Adjacent string literals are concatenated into one sentence.
complex_sentence = (
    "Cystic Fibrosis (CF) is an autosomal recessive disorder that "
    "affects multiple organs, which is common in the Caucasian "
    "population, symptomatically affecting 1 in 2500 newborns in "
    "the UK, and more than 80,000 individuals globally."
)

# Tokenize the input, padding or truncating to 256 tokens.
complex_tokenized = tokenizer(complex_sentence,
                              padding="max_length",
                              truncation=True,
                              max_length=256,
                              return_tensors='pt')

# Generate the simplified sentences with beam search.
simple_tokenized = model.generate(complex_tokenized['input_ids'],
                                  attention_mask=complex_tokenized['attention_mask'],
                                  max_length=256,
                                  num_beams=5)

# Decode the generated token IDs back to text.
simple_sentences = tokenizer.batch_decode(simple_tokenized, skip_special_tokens=True)
print(simple_sentences)

"""
Output:
Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs.
Cystic Fibrosis affects 1 in 2500 newborns in the UK.
Cystic Fibrosis affects more than 80,000 individuals globally.
Cystic Fibrosis is common in the Caucasian population.
"""
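If the decoded output arrives as one string containing all of the simple sentences, as the output above suggests, it may be convenient to break it into a list. The following is a minimal sketch, not part of the model or the transformers library: `split_simple_sentences` is a hypothetical helper name, and it assumes that sentences end in `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g. " would be split incorrectly.

```python
import re


def split_simple_sentences(decoded: str) -> list[str]:
    """Split a decoded output string into individual sentences.

    Hypothetical helper: splits on whitespace that follows '.', '!' or '?'.
    This is a naive rule and will mis-split abbreviations.
    """
    parts = re.split(r"(?<=[.!?])\s+", decoded.strip())
    # Drop empty fragments, e.g. when the input is an empty string.
    return [part for part in parts if part]


if __name__ == "__main__":
    decoded = (
        "Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. "
        "Cystic Fibrosis affects 1 in 2500 newborns in the UK."
    )
    for sentence in split_simple_sentences(decoded):
        print(sentence)
```

With the usage example above, this could be applied to each element of `simple_sentences`, since `batch_decode` returns one string per input sequence.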