An IndoT5-base model trained on a translated PAWS dataset.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Wikidepia/IndoT5-base-paraphrase")
model = AutoModelForSeq2SeqLM.from_pretrained("Wikidepia/IndoT5-base-paraphrase")

sentence = "Anak anak melakukan piket kelas agar kebersihan kelas terjaga"
text = "paraphrase: " + sentence + " </s>"

encoding = tokenizer(text, padding='longest', return_tensors="pt")
outputs = model.generate(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    max_length=512,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5
)

# Decode the sampled sequences into paraphrase strings
for output in outputs:
    paraphrase = tokenizer.decode(output, skip_special_tokens=True,
                                  clean_up_tokenization_spaces=True)
    print(paraphrase)
```
Sometimes the paraphrase contains dates that do not appear in the original text :/
Thanks to Tensorflow Research Cloud for providing access to TPU v3-8s.