Model:
BeIR/query-gen-msmarco-t5-large-v1
This model is the t5-base model from docTTTTTquery.
The T5-base model was trained on the MS MARCO Passage Dataset, which consists of about 500,000 real search queries from Bing together with their relevant passages.
The model can be used for query generation, in order to learn semantic search models without requiring annotated training data: Synthetic Query Generation.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('BeIR/query-gen-msmarco-t5-large-v1')
model = T5ForConditionalGeneration.from_pretrained('BeIR/query-gen-msmarco-t5-large-v1')

para = "Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."

# Encode the passage and sample several candidate queries for it
input_ids = tokenizer.encode(para, return_tensors='pt')
outputs = model.generate(
    input_ids=input_ids,
    max_length=64,
    do_sample=True,
    top_p=0.95,
    num_return_sequences=3)

print("Paragraph:")
print(para)

print("\nGenerated Queries:")
for i in range(len(outputs)):
    query = tokenizer.decode(outputs[i], skip_special_tokens=True)
    print(f'{i + 1}: {query}')
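To illustrate the synthetic-query workflow mentioned above, the sketch below pairs each generated query with its source passage and trains a small bi-encoder on those pairs. It is a minimal sketch, assuming the sentence-transformers library is installed; the base model 'distilbert-base-uncased', the batch size, and the tiny in-memory passage list are illustrative choices, not part of this model card.

from torch.utils.data import DataLoader
from transformers import T5Tokenizer, T5ForConditionalGeneration
from sentence_transformers import SentenceTransformer, InputExample, losses

# Query generator (the model described in this card)
tokenizer = T5Tokenizer.from_pretrained('BeIR/query-gen-msmarco-t5-large-v1')
gen_model = T5ForConditionalGeneration.from_pretrained('BeIR/query-gen-msmarco-t5-large-v1')

# Illustrative corpus; in practice this would be your unlabeled passage collection
passages = [
    "Python is an interpreted, high-level and general-purpose programming language.",
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.",
]

# Build synthetic (query, passage) pairs: every sampled query becomes a positive example
train_examples = []
for passage in passages:
    input_ids = tokenizer.encode(passage, return_tensors='pt')
    outputs = gen_model.generate(
        input_ids=input_ids,
        max_length=64,
        do_sample=True,
        top_p=0.95,
        num_return_sequences=3)
    for output in outputs:
        query = tokenizer.decode(output, skip_special_tokens=True)
        train_examples.append(InputExample(texts=[query, passage]))

# Train a bi-encoder on the synthetic pairs; MultipleNegativesRankingLoss uses
# in-batch negatives, so no annotated relevance judgments are required
bi_encoder = SentenceTransformer('distilbert-base-uncased')  # assumed base model, swap as needed
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)
train_loss = losses.MultipleNegativesRankingLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, show_progress_bar=True)

After training, the bi-encoder can embed queries and passages into the same vector space for semantic search over the original corpus.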