Model:
BeIR/query-gen-msmarco-t5-base-v1
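This is the t5-base model from docTTTTTquery.
The t5-base model was trained on the MS MARCO Passage Dataset, which contains about 500k real search queries from Bing together with their relevant passages.
The model can be used for query generation, which makes it possible to train semantic search models without annotated training data: Synthetic Query Generation.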
from transformers import T5Tokenizer, T5ForConditionalGeneration

# 'model-name' is a placeholder; replace it with the model ID given above.
tokenizer = T5Tokenizer.from_pretrained('model-name')
model = T5ForConditionalGeneration.from_pretrained('model-name')

para = "Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."

# Encode the passage and sample three candidate queries with nucleus sampling.
input_ids = tokenizer.encode(para, return_tensors='pt')
outputs = model.generate(
    input_ids=input_ids,
    max_length=64,
    do_sample=True,
    top_p=0.95,
    num_return_sequences=3)

print("Paragraph:")
print(para)

print("\nGenerated Queries:")
for i in range(len(outputs)):
    query = tokenizer.decode(outputs[i], skip_special_tokens=True)
    print(f'{i + 1}: {query}')
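Generated queries are typically paired with their source passage to form synthetic (query, passage) training examples for a semantic search model. The following is a minimal sketch of that pairing step, not part of the model or of the transformers library: `build_training_pairs`, `generate_queries`, and `stub_generator` are hypothetical names introduced here for illustration, and the stub stands in for the T5 model so the example runs without downloading weights.

```python
from typing import Callable, Iterable, List, Tuple


def build_training_pairs(
    passages: Iterable[str],
    generate_queries: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Pair every generated query with the passage it was generated from."""
    pairs: List[Tuple[str, str]] = []
    for passage in passages:
        seen = set()
        for query in generate_queries(passage):
            query = query.strip()
            # Sampling can return empty strings or duplicates; skip those.
            if not query or query.lower() in seen:
                continue
            seen.add(query.lower())
            pairs.append((query, passage))
    return pairs


def stub_generator(passage: str) -> List[str]:
    # Hypothetical stand-in for the T5 model (assumption for this sketch):
    # returns one query, a duplicate of it, and an empty string.
    first_word = passage.split()[0]
    return [f"what is {first_word}", f"what is {first_word}", ""]


demo_pairs = build_training_pairs(
    ["Python is a programming language."], stub_generator
)
for query, passage in demo_pairs:
    print(f"{query} -> {passage}")
```

In practice, `generate_queries` would wrap the `model.generate` and `tokenizer.decode` calls shown above and return the decoded strings for one passage. Deduplicating per passage is a design choice of this sketch, since sampling with `do_sample=True` can return the same query more than once.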