模型:
Narrativaai/BioGPT-Large-finetuned-chatdoctor
Microsoft's BioGPT Large 在 ChatDoctor 数据集上进行了 QA 微调。
这只是一个研究模型,不必超出这个范围使用。
待定
在生物医学领域,预训练语言模型在一般自然语言领域的巨大成功的启发下,吸引了越来越多的关注。在一般语言领域的两个主要分支中,即 BERT(及其变种)和GPT(及其变种),第一个已在生物医学领域进行了广泛研究,如BioBERT和PubMedBERT。虽然它们在多个歧视性下游生物医学任务上取得了巨大成功,但缺乏生成能力限制了它们的应用范围。在本文中,我们提出了BioGPT,这是一个在大规模生物医学文献上进行预训练的领域特定生成 Transformer 语言模型。我们在六个生物医学自然语言处理任务上评估了BioGPT,并证明我们的模型在大多数任务上优于先前的模型。特别是,我们在BC5CDR、KD-DTI和DDI端到端关系提取任务上分别获得了44.98%、38.42%和40.76%的F1 分数,并在PubMedQA上获得了78.2%的准确率,创造了一个新记录。我们对文本生成的案例研究进一步证明了BioGPT在生物医学文献上为生物医学术语生成流畅描述的优势。
ChatDoctor-200K 数据集是从这篇论文 https://arxiv.org/pdf/2303.14070.pdf 中收集的。
该数据集由以下内容组成:
来自 HealthCareMagic.com 的10万真实患者与医生之间的对话 HealthCareMagic-100k 。
来自 icliniq.com 的1万真实患者与医生之间的对话 icliniq-10k 。
来自 ChatGPT 的5千个患者与医师之间的生成对话 GenMedGPT-5k 和 disease database 。
import torch from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig model_id = "Narrativaai/BioGPT-Large-finetuned-chatdoctor" tokenizer = AutoTokenizer.from_pretrained("microsoft/BioGPT-Large") model = AutoModelForCausalLM.from_pretrained(model_id) def answer_question( prompt, temperature=0.1, top_p=0.75, top_k=40, num_beams=2, **kwargs, ): inputs = tokenizer(prompt, return_tensors="pt") input_ids = inputs["input_ids"].to("cuda") attention_mask = inputs["attention_mask"].to("cuda") generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, num_beams=num_beams, **kwargs, ) with torch.no_grad(): generation_output = model.generate( input_ids=input_ids, attention_mask=attention_mask, generation_config=generation_config, return_dict_in_generate=True, output_scores=True, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id ) s = generation_output.sequences[0] output = tokenizer.decode(s, skip_special_tokens=True) return output.split(" Response:")[1] example_prompt = """ Below is an instruction that describes a task, paired with an input that provides further context.Write a response that appropriately completes the request. ### Instruction: If you are a doctor, please answer the medical questions based on the patient's description. ### Input: Hi i have sore lumps under the skin on my legs. they started on my left ankle and are approx 1 - 2cm diameter and are spreading up onto my thies. I am eating panadol night and anti allergy pills (Atarax). I have had this for about two weeks now. Please advise. ### Response: """ print(answer_question(example_prompt))
@misc {narrativa_2023, author = { {Narrativa} }, title = { BioGPT-Large-finetuned-chatdoctor (Revision 13764c0) }, year = 2023, url = { https://huggingface.co/Narrativaai/BioGPT-Large-finetuned-chatdoctor }, doi = { 10.57967/hf/0601 }, publisher = { Hugging Face } }