
BioGPT (Large) fine-tuned on ChatDoctor for QA

Microsoft's BioGPT Large fine-tuned on the ChatDoctor dataset for question answering (QA).

Intended Use

This is a research model only and should not be used outside that scope.

Limitations

TBA

Model

Microsoft's BioGPT Large

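Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. In particular, we obtain F1 scores of 44.98%, 38.42% and 40.76% on the BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, setting a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature in generating fluent descriptions for biomedical terms.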

Dataset

The ChatDoctor-200K dataset was collected from this paper: https://arxiv.org/pdf/2303.14070.pdf

The dataset consists of the following:

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig


model_id = "Narrativaai/BioGPT-Large-finetuned-chatdoctor"

tokenizer = AutoTokenizer.from_pretrained("microsoft/BioGPT-Large")

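# Use the GPU when one is available; otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.eval()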

def answer_question(
        prompt,
        temperature=0.1,
        top_p=0.75,
        top_k=40,
        num_beams=2,
        **kwargs,
):
    inputs = tokenizer(prompt, return_tensors="pt")
    # Place the inputs on whichever device the model lives on.
    input_ids = inputs["input_ids"].to(model.device)
    attention_mask = inputs["attention_mask"].to(model.device)
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        **kwargs,
    )
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=512,
            eos_token_id=tokenizer.eos_token_id,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s, skip_special_tokens=True)
    return output.split(" Response:")[1]

example_prompt = """
Below is an instruction that describes a task, paired with an input that provides further context.Write a response that appropriately completes the request.

### Instruction:
If you are a doctor, please answer the medical questions based on the patient's description.

### Input:
Hi i have sore lumps under the skin on my legs. they started on my left ankle and are approx 1 - 2cm diameter and are spreading up onto my thies. I am eating panadol night and anti allergy pills (Atarax). I have had this for about two weeks now. Please advise.

### Response:
"""

print(answer_question(example_prompt))
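
The example prompt above follows a fixed layout with Instruction, Input and Response sections. A minimal sketch of a helper that fills that layout for a new patient description might look like the following. The names `PROMPT_TEMPLATE`, `DEFAULT_INSTRUCTION` and `build_prompt` are hypothetical and not part of the model's API; the template text is copied from the example prompt as-is, on the assumption that the model expects exactly this wording.

```python
# Hypothetical helper: builds a prompt in the same layout as example_prompt.
# The header sentence is copied verbatim from the example, spacing included.
PROMPT_TEMPLATE = (
    "\n"
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.Write a response that appropriately "
    "completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{instruction}\n"
    "\n"
    "### Input:\n"
    "{patient_input}\n"
    "\n"
    "### Response:\n"
)

# Instruction text taken from the example prompt.
DEFAULT_INSTRUCTION = (
    "If you are a doctor, please answer the medical questions "
    "based on the patient's description."
)


def build_prompt(patient_input, instruction=DEFAULT_INSTRUCTION):
    """Return a prompt string with the patient's text in the Input section."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        patient_input=patient_input.strip(),
    )
```

With the `answer_question` function defined earlier, a call would then read `print(answer_question(build_prompt("I have had a dry cough for a week. Please advise.")))`.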

Citation

@misc {narrativa_2023,
    author       = { {Narrativa} },
    title        = { BioGPT-Large-finetuned-chatdoctor (Revision 13764c0) },
    year         = 2023,
    url          = { https://huggingface.co/Narrativaai/BioGPT-Large-finetuned-chatdoctor },
    doi          = { 10.57967/hf/0601 },
    publisher    = { Hugging Face }
}