In-BoXBART is a unified, instruction-based model for performing a variety of biomedical tasks.
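This work explores the impact of instructional prompts on biomedical multi-task learning. We introduce BoX, a collection of 32 instruction tasks for biomedical NLP spanning a range of categories. Using this meta-dataset, we propose a unified model, In-BoXBART, that jointly learns all BoX tasks without any task-specific modules. To the best of our knowledge, this is the first attempt to propose a unified model in the biomedical domain that uses instructions to generalize across multiple biomedical tasks.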
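The model can be used to generate text for experimentation and for understanding its capabilities. It should not be used directly in production or in any work that directly affects people.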
You can load the model with Transformers instead of downloading it manually. Our model is built on BART-base. The following shows how to load it in PyTorch:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("cogint/in-boxbart")
    model = AutoModelForSeq2SeqLM.from_pretrained("cogint/in-boxbart")
Or simply clone the model repository:
    git lfs install
    git clone https://huggingface.co/cogint/in-boxbart
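Here we provide an example for the "document classification" task (HoC dataset). Once you have loaded the model from Hugging Face for inference, combine the dataset-specific instruction from ./templates with your input instance. An example for a single instance follows.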
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("cogint/in-boxbart")
    model = AutoModelForSeq2SeqLM.from_pretrained("cogint/in-boxbart")

    # The input combines the HoC instruction from our template file with one instance.
    # Adjacent string literals are concatenated into a single prompt.
    input_text = (
        "Instruction: Definition: In this task, you are given a medical text related to cancer. "
        "Your job is to classify into zero or more classes from "
        "(1) Sustaining proliferative signaling, (2) Resisting cell death, "
        "(3) Genomic instability and mutation, (4) Activating invasion and metastasis, "
        "(5) Tumor promoting inflammation, (6) Evading growth suppressors, "
        "(7) Inducing angiogenesis (8) Enabling replicative immortality, "
        "(9) Avoiding immune destruction and (10) Cellular energetics., "
        "Positive Examples: [[input: Studies of cell-cycle progression showed that the "
        "anti-proliferative effect of Fan was associated with an increase in the G1/S phase "
        "of PC3 cells ., output: Evading growth suppressors, Sustaining proliferative signaling, "
        "explanation: Given text is classified into two categories, hence, generated label is "
        "'Evading growth suppressors, Sustaining proliferative signaling'.] ]; "
        "Instance: input: Similar to previous studies utilizing IGF-1 , pretreatment with "
        "Roscovitine leads to a significant up-regulation of p21 expression and a significant "
        "decrease in the number of PCNA positive cells ., output: ?"
    )

    # Return PyTorch tensors so the encoding can be passed straight to the model.
    tokenized_input = tokenizer(input_text, return_tensors="pt")

    # Ideal output for this input is 'Sustaining proliferative signaling'.
    # max_new_tokens=32 is an assumed cap for short label strings; adjust as needed.
    output_ids = model.generate(**tokenized_input, max_new_tokens=32)
    output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(output)
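The prompt in the example above follows a fixed layout: a task definition, a bracketed list of positive examples, and the instance to label. The following is a minimal sketch of how such a prompt could be assembled and how a comma-separated prediction could be split into labels. The helper names `build_prompt` and `parse_labels` are hypothetical and not part of the released code. The separator between multiple positive examples is an assumption, since the example above contains only one, and the split on commas assumes no label name contains a comma, which holds for the ten HoC classes listed above.

```python
from typing import Dict, List


def build_prompt(definition: str,
                 positive_examples: List[Dict[str, str]],
                 instance_input: str) -> str:
    """Assemble an instruction prompt in the layout of the HoC example.

    Hypothetical helper: the layout is copied from the example prompt;
    joining several examples with a single space is an assumption.
    """
    examples = " ".join(
        f"[input: {ex['input']}, output: {ex['output']}, "
        f"explanation: {ex['explanation']}]"
        for ex in positive_examples
    )
    return (
        f"Instruction: Definition: {definition}, "
        f"Positive Examples: [{examples} ]; "
        f"Instance: input: {instance_input}, output: ?"
    )


def parse_labels(generated_text: str) -> List[str]:
    """Split a comma-separated prediction into a list of label strings.

    Hypothetical helper: assumes label names themselves contain no commas.
    """
    return [part.strip() for part in generated_text.split(",") if part.strip()]


if __name__ == "__main__":
    prompt = build_prompt(
        definition="Classify the text.",
        positive_examples=[
            {"input": "a", "output": "b", "explanation": "c"},
        ],
        instance_input="d",
    )
    print(prompt)
    print(parse_labels("Evading growth suppressors, Sustaining proliferative signaling"))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of a hand-written prompt, and `parse_labels` can be applied to the decoded model output to obtain zero or more class names.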
If you use our model, please cite our paper:
    @inproceedings{parmar-etal-2022-boxbart,
        title = "In-{B}o{XBART}: Get Instructions into Biomedical Multi-Task Learning",
        author = "Parmar, Mihir and Mishra, Swaroop and Purohit, Mirali and Luo, Man and Mohammad, Murad and Baral, Chitta",
        booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
        month = jul,
        year = "2022",
        address = "Seattle, United States",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2022.findings-naacl.10",
        doi = "10.18653/v1/2022.findings-naacl.10",
        pages = "112--128",
    }