d4data/biomedical-ner-all

Model:

d4data/biomedical-ner-all

Task:

Token Classification

Libraries:

PyTorch Safetensors Transformers

Language:

English

Other:

distilbert Token Classification Carbon Emissions AutoTrain Compatible

License:

apache-2.0


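About the model

This is an English named entity recognition model trained on the Maccrobat dataset. It recognizes biomedical entities (107 entity types) in a given text corpus such as case reports. The model is built on top of distilbert-base-uncased.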

Dataset: Maccrobat https://figshare.com/articles/dataset/MACCROBAT2018/9764942
Carbon emissions: 0.0279399890043426 kg
Training time: 30.16527 minutes
GPU used: 1 x GeForce RTX 3060 Laptop GPU

See the tutorial video for a walkthrough of the model and its companion Python library: https://youtu.be/xpiDPdBpS18

Usage

The easiest way to try the model is the hosted Inference API on Hugging Face. The second way is the pipeline object provided by the transformers library.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")

# aggregation_strategy="simple" groups adjacent tokens with the same label into one entity span
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")  # pass device=0 if using gpu
entities = pipe("""The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.""")
print(entities)
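With aggregation_strategy="simple", the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. The sketch below shows one possible way to post-process that list: drop low-confidence spans and group the rest by entity type. The helper name group_entities, the min_score threshold, and the sample data (including its label names and scores) are assumptions made for illustration. They are not part of the model card or real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group aggregated NER spans by entity type, keeping only confident ones.

    `entities` is a list of dicts shaped like the output of a transformers
    "ner" pipeline run with aggregation_strategy="simple".
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


text = "The patient reported no recurrence of palpitations at follow-up 6 months after the ablation."

# Hypothetical sample in the shape the pipeline returns; labels and scores
# are invented for illustration and are NOT actual output of the model.
sample = [
    {"entity_group": "Sign_symptom", "score": 0.99, "word": "palpitations", "start": 38, "end": 50},
    {"entity_group": "Duration", "score": 0.95, "word": "6 months", "start": 64, "end": 72},
    {"entity_group": "Therapeutic_procedure", "score": 0.31, "word": "ablation", "start": 83, "end": 91},
]

# start/end are character offsets into the input text
for ent in sample:
    print(ent["entity_group"], repr(text[ent["start"]:ent["end"]]))

print(group_entities(sample))
```

With the default threshold of 0.5, the low-scoring third span in the sample is dropped. Passing min_score=0.0 keeps every span. To use this on real predictions, pass the list returned by the pipeline in place of the sample.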

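Authors

This model is part of the research topic "AI in Biomedical field" conducted by Deepak John Reji and Shaina Raza. If you use this work (code, model, or dataset), please star the repository at: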

https://github.com/dreji18/Bio-Epidemiology-NER


Author:

D 4 Data Community

Dataset size:

507.75 MB