Model:
imvladikon/t5-english-ner
This is a simple experimental model, trained for 3 epochs on a small dataset.

    from transformers import AutoTokenizer, AutoModelForTokenClassification, NerPipeline

    # trust_remote_code=True allows transformers to run the custom code shipped in the model repository
    model = AutoModelForTokenClassification.from_pretrained("imvladikon/t5-english-ner", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("imvladikon/t5-english-ner", trust_remote_code=True)

    # aggregation_strategy="max" merges sub-word tokens into whole-word entity groups
    pipe = NerPipeline(model=model, tokenizer=tokenizer, aggregation_strategy="max")
    print(pipe("London is the capital city of England and the United Kingdom"))
    """
    [{'entity_group': 'LOCATION', 'score': 0.84536326, 'word': 'London', 'start': 0, 'end': 6},
     {'entity_group': 'LOCATION', 'score': 0.8957489, 'word': 'England', 'start': 30, 'end': 37},
     {'entity_group': 'LOCATION', 'score': 0.73186326, 'word': 'UnitedKingdom', 'start': 46, 'end': 60}]
    """

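The `start` and `end` fields in the output above are character offsets into the input string, so the original surface text can be recovered by slicing. This matters here because the aggregated `word` of the last entity is printed without its space (`UnitedKingdom`). The following is a minimal sketch that uses the printed output as literal data instead of calling the model; the helper name `surface_entities` and the `min_score` threshold are illustrative assumptions, not part of the model or the pipeline API.

```python
# Input sentence from the usage example above.
TEXT = "London is the capital city of England and the United Kingdom"

# Entities copied verbatim from the example output (no model call needed).
ENTITIES = [
    {"entity_group": "LOCATION", "score": 0.84536326, "word": "London", "start": 0, "end": 6},
    {"entity_group": "LOCATION", "score": 0.8957489, "word": "England", "start": 30, "end": 37},
    {"entity_group": "LOCATION", "score": 0.73186326, "word": "UnitedKingdom", "start": 46, "end": 60},
]


def surface_entities(text, entities, min_score=0.0):
    """Hypothetical helper: return (label, surface text) pairs.

    The surface text is sliced from `text` using the character offsets,
    and entities scoring below `min_score` are dropped.
    """
    return [
        (ent["entity_group"], text[ent["start"]:ent["end"]])
        for ent in entities
        if ent["score"] >= min_score
    ]


print(surface_entities(TEXT, ENTITIES))
print(surface_entities(TEXT, ENTITIES, min_score=0.8))
```

Slicing by offsets yields `United Kingdom` with the space restored, and a threshold of 0.8 drops that entity because its score is about 0.73. The threshold value is arbitrary and would need tuning on real data.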
pip install spacy transformers git+https://github.com/explosion/spacy-huggingface-pipelines -q

    import spacy
    from spacy import displacy

    text = "My name is Sarah and I live in London"

    # Start from a blank English pipeline and add the Hugging Face token-classification component
    nlp = spacy.blank("en")
    nlp.add_pipe("hf_token_pipe", config={"model": "imvladikon/t5-english-ner", "kwargs": {"trust_remote_code": True}})

    doc = nlp(text)
    print(doc.ents)
    # (Sarah, London)

This model is a fine-tuned version of t5-large on a private (English) dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
Training Loss | Epoch | Step | Validation Loss | Commercial Item Precision | Commercial Item Recall | Commercial Item F1 | Commercial Item Number | Date Precision | Date Recall | Date F1 | Date Number | Location Precision | Location Recall | Location F1 | Location Number | Organization Precision | Organization Recall | Organization F1 | Organization Number | Other Precision | Other Recall | Other F1 | Other Number | Person Precision | Person Recall | Person F1 | Person Number | Quantity Precision | Quantity Recall | Quantity F1 | Quantity Number | Title Precision | Title Recall | Title F1 | Title Number | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.8868 | 1.0 | 708 | 0.2725 | 0.0 | 0.0 | 0.0 | 1 | 0.8125 | 0.9286 | 0.8667 | 14 | 0.4167 | 0.75 | 0.5357 | 20 | 0.8272 | 0.8375 | 0.8323 | 80 | 1.0 | 0.0476 | 0.0909 | 21 | 0.8438 | 0.9310 | 0.8852 | 29 | 0.6667 | 0.7143 | 0.6897 | 14 | 0.0 | 0.0 | 0.0 | 7 | 0.7348 | 0.7151 | 0.7248 | 0.9446 |
0.2984 | 2.0 | 1416 | 0.2121 | 0.0 | 0.0 | 0.0 | 1 | 0.8667 | 0.9286 | 0.8966 | 14 | 0.5 | 0.8 | 0.6154 | 20 | 0.8375 | 0.8375 | 0.8375 | 80 | 0.3077 | 0.1905 | 0.2353 | 21 | 0.8182 | 0.9310 | 0.8710 | 29 | 0.7333 | 0.7857 | 0.7586 | 14 | 0.0 | 0.0 | 0.0 | 7 | 0.7077 | 0.7419 | 0.7244 | 0.9481 |
0.1729 | 3.0 | 2124 | 0.1956 | 0.0 | 0.0 | 0.0 | 1 | 0.8125 | 0.9286 | 0.8667 | 14 | 0.7143 | 0.75 | 0.7317 | 20 | 0.8588 | 0.9125 | 0.8848 | 80 | 0.3684 | 0.3333 | 0.35 | 21 | 0.8182 | 0.9310 | 0.8710 | 29 | 0.8 | 0.8571 | 0.8276 | 14 | 0.0 | 0.0 | 0.0 | 7 | 0.75 | 0.7903 | 0.7696 | 0.9534 |
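
Each F1 column in the table is consistent with the harmonic mean of the corresponding precision and recall, F1 = 2PR / (P + R). The sketch below recomputes a few epoch-3 values taken from the table. The function name `f1_score` is illustrative, and returning 0.0 when both inputs are zero is an assumed convention chosen to match the all-zero rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the epoch-3 row of the table.
EPOCH_3 = {
    "Commercial Item": (0.0, 0.0, 0.0),
    "Location": (0.7143, 0.75, 0.7317),
    "Organization": (0.8588, 0.9125, 0.8848),
    "Overall": (0.75, 0.7903, 0.7696),
}

for label, (p, r, reported) in EPOCH_3.items():
    print(f"{label}: computed={f1_score(p, r):.4f} reported={reported:.4f}")
```

Because the table values are rounded to four decimals, the recomputed figures can only be expected to agree to roughly that precision.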