README under construction
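ner-legal-bert-base-cased-ptbr is a NER (token classification) model for the Portuguese legal domain. It was fine-tuned from the model dominguesm/legal-bert-base-cased-ptbr with a NER objective.
The model is intended to support NLP research in the legal domain, computational law, and legal technology applications. It was trained on several legal texts in Portuguese (details below), with the following labels:
These labels were inspired by the LeNER_br dataset.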
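The dataset of ner-legal-bert-base-cased-ptbr includes:
The data used was provided by the BRAZILIAN SUPREME FEDERAL TRIBUNAL, under the following terms of use: LREC 2020.
The results of this project do not in any way imply the position of the BRAZILIAN SUPREME FEDERAL TRIBUNAL; all responsibility lies exclusively with the author of the model.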

    from transformers import AutoModelForTokenClassification, AutoTokenizer
    import torch

    # parameters
    model_name = "dominguesm/ner-legal-bert-base-cased-ptbr"
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    input_text = "Acrescento que não há de se falar em violação do artigo 114, § 3º, da Constituição Federal, posto que referido dispositivo revela-se impertinente, tratando da possibilidade de ajuizamento de dissídio coletivo pelo Ministério Público do Trabalho nos casos de greve em atividade essencial."

    # tokenization (inputs longer than 512 tokens are truncated)
    inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")
    tokens = inputs.tokens()

    # get predictions (no gradients are needed for inference)
    with torch.no_grad():
        outputs = model(**inputs).logits
    predictions = torch.argmax(outputs, dim=2)

    # print one (token, label) pair per token
    for token, prediction in zip(tokens, predictions[0].numpy()):
        print((token, model.config.id2label[prediction]))

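
The loop above prints one `(token, label)` pair per WordPiece, which is hard to read for multi-word entities. The following is a minimal sketch of how such pairs could be merged into entity spans. It assumes the labels follow a `B-`/`I-`/`O` scheme (as in LeNER_br) and that continuation pieces start with `##`; neither is confirmed by this README, and `group_entities` is a hypothetical helper, not part of the model or of `transformers`.

```python
from typing import List, Tuple

SPECIAL_TOKENS = ("[CLS]", "[SEP]", "[PAD]")


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (token, label) pairs into (entity_text, entity_type) spans.

    Assumes B-/I-/O labels and '##' WordPiece continuations (assumptions).
    """
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the entity currently being built, if any.
        nonlocal words, current_type
        if words and current_type is not None:
            spans.append((" ".join(words), current_type))
        words, current_type = [], None

    for token, label in pairs:
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##"):
            # Glue a continuation piece onto the last word of the open entity.
            if words:
                words[-1] += token[2:]
            continue
        if label == "O":
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        words.append(token)

    flush()
    return spans


example = [
    ("[CLS]", "O"),
    ("artigo", "B-LEGISLACAO"),
    ("114", "I-LEGISLACAO"),
    ("da", "O"),
    ("Minist", "B-ORGANIZACAO"),
    ("##ério", "I-ORGANIZACAO"),
    ("Público", "I-ORGANIZACAO"),
    ("[SEP]", "O"),
]
print(group_entities(example))
```

The `example` pairs are hand-written for illustration and are not actual model output. Joining words with a single space is a simplification: punctuation pieces such as `,` or `§` will be separated from their neighbours.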
You can also use the pipeline. However, there seems to be an issue with the maximum length of the input sequence.

    from transformers import pipeline

    model_name = "dominguesm/ner-legal-bert-base-cased-ptbr"
    ner = pipeline("ner", model=model_name)

    # input_text is the sentence defined in the previous example
    ner(input_text, aggregation_strategy="average")

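
One possible workaround for the input-length issue is to split long documents into shorter pieces and run the pipeline on each piece. The sketch below is an illustration under stated assumptions, not the author's method: `chunk_words` and `ner_on_long_text` are hypothetical helpers, and the word count is only a rough proxy for the token count, so `max_words` should be chosen conservatively.

```python
from typing import Callable, Dict, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into whitespace-delimited windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def ner_on_long_text(
    ner: Callable[[str], List[Dict]], text: str, **chunk_kwargs
) -> List[Dict]:
    """Run a NER callable on each chunk and concatenate the results."""
    results: List[Dict] = []
    for chunk in chunk_words(text, **chunk_kwargs):
        results.extend(ner(chunk))
    return results
```

With the pipeline defined above, a call could look like `ner_on_long_text(lambda t: ner(t, aggregation_strategy="average"), long_text)`, where `long_text` is a placeholder for your own document. Character offsets in the results are relative to each chunk, and a non-zero `overlap` can report the same entity twice, so results may need de-duplication.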

    Num examples = 971932
    Num Epochs = 3
    Instantaneous batch size per device = 64
    Total train batch size (w. parallel, distributed & accumulation) = 128
    Gradient Accumulation steps = 2
    Total optimization steps = 22779

    Evaluation Infos:
    Num examples = 53996
    Batch size = 128

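
The logged numbers are consistent with each other. As a worked check, the sketch below reproduces the total train batch size and the number of optimization steps from the other values. It assumes a single device and that a partial accumulation group at the end of each epoch does not count as an update (floor division); both are assumptions chosen because they reproduce the logged figure, not facts stated in the log.

```python
import math

num_examples = 971932
per_device_batch_size = 64
gradient_accumulation_steps = 2
num_epochs = 3
num_devices = 1  # assumption: not stated in the log

# Effective batch size per optimizer update.
total_train_batch_size = (
    per_device_batch_size * num_devices * gradient_accumulation_steps
)

# Forward/backward batches per epoch (the last batch may be smaller).
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))

# Optimizer updates per epoch, assuming floor division over accumulation steps.
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps

total_steps = updates_per_epoch * num_epochs

print(total_train_batch_size)  # 128
print(total_steps)             # 22779
```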
Step | Training Loss | Validation Loss | Precision | Recall | F1 |
---|---|---|---|---|---|
1000 | 0.113900 | 0.057008 | 0.898600 | 0.938444 | 0.918090 |
2000 | 0.052800 | 0.048254 | 0.917243 | 0.941188 | 0.929062 |
3000 | 0.046200 | 0.043833 | 0.919706 | 0.948411 | 0.933838 |
4000 | 0.043500 | 0.039796 | 0.928439 | 0.947058 | 0.937656 |
5000 | 0.041400 | 0.039421 | 0.926103 | 0.952857 | 0.939290 |
6000 | 0.039700 | 0.038599 | 0.922376 | 0.956257 | 0.939011 |
7000 | 0.037800 | 0.036463 | 0.935125 | 0.950937 | 0.942964 |
8000 | 0.035900 | 0.035706 | 0.934638 | 0.954147 | 0.944292 |
9000 | 0.033800 | 0.034518 | 0.940354 | 0.951991 | 0.946136 |
10000 | 0.033600 | 0.033454 | 0.938170 | 0.956097 | 0.947049 |
11000 | 0.032700 | 0.032899 | 0.934130 | 0.959491 | 0.946641 |
12000 | 0.032200 | 0.032477 | 0.937400 | 0.959150 | 0.948151 |
13000 | 0.031200 | 0.033207 | 0.937058 | 0.960506 | 0.948637 |
14000 | 0.031400 | 0.031711 | 0.938765 | 0.959711 | 0.949123 |
15000 | 0.030600 | 0.031519 | 0.940488 | 0.959413 | 0.949856 |
16000 | 0.028500 | 0.031618 | 0.943643 | 0.957693 | 0.950616 |
17000 | 0.028000 | 0.031106 | 0.941109 | 0.960687 | 0.950797 |
18000 | 0.027800 | 0.030712 | 0.942821 | 0.960528 | 0.951592 |
19000 | 0.027500 | 0.030523 | 0.942950 | 0.960947 | 0.951864 |
20000 | 0.027400 | 0.030577 | 0.942462 | 0.961754 | 0.952010 |
21000 | 0.027000 | 0.030025 | 0.944483 | 0.960497 | 0.952422 |
22000 | 0.026800 | 0.030162 | 0.943868 | 0.961418 | 0.952562 |
Label | Precision | Recall | F1 | Entity Examples |
---|---|---|---|---|
JURISPRUDENCIA | 0.8795197115548148 | 0.9037275221501844 | 0.8914593047810311 | 57223 |
LEGISLACAO | 0.9405395935529082 | 0.9514071028567378 | 0.9459421362370934 | 84642 |
LOCAL | 0.9011495452253004 | 0.9132358124779697 | 0.9071524233856495 | 56740 |
ORGANIZACAO | 0.9239028155165304 | 0.954964947845235 | 0.9391771163875446 | 183013 |
PESSOA | 0.9651685220572037 | 0.9738545198908279 | 0.9694920661875761 | 193456 |
TEMPO | 0.973704616066295 | 0.9918808401799004 | 0.9827086882453152 | 186103 |
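
The last metric column in both tables is the harmonic mean of precision and recall. The short sketch below recomputes it for a few rows copied from the tables above; `f1_score` is a small helper written for this illustration and is not taken from any evaluation library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the tables above.
rows = {
    "step 22000": (0.943868, 0.961418, 0.952562),
    "JURISPRUDENCIA": (0.8795197115548148, 0.9037275221501844, 0.8914593047810311),
    "TEMPO": (0.973704616066295, 0.9918808401799004, 0.9827086882453152),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed={computed:.6f} reported={reported:.6f}")
```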