Model:

dlicari/Italian-Legal-BERT

ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law

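ITALIAN-LEGAL-BERT is based on bert-base-italian-xxl-cased, with additional pre-training of the Italian BERT model on Italian civil law corpora. It achieves better results than the "general-purpose" Italian BERT on different domain-specific tasks.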

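ITALIAN-LEGAL-BERT variants [NEW!!!]

For long documents

Note: We are working on an extended version of the paper with more details and the results of these new models. We will update you soon.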

Training procedure

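We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT and pre-trained it for an additional 4 epochs on 3.7 GB of preprocessed text from the National Jurisprudential Archive, using the Huggingface PyTorch-Transformers library. We used the BERT architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 (with linear learning rate decay, ending at 2.525e-9), a sequence length of 512, a batch size of 10 (imposed by GPU capacity), 8.4 million training steps, and a single V100 16GB GPU.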
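To make the linear learning-rate decay mentioned above concrete, here is a minimal sketch in plain Python. It assumes a schedule with no warmup that decays linearly to zero; the function name `linear_decay_lr` and the 1,000-step run are hypothetical and chosen only for illustration. This is not the authors' training script.

```python
def linear_decay_lr(step: int, total_steps: int, initial_lr: float = 5e-5) -> float:
    """Learning rate after `step` updates under linear decay to zero (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training still ahead, clamped so the rate never goes negative.
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


# Hypothetical run of 1,000 steps: the rate starts at 5e-5,
# is halved at the midpoint, and reaches zero at the end.
for step in (0, 500, 999, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```

Under such a schedule the last updates before the end still use a small positive rate, which is why a reported final learning rate can be tiny without being exactly zero.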

Usage

The ITALIAN-LEGAL-BERT model can be loaded like this:

from transformers import AutoModel, AutoTokenizer
model_name = "dlicari/Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

You can use the Transformers library fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT.

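from transformers import pipeline
model_name = "dlicari/Italian-Legal-BERT"
# Build a fill-mask pipeline; the model and its tokenizer are loaded by name
fill_mask = pipeline("fill-mask", model=model_name)
# Returns the top candidates for the [MASK] token, highest score first
fill_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento")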
#[{'sequence': "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.7264330387115479},
# {'sequence': "Il convenuto ha chiesto revocarsi l'obbligo di pagamento",'score': 0.09641049802303314},
# {'sequence': "Il resistente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.039877112954854965},
# {'sequence': "Il lavoratore ha chiesto revocarsi l'obbligo di pagamento",'score': 0.028993653133511543},
# {'sequence': "Il Ministero ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.025297977030277252}]

The COLAB notebook "ITALIAN-LEGAL-BERT: Minimal Start for Italian Legal Downstream Tasks" shows how to use the model for sentence similarity, sentence classification, and named entity recognition.
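As a rough illustration of the sentence similarity use case, the sketch below mean-pools the model's last hidden states into one vector per sentence and compares vectors with cosine similarity. Mean pooling is an assumption made here for illustration and is not necessarily the approach used in the notebook; the helper names `embed` and `cosine_similarity` are hypothetical. Calling `embed` requires the torch and transformers packages and downloads the model on first use.

```python
import math
from typing import List, Sequence

MODEL_NAME = "dlicari/Italian-Legal-BERT"


def embed(sentences: List[str]) -> List[List[float]]:
    """Return one mean-pooled vector per sentence (pooling choice is an assumption)."""
    # Imported lazily so cosine_similarity stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)

    # Zero out padding positions, then average over the real tokens only.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return pooled.tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A possible usage, reusing two of the sentences from the fill-mask output: `vectors = embed(["Il ricorrente ha chiesto revocarsi l'obbligo di pagamento", "Il convenuto ha chiesto revocarsi l'obbligo di pagamento"])` followed by `cosine_similarity(vectors[0], vectors[1])`.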

Citation

If you find our resource or paper useful, please consider including the following citation in your paper.
@inproceedings{licari_italian-legal-bert_2022,
    address = {Bozen-Bolzano, Italy},
    series = {{CEUR} {Workshop} {Proceedings}},
    title = {{ITALIAN}-{LEGAL}-{BERT}: {A} {Pre}-trained {Transformer} {Language} {Model} for {Italian} {Law}},
    volume = {3256},
    shorttitle = {{ITALIAN}-{LEGAL}-{BERT}},
    url = {https://ceur-ws.org/Vol-3256/#km4law3},
    language = {en},
    urldate = {2022-11-19},
    booktitle = {Companion {Proceedings} of the 23rd {International} {Conference} on {Knowledge} {Engineering} and {Knowledge} {Management}},
    publisher = {CEUR},
    author = {Licari, Daniele and Comandè, Giovanni},
    editor = {Symeonidou, Danai and Yu, Ran and Ceolin, Davide and Poveda-Villalón, María and Audrito, Davide and Caro, Luigi Di and Grasso, Francesca and Nai, Roberto and Sulis, Emilio and Ekaputra, Fajar J. and Kutz, Oliver and Troquard, Nicolas},
    month = sep,
    year = {2022},
    note = {ISSN: 1613-0073},
    file = {Full Text PDF:https://ceur-ws.org/Vol-3256/km4law3.pdf},
}