Datasets:

lexlms/legal_lama

Languages:

en

Multilinguality:

monolingual

Size categories:

1K<n<10K

Language creators:

found

Annotation creators:

no-annotation

Source datasets:

extended

ArXiv:

arxiv:2305.07507

Dataset Card for "LegalLAMA"

Dataset Summary

LegalLAMA is a diverse probing benchmark suite comprising 8 sub-tasks that assess how much legal knowledge pre-trained language models (PLMs) acquire during pre-training.

Dataset Specifications

| Corpus | Corpus alias | Examples | Avg. Tokens | Labels |
| --- | --- | --- | --- | --- |
| Criminal Code Sections (Canada) | canadian_sections | 321 | 72 | 144 |
| Legal Terminology (EU) | cjeu_term | 2,127 | 164 | 23 |
| Contractual Section Titles (US) | contract_sections | 1,527 | 85 | 20 |
| Contract Types (US) | contract_types | 1,089 | 150 | 15 |
| ECHR Articles (CoE) | ecthr_articles | 5,072 | 69 | 13 |
| Legal Terminology (CoE) | ecthr_terms | 6,803 | 97 | 250 |
| Crime Charges (US) | us_crimes | 4,518 | 118 | 59 |
| Legal Terminology (US) | us_terms | 5,829 | 308 | 7 |
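
A minimal sketch of how one sub-task might be loaded with the Hugging Face `datasets` library. It assumes that each corpus alias listed in the specifications doubles as the configuration name of `lexlms/legal_lama` on the Hub; the card does not state this explicitly, nor does it name the available splits, so inspect the returned object before relying on any split. The helper `load_legal_lama` is a hypothetical name introduced here for illustration.

```python
# Corpus aliases copied from the Dataset Specifications section.
LEGAL_LAMA_CONFIGS = [
    "canadian_sections",
    "cjeu_term",
    "contract_sections",
    "contract_types",
    "ecthr_articles",
    "ecthr_terms",
    "us_crimes",
    "us_terms",
]


def load_legal_lama(alias):
    """Load one LegalLAMA sub-task by corpus alias (hypothetical helper).

    Assumption: the alias is also the configuration name on the Hub.
    """
    if alias not in LEGAL_LAMA_CONFIGS:
        raise ValueError(
            f"Unknown corpus alias {alias!r}; expected one of {LEGAL_LAMA_CONFIGS}"
        )
    # Imported lazily so the alias check works without the library installed.
    # Requires `pip install datasets` and network access to the Hub.
    from datasets import load_dataset

    return load_dataset("lexlms/legal_lama", alias)
```

Calling `load_legal_lama("canadian_sections")` and printing the result should show which splits and columns the sub-task actually provides.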

Citation

Ilias Chalkidis*, Nicolas Garneau*, Catalina E.C. Goanta, Daniel Martin Katz, and Anders Søgaard. LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development. 2023. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada.

@inproceedings{chalkidis-garneau-etal-2023-lexlms,
    title = {{LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development}},
    author = "Chalkidis*, Ilias and 
              Garneau*, Nicolas and
              Goanta, Catalina and 
              Katz, Daniel Martin and 
              Søgaard, Anders",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    month = jun,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2305.07507",
}