Dataset Card for CAS

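We manually annotated two corpora from the biomedical domain. The ESSAI corpus contains clinical trial protocols in French, mainly obtained from the National Cancer Institute. A typical protocol consists of two parts: a summary of the trial, which states its purpose and the methods applied, and a detailed description of the trial, including the inclusion and exclusion criteria.

The CAS corpus contains clinical cases published in the scientific literature and in training material. They appeared in various journals from French-speaking countries (France, Belgium, Switzerland, Canada, African countries, tropical countries) and cover a range of medical specialties (cardiology, urology, oncology, obstetrics, pulmonology, gastroenterology). Clinical cases describe the clinical situation of a patient, so their content is close to that of clinical narratives (description of diagnoses, treatments or procedures, disease course, family history, intended audience, etc.). In clinical cases, negation is frequently used to describe the patient's signs, symptoms, and diagnosis. Speculation is also present, but less frequently.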

This version contains only the annotated CAS corpus.

Citation Information

@inproceedings{grabar-etal-2018-cas,
  title        = {{CAS}: {F}rench Corpus with Clinical Cases},
  author       = {Grabar, Natalia and Claveau, Vincent and Dalloux, Cl{\'e}ment},
  year         = 2018,
  month        = oct,
  booktitle    = {
    Proceedings of the Ninth International Workshop on Health Text Mining and
    Information Analysis
  },
  publisher    = {Association for Computational Linguistics},
  address      = {Brussels, Belgium},
  pages        = {122--128},
  doi          = {10.18653/v1/W18-5614},
  url          = {https://aclanthology.org/W18-5614},
  abstract     = {
    Textual corpora are extremely important for various NLP applications as
    they provide information necessary for creating, setting and testing these
    applications and the corresponding tools. They are also crucial for
    designing reliable methods and reproducible results. Yet, in some areas,
    such as the medical area, due to confidentiality or to ethical reasons, it
    is complicated and even impossible to access textual data representative of
    those produced in these areas. We propose the CAS corpus built with
    clinical cases, such as they are reported in the published scientific
    literature in French. We describe this corpus, currently containing over
    397,000 word occurrences, and the existing linguistic and semantic
    annotations.
  }
}

Author:

bigbio

Dataset size:

33.88 KB