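Dataset:
qanastek/WMT-16-PubMed
WMT-16-PubMed is a parallel corpus for neural machine translation, collected and aligned for ACL 2016.
Translation: the dataset can be used to train translation models.
The corpus consists of source and target sentence pairs across 4 languages:
Languages: English (en), Spanish (es), French (fr), Portuguese (pt).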
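    from datasets import load_dataset

    # Load the train split; force_redownload bypasses any locally cached copy.
    dataset = load_dataset(
        "qanastek/WMT-16-PubMed",
        split="train",
        download_mode="force_redownload",
    )
    print(dataset)     # dataset summary: features and number of rows
    print(dataset[0])  # first record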
            lang   doc_id    workshop                                     publisher  source_text                                        target_text
    0       en-fr  26839447  WMT'16 Biomedical Translation Task - PubMed  pubmed     Global Health: Where Do Physiotherapy and Reha...  La place des cheveux et des poils dans les rit...
    1       en-fr  26837117  WMT'16 Biomedical Translation Task - PubMed  pubmed     Carabin                                            Les Carabins
    2       en-fr  26837116  WMT'16 Biomedical Translation Task - PubMed  pubmed     In Process Citation                                Le laboratoire d'Anatomie, Biomécanique et Org...
    3       en-fr  26837115  WMT'16 Biomedical Translation Task - PubMed  pubmed     Comment on the misappropriation of bibliograph...  Du détournement des références bibliographique...
    4       en-fr  26837114  WMT'16 Biomedical Translation Task - PubMed  pubmed     Anti-aging medicine, a science-based, essentia...  La médecine anti-âge, une médecine scientifiqu...
    ...     ...    ...       ...                                          ...        ...                                                ...
    973972  en-pt  20274330  WMT'16 Biomedical Translation Task - PubMed  pubmed     Myocardial infarction, diagnosis and treatment     Infarto do miocárdio; diagnóstico e tratamento
    973973  en-pt  20274329  WMT'16 Biomedical Translation Task - PubMed  pubmed     The health areas politics                          A política dos campos de saúde
    973974  en-pt  20274328  WMT'16 Biomedical Translation Task - PubMed  pubmed     The role in tissue edema and liquid exchanges ...  O papel dos tecidos nos edemas e nas trocas lí...
    973975  en-pt  20274327  WMT'16 Biomedical Translation Task - PubMed  pubmed     About suppuration of the wound after thoracopl...  Sôbre as supurações da ferida operatória após ...
    973976  en-pt  20274326  WMT'16 Biomedical Translation Task - PubMed  pubmed     Experimental study of liver lesions in the tre...  Estudo experimental das lesões hepáticas no tr...
lang: the source and target language pair, of type String.
source_text: the source text, of type String.
target_text: the target text, of type String.
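The `lang` field can be used to select a single language pair. The following is a minimal sketch, assuming every `lang` value has the form `src-tgt` as in the preview; `split_lang_pair`, `select_pair`, and `SAMPLE_ROWS` are hypothetical names introduced for illustration, and the two sample rows are copied from the preview rather than loaded from the Hub.

```python
from typing import Dict, Iterable, List, Tuple

# Two rows copied from the preview; the real dataset is far larger.
SAMPLE_ROWS: List[Dict[str, str]] = [
    {
        "lang": "en-fr",
        "source_text": "Carabin",
        "target_text": "Les Carabins",
    },
    {
        "lang": "en-pt",
        "source_text": "The health areas politics",
        "target_text": "A política dos campos de saúde",
    },
]


def split_lang_pair(lang: str) -> Tuple[str, str]:
    """Split a pair such as 'en-fr' into (source, target) codes.

    Assumption: the value always contains exactly one hyphen-separated pair.
    """
    source, target = lang.split("-", 1)
    return source, target


def select_pair(rows: Iterable[Dict[str, str]], pair: str) -> List[Dict[str, str]]:
    """Keep only the rows whose `lang` field equals the requested pair."""
    return [row for row in rows if row["lang"] == pair]


if __name__ == "__main__":
    for row in select_pair(SAMPLE_ROWS, "en-pt"):
        source, target = split_lang_pair(row["lang"])
        print(f"[{source}] {row['source_text']}")
        print(f"[{target}] {row['target_text']}")
```

On a loaded `datasets.Dataset`, the same predicate could be applied with `dataset.filter(lambda row: row["lang"] == "en-pt")`.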
en-es: 285,584 pairs
en-fr: 614,093 pairs
en-pt: 74,300 pairs
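As a quick sanity check, the per-pair counts can be summed and compared with the preview, whose last row index is 973976 (that is, 973,977 rows when counting from zero). This is only an arithmetic sketch; `PAIR_COUNTS` is a hypothetical name, and the numbers are the ones listed above.

```python
# Pair counts as listed in this card.
PAIR_COUNTS = {"en-es": 285_584, "en-fr": 614_093, "en-pt": 74_300}

total = sum(PAIR_COUNTS.values())
shares = {pair: count / total for pair, count in PAIR_COUNTS.items()}

print(total)  # 973977
for pair, share in shares.items():
    print(f"{pair}: {share:.1%}")
```

The split is heavily skewed toward en-fr, which may be worth keeping in mind when sampling training batches across pairs.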
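For details, see the corresponding pages.
The shared task was organized by:
The corpus does not contain personal or sensitive information.
The nature of the task introduces some variability in the quality of the target translations.
Hugging Face WMT-16-PubMed: Labrak Yanis, Dufour Richard (not affiliated with the original corpus)
WMT'16 Shared Task: Biomedical Translation Task:
Please cite the following paper when using this dataset.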
@inproceedings{bojar-etal-2016-findings,
    title = {Findings of the 2016 Conference on Machine Translation},
    author = {Bojar, Ondrej and
      Chatterjee, Rajen and
      Federmann, Christian and
      Graham, Yvette and
      Haddow, Barry and
      Huck, Matthias and
      Jimeno Yepes, Antonio and
      Koehn, Philipp and
      Logacheva, Varvara and
      Monz, Christof and
      Negri, Matteo and
      Neveol, Aurelie and
      Neves, Mariana and
      Popel, Martin and
      Post, Matt and
      Rubino, Raphael and
      Scarton, Carolina and
      Specia, Lucia and
      Turchi, Marco and
      Verspoor, Karin and
      Zampieri, Marcos},
    booktitle = {Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers},
    month = aug,
    year = {2016},
    address = {Berlin, Germany},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/W16-2301},
    doi = {10.18653/v1/W16-2301},
    pages = {131--198},
}