This model is the result of the SINAI team's participation in the BioNLP workshop held at ACL 2023. The shared task aimed to advance automatic radiology report summarization systems and to broaden their applicability by including seven different modalities and anatomies in the provided data. We propose to automate the generation of radiology impressions with sequence-to-sequence learning, leveraging publicly available pre-trained models from both the general domain and the biomedical domain. This repository provides access to our best-performing system, which was fine-tuned from SciFive base, a T5 model trained for an additional 200k steps to adapt it to the context of biomedical literature.
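Since the checkpoint is a fine-tuned T5, it can be loaded with the standard `transformers` seq2seq classes. The sketch below is illustrative rather than an official usage snippet: the model identifier, the example findings text, and the generation parameters are assumptions and should be replaced with this repository's ID and your own inputs.

```python
# Minimal sketch: generating a radiology IMPRESSION from a FINDINGS section
# with Hugging Face transformers. The model ID below is a placeholder;
# substitute the ID of this repository.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "SINAI/radiology-report-summarization"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Invented example findings, not challenge data.
findings = (
    "The lungs are clear without focal consolidation. No pleural effusion "
    "or pneumothorax. Cardiomediastinal silhouette is within normal limits."
)

inputs = tokenizer(findings, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```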
The official evaluation results demonstrate that adapting a general-domain system to biomedical literature is beneficial for subsequent fine-tuning on the radiology report summarization task. The table below summarizes the scores this model obtained in the official evaluation. The team ranking is available here.
| BLEU4 | ROUGE-L | BERTScore | F1-RadGraph |
|---|---|---|---|
| 17.38 | 32.32 | 55.04 | 33.96 |
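For reference, two of these metrics can be recomputed locally with Hugging Face's `evaluate` library. This is an assumption about tooling (the official scores were produced by the task organizers' evaluation scripts), F1-RadGraph requires the separate RadGraph tooling and is omitted, and the example strings are invented placeholders.

```python
# Minimal sketch: recomputing ROUGE-L and BERTScore with the `evaluate`
# library (pip install evaluate rouge_score bert_score).
import evaluate

predictions = ["No acute cardiopulmonary abnormality."]  # placeholder output
references = ["No acute cardiopulmonary process."]       # placeholder gold impression

rouge = evaluate.load("rouge")
rouge_l = rouge.compute(predictions=predictions, references=references)["rougeL"]

bertscore = evaluate.load("bertscore")
bs_f1 = bertscore.compute(predictions=predictions, references=references, lang="en")["f1"][0]

# The table above reports percentages, so multiply by 100 to compare.
print(f"ROUGE-L: {rouge_l * 100:.2f}  BERTScore F1: {bs_f1 * 100:.2f}")
```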
A detailed description of the system was published in the Proceedings of the 22nd Workshop on Biomedical Language Processing.
BibTeX citation:
    @inproceedings{chizhikova-etal-2023-sinai,
        title = "{SINAI} at {R}ad{S}um23: Radiology Report Summarization Based on Domain-Specific Sequence-To-Sequence Transformer Model",
        author = "Chizhikova, Mariia and
          Diaz-Galiano, Manuel and
          Urena-Lopez, L. Alfonso and
          Martin-Valdivia, M. Teresa",
        booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
        month = jul,
        year = "2023",
        address = "Toronto, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2023.bionlp-1.53",
        pages = "530--534",
        abstract = "This paper covers participation of the SINAI team in the shared task 1B: Radiology Report Summarization at the BioNLP workshop held on ACL 2023. Our proposal follows a sequence-to-sequence approach which leverages pre-trained multilingual general domain and monolingual biomedical domain pre-trained language models. The best performing system based on domain-specific model reached 33.96 F1RadGraph score which is the fourth best result among the challenge participants. This model was made publicly available on HuggingFace. We also describe an attempt of Proximal Policy Optimization Reinforcement Learning that was made in order to improve the factual correctness measured with F1RadGraph but did not lead to satisfactory results.",
    }