LayoutLM

Multimodal (text + layout/format + image) pre-training for Document AI

Microsoft Document AI | GitHub

Model description

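LayoutLM is a simple but effective method for jointly pre-training text and layout. It targets document image understanding and information extraction tasks such as form understanding and receipt understanding, and it achieved state-of-the-art (SOTA) results on several datasets. For more details, please see our paper: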

LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. KDD 2020
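LayoutLM reads each token's position on the page alongside its text. The sketch below shows how that layout input might be prepared. It is a minimal illustration that assumes the common convention of scaling pixel boxes (x0, y0, x1, y1) to a 0 to 1000 grid and repeating a word's box for each of its sub-word tokens. `normalize_bbox` and `expand_boxes` are hypothetical helper names, and the word pieces are written by hand rather than produced by a real tokenizer.

```python
from typing import List, Sequence, Tuple


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) onto an assumed 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def expand_boxes(
    word_pieces: List[List[str]], boxes: List[List[int]]
) -> Tuple[List[str], List[List[int]]]:
    """Flatten word pieces and give every piece its parent word's box (hypothetical helper)."""
    tokens: List[str] = []
    token_boxes: List[List[int]] = []
    for pieces, box in zip(word_pieces, boxes):
        tokens.extend(pieces)
        token_boxes.extend([box] * len(pieces))
    return tokens, token_boxes


# Example: two words on a 500 x 1000 pixel page (made-up coordinates).
page_w, page_h = 500, 1000
words = [["invoice"], ["total", ":"]]  # hand-split word pieces
pixel_boxes = [[50, 100, 150, 200], [200, 100, 300, 150]]
boxes = [normalize_bbox(b, page_w, page_h) for b in pixel_boxes]
tokens, token_boxes = expand_boxes(words, boxes)
print(tokens)       # ['invoice', 'total', ':']
print(token_boxes)  # [[100, 100, 300, 200], [400, 100, 600, 150], [400, 100, 600, 150]]
```

Scaling to a fixed grid makes the coordinates independent of page resolution, so documents scanned at different sizes map onto the same range of layout positions.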

Training data

We pre-train LayoutLM on the IIT-CDIP Test Collection 1.0* dataset in two settings:

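LayoutLM-Base, uncased (11M documents, 2 epochs): 12 layers, 768 hidden size, 12 attention heads, 113M parameters
LayoutLM-Large, uncased (11M documents, 2 epochs): 24 layers, 1024 hidden size, 16 attention heads, 343M parameters (this model)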
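To make the two configurations concrete, here is a back-of-envelope parameter count for a BERT-style encoder extended with 2D layout embeddings. It relies on values not stated above and labeled here as assumptions: a 30,522-token uncased vocabulary, 512 1D positions, four 2D embedding tables (x, y, height, width) of 1,024 positions each, a feed-forward width of 4x the hidden size, and a pooler layer. `estimate_params` is a hypothetical helper, not part of any released code.

```python
def estimate_params(
    layers: int,
    hidden: int,
    vocab: int = 30522,      # assumed uncased vocabulary size
    max_pos: int = 512,      # assumed number of 1D positions
    max_2d_pos: int = 1024,  # assumed size of each 2D position table
    type_vocab: int = 2,     # assumed number of token types
) -> int:
    """Rough parameter count for a BERT-style encoder with 2D layout embeddings."""
    ffn = 4 * hidden  # assumed feed-forward width
    embeddings = (
        vocab * hidden             # word embeddings
        + max_pos * hidden         # 1D position embeddings
        + 4 * max_2d_pos * hidden  # x, y, height, width embeddings (assumed four tables)
        + type_vocab * hidden      # token type embeddings
        + 2 * hidden               # embedding LayerNorm (weight + bias)
    )
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections with biases
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden  # two dense layers with biases
    layer_norms = 2 * 2 * hidden  # two LayerNorms per layer, weight + bias each
    per_layer = attention + feed_forward + layer_norms
    pooler = hidden * hidden + hidden  # dense layer applied to the first token
    return embeddings + layers * per_layer + pooler


base = estimate_params(layers=12, hidden=768)
large = estimate_params(layers=24, hidden=1024)
print(f"Base  ~ {base / 1e6:.0f}M")   # Base  ~ 113M
print(f"Large ~ {large / 1e6:.0f}M")  # Large ~ 339M
```

The number of attention heads does not appear in the count: the heads split the hidden size among themselves rather than adding weights. Under these assumptions the Base estimate matches the 113M listed above, while the Large estimate lands near 339M, a little below the stated 343M; the gap presumably comes from components or counting conventions this rough sketch does not capture.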

Citation

If you use LayoutLM in your research, please cite the following paper:

@misc{xu2019layoutlm,
    title={LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
    author={Yiheng Xu and Minghao Li and Lei Cui and Shaohan Huang and Furu Wei and Ming Zhou},
    year={2019},
    eprint={1912.13318},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Author: Microsoft

Dataset size: 2.53 GB