Model:
hfl/chinese-macbert-large
This repository contains the resources of our paper "Revisiting Pre-trained Models for Chinese Natural Language Processing", which will be published in the Findings of EMNLP. You can read the camera-ready version of our paper through the ACL Anthology or the arXiv pre-print.
Revisiting Pre-trained Models for Chinese Natural Language Processing
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu
You may also be interested in:
More resources by HFL: https://github.com/ymcui/HFL-Anthology
MacBERT is an improved BERT with a novel MLM-as-correction pre-training task, which mitigates the discrepancy between pre-training and fine-tuning.
Instead of masking with the [MASK] token, which never appears in the fine-tuning stage, we propose to use similar words for the masking purpose. A similar word is obtained through word2vec (Mikolov et al., 2013) similarity calculations. If an N-gram is selected to be masked, we find a similar word for each of its words individually. In rare cases, when no similar word is available, we fall back to replacement with a random word.
Here is an example of our pre-training task.
| | Example |
| --- | --- |
| Original Sentence | we use a language model to predict the probability of the next word. |
| MLM | we use a language [M] to [M] ##di ##ct the pro [M] ##bility of the next word . |
| Whole word masking | we use a language [M] to [M] [M] [M] the [M] [M] [M] of the next word . |
| N-gram masking | we use a [M] [M] to [M] [M] [M] the [M] [M] [M] [M] [M] next word . |
| MLM as correction | we use a text system to ca ##lc ##ulate the po ##si ##bility of the next word . |
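Below is a minimal, illustrative sketch of the similar-word corruption idea, not the authors' released pre-training code. The word2vec vectors, the file path, and the 15% selection rate are assumptions for illustration; gensim is used only as a convenient way to query word2vec similarities.

```python
import random
from gensim.models import KeyedVectors  # any word2vec-style vectors would do; this is an assumption

def mac_mask(tokens, kv, mask_rate=0.15):
    """Toy version of MLM as correction: corrupt selected tokens with
    similar words (via word2vec similarity) instead of a [MASK] symbol.
    The real MacBERT pipeline also combines whole-word and N-gram masking;
    this sketch shows only the similar-word substitution step."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            if tok in kv.key_to_index:
                # Replace with the most similar word instead of [MASK].
                similar, _ = kv.most_similar(tok, topn=1)[0]
                corrupted.append(similar)
            else:
                # Rare case: no similar word available, fall back to a random word.
                corrupted.append(random.choice(kv.index_to_key))
            labels.append(tok)       # the model must recover the original token
        else:
            corrupted.append(tok)
            labels.append(None)      # position not selected for prediction
    return corrupted, labels

# Hypothetical usage; the vector file is a placeholder, not a released artifact.
# kv = KeyedVectors.load_word2vec_format("word2vec.vec")
# print(mac_mask("we use a language model to predict the next word".split(), kv))
```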
In addition to the new pre-training task, we also incorporate the following techniques:

- Whole Word Masking (WWM)
- N-gram masking
- Sentence-Order Prediction (SOP)
Note that our MacBERT can directly replace the original BERT, as there are no differences in the main neural architecture.
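Because the architecture matches BERT, the checkpoint can be loaded with the standard BERT classes from Hugging Face Transformers. A minimal usage sketch (the example sentence is arbitrary):

```python
from transformers import BertTokenizer, BertModel

# MacBERT shares BERT's architecture, so the standard BERT classes apply.
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-large")
model = BertModel.from_pretrained("hfl/chinese-macbert-large")

inputs = tokenizer("欢迎使用MacBERT。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 1024) for the large model
```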
For more technical details, please check our paper: Revisiting Pre-trained Models for Chinese Natural Language Processing
If you find our resources or paper useful, please consider including the following citation in your paper.
    @inproceedings{cui-etal-2020-revisiting,
      title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
      author = "Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Wang, Shijin and Hu, Guoping",
      booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
      month = nov,
      year = "2020",
      address = "Online",
      publisher = "Association for Computational Linguistics",
      url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
      pages = "657--668",
    }