Model:
KoichiYasuoka/roberta-classical-chinese-large-char
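This is a RoBERTa model pre-trained on Classical Chinese texts, derived from GuwenBERT-large. Its character embeddings have been extended to cover both traditional and simplified characters. You can fine-tune roberta-classical-chinese-large-char for downstream tasks such as sentence-segmentation, POS-tagging, and dependency-parsing.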
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-classical-chinese-large-char")
model = AutoModelForMaskedLM.from_pretrained("KoichiYasuoka/roberta-classical-chinese-large-char")
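The card mentions fine-tuning for sentence segmentation. One common way to frame that task, assumed here rather than taken from this card, is character-level token classification: strip the punctuation from a punctuated text and give each remaining character a tag. Because the model is character-based (as the `-char` suffix suggests), each tag should line up with one input character, apart from special tokens. The sketch below uses a hypothetical B/M/E/S tag scheme and treats the listed full-width punctuation marks as segment boundaries. The helper names `to_char_tags` and `from_char_tags` are illustrative and not part of any library.

```python
# Hypothetical data-preparation helpers for a sentence-segmentation fine-tune.
# Tag scheme (an assumption, not from the model card):
#   B = first char of a segment, M = middle, E = last, S = single-char segment.

# Full-width punctuation treated as segment boundaries (assumption).
BOUNDARIES = set("。！？；，、：")


def to_char_tags(text: str) -> tuple[list[str], list[str]]:
    """Remove boundary punctuation and return (characters, tags)."""
    chars: list[str] = []
    tags: list[str] = []
    segment: list[str] = []

    def flush() -> None:
        # Tag the buffered segment, move it into `chars`, then clear the buffer.
        if not segment:
            return
        if len(segment) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(segment) - 2) + ["E"])
        chars.extend(segment)
        segment.clear()

    for ch in text:
        if ch in BOUNDARIES:
            flush()
        elif not ch.isspace():
            segment.append(ch)
    flush()  # handle a trailing segment that has no final punctuation
    return chars, tags


def from_char_tags(chars: list[str], tags: list[str]) -> list[str]:
    """Inverse step: rebuild segments from (predicted) per-character tags."""
    segments: list[str] = []
    current: list[str] = []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):
            segments.append("".join(current))
            current = []
    if current:  # unterminated segment at the end of the input
        segments.append("".join(current))
    return segments
```

For example, `to_char_tags("學而時習之，不亦說乎？")` yields the nine characters of 學而時習之不亦說乎 with tags B M M M E B M M E, and `from_char_tags` turns those tags back into the two segments 學而時習之 and 不亦說乎.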
SuPar-Kanbun: a tokenizer, POS tagger, and dependency parser for Classical Chinese