Model:
KoichiYasuoka/roberta-classical-chinese-base-char
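This is a RoBERTa model pre-trained on Classical Chinese texts, derived from GuwenBERT-base. Its character embeddings have been extended to cover both traditional and simplified characters. You can fine-tune roberta-classical-chinese-base-char for downstream tasks such as sentence-segmentation, POS-tagging, and dependency-parsing.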
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-char")
model = AutoModelForMaskedLM.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-char")
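Since the checkpoint is loaded as a masked language model, one quick way to check that it works is to mask a single character and ask for the most likely replacements. The following is only a minimal sketch: the helper names `mask_char` and `predict_masked` are hypothetical and not part of the model card, and it assumes the standard `fill-mask` pipeline from `transformers` is suitable for this checkpoint.

```python
MODEL_ID = "KoichiYasuoka/roberta-classical-chinese-base-char"


def mask_char(text, position, mask_token="[MASK]"):
    """Return `text` with the character at `position` replaced by `mask_token`.

    Hypothetical helper for illustration; the default mask token is an
    assumption, and predict_masked() passes the tokenizer's real one.
    """
    if not 0 <= position < len(text):
        raise IndexError("position is outside the text")
    return text[:position] + mask_token + text[position + 1:]


def predict_masked(text, position, top_k=5):
    """Return up to `top_k` (character, score) candidates for the masked position."""
    # Imported lazily so that mask_char() can be used without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
    fill = pipeline("fill-mask", model=model, tokenizer=tokenizer, top_k=top_k)

    # Use the tokenizer's own mask token rather than hard-coding one.
    masked = mask_char(text, position, tokenizer.mask_token)
    return [(result["token_str"], result["score"]) for result in fill(masked)]
```

For example, `predict_masked("孟子見梁惠王", 3)` would mask the fourth character and return candidate characters with their scores. The first call downloads the model weights, so it needs network access.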
SuPar-Kanbun: a tokenizer, POS-tagger, and dependency-parser for Classical Chinese