CKIP BERT Base Han Chinese POS Tagging

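This model provides part-of-speech (POS) tagging for ancient Chinese. Our training dataset covers four historical periods of the Chinese language.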

Homepage

ckiplab/han-transformers

Training Dataset

The copyright of the dataset belongs to the Institute of Linguistics, Academia Sinica.

Contributors

CKIP

Usage

Using our model in your script

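from transformers import (
  AutoTokenizer,
  AutoModelForTokenClassification,  # keeps the POS-tagging head; AutoModel would load only the encoder
)

# Load the tokenizer and the fine-tuned POS-tagging model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-pos")
model = AutoModelForTokenClassification.from_pretrained("ckiplab/bert-base-han-chinese-pos")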
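If you call the model directly instead of going through `pipeline`, you have to map the output logits back to tag names yourself. Below is a minimal sketch, assuming PyTorch, a fast tokenizer, and a `model` loaded with `AutoModelForTokenClassification` (plain `AutoModel` returns the encoder without the tagging head, so it has no `.logits`). The tag names are read from `model.config.id2label`. The helpers `decode_pos_tags` and `tag_text` are hypothetical and not part of the CKIP or Transformers API.

```python
from typing import Dict, List, Sequence, Tuple


def decode_pos_tags(
    tokens: Sequence[str],
    logits: Sequence[Sequence[float]],
    id2label: Dict[int, str],
    special_mask: Sequence[int],
) -> List[Tuple[str, str]]:
    """Pick the highest-scoring label for each token, skipping special tokens such as [CLS]/[SEP]."""
    tagged = []
    for token, scores, is_special in zip(tokens, logits, special_mask):
        if is_special:
            continue  # [CLS], [SEP] and padding carry no POS tag
        best = max(range(len(scores)), key=lambda i: scores[i])  # argmax over the tag set
        tagged.append((token, id2label[best]))
    return tagged


def tag_text(text: str, tokenizer, model) -> List[Tuple[str, str]]:
    """Run the model on one string (hypothetical helper; assumes PyTorch and a fast tokenizer)."""
    import torch  # imported lazily so decode_pos_tags works without PyTorch installed

    enc = tokenizer(text, return_tensors="pt", return_special_tokens_mask=True)
    special_mask = enc.pop("special_tokens_mask")[0].tolist()  # the model does not accept this key
    with torch.no_grad():
        logits = model(**enc).logits[0].tolist()  # shape: (seq_len, num_labels)
    return decode_pos_tags(enc.tokens(), logits, model.config.id2label, special_mask)
```

With the tokenizer and model from the snippet above, `tag_text("帝堯曰放勳", tokenizer, model)` should return one `(character, tag)` pair per character; taking the plain argmax mirrors what the token-classification pipeline reports as `entity`.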

Using the model for inference

>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-pos")
>>> classifier("帝堯曰放勳")

[{'entity': 'NB1',
  'score': 0.99410427,
  'index': 1,
  'word': '帝',
  'start': 0,
  'end': 1},
 {'entity': 'NB1',
  'score': 0.98874336,
  'index': 2,
  'word': '堯',
  'start': 1,
  'end': 2},
 {'entity': 'VG',
  'score': 0.97059363,
  'index': 3,
  'word': '曰',
  'start': 2,
  'end': 3},
 {'entity': 'NB1',
  'score': 0.9864504,
  'index': 4,
  'word': '放',
  'start': 3,
  'end': 4},
 {'entity': 'NB1',
  'score': 0.9543974,
  'index': 5,
  'word': '勳',
  'start': 4,
  'end': 5}]
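The pipeline returns one dictionary per character. For display or downstream processing it can be convenient to collapse this into `word/TAG` tokens. The sketch below relies only on the `word`, `entity` and `score` keys shown in the output above; `format_pos` and its `min_score` threshold are hypothetical, and marking low-confidence tags with `?` is an illustrative choice, not a CKIP convention.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float, int]]


def format_pos(predictions: List[Prediction], min_score: float = 0.0) -> str:
    """Join pipeline predictions into 'word/TAG' tokens; tags scoring below min_score get a '?' suffix."""
    parts = []
    for pred in predictions:
        tag = str(pred["entity"])
        if float(pred["score"]) < min_score:
            tag += "?"  # flag predictions the model is less sure about
        parts.append(f"{pred['word']}/{tag}")
    return " ".join(parts)


# Abbreviated copy of the pipeline output above (index/start/end omitted)
example = [
    {"entity": "NB1", "score": 0.99410427, "word": "帝"},
    {"entity": "NB1", "score": 0.98874336, "word": "堯"},
    {"entity": "VG", "score": 0.97059363, "word": "曰"},
    {"entity": "NB1", "score": 0.9864504, "word": "放"},
    {"entity": "NB1", "score": 0.9543974, "word": "勳"},
]
print(format_pos(example))                  # 帝/NB1 堯/NB1 曰/VG 放/NB1 勳/NB1
print(format_pos(example, min_score=0.96))  # 帝/NB1 堯/NB1 曰/VG 放/NB1 勳/NB1?
```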

Author:

CKIP Lab

Dataset size:

406.16 MB