CKIP BERT Base Han Chinese Word Segmentation

This model performs word segmentation for ancient Chinese. Our training dataset covers four eras of the Chinese language.

Homepage

ckiplab/han-transformers

Training Datasets

The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.

Contributors

Chin-Tung Lin at CKIP, Institute of Linguistics, Academia Sinica

Usage

Using our model in your script

from transformers import (
  AutoTokenizer,
  AutoModel,
)

# Download (or load from the local cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-ws")
model = AutoModel.from_pretrained("ckiplab/bert-base-han-chinese-ws")

Using our model for inference

>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-ws")
>>> classifier("帝堯曰放勳")

# output
[{'entity': 'B',
'score': 0.9999793,
'index': 1,
'word': '帝',
'start': 0,
'end': 1},
{'entity': 'I',
'score': 0.9915047,
'index': 2,
'word': '堯',
'start': 1,
'end': 2},
{'entity': 'B',
'score': 0.99992275,
'index': 3,
'word': '曰',
'start': 2,
'end': 3},
{'entity': 'B',
'score': 0.99905187,
'index': 4,
'word': '放',
'start': 3,
'end': 4},
{'entity': 'I',
'score': 0.96299917,
'index': 5,
'word': '勳',
'start': 4,
'end': 5}]
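
The pipeline returns one label per character rather than finished words. The sketch below shows one way to join those labels into a segmented word list. It assumes, as the output above suggests, that 'B' marks the first character of a word and 'I' marks a continuation. The helper name `merge_bi_tags` is hypothetical and not part of the model or of transformers, and the sample list keeps only the 'entity' and 'word' fields from the output above.

```python
def merge_bi_tags(tokens):
    """Group per-character B/I predictions into a list of words.

    Assumption: 'B' starts a new word and 'I' extends the previous one.
    """
    words = []
    for tok in tokens:
        # Start a new word on 'B', or when there is no word to extend yet.
        if tok["entity"] == "B" or not words:
            words.append(tok["word"])
        else:
            words[-1] += tok["word"]
    return words


# Trimmed copy of the pipeline output shown above.
predictions = [
    {"entity": "B", "word": "帝"},
    {"entity": "I", "word": "堯"},
    {"entity": "B", "word": "曰"},
    {"entity": "B", "word": "放"},
    {"entity": "I", "word": "勳"},
]

segmented = merge_bi_tags(predictions)
print(segmented)  # ['帝堯', '曰', '放勳']
```

In practice the list returned by `classifier(...)` can be passed to the helper directly, since it only reads the 'entity' and 'word' keys.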

Author:

CKIP Lab

Dataset size:

404.63 MB