Model card for RoGPT2-medium
Language: Romanian
RoGPT2: Romanian GPT2 for Text Generation
All available versions: RoGPT2-base, RoGPT2-medium, RoGPT2-large
For code and evaluation, see the GitHub repository.
How to use
```python
# TensorFlow
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-medium')
model = TFAutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-medium')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='tf')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
```
```python
# PyTorch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-medium')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-medium')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
text = model.generate(inputs, max_length=1024, no_repeat_ngram_size=2)
print(tokenizer.decode(text[0]))
```
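Both snippets above decode greedily, blocking repeated bigrams via `no_repeat_ngram_size=2` and capping the output at `max_length=1024` tokens. As a minimal optional sketch (not part of the original card), the same PyTorch model can also generate with sampling; all arguments below are standard `generate` parameters and their values are arbitrary:

```python
# Sampling-based generation (illustrative sketch; parameter values are arbitrary)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-medium')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-medium')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')
outputs = model.generate(
    inputs,
    max_length=256,           # shorter cap than the greedy example above
    do_sample=True,           # sample instead of taking the argmax token
    top_p=0.95,               # nucleus sampling threshold
    top_k=50,
    temperature=0.9,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```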
Training
Corpus statistics
| Corpus  | Total size | Number of words | Number of sentences |
|---------|------------|-----------------|---------------------|
| OSCAR   | 11.54 GB   | 1745M           | 48.46M              |
| Wiki-Ro | 0.46 GB    | 68M             | 1.79M               |
| Debates | 0.5 GB     | 73M             | 3.61M               |
| Books   | 4.37 GB    | 667M            | 37.39M              |
| News    | 0.15 GB    | 23M             | 0.77M               |
Training statistics
| Version | Number of parameters | Number of epochs | Duration of an epoch | Context size | Batch size | PPL   |
|---------|----------------------|------------------|----------------------|--------------|------------|-------|
| Base    | 124M                 | 15               | 7h                   | 1024         | 72         | 22.96 |
| Medium  | 354M                 | 10               | 22h                  | 1024         | 24         | 17.64 |
| Large   | 774M                 | 5                | 45h                  | 512          | 16         | 16.77 |
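PPL in the table above is perplexity, i.e. the exponential of the mean token-level cross-entropy. As a minimal illustrative sketch (not from the original card, and shown on a single short string rather than a full evaluation set), it can be computed with the PyTorch model as follows:

```python
# Illustrative sketch: perplexity = exp(mean cross-entropy loss) of the causal LM
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-medium')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-medium')
model.eval()

text = "Este o zi de vara"  # replace with held-out evaluation text
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs['input_ids']).loss
print(f"PPL = {math.exp(loss.item()):.2f}")
```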
Evaluation
1. MOROCO
| Model         | Dialect | Md to Ro | Ro to Md |
|---------------|---------|----------|----------|
| KRR + SK      | 94.06   | 67.59    | 75.47    |
| BERT-base-ro  | 95.98   | 69.90    | 78.08    |
| RoBERT-small  | 95.76   | 69.05    | 80.15    |
| RoBERT-base   | 97.24   | 68.80    | 82.37    |
| RoBERT-large  | 97.21   | 69.50    | 83.26    |
| RoGPT2-base   | 96.69   | 69.82    | 77.55    |
| RoGPT2-medium | 96.42   | 69.77    | 80.51    |
| RoGPT2-large  | 96.93   | 71.07    | 82.56    |
2. LaRoSeDa
| Model         | Binary: Accuracy | Binary: F1-Score | Multi-Class: Accuracy | Multi-Class: F1-Score |
|---------------|------------------|------------------|-----------------------|-----------------------|
| BERT-base-ro  | 98.07            | 97.94            | -                     | 79.61                 |
| RoDiBERT      | 98.40            | 98.31            | -                     | 83.01                 |
| RoBERT-small  | 97.44            | 97.43            | 89.30                 | 84.23                 |
| RoBERT-base   | 98.27            | 98.26            | 90.59                 | 86.27                 |
| RoBERT-large  | 98.20            | 98.19            | 90.93                 | 86.63                 |
| RoGPT2-base   | 97.89            | 97.88            | 89.65                 | 84.68                 |
| RoGPT2-medium | 98.03            | 98.04            | 90.29                 | 85.37                 |
| RoGPT2-large  | 98.06            | 98.07            | 90.26                 | 84.89                 |
3. RoSTS
| Model         | Spearman dev-set | Spearman test-set | Pearson dev-set | Pearson test-set |
|---------------|------------------|-------------------|-----------------|------------------|
| BERT-base-ro  | 84.26            | 80.86             | 84.59           | 81.59            |
| RoDiBERT      | 77.07            | 71.47             | 77.13           | 72.25            |
| RoBERT-small  | 82.06            | 78.06             | 81.66           | 78.49            |
| RoBERT-base   | 84.93            | 80.39             | 85.03           | 80.39            |
| RoBERT-large  | 86.25            | 83.15             | 86.58           | 83.76            |
| RoGPT2-base   | 83.51            | 79.77             | 83.74           | 80.56            |
| RoGPT2-medium | 85.75            | 82.25             | 86.04           | 83.16            |
| RoGPT2-large  | 85.70            | 82.64             | 86.14           | 83.46            |
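The RoSTS figures are Spearman and Pearson correlation coefficients (reported ×100) between predicted and gold similarity scores. As an illustrative sketch (not part of the original card; the score lists below are hypothetical), both can be computed with SciPy:

```python
# Illustrative sketch: Spearman / Pearson correlation between predicted and gold STS scores
from scipy.stats import pearsonr, spearmanr

gold = [4.5, 2.0, 0.5, 3.8]  # hypothetical gold similarity scores
pred = [4.1, 2.3, 1.0, 3.5]  # hypothetical model predictions

print(f"Pearson:  {pearsonr(gold, pred)[0] * 100:.2f}")
print(f"Spearman: {spearmanr(gold, pred)[0] * 100:.2f}")
```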
4. WMT16
| Model         | Decoder method | Ro-En | En-Ro |
|---------------|----------------|-------|-------|
| mBART         | -              | 38.5  | 38.5  |
| OpenNMT       | -              | -     | 24.7  |
| RoGPT2-base   | Greedy         | 30.37 | 20.27 |
| RoGPT2-base   | Beam-search-4  | 31.26 | 22.31 |
| RoGPT2-base   | Beam-search-8  | 31.39 | 22.95 |
| RoGPT2-medium | Greedy         | 32.48 | 22.18 |
| RoGPT2-medium | Beam-search-4  | 34.08 | 24.03 |
| RoGPT2-medium | Beam-search-8  | 34.16 | 24.13 |
| RoGPT2-large  | Greedy         | 33.69 | 23.31 |
| RoGPT2-large  | Beam-search-4  | 34.40 | 24.23 |
| RoGPT2-large  | Beam-search-8  | 34.51 | 24.32 |
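The "Greedy", "Beam-search-4", and "Beam-search-8" decoder methods map directly onto standard `generate` arguments in the transformers library. A minimal sketch follows; the prompt and checkpoint are illustrative placeholders, not the fine-tuned translation models themselves:

```python
# Illustrative sketch: greedy vs. beam-search decoding with transformers' generate()
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('readerbench/RoGPT2-medium')
model = AutoModelForCausalLM.from_pretrained('readerbench/RoGPT2-medium')

inputs = tokenizer.encode("Este o zi de vara", return_tensors='pt')

# Greedy decoding: the default (num_beams=1, do_sample=False)
greedy = model.generate(inputs, max_length=128)

# Beam search with 4 beams ("Beam-search-4" in the table above)
beam4 = model.generate(inputs, max_length=128, num_beams=4, early_stopping=True)

print(tokenizer.decode(beam4[0], skip_special_tokens=True))
```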
5. XQuAD
| Model                   | Decoder method | EM    | F1-Score |
|-------------------------|----------------|-------|----------|
| BERT-base-ro            | -              | 47.89 | 63.74    |
| RoDiBERT                | -              | 21.76 | 34.57    |
| RoBERT-small            | -              | 30.84 | 45.17    |
| RoBERT-base             | -              | 53.52 | 70.04    |
| RoBERT-large            | -              | 55.46 | 69.64    |
| mBERT                   | -              | 59.9  | 72.7     |
| XLM-R Large             | -              | 69.7  | 83.6     |
| RoGPT2-base             | Greedy         | 23.69 | 35.97    |
| RoGPT2-base             | Beam-search-4  | 24.11 | 35.27    |
| RoGPT2-medium           | Greedy         | 29.66 | 44.74    |
| RoGPT2-medium           | Beam-search-4  | 31.59 | 45.32    |
| RoGPT2-large            | Greedy         | 29.74 | 42.98    |
| RoGPT2-large            | Beam-search-4  | 29.66 | 43.05    |
| RoGPT2-base-en-ro       | Greedy         | 23.86 | 34.27    |
| RoGPT2-base-en-ro       | Beam-search-4  | 25.04 | 34.51    |
| RoGPT2-medium-en-ro     | Greedy         | 27.05 | 39.75    |
| RoGPT2-medium-en-ro     | Beam-search-4  | 27.64 | 39.11    |
| RoGPT2-large-en-ro      | Greedy         | 28.40 | 39.79    |
| RoGPT2-large-en-ro      | Beam-search-4  | 28.73 | 39.71    |
| RoGPT2-large-en-ro-mask | Greedy         | 31.34 | 44.71    |
| RoGPT2-large-en-ro-mask | Beam-search-4  | 31.59 | 43.53    |
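EM is exact match and F1-Score is token-level F1 between the predicted and the gold answer. A simplified sketch of the two metrics follows (the real SQuAD/XQuAD scorer also normalizes punctuation and articles; the strings are hypothetical):

```python
# Simplified sketch of exact match (EM) and token-level F1 for extractive QA
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_tokens), overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("o zi de vara", "o zi de vara"))       # 1.0
print(round(token_f1("zi de vara", "o zi de vara"), 2))   # 0.86
```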
6. Wiki-Ro: LM
| Model         | PPL dev | PPL test |
|---------------|---------|----------|
| BERT-base-ro  | 29.0897 | 28.0043  |
| RoGPT2-base   | 34.3795 | 33.7460  |
| RoGPT2-medium | 23.7879 | 23.4581  |
| RoGPT2-large  | 21.7491 | 21.5200  |
7. RoGEC
| Model                       | Decoder method      | P     | R     | F0.5  |
|-----------------------------|---------------------|-------|-------|-------|
| Transformer-tiny            | Beam-search         | 53.53 | 26.36 | 44.38 |
| Transformer-base Finetuning | Beam-search         | 56.05 | 46.19 | 53.76 |
| Transformer-base Finetuning | Beam-search-LM      | 50.68 | 45.39 | 49.52 |
| Transformer-base Finetuning | Beam-search-norm-LM | 51.06 | 45.43 | 49.83 |
| RoGPT2-base                 | Greedy              | 59.02 | 49.35 | 56.80 |
| RoGPT2-base                 | Beam-search-4       | 65.23 | 49.26 | 61.26 |
| RoGPT2-base                 | Beam-search-8       | 65.88 | 49.64 | 61.84 |
| RoGPT2-medium               | Greedy              | 69.97 | 57.94 | 67.18 |
| RoGPT2-medium               | Beam-search-4       | 72.46 | 57.99 | 69.01 |
| RoGPT2-medium               | Beam-search-8       | 72.24 | 57.69 | 68.77 |
| RoGPT2-large                | Greedy              | 61.90 | 49.09 | 58.83 |
| RoGPT2-large                | Beam-search-4       | 65.24 | 49.43 | 61.32 |
| RoGPT2-large                | Beam-search-8       | 64.96 | 49.22 | 61.06 |
| RoGPT2-base*                | Greedy              | 68.67 | 49.60 | 63.77 |
| RoGPT2-base*                | Beam-search-4       | 71.16 | 50.53 | 65.79 |
| RoGPT2-base*                | Beam-search-8       | 71.68 | 50.65 | 66.18 |
| RoGPT2-medium*              | Greedy              | 58.21 | 43.32 | 54.47 |
| RoGPT2-medium*              | Beam-search-4       | 68.31 | 43.78 | 61.43 |
| RoGPT2-medium*              | Beam-search-8       | 68.68 | 43.99 | 61.75 |
| RoGPT2-large*               | Greedy              | 64.86 | 41.30 | 58.22 |
| RoGPT2-large*               | Beam-search-4       | 65.57 | 41.00 | 58.55 |
| RoGPT2-large*               | Beam-search-8       | 65.44 | 41.09 | 58.50 |
Note: models marked with * were trained on a dataset of 3,000,000 artificially generated pairs.
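P, R, and F0.5 above are precision, recall, and the precision-weighted F-measure commonly used for grammatical error correction. A minimal sketch of the formula (not from the original card):

```python
# Illustrative sketch: F-beta with beta=0.5 weights precision more than recall
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# RoGPT2-medium, Beam-search-4 row: P=72.46, R=57.99 -> F0.5 of roughly 69
# (matches the table value up to rounding of P and R)
print(round(f_beta(0.7246, 0.5799) * 100, 2))
```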
Acknowledgments
This research was supported with Cloud TPUs from Google.
How to cite
@inproceedings{niculescu2021rogpt2,
title={RoGPT2: Romanian GPT2 for Text Generation},
author={Niculescu, Mihai Alexandru and Ruseti, Stefan and Dascalu, Mihai},
booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
pages={1154--1161},
year={2021},
organization={IEEE}
}