Model:
cjvt/gpt-sl-base
GPT-sl-base is a Slovene GPT model, based on the BigScience Workshop fork of Megatron. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and can process sequences up to 1024 tokens in length. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
The model was trained for about 20 epochs, i.e. 390k steps, seeing roughly 102B tokens in total.
| Step   | Validation Perplexity |
|--------|-----------------------|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
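
For quick experimentation, the sketch below shows one way to load the model for text generation with the 🤗 transformers library. It assumes the published checkpoint works with the standard GPT-2-style `AutoModelForCausalLM` interface; the prompt and sampling settings are illustrative only.

```python
# Minimal usage sketch (assumes the checkpoint is loadable as a standard causal LM).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cjvt/gpt-sl-base")
model = AutoModelForCausalLM.from_pretrained("cjvt/gpt-sl-base")

prompt = "Ljubljana je"  # example Slovene prompt ("Ljubljana is")
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; the context window supports up to 1024 tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```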