
ALBERT Base Spanish

This is an ALBERT model trained on a large Spanish corpus. The model was trained on a single TPU v3-8 with the following hyperparameters, step counts, and training time:

  • LR: 0.0008838834765
  • Batch Size: 960
  • Warmup ratio: 0.00625
  • Warmup steps: 53333.33333
  • Goal steps: 8533333.333
  • Total steps: 3650000
  • Total training time (approx.): 70.4 days
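
The figures in the list above are related to one another, and a short sanity check makes those relations explicit. The sketch below is only illustrative: the variable names are made up for this example, and it assumes that every step processed one full batch of 960 sequences and that the 70.4 days were spent training continuously. Neither assumption is stated in the list itself.

```python
# Values copied from the hyperparameter list above.
warmup_ratio = 0.00625
goal_steps = 8533333.333
total_steps = 3650000
batch_size = 960
training_days = 70.4

# Warmup steps are the warmup ratio applied to the goal step count.
warmup_steps = warmup_ratio * goal_steps

# Share of the planned schedule that was actually completed.
fraction_done = total_steps / goal_steps

# Assumption: each step consumed one full batch.
sequences_seen = total_steps * batch_size

# Assumption: training ran continuously for the whole period.
steps_per_second = total_steps / (training_days * 24 * 3600)

print(f"warmup steps:       {warmup_steps:.2f}")
print(f"fraction of goal:   {fraction_done:.4f}")
print(f"sequences seen:     {sequences_seen}")
print(f"steps per second:   {steps_per_second:.3f}")
```

Under these assumptions, the listed warmup steps match the warmup ratio times the goal steps, and the run stopped at roughly 43% of the goal step count.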

Training loss