Model:
ITG/DialoGPT-medium-spanish-chitchat
This is a transformer-decoder GPT-2 model, adapted to the single-turn dialogue task in Spanish. We fine-tuned Microsoft's DialoGPT-medium model (345M parameters) with the CLM (Causal Language Modelling) objective.
We used one of the datasets available in the Bot Framework Tools repository: the professional-styled personality chat dataset in Spanish. The file is available to download here.
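The exact preprocessing script is not included here, but a minimal sketch of how the question/answer pairs can be turned into single-turn training strings might look like this. The file name and the `Question`/`Answer` column names are assumptions about the TSV layout, not details confirmed by this card; adjust them to the file you actually download.

```python
# Minimal data-preparation sketch (not the exact pipeline used for this model).
# Assumption: the downloaded file is a TSV with "Question" and "Answer" columns.
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

df = pd.read_csv("qna_chitchat_professional_spanish.tsv", sep="\t")  # hypothetical file name

# For single-turn dialogue, each training example is "question <eos> answer <eos>",
# so the model learns to generate the answer right after the user prompt.
def build_example(row):
    return row["Question"] + tokenizer.eos_token + row["Answer"] + tokenizer.eos_token

texts = df.apply(build_example, axis=1).tolist()
```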
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHAT_TURNS = 5
MAX_LENGTH = 1000

model = AutoModelForCausalLM.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
tokenizer = AutoTokenizer.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for i in range(CHAT_TURNS):
    user_input = input(f"Step - {i} >> user prompt ->")
    with torch.no_grad():
        # User turn, where "user_input" is the question (single-turn dialogue task)
        user_inputs_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
        user_inputs_ids = user_inputs_ids.to(device)
        # The chat history adds the generated tokens for the answer
        chat_history = model.generate(user_inputs_ids, max_length=MAX_LENGTH, pad_token_id=tokenizer.eos_token_id)
        # Decode just the newly generated tokens (do not include the user prompt again)
        step_model_answer = tokenizer.decode(chat_history[:, user_inputs_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"Step - {i} >> DialoGPT-spanish model answer -> {step_model_answer}")
```
Role | Message |
---|---|
User | Hola, buenos dias |
Bot | Buenos días |
User | ¿Qué tal estás? |
Bot | Fenomenal, gracias. |
User | ¡Tengo hambre! |
Bot | Quizá picar algo te ayude. |
User | Estoy triste |
Bot | Siento escuchar eso. |
User | ¿Tienes personalidad? |
Bot | Soy un bot. |
User | ¡Cuenta un chiste! |
Bot | En realidad no tengo tanta gracia. |
User | Jajajajajajajaja |
Bot | ¡Me alegro de que te rías! |
Hyper-parameter | Value |
---|---|
Validation partition (%) | 20% |
Training batch size | 8 |
Learning rate | 5e-4 |
Max training epochs | 20 |
Warmup training steps (%) | 6% |
Weight decay | 0.01 |
Optimiser (beta1, beta2, epsilon) | AdamW (0.9, 0.999, 1e-08) |
Monitoring metric (delta, patience) | Validation loss (0.1, 3) |
If you want to fine-tune your own dialogue model, we recommend starting from the DialoGPT model. You can check the original GitHub repository.
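As a starting point, a hedged sketch of a fine-tuning run that mirrors the hyper-parameters listed above could look like the following. It is not the exact training script used for this model: `texts` is assumed to be the list of `question <eos> answer <eos>` strings from the preprocessing sketch, and the 512-token truncation limit and output directory are arbitrary choices.

```python
# Fine-tuning sketch with the transformers Trainer, mirroring the hyper-parameter table above.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

class ChitchatDataset(torch.utils.data.Dataset):
    """Wraps the pre-built "question <eos> answer <eos>" strings (assumed to be in `texts`)."""
    def __init__(self, examples):
        self.encodings = [tokenizer(t, truncation=True, max_length=512) for t in examples]
    def __len__(self):
        return len(self.encodings)
    def __getitem__(self, idx):
        return self.encodings[idx]

split = int(0.8 * len(texts))  # 20% validation partition
train_ds, eval_ds = ChitchatDataset(texts[:split]), ChitchatDataset(texts[split:])

args = TrainingArguments(
    output_dir="dialogpt-spanish-chitchat",   # hypothetical output directory
    per_device_train_batch_size=8,            # training batch size
    learning_rate=5e-4,
    num_train_epochs=20,                      # max training epochs
    warmup_ratio=0.06,                        # 6% warmup training steps
    weight_decay=0.01,
    adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8,  # AdamW settings
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",        # monitor validation loss
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    # mlm=False gives the causal language modelling (CLM) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3,
                                     early_stopping_threshold=0.1)],
)
trainer.train()
```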
Since our approach can assign a probability to any Unicode string, this allows us to evaluate our LMs on any dataset regardless of pre-processing, tokenization, or vocab size.