Model: theblackcat102/pythia-1b-deduped-sft
This model card is meant to be a base template for new models. It was generated using this raw template.
Users (both direct users and downstream applications) should be made aware of the model's risks, biases, and limitations. More information is needed for further recommendations.
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "theblackcat102/pythia-1b-deduped-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).half().eval().cuda()

input_text = "<human>What's the earth population?<bot>"
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(0)
outputs = model.generate(
    **inputs,
    early_stopping=True,
    max_new_tokens=256,  # example sampling settings; the card does not specify values
    do_sample=True,
    top_k=50,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # dialogue_collator.py line 36
)
output = tokenizer.decode(outputs[0], truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"])
print(output)
```
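The `<human>` and `<bot>` markers delimit dialogue turns, and the generated text is whatever follows the final `<bot>`. A multi-turn prompt can presumably be built by concatenating turns in the same format; this is an assumption extrapolated from the single-turn example above, not something the card documents:

```python
# Hypothetical multi-turn prompt, extrapolated from the single-turn
# example above; the exact training-time dialogue format is not
# documented in this card.
prompt = (
    "<human>What's the earth population?"
    "<bot>About 8 billion people."
    "<human>And how fast is it growing?<bot>"
)
```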
The model was trained for 1000 iterations with the following command:

```bash
deepspeed trainer_sft.py --configs defaults pythia-1b --deepspeed
```

Training configuration:
```yaml
defaults:
  learning_rate: 1e-5
  gradient_checkpointing: false
  gradient_accumulation_steps: 32
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 2
  weight_decay: 0.00
  warmup_steps: 600
  eval_steps: 250
  save_steps: 250
  max_length: 512
  num_train_epochs: 2
  logging_steps: 10
  max_grad_norm: 2.0
  save_total_limit: 4
  fp16: true
  eval_accumulation_steps:
  freeze_layer:
  datasets:
    - gsm8k_hard
    - webgpt
    - squad_v2
    - adversarial_qa
    - private_tuning
    - oa_translated
    - prosocial_dialogue
    - math_qa
    - wikihow
    - joke
    - gsm8k
    - ted_trans_en-hi
    - ted_trans_de-ja
    - ted_trans_nl-en
    - ted_trans_en-ja
    - ted_trans_en-es
    - ted_trans_en-ms
    - xsum:
        fraction: 0.5
    - cnn_dailymail:
        fraction: 0.5
    - multi_news:
        fraction: 0.5
    - tldr_news:
        fraction: 0.5
    - scitldr:
        fraction: 0.5
    - samsum:
        fraction: 0.5
    - debate_sum:
        fraction: 0.5
    - billsum:
        fraction: 0.5
    - wmt2019_zh-en:
        fraction: 0.9
    - wmt2019_ru-en:
        fraction: 0.9
    - wmt2019_de-en:
        fraction: 0.9
    - wmt2019_fr-de:
        fraction: 0.9
    - essay_instruction
    - reddit_eli5
    - reddit_askh
    - reddit_asks
  cache_dir: /fsx/home-theblackcat02/.cache
  loss_fn: CrossEntropyLoss
  eval_size:
  log_dir: "base"
  quantization: false
  seq2seqmodel: false
  poly_eps: 1.0
  fuse_gelu: true
  log_wandb: true
  samples_mixing: true  # uses collator that mixes samples in the batch to create a single sample with possibly multiple tasks within
  verbose: false

pythia-1b:
  learning_rate: 5e-6
  model_name: EleutherAI/pythia-1b-deduped
  weight_decay: 0.01
  max_length: 540
  fp16: true
  warmup_steps: 1000
  gradient_accumulation_steps: 20
  per_device_train_batch_size: 20
  per_device_eval_batch_size: 2
  eval_steps: 500
  save_steps: 500
```
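The training command names two config sections. A reasonable reading, though trainer_sft.py's behavior is not documented in this card, is that the named sections are merged left to right, so keys in `pythia-1b` override `defaults`. A minimal sketch of that merge under plain-dict semantics (the file name `configs.yaml` is hypothetical):

```python
# Sketch of merging named YAML config sections, assuming later sections
# override earlier ones (mirrors `--configs defaults pythia-1b`).
import yaml

def merge_configs(path: str, names: list[str]) -> dict:
    """Merge named top-level sections; later names win on key conflicts."""
    with open(path) as f:
        sections = yaml.safe_load(f)
    merged: dict = {}
    for name in names:
        merged.update(sections[name])
    return merged

# e.g. merge_configs("configs.yaml", ["defaults", "pythia-1b"]) would
# yield learning_rate 5e-6, since the pythia-1b section overrides defaults.
```

Under that reading, the effective settings include learning_rate 5e-6, max_length 540, and an effective batch size of per_device_train_batch_size × gradient_accumulation_steps = 20 × 20 = 400 samples per device per optimizer step.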
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
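The calculator linked above is a web form. As an alternative, emissions can be measured programmatically during training with the codecarbon package; this is generic tooling suggested here, not part of this model's documented training setup:

```python
# Measure emissions for a code region with codecarbon
# (pip install codecarbon). Not part of this model's documented setup.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="pythia-1b-deduped-sft")
tracker.start()
try:
    # ... training loop goes here (placeholder) ...
    pass
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the region
print(f"Estimated emissions: {emissions_kg:.3f} kg CO2-eq")
```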
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]