Model:
EMBO/BioMegatron345mCased
BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (although not identical) to BERT and has 345 million parameters.
More information is available in the NVIDIA NGC CATALOG.
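Once the model is available in 🤗 Transformers (the conversion and loading steps are described below), the 345M figure can be checked with a quick sketch like the following:

```python
# Quick sanity check: count the parameters of the loaded model.
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("EMBO/BioMegatron345mCased")
n_params = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {n_params:,}")  # expected to be roughly 345 million
```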
In this implementation we have followed the commands of the nvidia/megatron-bert-uncased-345m repository to make BioMegatron available in 🤗 Transformers.
However, the file convert_megatron_bert_checkpoint.py needs to be modified. The reason is that the Megatron model shown in nvidia/megatron-bert-uncased-345m already contains the head layers, whereas the weights of the BioMegatron model uploaded to this repository do not contain a head.
The code below is a modified version of the original convert_megatron_bert_checkpoint.py.
```python
import os
import json

import torch

from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of the converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
```
In the repository we provide another version of the python script so that any user can cross-check the validity of the model replicated in this repository.
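As an illustration of what such a cross-check could look like (the file paths below are placeholders, not the actual script), the converted weights can be compared against a reference state dict tensor by tensor:

```python
# Hypothetical sketch of a cross-check: compare two converted checkpoints
# tensor by tensor. The file paths are placeholders.
import torch

reference = torch.load("reference/pytorch_model.bin", map_location="cpu")
converted = torch.load("converted/pytorch_model.bin", map_location="cpu")

assert reference.keys() == converted.keys(), "The state dicts contain different keys"
for name in reference:
    if not torch.equal(reference[name], converted[name]):
        print(f"Mismatch in tensor: {name}")
print("Cross-check finished")
```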
BioMegatron can be run with the standard 🤗 Transformers script for model loading. Here we show an example identical to the one in nvidia/megatron-bert-uncased-345m.
```python
import torch

from transformers import BertTokenizer, AutoModelForMaskedLM

checkpoint = "EMBO/BioMegatron345mCased"

# The tokenizer. Megatron was trained with standard tokenizer(s).
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Load the model from the EMBO/BioMegatron345mCased checkpoint.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
device = torch.device("cpu")

# Create inputs (from the BERT example page).
input = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
label = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)

# Run the model.
with torch.no_grad():
    output = model(**input, labels=label)
print(output)
```
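The output above contains the loss and the raw logits; continuing from that example, the prediction for the [MASK] position can be decoded with a small follow-up sketch:

```python
# Follow-up sketch (reuses `tokenizer`, `input` and `output` from the example above):
# decode the model's top prediction for the [MASK] token.
mask_token_index = (input["input_ids"] == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_token_id = output.logits[0, mask_token_index].argmax(dim=-1)
print(tokenizer.decode(predicted_token_id))
```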
This implementation has not been fine-tuned on any task. It carries only the weights of the official NVIDIA checkpoint and needs to be trained in order to perform any downstream task.
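For instance, a task-specific head can be initialized on top of the BioMegatron weights before fine-tuning; the snippet below is only a sketch (the token-classification task and the num_labels value are assumptions, not part of the released checkpoint):

```python
# Hypothetical sketch: put a randomly initialized token-classification head
# on top of the BioMegatron encoder; it must be fine-tuned before use.
from transformers import BertTokenizer, AutoModelForTokenClassification

checkpoint = "EMBO/BioMegatron345mCased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=3)  # num_labels depends on the task
```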
The original Megatron code can be found at https://github.com/NVIDIA/Megatron-LM.