Model: EMBO/BioMegatron345mCased

Language: English

This model was uploaded to HuggingFace by https://huggingface.co/drAbreu

The model is based on the NVIDIA checkpoint located at

https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased

BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (although not identical) to BERT and has 345 million parameters:

  • 24 layers
  • 16 attention heads with a hidden size of 1024.
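
These architecture values can be read directly from the configuration that ships with the checkpoint; the short snippet below is only an illustrative check using the standard 🤗 Transformers API.

from transformers import AutoConfig

# Load the configuration stored alongside the weights on the Hugging Face Hub.
config = AutoConfig.from_pretrained("EMBO/BioMegatron345mCased")

print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
print(config.hidden_size)          # expected: 1024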

More information is available in the nVIDIA NGC CATALOG.

Running BioMegatron in 🤗 Transformers

In this implementation, we have followed the commands of the nvidia/megatron-bert-uncased-345m repository to make BioMegatron available in 🤗 Transformers.

However, the file convert_megatron_bert_checkpoint.py needed to be modified. The reason is that the Megatron model shown in nvidia/megatron-bert-uncased-345m already includes the head layers, whereas the weights of the BioMegatron model uploaded to this repository do not contain a head.

The code below is a modification of the original convert_megatron_bert_checkpoint.py.

import json
import os

import torch

# convert_megatron_checkpoint performs the actual conversion; recursive_print is
# assumed to be provided by the same conversion script (it is defined in the
# original convert_megatron_bert_checkpoint.py) and is only used for inspection.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)

In the repository we provide an alternative version of this Python script so that any user can cross-check the validity of the model replicated in this repository.
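
One way such a cross-check could look is sketched below. This is only an illustration, not the script shipped in the repository; the local directory path is a placeholder for the output of the conversion above, and layers that are freshly initialised on load (such as the LM head, since the weights ship without a head) are not expected to match.

import torch
from transformers import AutoModelForMaskedLM

# Model rebuilt from the locally converted checkpoint
# (placeholder path: the folder containing config.json and pytorch_model.bin).
local_model = AutoModelForMaskedLM.from_pretrained("/path/to/BioMegatron345mUncased/")

# Model as hosted in this repository.
hub_model = AutoModelForMaskedLM.from_pretrained("EMBO/BioMegatron345mCased")

# Compare parameters name by name; freshly initialised layers will not match.
for (name, p_local), (_, p_hub) in zip(local_model.named_parameters(), hub_model.named_parameters()):
    if not torch.allclose(p_local, p_hub):
        print(f"mismatch: {name}")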

BioMegatron can be run with the standard 🤗 Transformers script for loading models. Here we show an example identical to that of nvidia/megatron-bert-uncased-345m.

import torch
from transformers import BertTokenizer, AutoModelForMaskedLM

checkpoint = "EMBO/BioMegatron345mCased"
# The tokenizer. Megatron was trained with standard tokenizer(s).
tokenizer = BertTokenizer.from_pretrained(checkpoint)
# Load the model weights from the EMBO/BioMegatron345mCased checkpoint on the Hub.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
device = torch.device("cpu")
# Create inputs (from the BERT example page).
inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
labels = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)
# Run the model.
with torch.no_grad():
    output = model(**inputs, labels=labels)
    print(output)
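
As a small follow-up to the example above (not part of the original script), the prediction at the masked position can be decoded from the returned logits. Note that, as explained under Limitations below, the hosted weights ship without a trained head, so the decoded token is not expected to be meaningful before training.

# Locate the [MASK] position and decode the most likely token at that position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = output.logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))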

Limitations

This implementation has not been fine-tuned on any task. It contains only the weights of the official nVIDIA checkpoint and needs to be trained before it can perform any downstream task.
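
For illustration only (this is not part of the repository), attaching a task head for fine-tuning follows the usual 🤗 Transformers pattern; the number of labels below is a placeholder that depends on the downstream dataset.

from transformers import AutoModelForTokenClassification, BertTokenizer

checkpoint = "EMBO/BioMegatron345mCased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Attach a freshly initialised token-classification head (e.g. for biomedical NER).
# num_labels is a placeholder; it depends on the downstream dataset.
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=5)

# From here, the model can be fine-tuned with the Trainer API or a plain PyTorch training loop.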

Original code

The original code for Megatron can be found at https://github.com/NVIDIA/Megatron-LM.