Model:

EMBO/BioMegatron345mUncased

Language: English

<!---

This model has been uploaded to HuggingFace by https://huggingface.co/drAbreu

The model is based on the NVIDIA checkpoint located at

https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345muncased

-->

BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows a similar (albeit not identical) architecture to BERT and has 345 million parameters:

  • 24 layers
  • 16 attention heads with a hidden size of 1024.
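
As a quick sanity check, these dimensions can be read back from the converted 🤗 configuration. A minimal sketch; the commented values are the ones expected for the 345M checkpoint:

from transformers import AutoConfig

# Inspect the architecture recorded in the checkpoint's config.json.
config = AutoConfig.from_pretrained("EMBO/BioMegatron345mUncased")
print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
print(config.hidden_size)          # expected: 1024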

More information is available in the NVIDIA NGC catalog: https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345muncased

Running BioMegatron in 🤗 transformers

In this implementation we have followed the commands of the nvidia/megatron-bert-uncased-345m repository to make BioMegatron available in 🤗 transformers.

However, the file convert_megatron_bert_checkpoint.py needed a modification: the Megatron model shown in nvidia/megatron-bert-uncased-345m includes head layers, while the weights of the BioMegatron model uploaded to this repository do not contain a head.

We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here.

The code below is a modification of the original convert_megatron_bert_checkpoint.py. It assumes that convert_megatron_checkpoint and the recursive_print helper can be imported from the conversion script shipped in this repository.

import json
import os
import torch
# recursive_print is assumed to live in the same conversion script that
# provides convert_megatron_checkpoint.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
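
If the conversion succeeds, the checkpoint directory now contains a config.json and a pytorch_model.bin that 🤗 transformers can load directly. A minimal sketch of a quick check, assuming the same local path as above:

from transformers import AutoModel

# The directory must contain the config.json and pytorch_model.bin
# written by the conversion script above.
model = AutoModel.from_pretrained("/path/to/BioMegatron345mUncased/")
print(model.config.hidden_size)  # expected: 1024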

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of nvidia/megatron-bert-uncased-345m.

import os
import torch

from transformers import BertTokenizer, MegatronBertForMaskedLM, AutoModelForMaskedLM
checkpoint = "EMBO/BioMegatron345mUncased"

# The tokenizer. Megatron was trained with standard tokenizer(s).
tokenizer = BertTokenizer.from_pretrained(checkpoint)
# Load the model from the EMBO/BioMegatron345mUncased checkpoint.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
device = torch.device("cpu")
# Create inputs (from the BERT example page).
input = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
label = tokenizer("The capital of France is Paris",  return_tensors="pt")["input_ids"].to(device)

# Run the model.
with torch.no_grad():
    output = model(**input, labels=label)
    print(output)
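
To inspect which token the model predicts at the [MASK] position, the logits can be decoded as in the sketch below. Since the converted weights do not include a task head (see above), the MLM head may be randomly initialized, so the top prediction is only illustrative:

# Locate the [MASK] position and decode the highest-scoring token.
mask_positions = (input["input_ids"] == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = output.logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))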

Limitations

This model has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and needs to be fine-tuned to perform any downstream task.
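
A minimal sketch of how such fine-tuning could start, assuming a hypothetical binary classification task (the label count and the training loop are illustrative, not part of this repository):

from transformers import AutoModelForSequenceClassification, BertTokenizer

checkpoint = "EMBO/BioMegatron345mUncased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# A randomly initialized classification head is added on top of the encoder.
# num_labels=2 is a hypothetical choice for illustration only.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# The model can then be trained with the 🤗 Trainer API or a custom
# PyTorch loop on a downstream dataset of your choice.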

Original code

The original code for Megatron can be found here: https://github.com/NVIDIA/Megatron-LM.

