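Model description

Tk-Instruct is a series of encoder-decoder Transformer models trained to solve a variety of NLP tasks by following in-context instructions (plain-language task definitions, k-shot examples, explanations, etc.). The models are built on the pretrained T5 models and fine-tuned on the large collection of tasks and instructions in the Natural Instructions benchmark, which contains 1600+ tasks in 70+ broad categories in total. This lets the models not only handle the training tasks but also generalize to many unseen tasks without further parameter updates.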

More resources for using the model:

Paper: arXiv:2204.07705
Code repo: Tk-Instruct
Official website: Natural Instructions
All released models: allenai/tk-instruct

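Intended uses and limitations

Tk-Instruct can perform many NLP tasks by following instructions.

How to use

When prompting the model, prepend the task definition, demonstration examples, or explanations to the original input, and feed the combined text to the model. You can try a Tk-Instruct model as follows: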

>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("allenai/tk-instruct-3b-def")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("allenai/tk-instruct-3b-def")

>>> input_ids = tokenizer.encode(
...     "Definition: return the currency of the given country. Now complete the following example - Input: India. Output:",
...     return_tensors="pt")
>>> output = model.generate(input_ids, max_length=10)
>>> output = tokenizer.decode(output[0], skip_special_tokens=True)   # model should output 'Indian Rupee'

>>> input_ids = tokenizer.encode(
...     "Definition: negate the following sentence. Input: John went to school. Output:",
...     return_tensors="pt")
>>> output = model.generate(input_ids, max_length=10)
>>> output = tokenizer.decode(output[0], skip_special_tokens=True)   # model should output 'John did not go to school.'
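
Prompts of this shape can also be assembled programmatically. The sketch below is one possible way to do it, not the authors' code: the helper name `build_prompt` is hypothetical, and the "Positive Example N -" label for demonstrations is an assumption made for illustration. Only the "Definition: ... Input: ... Output:" shape is taken from the examples in this card, so check the Tk-Instruct code repo for the exact template used in training.

```python
from typing import List, Optional, Tuple


def build_prompt(
    definition: str,
    task_input: str,
    positive_examples: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Prepend a task definition (and optional demonstrations) to an input.

    NOTE: the "Positive Example N -" label is an assumed convention for
    illustration; it is not confirmed by this model card.
    """
    parts = [f"Definition: {definition.strip()}"]
    for i, (ex_input, ex_output) in enumerate(positive_examples or [], start=1):
        parts.append(
            f"Positive Example {i} - Input: {ex_input.strip()} "
            f"Output: {ex_output.strip()}"
        )
    parts.append(
        f"Now complete the following example - Input: {task_input.strip()} Output:"
    )
    return " ".join(parts)


# Definition-only prompt, matching the currency example in this card.
prompt = build_prompt("return the currency of the given country.", "India.")
print(prompt)

# Definition plus one demonstration (hypothetical example pair).
prompt_with_demo = build_prompt(
    "return the currency of the given country.",
    "India.",
    positive_examples=[("Japan.", "Japanese Yen")],
)
print(prompt_with_demo)
```

The resulting string can be passed to `tokenizer.encode(..., return_tensors="pt")` exactly as in the session above.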

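Limitations

We are still working to understand the behavior of these models, but we have already found several issues:

Models are generally sensitive to the instruction. Rephrasing the instruction can sometimes lead to a very different output.
Models do not always comply with the instruction. Sometimes a model does not do what you asked (e.g., when asked to generate one sentence, it might produce a single word or a very long story).
Models might fail completely on some tasks.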
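
One lightweight way to probe the sensitivity issue is to run the same input under several rephrased definitions and compare the outputs. The following is a minimal sketch under stated assumptions: `generate_fn` is a hypothetical callable that wraps tokenization, `model.generate`, and decoding, and the agreement score is an illustrative measure, not one used by the authors. A stand-in generator is used so the sketch runs without downloading a model.

```python
from collections import Counter
from typing import Callable, Dict, List


def paraphrase_agreement(
    generate_fn: Callable[[str], str],
    definitions: List[str],
    task_input: str,
) -> Dict[str, object]:
    """Run one input under several rephrased definitions.

    Returns the per-definition outputs, the most common output, and the
    fraction of runs that produced it (1.0 means rephrasing did not matter).
    """
    if not definitions:
        raise ValueError("need at least one definition")
    outputs = []
    for definition in definitions:
        prompt = f"Definition: {definition} Input: {task_input} Output:"
        outputs.append(generate_fn(prompt).strip())
    majority, count = Counter(outputs).most_common(1)[0]
    return {
        "outputs": outputs,
        "majority": majority,
        "agreement": count / len(outputs),
    }


# Hypothetical stand-in for a real model call; replace with your own wrapper.
def fake_generate(prompt: str) -> str:
    if "negate" in prompt.lower():
        return "John did not go to school."
    return "John went home."


report = paraphrase_agreement(
    fake_generate,
    [
        "negate the following sentence.",
        "Negate this sentence.",
        "write the opposite of the sentence.",
    ],
    "John went to school.",
)
print(report["majority"], report["agreement"])
```

A low agreement score on a real model suggests the task is worth testing with several instruction phrasings before relying on any one of them.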

If you find serious issues or any interesting results, you are welcome to share them with us!

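Training data

Tk-Instruct is trained on the tasks and instructions in the Natural Instructions benchmark, which contains 1600+ tasks in 70+ broad categories in total. We follow the official train/test split. The Tk-Instruct model series was trained on 757 tasks, and the mTk-Instruct series was trained on 1271 tasks (including some non-English tasks).

The training tasks fall into 64 broad categories, such as text categorization, question answering, sentiment analysis, summarization, grammar error detection, and dialogue generation. The other 12 categories are held out for evaluation.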

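Training procedure

All of our models are initialized from either T5 models or mT5 models. Because generating the output can be regarded as a form of language modeling, we start from their LM adapted version. All data is converted into a text-to-text format, and the models are fine-tuned to maximize the likelihood of the output sequence.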
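
To make the objective concrete: maximizing the likelihood of the output sequence is equivalent to minimizing the sum of the per-token negative log-probabilities. The toy sketch below illustrates that quantity only. The function name `sequence_nll` and the probabilities are hypothetical, and this is not the authors' training code.

```python
import math
from typing import Sequence


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of an output sequence.

    token_probs[t] is the probability the model assigns to the correct
    t-th output token, given the input text and the previous output tokens.
    """
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities for a three-token target.
confident = sequence_nll([0.9, 0.8, 0.95])
uncertain = sequence_nll([0.3, 0.2, 0.5])

# A lower loss corresponds to a higher likelihood of the target sequence.
print(confident, uncertain)
```

Fine-tuning lowers this loss on the (instruction + input, output) pairs of the training tasks.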

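Our released models come in different sizes, and each was trained with a specific type of instruction encoding. For instance, tk-instruct-3b-def-pos was initialized from t5-xl-lm-adapt and saw the task definition plus 2 positive examples as the instruction during training. Although each model is trained with only one type of instruction encoding, we found that the models can usually work with other types of encoding at test time (see our paper for details).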

BibTeX entry and citation info

@article{wang2022benchmarking,
  title={Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks},
  author={Yizhong Wang and Swaroop Mishra and Pegah Alipoormolabashi and Yeganeh Kordi and Amirreza Mirzaei and A. Arunkumar and Arjun Ashok and Arut Selvan Dhanasekaran and Atharva Naik and David Stap and Eshaan Pathak and Giannis Karamanolakis and Haizhi Gary Lai and Ishan Purohit and Ishani Mondal and Jacob Anderson and Kirby Kuznia and Krima Doshi and Maitreya Patel and Kuntal Kumar Pal and M. Moradshahi and Mihir Parmar and Mirali Purohit and Neeraj Varshney and Phani Rohitha Kaza and Pulkit Verma and Ravsehaj Singh Puri and Rushang Karia and Shailaja Keyur Sampat and Savan Doshi and Siddharth Deepak Mishra and Sujan C. Reddy and Sumanta Patro and Tanay Dixit and Xu-dong Shen and Chitta Baral and Yejin Choi and Hannaneh Hajishirzi and Noah A. Smith and Daniel Khashabi},
  year={2022},
  archivePrefix={arXiv},
  eprint={2204.07705},
  primaryClass={cs.CL},
}

Author:

Allen Institute for AI

Dataset size:

6.99 GB