数据集:
Anthropic/model-written-evals
语言:
en计算机处理:
monolingual大小:
100K<n<1M语言创建人:
machine-generated批注创建人:
machine-generated源数据集:
original许可:
cc-by-4.0该存储库包含我们在《使用模型生成的评估发现语言模型行为》一文中使用的语言模型生成的数据集。
我们希望这些数据集对以下人群有用:
这些评估是生成的,可以用于对话代理(例如一个专门对用户话语进行微调的模型,或者一个预训练的语言模型被要求表现得像一个对话代理)。然而,也可以调整数据以测试其他类型的模型。
下面我们逐一描述我们的数据集合:
有关数据集的更多详细信息,请参阅我们的论文,包括我们如何生成数据集、人工验证指标和其他数据集分析。
免责声明:如我们的论文所述,某些数据包含社会偏见和刻板印象的内容。数据还可能包含其他形式的有害或冒犯性内容。数据中表达的观点不代表Anthropic或其员工的观点。
如果有问题,请发送电子邮件至ethan at anthropic dot com
如果您想引用我们的工作或数据,可以使用以下Bibtex引用:
@misc{perez2022discovering, doi = {10.48550/ARXIV.2212.09251}, url = {https://arxiv.org/abs/2212.09251}, author = {Perez, Ethan and Ringer, Sam and Lukošiūtė, Kamilė and Nguyen, Karina and Chen, Edwin and Heiner, Scott and Pettit, Craig and Olsson, Catherine and Kundu, Sandipan and Kadavath, Saurav and Jones, Andy and Chen, Anna and Mann, Ben and Israel, Brian and Seethor, Bryan and McKinnon, Cameron and Olah, Christopher and Yan, Da and Amodei, Daniela and Amodei, Dario and Drain, Dawn and Li, Dustin and Tran-Johnson, Eli and Khundadze, Guro and Kernion, Jackson and Landis, James and Kerr, Jamie and Mueller, Jared and Hyun, Jeeyoon and Landau, Joshua and Ndousse, Kamal and Goldberg, Landon and Lovitt, Liane and Lucas, Martin and Sellitto, Michael and Zhang, Miranda and Kingsland, Neerav and Elhage, Nelson and Joseph, Nicholas and Mercado, Noemí and DasSarma, Nova and Rausch, Oliver and Larson, Robin and McCandlish, Sam and Johnston, Scott and Kravec, Shauna and {El Showk}, Sheer and Lanham, Tamera and Telleen-Lawton, Timothy and Brown, Tom and Henighan, Tom and Hume, Tristan and Bai, Yuntao and Hatfield-Dodds, Zac and Clark, Jack and Bowman, Samuel R. and Askell, Amanda and Grosse, Roger and Hernandez, Danny and Ganguli, Deep and Hubinger, Evan and Schiefer, Nicholas and Kaplan, Jared}, keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences}, title = {Discovering Language Model Behaviors with Model-Written Evaluations}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license} }