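The Touché23-ValueEval dataset contains 9324 arguments from six different sources. The first letter of an argument's ID indicates its source.
The annotated labels are based on the value taxonomy published in Identifying the Human Values behind Arguments (Kiesel et al., 2022).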
[1] https://language.ml
[2] https://en.wikipedia.org/wiki/Nahj_al-Balagha
[3] https://en.wikipedia.org/wiki/Ghurar_al-Hikam_wa_Durar_al-Kalim
The default configuration name is main.
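from datasets import load_dataset

# Load the default configuration (main) with all of its splits.
dataset = load_dataset("webis/Touche23-ValueEval")
print(dataset['train'].info.description)

# Print every training argument as "<ID>: <stance> '<conclusion>': <premise>".
for argument in dataset['train']:
    print(f"{argument['Argument ID']}: {argument['Stance']} '{argument['Conclusion']}': {argument['Premise']}")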
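Because the first letter of the argument ID encodes the source, arguments can be tallied per source without any further metadata. The following is a minimal sketch, not part of the dataset's tooling: the helper `count_by_source_letter` and the sample rows (IDs and texts) are made up for illustration, and only the field names are taken from the loading example.

```python
from collections import Counter


def count_by_source_letter(arguments):
    """Count arguments per source, keyed by the first letter of 'Argument ID'."""
    return Counter(argument["Argument ID"][0] for argument in arguments)


# Hypothetical rows that mimic the dataset's fields; IDs and texts are invented.
sample = [
    {"Argument ID": "A00001", "Stance": "in favor of", "Conclusion": "c1", "Premise": "p1"},
    {"Argument ID": "A00002", "Stance": "against", "Conclusion": "c2", "Premise": "p2"},
    {"Argument ID": "C00001", "Stance": "against", "Conclusion": "c3", "Premise": "p3"},
]

counts = count_by_source_letter(sample)
for letter, count in sorted(counts.items()):
    print(f"{letter}: {count}")
```

With the real data, the same helper should accept a loaded split such as `dataset['train']`, since iterating over a split yields one dictionary per argument.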
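Human value detection
The argument instances are all monolingual; they include only English (mostly en-US) documents. For some parts of the dataset, the metadata instances additionally give the arguments in their original language and phrasing.
Each argument instance has the following attributes:
In addition, the labels are divided into value categories and human values, that is, the level 2 and level 1 labels of the value taxonomy. This distinction is also reflected in the configuration names: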
labels = [
    "Self-direction: thought", "Self-direction: action", "Stimulation", "Hedonism",
    "Achievement", "Power: dominance", "Power: resources", "Face",
    "Security: personal", "Security: societal", "Tradition", "Conformity: rules",
    "Conformity: interpersonal", "Humility", "Benevolence: caring", "Benevolence: dependability",
    "Universalism: concern", "Universalism: nature", "Universalism: tolerance", "Universalism: objectivity",
]
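The label list lends itself to converting between label names and a multi-label vector. The sketch below assumes that an instance stores its labels as a list of 0/1 flags in the same order as `labels`; this section does not state that encoding, so treat it as an assumption and check it against the loaded data. The helpers `decode_labels` and `encode_labels` are hypothetical names, not part of the dataset.

```python
labels = [
    "Self-direction: thought", "Self-direction: action", "Stimulation", "Hedonism",
    "Achievement", "Power: dominance", "Power: resources", "Face",
    "Security: personal", "Security: societal", "Tradition", "Conformity: rules",
    "Conformity: interpersonal", "Humility", "Benevolence: caring",
    "Benevolence: dependability", "Universalism: concern", "Universalism: nature",
    "Universalism: tolerance", "Universalism: objectivity",
]


def decode_labels(vector, names=labels):
    """Return the names whose flag is 1 (assumes vector is aligned with names)."""
    if len(vector) != len(names):
        raise ValueError(f"expected {len(names)} flags, got {len(vector)}")
    return [name for name, flag in zip(names, vector) if flag == 1]


def encode_labels(assigned, names=labels):
    """Return a 0/1 vector aligned with names for the given label names."""
    unknown = set(assigned) - set(names)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in assigned else 0 for name in names]


# Made-up example vector with two labels set.
example_vector = [0] * len(labels)
example_vector[2] = 1
example_vector[9] = 1
print(decode_labels(example_vector))  # ['Stimulation', 'Security: societal']
```

Encoding and decoding are inverses of each other here, which makes the pair convenient for checking predictions against annotations.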
The configuration names in this dataset (to be substituted for <config>) are:
dataset_main_train = load_dataset("webis/Touche23-ValueEval", split="train")
dataset_main_validation = load_dataset("webis/Touche23-ValueEval", split="validation")
dataset_main_test = load_dataset("webis/Touche23-ValueEval", split="test")
dataset_nahjalbalagha_test = load_dataset("webis/Touche23-ValueEval", name="nahjalbalagha", split="test")
dataset_nyt_test = load_dataset("webis/Touche23-ValueEval", name="nyt", split="test")
dataset_zhihu_validation = load_dataset("webis/Touche23-ValueEval", name="zhihu", split="validation")
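Note that, for copyright reasons, there is currently no direct download link for the arguments in the New York Times dataset. Accessing the nyt or nyt-level1 configuration therefore uses the purpose-built nyt-downloader program to create and access the arguments locally. See the program's README for further details.
All configuration names for the metadata are listed below. Each of these configurations has a single split named meta.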
dataset_ibm_metadata = load_dataset("webis/Touche23-ValueEval", name="ibm-meta", split="meta")
dataset_zhihu_metadata = load_dataset("webis/Touche23-ValueEval", name="zhihu-meta", split="meta")
dataset_gdi_metadata = load_dataset("webis/Touche23-ValueEval", name="gdi-meta", split="meta")
dataset_cofe_metadata = load_dataset("webis/Touche23-ValueEval", name="cofe-meta", split="meta")
dataset_nahjalbalagha_metadata = load_dataset("webis/Touche23-ValueEval", name="nahjalbalagha-meta", split="meta")
dataset_nyt_metadata = load_dataset("webis/Touche23-ValueEval", name="nyt-meta", split="meta")
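The configuration names used above follow a simple pattern, which can be captured in a small helper. This is a sketch under assumptions: `config_name` is a hypothetical function, and the `-level1` suffix is attested in this section only for nyt, so applying it to the other argument configurations is a guess that should be verified against the dataset.

```python
# Names taken from the loading examples in this section.
ARGUMENT_CONFIGS = {"main", "nahjalbalagha", "nyt", "zhihu"}
META_PARTS = {"ibm", "zhihu", "gdi", "cofe", "nahjalbalagha", "nyt"}


def config_name(part, level1=False, meta=False):
    """Build a configuration name such as 'nyt-level1' or 'gdi-meta'."""
    if meta:
        if level1:
            # This sketch does not combine the two suffixes.
            raise ValueError("choose either level1 or meta, not both")
        if part not in META_PARTS:
            raise ValueError(f"no metadata configuration known for {part!r}")
        return f"{part}-meta"
    if part not in ARGUMENT_CONFIGS:
        raise ValueError(f"no argument configuration known for {part!r}")
    # Assumption: every argument configuration has a '-level1' variant.
    return f"{part}-level1" if level1 else part


print(config_name("main"))              # main
print(config_name("nyt", level1=True))  # nyt-level1
print(config_name("gdi", meta=True))    # gdi-meta
```

The result can be passed as the `name` argument of `load_dataset`, together with `split="meta"` for metadata configurations.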
{
  "<value category>": {
    "<level 1 value>": [
      "<exemplary effect a corresponding argument might target>",
      ...
    ],
    ...
  },
  ...
}
Because this configuration contains only a single entry, it can be loaded as in the following example:
value_categories = load_dataset("webis/Touche23-ValueEval", name="value-categories", split="meta")[0]
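Given the nested structure, a common follow-up is to map each level 1 value back to its value category. The sketch below uses a small hand-written dictionary in the same shape rather than the loaded entry; its keys and texts are placeholders for illustration only, and `value_to_category` is a hypothetical helper.

```python
def value_to_category(value_categories):
    """Invert the hierarchy: map each level 1 value to its value category."""
    mapping = {}
    for category, values in value_categories.items():
        for value in values:
            mapping[value] = category
    return mapping


# Placeholder data in the documented shape:
# category -> level 1 value -> list of exemplary effects.
sample_categories = {
    "Self-direction: thought": {
        "Be creative": ["<exemplary effect>"],
        "Be curious": ["<exemplary effect>"],
    },
    "Stimulation": {
        "Have an exciting life": ["<exemplary effect>"],
    },
}

lookup = value_to_category(sample_categories)
print(lookup["Be curious"])  # Self-direction: thought
```

If the loaded `value_categories` entry behaves like a plain nested dictionary, it can be passed to the same function unchanged.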
[More Information Needed]
Creative Commons Attribution 4.0 International (CC BY 4.0)
@Article{mirzakhmedova:2023a,
  author    = {Nailia Mirzakhmedova and Johannes Kiesel and Milad Alshomary and Maximilian Heinrich and Nicolas Handke
               and Xiaoni Cai and Valentin Barriere and Doratossadat Dastgheib and Omid Ghahroodi and {Mohammad Ali} Sadraei
               and Ehsaneddin Asgari and Lea Kawaletz and Henning Wachsmuth and Benno Stein},
  doi       = {10.48550/arXiv.2301.13771},
  journal   = {CoRR},
  month     = jan,
  publisher = {arXiv},
  title     = {{The Touch{\'e}23-ValueEval Dataset for Identifying Human Values behind Arguments}},
  volume    = {abs/2301.13771},
  year      = 2023
}