Dataset:

code_x_glue_cc_defect_detection

Task:

Text Classification

Subtask:

multi-class-classification

Language:

code

Multilinguality:

other-programming-languages

Size:

10K<n<100K

Language creators:

found

Annotation creators:

found

Source datasets:

original

License:

c-uda

Dataset Card for "code_x_glue_cc_defect_detection"

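Dataset Summary

The CodeXGLUE Defect-detection dataset is available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection.

Given a function's source code, the task is to decide whether it is insecure code that could expose a software system to attack, for example through resource leaks, use-after-free (UAF) vulnerabilities, or denial-of-service (DoS) attacks. The task is treated as binary classification (0/1), where 1 stands for insecure code and 0 for secure code. The dataset comes from the paper Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. All projects are combined and split 80%/10%/10% into training/dev/test sets.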
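As a minimal sketch of how the 0/1 labelling might be used in practice, the snippet below maps the boolean `target` field to the integer label described above and wraps the download in a helper. It assumes the Hugging Face `datasets` package is installed and that the identifier from this card still resolves on the Hub; the helper names `target_to_label` and `load_defect_detection` are hypothetical and not part of any library.

```python
from typing import Any

# Dataset identifier as given in this card (assumed to resolve on the Hub).
DATASET_NAME = "code_x_glue_cc_defect_detection"


def target_to_label(target: bool) -> int:
    """Map the boolean `target` field to the task label: 1 = insecure, 0 = secure."""
    return 1 if target else 0


def load_defect_detection() -> Any:
    """Download all splits. Needs the `datasets` package and network access."""
    # Imported lazily so target_to_label works without the dependency installed.
    from datasets import load_dataset

    return load_dataset(DATASET_NAME)


# Usage (not executed here because it downloads data):
#   ds = load_defect_detection()
#   first = ds["train"][0]
#   print(first["project"], target_to_label(first["target"]))
```

Keeping the download inside a function means the label helper can be imported and tested offline.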

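Supported Tasks and Leaderboards

multi-class-classification (binary in practice, since there are only two labels): the dataset can be used to train a model that detects whether a piece of code contains a defect.

Languages

C programming language

Dataset Structure

Data Instances

An example from the 'validation' split looks as follows.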

{
    "commit_id": "aa1530dec499f7525d2ccaa0e3a876dc8089ed1e", 
    "func": "static void filter_mirror_setup(NetFilterState *nf, Error **errp)\n{\n    MirrorState *s = FILTER_MIRROR(nf);\n    Chardev *chr;\n    chr = qemu_chr_find(s->outdev);\n    if (chr == NULL) {\n        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,\n                  \"Device '%s' not found\", s->outdev);\n    qemu_chr_fe_init(&s->chr_out, chr, errp);", 
    "id": 8, 
    "project": "qemu", 
    "target": true
}

Data Fields

The data fields of each configuration are explained below. The fields are the same across all splits.

default

field name	type	description
id	int32	Index of the sample
func	string	The source code
target	bool	true (1) if the code is insecure, false (0) if it is secure
project	string	Original project that contains this code
commit_id	string	Commit identifier in the original project
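The field table above can be turned into a small schema check. The following is a sketch only: `FIELD_TYPES` and `validate_record` are hypothetical names, `int32` is mapped to Python's `int`, and the example record reuses the values from the validation instance in this card with `func` abbreviated.

```python
from typing import Any, Dict, List

# Field names and Python types mirroring the table above (int32 -> int).
FIELD_TYPES: Dict[str, type] = {
    "id": int,
    "func": str,
    "target": bool,
    "project": str,
    "commit_id": str,
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems: List[str] = []
    for name, expected in FIELD_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so a bool must not pass as an int field.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Record based on the validation example in this card; `func` is abbreviated here.
example = {
    "commit_id": "aa1530dec499f7525d2ccaa0e3a876dc8089ed1e",
    "func": "static void filter_mirror_setup(NetFilterState *nf, Error **errp)\n{ /* abbreviated */ }",
    "id": 8,
    "project": "qemu",
    "target": True,
}

print(validate_record(example))
```

A check like this is mainly useful when records are exported to JSON or CSV and re-read, where `target` can silently become an integer or a string.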

Data Splits

name	train	validation	test
default	21854	2732	2732
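As a quick sanity check, the split sizes in the table can be compared against the 80%/10%/10% split stated in the summary. This sketch uses only the counts listed above; the function name `split_fractions` is hypothetical.

```python
from typing import Dict

# Example counts copied from the split table in this card.
SPLIT_SIZES: Dict[str, int] = {"train": 21854, "validation": 2732, "test": 2732}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
for name, fraction in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} examples ({fraction:.1%})")
```

The three splits add up to 27,318 examples, and the shares come out very close to 80%, 10%, and 10%.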

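Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]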

Additional Information

Dataset Curators

https://github.com/microsoft, https://github.com/madlag

Licensing Information

Computational Use of Data Agreement (C-UDA) License.

Citation Information

@inproceedings{zhou2019devign,
title={Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks},
author={Zhou, Yaqin and Liu, Shangqing and Siow, Jingkai and Du, Xiaoning and Liu, Yang},
booktitle={Advances in Neural Information Processing Systems},
pages={10197--10207},
year={2019}
}

Contributions

Thanks to @madlag (and partly also @ncoop57) for adding this dataset.

Author:

Anonymous

Dataset size:

15.53 KB