Django Dataset for Code Translation Tasks

This is the Django dataset used in the paper "Learning to Generate Pseudo-code from Source Code Using Statistical Machine Translation" (Oda et al., ASE 2015).

The Django dataset is a code generation dataset comprising 16,000 training, 1,000 development, and 1,805 test annotations. Each data point consists of a line of Python code paired with a manually written natural language description.
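As a rough illustration of the structure described above, the following sketch models one annotation as a code/description pair and records the split sizes. The names `Annotation`, `SPLIT_SIZES`, and `total_annotations` are hypothetical, and the example pair is invented for illustration rather than taken from the dataset; the actual file layout and field names may differ.

```python
from dataclasses import dataclass

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 16000, "dev": 1000, "test": 1805}


@dataclass(frozen=True)
class Annotation:
    """One data point: a line of Python code and its description.

    The field names are assumptions, not the dataset's actual schema.
    """
    code: str
    description: str


def total_annotations(sizes):
    """Return the number of annotations across all splits."""
    return sum(sizes.values())


# Hypothetical pair written for illustration; not copied from the dataset.
example = Annotation(code="x = 0", description="x is an integer 0.")

print(total_annotations(SPLIT_SIZES))  # prints 18805
```

The three splits together account for 18,805 annotated lines.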

@inproceedings{oda2015ase:pseudogen1,
 author = {Oda, Yusuke and Fudaba, Hiroyuki and Neubig, Graham and Hata, Hideaki and Sakti, Sakriani and Toda, Tomoki and Nakamura, Satoshi},
 title = {Learning to Generate Pseudo-code from Source Code Using Statistical Machine Translation},
 booktitle = {Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
 series = {ASE '15},
 month = {November},
 year = {2015},
 isbn = {978-1-5090-0025-8},
 pages = {574--584},
 numpages = {11},
 url = {https://doi.org/10.1109/ASE.2015.36},
 doi = {10.1109/ASE.2015.36},
 acmid = {2916173},
 publisher = {IEEE Computer Society},
 address = {Lincoln, Nebraska, USA}
}

Author:

AhmedSSoliman

Dataset size:

2.16 MB