数据集:
w11wo/imdb-javanese
任务:
文本分类语言:
jv计算机处理:
monolingual大小:
10K<n<100K语言创建人:
machine-generated批注创建人:
found源数据集:
original许可:
odbl这是一个翻译成爪哇语的大型电影评论数据集。该数据集用于二元情感分类,包含比之前基准数据集更多的数据。我们提供了25,000个极性强烈的电影评论用于训练,以及25,000个用于测试。还提供了其他未标记的数据可供使用。我们使用多语言的MarianMT Transformer模型将 original IMDB Dataset 翻译成了爪哇语。
我们展示了数据集的最多5个配置的详细信息。
javanese_imdb_train.csv的一个示例如下所示。
label | text |
---|---|
1 | "Drama romantik sing digawé karo direktur Martin Ritt kuwi ora dingertèni, nanging ana momen-momen sing marahi karisma lintang Jane Fonda lan Robert De Niro (kelompok sing luar biasa). Dhèwèké dadi randha sing ora isa mlaku, iso anu anyar lan anyar-inventor-- kowé isa nganggep isiné. Adapsi novel Pat Barker ""Union Street"" (yak titel sing apik!) arep dinggo-back-back it on bland, lan pendidikan film kuwi gampang, nanging isih nyenengké; a rosy-hued-inventor-fantasi. Ora ana sing ngganggu gambar sing sejati ding kok iso dinggo nggawe gambar sing paling nyeneng." |
0 | "Pengalaman wong lanang sing nduwé perasaan sing ora lumrah kanggo babi. Mulai nganggo tuladha sing luar biasa yaiku komedia. Wong orkestra termel digawé dadi wong gila, sing kasar merga nyanyian nyanyi. Sayangé, kuwi tetep absurd wektu WHOLE tanpa ceramah umum sing mung digawé. Malah, sing ana ing jaman kuwi kudu ditinggalké. Diyalog kryptik sing nggawé Shakespeare marah gampang kanggo kelas telu. Pak teknis kuwi luwih apik timbang kowe mikir nganggo cinematografi sing apik sing jenengé Vilmos Zsmond. Masa depan bintang Saly Kirkland lan Frederic Forrest isa ndelok." |
train | unsupervised | test |
---|---|---|
25000 | 50000 | 25000 |
如果您在研究中使用了此数据集,请引用:
@inproceedings{wongso2021causal, title={Causal and Masked Language Modeling of Javanese Language using Transformer-based Architectures}, author={Wongso, Wilson and Setiawan, David Samuel and Suhartono, Derwin}, booktitle={2021 International Conference on Advanced Computer Science and Information Systems (ICACSIS)}, pages={1--7}, year={2021}, organization={IEEE} }
@InProceedings{maas-EtAl:2011:ACL-HLT2011, author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher}, title = {Learning Word Vectors for Sentiment Analysis}, booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, month = {June}, year = {2011}, address = {Portland, Oregon, USA}, publisher = {Association for Computational Linguistics}, pages = {142--150}, url = {http://www.aclweb.org/anthology/P11-1015} }