Dataset:

hackathon-pln-es/neutral-es

Language:

es

Multilinguality:

monolingual

Size:

1K<n<10K

Spanish Gender Neutralization

Spanish is a beautiful language with many ways of referring to people that neutralize gender using resources already available in the language. For example, one can say "Todas las personas asistentes" instead of "Todos los asistentes", which is a more inclusive way of talking about people. This dataset collects a set of manually annotated examples of gendered-to-neutral Spanish transformations.

The intended use of this dataset is to train a Spanish language model that translates gendered sentences into neutral ones, producing more inclusive text.
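As a rough illustration of that intended use, the sketch below turns gendered/neutral pairs into input/target pairs for a text-to-text model. The field names `gendered` and `neutral`, the task prefix, and the helper `to_seq2seq_example` are assumptions made for this example only; check the actual column names of the dataset before reusing it.

```python
from typing import Dict, List

# Hypothetical field names; the real dataset's column names may differ.
SOURCE_FIELD = "gendered"
TARGET_FIELD = "neutral"

# Hypothetical task prefix for a text-to-text model; not taken from the dataset.
TASK_PREFIX = "neutralizar: "


def to_seq2seq_example(row: Dict[str, str]) -> Dict[str, str]:
    """Turn one gendered/neutral pair into an input/target pair."""
    return {
        "input_text": TASK_PREFIX + row[SOURCE_FIELD].strip(),
        "target_text": row[TARGET_FIELD].strip(),
    }


# One illustrative pair, taken from the description above.
rows: List[Dict[str, str]] = [
    {
        "gendered": "Todos los asistentes",
        "neutral": "Todas las personas asistentes",
    },
]

examples = [to_seq2seq_example(r) for r in rows]
print(examples[0]["input_text"])
print(examples[0]["target_text"])
```

The same function could be mapped over the full dataset once it is loaded, provided the assumed field names are adjusted to match.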

Compiled sources

One of the major challenges was obtaining a dataset well suited to gender inclusion. For this reason, the team chose to dedicate a considerable amount of time to building it from scratch. You can find the results here.

The data used for model training was manually created from a compilation of sources, drawn from a series of guidelines and manuals on the use of non-sexist language issued by the Spanish Ministry of Health, Social Services and Equality, as stipulated in this linked document.

NOTE: Apart from the manually annotated samples, this dataset has been further enlarged by applying data augmentation so that a minimum number of training examples is generated.

Team Members

Enjoy, and feel free to contribute to this dataset!