Dataset:
hackathon-pln-es/neutral-es
Spanish is a beautiful language that offers many ways of referring to people while neutralizing gender, using resources already present in the language. For example, one can say "Todas las personas asistentes" instead of "Todos los asistentes", which is a more inclusive way of talking about people. This dataset collects a set of manually annotated examples of gendered-to-neutral Spanish transformations.
The intended use of this dataset is to train a Spanish language model that translates gendered sentences into neutral ones, producing more inclusive text.
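As a rough illustration of the intended use described above, the following sketch turns one gendered/neutral pair into an input/target record of the kind a sequence-to-sequence model could be trained on. The column names `gendered` and `neutral`, the split name `train`, and the task prefix `neutralizar: ` are assumptions made for this example, not details confirmed by the dataset card; check the dataset viewer before relying on them.

```python
from typing import Dict, List

# Assumed column names; verify them against the actual dataset schema.
SOURCE_COLUMN = "gendered"
TARGET_COLUMN = "neutral"


def to_seq2seq_example(row: Dict[str, str], prefix: str = "neutralizar: ") -> Dict[str, str]:
    """Build an input/target pair from one row of the dataset.

    The prefix is a hypothetical task marker, not something the dataset defines.
    """
    return {
        "input_text": prefix + row[SOURCE_COLUMN].strip(),
        "target_text": row[TARGET_COLUMN].strip(),
    }


def load_pairs(split: str = "train") -> List[Dict[str, str]]:
    """Download the dataset from the Hugging Face Hub and convert every row.

    Requires the `datasets` package and network access; the split name
    is an assumption.
    """
    from datasets import load_dataset

    dataset = load_dataset("hackathon-pln-es/neutral-es", split=split)
    return [to_seq2seq_example(row) for row in dataset]


# Offline demonstration using the example pair quoted in the description.
sample_row = {
    "gendered": "Todos los asistentes",
    "neutral": "Todas las personas asistentes",
}
example = to_seq2seq_example(sample_row)
```

Calling `load_pairs()` would fetch the real data; the offline `sample_row` is only there so that the formatting step can be checked without downloading anything.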
One of the major challenges was obtaining a valuable dataset suited to the purpose of gender inclusion. When building the dataset, the team therefore chose to dedicate a considerable amount of time to creating it from scratch. You can find the results here.
The data used for model training was created manually from a compilation of sources, drawn from a series of guidelines and manuals issued by the Spanish Ministry of Health, Social Services and Equality on the use of non-sexist language, as stipulated in this linked document.
NOTE: Apart from the manually annotated samples, this dataset has been further enlarged through data augmentation so that a minimum number of training examples is available.
Guía para un discurso igualitario en la Universidad de Alicante
Buenas prácticas para el tratamiento del lenguaje en igualdad
Guía del lenguaje no sexista de la Universidad de Castilla-La Mancha
Guía para un uso igualitario y no sexista del lenguaje y de la imagen en la Universidad de Jaén
Guía para el uso no sexista de la lengua castellana y de imágenes en la UPV/EHU