Corpus Summary

This corpus contains 192,050 entries, each a descriptive sentence about a face from the CelebA dataset. The corpus was prepared by translating into Spanish the captions of the CelebA dataset obtained with the algorithm used in Text2FaceGAN. All sentences were then combined into a single, larger corpus. The data were further preprocessed by removing stopwords, separator symbols, and other complementary elements that are not useful for training. Finally, the Sent2vec library was trained on the corpus to obtain a sentence encoder for Spanish, specifically for captions from the CelebA dataset.
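The cleaning step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `preprocess` and the short `STOPWORDS` set are assumptions made for the example, since the corpus description does not state which stopword list or symbol rules were used.

```python
import re

# Assumption: a tiny illustrative subset of Spanish stopwords.
# The list actually used to build the corpus is not given in this document.
STOPWORDS = {"el", "la", "los", "las", "un", "una", "y", "de", "con", "es", "su", "en"}


def preprocess(sentence: str) -> str:
    """Lowercase a caption, strip separator symbols, and drop stopwords."""
    text = sentence.lower()
    # Replace every character that is neither a word character nor whitespace
    # with a space. In Python 3, \w is Unicode-aware, so accented letters stay.
    text = re.sub(r"[^\w\s]", " ", text)
    # Splitting on whitespace also collapses repeated spaces.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("La mujer tiene el pelo negro, y una sonrisa."))
```

In practice a fuller stopword list (for example, one shipped with an NLP toolkit) would likely replace the hand-written set.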

Training Sent2vec on the present corpus produced the new model Sent2vec-CelebA-Sp.

Corpus Fields

Each corpus entry is composed of:

  • A descriptive sentence of a face from the CelebA dataset, with the corresponding preprocessing applied.

You can download the file with a .txt or .csv extension as appropriate.
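Once downloaded, the file could be read along these lines. This is a sketch under stated assumptions: the helper `load_corpus` is hypothetical, the .txt file is assumed to hold one sentence per line, and the .csv file is assumed to hold the sentence in its first column with no header row. None of these layout details are confirmed by this document.

```python
import csv
from pathlib import Path


def load_corpus(path):
    """Return the sentences stored in a .txt or .csv corpus file as a list."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            # Assumption: sentence in the first column, no header row.
            return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    with path.open(encoding="utf-8") as f:
        # Assumption: one sentence per line; blank lines are skipped.
        return [line.strip() for line in f if line.strip()]
```

If the actual .csv has a header row or extra columns, the column index and a header skip would need adjusting.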

Citation information

Citing: If you use the CelebA_Sent2vec_Sp corpus in your work, please cite the ????:

License

This corpus is available under the Apache License 2.0.

Authors

Universidad Nacional de Ingeniería, Ontology Engineering Group, Universidad Politécnica de Madrid.

Contributors

See the full list of contributors here.