Dataset:

qed_amara

Tasks:

translation

Multilinguality:

multilingual

Size:

100K<n<1M

Language creators:

found

Annotation creators:

found

Source datasets:

original

Dataset Card for QedAmara

Dataset Summary

To load a language pair that isn't part of the config, specify the two language codes as a pair. The valid pairs are listed on the homepage given in the Dataset Description: http://opus.nlpl.eu/QED.php. For example:

from datasets import load_dataset
dataset = load_dataset("qed_amara", lang1="cs", lang2="nb")
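The order of the two codes may matter when requesting a pair. The sketch below assumes that valid pairs list their codes in alphabetical order, as the cs/nb example does; that ordering is an assumption rather than something this card states, so check the QED page for the pair you need. The helper name pair_kwargs is hypothetical.

```python
def pair_kwargs(code_a: str, code_b: str) -> dict:
    """Build the lang1/lang2 keyword arguments for a language pair.

    Assumption: valid pairs list the two codes in alphabetical order.
    """
    a, b = code_a.strip(), code_b.strip()
    if a == b:
        raise ValueError("a language pair needs two different codes")
    # Sort so the pair is the same regardless of the order it was given in.
    lang1, lang2 = sorted((a, b))
    return {"lang1": lang1, "lang2": lang2}


print(pair_kwargs("nb", "cs"))
```

The result can then be unpacked into the call shown above, for instance load_dataset("qed_amara", **pair_kwargs("nb", "cs")).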

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The languages in the dataset are:

  • aa
  • ab
  • ae
  • aeb
  • af
  • aka: ak
  • amh: am
  • an
  • ar
  • arq
  • arz
  • as
  • ase
  • ast
  • av
  • ay
  • az
  • ba
  • bam: bm
  • be
  • ber
  • bg
  • bh
  • bi
  • bn
  • bnt
  • bo
  • br
  • bs
  • bug
  • ca
  • ce
  • ceb
  • ch
  • cho
  • cku
  • cnh
  • co
  • cr
  • cs
  • cu
  • cv
  • cy
  • da
  • de
  • dv
  • dz
  • ee
  • efi
  • el
  • en
  • eo
  • es
  • et
  • eu
  • fa
  • ff
  • fi
  • fil
  • fj
  • fo
  • fr
  • ful: ff
  • ga
  • gd
  • gl
  • gn
  • gu
  • hai
  • hau: ha
  • haw
  • haz
  • hb: ?
  • hch
  • he
  • hi
  • ho
  • hr
  • ht
  • hu
  • hup
  • hus
  • hy
  • hz
  • ia
  • ibo: ig
  • id
  • ie
  • ik
  • inh
  • io
  • iro
  • is
  • it
  • iu
  • ja
  • jv
  • ka
  • kar
  • kau: kr
  • kik: ki
  • kin: rw
  • kj
  • kk
  • kl
  • km
  • kn
  • ko
  • ksh
  • ku
  • kv
  • kw
  • ky
  • la
  • lb
  • lg
  • li
  • lin: ln
  • lkt
  • lld
  • lo
  • lt
  • ltg
  • lu
  • luo
  • luy
  • lv
  • mad
  • mfe
  • mi
  • mk
  • ml
  • mlg: mg
  • mn
  • mni
  • mo: Moldavian (deprecated tag; preferred value: Romanian; Moldavian; Moldovan (ro))
  • moh
  • mos
  • mr
  • ms
  • mt
  • mus
  • my
  • nb
  • nci
  • nd
  • ne
  • nl
  • nn
  • nso
  • nv
  • nya: ny
  • oc
  • or
  • orm: om
  • pam
  • pan: pa
  • pap
  • pi
  • pl
  • pnb
  • prs
  • ps
  • pt
  • que: qu
  • rm
  • ro
  • ru
  • run: rn
  • rup
  • ry: ?
  • sa
  • sc
  • scn
  • sco
  • sd
  • sg
  • sgn
  • sh
  • si
  • sk
  • sl
  • sm
  • sna: sn
  • som: so
  • sot: st
  • sq
  • sr
  • srp: sr
  • sv
  • swa: sw
  • szl
  • ta
  • te
  • tet
  • tg
  • th
  • tir: ti
  • tk
  • tl
  • tlh
  • to
  • tr
  • ts
  • tt
  • tw
  • ug
  • uk
  • umb
  • ur
  • uz
  • ve
  • vi
  • vls
  • vo
  • wa
  • wol: wo
  • xh
  • yaq
  • yi
  • yor: yo
  • za
  • zam
  • zh
  • zul: zu
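Several entries in the list pair a three-letter code with a two-letter one (for example swa: sw). A minimal sketch of a lookup built from those entries follows. It assumes the right-hand value is the two-letter equivalent of the left-hand code; the card does not say which form a given language pair expects, and the names THREE_TO_TWO and two_letter are hypothetical.

```python
# Three-letter codes and the two-letter values given next to them in the list above.
THREE_TO_TWO = {
    "aka": "ak", "amh": "am", "bam": "bm", "ful": "ff", "hau": "ha",
    "ibo": "ig", "kau": "kr", "kik": "ki", "kin": "rw", "lin": "ln",
    "mlg": "mg", "nya": "ny", "orm": "om", "pan": "pa", "que": "qu",
    "run": "rn", "sna": "sn", "som": "so", "sot": "st", "srp": "sr",
    "swa": "sw", "tir": "ti", "wol": "wo", "yor": "yo", "zul": "zu",
}


def two_letter(code: str) -> str:
    """Return the two-letter value if the list gives one, else the code unchanged."""
    return THREE_TO_TWO.get(code, code)


print(two_letter("swa"))  # sw
print(two_letter("cs"))   # cs
```

Codes without a listed equivalent, including the unresolved hb and ry entries, pass through unchanged.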

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

[More Information Needed]

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

[More Information Needed]

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

[More Information Needed]

Contributions

Thanks to @abhishekkrthakur for adding this dataset.