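Dataset: swda
Task: text classification
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotation creators: found
Source datasets: extended|other-Switchboard-1 Telephone Speech Corpus, Release 2
License: cc-by-nc-sa-3.0
The Switchboard Dialog Act Corpus (SwDA) extends the Switchboard-1 Telephone Speech Corpus, Release 2, with turn/utterance-level dialog act tags. The tags summarize syntactic, semantic, and pragmatic information about the associated turn. The SwDA project was undertaken at UC Boulder in the late 1990s. The SwDA is not inherently linked to the Penn Treebank 3 parses of Switchboard, and aligning the two resources is not straightforward. In addition, the SwDA is not distributed with the Switchboard tables of metadata about the conversations and their participants.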
Model | Accuracy (%) | Paper / Source | Code |
---|---|---|---|
H-Seq2seq (Colombo et al., 2020) | 85.0 | 1231321 | |
SGNN (Ravi et al., 2018) | 83.1 | 1232321 | |
CASA (Raheja et al., 2019) | 82.9 | 1233321 | |
DAH-CRF (Li et al., 2019) | 82.3 | 1234321 | |
ALDMN (Wan et al., 2018) | 81.5 | 1235321 | |
CRF-ASN (Chen et al., 2018) | 81.3 | 1236321 | |
Pretrained H-Transformer (Chapuis et al., 2020) | 79.3 | [Hierarchical Pre-training for Sequence Labelling in Spoken Dialog](1237321) | |
Bi-LSTM-CRF (Kumar et al., 2017) | 79.2 | 1238321 | 1239321 |
RNN with 3 utterances in context (Bothe et al., 2018) | 77.34 | 12310321 | |
The supported language is English.
Utterances are tagged with SWBD-DAMSL dialog acts (DAs).
An example instance from the dataset:
{"act_tag": 115, "caller": "A", "conversation_no": 4325, "damsl_act_tag": 26, "from_caller": 1632, "from_caller_birth_year": 1962, "from_caller_dialect_area": "WESTERN", "from_caller_education": 2, "from_caller_sex": "FEMALE", "length": 5, "pos": "Okay/UH ./.", "prompt": "FIND OUT WHAT CRITERIA THE OTHER CALLER WOULD USE IN SELECTING CHILD CARE SERVICES FOR A PRESCHOOLER. IS IT EASY OR DIFFICULT TO FIND SUCH CARE?", "ptb_basename": "4/sw4325", "ptb_treenumbers": "1", "subutterance_index": 1, "swda_filename": "sw00utt/sw_0001_4325.utt", "talk_day": "03/23/1992", "text": "Okay. /", "to_caller": 1519, "to_caller_birth_year": 1971, "to_caller_dialect_area": "SOUTH MIDLAND", "to_caller_education": 1, "to_caller_sex": "FEMALE", "topic_description": "CHILD CARE", "transcript_index": 0, "trees": "(INTJ (UH Okay) (. .) (-DFL- E_S))", "utterance_index": 1}
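In the instance above, the `pos` field stores each token joined to its part-of-speech tag with a slash, one item per whitespace-separated chunk. A minimal sketch of splitting it into (token, tag) pairs, assuming that every item is whitespace-separated and that the tag always follows the last `/` in the item; the helper name `parse_pos` is hypothetical and not part of any SwDA loader.

```python
import json

# A subset of the fields from the example instance above.
record = json.loads(
    '{"conversation_no": 4325, "damsl_act_tag": 26, '
    '"pos": "Okay/UH ./.", "text": "Okay. /"}'
)


def parse_pos(pos: str) -> list[tuple[str, str]]:
    """Split a SwDA-style POS string into (token, tag) pairs.

    Hypothetical helper. Assumes items are whitespace-separated and the
    tag follows the last "/" in each item, so a token that itself
    contains "/" is kept intact.
    """
    pairs = []
    for item in pos.split():
        token, _, tag = item.rpartition("/")
        pairs.append((token, tag))
    return pairs


print(parse_pos(record["pos"]))  # [('Okay', 'UH'), ('.', '.')]
```

Splitting on the last slash rather than the first is the design choice that keeps the punctuation item `./.` parsed as the token `.` with tag `.`.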
# | name | act_tag | example | train_count | full_count |
---|---|---|---|---|---|
1 | Statement-non-opinion | sd | Me, I'm in the legal department. | 72824 | 75145 |
2 | Acknowledge (Backchannel) | b | Uh-huh. | 37096 | 38298 |
3 | Statement-opinion | sv | I think it's great | 25197 | 26428 |
4 | Agree/Accept | aa | That's exactly it. | 10820 | 11133 |
5 | Abandoned or Turn-Exit | % | So, - | 10569 | 15550 |
6 | Appreciation | ba | I can imagine. | 4633 | 4765 |
7 | Yes-No-Question | qy | Do you have to have any special training? | 4624 | 4727 |
8 | Non-verbal | x | [Laughter], [Throat_clearing] | 3548 | 3630 |
9 | Yes answers | ny | Yes. | 2934 | 3034 |
10 | Conventional-closing | fc | Well, it's been nice talking to you. | 2486 | 2582 |
11 | Uninterpretable | % | But, uh, yeah | 2158 | 15550 |
12 | Wh-Question | qw | Well, how old are you? | 1911 | 1979 |
13 | No answers | nn | No. | 1340 | 1377 |
14 | Response Acknowledgement | bk | Oh, okay. | 1277 | 1306 |
15 | Hedge | h | I don't know if I'm making any sense or not. | 1182 | 1226 |
16 | Declarative Yes-No-Question | qy^d | So you can afford to get a house? | 1174 | 1219 |
17 | Other | fo_o_fw_by_bc | Well give me a break, you know. | 1074 | 883 |
18 | Backchannel in question form | bh | Is that right? | 1019 | 1053 |
19 | Quotation | ^q | You can't be pregnant and have cats | 934 | 983 |
20 | Summarize/reformulate | bf | Oh, you mean you switched schools for the kids. | 919 | 952 |
21 | Affirmative non-yes answers | na | It is. | 836 | 847 |
22 | Action-directive | ad | Why don't you go first | 719 | 746 |
23 | Collaborative Completion | ^2 | Who aren't contributing. | 699 | 723 |
24 | Repeat-phrase | b^m | Oh, fajitas | 660 | 688 |
25 | Open-Question | qo | How about you? | 632 | 656 |
26 | Rhetorical-Questions | qh | Who would steal a newspaper? | 557 | 575 |
27 | Hold before answer/agreement | ^h | I'm drawing a blank. | 540 | 556 |
28 | Reject | ar | Well, no | 338 | 346 |
29 | Negative non-no answers | ng | Uh, not a whole lot. | 292 | 302 |
30 | Signal-non-understanding | br | Excuse me? | 288 | 298 |
31 | Other answers | no | I don't know | 279 | 286 |
32 | Conventional-opening | fp | How are you? | 220 | 225 |
33 | Or-Clause | qrr | or is it more of a company? | 207 | 209 |
34 | Dispreferred answers | arp_nd | Well, not so much that. | 205 | 207 |
35 | 3rd-party-talk | t3 | My goodness, Diane, get down from there. | 115 | 117 |
36 | Offers, Options, Commits | oo_co_cc | I'll have to check that out | 109 | 110 |
37 | Self-talk | t1 | What's the word I'm looking for | 102 | 103 |
38 | Downplayer | bd | That's all right. | 100 | 103 |
39 | Maybe/Accept-part | aap_am | Something like that | 98 | 105 |
40 | Tag-Question | ^g | Right? | 93 | 92 |
41 | Declarative Wh-Question | qw^d | You are what kind of buff? | 80 | 80 |
42 | Apology | fa | I'm sorry. | 76 | 79 |
43 | Thanking | ft | Hey thanks a lot | 67 | 78 |
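The tag distribution in the table is heavily skewed toward `sd` (Statement-non-opinion). As a rough point of reference for the leaderboard accuracies, the sketch below computes the accuracy of always predicting the most frequent tag, assuming `train_count` is the number of training utterances carrying each tag. The total is summed from the table here, not quoted from the source, and the names `TRAIN_COUNTS` and `majority_baseline` are hypothetical.

```python
# train_count column of the tag table above, in row order (rows 1-43).
TRAIN_COUNTS = [
    72824, 37096, 25197, 10820, 10569, 4633, 4624, 3548, 2934, 2486,
    2158, 1911, 1340, 1277, 1182, 1174, 1074, 1019, 934, 919,
    836, 719, 699, 660, 632, 557, 540, 338, 292, 288,
    279, 220, 207, 205, 115, 109, 102, 100, 98, 93,
    80, 76, 67,
]


def majority_baseline(counts: list[int]) -> float:
    """Accuracy obtained by always predicting the most frequent tag."""
    return max(counts) / sum(counts)


total = sum(TRAIN_COUNTS)
print(len(TRAIN_COUNTS), total)                  # 43 195031
print(f"{majority_baseline(TRAIN_COUNTS):.3f}")  # 0.373
```

Note that two rows share the `act_tag` value `%`, so any mapping keyed on `act_tag` alone would merge those two classes.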
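I used information from the Probabilistic-RNN-DA-Classifier repository: the training and test splits are the same as those used by Stolcke et al. (2000), and the development set is a subset of the training set, used to speed up development and testing, as in the paper Probabilistic Word Association for Dialogue Act Classification with Recurrent Neural Networks.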
Split | # Transcripts | # Utterances |
---|---|---|
Training | 1115 | 192,768 |
Validation | 21 | 3,196 |
Test | 19 | 4,088 |
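[More Information Needed]
The SwDA is not inherently linked to the Penn Treebank 3 parses of Switchboard, and aligning the two resources is not straightforward (see Calhoun et al. 2010, §2.4). In addition, the SwDA is not distributed with the Switchboard tables of metadata about the conversations and their participants.
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Christopher Potts, Department of Linguistics, Stanford University.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.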
@techreport{Jurafsky-etal:1997,
  Address = {Boulder, CO},
  Author = {Jurafsky, Daniel and Shriberg, Elizabeth and Biasca, Debra},
  Institution = {University of Colorado, Boulder Institute of Cognitive Science},
  Number = {97-02},
  Title = {Switchboard {SWBD}-{DAMSL} Shallow-Discourse-Function Annotation Coders Manual, Draft 13},
  Year = {1997}}

@article{Shriberg-etal:1998,
  Author = {Shriberg, Elizabeth and Bates, Rebecca and Taylor, Paul and Stolcke, Andreas and Jurafsky, Daniel and Ries, Klaus and Coccaro, Noah and Martin, Rachel and Meteer, Marie and Van Ess-Dykema, Carol},
  Journal = {Language and Speech},
  Number = {3--4},
  Pages = {439--487},
  Title = {Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?},
  Volume = {41},
  Year = {1998}}

@article{Stolcke-etal:2000,
  Author = {Stolcke, Andreas and Ries, Klaus and Coccaro, Noah and Shriberg, Elizabeth and Bates, Rebecca and Jurafsky, Daniel and Taylor, Paul and Martin, Rachel and Meteer, Marie and Van Ess-Dykema, Carol},
  Journal = {Computational Linguistics},
  Number = {3},
  Pages = {339--371},
  Title = {Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech},
  Volume = {26},
  Year = {2000}}
Thanks to @gmihaila for adding this dataset.