数据集:

zeroshot/twitter-financial-news-topic

语言:

en

计算机处理:

monolingual

大小:

10K<n<100K

语言创建人:

other

批注创建人:

other

源数据集:

original

许可:

mit
中文

Read this BLOG to see how I fine-tuned a sparse transformer on this dataset.

Dataset Description

The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their topic.

  • The dataset holds 21,107 documents annotated with 20 labels:
  • topics = {
        "LABEL_0": "Analyst Update",
        "LABEL_1": "Fed | Central Banks",
        "LABEL_2": "Company | Product News",
        "LABEL_3": "Treasuries | Corporate Debt",
        "LABEL_4": "Dividend",
        "LABEL_5": "Earnings",
        "LABEL_6": "Energy | Oil",
        "LABEL_7": "Financials",
        "LABEL_8": "Currencies",
        "LABEL_9": "General News | Opinion",
        "LABEL_10": "Gold | Metals | Materials",
        "LABEL_11": "IPO",
        "LABEL_12": "Legal | Regulation",
        "LABEL_13": "M&A | Investments",
        "LABEL_14": "Macro",
        "LABEL_15": "Markets",
        "LABEL_16": "Politics",
        "LABEL_17": "Personnel Change",
        "LABEL_18": "Stock Commentary",
        "LABEL_19": "Stock Movement",
    }
    

    The data was collected using the Twitter API. The current dataset supports the multi-class classification task.

    Task: Topic Classification

    Data Splits

    There are 2 splits: train and validation. Below are the statistics:

    Dataset Split Number of Instances in Split
    Train 16,990
    Validation 4,118

    Licensing Information

    The Twitter Financial Dataset (topic) version 1.0.0 is released under the MIT License.