Dataset:
zeroshot/twitter-financial-news-topic
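Read the BLOG here to learn how I fine-tuned a sparse transformer on this dataset.
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. It is used to classify finance-related tweets by topic.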
topics = {
    "LABEL_0": "Analyst Update",
    "LABEL_1": "Fed | Central Banks",
    "LABEL_2": "Company | Product News",
    "LABEL_3": "Treasuries | Corporate Debt",
    "LABEL_4": "Dividend",
    "LABEL_5": "Earnings",
    "LABEL_6": "Energy | Oil",
    "LABEL_7": "Financials",
    "LABEL_8": "Currencies",
    "LABEL_9": "General News | Opinion",
    "LABEL_10": "Gold | Metals | Materials",
    "LABEL_11": "IPO",
    "LABEL_12": "Legal | Regulation",
    "LABEL_13": "M&A | Investments",
    "LABEL_14": "Macro",
    "LABEL_15": "Markets",
    "LABEL_16": "Politics",
    "LABEL_17": "Personnel Change",
    "LABEL_18": "Stock Commentary",
    "LABEL_19": "Stock Movement",
}
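The label mapping above can be used to turn a classifier's raw output into a readable topic name. The following is a minimal sketch, assuming predictions arrive either as integer class ids or as strings of the form `LABEL_<n>`; the helper name `topic_name` is hypothetical and not part of the dataset.

```python
# Topic names in label-id order, copied from the `topics` mapping above.
TOPIC_NAMES = [
    "Analyst Update",
    "Fed | Central Banks",
    "Company | Product News",
    "Treasuries | Corporate Debt",
    "Dividend",
    "Earnings",
    "Energy | Oil",
    "Financials",
    "Currencies",
    "General News | Opinion",
    "Gold | Metals | Materials",
    "IPO",
    "Legal | Regulation",
    "M&A | Investments",
    "Macro",
    "Markets",
    "Politics",
    "Personnel Change",
    "Stock Commentary",
    "Stock Movement",
]

# Rebuild the "LABEL_<n>" -> topic name mapping from the ordered list.
topics = {f"LABEL_{i}": name for i, name in enumerate(TOPIC_NAMES)}


def topic_name(label):
    """Hypothetical helper: accept an integer class id or a "LABEL_<n>" string."""
    key = f"LABEL_{label}" if isinstance(label, int) else label
    return topics[key]  # raises KeyError for labels outside the 20 classes


print(topic_name(5))
print(topic_name("LABEL_19"))
```

Accepting both forms is a convenience, assuming that datasets typically store labels as integers while some inference tools report them as `LABEL_<n>` strings.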
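The data was collected using the Twitter API. The current dataset supports the multi-class classification task.
There are 2 splits: train and validation. The statistics are given below: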
Dataset Split | Number of Instances in Split |
---|---|
Train | 16,990 |
Validation | 4,118 |
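
As a quick sanity check on the split sizes in the table above, the sketch below computes the total number of instances and each split's share. The variable names are illustrative only.

```python
# Split sizes taken from the table above.
split_sizes = {"train": 16_990, "validation": 4_118}

# Total instances across both splits, and each split's fraction of that total.
total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

for name, share in shares.items():
    print(f"{name}: {split_sizes[name]:,} instances ({share:.1%} of {total:,})")
```
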
The Twitter Financial Dataset (topic) version 1.0.0 is released under the MIT License.