Dataset:

times_of_india_news_headlines

Dataset Card for Times of India News Headlines

Dataset Summary

This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to mid-2020, recorded in real time by journalists in India. It contains approximately 3.3 million events published by Times of India. As a news agency, the Times Group reaches a very wide audience across Asia and dwarfs every other agency in the number of English articles published per day. Because of the heavy daily volume over multiple years, this data offers deep insight into Indian society, its priorities, events, issues and talking points, and how they have unfolded over time. The dataset can be cut into smaller pieces for a more focused analysis, based on one or more facets.
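As a rough illustration of cutting the data down by facet, the sketch below filters records by year and by category prefix. It assumes the records are already loaded as a list of dictionaries with the `publish_date` (yyyyMMdd) and `headline_category` fields described in this card; the function name `select_headlines` and the second sample record are hypothetical and exist only for the example.

```python
def select_headlines(records, year=None, category_prefix=None):
    """Keep records matching the given year and/or category prefix."""
    selected = []
    for record in records:
        # publish_date is a yyyyMMdd string, so the year is its first 4 characters.
        if year is not None and not record["publish_date"].startswith(str(year)):
            continue
        if category_prefix is not None:
            category = record["headline_category"]
            # Match the prefix itself or any dot-delimited child of it.
            if category != category_prefix and not category.startswith(category_prefix + "."):
                continue
        selected.append(record)
    return selected


sample = [
    # Record taken from the data instance shown in this card.
    {"publish_date": "20010530", "headline_category": "city.kolkata",
     "headline_text": "Malda fake notes"},
    # Hypothetical record, invented purely for illustration.
    {"publish_date": "20190101", "headline_category": "hypothetical.category",
     "headline_text": "Hypothetical headline"},
]

city_2001 = select_headlines(sample, year=2001, category_prefix="city")
print(len(city_2001))
```

Comparing string prefixes avoids parsing every date, which matters little for a sample but keeps the filter cheap across millions of rows.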

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text in the dataset is in English.

Dataset Structure

Data Instances

 {
    'publish_date': '20010530',
    'headline_category': 'city.kolkata',
    'headline_text': 'Malda fake notes'
 }

Data Fields

  • publish_date : Date of publication in yyyyMMdd format
  • headline_category : Category of the event, as ASCII, dot-delimited values
  • headline_text : Headline of the article in English (2020-07-10)
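The field formats above can be turned into typed values with the Python standard library. This is a minimal sketch, assuming each record is a plain dictionary with the three fields listed; the helper name `parse_record` and the output key `category_parts` are hypothetical, not part of the dataset.

```python
from datetime import datetime


def parse_record(record):
    """Return a new dict with a parsed date and the category split on dots."""
    # yyyyMMdd corresponds to the strptime pattern %Y%m%d.
    published = datetime.strptime(record["publish_date"], "%Y%m%d").date()
    # Dot-delimited category values become a list, e.g. ['city', 'kolkata'].
    category_parts = record["headline_category"].split(".")
    return {
        "publish_date": published,
        "category_parts": category_parts,
        "headline_text": record["headline_text"],
    }


example = {
    "publish_date": "20010530",
    "headline_category": "city.kolkata",
    "headline_text": "Malda fake notes",
}
parsed = parse_record(example)
print(parsed["publish_date"].isoformat(), parsed["category_parts"])
```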

Data Splits

This dataset has no splits.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

The dataset was created by Rohit Kulkarni.

Licensing Information

The data is released under the CC0: Public Domain license.

Citation Information

@data{DVN/DPQMQH_2020,
author = {Kulkarni, Rohit},
publisher = {Harvard Dataverse},
title = {{Times of India News Headlines}},
year = {2020},
version = {V1},
doi = {10.7910/DVN/DPQMQH},
url = {https://doi.org/10.7910/DVN/DPQMQH}
}

Contributions

Thanks to @tanmoyio for adding this dataset.