Datasets:

datacommons_factcheck

Sub-tasks:

fact-checking

Languages:

en

Multilinguality:

monolingual

Language creators:

found

Annotations creators:

expert-generated

Source datasets:

original

Dataset Card for DataCommons Fact Checked claims

Dataset Summary

A dataset of claims fact-checked by news media, maintained by datacommons.org. Each entry contains the claim, its author, and the fact-checker's judgment, as well as the URL of the full explanation by the original fact-checker.

The fact-checking is done by FactCheck.org, PolitiFact, and The Washington Post.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The data is in English (en).

Dataset Structure

Data Instances

An example of a fact-checking instance looks as follows:

{'claim_author_name': 'Facebook posts',
 'claim_date': '2019-01-01',
 'claim_text': 'Quotes Michelle Obama as saying, "White folks are what’s wrong with America."',
 'review_date': '2019-01-03',
 'review_rating': 'Pants on Fire',
 'review_url': 'https://www.politifact.com/facebook-fact-checks/statements/2019/jan/03/facebook-posts/did-michelle-obama-once-say-white-folks-are-whats-/',
 'reviewer_name': 'PolitiFact'}

Data Fields

A data instance has the following fields:

  • review_date: the day the fact-checking report was posted. Missing values are replaced with empty strings.
  • review_url: the URL of the full fact-checking report.
  • reviewer_name: the name of the fact-checking service.
  • claim_text: the full text of the claim being reviewed.
  • claim_author_name: the author of the claim being reviewed. Missing values are replaced with empty strings.
  • claim_date: the date of the claim. Missing values are replaced with empty strings.
  • review_rating: the judgment of the fact-checker (under alternateName; names vary by fact-checker).
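Because missing dates and authors are stored as empty strings rather than nulls, it can help to normalize a record before analysis. The following is a minimal sketch, not part of the dataset tooling: the helper names `parse_optional_date` and `normalize` are hypothetical, the sample record is the instance shown above with `claim_date` blanked for illustration, and the `YYYY-MM-DD` date format is assumed from that single example.

```python
from datetime import date
from typing import Optional

# Sample record taken from the Data Instances section; claim_date has been
# blanked here (illustration only) to show how a missing value appears.
record = {
    "claim_author_name": "Facebook posts",
    "claim_date": "",
    "claim_text": 'Quotes Michelle Obama as saying, "White folks are what’s wrong with America."',
    "review_date": "2019-01-03",
    "review_rating": "Pants on Fire",
    "review_url": "https://www.politifact.com/facebook-fact-checks/statements/2019/jan/03/facebook-posts/did-michelle-obama-once-say-white-folks-are-whats-/",
    "reviewer_name": "PolitiFact",
}


def parse_optional_date(value: str) -> Optional[date]:
    """Return a date for a 'YYYY-MM-DD' string, or None for an empty string.

    Assumption: dates follow the ISO format seen in the example instance.
    """
    return date.fromisoformat(value) if value else None


def normalize(rec: dict) -> dict:
    """Copy a record, mapping empty-string placeholders to None."""
    out = dict(rec)
    for key in ("claim_date", "review_date"):
        out[key] = parse_optional_date(rec[key])
    # An empty author string means the author is unknown.
    out["claim_author_name"] = rec["claim_author_name"] or None
    return out


clean = normalize(record)
print(clean["claim_date"], clean["review_date"])
```

The other fields are copied unchanged; in particular, `review_rating` is left as-is because rating names vary by fact-checker.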

Data Splits

No splits are provided. There are a total of 5632 fact-checked claims.
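Since the dataset ships without splits, users who need a held-out set have to create one themselves. One possible approach, sketched below, assigns each item to a split by hashing its `review_url`, so the assignment is reproducible across runs and machines. The function name `assign_split` and the 20% test fraction are illustrative assumptions, as is the choice of `review_url` as the key; claims that share a review URL will land in the same split.

```python
import hashlib


def assign_split(review_url: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a review URL to 'train' or 'test'.

    The first 8 bytes of the SHA-256 digest are read as an integer in
    [0, 2**64) and compared against test_fraction * 2**64.
    """
    digest = hashlib.sha256(review_url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < test_fraction * 2**64 else "train"


url = "https://www.politifact.com/facebook-fact-checks/statements/2019/jan/03/facebook-posts/did-michelle-obama-once-say-white-folks-are-whats-/"
print(assign_split(url))
```

Hash-based assignment avoids storing a random seed or an index list, at the cost of split sizes that only approximate the requested fraction.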

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

The fact-checking is done by FactCheck.org, PolitiFact, The Washington Post, and The Weekly Standard.

  • FactCheck.org describes itself as "a nonpartisan, nonprofit 'consumer advocate' for voters that aims to reduce the level of deception and confusion in U.S. politics." It was founded by journalists Kathleen Hall Jamieson and Brooks Jackson and is currently directed by Eugene Kiely.
  • PolitiFact describes its ethics as "seeking to present the true facts, unaffected by agenda or biases, [with] journalists setting their own opinions aside." It was started in August 2007 by Times Washington Bureau Chief Bill Adair. The organization was acquired in February 2018 by the Poynter Institute, a non-profit journalism education and news media research center that also owns the Tampa Bay Times.
  • The Washington Post is a newspaper considered to be near the center of the American political spectrum. In 2013 Amazon.com founder Jeff Bezos bought the newspaper and affiliated publications.

The original data source also contains 132 items reviewed by The Weekly Standard, which was a neo-conservative American newspaper. It is the most politically loaded source of the group: it was originally a vocal critic of the activity of fact-checking and has historically taken stances close to the American right. It also had to admit responsibility for baseless accusations against a well-known author in a public libel case. The fact-checked items from this source can be found in the weekly_standard configuration but should be used only with full understanding of this context.

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

See the section above describing the fact-checking organizations.

[More Information Needed]

Other Known Limitations

Dataset provided for research purposes only. Please check dataset license for additional information.

Additional Information

Dataset Curators

This fact-checking dataset is maintained by datacommons.org, a Google initiative.

Licensing Information

All fact-checked items are released under a CC-BY-NC-4.0 License.

Citation Information

Data Commons 2020, Fact Checks, electronic dataset, Data Commons, viewed 16 Dec 2020, https://datacommons.org.

Contributions

Thanks to @yjernite for adding this dataset.