English

Adult

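The Adult dataset from the UCI ML repository, derived from census data. The dataset contains personal characteristics of each individual and their income threshold.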

Configurations and tasks

| Configuration | Task | Description |
|---|---|---|
| encoding | | Encoding dictionary showing original values of encoded features. |
| income | Binary classification | Classify the person's income as over or under the threshold. |
| income-no race | Binary classification | Same as income, but with the race feature removed. |
| race | Multiclass classification | Predict the race of the individual. |

Usage

from datasets import load_dataset

dataset = load_dataset("mstz/adult", "income")["train"]
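
The snippet above loads the income configuration. The other configurations listed in this card can presumably be selected the same way, by passing a different name as the second argument. The sketch below wraps that in a small helper that checks the name first. `load_adult` and `CONFIGURATIONS` are hypothetical names introduced here for illustration, and the helper assumes every configuration exposes a "train" split, which this card only shows for income.

```python
# Configuration names and tasks, copied from the table in this card.
CONFIGURATIONS = {
    "encoding": None,  # no task listed; holds the encoding dictionary
    "income": "Binary classification",
    "income-no race": "Binary classification",
    "race": "Multiclass classification",
}


def load_adult(config="income", split="train"):
    """Hypothetical helper: validate the configuration name, then load it.

    Assumes the requested split exists for the chosen configuration.
    """
    if config not in CONFIGURATIONS:
        raise ValueError(
            f"Unknown configuration {config!r}; "
            f"expected one of {sorted(CONFIGURATIONS)}"
        )
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("mstz/adult", config)[split]
```

For example, `load_adult("race")` would request the multiclass configuration, while a misspelled name fails early with a `ValueError` instead of a download error.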

Features

The target feature changes according to the selected configuration and is always in the last position of the dataset.

| Feature | Type | Description |
|---|---|---|
| age | [int64] | Age of the person. |
| capital_gain | [float64] | Capital gained by the person. |
| capital_loss | [float64] | Capital lost by the person. |
| education | [int8] | Education level: the higher the value, the more educated the person. |
| final_weight | [int64] | |
| hours_worked_per_week | [int64] | Hours worked per week. |
| marital_status | [string] | Marital status of the person. |
| native_country | [string] | Native country of the person. |
| occupation | [string] | Job of the person. |
| race | [string] | Race of the person. |
| relationship | [string] | |
| is_male | [bool] | Sex of the person (man/woman). |
| workclass | [string] | Type of job of the person. |
| over_threshold | [int8] | 1 for income >= 50k$, 0 otherwise. |
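
Because the target is stated to be the last column in every configuration, features and target can be separated by position rather than by name. The following is a minimal sketch of that idea. `split_columns` is a hypothetical helper, and the column list used to exercise it is a shortened illustration, not the dataset's actual column order.

```python
def split_columns(column_names):
    """Return (feature_names, target_name), assuming the target is last."""
    if len(column_names) < 2:
        raise ValueError("need at least one feature and one target column")
    *features, target = column_names
    return features, target


# Illustrative subset only; the real order comes from `dataset.column_names`.
example_columns = ["age", "education", "hours_worked_per_week", "over_threshold"]
features, target = split_columns(example_columns)
print(features)
print(target)
```

With a loaded dataset, the same call would be `split_columns(dataset.column_names)`, which keeps the code unchanged when switching between the income and race configurations.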