Dataset:
mstz/adult
The Adult dataset from the UCI ML repository. It is a census dataset that includes personal characteristics of individuals and their income threshold.
Configuration | Task | Description |
---|---|---|
encoding | | Encoding dictionary showing original values of encoded features. |
income | Binary classification | Classify the person's income as over or under the threshold. |
income-no race | Binary classification | Same as income, but with the race feature removed. |
race | Multiclass classification | Predict the race of the individual. |

    from datasets import load_dataset

    dataset = load_dataset("mstz/adult", "income")["train"]

The target feature changes according to the selected configuration and is always the last column of the dataset.
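
Because the target is always the last column, features and target can be separated without hard-coding the target name. The following is a minimal sketch, not part of the dataset's own tooling: `split_features_target` and `load_and_split` are hypothetical helper names, and the sketch assumes that iterating over the loaded split yields one dict per row with keys in column order. Depending on the installed `datasets` version, `load_dataset` may need additional arguments.

```python
def split_features_target(rows):
    """Split row dicts into (feature dicts, target values, target name).

    Assumes every row lists its columns in the same order and that the
    target is the last column, as stated above.
    """
    rows = list(rows)
    if not rows:
        return [], [], None
    # The last key of the first row is taken as the target column.
    target_name = list(rows[0].keys())[-1]
    features = [
        {name: value for name, value in row.items() if name != target_name}
        for row in rows
    ]
    targets = [row[target_name] for row in rows]
    return features, targets, target_name


def load_and_split(config="income"):
    """Load one configuration and split it (hypothetical convenience wrapper)."""
    # Imported lazily so split_features_target works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("mstz/adult", config)["train"]
    return split_features_target(dataset)
```

With the `income` configuration, the returned target name would be `over_threshold`; with `race`, it would be whichever column that configuration places last.
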
Feature | Type | Description |
---|---|---|
age | [int64] | Age of the person. |
capital_gain | [float64] | Capital gained by the person. |
capital_loss | [float64] | Capital lost by the person. |
education | [int8] | Education level: the higher, the more educated the person. |
final_weight | [int64] | |
hours_worked_per_week | [int64] | Hours worked per week. |
marital_status | [string] | Marital status of the person. |
native_country | [string] | Native country of the person. |
occupation | [string] | Job of the person. |
race | [string] | Race of the person. |
relationship | [string] | |
is_male | [bool] | True if the person is male, False otherwise. |
workclass | [string] | Type of job of the person. |
over_threshold | [int8] | 1 if income >= $50k, 0 otherwise. |
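
The feature table can also be written down as a small schema for sanity-checking rows before training. This is only a sketch under stated assumptions: `ADULT_INCOME_SCHEMA` and `validate_row` are hypothetical names, the table is assumed to describe the `income` configuration, and the declared types are mapped to plain Python types (`int64` and `int8` to `int`, `float64` to `float`, `string` to `str`).

```python
# Column names and types copied from the feature table above,
# mapped to plain Python types (an assumption of this sketch).
ADULT_INCOME_SCHEMA = {
    "age": int,
    "capital_gain": float,
    "capital_loss": float,
    "education": int,
    "final_weight": int,
    "hours_worked_per_week": int,
    "marital_status": str,
    "native_country": str,
    "occupation": str,
    "race": str,
    "relationship": str,
    "is_male": bool,
    "workclass": str,
    "over_threshold": int,
}


def _matches(value, expected):
    """Return True if value is acceptable for the expected Python type."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool is a subclass of int, so reject it explicitly
    if expected is float:
        return isinstance(value, (int, float))  # whole numbers are fine for floats
    return isinstance(value, expected)


def validate_row(row, schema=ADULT_INCOME_SCHEMA):
    """Return a list of human-readable problems; an empty list means the row passes."""
    problems = []
    for name, expected in schema.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not _matches(row[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    return problems
```

A row that passes returns an empty list, so a check such as `all(not validate_row(r) for r in rows)` could guard a preprocessing step.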