Dataset:

imodels/compas-recidivism

Port of the compas-recidivism dataset from ProPublica (GitHub here). See the details there and use this dataset carefully: it has serious known social impacts and biases.

Basic preprocessing was done by the imodels team in this notebook.

The target is the binary outcome is_recid.

Sample usage

Load the data:

import pandas as pd
from datasets import load_dataset

dataset = load_dataset("imodels/compas-recidivism")
df = pd.DataFrame(dataset['train'])
X = df.drop(columns=['is_recid'])
y = df['is_recid'].values

Fit a model:

import imodels
import numpy as np

m = imodels.FIGSClassifier(max_rules=5)
m.fit(X, y)
print(m)

Evaluate:

# Build the evaluation features and labels from the held-out test split
df_test = pd.DataFrame(dataset['test'])
X_test = df_test.drop(columns=['is_recid'])
y_test = df_test['is_recid'].values
print('accuracy', np.mean(m.predict(X_test) == y_test))
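
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below is an illustration only and not part of the imodels API: the helper names accuracy and majority_baseline_accuracy are hypothetical, and the toy labels are made up rather than drawn from the dataset. It compares a model's accuracy with the accuracy of always predicting the most common training label.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_train, y_eval):
    """Accuracy of always predicting the most common training label."""
    majority_label = Counter(list(y_train)).most_common(1)[0][0]
    y_eval = list(y_eval)
    return sum(t == majority_label for t in y_eval) / len(y_eval)


# Toy labels for illustration only; these are NOT values from the dataset.
toy_train = [0, 0, 0, 1, 1]
toy_test = [0, 1, 1, 0]
toy_pred = [0, 1, 1, 1]

print('model accuracy', accuracy(toy_test, toy_pred))
print('baseline accuracy', majority_baseline_accuracy(toy_train, toy_test))
```

With the objects defined in the sample usage, the same comparison would be accuracy(y_test, m.predict(X_test)) against majority_baseline_accuracy(y, y_test).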