Dataset:
imodels/compas-recidivism
Port of the compas-recidivism dataset from ProPublica (available on GitHub). See the details there and use carefully: this dataset has serious known social impacts and biases.
Basic preprocessing was done by the imodels team in this notebook.
The target is the binary outcome is_recid.
Load the data:
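import pandas as pd
from datasets import load_dataset

# Download the dataset and convert the training split to a DataFrame
dataset = load_dataset("imodels/compas-recidivism")
df = pd.DataFrame(dataset['train'])

# Separate the features from the binary target
X = df.drop(columns=['is_recid'])
y = df['is_recid'].values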
Fit a model:
import imodels
import numpy as np

# Fit a FIGS classifier limited to 5 rules, then print the learned model
m = imodels.FIGSClassifier(max_rules=5)
m.fit(X, y)
print(m)
Evaluate:
# Build the test features and labels from the test split (not the training frame)
df_test = pd.DataFrame(dataset['test'])
X_test = df_test.drop(columns=['is_recid'])
y_test = df_test['is_recid'].values
print('accuracy', np.mean(m.predict(X_test) == y_test))
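An accuracy number is easier to interpret next to a trivial reference point. The sketch below is one possible way to report it alongside a majority-class baseline, meaning the accuracy of always predicting the most frequent label. The helper name accuracy_and_baseline is hypothetical and not part of imodels or datasets, and the labels in the example are toy values rather than data from this dataset.

```python
import numpy as np


def accuracy_and_baseline(y_true, y_pred):
    """Return (accuracy of y_pred, accuracy of always predicting the majority label).

    Hypothetical helper for illustration; not part of imodels or datasets.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_pred == y_true))
    # Baseline: the share of the most frequent label in y_true.
    _, counts = np.unique(y_true, return_counts=True)
    baseline = float(counts.max() / y_true.size)
    return accuracy, baseline


# Toy labels for illustration only (not drawn from the dataset).
acc, base = accuracy_and_baseline([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
print('accuracy', acc, 'baseline', base)
```

With the objects defined in the steps above, the same helper could be called as accuracy_and_baseline(y_test, m.predict(X_test)).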