Optuna: Efficient, automated hyperparameter optimization that unlocks your model's full potential.
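Optuna is an open-source Python library for hyperparameter optimization. Developed by the Japanese company Preferred Networks, it offers an elegant, automated way to search for the best hyperparameters by optimizing an objective function. It is designed to be user-friendly and to adapt to many machine learning frameworks, such as scikit-learn, PyTorch, TensorFlow, and XGBoost.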
!pip install optuna
import optuna
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define objective functions for Random Forest and XGBoost
def rf_objective(trial):
    # Sample the hyperparameters for this trial
    n_estimators = trial.suggest_int('n_estimators', 10, 100)
    max_depth = trial.suggest_int('max_depth', 2, 32, log=True)
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # The study maximizes this accuracy
    return accuracy_score(y_test, y_pred)
def xgb_objective(trial):
    # Sample the hyperparameters for this trial
    n_estimators = trial.suggest_int('n_estimators', 10, 100)
    max_depth = trial.suggest_int('max_depth', 2, 10)
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1, log=True)
    model = XGBClassifier(n_estimators=n_estimators, max_depth=max_depth, learning_rate=learning_rate, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # The study maximizes this accuracy
    return accuracy_score(y_test, y_pred)
# Optimize hyperparameters for Random Forest
rf_study = optuna.create_study(direction='maximize')
rf_study.optimize(rf_objective, n_trials=50)
# Optimize hyperparameters for XGBoost
xgb_study = optuna.create_study(direction='maximize')
xgb_study.optimize(xgb_objective, n_trials=50)
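# Plot the optimization history of each study (requires plotly)
optuna.visualization.plot_optimization_history(rf_study).show()
optuna.visualization.plot_optimization_history(xgb_study).show()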
# Get the best hyperparameters for each model
best_rf_params = rf_study.best_params
best_xgb_params = xgb_study.best_params
# Train models with the best hyperparameters
best_rf_model = RandomForestClassifier(n_estimators=best_rf_params['n_estimators'],
max_depth=best_rf_params['max_depth'], random_state=42)
best_rf_model.fit(X_train, y_train)
# Pass every tuned hyperparameter, including max_depth
best_xgb_model = XGBClassifier(n_estimators=best_xgb_params['n_estimators'],
                               max_depth=best_xgb_params['max_depth'],
                               learning_rate=best_xgb_params['learning_rate'], random_state=42)
best_xgb_model.fit(X_train, y_train)
# Evaluate and compare models
rf_accuracy = best_rf_model.score(X_test, y_test)
xgb_accuracy = best_xgb_model.score(X_test, y_test)
print("Random Forest Model Accuracy:", rf_accuracy)
print("XGBoost Model Accuracy:", xgb_accuracy)
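1. We generate synthetic data and split it into training and test sets.
2. We define objective functions for the Random Forest and XGBoost models so that Optuna can optimize them. The hyperparameters to tune are the number of estimators, the maximum depth, and, for XGBoost, the learning rate.
3. We create a separate Optuna study for each model and optimize its objective function.
4. We visualize the optimization history with the optuna.visualization.plot_optimization_history function.
5. We retrieve the best hyperparameters for both models from the Optuna studies.
6. We train the models with the best hyperparameters.
7. We evaluate and compare the models on the test data and print their accuracies.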
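The search space in step 2 samples the random forest's max_depth and the XGBoost learning_rate on a log scale. The following is a minimal sketch of what log-scale sampling means, using only the standard library. The helper name sample_log_uniform is hypothetical, and this is plain random sampling in log space, not Optuna's own sampler (which by default uses TPE and takes earlier trials into account).

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)  # seeded so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1.0, rng) for _ in range(10_000)]

# On a log scale every decade gets roughly the same share of samples,
# so about two fifths of the draws fall in the two decades below 1e-3.
below_1e_3 = sum(s < 1e-3 for s in samples) / len(samples)
print(f"Log-uniform share below 1e-3: {below_1e_3:.2f}")

# A plain uniform draw on the same range almost never lands below 1e-3.
uniform_samples = [rng.uniform(1e-5, 1.0) for _ in range(10_000)]
uniform_below = sum(s < 1e-3 for s in uniform_samples) / len(uniform_samples)
print(f"Uniform share below 1e-3: {uniform_below:.4f}")
```

This is why a log scale suits a learning rate spanning 1e-5 to 1: small values get as much attention as large ones.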
[I 2023-10-25 22:07:14,345] A new study created in memory with name: no-name-25c2cdcb-5698-46b8-926c-9a6106d9f65a
[I 2023-10-25 22:07:15,664] Trial 0 finished with value: 0.87 and parameters: {'n_estimators': 55, 'max_depth': 4}. Best is trial 0 with value: 0.87.
[I 2023-10-25 22:07:17,344] Trial 1 finished with value: 0.9 and parameters: {'n_estimators': 61, 'max_depth': 7}. Best is trial 1 with value: 0.9.
[I 2023-10-25 22:07:18,499] Trial 2 finished with value: 0.89 and parameters: {'n_estimators': 62, 'max_depth': 8}. Best is trial 1 with value: 0.9.
[I 2023-10-25 22:07:20,011] Trial 3 finished with value: 0.905 and parameters: {'n_estimators': 92, 'max_depth': 19}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:20,496] Trial 4 finished with value: 0.865 and parameters: {'n_estimators': 54, 'max_depth': 3}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:20,643] Trial 5 finished with value: 0.84 and parameters: {'n_estimators': 13, 'max_depth': 2}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:21,134] Trial 6 finished with value: 0.86 and parameters: {'n_estimators': 49, 'max_depth': 2}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:21,722] Trial 7 finished with value: 0.86 and parameters: {'n_estimators': 49, 'max_depth': 2}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:22,904] Trial 8 finished with value: 0.855 and parameters: {'n_estimators': 65, 'max_depth': 2}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:24,268] Trial 9 finished with value: 0.9 and parameters: {'n_estimators': 76, 'max_depth': 26}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:26,291] Trial 10 finished with value: 0.9 and parameters: {'n_estimators': 100, 'max_depth': 28}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:28,099] Trial 11 finished with value: 0.895 and parameters: {'n_estimators': 99, 'max_depth': 12}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:32,032] Trial 12 finished with value: 0.905 and parameters: {'n_estimators': 81, 'max_depth': 16}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:33,852] Trial 13 finished with value: 0.895 and parameters: {'n_estimators': 82, 'max_depth': 15}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:35,212] Trial 14 finished with value: 0.895 and parameters: {'n_estimators': 85, 'max_depth': 17}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:36,981] Trial 15 finished with value: 0.9 and parameters: {'n_estimators': 91, 'max_depth': 32}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:37,653] Trial 16 finished with value: 0.87 and parameters: {'n_estimators': 34, 'max_depth': 20}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:38,900] Trial 17 finished with value: 0.895 and parameters: {'n_estimators': 71, 'max_depth': 11}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:40,528] Trial 18 finished with value: 0.905 and parameters: {'n_estimators': 89, 'max_depth': 21}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:41,763] Trial 19 finished with value: 0.895 and parameters: {'n_estimators': 75, 'max_depth': 12}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:42,860] Trial 20 finished with value: 0.875 and parameters: {'n_estimators': 40, 'max_depth': 22}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:45,823] Trial 21 finished with value: 0.9 and parameters: {'n_estimators': 84, 'max_depth': 21}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:47,671] Trial 22 finished with value: 0.905 and parameters: {'n_estimators': 92, 'max_depth': 16}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:48,348] Trial 23 finished with value: 0.9 and parameters: {'n_estimators': 91, 'max_depth': 24}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:49,109] Trial 24 finished with value: 0.905 and parameters: {'n_estimators': 80, 'max_depth': 31}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:49,877] Trial 25 finished with value: 0.9 and parameters: {'n_estimators': 93, 'max_depth': 18}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:50,307] Trial 26 finished with value: 0.89 and parameters: {'n_estimators': 69, 'max_depth': 9}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:50,929] Trial 27 finished with value: 0.905 and parameters: {'n_estimators': 87, 'max_depth': 14}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:51,582] Trial 28 finished with value: 0.9 and parameters: {'n_estimators': 100, 'max_depth': 23}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:51,968] Trial 29 finished with value: 0.885 and parameters: {'n_estimators': 77, 'max_depth': 6}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:52,103] Trial 30 finished with value: 0.875 and parameters: {'n_estimators': 15, 'max_depth': 19}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:53,071] Trial 31 finished with value: 0.9 and parameters: {'n_estimators': 94, 'max_depth': 15}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:53,780] Trial 32 finished with value: 0.905 and parameters: {'n_estimators': 89, 'max_depth': 16}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:54,477] Trial 33 finished with value: 0.9 and parameters: {'n_estimators': 95, 'max_depth': 27}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:55,627] Trial 34 finished with value: 0.9 and parameters: {'n_estimators': 84, 'max_depth': 19}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:56,204] Trial 35 finished with value: 0.9 and parameters: {'n_estimators': 70, 'max_depth': 13}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:56,635] Trial 36 finished with value: 0.885 and parameters: {'n_estimators': 62, 'max_depth': 10}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:57,930] Trial 37 finished with value: 0.9 and parameters: {'n_estimators': 79, 'max_depth': 16}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:07:58,961] Trial 38 finished with value: 0.89 and parameters: {'n_estimators': 56, 'max_depth': 7}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:00,117] Trial 39 finished with value: 0.9 and parameters: {'n_estimators': 97, 'max_depth': 13}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:00,770] Trial 40 finished with value: 0.905 and parameters: {'n_estimators': 89, 'max_depth': 24}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:01,350] Trial 41 finished with value: 0.905 and parameters: {'n_estimators': 81, 'max_depth': 31}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:01,908] Trial 42 finished with value: 0.905 and parameters: {'n_estimators': 73, 'max_depth': 30}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:02,484] Trial 43 finished with value: 0.9 and parameters: {'n_estimators': 79, 'max_depth': 26}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:02,917] Trial 44 finished with value: 0.895 and parameters: {'n_estimators': 65, 'max_depth': 20}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:03,564] Trial 45 finished with value: 0.905 and parameters: {'n_estimators': 88, 'max_depth': 17}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:04,431] Trial 46 finished with value: 0.9 and parameters: {'n_estimators': 96, 'max_depth': 27}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:04,974] Trial 47 finished with value: 0.9 and parameters: {'n_estimators': 82, 'max_depth': 22}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:05,421] Trial 48 finished with value: 0.89 and parameters: {'n_estimators': 57, 'max_depth': 18}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:06,060] Trial 49 finished with value: 0.9 and parameters: {'n_estimators': 86, 'max_depth': 14}. Best is trial 3 with value: 0.905.
[I 2023-10-25 22:08:06,069] A new study created in memory with name: no-name-f6cbfaac-9671-442f-b2a7-aceb5cbc7ee8
Random Forest Model Accuracy: 0.905
XGBoost Model Accuracy: 0.91
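Each log line above reports one trial's value and the running best. As a small sketch, the snippet below parses four lines copied from the Random Forest log and recomputes the best trial. The regular expression and variable names are assumptions for illustration and are not part of Optuna.

```python
import re

# Four lines copied verbatim from the Random Forest log above.
log_lines = [
    "[I 2023-10-25 22:07:15,664] Trial 0 finished with value: 0.87 and parameters: {'n_estimators': 55, 'max_depth': 4}. Best is trial 0 with value: 0.87.",
    "[I 2023-10-25 22:07:17,344] Trial 1 finished with value: 0.9 and parameters: {'n_estimators': 61, 'max_depth': 7}. Best is trial 1 with value: 0.9.",
    "[I 2023-10-25 22:07:20,011] Trial 3 finished with value: 0.905 and parameters: {'n_estimators': 92, 'max_depth': 19}. Best is trial 3 with value: 0.905.",
    "[I 2023-10-25 22:07:32,032] Trial 12 finished with value: 0.905 and parameters: {'n_estimators': 81, 'max_depth': 16}. Best is trial 3 with value: 0.905.",
]

# Capture the trial number and its objective value.
TRIAL_PATTERN = re.compile(r"Trial (\d+) finished with value: ([\d.]+) and parameters")

best_number, best_value = None, float("-inf")
for line in log_lines:
    match = TRIAL_PATTERN.search(line)
    if match is None:
        continue  # skip lines such as "A new study created ..."
    number, value = int(match.group(1)), float(match.group(2))
    # Strict comparison: a tie keeps the earlier trial, as the log shows
    # for trial 12, which ties trial 3 without replacing it.
    if value > best_value:
        best_number, best_value = number, value

print("Best trial:", best_number, "with value:", best_value)
```

Inside a live session there is no need to parse logs: study.best_trial and study.trials_dataframe() expose the same information directly.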