What is GridSearchCV, and how do you use it?
GridSearchCV
Hello! In this post I'll introduce the GridSearchCV module. GridSearchCV is one technique for improving model performance in machine learning.
You give it lists of candidate hyperparameter values for a model; it measures and compares the predictive performance of each combination and finds the best hyperparameter values.
The downside is that it takes a long time. Keep that in mind.
Without further ado, here's how to use it.
The examples below use models I actually worked with.
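Why is it so slow? Grid search trains one model for every combination of parameter values, times the number of cross-validation folds. A quick back-of-the-envelope sketch, using a grid shaped like the XGB grid below:

```python
from itertools import product

# A grid with 3 x 4 x 4 = 48 candidate combinations.
param_grid = {
    "max_depth": [10, 30, 50],
    "min_child_weight": [1, 3, 6, 10],
    "n_estimators": [200, 300, 500, 1000],
}

# GridSearchCV trains one model per candidate per CV fold.
n_candidates = len(list(product(*param_grid.values())))
cv_folds = 3
total_fits = n_candidates * cv_folds

print(n_candidates, total_fits)  # 48 candidates, 144 fits
```

Add one more value to any list and the total multiplies, which is exactly why the run below took so long.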
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
xgb = XGBClassifier()
lgb = LGBMClassifier()
gbm = GradientBoostingClassifier()
cat = CatBoostClassifier()
lreg = LogisticRegression()
# Find the optimal parameter values
from sklearn.model_selection import GridSearchCV
# XGB
param_xgb = {"max_depth": [10,30,50],
"min_child_weight" : [1,3,6,10],
"n_estimators": [200,300,500,1000],
}
# LGB
param_lgb = {'learning_rate' : [0.01,0.1,0.2,0.3,0.4,0.5],
"max_depth": [25, 50, 75],
"num_leaves": [100,300,500,900,1200],
'n_estimators' : [100, 200, 300,500,800,1000],
}
# GBM
param_gbm = {'max_depth' : [4,5,6,7,8,9,10],
'learning_rate' : [0.01,0.1,0.2,0.3,0.4,0.5],
'n_estimators' : [100,200,300,500],
}
# CAT
param_cat = {'depth':[6,4,5,7,8,9,10],
'iterations':[250,100,500,1000],
'learning_rate':[0.001,0.01,0.1,0.2,0.3],
'l2_leaf_reg':[2,5,10,20,30],
'border_count':[254],
}
# Logistic
param_lreg = { 'C' : [1.0, 3, 5, 7, 10],
'max_iter': [50, 200, 100, 300, 500,700, 800]
}
gscv_xgb = GridSearchCV(estimator=xgb, param_grid=param_xgb, scoring='accuracy', cv=3, refit=True, n_jobs=1, verbose=2)
gscv_lgb = GridSearchCV(estimator=lgb, param_grid=param_lgb, scoring='accuracy', cv=3, refit=True, n_jobs=1, verbose=2)
gscv_gbm = GridSearchCV(estimator=gbm, param_grid=param_gbm, scoring='accuracy', cv=3, refit=True, n_jobs=1, verbose=2)
gscv_cat = GridSearchCV(estimator=cat, param_grid=param_cat, scoring='accuracy', cv=3, refit=True, n_jobs=1, verbose=2)
gscv_lreg = GridSearchCV(estimator=lreg, param_grid=param_lreg, scoring='accuracy', cv=3, refit=True, n_jobs=1, verbose=2)
gscv_xgb.fit(trainX, trainY)
gscv_lgb.fit(trainX, trainY)
gscv_gbm.fit(trainX, trainY)
gscv_cat.fit(trainX, trainY)
gscv_lreg.fit(trainX, trainY)
# time : 49 mins (training took a full 49 minutes; with more parameter values it would take even longer)
Tip: the cv argument doesn't have to be an integer; you can also pass a splitter object, e.g. cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
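Here's what that looks like wired into a search. This is a minimal, self-contained sketch on the iris dataset, with a small LogisticRegression grid chosen just for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Pass a splitter object as cv instead of an integer;
# StratifiedKFold keeps the class ratios equal in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

gscv = GridSearchCV(
    estimator=LogisticRegression(max_iter=500),
    param_grid={"C": [0.1, 1.0, 10]},
    scoring="accuracy",
    cv=skf,
)
gscv.fit(X, y)
print(gscv.best_params_)
```

Stratification matters most when the classes are imbalanced, since a plain integer cv on a classifier could otherwise produce folds with skewed label distributions.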
print("="*30)
print('XGB parameters: ', gscv_xgb.best_params_)
print('XGB prediction accuracy: {:.4f}'.format(gscv_xgb.best_score_))
print("="*30)
print('LGB parameters: ', gscv_lgb.best_params_)
print('LGB prediction accuracy: {:.4f}'.format(gscv_lgb.best_score_))
print("="*30)
print('GBM parameters: ', gscv_gbm.best_params_)
print('GBM prediction accuracy: {:.4f}'.format(gscv_gbm.best_score_))
print("="*30)
print('CAT parameters: ', gscv_cat.best_params_)
print('CAT prediction accuracy: {:.4f}'.format(gscv_cat.best_score_))
print("="*30)
print('Lreg parameters: ', gscv_lreg.best_params_)
print('Lreg prediction accuracy: {:.4f}'.format(gscv_lreg.best_score_))
print("="*30)
# This produces output like the following:
==============================
XGB parameters: {'max_depth': 10, 'min_child_weight': 3, 'n_estimators': 200}
XGB prediction accuracy: 0.9279
==============================
LGB parameters: {'learning_rate': 0.1, 'max_depth': 25, 'n_estimators': 1000, 'num_leaves': 100}
LGB prediction accuracy: 0.9308
==============================
GBM parameters: {'learning_rate': 0.2, 'max_depth': 4, 'n_estimators': 200}
GBM prediction accuracy: 0.9308
==============================
CAT parameters: {'border_count': 254, 'depth': 5, 'iterations': 500, 'l2_leaf_reg': 5, 'learning_rate': 0.1}
CAT prediction accuracy: 0.9350
==============================
Lreg parameters: {'C': 10, 'max_iter': 100}
Lreg prediction accuracy: 0.8400
==============================
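Because refit=True was passed above, each search object automatically retrains its best model on the full training set, so the search object itself can be used for prediction. A minimal, self-contained sketch of that workflow (using synthetic data in place of the original trainX/trainY, which aren't shown in this post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the original training/test data.
X, y = make_classification(n_samples=300, random_state=42)
trainX, testX, trainY, testY = train_test_split(X, y, random_state=42)

gscv = GridSearchCV(
    estimator=LogisticRegression(max_iter=500),
    param_grid={"C": [1.0, 3, 5, 7, 10]},
    scoring="accuracy",
    cv=3,
    refit=True,
)
gscv.fit(trainX, trainY)

# refit=True means the best model was retrained on all of trainX,
# so predict/score work directly on the search object.
pred = gscv.predict(testX)
print(gscv.best_estimator_)
print(gscv.score(testX, testY))
```

If you need the fitted model itself (e.g. to save it), grab it from `gscv.best_estimator_`.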
GridSearchCV is one way to improve model performance. If your model is underperforming, consider giving GridSearchCV a try :) For readers who want more detail, a link is provided in the reference below.
Reference
For this topic (What is GridSearchCV, and how do you use it?), more information can be found here: https://velog.io/@hyunicecream/GridSearchCV란-어떻게-사용할까 . The text may be shared or copied freely, but please keep this document's URL as a reference.
Collection and Share based on the CC Protocol