K-Fold Cross Validation
1 Import libraries
2 Define Functions for K-Folds
2.1 Print Function: a function that prints the MAE, MSE, and R2 scores. Modify it to match the evaluation metrics you need.
2.2 Calculating CV Score: computes the scores and reports them with the print function defined above. If you want to see the per-fold scores as well as their average, uncomment the lines marked with *.
3 K-Fold CV
3.1 Model Selection and Setting
3.2 Run K-Fold CV: running it returns the average score over the K folds. If you want the value for each fold, uncomment the lines marked with *.
Reference
1 Import libraries
# import the K-Fold splitter
from sklearn.model_selection import KFold
# evaluation metric libraries
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
# additional imports used in the code below
import numpy as np
from sklearn.linear_model import LinearRegression
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
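Before defining the helper functions, it can help to see what `KFold` actually yields. This is a minimal, self-contained sketch (the array sizes are arbitrary, chosen for illustration) showing the train/validation index arrays produced for each fold:

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 samples, K = 5 -> each validation fold holds 2 samples
X_demo = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X_demo), start=1):
    # each split partitions the 10 rows: 8 for training, 2 for validation
    print(f"fold {fold}: train={train_idx}, val={val_idx}")
```

Every row appears in exactly one validation fold across the K iterations, which is what makes the averaged score an estimate over the full dataset.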
2 Define Functions for K-Folds
2.1 Print Function
def print_function(scores):
    score1, score2, score3, score4, score5, score6 = scores
    print("------ MAE ------")
    print("Train loss : %.4f" % score1)
    print("Validation loss : %.4f" % score2)
    print()
    print("------ MSE ------")
    print("Train loss : %.4f" % score3)
    print("Validation loss : %.4f" % score4)
    print()
    print("------ R2 ------")
    print("Train R2 score : %.4f" % score5)
    print("Validation R2 score : %.4f" % score6)
    print()
2.2 Calculating CV Score
If you want to see the per-fold scores as well as their average, uncomment the lines marked with *.
def train_and_validation(train_data, validation_data, model, metrics, print_mode):
    # unpack the train and validation sets
    X_train, y_train = train_data
    X_val, y_val = validation_data
    model.fit(X_train, y_train)
    train_pred = model.predict(X_train)
    val_pred = model.predict(X_val)
    score1 = metrics[0](y_train, train_pred)
    score2 = metrics[0](y_val, val_pred)
    score3 = metrics[1](y_train, train_pred)
    score4 = metrics[1](y_val, val_pred)
    score5 = metrics[2](y_train, train_pred)
    score6 = metrics[2](y_val, val_pred)
    scores = [score1, score2, score3, score4, score5, score6]
    ### * if you want to see the scores for each fold
    # if print_mode:
    #     print_function(scores)
    return np.array(scores)
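As a quick check of the fit/predict/score pattern used above, the following self-contained sketch computes the same six metrics for a single train/validation split. The toy linear dataset is an assumption for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# toy data: y = 3x + 1 with small noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 1 + rng.normal(0, 0.1, size=100)

X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

model = LinearRegression()
model.fit(X_train, y_train)
train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

# the same six scores, in the order train_and_validation returns them
scores = [
    mean_absolute_error(y_train, train_pred),
    mean_absolute_error(y_val, val_pred),
    mean_squared_error(y_train, train_pred),
    mean_squared_error(y_val, val_pred),
    r2_score(y_train, train_pred),
    r2_score(y_val, val_pred),
]
print(scores)
```

Because the data is almost perfectly linear, the R2 scores come out close to 1, which is a quick sanity check that the metric ordering is correct.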
3 K-Fold CV
3.1 Model Selection and Setting
# choose K
K = 5
# K-fold splitter
kfcv = KFold(n_splits=K, shuffle=True, random_state=42)
# evaluation metrics
evalution = [mean_absolute_error, mean_squared_error, r2_score]
# models
## set up the models you want to compare
lr = LinearRegression()  # the `normalize` parameter was removed in scikit-learn 1.2
lgbm = LGBMRegressor()
catb = CatBoostRegressor(silent=True)
xgbm = XGBRegressor(verbosity=0)  # `silent` was replaced by `verbosity` in XGBoost
## collect the chosen models in a list
models = [lr, lgbm, catb, xgbm]
print_mode = True
3.2 Run K-Fold CV
If you want the value for each fold, uncomment the lines marked with *.
# some models print long messages; suppress them if unwanted
import warnings
warnings.filterwarnings("ignore")

for index, model in enumerate(models):
    if print_mode:
        print(f"\n====== Model {model} ======\n")
    # collect the index pairs for each fold
    folds = []
    # this model's scores
    model_scores = []
    X = X_iter  # X_iter and y are assumed to be your feature DataFrame and target array
    # generate the K folds
    for train_index, val_index in kfcv.split(X, y):
        folds.append((train_index, val_index))
    # train and validate on each fold
    for i in range(K):
        ### * if you want the value for each fold
        # if print_mode:
        #     print(f"fold {i+1} of {K}")
        train_index, val_index = folds[i]
        X_train = X.iloc[train_index, :]
        X_val = X.iloc[val_index, :]
        y_train = y[train_index]
        y_val = y[val_index]
        # compute the scores for this model
        scores = train_and_validation((X_train, y_train), (X_val, y_val), model, evalution, print_mode)
        model_scores.append(scores)
    # mean of the per-fold scores
    model_scores = np.array(model_scores)
    if print_mode:
        print("Average score over %d folds." % K)
        print_function(model_scores.mean(axis=0))
print("Done.")
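For comparison, scikit-learn's built-in `cross_validate` can produce the same per-fold averages with much less code. This sketch mirrors the loop above on a toy dataset (standing in for the article's `X_iter` and `y`, which are assumed to be the reader's own data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# toy linear dataset standing in for the reader's X_iter / y
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
result = cross_validate(
    LinearRegression(), X, y, cv=cv,
    scoring=("neg_mean_absolute_error", "neg_mean_squared_error", "r2"),
    return_train_score=True,
)
# scores come back as per-fold arrays; negate the "neg_" scorers to recover MAE/MSE
print("val MAE :", -result["test_neg_mean_absolute_error"].mean())
print("val MSE :", -result["test_neg_mean_squared_error"].mean())
print("val R2  :", result["test_r2"].mean())
```

The hand-rolled loop in this article remains useful when you need per-fold control (custom printing, per-fold model inspection), but `cross_validate` is the idiomatic choice when the averaged scores are all you need.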
Reference
This article is based on the original post at https://velog.io/@dohy426/K-Fold-Cross-Validation