A quick code reference for regression models in scikit-learn
The following models are tested on the Boston housing price dataset.
Contents: preparation (importing the dataset, splitting the dataset); evaluation metric function; models (Linear Models, KNN, SVM, DecisionTree, Random forest, Bagging, Xgboost, Lightgbm, Catboost, GradientBoosting); references.
Preparation
Importing the dataset
from sklearn import datasets  # scikit-learn's built-in toy datasets
boston = datasets.load_boston()  # load the Boston housing price dataset
print(boston.keys())  # dict keys: ['data', 'target', 'feature_names', 'DESCR', 'filename']
print(boston.data.shape, boston.target.shape)  # (506, 13) (506,)
print(boston.feature_names)  # names of the 13 features
print(boston.DESCR)  # full description of the dataset
print(boston.filename)  # path to the underlying data file
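Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the call above fails on recent versions. As a sketch for newer installs, the same arrays can be rebuilt the way scikit-learn's own deprecation notice suggests (this assumes pandas and access to lib.stat.cmu.edu):
# Sketch: reconstruct the Boston data on scikit-learn >= 1.2, where load_boston no longer exists.
import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # the 13 feature columns
target = raw_df.values[1::2, 2]  # median house prices (MEDV)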
Splitting the dataset
from sklearn.model_selection import train_test_split
# check data shape
print("boston.data.shape %s , boston.target.shape %s"%(boston.data.shape,boston.target.shape))
train = boston.data      # feature matrix (samples)
target = boston.target   # target values (median house prices)
# hold out 20% of the samples for testing and keep 80% for training
X_train, x_test, y_train, y_true = train_test_split(train, target, test_size=0.2)
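Because train_test_split reshuffles with a new random seed on every run, the metric values reported below will drift slightly between runs. A minimal tweak for reproducibility (my addition, not in the original code) is to pin random_state:
# Same 80/20 split, but reproducible across runs thanks to a fixed random seed.
X_train, x_test, y_train, y_true = train_test_split(
    train, target, test_size=0.2, random_state=42)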
Evaluation metric function
Used to judge how well a regression model performs.
from sklearn import metrics
import numpy as np

def reg_calculate(true, prediction):
    mse = metrics.mean_squared_error(true, prediction)
    rmse = np.sqrt(mse)
    mae = metrics.mean_absolute_error(true, prediction)
    mape = np.mean(np.abs((true - prediction) / true)) * 100
    r2 = metrics.r2_score(true, prediction)
    rmsle = np.sqrt(metrics.mean_squared_log_error(true, prediction))
    print("mse: {}, rmse: {}, mae: {}, mape: {}, r2: {}, rmsle: {}".format(mse, rmse, mae, mape, r2, rmsle))
    # return mse, rmse, mae, mape, r2, rmsle
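One caveat of my own, not from the original post: metrics.mean_squared_log_error raises an error as soon as any prediction is negative, which unconstrained regressors such as plain linear regression can occasionally produce on this data. A more defensive variant of the helper might skip RMSLE in that case:
# Sketch: same metrics as reg_calculate, but RMSLE is skipped when values go negative.
def reg_calculate_safe(true, prediction):
    mse = metrics.mean_squared_error(true, prediction)
    rmse = np.sqrt(mse)
    mae = metrics.mean_absolute_error(true, prediction)
    mape = np.mean(np.abs((true - prediction) / true)) * 100
    r2 = metrics.r2_score(true, prediction)
    if np.all(np.asarray(true) >= 0) and np.all(np.asarray(prediction) >= 0):
        rmsle = np.sqrt(metrics.mean_squared_log_error(true, prediction))
    else:
        rmsle = float("nan")  # RMSLE is undefined for negative values
    print("mse: {}, rmse: {}, mae: {}, mape: {}, r2: {}, rmsle: {}".format(mse, rmse, mae, mape, r2, rmsle))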
Models
Linear Models
Linear models are commonly used as baseline models: no matter how badly your other, more complex models perform, they should not do worse than this simple baseline.
from sklearn.linear_model import LinearRegression  # ordinary least squares
from sklearn.linear_model import Ridge             # Ridge regression, linear model with L2 regularization
from sklearn.linear_model import Lasso             # Lasso regression, linear model with L1 regularization
linear = LinearRegression()
ridge = Ridge()
lasso = Lasso()
linear.fit(X_train, y_train)
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)
y_pre_linear = linear.predict(x_test)
y_pre_ridge = ridge.predict(x_test)
y_pre_lasso = lasso.predict(x_test)
# evaluate each model on the held-out test set
print("linear")
reg_calculate(y_true, y_pre_linear)
print("ridge")
reg_calculate(y_true, y_pre_ridge)
print("lasso")
reg_calculate(y_true, y_pre_lasso)
Output:
linear
mse: 31.240513455848852, rmse: 5.589321377041121, mae: 3.53633733472426, mape: 16.6595950646398, r2: 0.6614175896322294, rmsle: 0.21890383040918562
ridge
mse: 31.39335760236521, rmse: 5.602977565756016, mae: 3.5334602249253697, mape: 16.63728623401629, r2: 0.6597610759001087, rmsle: 0.21899426397078484
lasso
mse: 33.51784488799414, rmse: 5.789459809688132, mae: 3.7127882956101144, mape: 16.61875404887328, r2: 0.6367360373718366, rmsle: 0.2135345753220661
KNN
The k-nearest neighbors (KNN) algorithm.
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor()
knn.fit(X_train, y_train)
y_pre_knn = knn.predict(x_test)
# evaluate the KNN predictions on the test set
print("KNN")
reg_calculate(y_true, y_pre_knn)
KNN
mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129
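The regressor above uses the default of 5 neighbors. As a sketch of my own (not in the original post), n_neighbors could be tuned with GridSearchCV before evaluating on the test set:
# Sketch: search over the number of neighbors with 5-fold cross-validation on the training set.
from sklearn.model_selection import GridSearchCV

knn_search = GridSearchCV(KNeighborsRegressor(),
                          param_grid={"n_neighbors": list(range(1, 21))},
                          scoring="neg_mean_squared_error", cv=5)
knn_search.fit(X_train, y_train)
print(knn_search.best_params_)
reg_calculate(y_true, knn_search.predict(x_test))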
SVM
Support vector machine (SVM) regression.
from sklearn import svm
regr = svm.SVR()
regr.fit(X_train, y_train)
y_pre_svm = regr.predict(x_test)
print("SVM")
reg_calculate(y_true, y_pre_svm)
SVM
mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129
(These numbers are identical to the KNN results above because the original snippet passed y_pre_knn to reg_calculate; with the corrected y_pre_svm the SVM metrics will differ.)
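SVR is sensitive to feature scale, and the Boston features span very different ranges, which is part of why it trails the tree models here. A sketch of my own: wrap it in a scaling pipeline before fitting.
# Sketch: standardize the features before SVR, which it usually needs to perform well.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

svr_scaled = make_pipeline(StandardScaler(), svm.SVR())
svr_scaled.fit(X_train, y_train)
reg_calculate(y_true, svr_scaled.predict(x_test))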
DecisionTree
Decision tree.
from sklearn.tree import DecisionTreeRegressor
DT = DecisionTreeRegressor()
DT.fit(X_train, y_train)
y_pre_DT = DT.predict(x_test)
print("Decision Tree")
reg_calculate(y_true, y_pre_DT)
Decision Tree
mse: 26.693823529411766, rmse: 5.166606577765697, mae: 3.1813725490196085, mape: 15.562811056549139, r2: 0.710694284033029, rmsle: 0.1975892237980368
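A fully grown DecisionTreeRegressor tends to overfit the training set. As a sketch of my own (not in the original post), capping the depth and printing the tree with sklearn.tree.export_text gives a small model whose rules can be read directly:
# Sketch: a depth-limited tree whose decision rules can be printed as plain text.
from sklearn.tree import export_text

DT_small = DecisionTreeRegressor(max_depth=3)
DT_small.fit(X_train, y_train)
print(export_text(DT_small, feature_names=list(boston.feature_names)))
reg_calculate(y_true, DT_small.predict(x_test))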
Random forest
from sklearn.ensemble import RandomForestRegressor
# from sklearn.pipeline import Pipeline
regr = RandomForestRegressor()
regr.fit(X_train, y_train)
y_pre_regr = regr.predict(x_test)
print("Decision Tree")
reg_calculate(y_true, y_pre_regr)
Random Forest
mse: 11.207001999999997, rmse: 3.347686066524159, mae: 2.2008235294117653, mape: 10.90976874926566, r2: 0.8785393282501885, rmsle: 0.14116871196392253
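Tree ensembles also expose feature_importances_ after fitting; as a small addition of mine, the scores can be listed against the feature names to see what the forest relies on:
# Rank the Boston features by the fitted random forest's importance scores.
importances = sorted(zip(boston.feature_names, regr.feature_importances_),
                     key=lambda x: x[1], reverse=True)
for name, score in importances:
    print("%-8s %.4f" % (name, score))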
Bagging
Xgboost
Lightgbm
Catboost
GradientBoosting
Since these are all used in much the same way, let's just write them all at once.
import xgboost as xg
import lightgbm as lgm
import catboost as cb
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import GradientBoostingRegressor
CB_Regressor=cb.CatBoostRegressor()
xg_Regressor=xg.XGBRegressor()
lgm_Regressor=lgm.LGBMRegressor()
bag_Regressor=BaggingRegressor()
gbd_Regressor=GradientBoostingRegressor()
CB_Regressor.fit(X_train, y_train)
xg_Regressor.fit(X_train, y_train)
lgm_Regressor.fit(X_train, y_train)
bag_Regressor.fit(X_train, y_train)
gbd_Regressor.fit(X_train, y_train)
y_pre_CB = CB_Regressor.predict(x_test)
y_pre_xg = xg_Regressor.predict(x_test)
y_pre_lgm = lgm_Regressor.predict(x_test)
y_pre_bag = bag_Regressor.predict(x_test)
y_pre_gbd = gbd_Regressor.predict(x_test)
print("CB")
reg_calculate(y_true, y_pre_CB)
print("XGBoost")
reg_calculate(y_true, y_pre_xg)
print("LGBM")
reg_calculate(y_true, y_pre_lgm)
print("Bagging")
reg_calculate(y_true, y_pre_bag)
print("GradientBoosting")
reg_calculate(y_true, y_pre_gbd)
CB
mse: 11.791834718420176, rmse: 3.433924099105887, mae: 2.119751585666254, mape: 9.993857387303114, r2: 0.872200953826718, rmsle: 0.13299161406349444
XGBoost
mse: 13.447662535973214, rmse: 3.667105471072957, mae: 2.3307185677921067, mape: 10.9129019218805, r2: 0.8542552124926828, rmsle: 0.14476785768682296
LGBM
mse: 11.641081825640502, rmse: 3.4119029625182047, mae: 2.1377168555798383, mape: 10.464024191664098, r2: 0.8738348027030942, rmsle: 0.1396198526477457
Bagging
mse: 12.581160784313727, rmse: 3.546993203308082, mae: 2.420392156862745, mape: 11.830690090356958, r2: 0.8636462953914766, rmsle: 0.15213713275055957
GradientBoosting
mse: 8.489139804224163, rmse: 2.9136128439146067, mae: 2.287601690996847, mape: 11.586313247266613, r2: 0.907995320853951, rmsle: 0.1372025274842797
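Since every model above follows the same fit / predict / evaluate pattern, the whole comparison can also be collapsed into a single loop. This is a refactoring sketch of my own rather than code from the original post:
# Sketch: run the same fit / predict / evaluate cycle over all the models in one loop.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "lasso": Lasso(),
    "knn": KNeighborsRegressor(),
    "svm": svm.SVR(),
    "decision tree": DecisionTreeRegressor(),
    "random forest": RandomForestRegressor(),
    "bagging": BaggingRegressor(),
    "gradient boosting": GradientBoostingRegressor(),
    "xgboost": xg.XGBRegressor(),
    "lightgbm": lgm.LGBMRegressor(),
    "catboost": cb.CatBoostRegressor(verbose=0),  # verbose=0 silences CatBoost's per-iteration log
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    reg_calculate(y_true, model.predict(x_test))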
References
Scikit-learn documentation; Random forest reference