Quick notes on using regression models in sklearn (scikit-learn)


The following models are tested on the Boston housing price dataset.
Table of Contents
  • Preparation
  • Import the dataset
  • Split the dataset
  • Evaluation metric function
  • Models
  • Linear Models
  • KNN
  • SVM
  • DecisionTree
  • Random forest
  • Bagging
  • Xgboost
  • Lightgbm
  • Catboost
  • GradientBoosting

  • Reference


  • Preparation
    Importing the dataset
    from sklearn import datasets  # scikit-learn's built-in datasets module

    boston = datasets.load_boston()  # load the Boston housing dataset (removed in scikit-learn 1.2; see note below)
    print(boston.keys())  # available keys: ['data', 'target', 'feature_names', 'DESCR', 'filename']
    print(boston.data.shape, boston.target.shape)  # (506, 13) (506,)
    print(boston.feature_names)  # names of the 13 features
    print(boston.DESCR)  # full text description of the dataset
    print(boston.filename)  # path of the underlying data file
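
    Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2 over ethical concerns about one of its features. On newer versions, one workaround (the recipe scikit-learn's own deprecation warning suggested) is to rebuild the arrays from the original source:
    import numpy as np
    import pandas as pd

    # fetch the raw Boston data and reassemble the (506, 13) feature matrix
    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # features
    target = raw_df.values[1::2, 2]  # median house prices (MEDV)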
    

    Splitting the dataset
    from sklearn.model_selection import train_test_split
    # check data shape
    print("boston.data.shape %s , boston.target.shape %s"%(boston.data.shape,boston.target.shape))
    train = boston.data  # sample
    target = boston.target  # target
    # split into training and test sets
    X_train, x_test, y_train, y_true = train_test_split(train, target, test_size=0.2)  # 20% test; 80% train
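
    No random seed is fixed here, so each run produces a different split, and the metric values printed later in these notes will not reproduce exactly. A minimal tweak pins the split (random_state=42 is an arbitrary choice):
    X_train, x_test, y_train, y_true = train_test_split(
        train, target, test_size=0.2, random_state=42)  # fixed seed -> reproducible split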
    

    Evaluation metric function
    Used to judge how well a regression model performs.
    from sklearn import metrics
    import numpy as np
    
    def reg_calculate(true, prediction):
        mse = metrics.mean_squared_error(true, prediction)
        rmse = np.sqrt(mse)
        mae = metrics.mean_absolute_error(true, prediction)
        mape = np.mean(np.abs((true - prediction) / true)) * 100  # mean absolute percentage error, in %
        r2 = metrics.r2_score(true, prediction)
        rmsle = np.sqrt(metrics.mean_squared_log_error(true, prediction))  # raises ValueError if any value is negative
        print("mse: {}, rmse: {}, mae: {}, mape: {}, r2: {}, rmsle: {}".format(mse, rmse, mae, mape, r2, rmsle))
        # return mse, rmse, mae, mape, r2, rmsle
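
    A quick sanity check of the function on made-up numbers (the values are purely illustrative):
    true = np.array([3.0, 5.0, 2.5, 7.0])
    pred = np.array([2.5, 5.0, 4.0, 8.0])
    reg_calculate(true, pred)  # mse: 0.875, rmse: ~0.935, mae: 0.75, ...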
      
    

    Models
    Linear Models
    Linear models are generally used as the baseline: that is, no matter how weak your other, fancier models turn out to be, they should at least not do worse than this.
    from sklearn.linear_model import LinearRegression  # ordinary least squares linear regression
    from sklearn.linear_model import Ridge  # linear regression with L2 regularization (Ridge)
    from sklearn.linear_model import Lasso  # linear regression with L1 regularization (Lasso)
    
    
    linear = LinearRegression()  
    ridge = Ridge()
    lasso = Lasso()
    
    linear.fit(X_train, y_train)
    ridge.fit(X_train, y_train)
    lasso.fit(X_train, y_train)
    
    y_pre_linear = linear.predict(x_test)
    y_pre_ridge = ridge.predict(x_test)
    y_pre_lasso = lasso.predict(x_test)
    
    # evaluate each model
    print("linear")
    reg_calculate(y_true, y_pre_linear)
    print("ridge")
    reg_calculate(y_true, y_pre_ridge)
    print("lasso")
    reg_calculate(y_true, y_pre_lasso)
    

    Output:
    linear
    mse: 31.240513455848852, rmse: 5.589321377041121, mae: 3.53633733472426, mape: 16.6595950646398, r2: 0.6614175896322294, rmsle: 0.21890383040918562
    ridge
    mse: 31.39335760236521, rmse: 5.602977565756016, mae: 3.5334602249253697, mape: 16.63728623401629, r2: 0.6597610759001087, rmsle: 0.21899426397078484
    lasso
    mse: 33.51784488799414, rmse: 5.789459809688132, mae: 3.7127882956101144, mape: 16.61875404887328, r2: 0.6367360373718366, rmsle: 0.2135345753220661
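
    Ridge and Lasso above both use the default regularization strength (alpha=1.0). scikit-learn also ships cross-validated variants that choose alpha automatically; a minimal sketch, with an arbitrary candidate grid:
    from sklearn.linear_model import RidgeCV, LassoCV

    ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])  # searches this grid via cross-validation
    lasso_cv = LassoCV(cv=5)  # picks alpha along its regularization path

    ridge_cv.fit(X_train, y_train)
    lasso_cv.fit(X_train, y_train)
    print(ridge_cv.alpha_, lasso_cv.alpha_)  # the selected strengths
    reg_calculate(y_true, ridge_cv.predict(x_test))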
    

    KNN
    K-nearest neighbors (KNN) regression
    from sklearn.neighbors import KNeighborsRegressor 
    knn = KNeighborsRegressor()
    knn.fit(X_train, y_train)
    y_pre_knn = knn.predict(x_test)
    # evaluate
    print("KNN")
    reg_calculate(y_true, y_pre_knn)
    
    KNN
    mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129
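
    KNN is distance based, so features on larger scales dominate the neighbor search; the Boston features are used unscaled above, which likely hurts the score. A sketch that adds standardization (n_neighbors=5 is just the default, written out explicitly):
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    knn_scaled = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
    knn_scaled.fit(X_train, y_train)
    reg_calculate(y_true, knn_scaled.predict(x_test))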
    

    SVM
    Support Vector Machine (SVM) regression
    from sklearn import svm
    regr = svm.SVR()
    regr.fit(X_train, y_train)
    y_pre_svm = regr.predict(x_test)
    print("SVM")
    reg_calculate(y_true, y_pre_svm)  # note: the original run mistakenly evaluated y_pre_knn here, so the output below repeats the KNN numbers
    
    SVM
    mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129
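
    Like KNN, SVR with the default RBF kernel is sensitive to feature scale, and its C and gamma parameters usually need tuning. A sketch combining standardization with a small, arbitrary search grid:
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([("scale", StandardScaler()), ("svr", svm.SVR())])
    param_grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.1, 0.01]}
    grid = GridSearchCV(pipe, param_grid, cv=5)  # 5-fold cross-validation over the grid
    grid.fit(X_train, y_train)
    print(grid.best_params_)
    reg_calculate(y_true, grid.predict(x_test))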
    

    DecisionTree
    Decision tree
    from sklearn.tree import DecisionTreeRegressor
    DT = DecisionTreeRegressor()
    DT.fit(X_train, y_train)
    
    y_pre_DT = DT.predict(x_test)
    print("Decision Tree")
    reg_calculate(y_true, y_pre_DT)
    
    
    Decision Tree
    mse: 26.693823529411766, rmse: 5.166606577765697, mae: 3.1813725490196085, mape: 15.562811056549139, r2: 0.710694284033029, rmsle: 0.1975892237980368
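
    An unconstrained DecisionTreeRegressor keeps splitting until it nearly memorizes the training set, which tends to overfit; limiting depth and leaf size often generalizes better. A sketch with arbitrary illustrative limits:
    DT_pruned = DecisionTreeRegressor(max_depth=6, min_samples_leaf=5)  # illustrative constraints
    DT_pruned.fit(X_train, y_train)
    reg_calculate(y_true, DT_pruned.predict(x_test))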
    

    Random forest
    from sklearn.ensemble import RandomForestRegressor
    regr = RandomForestRegressor()
    regr.fit(X_train, y_train)
    y_pre_regr = regr.predict(x_test)
    print("Decision Tree")
    reg_calculate(y_true, y_pre_regr)
    
    Random Forest
    mse: 11.207001999999997, rmse: 3.347686066524159, mae: 2.2008235294117653, mape: 10.90976874926566, r2: 0.8785393282501885, rmsle: 0.14116871196392253
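
    Beyond predictions, a fitted random forest exposes feature_importances_, which gives a quick view of which of the 13 features drive the price. A minimal sketch (n_estimators=200 is an arbitrary bump over the default):
    rf_big = RandomForestRegressor(n_estimators=200)
    rf_big.fit(X_train, y_train)
    for name, imp in sorted(zip(boston.feature_names, rf_big.feature_importances_), key=lambda t: -t[1])[:5]:
        print(name, round(imp, 3))  # the five most influential features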
    

    Bagging
    Xgboost
    Lightgbm
    Catboost
    GradientBoosting
    I'm getting hungry, so let's just write these all in one go.
    import xgboost as xg
    import lightgbm as lgm
    import catboost as cb
    from sklearn.ensemble import BaggingRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    
    CB_Regressor = cb.CatBoostRegressor(verbose=0)  # verbose=0 silences CatBoost's per-iteration training logs
    xg_Regressor = xg.XGBRegressor()
    lgm_Regressor = lgm.LGBMRegressor()
    bag_Regressor = BaggingRegressor()
    gbd_Regressor = GradientBoostingRegressor()
    
    CB_Regressor.fit(X_train, y_train)
    xg_Regressor.fit(X_train, y_train)
    lgm_Regressor.fit(X_train, y_train)
    bag_Regressor.fit(X_train, y_train)
    gbd_Regressor.fit(X_train, y_train)
    
    y_pre_CB = CB_Regressor.predict(x_test)
    y_pre_xg = xg_Regressor.predict(x_test)
    y_pre_lgm = lgm_Regressor.predict(x_test)
    y_pre_bag = bag_Regressor.predict(x_test)
    y_pre_gbd = gbd_Regressor.predict(x_test)
    
    print("CB")
    reg_calculate(y_true, y_pre_CB)
    print("XGBoost")
    reg_calculate(y_true, y_pre_xg)
    print("LGBM")
    reg_calculate(y_true, y_pre_lgm)
    print("Bagging")
    reg_calculate(y_true, y_pre_bag)
    print("GradientBoosting")
    reg_calculate(y_true, y_pre_gbd)
    
    CB
    mse: 11.791834718420176, rmse: 3.433924099105887, mae: 2.119751585666254, mape: 9.993857387303114, r2: 0.872200953826718, rmsle: 0.13299161406349444
    XGBoost
    mse: 13.447662535973214, rmse: 3.667105471072957, mae: 2.3307185677921067, mape: 10.9129019218805, r2: 0.8542552124926828, rmsle: 0.14476785768682296
    LGBM
    mse: 11.641081825640502, rmse: 3.4119029625182047, mae: 2.1377168555798383, mape: 10.464024191664098, r2: 0.8738348027030942, rmsle: 0.1396198526477457
    Bagging
    mse: 12.581160784313727, rmse: 3.546993203308082, mae: 2.420392156862745, mape: 11.830690090356958, r2: 0.8636462953914766, rmsle: 0.15213713275055957
    GradientBoosting
    mse: 8.489139804224163, rmse: 2.9136128439146067, mae: 2.287601690996847, mape: 11.586313247266613, r2: 0.907995320853951, rmsle: 0.1372025274842797
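
    Since every model above goes through the same fit/predict/evaluate steps, the whole block can be collapsed into a dictionary-driven loop, which also makes it trivial to add or drop models; a sketch of the same comparison:
    models = {
        "CB": cb.CatBoostRegressor(verbose=0),
        "XGBoost": xg.XGBRegressor(),
        "LGBM": lgm.LGBMRegressor(),
        "Bagging": BaggingRegressor(),
        "GradientBoosting": GradientBoostingRegressor(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name)
        reg_calculate(y_true, model.predict(x_test))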
    

    Reference
    Scikit-learn documentation, random forest reference