The score function of scikit-learn's linear regression model returns the coefficient of determination R^2

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score

The score function of the linear regression model returns the coefficient of determination R^2 computed for its predictions.

Source of the score function of LinearRegression:

def score(self, X, y, sample_weight=None):
        """Returns the coefficient of determination R^2 of the prediction.
        The coefficient R^2 is defined as (1 - u/v), where u is the residual
        sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
        sum of squares ((y_true - y_true.mean()) ** 2).sum().
        The best possible score is 1.0 and it can be negative (because the
        model can be arbitrarily worse). A constant model that always
        predicts the expected value of y, disregarding the input features,
        would get a R^2 score of 0.0.
        Parameters
        ----------
        X : array-like, shape = (n_samples, n_features)
            Test samples.
        y : array-like, shape = (n_samples) or (n_samples, n_outputs)
            True values for X.
        sample_weight : array-like, shape = [n_samples], optional
            Sample weights.
        Returns
        -------
        score : float
            R^2 of self.predict(X) wrt. y.
        """

        from .metrics import r2_score
        return r2_score(y, self.predict(X), sample_weight=sample_weight,
                        multioutput='variance_weighted')
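
To see this in practice, the short sketch below fits a LinearRegression on a small made-up data set and compares `model.score(X, y)` with `r2_score(y, model.predict(X))`. The data values and variable names are illustrative assumptions, not taken from the article. For a single-output target the two numbers should agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Small made-up data set (illustrative values, not from the article).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(X, y)

# R^2 as returned by the estimator's score method.
score_value = model.score(X, y)

# R^2 computed explicitly from the model's predictions.
r2_value = r2_score(y, model.predict(X))

print(score_value, r2_value)
```

Here the score is evaluated on the training data only to keep the sketch short. In practice it is usually evaluated on held-out test samples.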

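Coefficient of determination R^2
The coefficient of determination is also called the goodness of fit in some textbooks.
It states what proportion of the variation in the dependent variable y can be described by the variation in x. In other words, it is the share of the variability of Y that is explained by the controlled independent variable X.
The higher the goodness of fit, the better x explains y. The better the independent variable explains the dependent variable, the larger the share of the total variation that is attributable to it, and the more closely the observations cluster around the regression line.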

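After running a linear regression on the data we obtain the coefficients of the fitted function, but how do we know how well the fitted equation actually explains the result?
To judge how well the regression equation fits, we use the coefficient of determination.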

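$R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}$

where $SS_{res}$ is the residual sum of squares, that is, the sum of squared differences between the actual values and the values predicted by the regression.

$SS_{tot}$ is the total sum of squares, that is, the sum of squared differences between the actual values and their mean.

$SS_{res}$ is usually smaller than $SS_{tot}$, so the result usually lies between 0 and 1. $SS_{tot}$ is fixed once the data are fixed, so the less accurate the estimate, the larger $SS_{res}$ becomes and the closer $R^{2}$ gets to 0 (it becomes negative when $SS_{res}$ exceeds $SS_{tot}$). The more accurate the estimate, the closer $R^{2}$ gets to 1.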
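
The definition quoted in the score docstring, 1 - u/v with u the residual sum of squares and v the total sum of squares, can be checked by hand. The sketch below uses plain Python; the helper name `r2_from_definition` and the sample values are hypothetical and chosen only for illustration. It covers the three cases the docstring mentions: a perfect prediction, a constant model that always predicts the mean, and a model that is worse than the mean.

```python
def r2_from_definition(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot from plain Python lists."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: actual values vs. predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: actual values vs. their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
mean_y = sum(y_true) / len(y_true)

perfect = r2_from_definition(y_true, y_true)                     # 1.0
constant = r2_from_definition(y_true, [mean_y] * len(y_true))    # 0.0
reversed_pred = r2_from_definition(y_true, [4.0, 3.0, 2.0, 1.0]) # -3.0

print(perfect, constant, reversed_pred)
```

The last case shows why the score is not bounded below by 0: when the predictions are further from the actual values than the mean is, the residual sum of squares exceeds the total sum of squares.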

References:
https://blog.csdn.net/grape875499765/article/details/78631435?locationNum=11&fps=1
https://blog.csdn.net/snowdroptulip/article/details/79022532
