Data Preprocessing -- Feature Scaling


1. class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True) scales each feature to a given range. This estimator scales and translates each feature individually so that its values fall within the given range, e.g. between 0 and 1.
Parameters of the MinMaxScaler class:
feature_range : tuple (min, max), default=(0, 1) -- Desired range of the transformed data.
copy : boolean, optional, default True -- Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
Attributes of the MinMaxScaler class:
data_min_ : ndarray, shape (n_features,) -- Per feature minimum seen in the data.
data_max_ : ndarray, shape (n_features,) -- Per feature maximum seen in the data.
data_range_ : ndarray, shape (n_features,) -- Per feature range (data_max_ - data_min_) seen in the data.
from sklearn.preprocessing import MinMaxScaler

data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]
scaler = MinMaxScaler()
print(scaler.fit(data))
print('----------------')
# Per-feature maximum seen in the fitted data
print(scaler.data_max_)
print('----------------')
# Scale the fitted data into the [0, 1] range
print(scaler.transform(data))
print('----------------')
# Transform a new sample [2, 2, 2] using the min/max learned during fit
print(scaler.transform([[2, 2, 2]]))

Output:
MinMaxScaler(copy=True, feature_range=(0, 1))
----------------
[18.  7. 10.]
----------------
[[0.22727273 0.625      0.        ]
 [1.         0.         0.        ]
 [0.36363636 1.         0.75      ]
 [0.         0.375      1.        ]]
----------------
[[0.27272727 0.375      0.        ]]
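As a sanity check (this example is an addition, not part of the original post), the scaled matrix above can be reproduced directly from the documented min-max formula X_std = (X - X_min) / (X_max - X_min):

```python
import numpy as np

data = np.array([[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]], dtype=float)

# Per-feature minimum and maximum -- the same values MinMaxScaler
# stores in data_min_ and data_max_ after fit()
data_min = data.min(axis=0)
data_max = data.max(axis=0)

# Min-max formula for the default feature_range=(0, 1)
scaled = (data - data_min) / (data_max - data_min)
print(scaled)  # matches scaler.transform(data) above
```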

Official scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler
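The feature_range parameter need not be (0, 1). As an illustrative sketch (not part of the original example), the same data can be mapped into (-1, 1):

```python
from sklearn.preprocessing import MinMaxScaler

data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]

# Map each feature into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print(scaled)
```

Each value is the (0, 1)-scaled value stretched by the new range: X_scaled = X_std * (max - min) + min.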
2. class sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True) standardizes features by removing the mean and scaling to unit variance; in other words, it transforms the distribution of each feature toward a standard normal distribution.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples (or zero if with_mean=False) and s is the standard deviation of the training samples (or one if with_std=False).
Parameters:
with_mean : boolean, True by default
If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently, unit standard deviation).
copy : boolean, optional, default True
If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
Attributes:
n_samples_seen_ : int -- The number of samples processed by the estimator. Reset on new calls to fit, but incremented across partial_fit calls.
mean_ : array of floats with shape [n_features] -- The mean value for each feature in the training set.
var_ : array of floats with shape [n_features] -- The variance for each feature in the training set. Used to compute scale_.
scale_ : ndarray, shape (n_features,) -- Per feature relative scaling of the data, i.e. the standard deviation.
from sklearn.preprocessing import StandardScaler
import numpy as np

x=np.arange(10).reshape(5,2)
ss=StandardScaler()
ss.fit(x) 
print(x)
print('----------------------')
print(ss.n_samples_seen_)
print('----------------------')
print(ss.mean_)   # per-feature mean of the fitted data
print('----------------------')
print(ss.var_)    # per-feature variance of the fitted data
print('----------------------')
print(ss.scale_)  # per-feature standard deviation used for scaling
x = ss.fit_transform(x)  # equivalent to fit(x) followed by transform(x)
print(x)

Output:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
----------------------
5
----------------------
[4. 5.]
----------------------
[8. 8.]
----------------------
[2.82842712 2.82842712]
[[-1.41421356 -1.41421356]
 [-0.70710678 -0.70710678]
 [ 0.          0.        ]
 [ 0.70710678  0.70710678]
 [ 1.41421356  1.41421356]]
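Again as a sanity check (an addition to the original post), the standardized matrix can be reproduced with the formula z = (x - u) / s, using the population standard deviation (ddof=0), which is what scikit-learn uses:

```python
import numpy as np

x = np.arange(10).reshape(5, 2).astype(float)

# Per-feature mean and (population) standard deviation -- the values
# StandardScaler stores in mean_ and scale_ after fit()
u = x.mean(axis=0)
s = x.std(axis=0)  # ddof=0, matching StandardScaler

z = (x - u) / s
print(z)  # matches ss.fit_transform(x) above
```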


Official scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
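Since n_samples_seen_ is incremented across partial_fit calls but reset by fit, incremental fitting can be sketched as follows (this example is an addition, not part of the original post):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

# Feed the data in two batches; the running statistics are updated
ss.partial_fit(np.arange(6).reshape(3, 2))
n_after_first = ss.n_samples_seen_   # 3 samples seen so far
ss.partial_fit(np.arange(6, 10).reshape(2, 2))
n_after_second = ss.n_samples_seen_  # accumulates to 5

# A new call to fit() resets the counter and refits from scratch
ss.fit(np.arange(4).reshape(2, 2))
n_after_fit = ss.n_samples_seen_     # reset to 2

print(n_after_first, n_after_second, n_after_fit)
```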