Handling Imbalanced Data in Random Forests
Set the class_weight parameter to "balanced"
# Load libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
# Load the iris data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Make the classes highly imbalanced by removing the first 40 observations
features = features[40:, :]
target = target[40:]
# Create a binary target vector: 0 for class 0, 1 for every other class
target = np.where((target == 0), 0, 1)
# Create a random forest classifier with balanced class weights
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
# Train the model
model = randomforest.fit(features, target)
Discussion
A useful argument is class_weight="balanced", wherein classes are automatically weighted inversely proportional to how frequently they appear in the data:
$$w_j = \frac{n}{k n_j}$$
where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
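To make the formula concrete, here is a minimal pure-Python sketch that applies it to the class distribution used in this recipe. The iris data has 50 observations per class, so dropping the first 40 leaves 10 observations of class 0 and 100 of the merged class 1. The helper name balanced_class_weights is hypothetical and is not part of scikit-learn; it only mirrors the formula above.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: return {class: n / (k * n_j)} for the given labels."""
    counts = Counter(labels)  # n_j: number of observations in each class j
    n = len(labels)           # n: total number of observations
    k = len(counts)           # k: number of distinct classes
    return {cls: n / (k * n_j) for cls, n_j in counts.items()}


# Same class distribution as the recipe's target vector:
# 10 observations of class 0 and 100 observations of class 1.
labels = [0] * 10 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)
```

With these counts the minority class gets a weight of 110 / (2 * 10) = 5.5 and the majority class gets 110 / (2 * 100) = 0.55, so each class contributes the same total weight (55) during training.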