Handling Imbalanced Classes in Random Forests




Set the class_weight parameter to "balanced"
# Load libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Make the classes highly imbalanced by removing the first 40 observations
features = features[40:, :]
target = target[40:]

# Create a binary target: 0 if the observation is class 0, otherwise 1
target = np.where((target == 0), 0, 1)

# Create a random forest classifier object with class_weight="balanced"
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")

# Train the model
model = randomforest.fit(features, target)
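As a quick check of the preparation steps above, the following sketch rebuilds the same imbalanced data, counts the observations in each class with `np.bincount`, and then uses the fitted model on a single observation. It is an illustration rather than part of the original recipe; the names `class_counts` and `first_observation` are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Rebuild the imbalanced binary problem: drop the first 40 rows, then
# label class 0 as 0 and every other class as 1
iris = datasets.load_iris()
features = iris.data[40:, :]
target = np.where(iris.target[40:] == 0, 0, 1)

# Count observations per class (index 0 -> class 0, index 1 -> class 1);
# expected: 10 observations of class 0 and 100 of class 1
class_counts = np.bincount(target)
print(class_counts)

# Train a random forest that reweights classes inversely to their frequency
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
model = randomforest.fit(features, target)

# Predict the class and class probabilities of the first observation
first_observation = features[:1]
print(model.predict(first_observation))
print(model.predict_proba(first_observation))
```

The minority class has one tenth as many observations as the majority class, which is exactly the situation class_weight="balanced" is meant to address.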
Discussion
A useful setting is class_weight="balanced", which automatically weights classes inversely proportional to how frequently they appear in the data:
$$w_j = \frac{n}{k\,n_j}$$
where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
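The formula can be checked numerically on the imbalanced iris target from the recipe, where $n = 110$, $k = 2$, $n_0 = 10$, and $n_1 = 100$, so $w_0 = 110 / (2 \cdot 10) = 5.5$ and $w_1 = 110 / (2 \cdot 100) = 0.55$. The sketch below computes the weights by hand and compares them with scikit-learn's `compute_class_weight` helper from `sklearn.utils.class_weight`, which scikit-learn documents as using `n_samples / (n_classes * np.bincount(y))` for the "balanced" mode. It assumes the class labels are the integers 0 to k-1, so that `np.bincount` indexes line up with the classes.

```python
import numpy as np
from sklearn import datasets
from sklearn.utils.class_weight import compute_class_weight

# Same imbalanced binary target as in the recipe
iris = datasets.load_iris()
target = np.where(iris.target[40:] == 0, 0, 1)

classes = np.unique(target)   # the k distinct class labels (here 0 and 1)
n = target.shape[0]           # total number of observations
k = classes.shape[0]          # total number of classes
n_j = np.bincount(target)     # observations per class; assumes labels are 0..k-1

# w_j = n / (k * n_j), evaluated for every class at once
manual_weights = n / (k * n_j)

# Weights scikit-learn derives for class_weight="balanced"
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=target)

print(manual_weights)   # w_0 = 5.5, w_1 = 0.55
print(sklearn_weights)
```

The minority class (class 0) receives ten times the weight of the majority class, mirroring the ten-to-one ratio in the class counts.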