Handling Imbalanced Data with Random Forests

Set the class_weight parameter to "balanced"

# Load libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel

iris = datasets.load_iris()
features = iris.data
target = iris.target
# Make the classes highly imbalanced by removing the first 40 observations
features = features[40:, :]
target = target[40:]
# Create a binary target vector: 0 if class 0, otherwise 1
target = np.where((target == 0), 0, 1)
# Create a random forest classifier with class_weight="balanced"
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
# Train the model
model = randomforest.fit(features, target)
Discussion
A useful argument value is "balanced", wherein classes are automatically weighted inversely proportional to how frequently they appear in the data:
$$w_j = \frac{n}{k n_j}$$

where $w_j$ is the weight of class j, n is the number of observations, $n_j$ is the number of observations in class j, and k is the total number of classes.
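
To make the formula concrete, here is a minimal sketch that applies it to the class sizes produced by the code above (10 observations of class 0 and 100 of class 1, once the first 40 rows of iris are dropped). The helper name balanced_class_weights is hypothetical and written only for illustration. It is not part of scikit-learn and only mirrors the formula as stated.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n / (k * n_j)} for a sequence of class labels.

    Hypothetical helper for illustration; it mirrors the formula in the
    text and is not scikit-learn's implementation.
    """
    counts = Counter(labels)   # n_j for each class j
    n = len(labels)            # total number of observations
    k = len(counts)            # number of distinct classes
    return {cls: n / (k * n_j) for cls, n_j in counts.items()}


# Same class sizes as the example: 10 of class 0, 100 of class 1
labels = [0] * 10 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)  # {0: 5.5, 1: 0.55}
```

The minority class gets a weight ten times larger than the majority class, which matches the 10:1 imbalance in the data.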
