Random Forest


First, we need to understand the Decision Tree, because decision trees are the building blocks of a Random Forest. A Decision Tree is a non-linear model.

CART algorithm

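A Decision Tree is built with the CART algorithm. In brief, CART picks as the root node the explanatory variable (feature) that best separates the target variable, where "best" is measured with the Gini index. For example, suppose the target takes two values, X and Y, and the features A, B and C each take two values, 1 and 2. For each feature, compute the Gini impurity of the group of data under each of its values, and adopt as the root node the feature whose average impurity, weighted by group size, is the lowest. Gini impurity is the probability that an item picked at random from a group of data would receive the wrong label if it were labeled at random according to the distribution of labels in that group; a Gini impurity of 0 means the group is perfectly classified. The same procedure is then repeated on each resulting group to decide the subsequent nodes.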
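The root-node selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a full CART implementation: the function names and the six-row toy data set are assumptions made up for this example, and only the choice of the first split is shown. Gini impurity is computed as 1 minus the sum of squared class proportions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini_after_split(rows, feature, target):
    """Size-weighted average impurity of the groups created by splitting on `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def best_root_feature(rows, features, target):
    """Pick the feature with the lowest weighted Gini impurity."""
    return min(features, key=lambda f: weighted_gini_after_split(rows, f, target))


# Hypothetical toy data: target is X or Y, features A, B, C take the values 1 or 2.
rows = [
    {"A": 1, "B": 1, "C": 1, "label": "X"},
    {"A": 1, "B": 2, "C": 2, "label": "X"},
    {"A": 1, "B": 1, "C": 2, "label": "X"},
    {"A": 2, "B": 2, "C": 1, "label": "Y"},
    {"A": 2, "B": 1, "C": 1, "label": "Y"},
    {"A": 2, "B": 2, "C": 2, "label": "Y"},
]

root = best_root_feature(rows, ["A", "B", "C"], "label")
print(root)
```

In this toy data, feature A separates X from Y perfectly (weighted impurity 0), while B and C leave mixed groups, so A is chosen as the root.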

Random Forest

Introduction

A Random Forest is a collection of Decision Trees. The reason for collecting them is that a single Decision Tree tends to overfit the training data, and it also reacts to noise. Combining many trees into an ensemble model, the Random Forest, is therefore a better choice than using a single Decision Tree.
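As a rough sketch of what "combining" means for classification, the trees' individual predictions can be aggregated by majority vote. The three "trees" below are hypothetical stand-ins (simple one-rule functions invented for this example), not trained models, and `forest_predict` is an illustrative name rather than any library's API.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a prediction and combine them by majority vote."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical stand-ins for trained trees: each one looks at a single feature.
trees = [
    lambda s: "X" if s["A"] == 1 else "Y",
    lambda s: "X" if s["B"] == 1 else "Y",
    lambda s: "X" if s["C"] == 1 else "Y",
]

sample = {"A": 1, "B": 2, "C": 1}
print(forest_predict(trees, sample))
```

Here two of the three stand-in trees vote X and one votes Y, so the ensemble answers X; a single tree that is wrong on a sample can be outvoted by the others.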

The random forest is a model made up of many decision trees. Rather than simply averaging the predictions of the trees, this model uses two key concepts that give it the name random:
1. Random sampling of training data points when building trees.
2. Random subsets of features considered when splitting nodes.

1. Random sampling of training data points when building trees.

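When training, each tree in a random forest learns from a random sample of the data points. The samples are drawn with a technique called bootstrapping, that is, sampling with replacement, so the same data point can appear more than once in the sample given to one tree.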
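A minimal sketch of bootstrapping, assuming the training set is a plain Python list: each tree gets a sample of the same size as the original data, drawn with replacement. The function name `bootstrap_sample`, the toy data, and the fixed seed are assumptions for illustration only.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


# Hypothetical training set of ten data points, identified by index.
data = list(range(10))
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# One bootstrap sample per tree; here, three trees.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))
```

Because the draw is with replacement, a sample typically contains some data points several times and leaves others out, so each tree sees a slightly different version of the training data.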

2. Random subsets of features considered when splitting nodes

Only a subset of all the features is considered for splitting each node. For example, if there are 16 features, only 4 of them are considered when splitting a node.
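The 16-feature example can be sketched as follows. The feature names and the helper `random_feature_subset` are hypothetical; the point is only that a fresh random subset is drawn each time a node is split. The value 4 matches the example above and happens to equal the square root of 16, a commonly used heuristic for classification.

```python
import random


def random_feature_subset(features, k, rng):
    """Choose k distinct features at random as split candidates for one node."""
    return rng.sample(features, k)


# Hypothetical feature names for a data set with 16 features.
features = [f"feature_{i}" for i in range(16)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# A new subset is drawn for every node that gets split.
candidates_node_1 = random_feature_subset(features, 4, rng)
candidates_node_2 = random_feature_subset(features, 4, rng)
print(candidates_node_1)
print(candidates_node_2)
```

Only the features in the drawn subset would then be compared (for instance by Gini impurity) to choose the split for that node.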

References:
On the CART algorithm:
https://www.gixo.jp/blog/3980/

On Gini impurity:
https://victorzhou.com/blog/gini-impurity/

On Random Forest:
https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76

Author And Source

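More information on this topic (Random Forest) can be found here: https://qiita.com/kkawa031/items/a5c2c277cc2b6d6ecf65

Author attribution: information about the original author is available at the original URL. Copyright belongs to the original author.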
