Machine learning framework xr-learn: decisionTree (Decision Tree)
Decision Tree
Commonly used decision tree algorithms include ID3, C4.5, and CART. Each is described in turn below.
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) was invented by Ross Quinlan; C4.5 is its successor and improves on it.
ID3 builds the tree top-down, starting from the original training set S as the root node. On each iteration it goes through every unused attribute (feature), computes the entropy (or, equivalently, the information gain) of S with respect to that attribute, selects the attribute with the smallest entropy (largest information gain), splits S into subsets by the values of that attribute, and then recurses on each subset with the remaining attributes.
Recursion on a subset stops in one of the following cases:
1. Every element of the subset belongs to the same class (+ or -); the node becomes a leaf labeled with that class.
2. There are no more attributes to split on, but the examples still do not all belong to the same class; the node becomes a leaf labeled with the most common class of the examples in the subset.
3. The subset is empty, i.e. no example matches a particular value of the chosen attribute; a leaf is added under the parent node, labeled with the most common class of the parent's examples.
In summary, ID3 proceeds as follows:
1. Compute the entropy of the data set S with respect to every remaining attribute.
2. Split S on the attribute for which the entropy after splitting is minimal (equivalently, the information gain is maximal); this partitions S into K subsets, where K is the number of distinct values of that attribute.
3. Create a decision tree node for that attribute.
4. Recurse on each subset with the remaining attributes.
Pseudocode:
ID3(Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all examples are positive, return the single-node tree Root with label = '+'
    If all examples are negative, return the single-node tree Root with label = '-'
    If the set of predicting attributes is empty, return the single-node tree Root
        with label = the most common value of the target attribute in the examples
    Otherwise Begin:
        A <- the attribute that best classifies the examples (largest information gain)
        For each possible value v of A, add a branch below Root for the test A = v and
            attach the subtree ID3(Examples with A = v, Target_Attribute, Attributes - {A}),
            or a leaf labeled with the most common target value if that subset is empty
    Return Root
Entropy is defined as
$H(S) = -\sum_{x \in X} p(x) \log_2 p(x)$
where S is the current data set, X is the set of classes in S, and p(x) is the proportion of elements of S that belong to class x. H(S) = 0 when every element of S belongs to the same class, i.e. when S is perfectly classified. In ID3, the entropy is computed for each remaining attribute, and the attribute that yields the smallest entropy after splitting (the largest information gain) is used on this iteration.
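As a quick illustration, the following is a minimal plain-Python sketch (not part of the xr-learn API; the helper name entropy is just for illustration) that computes this entropy for a list of class labels:
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy H(S) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# H(S) for the toy labels used in the example at the end of this post
print(entropy([1, 1, 0, 0, 0]))  # ≈ 0.971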
Information gain in ID3 is defined as follows:
The information gain IG(A) of an attribute A is the reduction in entropy obtained by splitting the set S on A, i.e. the entropy H(S) of S minus the weighted sum of the entropies of the subsets produced by splitting on A:
$IG(A) = H(S) - \sum_{t \in T} p(t) H(t)$
H(S): entropy of the set S.
T: the subsets created by splitting S on attribute A.
p(t): the proportion of elements of S that fall into subset t.
H(t): entropy of subset t.
In ID3, the information gain can be computed for each remaining attribute, and the attribute with the largest information gain is used to split S on this iteration.
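A minimal sketch of this computation in plain Python (again not the xr-learn API), reusing the entropy helper above; a column of the toy data set from the example below stands in for attribute A:
def information_gain(feature_values, labels):
    # IG(A) = H(S) - sum over values v of p(v) * H(S_v)
    n = len(labels)
    weighted = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        weighted += (len(subset) / n) * entropy(subset)
    return entropy(labels) - weighted

# gain of the first feature ('no_surfacing') on the toy data from the example below
x = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 0, 0]
print(information_gain([row[0] for row in x], y))  # ≈ 0.420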
C4.5 Algorithm
C4.5 is an extension of ID3. Instead of the raw information gain, it selects the splitting attribute with the information gain ratio, defined here as the information gain divided by the entropy:
$IG_{ratio}(A) = \frac{IG(A)}{H(S)}$
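With the helpers above, the gain ratio as defined in this post can be sketched as follows (an illustration only, not the xr-learn API):
def gain_ratio(feature_values, labels):
    # gain ratio as defined above: IG(A) / H(S)
    h_s = entropy(labels)
    return 0.0 if h_s == 0 else information_gain(feature_values, labels) / h_s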
CART Algorithm
CART selects the splitting feature with the Gini index. For a classification problem with K classes, let $p_k$ be the probability that a sample belongs to class k. The Gini index of the probability distribution is
$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$
For a given sample set D, the Gini index is
$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$
where $C_k$ is the subset of samples of D that belong to class k and K is the number of classes.
If feature A splits D into two parts $D_1$ and $D_2$, the Gini index of the set D under the condition of feature A is defined as
$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2)$
The larger the Gini index, the greater the uncertainty of the samples; therefore, the smaller the Gini index, the better.
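These two quantities can be sketched in plain Python as follows (illustration only, not the xr-learn API):
from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum_k (|C_k| / |D|)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(feature_values, labels, value):
    # Gini(D, A): weighted Gini of D1 (A == value) and D2 (A != value)
    d1 = [y for x, y in zip(feature_values, labels) if x == value]
    d2 = [y for x, y in zip(feature_values, labels) if x != value]
    n = len(labels)
    return (len(d1) / n) * gini(d1) + (len(d2) / n) * gini(d2)

# Gini of the toy labels and of the split on the first feature (1 vs 0)
y = [1, 1, 0, 0, 0]
print(gini(y))                            # 0.48
print(gini_split([1, 1, 1, 0, 0], y, 1))  # ≈ 0.267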
Example
import tree           # decision tree module from the xr-learn project
import numpy as np
import pandas as pd

# x and y may be a Python list or an np.ndarray
x = [[1,1],[1,1],[1,0],[0,1],[0,1]]
y = [1,1,0,0,0]
decisionTree = tree.DecisionTreeClassifier()
decisionTree.fit(x,y)
# predict the labels of new samples
print(decisionTree.predict([[1,1],[1,0],[0,1]]))
# accuracy on the given samples
decisionTree.score([[1,1],[1,0],[0,1]],[1,1,0])

# the same works with a pandas DataFrame
df_x = pd.DataFrame(x,columns = ['no_surfacing','flippers'])
df_y = pd.DataFrame(y,columns = ['label'])
decisionTree.fit(df_x,df_y)
# predict
print(decisionTree.predict([[1,1],[1,0],[0,1]]))
# accuracy
decisionTree.score([[1,1],[1,0],[0,1]],[1,1,0])
Source code: https://github.com/xiaorancs/xr-learn