Machine Learning Framework xr-learn: decisionTree (Decision Tree)


Decision Tree
The main decision tree algorithms are ID3, C4.5, and CART; each of them is briefly introduced below.
ID3 Algorithm
The ID3 algorithm was invented by Ross Quinlan and is used to generate a decision tree from a dataset. It is the predecessor of C4.5 and is typically used in machine learning and natural language processing.
ID3 starts from the original set S as the root node. On each iteration, the algorithm goes through every unused feature of the set S, computes the entropy (or information gain) of that feature, and selects the feature with the smallest entropy (largest information gain). The set S is then split by the selected feature into subsets, and the algorithm recurses on each subset, considering only the features that have never been selected before.
Recursion on a subset stops in one of the following cases:
    1. Every element of the subset belongs to the same class (+ or -); the node becomes a leaf labelled with that class.
    2. There are no more features to select, but the examples still do not all belong to the same class; the node becomes a leaf labelled with the most common class of the subset.
    3. The subset is empty, which happens when no example in the parent set matches a particular value of the selected feature; a leaf is created and labelled with the most common class of the parent set.
In summary, the ID3 algorithm:
    1. Computes the entropy (information gain) of every attribute of the dataset,
    2. Selects the attribute with the smallest entropy (largest information gain) and splits the set S into K subsets, where K is the number of distinct values of that attribute,
    3. Creates a decision-tree node containing that attribute,
    4. Recurses on each subset with the remaining attributes.
Pseudocode:
    ID3(Examples, Target_Attribute, Attributes)
        create a root node Root for the tree
        if all examples are positive, return the single-node tree Root, with label = '+' (all samples are positive)
        if all examples are negative, return the single-node tree Root, with label = '-' (all samples are negative)
        if the set of predicting attributes is empty, then return the single-node tree Root, with label = most common value of the target attribute in the examples (no attributes left, use the majority label)
        Otherwise Begin:
            A <- the attribute that best classifies the examples (largest information gain); set it as the decision attribute for Root
            for each possible value v of A
                add a new branch below Root, corresponding to the test A = v
                let Examples_v be the subset of examples that have value v for A
                if Examples_v is empty, then below this branch add a leaf node with label = most common target value in the examples
                else below this branch add the subtree ID3(Examples_v, Target_Attribute, Attributes - {A})
        return Root
H(S) = -\sum_{x \in X} p(x) \log_2 p(x)
    where S is the current dataset, X is the set of classes in S, x is a class, and p(x) is the proportion of the elements of S that belong to class x. H(S) measures the uncertainty (impurity) of S. In ID3, the entropy of every remaining attribute is computed on each iteration, and the attribute with the smallest entropy (equivalently, the largest information gain) is used to split the current set.
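A minimal sketch of this computation in plain NumPy (independent of xr-learn; the label array is the y from the Example section below):

import numpy as np

def entropy(labels):
    # H(S) = -sum over classes x of p(x) * log2(p(x))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([1, 1, 0, 0, 0])))  # ~0.971 bits for 2 positive / 3 negative samples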

ID3 uses information gain as its splitting criterion:
The information gain IG(A) is the entropy H(S) of the set S minus the weighted sum of the entropies of the subsets obtained by splitting S on attribute A. In other words, it is how much the uncertainty about S is reduced once the value of attribute A is known.
IG(A) = H(S) - \sum_{t \in T} p(t) H(t)
H(S): the entropy of the set S.
T: the subsets created by splitting S on attribute A.
p(t): the proportion of the number of elements of t to the number of elements of S.
H(t): the entropy of the subset t.

In ID3, the information gain of every remaining attribute is computed on each iteration, and the attribute with the largest information gain is chosen to split the set.
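Putting the pieces together, here is a small runnable sketch of the ID3 procedure described above (plain NumPy, building the tree as nested dicts). The function names and the dict representation are illustrative choices, not xr-learn's implementation; the data are the five samples used in the Example section below.

import numpy as np
from collections import Counter

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(column, labels):
    # IG(A) = H(S) - sum_t p(t) * H(t)
    values, counts = np.unique(column, return_counts=True)
    weighted = sum((c / len(labels)) * entropy(labels[column == v])
                   for v, c in zip(values, counts))
    return entropy(labels) - weighted

def id3(X, y, attributes):
    # Base cases 1/2: all examples share one label -> leaf with that label.
    if len(set(y)) == 1:
        return int(y[0])
    # Base case 3: no attributes left -> leaf with the most common label.
    if not attributes:
        return int(Counter(y).most_common(1)[0][0])
    # Choose the attribute with the largest information gain and branch on its values.
    best = max(attributes, key=lambda a: information_gain(X[:, a], y))
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in np.unique(X[:, best]):
        mask = X[:, best] == value
        # Only observed values are iterated here, so the subsets are never empty;
        # with a fixed global value set, empty branches would get the parent's majority label.
        tree[best][int(value)] = id3(X[mask], y[mask], rest)
    return tree

X = np.array([[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]])
y = np.array([1, 1, 0, 0, 0])
print(id3(X, y, [0, 1]))  # {0: {0: 0, 1: {1: {0: 0, 1: 1}}}}
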
C4.5 Algorithm
C4.5 is an improvement of ID3: it uses the information gain ratio rather than the raw information gain as the splitting criterion, which reduces the bias towards attributes with many values. The gain ratio is the information gain divided by the split information of the attribute (the entropy of the attribute's own value distribution):
IGRate(A) = IG(A) / H_A(S),  where H_A(S) = -\sum_{t \in T} p(t) \log_2 p(t)
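A minimal sketch of the gain-ratio computation under that definition (plain NumPy; the function name and the guard for single-valued attributes are illustrative assumptions, not xr-learn's API):

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(column, labels):
    values, counts = np.unique(column, return_counts=True)
    p = counts / len(labels)                                   # p(t) for each subset t
    info_gain = entropy(labels) - sum(
        pi * entropy(labels[column == v]) for v, pi in zip(values, p))
    split_info = -np.sum(p * np.log2(p))                       # H_A(S), the attribute's own entropy
    return info_gain / split_info if split_info > 0 else 0.0   # guard: attribute with a single value
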
CART Algorithm
CART uses the Gini index as its splitting criterion. In a classification problem with K classes, where p_k is the probability that a sample belongs to class k, the Gini index of the probability distribution is defined as:
Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2
Here p_k is the proportion of class k. For a given sample set D, the Gini index is:
Gini(D) = 1 - \sum_{k=1}^{K} (|C_k| / |D|)^2
C_k: the subset of samples in D that belong to class k. K: the number of classes.
If feature A splits the set D into two parts D_1 and D_2, the Gini index of D under the condition of feature A is defined as:
Gini(D, A) = (|D_1| / |D|) Gini(D_1) + (|D_2| / |D|) Gini(D_2)
The larger the Gini index, the greater the uncertainty of the samples, so a smaller Gini index is better.
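A minimal sketch of these two quantities (plain NumPy; the binary split on a single feature value and the function names are illustrative, not xr-learn's API; the data are again the samples from the Example section):

import numpy as np

def gini(labels):
    # Gini(D) = 1 - sum_k (|C_k| / |D|)^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_index(column, labels, split_value):
    # Gini(D, A): weighted Gini of D1 (A == split_value) and D2 (the rest)
    mask = column == split_value
    d1, d2 = labels[mask], labels[~mask]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

X = np.array([[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]])
y = np.array([1, 1, 0, 0, 0])
print(gini_index(X[:, 0], y, 1), gini_index(X[:, 1], y, 1))  # ~0.267 vs 0.4; CART prefers the smaller one
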
Example
import tree  # the tree module from xr-learn (repository linked at the end)
import numpy as np
from imp import reload
import pandas as pd
# x and y can be a Python list or an np.ndarray
x = [[1,1],[1,1],[1,0],[0,1],[0,1]]
y = [1,1,0,0,0]
decisionTree = tree.DecisionTreeClassifier()
decisionTree.fit(x,y)
# Predict the labels of new samples
print(decisionTree.predict([[1,1],[1,0],[0,1]]))
# Score (accuracy) on labelled samples
decisionTree.score([[1,1],[1,0],[0,1]],[1,1,0])

# The input can also be a pandas DataFrame
df_x = pd.DataFrame(x,columns = ['no_surfacing','flippers'])
df_y = pd.DataFrame(y,columns = ['label'])
decisionTree.fit(df_x,df_y)
# Predict the labels of new samples
print(decisionTree.predict([[1,1],[1,0],[0,1]]))
# Score (accuracy) on labelled samples
decisionTree.score([[1,1],[1,0],[0,1]],[1,1,0])

Source code: https://github.com/xiaorancs/xr-learn