PDP/SHAP


Partial Dependence Plot (PDP)

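A PDP shows how a feature of interest affects the target.
It displays the relationship between each feature and the target.
Complex models: hard to interpret, but they perform well.
Simple models: easy to interpret, but their performance falls short.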

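Random forests and boosting models provide feature importance values, a commonly used piece of information that indicates which features matter for the model's performance.
However, feature importances say nothing about how the target value rises or falls as a feature's value changes.
→ For tree models, a partial dependence plot can show the relationship between an individual feature and the target.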
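
To make the idea concrete, here is a minimal sketch of how a one-feature partial dependence curve can be computed by hand: force the feature of interest to each grid value for every row, predict, and average. The model `toy_model`, the helper `partial_dependence`, and the data are hypothetical stand-ins for illustration only; this is not how PDPbox is implemented internally.

```python
from statistics import mean

def toy_model(row):
    """Hypothetical stand-in for a fitted model: returns a score for one row."""
    age, income = row
    return 0.5 * (age > 40) + 0.01 * income

def partial_dependence(predict, rows, feature_idx, grid):
    """Average prediction when feature `feature_idx` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_idx] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve

rows = [(25, 30), (35, 50), (45, 40), (55, 80)]  # made-up (age, income) data
grid = [30, 40, 50]
pd_curve = partial_dependence(toy_model, rows, feature_idx=0, grid=grid)
print(list(zip(grid, pd_curve)))
```

Unlike a feature importance score, the resulting curve shows direction: in this toy example the averaged prediction is flat up to age 40 and then steps up.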

1. Visualizing a PDP (one feature)

!pip install PDPbox
# Image resolution
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 144

# Note: pdp_isolate / pdp_plot belong to the older PDPbox 0.2.x API
from pdpbox.pdp import pdp_isolate, pdp_plot

# Separate the encoder from the classifier
# (pipe is the sklearn Pipeline defined earlier)
encoder = pipe.named_steps['preprocessing']
X_train_encoded = encoder.fit_transform(X_train)  # training data
X_val_encoded = encoder.transform(X_val)          # validation data

tree = pipe.named_steps['Classifier']  # classifier

# Fit on the encoded data
tree.fit(X_train_encoded, y_train)

# Features whose relationship to the target we want to see
features = ['AAA', 'BBB', 'CCC', 'DDD']

for feature in features:
  isolated = pdp_isolate(
      model=tree,
      dataset=X_val_encoded,
      model_features=X_val_encoded.columns,
      feature=feature,
      grid_type='percentile',
      num_grid_points=10
  )
  pdp_plot(isolated, feature_name=feature, figsize=(5, 5));

2. Visualizing a PDP (two features)


from pdpbox.pdp import pdp_interact, pdp_interact_plot

features = ['AAA', 'BBB']    # interaction between AAA and BBB

# boosting is assumed to be a model already fitted on the encoded features
interaction = pdp_interact(
    model=boosting,
    dataset=X_val_encoded,
    model_features=X_val_encoded.columns,  # must match the columns of dataset
    features=features
)

pdp_interact_plot(interaction, plot_type='grid',
                  feature_names=features);

SHAP

A method for computing feature attributions for a single observation.
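
As a rough illustration of what a feature attribution is, the sketch below computes exact Shapley values for one observation by brute force over all feature orderings. The names `toy_model`, `value`, and `shapley_values` and the background data are hypothetical; this is the textbook definition applied to a tiny example, not the tree-specific algorithm that `shap.TreeExplainer` uses.

```python
from itertools import permutations
from statistics import mean

def toy_model(row):
    """Hypothetical model with an interaction between features 1 and 2."""
    return 2.0 * row[0] + row[1] * row[2]

def value(predict, x, background, present):
    """Average prediction when only the features in `present` come from x;
    the remaining features are filled in from each background row."""
    preds = []
    for b in background:
        mixed = [x[i] if i in present else b[i] for i in range(len(x))]
        preds.append(predict(mixed))
    return mean(preds)

def shapley_values(predict, x, background):
    """Average each feature's marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        prev = value(predict, x, background, present)
        for i in order:
            present.add(i)
            cur = value(predict, x, background, present)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / len(orderings) for p in phi]

background = [(0.0, 1.0, 1.0), (1.0, 0.0, 2.0), (2.0, 2.0, 0.0)]  # made-up data
x = (3.0, 2.0, 1.0)                                               # row to explain
base_value = value(toy_model, x, background, set())
phi = shapley_values(toy_model, x, background)
print(base_value, phi, base_value + sum(phi), toy_model(x))
```

The base value plus the sum of the attributions reproduces the model's prediction for that row, which is the same decomposition a force plot draws.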

1. Force plot (visualizing a specific row)

!pip install shap
import warnings
warnings.filterwarnings(action='ignore')

import shap
shap.initjs()

# Separate the encoder from the classifier
encoder = pipe.named_steps['preprocessing']
X_train_encoded = encoder.fit_transform(X_train)   # training data
X_val_encoded = encoder.transform(X_val)           # validation data

tree = pipe.named_steps['DT']

tree.fit(X_train_encoded, y_train)

row = X_train_encoded.iloc[[1]]  # a specific row (already encoded)

explainer = shap.TreeExplainer(tree)
shap_values = explainer.shap_values(row)  # row is encoded, so no second transform

# Index [1] selects class 1; this assumes shap returns one array per class
shap.force_plot(
    base_value=explainer.expected_value[1],
    shap_values=shap_values[1],
    features=row,
    link='logit'  # only appropriate when the explained output is in log-odds
)

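In binary classification where 0 is false and 1 is true, the contributions toward class 1 are in shap_values[1].
The plot shows which features push the prediction up (+) or down (-).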

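# 100 rows
shap.initjs()
X_test_encoded = encoder.transform(X_test)  # the tree was fit on encoded features
shap_values = explainer.shap_values(X_test_encoded.iloc[:100])
shap.force_plot(explainer.expected_value[1], shap_values[1], X_test_encoded.iloc[:100])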

2. Summary plot (all features)

# Show only 1000 rows
sample = X_train_encoded.iloc[:1000]
shap_values = explainer.shap_values(sample)
shap.summary_plot(shap_values[1], sample)

# Violin form (needs the values for a single class)
shap.summary_plot(shap_values[1], sample, plot_type="violin")

# Bar form
shap.summary_plot(shap_values, sample, plot_type="bar")
# Shows each feature's influence on the model
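
The bar form ranks features by the mean absolute SHAP value across the plotted rows. A minimal sketch of that aggregation is shown below; the feature names and SHAP values are made up for illustration, and `global_importance` is a hypothetical helper, not part of the shap library.

```python
from statistics import mean

feature_names = ["AAA", "BBB", "CCC"]  # placeholder names
shap_matrix = [                        # made-up SHAP values, one row per observation
    [0.20, -0.05, 0.00],
    [-0.30, 0.10, 0.01],
    [0.10, -0.15, -0.02],
]

def global_importance(shap_matrix, feature_names):
    """Rank features by mean |SHAP value| over all observations."""
    columns = zip(*shap_matrix)  # transpose: one tuple of values per feature
    scores = {
        name: mean(abs(v) for v in col)
        for name, col in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = global_importance(shap_matrix, feature_names)
for name, score in ranking:
    print(name, round(score, 3))
```

Taking absolute values keeps positive and negative contributions from cancelling out, so the ranking reflects how strongly a feature moves predictions, not in which direction.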

Reference

For this topic (PDP/SHAP), we found more information here: https://velog.io/@ssulee0206/PDP-SHAP
