Ridge
What I learned
One hot encoding
pd.get_dummies(data, prefix='X') # if no columns are specified, every categorical column is one-hot encoded
pd.get_dummies(data, drop_first = True) # dummy coding: drops the first level of each category
from category_encoders import OneHotEncoder
encoder = OneHotEncoder(use_cat_names = True)
# Difference: get_dummies groups all encoded columns together at the end of the frame,
# while category_encoders' OneHotEncoder seems to expand columns in their original order
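A minimal runnable sketch of the `get_dummies` options, using a hypothetical toy frame (the `color`/`size` names are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame: one categorical column, one numeric column
df = pd.DataFrame({'color': ['red', 'blue', 'red'], 'size': [1, 2, 3]})

# get_dummies moves the encoded columns to the end of the frame
dummies = pd.get_dummies(df, prefix='X', columns=['color'])
print(list(dummies.columns))      # ['size', 'X_blue', 'X_red']

# drop_first=True keeps k-1 dummies per category (dummy coding)
dummy_coded = pd.get_dummies(df, drop_first=True)
print(list(dummy_coded.columns))  # ['size', 'color_red']
```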
Pandas profiling
import pandas_profiling
from pandas_profiling import ProfileReport
df.profile_report() # both give the same result
ProfileReport(df)
Distribution plot
displot
distplot
# Similar, but with some differences; displot felt harder to adjust
# (note: distplot is deprecated in recent seaborn in favor of displot/histplot)
Drop values outside np.percentile bounds
df = df[(df['price'] >= np.percentile(df['price'], 0.05)) &
        (df['price'] <= np.percentile(df['price'], 99.5))]
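A runnable sketch of the same filter on made-up prices; it keeps only values between the 0.05th and 99.5th percentiles (note that on a tiny sample even the minimum can fall below the interpolated 0.05th percentile):

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one extreme outlier
df = pd.DataFrame({'price': [10, 20, 30, 40, 50, 100000]})

low = np.percentile(df['price'], 0.05)
high = np.percentile(df['price'], 99.5)
trimmed = df[(df['price'] >= low) & (df['price'] <= high)]
# The outlier 100000 is dropped; here the minimum (10) also falls
# just below the interpolated 0.05th percentile and is dropped
print(trimmed['price'].tolist())  # [20, 30, 40, 50]
```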
K best feature selection
from sklearn.feature_selection import f_regression, SelectKBest
selector = SelectKBest(score_func = f_regression, k = 10) # computes p-values from the F statistic
X_train_selected = selector.fit_transform(X_train, y_train) # fit on the training set, then select
X_test_selected = selector.transform(X_test)
mark = selector.get_support() # boolean mask of the selected features, so you can see which ones were kept (only valid after fitting)
all_names = X_train.columns
select_names = all_names[mark]
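A self-contained sketch of the same flow on synthetic data (the feature names `f0`…`f5` are made up); the one informative feature should survive selection:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: f0 drives the target, f1..f5 are pure noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 6)), columns=[f'f{i}' for i in range(6)])
y = 3 * X['f0'] + rng.normal(scale=0.1, size=100)

selector = SelectKBest(score_func=f_regression, k=2)  # F statistic -> p-value per feature
X_selected = selector.fit_transform(X, y)             # fit first, then the mask is available
select_names = X.columns[selector.get_support()]
print(list(select_names))  # 'f0' should be among the 2 kept features
```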
Ridge
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV
from sklearn.metrics import mean_absolute_error, r2_score
ridge = Ridge(alpha = alpha) # Ridge takes a single `alpha`; the `normalize` option was removed in scikit-learn 1.2, so scale features beforehand instead
RidgeCV
alphas = [0.001, 0.01, 0.1, 1] # candidate penalties (recent scikit-learn requires strictly positive alphas, so 0 is excluded)
ridge = RidgeCV(alphas = alphas, cv = 5)
ridge.fit(X_train_selected, y_train)
print('best alpha : ', ridge.alpha_)
print('best score : ', ridge.best_score_)
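A self-contained RidgeCV sketch on synthetic data (all names and values here are illustrative, not from the original post):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic regression data: y is a linear function of two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

alphas = [0.001, 0.01, 0.1, 1]           # candidate penalties (must be positive)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print('best alpha :', ridge.alpha_)      # the alpha with the best cross-validated score
print('best score :', ridge.best_score_) # mean CV score (R^2 by default)
```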
Reference
About this topic (Ridge), more information can be found here: https://velog.io/@tjddyd1592/407-Ridge. The text may be freely shared or copied, but please keep this document's URL as the reference.
Collection and Share based on the CC Protocol