Introduction_Home_Credit_Default_Risk_Competition_load_data


Home Credit Default


*Goal
Use historical loan application data to predict the probability that an applicant will repay a loan
*Supervised classification task

Data(Home Credit)


  • application_train/application_test
    ◦ main data: one row per loan application
    ◦ every loan is identified by SK_ID_CURR
    ◦ TARGET: 0 (loan repaid) or 1 (loan not repaid)

  • bureau
    ◦ previous credits of the client (one client can have several)

  • bureau_balance
    ◦ monthly data on the previous credits
    ◦ one row per month of each previous credit

  • previous_application
    ◦ data on previous loans
    ◦ every previous application is identified by SK_ID_PREV

  • POS_CASH_BALANCE
    ◦ monthly data on previous point-of-sale or cash loans

  • credit_card_balance
    ◦ monthly data on clients' previous credit cards
    ◦ a single card can span many rows

  • installments_payments
    ◦ payment history for previous loans
    ◦ one row for every payment made and every payment missed
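
The one-to-many link between the main table and bureau can be sketched with toy data. This is only an illustration, assuming that bureau carries the same SK_ID_CURR key as the application table; the values, the SK_ID_BUREAU ids, and the `previous_credit_count` column name are made up for the example and do not come from the real files.

```python
import pandas as pd

# Toy stand-in for application_train (values are invented).
app = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003, 100004],
    "TARGET": [1, 0, 0],
})

# Toy stand-in for bureau: several previous credits can share one SK_ID_CURR.
bureau = pd.DataFrame({
    "SK_ID_CURR": [100002, 100002, 100003],
    "SK_ID_BUREAU": [1, 2, 3],  # hypothetical ids of previous credits
})

# Collapse the previous credits into one row per current loan.
prev_counts = (
    bureau.groupby("SK_ID_CURR")
    .size()
    .rename("previous_credit_count")  # hypothetical feature name
    .reset_index()
)

# A left join keeps every application; loans without bureau rows get 0.
merged = app.merge(prev_counts, on="SK_ID_CURR", how="left")
merged["previous_credit_count"] = (
    merged["previous_credit_count"].fillna(0).astype(int)
)
print(merged)
```

The same aggregate-then-merge pattern would apply to the tables keyed by SK_ID_PREV, with one extra aggregation step to get back to SK_ID_CURR.
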
Metric: ROC AUC

  • ROC curve: the true positive rate plotted against the false positive rate
  • AUC: the area under the ROC curve
  • ROC AUC score
    ◦ lies between 0 and 1 (0.5 corresponds to random guessing)
    ◦ a higher score represents better model performance
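
To make the metric concrete, here is a small sketch that computes the ROC curve and its area with scikit-learn. The labels and scores are toy values chosen for illustration, not competition data.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and predicted probabilities (invented for the example).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# ROC curve: false positive rate and true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under that curve, computed two equivalent ways.
area = auc(fpr, tpr)
score = roc_auc_score(y_true, y_score)

print(score)  # 0.75
```

Three of the four (positive, negative) pairs are ranked correctly here, which is where the 0.75 comes from. Note that the metric is computed from predicted probabilities, not from hard 0/1 predictions.
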
Code

    # Import libraries
    # numpy and pandas for data manipulation
    
    import numpy as np
    import pandas as pd
    
    #sklearn preprocessing for dealing with categorical variables
    from sklearn.preprocessing import LabelEncoder
    
    # File system management
    
    import os
    
    #Suppress warnings
    
    import warnings
    warnings.filterwarnings('ignore')
    
    #matplotlib and seaborn for plotting
    
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Connect Google Drive (Colab)

    from google.colab import drive
    drive.mount('/content/gdrive/')

    # Training data

    app_train = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_train.csv')
    print('Train data shape:', app_train.shape)
    app_train.head(10)
    	SK_ID_CURR	TARGET	NAME_CONTRACT_TYPE	CODE_GENDER	FLAG_OWN_CAR	FLAG_OWN_REALTY	CNT_CHILDREN	AMT_INCOME_TOTAL	AMT_CREDIT	AMT_ANNUITY	...	FLAG_DOCUMENT_18	FLAG_DOCUMENT_19	FLAG_DOCUMENT_20	FLAG_DOCUMENT_21	AMT_REQ_CREDIT_BUREAU_HOUR	AMT_REQ_CREDIT_BUREAU_DAY	AMT_REQ_CREDIT_BUREAU_WEEK	AMT_REQ_CREDIT_BUREAU_MON	AMT_REQ_CREDIT_BUREAU_QRT	AMT_REQ_CREDIT_BUREAU_YEAR
    0	100002	1	Cash loans	M	N	Y	0	202500.0	406597.5	24700.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
    1	100003	0	Cash loans	F	N	N	0	270000.0	1293502.5	35698.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
    2	100004	0	Revolving loans	M	Y	Y	0	67500.0	135000.0	6750.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
    3	100006	0	Cash loans	F	N	Y	0	135000.0	312682.5	29686.5	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
    4	100007	0	Cash loans	M	N	Y	0	121500.0	513000.0	21865.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
    5	100008	0	Cash loans	M	N	Y	0	99000.0	490495.5	27517.5	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	1.0
    6	100009	0	Cash loans	F	Y	Y	1	171000.0	1560726.0	41301.0	...	0	0	0	0	0.0	0.0	0.0	1.0	1.0	2.0
    7	100010	0	Cash loans	M	Y	Y	0	360000.0	1530000.0	42075.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
    8	100011	0	Cash loans	F	N	Y	0	112500.0	1019610.0	33826.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
    9	100012	0	Revolving loans	M	N	Y	0	135000.0	405000.0	20250.0	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
    10 rows × 122 columns

    # Test data: same features, without the TARGET column
    app_test = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_test.csv')
    print('Test data shape:', app_test.shape)
    app_test.head(10)
    	SK_ID_CURR	NAME_CONTRACT_TYPE	CODE_GENDER	FLAG_OWN_CAR	FLAG_OWN_REALTY	CNT_CHILDREN	AMT_INCOME_TOTAL	AMT_CREDIT	AMT_ANNUITY	AMT_GOODS_PRICE	...	FLAG_DOCUMENT_18	FLAG_DOCUMENT_19	FLAG_DOCUMENT_20	FLAG_DOCUMENT_21	AMT_REQ_CREDIT_BUREAU_HOUR	AMT_REQ_CREDIT_BUREAU_DAY	AMT_REQ_CREDIT_BUREAU_WEEK	AMT_REQ_CREDIT_BUREAU_MON	AMT_REQ_CREDIT_BUREAU_QRT	AMT_REQ_CREDIT_BUREAU_YEAR
    0	100001	Cash loans	F	N	Y	0	135000.0	568800.0	20560.5	450000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
    1	100005	Cash loans	M	N	Y	0	99000.0	222768.0	17370.0	180000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	3.0
    2	100013	Cash loans	M	Y	Y	0	202500.0	663264.0	69777.0	630000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	4.0
    3	100028	Cash loans	F	N	Y	2	315000.0	1575000.0	49018.5	1575000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	3.0
    4	100038	Cash loans	M	Y	N	1	180000.0	625500.0	32067.0	625500.0	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
    5	100042	Cash loans	F	Y	Y	0	270000.0	959688.0	34600.5	810000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	2.0
    6	100057	Cash loans	M	Y	Y	2	180000.0	499221.0	22117.5	373500.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
    7	100065	Cash loans	M	N	Y	0	166500.0	180000.0	14220.0	180000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	2.0
    8	100066	Cash loans	F	N	Y	0	315000.0	364896.0	28957.5	315000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	5.0
    9	100067	Cash loans	F	Y	Y	1	162000.0	45000.0	5337.0	45000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0