Titanic EDA (Age, Pclass, Sex, Age, Embarked)



Age

Inspecting the Age feature

Plotting the Age distribution (KDE) of survivors and non-survivors

Many of the survivors are young.

The higher the class, the larger the proportion of older passengers.

How does the survival rate change with age?

Widen the age range step by step (0 to x) and check the survival rate within it.

The younger the passengers, the higher the survival rate.

Age is an important feature.

Pclass, Sex, Age

Everything in this section is drawn with seaborn's violinplot.

The x-axis holds the cases we want to compare separately (Pclass, Sex).

The y-axis holds the distribution we want to see (Age).

The left plot splits the Age distribution and survival by Pclass.

The right plot shows the Age distribution split by Sex and survival.

Looking only at survival, younger passengers had a higher survival rate in every class.

Women and children have higher survival rates, which suggests they were looked after first.

Embarked

Port of embarkation

Survival rate by port

The survival rates are similar across ports (C is the highest).

Because this feature does not stand out strongly, it is unclear how much it will affect the model.

Split by other features (Sex, Survived, Pclass) to look more closely.

Figure (1): Overall, the most passengers boarded at S.

Figure (2): At C and Q the male/female ratio is fairly balanced, while S has noticeably more men.

Figure (3): The survival probability is considerably lower for S.

Figure (4): Split by class, C has the highest survival rate (possibly because many of its passengers were in a higher class).

3rd class accounts for many passengers and has a low survival rate.

Age

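import matplotlib.pyplot as plt
import seaborn as sns

# Inspect the Age feature (df_train: the Kaggle Titanic training set, loaded earlier)
print('Oldest passenger : {:.1f} Years'.format(df_train['Age'].max()))
print('Youngest passenger : {:.1f} Years'.format(df_train['Age'].min()))
print('Mean passenger age : {:.1f} Years'.format(df_train['Age'].mean()))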
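The three print calls rely on pandas skipping missing ages: `Series.max`, `min` and `mean` use `skipna=True` by default. A minimal sketch on a small hypothetical DataFrame (`toy`, not the Kaggle data) makes that explicit and also counts the missing values; the helper name `age_summary` is illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for df_train (not the real Kaggle values)
toy = pd.DataFrame({"Age": [22.0, 38.0, np.nan, 35.0, 0.42, 80.0]})


def age_summary(df: pd.DataFrame) -> dict:
    """Max/min/mean of Age plus the number of missing ages.

    pandas reductions default to skipna=True, so NaN ages are ignored
    by max/min/mean; isna().sum() counts them separately.
    """
    age = df["Age"]
    return {
        "max": float(age.max()),
        "min": float(age.min()),
        "mean": float(age.mean()),
        "missing": int(age.isna().sum()),
    }


summary = age_summary(toy)
print("Oldest passenger : {:.1f} Years".format(summary["max"]))
print("Youngest passenger : {:.1f} Years".format(summary["min"]))
print("Mean passenger age : {:.1f} Years".format(summary["mean"]))
print("Missing ages : {}".format(summary["missing"]))
```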

# Plot the Age distribution (KDE) for survivors vs non-survivors

fig, ax = plt.subplots(1, 1, figsize=(9,5))
sns.kdeplot(df_train[df_train['Survived'] == 1]['Age'], ax=ax)
sns.kdeplot(df_train[df_train['Survived'] == 0]['Age'], ax=ax)
plt.legend(['Survived == 1', 'Survived == 0'] )
plt.show()
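The two KDE curves can be cross-checked with numbers. A minimal sketch, assuming hypothetical toy rows rather than the real training set, summarises Age per `Survived` group with `groupby` and `agg`; note that `count` excludes missing ages.

```python
import pandas as pd

# Hypothetical toy rows (not the real Kaggle data); None becomes NaN in a float column
toy = pd.DataFrame({
    "Age": [2.0, 4.0, 30.0, 25.0, 40.0, 60.0, None],
    "Survived": [1, 1, 1, 0, 0, 0, 1],
})

# Numeric counterpart of the two KDE curves: age summary per survival group
age_by_survival = toy.groupby("Survived")["Age"].agg(["count", "mean", "median"])
print(age_by_survival)
```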

# Age distribution within classes

plt.figure(figsize=(8,6))
df_train['Age'][df_train['Pclass'] == 1].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 2].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 3].plot(kind='kde')

plt.xlabel('Age')
plt.title('Age Distribution within classes')
plt.legend(['1st Class', '2nd Class', '3rd Class'])
plt.show()
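The observation that higher classes carry older passengers can also be checked numerically. A sketch with hypothetical toy rows, chosen only to show the shape of the result: a `groupby` on `Pclass` gives the mean and median age per class.

```python
import pandas as pd

# Hypothetical toy rows (not the real Kaggle data)
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Age": [50.0, 38.0, 45.0, 30.0, 28.0, 22.0, 18.0, 25.0],
})

# Mean and median age per class; a falling median means younger passengers in lower classes
age_by_class = toy.groupby("Pclass")["Age"].agg(["mean", "median"])
print(age_by_class)
```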

# Survival rate among passengers younger than i, for i = 1..79
cumulative_survival_ratio = []
for i in range(1, 80):
    cumulative_survival_ratio.append(df_train[df_train['Age'] < i]['Survived'].mean())
plt.figure(figsize=(7,7))
plt.plot(range(1, 80), cumulative_survival_ratio)
plt.title('Survival rate change depending on range of Age', y=1.02)
plt.ylabel('Survival rate')
plt.xlabel('Range of Age (0~x)')
plt.show()
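The cumulative loop can be wrapped in a reusable helper. A sketch, assuming the same `Age` and `Survived` columns as `df_train`; the function name `cumulative_survival_ratio` and the toy rows are hypothetical. `Series.mean()` returns NaN for an empty selection, and rows with a missing Age drop out because `NaN < upper` is False.

```python
import pandas as pd


def cumulative_survival_ratio(df: pd.DataFrame, max_age: int = 80) -> pd.Series:
    """Survival rate among passengers with Age < upper, for upper = 1..max_age-1.

    The result is indexed by the upper age bound.
    """
    ratios = {}
    for upper in range(1, max_age):
        # mean of the 0/1 Survived column = survival rate; NaN if no rows qualify
        ratios[upper] = df.loc[df["Age"] < upper, "Survived"].mean()
    return pd.Series(ratios, name="survival_rate")


# Hypothetical toy rows (not the real Kaggle data)
toy = pd.DataFrame({
    "Age": [0.5, 5.0, 20.0, 40.0, 60.0, None],
    "Survived": [1, 1, 0, 1, 0, 1],
})
ratios = cumulative_survival_ratio(toy)
print(ratios.loc[[1, 21, 41, 79]])
```

Because the Series is indexed by the upper age bound, calling `ratios.plot()` puts the actual age limit on the x-axis.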

Pclass, Sex, Age

# Recent seaborn versions require x=/y= as keywords; scale= is deprecated in seaborn 0.13 in favour of density_norm=
f, ax = plt.subplots(1, 2, figsize=(18,8))
sns.violinplot(x="Pclass", y="Age", hue="Survived", data=df_train, scale='count', split=True, ax=ax[0])
ax[0].set_title('Pclass and Age vs Survived')
ax[0].set_yticks(range(0,110,10))
sns.violinplot(x="Sex", y="Age", hue="Survived", data=df_train, scale='count', split=True, ax=ax[1])
ax[1].set_title('Sex and Age vs Survived')
ax[1].set_yticks(range(0,110,10))
plt.show()
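The violin plots show distributions; the survival rates behind them can be read from a pivot table. A sketch on hypothetical toy rows: `pivot_table` with `aggfunc="mean"` on the 0/1 `Survived` column makes each cell the survival rate for one Sex and Pclass combination.

```python
import pandas as pd

# Hypothetical toy rows (not the real Kaggle data)
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "female", "male", "male", "female"],
    "Pclass": [1, 3, 1, 3, 2, 2, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Rows: Sex, columns: Pclass, cells: mean of Survived (= survival rate)
table = toy.pivot_table(index="Sex", columns="Pclass", values="Survived", aggfunc="mean")
print(table)
```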

Embarked

f, ax = plt.subplots(1, 1, figsize=(7,7))
(df_train[['Embarked', 'Survived']]
    .groupby(['Embarked'], as_index=True)
    .mean()
    .sort_values(by='Survived', ascending=False)
    .plot.bar(ax=ax))
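The bar chart's values can be inspected directly. A sketch on hypothetical toy rows: `groupby` drops rows whose `Embarked` key is missing (`dropna=True` by default), so the passenger count per port is shown next to the rate to make that visible.

```python
import pandas as pd

# Hypothetical toy rows (not the real Kaggle data); the None port is excluded by groupby
toy = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S", "C", "S", "C", "Q", "S", None],
    "Survived": [0, 1, 1, 1, 1, 0, 0, 0, 0, 1],
})

# Survival rate and passenger count per port, highest rate first
by_port = (
    toy.groupby("Embarked")["Survived"]
    .agg(rate="mean", passengers="count")
    .sort_values("rate", ascending=False)
)
print(by_port)
```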

f, ax = plt.subplots(2, 2, figsize=(20,15))
# Recent seaborn versions require the column to be passed as x=
sns.countplot(x='Embarked', data=df_train, ax=ax[0,0])
ax[0,0].set_title('(1) No. of Passengers Boarded')
sns.countplot(x='Embarked', hue='Sex', data=df_train, ax=ax[0,1])
ax[0,1].set_title('(2) Male-Female Split for Embarked')
sns.countplot(x='Embarked', hue='Survived', data=df_train, ax=ax[1,0])
ax[1,0].set_title('(3) Embarked vs Survived')
sns.countplot(x='Embarked', hue='Pclass', data=df_train, ax=ax[1,1])
ax[1,1].set_title('(4) Embarked vs Pclass')
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()
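Panel (4) compares raw counts, so ports with more passengers dominate visually. A sketch on hypothetical toy rows: `pd.crosstab` with `normalize="index"` turns each port's counts into class shares that sum to 1, which makes the class mix per port directly comparable.

```python
import pandas as pd

# Hypothetical toy rows (not the real Kaggle data)
toy = pd.DataFrame({
    "Embarked": ["C", "C", "C", "Q", "Q", "S", "S", "S", "S"],
    "Pclass": [1, 1, 3, 3, 3, 1, 2, 3, 3],
})

# Share of each class within each port (each row sums to 1)
share = pd.crosstab(toy["Embarked"], toy["Pclass"], normalize="index")
print(share)
```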

Reference

Original post: https://velog.io/@qsdcfd/타이타닉EDAAge-Pclass-Sex-AgeEmbarked

