A Titanic Solution
This write-up draws on the Titanic Data Science Solutions notebook and on the code plan for the Titanic challenge, my first Kaggle project.
Steps
1. Problem statement
2. Data import and cleaning
3. Feature engineering: handle Age, handle Fare, handle Embarked ('S' is the most common value), map Sex to 0/1, one-hot encode the embarkation port Embarked, the cabin class Pclass, and the name title (dropping the original column each time), build a family size feature (family size on board), and drop Cabin (too many missing values) along with the uninformative Ticket and PassengerId
4. Split back into train and test
5. Machine learning
6. Computing the scores
7. Results
8. Some findings
Problem statement
What kind of passenger was most likely to survive the Titanic?
Data import and cleaning
The data used here comes from Kaggle; pandas is then used to import it.
import pandas as pd
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
Missing values seriously affect how the data can be used, so .info() is used to check for them.
train_df.info()
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
In the train set, Age, Cabin, and Embarked need to be filled in.
test_df.info()
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId 418 non-null int64
Pclass 418 non-null int64
Name 418 non-null object
Sex 418 non-null object
Age 332 non-null float64
SibSp 418 non-null int64
Parch 418 non-null int64
Ticket 418 non-null object
Fare 417 non-null float64
Cabin 91 non-null object
Embarked 418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
In the test set, Age, Fare, and Cabin need to be filled in.
The two datasets are concatenated so that both can be cleaned at once.
combine = pd.concat([train_df, test_df], axis = 0)
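Note that concatenating this way keeps each frame's original row labels, so index values 0-417 appear twice in combine. Boolean masks still work, but if the duplicate index becomes a nuisance, ignore_index is the standard pandas remedy (this line is an optional addition, not part of the original code):
combine = pd.concat([train_df, test_df], axis = 0, ignore_index = True)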
Feature engineering
import numpy as np
np.random.seed(0)  # fixed seed so the random imputation below is reproducible; the original called np.random.seed() with no argument, which fixes nothing
Handling Age
Missing ages are filled with random draws from a normal distribution, using the mean (np.mean()) and standard deviation (np.std()) of the observed ages.
Age_null = combine[combine['Age'].isna()].copy()  # .copy() avoids pandas' SettingWithCopyWarning
Age_null['Age'] = np.random.normal(np.mean(combine['Age']), np.std(combine['Age']), Age_null.shape[0])  # size must be 1-D; a (n, 1) array cannot be assigned to a column
#Age_null['Age'] = Age_null['Age'].apply(round)  # rounding the imputed ages did not help (see findings)
Age_notnull = combine[combine['Age'].notna()]
combine = pd.concat([Age_null, Age_notnull], axis = 0)
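Two optional lines of defensive code (my addition, not in the original): normal draws can occasionally be negative, and it is worth asserting that the column is now complete.
combine['Age'] = combine['Age'].clip(lower = 0)  # clamp any negative imputed ages to 0
assert combine['Age'].isna().sum() == 0  # no missing ages remain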
Handling Fare
Fare_null = combine[combine['Fare'].isna()].copy()  # same pattern as for Age
Fare_null['Fare'] = np.random.normal(np.mean(combine['Fare']), np.std(combine['Fare']), Fare_null.shape[0])
#Fare_null['Fare'] = Fare_null['Fare'].apply(round)  # rounding did not help here either
Fare_notnull = combine[combine['Fare'].notna()]
combine = pd.concat([Fare_null, Fare_notnull], axis = 0)
Handling Embarked: 'S' is the most common value
combine['Embarked'] = combine['Embarked'].fillna('S')
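That 'S' really is the most common port can be checked in one line (a usage example, not in the original):
combine['Embarked'].value_counts()  # S is by far the most frequent value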
Map Sex to 0/1
sex = {'male': 1, 'female': 0}
combine['Sex'] = combine['Sex'].map(sex)
One-hot encode the embarkation port Embarked and drop the original column
data_Embark = pd.get_dummies(combine['Embarked'], prefix = 'Embarked')
combine = pd.concat([data_Embark, combine], axis = 1)
combine = combine.drop('Embarked', axis = 1)
One-hot encode the cabin class Pclass and drop the original column
data_Pclass = pd.get_dummies(combine['Pclass'], prefix = 'Pclass')
combine = pd.concat([data_Pclass, combine], axis = 1)
combine = combine.drop('Pclass', axis = 1)
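As a design note, pandas can do the encode-and-drop in a single call; the following one-liner is equivalent to the two blocks above up to column order (get_dummies with the columns argument is standard pandas, but this line is not in the original):
combine = pd.get_dummies(combine, columns = ['Embarked', 'Pclass'], prefix = ['Embarked', 'Pclass'])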
Names
Extract the title embedded in each name and consolidate the rare ones.
combine['NameTitle'] = combine.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)  # raw string avoids an invalid-escape warning
combine['NameTitle'] = combine['NameTitle'].replace(['Lady', 'Countess','Capt', 'Col',\
'Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
combine['NameTitle'] = combine['NameTitle'].replace(['Mlle', 'Ms'], 'Miss')
combine['NameTitle'] = combine['NameTitle'].replace('Mme', 'Mrs')
combine = combine.drop('Name', axis = 1)
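A quick look at what survives the consolidation (a usage example, not in the original):
combine['NameTitle'].value_counts()  # Mr, Miss, Mrs, Master, Rare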
One-hot encode the title and drop the original column
data_NameTitle = pd.get_dummies(combine['NameTitle'], prefix = 'NameTitle')
combine = pd.concat([data_NameTitle, combine], axis = 1)
combine = combine.drop('NameTitle', axis = 1)
Family size
combine['FamilySize'] = combine['SibSp'] + combine['Parch'] + 1
combine = combine.drop(['SibSp', 'Parch'], axis = 1)
Drop Cabin, which has too many missing values; the noisy Ticket and PassengerId columns are also unnecessary.
combine = combine.drop(['Cabin', 'PassengerId', 'Ticket'], axis = 1)
Split back into train and test
train = combine[combine['Survived'].notna()]
test = combine[combine['Survived'].isna()].drop('Survived', axis=1)
X_train = train.drop('Survived', axis = 1)
Y_train = train['Survived']
X_test = test
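One subtlety: after the concat, Survived is float64 (the test rows were NaN), so Y_train holds 0.0/1.0. scikit-learn classifiers accept float labels, but casting back to integers is tidier (an optional fix, not in the original):
Y_train = train['Survived'].astype(int)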
Machine learning
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
svc = SVC()
svc.fit(X_train, Y_train)
acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
knn = KNeighborsClassifier(n_neighbors = 33)
knn.fit(X_train, Y_train)
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
gaussian = GaussianNB()
gaussian.fit(X_train, Y_train)
acc_gaussian = round(gaussian.score(X_train, Y_train) * 100, 2)
perceptron = Perceptron()
perceptron.fit(X_train, Y_train)
acc_perceptron = round(perceptron.score(X_train, Y_train) * 100, 2)
linear_svc = LinearSVC()
linear_svc.fit(X_train, Y_train)
acc_linear_svc = round(linear_svc.score(X_train, Y_train) * 100, 2)
sgd = SGDClassifier()
sgd.fit(X_train, Y_train)
acc_sgd = round(sgd.score(X_train, Y_train) * 100, 2)
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
acc_decision_tree = round(decision_tree.score(X_train, Y_train) * 100, 2)
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)
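Every score above is measured on the training data itself, which is why the high-capacity tree models look nearly perfect below: they can simply memorize the rows. Cross-validation gives a fairer estimate; a minimal sketch with standard scikit-learn (not part of the original code):
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(random_forest, X_train, Y_train, cv = 5)
print(round(cv_scores.mean() * 100, 2))  # typically far below the 99.10 training score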
Computing the scores
models = pd.DataFrame({
'Model': ['Support Vector Machines', 'KNN',
'Random Forest', 'Naive Bayes', 'Perceptron',
'Stochastic Gradient Descent', 'Linear SVC',
'Decision Tree'],
'Score': [acc_svc, acc_knn,
acc_random_forest, acc_gaussian, acc_perceptron,
acc_sgd, acc_linear_svc, acc_decision_tree]})
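Sorting makes the table easier to read (a usage example; sort_values is standard pandas):
models.sort_values(by = 'Score', ascending = False)
Note that X_test is prepared above but never used in this write-up; a Kaggle submission would come from something like random_forest.predict(X_test).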
Results

   Model                        Score
0  Support Vector Machines      88.55
1  KNN                          72.62
2  Random Forest                99.10
3  Naive Bayes                  79.91
4  Perceptron                   58.59
5  Stochastic Gradient Descent  73.51
6  Linear SVC                   82.04
7  Decision Tree                99.10
Some findings
While filling the missing values, rounding Age and Fare to integers did not raise the accuracy score (hence the commented-out apply(round) lines). Age and Fare can also be one-hot encoded, but doing so likewise lowered the accuracy.
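For concreteness: one-hot encoding a continuous column such as Age means binning it first. A minimal sketch of what that could look like (pd.cut is standard pandas; the bin edges here are illustrative, not taken from the original):
combine['AgeBand'] = pd.cut(combine['Age'], bins = [0, 16, 32, 48, 64, 100])
combine = pd.get_dummies(combine, columns = ['AgeBand'], prefix = 'Age')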