Introducing Scikit-Learn
About Scikit-Learn
Scikit-Learn is one of the most widely used open-source machine learning libraries for Python. It provides a range of supervised and unsupervised learning algorithms that many data scientists rely on.
Install Scikit-Learn
conda install scikit-learn
pip install scikit-learn
import sklearn
print(sklearn.__version__)
Output: 0.21.3
Predict Types of Irises
We will try to classify types of irises based on the features in the imported dataset (i.e., sepal length, sepal width, petal length, and petal width).
Classification
A supervised-learning problem in which a class label is predicted for a given example of input data (e.g., detecting COVID-19 or filtering spam mail).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
# load iris dataset
iris = load_iris()
# iris.data holds the feature values as a NumPy array
iris_data = iris.data
# iris.target holds the labels as a NumPy array
iris_label = iris.target
print('Iris Target Values : \n', iris_label)
print('Iris Target Names : \n', iris.target_names)
# convert data-set to DataFrame
iris_df = pd.DataFrame(data=iris_data, columns=iris.feature_names)
iris_df['label'] = iris.target
iris_df.head(3)
Output:
Iris Target Values :
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Iris Target Names :
['setosa' 'versicolor' 'virginica']
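The integer labels above are indices into iris.target_names, so 0 means setosa, 1 means versicolor, and 2 means virginica. The following is a minimal sketch of that mapping. The variable names species and class_counts are illustrative and do not appear in the original code.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Index the array of species names with the integer labels
# to get one species name per sample.
species = iris.target_names[iris.target]

# Count how many samples fall into each class.
names, counts = np.unique(species, return_counts=True)
class_counts = dict(zip(names.tolist(), counts.tolist()))
print(class_counts)
```

The dataset is balanced: each of the three species has 50 samples.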
Split to Train & Test Data
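The data must be split into training and test sets so that the trained model can be evaluated on data it has not seen. Scikit-Learn provides the train_test_split() function to split a dataset easily. Here, test_size=0.2 holds out 20% of the samples for testing, and random_state=11 makes the split reproducible.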
X_train, X_test, y_train, y_test = train_test_split(iris_data, iris_label, test_size=0.2, random_state=11)
# create Decision Tree Classifier object
dt_clf = DecisionTreeClassifier(random_state=11)
# perform training
# fit() takes the training feature data set & the training label data set
dt_clf.fit(X_train, y_train)
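To see what test_size=0.2 does to the 150 iris samples, the shapes of the split arrays can be checked. This is a minimal sketch. The second split uses the stratify parameter of train_test_split(), which the original code does not use; it is shown only as an option that keeps the class proportions equal in both sets, and the names with the _s suffix are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same split as in the article: 20% of the samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Optional variant (not in the article): stratify on the labels so that
# each class is equally represented in the test set.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11, stratify=iris.target
)
print(np.bincount(y_test_s))  # [10 10 10]
```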
DecisionTreeClassifier has now been trained on the training data-set. Prediction should use data the model has not seen (the test data-set), by calling predict().
# perform prediction with dt_clf using the test data-set
pred = dt_clf.predict(X_test)
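predict() returns one integer label per test sample. A small sketch of how those integers could be turned back into species names and compared with the true labels is shown here. The names pred_names and true_names are illustrative, and the block repeats the earlier setup only so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11
)

dt_clf = DecisionTreeClassifier(random_state=11)
dt_clf.fit(X_train, y_train)
pred = dt_clf.predict(X_test)

# Translate integer labels into species names for readability.
pred_names = iris.target_names[pred]
true_names = iris.target_names[y_test]

# Show the first five predictions next to the true species.
for predicted, actual in list(zip(pred_names, true_names))[:5]:
    print(f'predicted={predicted:<10} actual={actual}')
```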
Now import accuracy_score to evaluate the performance of the model.
from sklearn.metrics import accuracy_score
print('Accuracy Score : {0:4f}'.format(accuracy_score(y_test, pred)))
Output: Accuracy Score : 0.933333
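Accuracy is the fraction of test samples whose predicted label matches the true label. With 30 test samples, a score of 0.933333 corresponds to 28 correct predictions. The sketch below computes the same quantity by hand and compares it with accuracy_score(). The exact score may differ between Scikit-Learn versions, so no particular value is assumed here, and the names manual_accuracy, library_accuracy, and n_correct are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11
)
dt_clf = DecisionTreeClassifier(random_state=11)
dt_clf.fit(X_train, y_train)
pred = dt_clf.predict(X_test)

# Accuracy by hand: the mean of an element-wise "prediction is correct" mask.
n_correct = int(np.sum(pred == y_test))
manual_accuracy = n_correct / len(y_test)

# The same quantity as computed by Scikit-Learn.
library_accuracy = accuracy_score(y_test, pred)

print(f'{n_correct} of {len(y_test)} test samples classified correctly')
print(f'manual={manual_accuracy:.6f} library={library_accuracy:.6f}')
```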
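The trained decision tree classifier reaches an accuracy of 93.33% on the test data-set.
To Summarize
The workflow is: load the dataset, split it into training and test sets with train_test_split(), train the model with fit(), predict on the test set with predict(), and score the predictions with accuracy_score().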
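The steps covered in this post can also be collected into a single function. This is a minimal sketch rather than the original author's code: the function name evaluate_iris_tree and its parameters are hypothetical, with defaults chosen to match the values used above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_iris_tree(test_size=0.2, random_state=11):
    """Train a decision tree on iris and return its test-set accuracy."""
    # 1. Load the dataset.
    iris = load_iris()
    # 2. Split into training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=test_size, random_state=random_state
    )
    # 3. Train the classifier on the training set only.
    dt_clf = DecisionTreeClassifier(random_state=random_state)
    dt_clf.fit(X_train, y_train)
    # 4. Predict on the held-out test set.
    pred = dt_clf.predict(X_test)
    # 5. Score the predictions against the true labels.
    return accuracy_score(y_test, pred)


accuracy = evaluate_iris_tree()
print('Accuracy Score : {0:.4f}'.format(accuracy))
```

Because both the split and the classifier are seeded with random_state, calling the function twice with the same arguments returns the same score.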
Reference
More information on this topic (Introducing Scikit-Learn) can be found at https://velog.io/@jiselectric/Machine-Learning-with-Scikit-Learn-01
The text may be freely shared or copied, provided this document's URL is kept as a reference URL.