Kaggle Challenge 09 - Your First Machine Learning Model
Tutorial01
Import
import pandas as pd
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
'Longtitude', 'Regionname', 'Propertycount'],
dtype='object')
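The Melbourne CSV lives on Kaggle's file system. Outside a Kaggle notebook you can try the same `read_csv` / `.columns` pattern on a tiny in-memory CSV. The rows below are made-up values, not the real dataset. Only a few of the Melbourne column names are reused, and the suburb names are hypothetical placeholders.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real file is ../input/melbourne-housing-snapshot/melb_data.csv
csv_text = """Suburb,Rooms,Price,Landsize,YearBuilt
Suburb_A,2,1000000,150,1900
Suburb_A,3,1400000,130,1950
Suburb_B,4,1600000,120,2010
"""

# read_csv accepts any file-like object, not only a path
sample_data = pd.read_csv(io.StringIO(csv_text))

# .columns is an Index of column names, used to locate the prediction target
print(sample_data.columns)

# describe() summarizes the numeric columns (count, mean, std, min, quartiles, max)
print(sample_data.describe())
```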
Step 1: Specify Prediction Target
Question
Print the list of columns in the dataset to find the name of the prediction target.
y = ____
# Check your answer
step_1.check()
Solution
y = home_data.SalePrice
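The solution uses attribute (dot) access, `home_data.SalePrice`. Bracket access, `home_data["SalePrice"]`, returns the same Series. It also works for column names that are not valid Python identifiers. The sketch below uses a hypothetical stand-in for `home_data` with illustrative values. Only the column names follow the Iowa data.

```python
import pandas as pd

# Hypothetical stand-in for home_data (illustrative values)
home_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2000, 1975, 2005],
    "SalePrice": [200000, 180000, 225000],
})

# Both forms select the same column as a pandas Series
y_dot = home_data.SalePrice
y = home_data["SalePrice"]

print(type(y).__name__, y.name)  # Series SalePrice
```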
Step 2: Create X
Question
Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes): LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd.
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.
# Create the list of features below
feature_names = ___
# Select data corresponding to features in feature_names
X = ____
Solution
feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]
X = home_data[feature_names]
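Selecting with a list of names returns a new DataFrame with only those columns, in the order given. Bracket selection is required here because `home_data.1stFlrSF` is a Python syntax error: an identifier cannot start with a digit. The sketch below uses a hypothetical three-row stand-in for `home_data`, with illustrative values and the real column names.

```python
import pandas as pd

# Hypothetical stand-in for home_data (illustrative values)
home_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2000, 1975, 2005],
    "1stFlrSF": [850, 1250, 900],
    "2ndFlrSF": [850, 0, 850],
    "FullBath": [2, 2, 2],
    "BedroomAbvGr": [3, 3, 3],
    "TotRmsAbvGrd": [8, 6, 6],
    "SalePrice": [200000, 180000, 225000],
})

feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

# A list inside [] selects several columns at once and returns a DataFrame
X = home_data[feature_names]

print(X.shape)  # (3, 7): same rows, only the seven feature columns
print(X.head())
```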
Step 3: Specify and Fit Model
Question
Create a DecisionTreeRegressor and save it as iowa_model. Make sure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.
# from _ import _
# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = ____
# Fit the model
Solution
from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
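`random_state` fixes the randomness inside the tree builder. scikit-learn randomly permutes the candidate features at each split, so two models built with the same value on the same data make identical predictions. The check below is a minimal sketch on synthetic random data, not the Iowa dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 50 rows, 3 features (not the Iowa dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
y = 3 * X[:, 0] + rng.normal(0, 1, size=50)

# Two models specified identically, including random_state
model_a = DecisionTreeRegressor(random_state=1).fit(X, y)
model_b = DecisionTreeRegressor(random_state=1).fit(X, y)

# Same data + same random_state -> same tree -> same predictions
same = np.array_equal(model_a.predict(X), model_b.predict(X))
print(same)  # True
```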
Step 4: Make Predictions
Question
predictions = ____
print(predictions)
Solution
predictions = iowa_model.predict(X)
print(predictions)
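Predicting on the rows the model was trained on is a useful sanity check. It says little about accuracy on new houses, though. With the default `max_depth=None`, the tree keeps splitting until its leaves are pure. On training rows with distinct feature values, its predictions therefore reproduce the targets. The sketch below uses synthetic continuous values that stand in for the Iowa columns.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Iowa features and prices
rng = np.random.default_rng(42)
X = pd.DataFrame({"LotArea": rng.uniform(5000, 15000, size=20),
                  "YearBuilt": rng.uniform(1900, 2010, size=20)})
y = pd.Series(rng.uniform(100000, 300000, size=20), name="SalePrice")

iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)

predictions = iowa_model.predict(X)

# Side-by-side view of the first few in-sample predictions
comparison = pd.DataFrame({"actual": y.head(), "predicted": predictions[:5]})
print(comparison)

# Largest absolute gap on the training rows (close to 0 for a fully grown tree)
print(np.abs(predictions - y.to_numpy()).max())
```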
Tutorial02
from sklearn.metrics import mean_absolute_error
# In-sample error: assumes melbourne_model was already fit on the full X and y
predicted_home_prices = melbourne_model.predict(X)
mean_absolute_error(y, predicted_home_prices)
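Mean absolute error is the average of the absolute differences between actual and predicted values: MAE = (1/n) Σ |y_i − ŷ_i|. The hand check below compares that formula against scikit-learn's `mean_absolute_error`, using invented prices for three houses.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted prices for three houses
actual = np.array([100000, 200000, 300000])
predicted = np.array([110000, 190000, 330000])

# MAE = mean of |actual - predicted| = (10000 + 10000 + 30000) / 3
manual_mae = np.mean(np.abs(actual - predicted))
sklearn_mae = mean_absolute_error(actual, predicted)

print(manual_mae, sklearn_mae)
```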
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model
melbourne_model.fit(train_X, train_y)
# get predicted prices on validation data
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
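When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the rows for validation. It shuffles before splitting, and a fixed `random_state` makes that shuffle repeatable. The sketch below uses a synthetic 100-row frame with a hypothetical feature and target.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic 100-row dataset (hypothetical feature and target)
X = pd.DataFrame({"Rooms": np.arange(100)})
y = pd.Series(np.arange(100) * 1000, name="Price")

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
print(len(train_X), len(val_X))  # 75 25

# Re-running with the same random_state gives the same validation rows
_, val_X_again, _, _ = train_test_split(X, y, random_state=0)
print(val_X.index.equals(val_X_again.index))  # True
```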
Step 1: Split Your Data
Question
Use the train_test_split function to split up your data.
Give it the argument random_state=1 so the check functions know what to expect when verifying your code.
Recall, your features are loaded in the DataFrame X and your target is loaded in y.
# Import the train_test_split function and uncomment
# from _ import _
# fill in and uncomment
# train_X, val_X, train_y, val_y = ____
Solution
from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
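`train_test_split` shuffles X and y with the same permutation, so each feature row stays paired with its own price. pandas keeps the original row labels, so you can check the pairing by comparing indexes. The sketch below uses synthetic data in which the price is built as 20 × LotArea, which makes any mix-up visible. `random_state=1` matches the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: SalePrice = 20 * LotArea by construction
X = pd.DataFrame({"LotArea": np.arange(10) * 100})
y = pd.Series(np.arange(10) * 100 * 20, name="SalePrice")

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Row labels match between features and target in both subsets
print(train_X.index.equals(train_y.index), val_X.index.equals(val_y.index))

# The constructed relationship survives the shuffle, so rows were not mixed up
print((train_y == train_X["LotArea"] * 20).all())
```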
Step 2: Specify and Fit the Model
Question
Create a DecisionTreeRegressor model and fit it to the relevant data. Set random_state to 1 again when creating the model.
# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again
# Specify the model
iowa_model = ____
# Fit iowa_model with the training data.
Solution
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
Step 3: Make Predictions with Validation Data
Question
# Predict with all validation observations
val_predictions = ____
Solution
val_predictions = iowa_model.predict(val_X)
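`val_predictions` is a NumPy array in the same row order as `val_X`. You can therefore place it next to `val_y` to look at individual errors. The sketch below uses synthetic data standing in for the Iowa split, and the `check` table is a hypothetical helper for inspection.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: price roughly proportional to LotArea, plus noise
rng = np.random.default_rng(1)
X = pd.DataFrame({"LotArea": rng.uniform(5000, 15000, size=40)})
y = pd.Series(20 * X["LotArea"] + rng.normal(0, 10000, size=40), name="SalePrice")

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
iowa_model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

val_predictions = iowa_model.predict(val_X)

# Align predictions with the validation rows they belong to
check = pd.DataFrame({"actual": val_y, "predicted": val_predictions}, index=val_y.index)
check["abs_error"] = (check["actual"] - check["predicted"]).abs()
print(check.head())
```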
Step 4: Calculate the Mean Absolute Error in Validation Data
Question
from sklearn.metrics import mean_absolute_error
val_mae = ____
# uncomment following line to see the validation_mae
#print(val_mae)
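Solution
# mean_absolute_error expects (y_true, y_pred); MAE is symmetric, so swapping them gives the same value
val_mae = mean_absolute_error(val_y, val_predictions)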
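Comparing training error with validation error shows why the split matters. The unconstrained tree scores almost perfectly on rows it has memorized and noticeably worse on rows it has not seen. The sketch below uses synthetic noisy data in place of the Iowa housing set, so the printed numbers are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data: price depends on LotArea plus random noise
rng = np.random.default_rng(7)
X = pd.DataFrame({"LotArea": rng.uniform(5000, 15000, size=200)})
y = 20 * X["LotArea"] + rng.normal(0, 20000, size=200)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
iowa_model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, iowa_model.predict(train_X))
val_mae = mean_absolute_error(val_y, iowa_model.predict(val_X))

# The in-sample error is near zero; the validation error is the one to report
print(f"train MAE: {train_mae:,.0f}  validation MAE: {val_mae:,.0f}")
```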
Reference
For more on this problem (Kaggle Challenge 09 - Your First Machine Learning Model), see https://velog.io/@ljsk99499/Kaggle09. The text may be freely shared or copied, but please keep this document's URL as the reference.
Collection and Share based on the CC Protocol