Kaggle Challenge 09 - Your First Machine Learning Model


Tutorial01
Import
import pandas as pd

melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
melbourne_data.columns
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')
Step 1: Specify Prediction Target
Question
Print the list of columns in the dataset to find the name of the prediction target.
y = ____

# Check your answer
step_1.check()
Solution
y = home_data.SalePrice
Step 2: Create X
Question
Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes): LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.
# Create the list of features below
feature_names = ___

# Select data corresponding to features in feature_names
X = ____
Solution
feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                      "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

X=home_data[feature_names]
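A minimal sketch of the two selection styles used in Steps 1 and 2. The tiny home_data frame below is a made-up stand-in (values are illustrative, not from the real dataset), and only three of the seven feature columns are included to keep it short.

```python
import pandas as pd

# Hypothetical stand-in for home_data; the values are made up for illustration.
home_data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "1stFlrSF": [856, 1262, 920],
    "SalePrice": [208500, 181500, 223500],
})

feature_names = ["LotArea", "YearBuilt", "1stFlrSF"]

# Indexing with a list of names returns a DataFrame holding only those columns.
X = home_data[feature_names]

# Indexing with a single name returns a Series, same as home_data.SalePrice.
y = home_data["SalePrice"]

# "1stFlrSF" is not a valid Python identifier, so it needs bracket notation.
first_floor = home_data["1stFlrSF"]

print(X.shape)
print(list(X.columns))
```

Dot notation (home_data.SalePrice) is fine for the target, but bracket notation is the only option for names like 1stFlrSF that start with a digit.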
Step 3: Specify and Fit Model
Question
Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.
# from _ import _
# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = ____

# Fit the model
Solution
from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
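To see what random_state buys you, here is a small sketch on synthetic data (X_demo and y_demo are made-up names and values, not the Iowa data). Two trees built with the same random_state and the same training data should give identical predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only (assumption: not the Iowa dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 3))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(0, 1, size=50)

# Same random_state and same data for both models.
model_a = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)
model_b = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)

# Compare the two models' predictions element by element.
same = np.array_equal(model_a.predict(X_demo), model_b.predict(X_demo))
print(same)
```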
Step 4: Make Predictions
Question
predictions = ____
print(predictions)
Solution
predictions = iowa_model.predict(X)
print(predictions)
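Note that Step 4 predicts on the same rows the model was trained on. A rough sketch of why that looks better than it is: with default settings the tree keeps splitting until each leaf is pure, so when every training row has distinct feature values it simply reproduces the training targets. The data below is made up for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up rows for illustration; every row has distinct feature values.
X_demo = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
})
y_demo = pd.Series([208500, 181500, 223500, 140000, 250000])

model = DecisionTreeRegressor(random_state=1)
model.fit(X_demo, y_demo)

# Predicting on the training rows themselves (in-sample prediction).
predictions = model.predict(X_demo)

print(predictions)
print(y_demo.tolist())
```

The two printed lists match, which says nothing about how the model behaves on homes it has not seen. That is the motivation for the validation split in Tutorial02.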
Tutorial02
from sklearn.metrics import mean_absolute_error

predicted_home_prices = melbourne_model.predict(X)
mean_absolute_error(y, predicted_home_prices)
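For reference, mean absolute error is just the average of |actual - predicted|. A small worked check with made-up numbers, comparing a manual calculation against sklearn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up prices for illustration.
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 330.0, 390.0])

# Absolute errors are 10, 10, 30, 10, so the mean is 60 / 4 = 15.
manual_mae = np.mean(np.abs(actual - predicted))

# sklearn expects the true values first, then the predictions.
sklearn_mae = mean_absolute_error(actual, predicted)

print(manual_mae, sklearn_mae)
```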
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model
melbourne_model.fit(train_X, train_y)

# get predicted prices on validation data
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
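A sketch of the same workflow on synthetic data (X_demo and y_demo are made up), showing two things: train_test_split holds out 25% of the rows by default, and the in-sample error of a fully grown tree is much lower than its error on held-out rows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(0, 2, size=100)

# With no test_size given, 25% of the rows go to the validation set.
train_X, val_X, train_y, val_y = train_test_split(X_demo, y_demo, random_state=0)

model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Error on rows the model has seen vs. rows it has not.
in_sample_mae = mean_absolute_error(train_y, model.predict(train_X))
validation_mae = mean_absolute_error(val_y, model.predict(val_X))

print(len(train_X), len(val_X))
print(in_sample_mae, validation_mae)
```

The validation MAE is the number to trust when comparing models, since the in-sample figure mostly reflects memorized training rows.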
Step 1: Split Your Data
Question
Use the train_test_split function to split up your data.
Give it the argument random_state=1 so the check functions know what to expect when verifying your code.
Recall, your features are loaded in the DataFrame X and your target is loaded in y.
# Import the train_test_split function and uncomment
# from _ import _

# fill in and uncomment
# train_X, val_X, train_y, val_y = ____
Solution
from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
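A quick sketch of why the exercise fixes random_state=1: calling train_test_split twice with the same seed on the same data returns the same four pieces, so a checker can know what to expect. The arrays below are made-up placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data for illustration: 10 rows, 2 feature columns.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Each call returns (train_X, val_X, train_y, val_y).
first = train_test_split(X_demo, y_demo, random_state=1)
second = train_test_split(X_demo, y_demo, random_state=1)

# Compare the four returned arrays pairwise.
identical = all(np.array_equal(a, b) for a, b in zip(first, second))
print(identical)
```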
Step 2: Specify and Fit the Model
Question
Create a DecisionTreeRegressor model and fit it to the relevant data. Set random_state to 1 again when creating the model.
# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = ____

# Fit iowa_model with the training data.
Solution
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
Step 3: Make Predictions with Validation data
Question
# Predict with all validation observations
val_predictions = ____
Solution
val_predictions = iowa_model.predict(val_X)
Step 4: Calculate the Mean Absolute Error in Validation Data
Question
from sklearn.metrics import mean_absolute_error
val_mae = ____

# uncomment following line to see the validation_mae
#print(val_mae)
Solution
val_mae = mean_absolute_error(val_y, val_predictions)