Hands-On Machine Learning with Scikit-Learn & TensorFlow Exercise Q&A Chapter 7
Q1. If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?
A1: Yes, you can try combining them into a voting ensemble, which will often give better results. It works best if the models are very different (e.g., an SVM, a Decision Tree, a Logistic Regression classifier), and even better if they were trained on different subsets of the training data, since their errors are then less likely to be correlated.
Q2. What is the difference between hard and soft voting classifiers?
A2: A hard voting classifier just counts the votes of each classifier in the ensemble and picks the class that gets the most votes. A soft voting classifier instead averages the estimated class probabilities across classifiers and picks the class with the highest average probability. Soft voting gives more weight to confident votes and often performs better, but it requires every classifier to be able to estimate class probabilities (e.g., an SVC needs probability=True in scikit-learn).
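For example, scikit-learn's VotingClassifier supports both modes via its voting parameter. A minimal sketch, assuming some training set X_train/y_train is available; the particular classifiers are illustrative:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hard voting: each classifier casts one vote for a class
hard_voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier()),
                ("svc", SVC())],
    voting="hard")
hard_voting_clf.fit(X_train, y_train)

# Soft voting: average the predicted class probabilities instead; every
# estimator must support predict_proba (hence probability=True for the SVC)
soft_voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier()),
                ("svc", SVC(probability=True))],
    voting="soft")
soft_voting_clf.fit(X_train, y_train)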
Q3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forests, or stacking ensembles?
A3: It is possible to speed up training of a bagging ensemble by distributing it across multiple servers, because each predictor in the ensemble is trained independently of the others. The same goes for pasting ensembles and Random Forests. In a boosting ensemble, however, each predictor is built to correct its predecessor, so training is necessarily sequential and cannot be sped up this way. Stacking ensembles are in between: all the predictors within a given layer are independent of one another, so they can be trained in parallel, but the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.
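Within a single machine, scikit-learn exposes this parallelism through the n_jobs parameter. A sketch, assuming X_train/y_train exist; distributing across multiple servers would additionally require a framework such as Dask or Spark:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each tree is trained on an independent bootstrap sample, so all 500
# fits can run in parallel (n_jobs=-1 uses every available CPU core)
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)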
Q4. What is the benefit of out-of-bag evaluation?
A4: With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on. This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
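In scikit-learn, this just means setting oob_score=True when building the bagging ensemble (a sketch, same X_train/y_train assumption as above):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, oob_score=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
bag_clf.oob_score_  # accuracy estimated on the ~37% of instances each predictor never saw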
Q5. What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?
A5: When growing a tree, Extra-Trees use random thresholds for each candidate feature, rather than searching for the best possible thresholds the way regular Random Forests do. This extra randomness acts like a form of regularization, so if a Random Forest overfits the training data, Extra-Trees may help. And since Extra-Trees don't search for the best possible thresholds, they are much faster to train than regular Random Forests (prediction speed is the same, though).
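The API mirrors RandomForestClassifier, so swapping one for the other is a one-line change (a sketch, assuming X_train/y_train):

from sklearn.ensemble import ExtraTreesClassifier

# Same hyperparameters as a Random Forest, but splits use random thresholds,
# trading a little extra bias for lower variance and faster training
extra_clf = ExtraTreesClassifier(n_estimators=500, n_jobs=-1, random_state=42)
extra_clf.fit(X_train, y_train)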
Q6. If your AdaBoost ensemble underfits the training data, what hyperparameters should you tweak and how?
A6: You can increase n_estimators, slightly increase the learning_rate, or reduce the regularization hyperparameters of the base estimator (e.g., allow deeper trees).
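For example (a sketch; the particular values are illustrative, not tuned):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# More estimators, a higher learning rate, and a less regularized base
# estimator (max_depth=2 instead of decision stumps) all add capacity
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                             n_estimators=500, learning_rate=1.0)
ada_clf.fit(X_train, y_train)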
Q7. If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?
A7: You should decrease the learning rate. You could also use early stopping to find the right number of predictors (you probably have too many).
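For example, GradientBoostingClassifier supports built-in early stopping via n_iter_no_change (a sketch; the values are illustrative):

from sklearn.ensemble import GradientBoostingClassifier

# A smaller learning_rate shrinks each tree's contribution (more regularization);
# n_iter_no_change stops adding trees once the validation score stops improving
gbrt_clf = GradientBoostingClassifier(learning_rate=0.05, n_estimators=500,
                                      n_iter_no_change=10, validation_fraction=0.1)
gbrt_clf.fit(X_train, y_train)
gbrt_clf.n_estimators_  # number of trees actually kept after early stopping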
Q8. Voting Classifier: train several individual classifiers on MNIST, then combine them into a voting ensemble that outperforms each of them on the validation set.
A8:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np

# Assuming MNIST comes from OpenML (not shown in the original); string targets are cast to ints
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(np.uint8)
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=10000, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=10000, random_state=42)
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
# Small, fast models for this exercise (n_estimators=10 keeps training quick)
random_forest_clf = RandomForestClassifier(n_estimators=10, random_state=42)
extra_trees_clf = ExtraTreesClassifier(n_estimators=10, random_state=42)
svm_clf = LinearSVC(random_state=42)  # may warn about convergence on MNIST; raise max_iter if needed
mlp_clf = MLPClassifier(random_state=42)
estimators = [random_forest_clf, extra_trees_clf, svm_clf, mlp_clf]
for estimator in estimators:
    print("Training the", estimator)
    estimator.fit(X_train, y_train)

# Validation accuracy of each individual classifier
[estimator.score(X_val, y_val) for estimator in estimators]
from sklearn.ensemble import VotingClassifier
named_estimators = [
    ("random_forest_clf", random_forest_clf),
    ("extra_trees_clf", extra_trees_clf),
    ("svm_clf", svm_clf),
    ("mlp_clf", mlp_clf),
]
voting_clf = VotingClassifier(named_estimators)  # hard voting by default
voting_clf.fit(X_train, y_train)
voting_clf.score(X_val, y_val)

# Compare the ensemble against its fitted individual classifiers
[estimator.score(X_val, y_val) for estimator in voting_clf.estimators_]
# Drop the SVM: it is the weakest model here, and LinearSVC has no
# predict_proba, which soft voting (below) requires
voting_clf.set_params(svm_clf="drop")  # older scikit-learn versions used None
voting_clf.estimators   # the list of unfitted estimators reflects the change...
voting_clf.estimators_  # ...but the fitted list still contains the trained SVM
del voting_clf.estimators_[2]  # so remove it from the fitted list as well
voting_clf.score(X_val, y_val)
# Switch to soft voting; no retraining needed since the estimators are already fitted
voting_clf.voting = "soft"
voting_clf.score(X_val, y_val)

# Final evaluation on the test set
voting_clf.score(X_test, y_test)
[estimator.score(X_test, y_test) for estimator in voting_clf.estimators_]
Q9. Stacking Ensemble: run the classifiers from the previous exercise on the validation set to build a new training set of predictions, then train a blender on it to form a stacking ensemble.
A9:
# Build the blender's training set: one column per first-layer estimator,
# holding that estimator's predictions on the validation set
X_val_predictions = np.empty((len(X_val), len(estimators)), dtype=np.float32)
for index, estimator in enumerate(estimators):
    X_val_predictions[:, index] = estimator.predict(X_val)
X_val_predictions
# Train a blender on those out-of-sample predictions; oob_score=True
# gives a free estimate of the blender's accuracy
rnd_forest_blender = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rnd_forest_blender.fit(X_val_predictions, y_val)
rnd_forest_blender.oob_score_
# Evaluate the full stacking ensemble on the test set: the first layer
# predicts, then the blender makes the final prediction
X_test_predictions = np.empty((len(X_test), len(estimators)), dtype=np.float32)
for index, estimator in enumerate(estimators):
    X_test_predictions[:, index] = estimator.predict(X_test)
y_pred = rnd_forest_blender.predict(X_test_predictions)

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)