Mean Difference
Mean difference confidence interval problem:
1. For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for coffee and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
2. For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
3. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering the second quiz question below.
4. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals over 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below, as well as the questions that follow.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(42)
full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()
diffs = []
for _ in range(10000):
    # Resample 200 rows with replacement, then compare the two group means
    bootsamp = sample_data.sample(200, replace = True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    diffs.append(coff_mean - nocoff_mean)
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
# statistical evidence coffee drinkers are on average taller
plt.hist(diffs)
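The conclusion in the comment above rests on whether the percentile interval lies entirely on one side of zero. A minimal sketch of that check follows; `summarize_interval` is a hypothetical helper name, not part of the original notebook, and the toy `diffs` array is made-up data used only to show the call.

```python
import numpy as np

def summarize_interval(diffs, level=0.99):
    """Percentile interval of bootstrap differences plus a simple reading of it.

    Hypothetical helper for illustration; `level` is the confidence level.
    """
    alpha = (1 - level) / 2
    lower, upper = np.percentile(diffs, [100 * alpha, 100 * (1 - alpha)])
    if lower > 0:
        verdict = "positive"      # whole interval above zero
    elif upper < 0:
        verdict = "negative"      # whole interval below zero
    else:
        verdict = "inconclusive"  # interval contains zero
    return float(lower), float(upper), verdict

# Toy differences (not the coffee data): every value is above zero
toy_diffs = np.linspace(0.5, 2.5, 1001)
lower, upper, verdict = summarize_interval(toy_diffs, level=0.99)
print(lower, upper, verdict)
```

With `level=0.99` this uses the same 0.5 and 99.5 percentiles as the cell above. A "positive" verdict for `coff_mean - nocoff_mean` would correspond to evidence that coffee drinkers are taller on average.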
diffs_age = []
for _ in range(10000):
    # Resample 200 rows with replacement, then compare the two age groups
    bootsamp = sample_data.sample(200, replace = True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    diffs_age.append(over21_mean - under21_mean)
np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# statistical evidence that over21 are on average taller
diffs_coff_under21 = []
for _ in range(10000):
    # Resample, then compare non-coffee vs. coffee drinkers within the under-21 group
    bootsamp = sample_data.sample(200, replace = True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)
np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under21 group, we have evidence that the non-coffee drinkers are on average taller
diffs_coff_over21 = []
for _ in range(10000):
    # Resample, then compare non-coffee vs. coffee drinkers within the over-21 group
    bootsamp = sample_data.sample(200, replace = True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)
np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over21 group, we have evidence that on average the non-coffee drinkers are taller
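The four loops above share one shape: resample the rows with replacement, split into two groups, take the difference in means, and read off percentiles. As a rough sketch, that pattern can be wrapped in a single function. The name `bootstrap_mean_diff_ci` and its parameters are assumptions made for illustration, the version below works on plain NumPy arrays rather than the DataFrame, and the heights are synthetic, not taken from `coffee_dataset.csv`. It also assumes both groups are large enough that a resample never leaves one of them empty.

```python
import numpy as np

def bootstrap_mean_diff_ci(group_a, group_b, n_iter=10_000, level=0.95, seed=42):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b).

    Hypothetical helper: rows from both groups are pooled and resampled
    together, mirroring sample_data.sample(n, replace=True) in the notebook.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    values = np.concatenate([a, b])
    # Boolean label per row: True for group A, False for group B
    is_a = np.concatenate([np.ones(a.size, dtype=bool), np.zeros(b.size, dtype=bool)])
    n = values.size

    diffs = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        vals, flags = values[idx], is_a[idx]
        diffs[i] = vals[flags].mean() - vals[~flags].mean()

    alpha = (1 - level) / 2
    lower, upper = np.percentile(diffs, [100 * alpha, 100 * (1 - alpha)])
    return float(lower), float(upper), diffs

# Synthetic heights for illustration only
rng = np.random.default_rng(0)
coffee = rng.normal(68, 3, 120)
no_coffee = rng.normal(66, 3, 80)
lower, upper, diffs = bootstrap_mean_diff_ci(coffee, no_coffee, n_iter=2000, level=0.99)
print(lower, upper)
```

Pooling and resampling all rows together, rather than resampling each group separately, keeps the group sizes random from one iteration to the next, which is what the notebook's loops do.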