Difference in Means

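Confidence intervals for a difference in means. Problems:
1. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between coffee drinkers and non-coffee drinkers. Build a 99% confidence interval from your sampling distribution. Use your interval to start answering the first quiz question.
2. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between those older than 21 and those younger than 21. Build a 99% confidence interval from your sampling distribution. Use your interval to answer the first quiz question.
3. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers among individuals under 21. Build a 95% confidence interval from your sampling distribution. Use your interval to start answering the second quiz question.
4. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers among individuals over 21. Build a 95% confidence interval from your sampling distribution. Use your interval to answer the second quiz question and the questions that follow.
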
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
np.random.seed(42)

full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()
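
Each interval below is read off the bootstrap distribution with `np.percentile`, using 0.5 and 99.5 for a 99% interval and 2.5 and 97.5 for a 95% interval. As a minimal sketch of where those numbers come from, the helper below splits the leftover probability evenly between the two tails. `percentile_bounds` and `toy_diffs` are hypothetical names introduced for illustration, and the toy values are synthetic, not taken from `coffee_dataset.csv`.

```python
import numpy as np


def percentile_bounds(conf_level):
    """Return the (lower, upper) percentiles for a two-sided percentile interval.

    Hypothetical helper for illustration; not part of NumPy or pandas.
    """
    # Percent of probability mass left in each tail, e.g. 0.5 for a 99% interval.
    tail = round((1 - conf_level) / 2 * 100, 10)
    return tail, 100 - tail


# Synthetic stand-in for a list of bootstrap differences (illustration only).
rng = np.random.default_rng(0)
toy_diffs = rng.normal(loc=1.0, scale=0.5, size=10_000)

lo_pct, hi_pct = percentile_bounds(0.99)
lower, upper = np.percentile(toy_diffs, [lo_pct, hi_pct])
print(f"percentiles used: {lo_pct}, {hi_pct}")
print(f"99% percentile interval: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval excludes 0, the data are consistent with a real difference between the two group means at that confidence level.
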
  • For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for coffee and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
  • diffs = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace = True)
        coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
        nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
        diffs.append(coff_mean - nocoff_mean)
     
    np.percentile(diffs, 0.5), np.percentile(diffs, 99.5) 
    # statistical evidence coffee drinkers are on average taller
    
    plt.hist(diffs)
    
  • For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
  • diffs_age = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace = True)
        under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
        over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
        diffs_age.append(over21_mean - under21_mean)
    
    np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
    # statistical evidence that over21 are on average taller
    
  • For 10,000 iterations, bootstrap your sample data and compute the difference in the average height for coffee drinkers and the average height for non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering question 2 below.
  • diffs_coff_under21 = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace = True)
        under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
        under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
        # Sign convention here is non-coffee minus coffee (the reverse of the first exercise)
        diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)
    
    np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
    # For the under21 group, we have evidence that the non-coffee drinkers are on average taller
    
  • For 10,000 iterations, bootstrap your sample data and compute the difference in the average height for coffee drinkers and the average height for non-coffee drinkers for individuals over 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below, as well as the following questions.
  • diffs_coff_over21 = []
    for _ in range(10000):
        bootsamp = sample_data.sample(200, replace = True)
        over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
        over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
        # Same sign convention as the under-21 case: non-coffee minus coffee
        diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)
    
    np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
    # For the over21 group, we have evidence that on average the non-coffee drinkers are taller
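
The four loops above share one pattern: resample the rows with replacement, take the mean height in two subgroups, store the difference, and read the interval off the percentiles. A minimal sketch of that pattern as a single function might look like the following. `bootstrap_mean_diff_ci` is a hypothetical helper, and `toy` is a synthetic DataFrame that only mimics the `drinks_coffee`, `age`, and `height` columns used above (the `'>=21'` label is an assumption); its numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd


def bootstrap_mean_diff_ci(df, query_a, query_b, value_col="height",
                           conf_level=0.95, n_iter=10_000, seed=42):
    """Percentile bootstrap interval for mean(group A) - mean(group B).

    Hypothetical helper for illustration. query_a and query_b are strings
    passed to DataFrame.query to select the two groups in each resample.
    """
    rng = np.random.default_rng(seed)
    n = len(df)
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Draw n row positions with replacement, like df.sample(n, replace=True).
        boot = df.iloc[rng.integers(0, n, size=n)]
        mean_a = boot.query(query_a)[value_col].mean()
        mean_b = boot.query(query_b)[value_col].mean()
        diffs[i] = mean_a - mean_b
    # Split the leftover probability evenly between the two tails.
    tail = (1 - conf_level) / 2 * 100
    lower, upper = np.percentile(diffs, [tail, 100 - tail])
    return lower, upper, diffs


# Synthetic stand-in data (assumption: same column names as the code above).
rng = np.random.default_rng(0)
n_rows = 200
toy = pd.DataFrame({
    "drinks_coffee": rng.random(n_rows) < 0.6,
    "age": rng.choice(["<21", ">=21"], size=n_rows),
})
# Build in a 4-unit height advantage for the older group so the interval is easy to read.
toy["height"] = rng.normal(66, 3, size=n_rows) + np.where(toy["age"] == ">=21", 4.0, 0.0)

lower, upper, diffs = bootstrap_mean_diff_ci(
    toy, "age != '<21'", "age == '<21'", conf_level=0.99, n_iter=1000
)
print(f"99% interval for over-21 minus under-21: ({lower:.2f}, {upper:.2f})")
```

Passing the group definitions as query strings keeps the sign convention explicit at the call site: the result is always the first group's mean minus the second's.
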