#2.1.10 ★Guided Project: Analyzing Thanksgiving Dinner.md

1. Introducing Thanksgiving Dinner Data
Instructions
  • Import the pandas package.
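  • Use the pandas.read_csv() function to read the thanksgiving.csv file.
  • Make sure to pass the keyword argument encoding="Latin-1", because the CSV file is not in the default encoding (UTF-8).
  • Assign the result to the variable data.
  • Display the first few rows of data to see what the rows and columns look like.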
  • In a separate notebook cell, display all of the column names to get a sense of what the data consists of.
  • You can use the pandas.DataFrame.columns property to display the column names.

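    import pandas as pd

    # The file is Latin-1 encoded, so the encoding must be given explicitly.
    data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
    data.head()

    # In a separate cell: columns is a property, not a method, so no parentheses.
    data.columns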
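
To see what the encoding argument does, here is a minimal sketch that reads a small in-memory CSV rather than thanksgiving.csv. The sample text, the column names, and the variable names `raw`, `sample`, and `column_names` are made up for illustration and are not part of the project data.

```python
import io

import pandas as pd

# Hypothetical two-row CSV, stored as Latin-1 bytes (not the real survey file).
raw = "RespondentID,City\n1,Montréal\n2,São Paulo\n".encode("latin-1")

# Passing encoding="Latin-1" tells pandas how to decode the bytes.
sample = pd.read_csv(io.BytesIO(raw), encoding="Latin-1")

# columns is a property, so it is accessed without parentheses.
column_names = list(sample.columns)
print(column_names)
print(sample.head())
```

Decoding the same bytes as UTF-8 would fail on the accented characters, which is the kind of problem the encoding argument avoids.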
    
    

3. Using value_counts To Explore Main Dishes
input

    print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())
    

output

    Turkey                    859
    Other (please specify)     35
    Ham/Pork                   29
    Tofurkey                   20
    Chicken                    12
    Roast beef                 11
    I don't know                5
    Turducken                   3
    Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
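
As a small aside, value_counts can also report shares instead of raw counts. The sketch below uses a made-up five-item Series (`dishes` is a hypothetical name, not the survey column) to show both forms.

```python
import pandas as pd

# Toy answers, invented for illustration only.
dishes = pd.Series(["Turkey", "Turkey", "Ham/Pork", "Turkey", "Tofurkey"])

# Raw counts, sorted from most to least frequent.
counts = dishes.value_counts()

# normalize=True returns each count divided by the total.
shares = dishes.value_counts(normalize=True)

print(counts)
print(shares)
```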
    

4. Figuring Out What Pies People Eat
input

    # Each mask is True where the respondent did not select that pie.
    apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
    pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
    pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
    # Despite its name, ate_pies is True when none of the three pies was selected.
    ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull
    print(ate_pies.value_counts())
    

output

    False    876
    True     182
    dtype: int64

182 respondents (True) selected none of the apple, pumpkin, or pecan pies.
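
The mask logic can be checked on a tiny example. This is a sketch on an invented three-row frame (`pies`, `no_pie`, and `no_pie_alt` are hypothetical names); it also shows an equivalent form using `isnull().all(axis=1)`, which is an alternative and not what the notes use.

```python
import numpy as np
import pandas as pd

# Toy data: a value means the pie was selected, NaN means it was not.
pies = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan],
    "Pumpkin": [np.nan, "Pumpkin", np.nan],
    "Pecan": [np.nan, np.nan, np.nan],
})

# Same approach as above: combine the three null masks with &.
no_pie = pies["Apple"].isnull() & pies["Pumpkin"].isnull() & pies["Pecan"].isnull()

# Equivalent alternative: a row is True only if every column is null.
no_pie_alt = pies.isnull().all(axis=1)

print(no_pie.value_counts())
```

Only the third row has all three columns empty, so it is the only row where the combined mask is True.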
    

5. Converting Age To Numeric
input

    print(data['Age'].value_counts())
    
    

output

    45 - 59    286
    60+        264
    30 - 44    259
    18 - 29    216
    Name: Age, dtype: int64
    
    

input

    def str_to_int(age_str):
        # Missing answers stay missing.
        if pd.isnull(age_str):
            return None
        # Split on the space character and keep the first item, e.g. "45 - 59" -> "45".
        age_str = age_str.split(' ')[0]
        # Remove the "+" character, e.g. "60+" -> "60".
        age_str = age_str.replace('+', '')
        # Convert the remaining digits to an integer.
        return int(age_str)

    # Apply the function to each value in the Age column of data.
    data['int_age'] = data['Age'].apply(str_to_int)
    # Summarize the new int_age column.
    data['int_age'].describe()
    
    

output

    count    1025.000000
    mean       39.383415
    std        15.398493
    min        18.000000
    25%        30.000000
    50%        45.000000
    75%        60.000000
    max        60.000000
    Name: int_age, dtype: float64
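
Because each age range is reduced to its lower bound, the summary above describes lower bounds, not actual ages. As an alternative to apply, the same lower bound can be pulled out with a regular expression. The sketch below is one possible vectorized variant on made-up input (`ages` and `lower_bound` are hypothetical names), not the method used in these notes.

```python
import numpy as np
import pandas as pd

# Toy values in the same format as the Age column, plus one missing answer.
ages = pd.Series(["18 - 29", "60+", np.nan, "45 - 59"])

# Extract the first run of digits from each string; missing values stay NaN.
lower_bound = ages.str.extract(r"(\d+)")[0].astype(float)

print(lower_bound)
```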
    

6. Converting Income To Numeric
input

    print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
    

output

    $25,000 to $49,999      180
    Prefer not to answer    136
    $50,000 to $74,999      135
    $75,000 to $99,999      133
    $100,000 to $124,999    111
    $200,000 and up          80
    $10,000 to $24,999       68
    $0 to $9,999             66
    $125,000 to $149,999     49
    $150,000 to $174,999     40
    $175,000 to $199,999     27
    Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
    

input

    def income_to_int(income_str):
        # Missing answers stay missing.
        if pd.isnull(income_str):
            return None
        # Split on the space character and keep the first item, e.g. "$25,000 to $49,999" -> "$25,000".
        income_str = income_str.split(' ')[0]
        # "Prefer not to answer" carries no number.
        if income_str == 'Prefer':
            return None
        # Strip the dollar sign and the thousands separators before converting.
        income_str = income_str.replace('$', '')
        income_str = income_str.replace(',', '')
        return int(income_str)
    
    data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_int)
    print(data['int_income'].describe())
    

output

    count       889.000000
    mean      74077.615298
    std       59360.742902
    min           0.000000
    25%       25000.000000
    50%       50000.000000
    75%      100000.000000
    max      200000.000000
    Name: int_income, dtype: float64
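
To make the conversion concrete, the sketch below runs the same function on four sample answers. The list `samples` is invented for illustration, though the strings follow the formats shown in the value counts above.

```python
import numpy as np
import pandas as pd


def income_to_int(income_str):
    # Same logic as in the notes: keep the lower bound of each bracket.
    if pd.isnull(income_str):
        return None
    income_str = income_str.split(" ")[0]
    if income_str == "Prefer":
        return None
    income_str = income_str.replace("$", "").replace(",", "")
    return int(income_str)


# Sample inputs: a bracket, the open-ended top bracket, a refusal, a missing value.
samples = ["$25,000 to $49,999", "$200,000 and up", "Prefer not to answer", np.nan]
converted = [income_to_int(s) for s in samples]
print(converted)
```

Each bracket maps to its lower bound, so the mean and quartiles above understate actual household income.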
    
    

7. Correlating Travel Distance And Income
input

    # Note: rows whose int_income is exactly 150000 match neither filter.
    print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
    print('--------------------------------------------------')
    print(data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts())
    

output

    Thanksgiving is happening at my home--I won't travel at all                         281
    Thanksgiving is local--it will take place in the town I live in                     203
    Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
    Thanksgiving is out of town and far away--I have to drive several hours or fly       55
    Name: How far will you travel for Thanksgiving?, dtype: int64
    --------------------------------------------------
    Thanksgiving is happening at my home--I won't travel at all                         49
    Thanksgiving is local--it will take place in the town I live in                     25
    Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
    Thanksgiving is out of town and far away--I have to drive several hours or fly      12
    Name: How far will you travel for Thanksgiving?, dtype: int64
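
The two groups differ a lot in size (689 versus 102 answers), so raw counts are hard to compare. The sketch below turns the counts printed above into shares; the shortened row labels and the names `low`, `high`, and `shares` are assumptions made for readability.

```python
import pandas as pd

# Shortened stand-ins for the four survey answers (hypothetical labels).
labels = ["at home", "local", "few hours away", "far away"]

# Counts copied from the output above.
low = pd.Series([281, 203, 150, 55], index=labels)
high = pd.Series([49, 25, 16, 12], index=labels)

# Divide each count by its group total to get comparable shares.
shares = pd.DataFrame({
    "under_150k": low / low.sum(),
    "over_150k": high / high.sum(),
}).round(3)

print(shares)
```

With the real data, the same shares could be obtained directly by passing normalize=True to value_counts.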
    
    

8. Linking Friendship And Age
input

    # pivot_table aggregates with the mean by default, so each cell is an average int_age.
    data.pivot_table(
        index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
        columns = 'Have you ever attended a "Friendsgiving?"',
        values = 'int_age'
    )
    

output
[Image: pivot table output, not available]

input

    data.pivot_table(
        index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
        columns = 'Have you ever attended a "Friendsgiving?"',
        values = 'int_income'
    )
    

output
[Image: pivot table output, not available]
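
Since the pivot table images are missing, here is a minimal sketch of what such a table looks like on invented data. The frame `df` and its short column names `met_friends`, `friendsgiving`, and `int_age` are hypothetical stand-ins for the long survey questions.

```python
import pandas as pd

# Toy respondents: two yes/no answers and a numeric age.
df = pd.DataFrame({
    "met_friends": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "int_age": [18, 30, 45, 30, 60, 45],
})

# With no aggfunc given, pivot_table averages the values in each cell.
table = df.pivot_table(index="met_friends", columns="friendsgiving", values="int_age")

print(table)
```

Each cell holds the mean of int_age for one combination of answers, for example (18 + 30) / 2 = 24 for respondents who answered "Yes" to both questions.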