# 2.1.10 ★Guided Project: Analyzing Thanksgiving Dinner
1. Introducing Thanksgiving Dinner Data
Instructions:
- Import the pandas package.
- Use the pandas.read_csv() function to read the thanksgiving.csv file. Specify the keyword argument encoding="Latin-1", because the file is not in the standard UTF-8 encoding.
- Assign the result to the variable data.
- Display the first few rows of data to see what the rows and columns look like.
- In a separate notebook cell, display all of the column names to get a sense of what the data consists of. You can use the pandas.DataFrame.columns property to display the column names.
input
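import pandas as pd
# The file is not UTF-8, so pass the encoding explicitly.
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head()  # first five rows
# Separate cell: columns is an attribute, not a method, so no parentheses.
data.columns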
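Why encoding="Latin-1" matters: read_csv decodes as UTF-8 unless told otherwise, and some Latin-1 bytes are not valid UTF-8. A minimal sketch using a hypothetical one-column CSV held in memory (not the survey file) that contains the Latin-1 byte 0xE9 for "é":

```python
import io

import pandas as pd

# Hypothetical CSV encoded as Latin-1: byte 0xE9 means "é" in Latin-1,
# but it is not a valid UTF-8 sequence here.
raw = "dish\nsouffl\xe9\n".encode("latin-1")

# Default (UTF-8) decoding fails on the 0xE9 byte.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With the right encoding, the byte decodes to "é".
df = pd.read_csv(io.BytesIO(raw), encoding="Latin-1")
print(utf8_failed)
print(df["dish"].iloc[0])
```

A design note: Latin-1 maps every possible byte to a character, so it never raises a decoding error, but that also means a file that is really in some other encoding can be read "successfully" with wrong characters.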
3. Using value_counts To Explore Main Dishes
input
print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())
output
Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
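value_counts() leaves out missing values by default: the eight counts above sum to 974, so blank answers are not included. A sketch on a hypothetical five-answer Series showing how normalize=True turns counts into shares of the non-missing answers, and how dropna=False keeps the blanks as their own row:

```python
import numpy as np
import pandas as pd

# Hypothetical raw answers; np.nan stands in for a respondent who skipped the question.
answers = pd.Series(["Turkey", "Turkey", "Ham/Pork", np.nan, "Turkey"])

counts = answers.value_counts()                    # NaN excluded, most frequent first
shares = answers.value_counts(normalize=True)      # fractions of the non-missing answers
with_missing = answers.value_counts(dropna=False)  # NaN kept as its own row

print(counts)
print(shares)
print(with_missing)
```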
4. Figuring Out What Pies People Eat
input
# pd.isnull() is True where a respondent left that pie column blank.
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
# True only when all three columns are blank, i.e. none of apple, pumpkin, or pecan was selected.
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(ate_pies.value_counts())
output
False    876
True     182
dtype: int64
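182 respondents (True) left all three of the apple, pumpkin, and pecan columns blank; the other 876 (False) selected at least one of these pies.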
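The chained & test can also be written by selecting the three columns at once and asking whether every value in the row is missing. A sketch on a hypothetical three-row frame whose short column names stand in for the long survey question text:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the three pie columns (shortened names).
pies = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan],
    "Pumpkin": ["Pumpkin", "Pumpkin", np.nan],
    "Pecan": [np.nan, np.nan, np.nan],
})

# Chained & of per-column isnull(), as in the step above.
chained = pies["Apple"].isnull() & pies["Pumpkin"].isnull() & pies["Pecan"].isnull()

# Same test in one call: True where every listed column is missing in that row.
no_listed_pie = pies[["Apple", "Pumpkin", "Pecan"]].isnull().all(axis=1)

assert chained.equals(no_listed_pie)
print(no_listed_pie.value_counts())
```

With the real data you would pass a list of the three full column names instead of the short ones.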
5. Converting Age To Numeric
input
print(data['Age'].value_counts())
output
45 - 59    286
60+        264
30 - 44    259
18 - 29    216
Name: Age, dtype: int64
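value_counts() sorts by frequency, not by bracket. Rebuilding the printed counts as a Series (values copied from the output above) shows that sort_index() puts the brackets in age order and that they total 1,025, the same count that describe() reports for int_age:

```python
import pandas as pd

# Age-bracket counts copied from the value_counts() output above.
age_counts = pd.Series({"45 - 59": 286, "60+": 264, "30 - 44": 259, "18 - 29": 216})

# sort_index() orders the labels as strings, which here matches age order.
by_bracket = age_counts.sort_index()
print(by_bracket)
print(age_counts.sum())
```

This relies on these particular labels sorting correctly as strings; a hypothetical label such as "5 - 17" would sort after "45 - 59", and an ordered pd.Categorical would be the safer choice.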
input
def str_to_int(age_str):
    # Missing answers stay missing.
    if pd.isnull(age_str):
        return None
    # Split on the space character (' ') and keep the first item, e.g. "18 - 29" -> "18".
    age_str = age_str.split(' ')[0]
    # Remove the "+" from "60+".
    age_str = age_str.replace('+', '')
    # Convert the bracket's lower bound to an integer.
    return int(age_str)

# Apply the function to every value in the Age column.
data['int_age'] = data['Age'].apply(str_to_int)
# Summary statistics; the None values become NaN, so the column is float64.
data['int_age'].describe()
output
count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64
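The same conversion can be done without a Python-level function by extracting the first run of digits with Series.str.extract. A sketch on a hypothetical four-value sample of the Age column:

```python
import pandas as pd

# Hypothetical sample of the Age column, including a missing answer.
ages = pd.Series(["18 - 29", "60+", None, "45 - 59"])

# Capture the first run of digits (the bracket's lower bound);
# missing values stay NaN, so the result is float like int_age above.
int_age = ages.str.extract(r"(\d+)", expand=False).astype(float)
print(int_age)
```

Like str_to_int, this keeps each bracket's lower bound, so 60 stands for everyone aged 60 or over and the mean above understates the true average age.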
6. Converting Income To Numeric
input
print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
output
$25,000 to $49,999      180
Prefer not to answer    136
$50,000 to $74,999      135
$75,000 to $99,999      133
$100,000 to $124,999    111
$200,000 and up          80
$10,000 to $24,999       68
$0 to $9,999             66
$125,000 to $149,999     49
$150,000 to $174,999     40
$175,000 to $199,999     27
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
input
def income_to_int(income_str):
    # Missing answers stay missing.
    if pd.isnull(income_str):
        return None
    # Split on the space character (' ') and keep the first item, e.g. "$25,000".
    income_str = income_str.split(' ')[0]
    # "Prefer not to answer" has no number, so treat it as missing.
    if income_str == 'Prefer':
        return None
    # Strip "$" and "," so the bracket's lower bound parses as an integer.
    income_str = income_str.replace('$', '')
    income_str = income_str.replace(',', '')
    return int(income_str)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_int)
print(data['int_income'].describe())
output
count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
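A vectorized sketch of income_to_int on a hypothetical sample of the income answers: take the first word, strip "$" and ",", and let pd.to_numeric(errors="coerce") turn the unparseable "Prefer" into NaN:

```python
import pandas as pd

# Hypothetical sample of the income answers, including a missing one.
income = pd.Series(["$25,000 to $49,999", "Prefer not to answer", "$200,000 and up", None])

# First word of each answer ("$25,000", "Prefer", ...); missing values stay NaN.
first_word = income.str.split(" ").str[0]

# Remove "$" and ","; anything that still is not a number becomes NaN.
int_income = pd.to_numeric(first_word.str.replace(r"[$,]", "", regex=True), errors="coerce")
print(int_income)
```

One difference from income_to_int: errors="coerce" silently converts any unexpected text to NaN, whereas the function raises a ValueError on it, which makes surprises in the data easier to spot.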
7. Correlating Travel Distance And Income
input
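# Strict < and > leave out int_income == 150000 (the "$150,000 to $174,999" bracket);
# change the second filter to >= 150000 to include it.
print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
print('--------------------------------------------------')
print(data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts())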
output
Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
--------------------------------------------------
Thanksgiving is happening at my home--I won't travel at all                          49
Thanksgiving is local--it will take place in the town I live in                      25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     16
Thanksgiving is out of town and far away--I have to drive several hours or fly       12
Name: How far will you travel for Thanksgiving?, dtype: int64
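The two income groups differ a lot in size (689 vs 102 respondents with a travel answer), so raw counts are hard to compare. A sketch that divides each group by its own total, using the counts copied from the output above with shortened labels:

```python
import pandas as pd

# Counts copied from the two value_counts() outputs above (labels shortened).
labels = ["at home", "local", "few hours' drive", "far away"]
below_150k = pd.Series([281, 203, 150, 55], index=labels)
above_150k = pd.Series([49, 25, 16, 12], index=labels)

# Divide by each group's total so groups of different sizes can be compared.
shares = pd.DataFrame({
    "income < 150000": below_150k / below_150k.sum(),
    "income > 150000": above_150k / above_150k.sum(),
})
print(shares.round(3))
```

By these numbers, about 41% of the lower-income group and about 48% of the higher-income group host at home. With only 102 respondents on the higher-income side, that gap is suggestive rather than conclusive. On the real DataFrame, value_counts(normalize=True) returns these shares directly.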
8. Linking Friendship And Age
input
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_age",
)
output
input
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_income",
)
output
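pivot_table averages values within each index/column combination because its default aggfunc is the mean. A minimal sketch with a hypothetical five-row frame and shortened column names, showing that default, the equivalent groupby, and aggfunc="count" to check how many respondents each cell averages over:

```python
import pandas as pd

# Hypothetical mini survey with shortened column names.
df = pd.DataFrame({
    "meetup": ["Yes", "Yes", "No", "No", "Yes"],
    "friendsgiving": ["Yes", "No", "No", "No", "Yes"],
    "int_age": [18.0, 45.0, 60.0, 30.0, 30.0],
})

# Mean int_age per (meetup, friendsgiving) cell; empty combinations are NaN.
mean_age = df.pivot_table(index="meetup", columns="friendsgiving", values="int_age")

# The same numbers via groupby, unstacked so friendsgiving becomes the columns.
via_groupby = df.groupby(["meetup", "friendsgiving"])["int_age"].mean().unstack()

# Number of respondents behind each cell.
group_sizes = df.pivot_table(
    index="meetup", columns="friendsgiving", values="int_age", aggfunc="count"
)
print(mean_age)
print(group_sizes)
```

On the survey data, a count table like group_sizes helps judge which of the averages in the two pivot tables rest on enough respondents to be meaningful.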