Pandas基礎1.1｜Python学習ノート

22805 ワード

【練習一】米ドラマ「権力のゲーム」のシナリオに関するデータセットがありますが、以下の問題を解決してください.(a)すべてのデータの中で、全部で何人の人物が現れましたか.

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Game_of_Thrones_Script.csv')
df.head()

Release Date
Season
Episode
Episode Title
Name
Sentence
0
2011/4/17
Season 1
Episode 1
Winter is Coming
waymar royce
What do you expect? They're savages. One lot s...
1
2011/4/17
Season 1
Episode 1
Winter is Coming
will
I've never seen wildlings do a thing like this...
2
2011/4/17
Season 1
Episode 1
Winter is Coming
waymar royce
How close did you get?
3
2011/4/17
Season 1
Episode 1
Winter is Coming
will
Close as any man would.
4
2011/4/17
Season 1
Episode 1
Winter is Coming
gared
We should head back to the wall.

df.describe()

Release Date
Season
Episode
Episode Title
Name
Sentence
count
23911
23911
23911
23911
23911
23911
unique
73
8
10
73
564
22300
top
2017/8/13
Season 2
Episode 5
Eastwatch
tyrion lannister
No.
freq
505
3914
3083
505
1760
103

df['Name'].nunique() #  Name

(b)1つのセルを単純に1つのセルと見なすセル数で、誰が一番多くのことを言ったのですか.

df['Name'].value_counts()

tyrion lannister          1760
jon snow                  1133
daenerys targaryen        1048
cersei lannister          1005
jaime lannister            945
                          ... 
janos slunt                  1
steward of house stark       1
archmaester                  1
night watch stable boy       1
bryndel                      1
Name: Name, Length: 564, dtype: int64

df['Name'].value_counts().index[0]

'tyrion lannister'

(c)単語数で、誰が一番多くの単語を話しましたか.

#apply(lambda x:len(x.split())) apply              
df_words = df.assign(Words=df['Sentence'].apply(lambda x:len(x.split()))).sort_values(by='Name')
df_words.head()

Release Date
Season
Episode
Episode Title
Name
Sentence
Words
276
2011/4/17
Season 1
Episode 1
Winter is Coming
a voice
It's Maester Luwin, my lord.
5
3012
2011/6/19
Season 1
Episode 10
Fire and Blood
addam marbrand
ls it true about Stannis and Renly?
7
3017
2011/6/19
Season 1
Episode 10
Fire and Blood
addam marbrand
Kevan Lannister
2
13610
2014/6/8
Season 4
Episode 9
The Watchers on the Wall
aemon
And what is it that couldn't wait until mornin...
10
13614
2014/6/8
Season 4
Episode 9
The Watchers on the Wall
aemon
Oh, no need. I know my way around this library...
48

df.assign(Words=df['Sentence'].apply(lambda x:len(x.split())))

Release Date
Season
Episode
Episode Title
Name
Sentence
Words
0
2011/4/17
Season 1
Episode 1
Winter is Coming
waymar royce
What do you expect? They're savages. One lot s...
25
1
2011/4/17
Season 1
Episode 1
Winter is Coming
will
I've never seen wildlings do a thing like this...
21
2
2011/4/17
Season 1
Episode 1
Winter is Coming
waymar royce
How close did you get?
5
3
2011/4/17
Season 1
Episode 1
Winter is Coming
will
Close as any man would.
5
4
2011/4/17
Season 1
Episode 1
Winter is Coming
gared
We should head back to the wall.
7
...
...
...
...
...
...
...
...
23906
2019/5/19
Season 8
Episode 6
The Iron Throne
brienne
I think we can all agree that ships take prece...
12
23907
2019/5/19
Season 8
Episode 6
The Iron Throne
bronn
I think that's a very presumptuous statement.
7
23908
2019/5/19
Season 8
Episode 6
The Iron Throne
tyrion lannister
I once brought a jackass and a honeycomb into ...
11
23909
2019/5/19
Season 8
Episode 6
The Iron Throne
man
The Queen in the North!
5
23910
2019/5/19
Season 8
Episode 6
The Iron Throne
all
The Queen in the North! The Queen in the North...
25
23911 rows × 7 columns

#    ：       
# will  。    ，   will     words    ，      。
#L_count[-1]          
L_count = []
N_words = list(zip(df_words['Name'],df_words['Words']))
for i in N_words:
    if i == N_words[0]:
        L_count.append(i[1])
        last = i[0]
    else:
        L_count.append(L_count[-1]+i[1] if i[0]==last else i[1])
        last = i[0]
df_words['Count']=L_count
df_words['Name'][df_words['Count'].idxmax()]

'tyrion lannister'

【練習2】コビーに関するシュートデータセットがあるので、(a)どのaction_typeとcombined_shot_typeの組み合わせが一番多いかという問題を解決してください.

df_1 = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Kobe_data.csv',index_col='shot_id')
df_1.head()

action_type
combined_shot_type
game_event_id
game_id
lat
loc_x
loc_y
lon
minutes_remaining
period
...
shot_made_flag
shot_type
shot_zone_area
shot_zone_basic
shot_zone_range
team_id
team_name
game_date
matchup
opponent
shot_id
1
Jump Shot
Jump Shot
10
20000012
33.9723
167
72
-118.1028
10
1
...
NaN
2PT Field Goal
Right Side(R)
Mid-Range
16-24 ft.
1610612747
Los Angeles Lakers
2000/10/31
LAL @ POR
POR
2
Jump Shot
Jump Shot
12
20000012
34.0443
-157
0
-118.4268
10
1
...
0.0
2PT Field Goal
Left Side(L)
Mid-Range
8-16 ft.
1610612747
Los Angeles Lakers
2000/10/31
LAL @ POR
POR
3
Jump Shot
Jump Shot
35
20000012
33.9093
-101
135
-118.3708
7
1
...
1.0
2PT Field Goal
Left Side Center(LC)
Mid-Range
16-24 ft.
1610612747
Los Angeles Lakers
2000/10/31
LAL @ POR
POR
4
Jump Shot
Jump Shot
43
20000012
33.8693
138
175
-118.1318
6
1
...
0.0
2PT Field Goal
Right Side Center(RC)
Mid-Range
16-24 ft.
1610612747
Los Angeles Lakers
2000/10/31
LAL @ POR
POR
5
Driving Dunk Shot
Dunk
155
20000012
34.0443
0
0
-118.2698
6
2
...
1.0
2PT Field Goal
Center(C)
Restricted Area
Less Than 8 ft.
1610612747
Los Angeles Lakers
2000/10/31
LAL @ POR
POR
5 rows × 24 columns

gamegroup = list(zip(df_1['action_type'],df_1['combined_shot_type']))
gamegroup

[('Jump Shot', 'Jump Shot'),
 ('Jump Shot', 'Jump Shot'),
...]

pd.Series(gamegroup).value_counts().index[0]

('Jump Shot', 'Jump Shot')

(b)記録されたすべてのgame_idの中で、最も遭遇したopponentは1本ですか?

df_1.iloc[:,[3,-1]]#

game_id
opponent
shot_id
1
20000012
POR
2
20000012
POR
3
20000012
POR
4
20000012
POR
5
20000012
POR
...
...
...
30693
49900088
IND
30694
49900088
IND
30695
49900088
IND
30696
49900088
IND
30697
49900088
IND
30697 rows × 2 columns

gamegroup_1 = list(zip(df_1['game_id'],df_1['opponent']))
pd.Series(list(list(zip(*(pd.Series(gamegroup_1).unique()).tolist()))[1])).value_counts().index[0]

'SAS'

zip(df_1['game_id'],df_1['opponent'])
list(zip(df_1['game_id'],df_1['opponent']))

[(20000012, 'POR'),
 (20000012, 'POR'),
 ...]

#unique  ：  。    list ，    pd.series 。
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique()
a

array([(20000012, 'POR'), (20000019, 'UTA'), (20000047, 'VAN'), ...,
       (49900086, 'IND'), (49900087, 'IND'), (49900088, 'IND')],
      dtype=object)

a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique().tolist()
a #

[(20000012, 'POR'),
 (20000019, 'UTA'),
 (20000047, 'VAN'),
 (20000049, 'LAC'),
 (20000058, 'HOU'),
 ...]

#  ——           ，       。        。
#     ：        、        。
n = list(zip(*a))
n

pd.Series(n) #

0    (20000012, 20000019, 20000047, 20000049, 20000...
1    (POR, UTA, VAN, LAC, HOU, SAS, HOU, DEN, SAC, ...
dtype: object

PythonでMeCabを使おうとして躓いたので覚書

「R」をbrewでOSX 10.9 Mavericksにインストールしても動かない時のTips