[pyQuery analysis example] Analyzing Champions League Twenty20 match scores on ESPNcricinfo

Target URL: http://www.espncricinfo.com/champions-league-twenty20-2012/engine/match/574265.html
liz@nb-liz:~$ script pyquery.log2
Script started, file is pyquery.log2
liz@nb-liz:~$ ipython
Python 2.7.3 (default, Jan  2 2013, 16:53:07) 
Type "copyright", "credits" or "license" for more information.

IPython 1.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pyquery import PyQuery as pq  # import the PyQuery class

In [2]: d=pq(url="http://www.espncricinfo.com/champions-league-twenty20-2012/engine/match/574265.html")  # fetch and parse the page

In [6]: d('#inningsBat1')
Out[6]: [<table#inningsBat1.inningsTable>]

In [8]: d('#inningsBat1').html()
...

In [11]: d('#inningsBat1').find('.playerName').html()
Out[11]: u'<a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a>  '

In [14]: d('#inningsBat1').eq(1).find('.playerName').html()

In [18]: t=d('#inningsBat1')

In [19]: t
Out[19]: [<table#inningsBat1.inningsTable>]

In [20]: t.children()
Out[20]: [<tr>, <tr.inningsHead>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr>, <tr.inningsRow>]

In [22]: t('tr.inningsRow')
Out[22]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]

In [23]: trs=t('tr.inningsRow')

In [24]: trs
Out[24]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]

In [25]: trs.eq(0).html()
Out[25]: u'<td class="inningsIcon" onclick="ToggleRowVisibility(\'inningsBat1\',3); "><img src="http://i.imgci.com/espncricinfo/col_ps.gif" width="7" height="7" name="inningsBat1.1" class="inningsIcon" alt="View dismissal" title="View dismissal" id="inningsBat1.1" /></td>
<td width="192" class="playerName"><a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a> </td>
<td width="259" class="battingDismissal"> b Ojha </td>
<td class="battingRuns">39</td>
<td class="battingDetails">36</td>
<td class="battingDetails">25</td>
<td class="battingDetails">5</td>
<td class="battingDetails">2</td>
<td class="battingDetails">156.00</td>
'

In [27]: trs.eq(0).find('.playerName').html()
Out[27]: u'<a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a> '

In [28]: n1=trs.eq(0).find('.playerName')

In [29]: n1.find('a').html()
Out[29]: 'M Vijay'

In [34]: trs[0]
Out[34]: <Element tr at 0x968e44c>

In [40]: trs
Out[40]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]

In [42]: for i in trs:
    print d(i).find('.playerName').find('a').text()
   ....:     
M Vijay
F du Plessis
SK Raina
MS Dhoni
S Badrinath
RA Jadeja
JA Morkel
WP Saha
R Ashwin
BW Hilfenhaus
None
None

In [44]: for i in trs:
    print d(i).find('.playerName').find('a').text()
    print d(i).find('.battingRuns').text()
    print d(i).find('.battingDismissal').text()# for...in yields raw elements, so wrap each one with pyquery: d(i)
    print '\n'
   ....:     
M Vijay
39
b Ojha


F du Plessis
52
c Sharma b Malinga


SK Raina
8
c Johnson b Malinga


MS Dhoni
35
c Smith b Malinga


S Badrinath
2
c †Karthik b Smith


RA Jadeja
12
run out (†Karthik/Johnson)


JA Morkel
0
c Tendulkar b Malinga


WP Saha
5
c †Karthik b Malinga


R Ashwin
13
not out


BW Hilfenhaus
0
not out


None
7
(b 1, w 6)


None
173
(8 wickets; 20 overs; 100 mins)


In [45]: for i in trs:
    print 'Player Name:',d(i).find('.playerName').find('a').text()
    print 'Batting Runs:',d(i).find('.battingRuns').text()
    print 'Batting Dismissal:',d(i).find('.battingDismissal').text()# for...in yields raw elements, so wrap each one with pyquery: d(i)
    print '\n'
   ....:     
Player Name: M Vijay
Batting Runs: 39
Batting Dismissal: b Ojha


Player Name: F du Plessis
Batting Runs: 52
Batting Dismissal: c Sharma b Malinga


Player Name: SK Raina
Batting Runs: 8
Batting Dismissal: c Johnson b Malinga


Player Name: MS Dhoni
Batting Runs: 35
Batting Dismissal: c Smith b Malinga


Player Name: S Badrinath
Batting Runs: 2
Batting Dismissal: c †Karthik b Smith


Player Name: RA Jadeja
Batting Runs: 12
Batting Dismissal: run out (†Karthik/Johnson)


Player Name: JA Morkel
Batting Runs: 0
Batting Dismissal: c Tendulkar b Malinga


Player Name: WP Saha
Batting Runs: 5
Batting Dismissal: c †Karthik b Malinga


Player Name: R Ashwin
Batting Runs: 13
Batting Dismissal: not out


Player Name: BW Hilfenhaus
Batting Runs: 0
Batting Dismissal: not out


Player Name: None
Batting Runs: 7
Batting Dismissal: (b 1, w 6)


Player Name: None
Batting Runs: 173
Batting Dismissal: (8 wickets; 20 overs; 100 mins)


In [46]: 
Do you really want to exit ([y]/n)? y
liz@nb-liz:~$ exit
exit
Script done, file is pyquery.log2
liz@nb-liz:~$