Python asyncio: a coroutine-based asynchronous crawler
To practice with Python's asyncio coroutine libraries, I crawled the Douban Movie Top 250, looking things up and experimenting as I went. Below is the most basic usage.
import time
import asyncio
from functools import wraps
import requests
import aiohttp
from lxml import etree
base_url = 'https://movie.douban.com/top250?start={}&filter='
headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"}

# Decorator that prints how long a (synchronous) function takes to run.
def count_time(func):
    @wraps(func)
    def wrapper(*arg, **kwargs):
        s_time = time.time()
        res = func(*arg, **kwargs)
        e_time = time.time()
        c_time = e_time - s_time
        print('%s:%s' % (func.__name__, c_time))
        return res
    return wrapper

# Synchronous version: fetch one page with requests and print the movie titles.
def download_one(url):
    page_source = requests.get(url, headers=headers).text
    html = etree.HTML(page_source)
    title_list = html.xpath('//ol[@class="grid_view"]//div[@class="hd"]/a/span[position()=1]/text()')
    print(title_list)

@count_time
def douban_synch():
    # 10 pages of 25 movies each: start=0, 25, ..., 225
    for i in range(10):
        url = base_url.format(i * 25)
        download_one(url)

# Asynchronous version (despite the name): fetch one page through the shared
# aiohttp session and print the movie titles.
async def download_one_synch(url, session):
    async with session.get(url) as response:
        page_source = await response.text()
        html = etree.HTML(page_source)
        title_list = html.xpath('//ol[@class="grid_view"]//div[@class="hd"]/a/span[position()=1]/text()')
        print(title_list)
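
# count_time is not applied here: it wraps ordinary functions, so on a
# coroutine function it would only time the creation of the coroutine object.
#@count_time
async def download_all():
    # Create one session and share it across all requests. The same headers
    # as the synchronous version are sent so the two runs are comparable.
    async with aiohttp.ClientSession(headers=headers) as session:
        # start=0, 25, ..., 225, matching douban_synch
        tasks = [download_one_synch(base_url.format(i * 25), session) for i in range(10)]
        # gather runs the coroutines concurrently and waits for all of them
        await asyncio.gather(*tasks)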

@count_time
def douban_asynch():
    # asyncio.run starts an event loop, runs the coroutine to completion,
    # then closes the loop
    asyncio.run(download_all())

if __name__ == '__main__':
    douban_synch()
    douban_asynch()
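
The `count_time` decorator is left commented out on `download_all` because it is written for ordinary functions. One possible way around this is an async-aware variant that awaits the wrapped coroutine inside the timed span. The following is only a minimal sketch: `async_count_time` and `fake_download` are hypothetical names that do not appear in the original script, and `asyncio.sleep` stands in for a network request.

```python
import asyncio
import time
from functools import wraps


def async_count_time(func):
    """Hypothetical async counterpart of count_time (not in the original script)."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        s_time = time.perf_counter()
        # Awaiting inside the wrapper means the measured span covers the
        # coroutine's execution, not just the creation of the coroutine object.
        res = await func(*args, **kwargs)
        c_time = time.perf_counter() - s_time
        print('%s:%s' % (func.__name__, c_time))
        return res
    return wrapper


@async_count_time
async def fake_download(delay):
    # asyncio.sleep is a stand-in for an awaitable network call.
    await asyncio.sleep(delay)
    return delay


if __name__ == '__main__':
    asyncio.run(fake_download(0.1))
```

With a decorator like this, `download_all` could be timed directly instead of going through the synchronous `douban_asynch` wrapper.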
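Results:
douban_synch:3.096079111099243
douban_asynch:0.32878708839416504
In this run the synchronous version took about 3 seconds and the asynchronous version about 0.3 seconds, roughly a 10x speedup, so the benefit is substantial.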
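
The speedup comes from overlapping the time spent waiting on each request, not from doing less work. The sketch below illustrates this without touching the network. It rests on assumptions: `asyncio.sleep` stands in for a request, and the function names and the 0.05-second delay are chosen for illustration only, not measured against Douban.

```python
import asyncio
import time

DELAY = 0.05   # assumed per-request latency, for illustration only
PAGES = 10


async def fake_fetch(page):
    # Stand-in for an HTTP request: yields control while "waiting".
    await asyncio.sleep(DELAY)
    return page


async def run_sequential():
    # Await each request before starting the next one.
    return [await fake_fetch(i) for i in range(PAGES)]


async def run_concurrent():
    # Start all requests together; gather returns results in input order.
    return await asyncio.gather(*(fake_fetch(i) for i in range(PAGES)))


def timed(coro_func):
    # Run a coroutine function to completion and return (result, seconds).
    start = time.perf_counter()
    result = asyncio.run(coro_func())
    return result, time.perf_counter() - start


if __name__ == '__main__':
    seq_result, seq_time = timed(run_sequential)
    con_result, con_time = timed(run_concurrent)
    print('sequential:', seq_time)
    print('concurrent:', con_time)
```

With real requests the ratio depends on server latency and on any rate limiting, so the figures reported for the crawler are specific to that run.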
References:
https://docs.python.org/3/library/asyncio-task.html#running-tasks-concurrently
https://docs.aiohttp.org/en/stable/client_quickstart.html
https://blog.csdn.net/SL_World/article/details/86633611
https://morvanzhou.github.io/tutorials/data-manipulation/scraping/4-02-asyncio/