Python: Logging in to Douban with a crawler
How do you use requests to log in to Douban and then fetch content that requires a login?
Code:
s = requests.session()
r = s.post(loginUrl, data=formData, headers=headers)
res = s.get("http://movie.douban.com/mine", cookies=r.cookies, headers=headers)
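The snippet above depends on the session object carrying cookies from the login POST over to the later GET. The following is a minimal sketch of that mechanism, not Douban's real behaviour. It runs against a hypothetical local stand-in server (`FakeSite`, with a made-up `sid` cookie and `/login` and `/mine` paths) that answers the login POST with a 302 redirect and a session cookie. All names in it are assumptions for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a login site; not Douban's real behaviour."""

    def do_POST(self):
        # Consume the form body, then answer like a login endpoint:
        # set a session cookie and redirect with 302.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(302)
        self.send_header("Location", "/mine")
        self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Echo back whatever Cookie header the client sent.
        body = (self.headers.get("Cookie") or "no cookie").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        s = requests.Session()
        s.trust_env = False  # ignore proxy environment variables for localhost
        r = s.post(base + "/login", data={"form_email": "x", "form_password": "y"})
        res = s.get(base + "/mine")  # no cookies= argument passed here
        return {
            "history": [h.status_code for h in r.history],
            "final_url": r.url,
            "response_cookies": r.cookies.get_dict(),
            "session_cookies": s.cookies.get_dict(),
            "page": res.text,
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for key, value in run_demo().items():
        print(key, "=", value)
```

In this sketch the cookie is set on the redirect response, so it should appear in `s.cookies` and in `r.history`, while `r.cookies` (which only reflects the final response) stays empty. That suggests relying on the session's own cookie jar instead of passing `cookies=r.cookies` by hand, though whether the real site sets its cookies on the redirect hop is not something the original states.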
2. r.history records the 302 redirect responses that follow the login request. Code:
# -*- encoding:utf-8 -*-
##############################
__author__ = "KevinZhou"
__date__ = "2017/7/23"
###############################
import requests
from bs4 import BeautifulSoup
import urllib.request
import re
loginUrl = 'https://accounts.douban.com/login'
formData = {
"redir": "http://movie.douban.com/mine",
"form_email": "******",
"form_password": "******",
"login": u' ',
"source":"index_nav"
}
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
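# Submit the login form; requests follows any redirect automatically
r = requests.post(loginUrl, data=formData, headers=headers)
page = r.text
print(r.url)  # final URL after any redirects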
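# Use bs4 to locate the captcha image; the login page may not include one
soup = BeautifulSoup(page, "html.parser")
captchaImg = soup.find('img', id='captcha_image')
captchaAddr = captchaImg['src'] if captchaImg else None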
# captcha ID
# reCaptchaID = r'