postはシミュレーションに静的htmlページへのアクセスを要求し、結果をpython 3に保存する.7

7454 ワード

python

1つのtxtドキュメントの各行に1つの検索語が保存されています.1つのサイトに検索語を入力し、必要な結果を保存する必要があります.

シミュレーションpostは問題を要求しhtml

を保存する

BeautifulSoupを介してhtmlに必要なclassまたはidを

に検索

readline txtドキュメントの各行の検索語を読み出し、前の2ステップのコード

を統合する.

クエリーした結果を

に保存します.
headersはfiddlerで取得できます

import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

file = open('    .txt', 'w', encoding='utf-8')
with open('question.txt', 'r', encoding='utf-8') as question_file:
    while True:
        lines = question_file.readline()
        if not lines:
            break
        url = 'http://127.0.0.1:8000/   '
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36'
                                 ' (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
                   'Referer': 'http://47.93.34.71:2021/',
                   'Accept': 'text/html,application/xhtml+xml,application/xml;'
                   		  'q=0.9,image/webp,image/apng,*/*;q=0.8',
                    'Accept-Encoding': 'gzip, deflate',
                    'Accept-Language': 'zh-CN,zh;q=0.9'
                   }
        postdata = {'desc_q_by_user': lines, 'csrfmiddlewaretoken': 'I3SeW9DO0l289Kvtc2wke1v4zwcr6Rxj'}
        data = urllib.parse.urlencode(postdata).encode(encoding='utf-8')
        request = urllib.request.Request(url, data=data, headers=headers)
        response = urllib.request.urlopen(request)
        soup_string = BeautifulSoup(response, 'html.parser')
        question_related = soup_string.find(attrs={'class': 't1'})
        file.write(lines+'
')
        for question in question_related:
            file.write(str.strip(question))
            print(str.strip(question))
        file.write('



')

file.close()

GlidedSky爬虫類サイト練習基礎1

Socket.ioを思いっきり勘違いしていた