Learning the Python requests module

Environment: Python 2.7, Windows 10

I. GET requests with requests

1. Send a GET request
import requests

r = requests.get("http://www.hactcm.edu.cn")

2. Get the page text
print r.text

<html><head><title>河南中医药大学中文网</title>
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><link rel="stylesheet" type="text/css" href="style/style.css">
<style>

3. The text appears garbled. Print the encoding that requests assigned to the response
print r.encoding

Output:
ISO-8859-1

4. requests did not pick up the correct encoding, so set it manually
r.encoding='utf-8'
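Why the manual override helps can be sketched without touching the network. The snippet below is only an illustration in Python 3 syntax, using a made-up two-character body rather than the real page: the same bytes are decoded once as ISO-8859-1 (the value printed in step 3) and once as UTF-8 (the value set in step 4).

```python
# UTF-8 bytes as a server might send them (illustrative body: U+4E2D U+6587).
raw = "\u4e2d\u6587".encode("utf-8")

# Decoding with the wrong codec does not fail, it silently produces mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec recovers the original two characters.
fixed = raw.decode("utf-8")

print(len(raw), len(garbled), len(fixed))  # → 6 6 2

# The damage is reversible as long as the original bytes are still available.
recovered = garbled.encode("iso-8859-1").decode("utf-8")
```

Setting r.encoding does not re-download anything; it only changes which codec is applied to the bytes already held in r.content. requests also exposes r.apparent_encoding, a guess derived from the body itself, which can be assigned to r.encoding when the right codec is not known in advance.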

5. Get the page text again
print r.text

The page text is now:

<html><head><title>          </title>
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><link rel="stylesheet" type="text/css" href="style/style.css">
<style>

The encoding is now correct.

6. Send a GET request with parameters
url='http://www.sinopharm-henan.com/front/index/section1'
pars={"sectionId":'2'}
r = requests.get(url,params=pars)
print r.url

Output:
http://www.sinopharm-henan.com/front/index/section1?sectionId=2
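The way the params dictionary ends up in the URL can be approximated with the standard library alone. This is a simplified sketch, not the code requests runs internally: build_url is a hypothetical helper, and it assumes the base URL has no query string yet.

```python
from urllib.parse import urlencode  # in Python 2.7 this lives in the urllib module


def build_url(base, params):
    """Append params to base as a query string (assumes base has no query yet)."""
    if not params:
        return base
    return base + "?" + urlencode(params)


full_url = build_url(
    "http://www.sinopharm-henan.com/front/index/section1",
    {"sectionId": "2"},
)
print(full_url)
# → http://www.sinopharm-henan.com/front/index/section1?sectionId=2
```

urlencode also percent-escapes characters that are not safe in a URL, which is the main reason to pass a dictionary through params instead of concatenating strings by hand.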

7. Set request headers, for example:
url='http://www.sinopharm-henan.com/front/index/section1'
pars={"sectionId":'2'}
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",
         "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
         "Content-Type":"application/x-www-form-urlencoded"
         }
# pass the dict through the headers argument, otherwise it is never sent
r = requests.get(url,params=pars,headers=header)
print r.url

8. Get the response status code
print r.status_code

Output:
200

For more header fields and their meanings, see the W3C documentation or the book "Illustrated HTTP".

9. Take a closer look at the source of the get function
def get(url, params=None, **kwargs):
    """Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)

What it actually calls is the request function:
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': ('filename', fileobj)}``) for multipart encoding upload.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    ....
    """
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

The request function calls session.request, which in turn calls the session.send method. The details are easy to follow in the source code.
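The effect of kwargs.setdefault in get can be seen with a toy model of the call chain. In this sketch request is a stand-in that only echoes its arguments (it is not the real requests.request), and http://example.com is a placeholder URL.

```python
def request(method, url, **kwargs):
    # Stand-in for requests.request: report what it was called with.
    return method, url, kwargs


def get(url, params=None, **kwargs):
    # Same shape as the library function shown above: fill in a default
    # for allow_redirects only when the caller did not pass one.
    kwargs.setdefault("allow_redirects", True)
    return request("get", url, params=params, **kwargs)


default_call = get("http://example.com")
explicit_call = get("http://example.com", allow_redirects=False, timeout=5)

print(default_call[2])   # → {'allow_redirects': True, 'params': None}
print(explicit_call[2])  # → {'allow_redirects': False, 'timeout': 5, 'params': None}
```

Because setdefault never overwrites an existing key, a caller's explicit allow_redirects=False survives, and every other keyword argument (timeout here) is forwarded untouched.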
II. POST requests

1. Send a POST request
url='http://www.sinopharm-henan.com/front/index/section1'
data={"sectionId":'2'}
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",\
         "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\
         "Content-Type":"application/x-www-form-urlencoded"
         }
r = requests.post(url, data=data, headers=header)
print r.url

2. Send cookies
url='http://www.sinopharm-henan.com/front/index/section1'
cookie={'sdf':'123'}
data={"sectionId":'2'}
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",
         "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
         "Content-Type":"application/x-www-form-urlencoded"
         }
r = requests.post(url, data=data, headers=header,cookies=cookie)
print r.url
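As a rough guide to what to look for in a packet capture of this request, the form body and the Cookie header can be reconstructed by hand. This is a simplified standard-library sketch in Python 3 syntax; the real library also handles quoting, cookie jars and other cases not modelled here.

```python
from urllib.parse import urlencode  # in Python 2.7 this lives in the urllib module

data = {"sectionId": "2"}
cookie = {"sdf": "123"}

# A dict passed as data is sent form-encoded in the request body.
body = urlencode(data)

# A dict passed as cookies is sent as name=value pairs in one Cookie header.
cookie_header = "; ".join("{}={}".format(k, v) for k, v in cookie.items())

print(body)           # → sectionId=2
print(cookie_header)  # → sdf=123
```

Unlike the GET example, the parameters travel in the body, so r.url is expected to print the bare URL without a query string.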

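Capture the packets to inspect the request that was sent.
[Figure 1: packet capture of the request]

3. Verify the difference between r.text and r.content
First, look at the source of the content property: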
    @property
    def content(self):
        """Content of the response, in bytes."""

        if self._content is False:
            # Read the contents.
            try:
                if self._content_consumed:
                    raise RuntimeError(
                        'The content for this response was already consumed')

                if self.status_code == 0:
                    self._content = None
                else:
                    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

            except AttributeError:
                self._content = None

        self._content_consumed = True
        # don't need to release the connection; that's been handled by urllib3
        # since we exhausted the data.
        return self._content


Now look at the source of the text property:
    @property
    def text(self):
        """Content of the response, in unicode.

        If Response.encoding is None, encoding will be guessed using
        ``chardet``.

        The encoding of the response content is determined based solely on HTTP
        headers, following RFC 2616 to the letter. If you can take advantage of
        non-HTTP knowledge to make a better guess at the encoding, you should
        set ``r.encoding`` appropriately before accessing this property.
        """

        # Try charset from content-type
        content = None
        encoding = self.encoding

        if not self.content:
            return str('')

        # Fallback to auto-detected encoding.
        if self.encoding is None:
            encoding = self.apparent_encoding

        # Decode unicode from given encoding.
        try:
            content = str(self.content, encoding, errors='replace')
        except (LookupError, TypeError):
            # A LookupError is raised if the encoding was not found which could
            # indicate a misspelling or similar mistake.
            #
            # A TypeError can be raised if encoding is None
            #
            # So we try blindly encoding.
            content = str(self.content, errors='replace')

        return content

Now check the types of the two return values.
Type of content:
print type(r.content)
<type 'str'>

Type of text:
print type(r.text)
<type 'unicode'>

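The docstrings in the source say the same thing: content returns the raw response body as bytes (a plain str in Python 2), while text returns that body decoded into Unicode using the response encoding.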
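The relationship between the two can be condensed into a small stand-in function. This is a simplified sketch in Python 3 syntax, not the library code: decode_body is a hypothetical helper, and its UTF-8 fallback is an assumption made to keep the example self-contained, whereas the real text property falls back to an encoding guessed from the body.

```python
def decode_body(content, encoding=None, fallback="utf-8"):
    """Simplified stand-in for Response.text: bytes in, decoded text out."""
    if not content:
        return ""
    try:
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
        return content.decode(encoding or fallback, errors="replace")
    except LookupError:
        # Unknown codec name (for example a typo): use the fallback instead.
        return content.decode(fallback, errors="replace")


raw = "caf\u00e9".encode("utf-8")             # what r.content would hold: bytes
text = decode_body(raw, "utf-8")              # what r.text would hold: decoded text
damaged = decode_body(b"caf\xe9", "utf-8")    # invalid UTF-8 byte at the end
unknown = decode_body(raw, "no-such-codec")   # misspelled encoding name

print(type(raw).__name__, type(text).__name__)  # → bytes str
print(damaged == "caf\ufffd")                   # → True
```

Use content when the payload is binary (images, archives) or when the decoding should be done by hand, and text when the body is known to be textual and the encoding is set correctly.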