Python 3: UnicodeEncodeError when urlopen is given a URL containing Chinese characters
The following is the result of a test run in IPython.
In [24]: x
Out[24]: 'http://127.0.0.1:8000/xxx/?id=a45ex0bad3c9&game= '
In [25]: urlopen(x)
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-25-441e6d61f53c> in <module>()
----> 1 urlopen(x)
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault)
154 else:
155 opener = _opener
--> 156 return opener.open(url, data, timeout)
157
158 def install_opener(opener):
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in open(self, fullurl, data, timeout)
467 req = meth(req)
468
--> 469 response = self._open(req, data)
470
471 # post-process response
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in _open(self, req, data)
485 protocol = req.type
486 result = self._call_chain(self.handle_open, protocol, protocol +
--> 487 '_open', req)
488 if result:
489 return result
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
445 for handler in handlers:
446 func = getattr(handler, meth_name)
--> 447 result = func(*args)
448 if result is not None:
449 return result
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in http_open(self, req)
1272
1273 def http_open(self, req):
-> 1274 return self.do_open(http.client.HTTPConnection, req)
1275
1276 http_request = AbstractHTTPHandler.do_request_
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1246
1247 try:
-> 1248 h.request(req.get_method(), req.selector, req.data, headers)
1249 except socket.error as err: # timeout error
1250 h.close()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in request(self, method, url, body, headers)
1063 def request(self, method, url, body=None, headers={}):
1064 """Send a complete request to the server."""
-> 1065 self._send_request(method, url, body, headers)
1066
1067 def _set_content_length(self, body):
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in _send_request(self, method, url, body, headers)
1091 skips['skip_accept_encoding'] = 1
1092
-> 1093 self.putrequest(method, url, **skips)
1094
1095 if body is not None and ('content-length' not in header_names):
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py in putrequest(self, method, url, skip_host, skip_accept_encoding)
955
956 # Non-ASCII characters should have been eliminated earlier
--> 957 self._output(request.encode('ascii'))
958
959 if self._http_vsn == 11:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 46-48: ordinal not in range(128)
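The last frame of the traceback comes down to encoding the HTTP request line as ASCII. A minimal sketch that reproduces just that step without any network access is shown here. The request line and the helper name `try_ascii` are hypothetical, and the characters 你好 are an assumption standing in for the non-ASCII characters that are missing from the URL shown above.

```python
# Hypothetical request line; '你好' stands in for the characters
# that are missing from the URL in the transcript above.
request_line = "GET /xxx/?game=你好 HTTP/1.1"


def try_ascii(text):
    """Return the ASCII-encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as err:
        # Same exception type that http.client raises in putrequest()
        return str(err)


print(try_ascii(request_line))      # prints the codec error message
print(try_ascii("GET / HTTP/1.1"))  # pure ASCII encodes without error
```

Any character outside the ASCII range in the request line triggers the same exception, so the fix has to happen before the URL reaches `urlopen`.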
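The error occurs because the Chinese string cannot be encoded as ASCII. If you look at a URL containing Chinese text as it is actually submitted to the backend, the Chinese characters have been converted to percent-encoding before the request is sent. So we should percent-encode the Chinese text ourselves first; the result then encodes to ASCII without any problem.
Python 3 provides a function for this percent-encoding: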
In [36]: from urllib.parse import quote
In [37]: quote('你好')
Out[37]: '%E4%BD%A0%E5%A5%BD'
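Putting this together, here is a rough sketch of two ways to obtain an ASCII-safe URL before calling `urlopen`. The helper names `build_url` and `quote_url` are hypothetical, and the value 你好 for the `game` parameter is an assumption taken from the `quote` output above, since the original characters in the URL are not shown.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base, params):
    """Append query parameters to base, percent-encoding them as UTF-8."""
    return base + "?" + urlencode(params)


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    '%' is kept as a safe character so that parts which are already
    percent-encoded are not encoded a second time.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Assumed parameter value: '你好'
url = build_url("http://127.0.0.1:8000/xxx/",
                {"id": "a45ex0bad3c9", "game": "你好"})
print(url)

# Same result when starting from an already assembled URL
raw = "http://127.0.0.1:8000/xxx/?id=a45ex0bad3c9&game=你好"
print(quote_url(raw))

# Both strings are now pure ASCII, so this no longer raises
url.encode("ascii")
```

Either resulting string can be passed to `urlopen`. Building the query with `urlencode` is usually the simpler choice when the parameters are available separately; `quote_url` is only a fallback for URLs that arrive already assembled.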