[python] Experience 1


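Project: crawling real-time search terms
On the Daum portal site the list of real-time search terms keeps changing, so the output changes with it.
About crawlers
Crawler:
A program that crawls across websites and collects data.
Software for collecting web data.
Crawling:
The act of using a crawler to extract data from web pages.
We will write the crawler in Python!
Modules: like a block assembly kit
An external module is required.
Install it so that you can use functionality written by someone else.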

pip install requests    # install requests

To use an external module, import it:

import requests

print(requests)

This prints the module object, including the path where requests is installed.

Functions and modules
Function: performs a task that would otherwise be repeated.
Module: a file that collects commonly used functions.
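As a small illustration of the two terms above, the sketch below defines one function and borrows a value from a module. The name circle_area is made up for this example; math is part of Python's standard library.

```python
import math  # a module: a file of ready-made functions and constants


def circle_area(radius):
    """A function: write the calculation once, reuse it for every radius."""
    return math.pi * radius ** 2


# The same function handles three different inputs without repeating the formula.
areas = [round(circle_area(r), 2) for r in (1, 2, 3)]
print(areas)
```

requests is used in the same way as math here; the only difference is that it has to be installed first.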
How to use it
get function: returns the response.
Using the get function
Call it as requests.get(url):

import requests

print(requests.get)

Result: <function get at ~~~~~>
This confirms that get is a function.

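It is the function that sends a GET request.
There are several types of request:
PUT, GET, POST, DELETE
Here we use GET.
Request / response
Client (the side that sends the request) / server (the side that sends the response)
Request and response, part 2
get function: sends a GET request.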

requests.get(url)    # url: the address of the server

returns: requests.Response    # the value received back from the server
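
The request/response exchange described above can be tried without touching a real website. The sketch below is only an illustration: it starts a tiny server on the local machine and sends it one GET request with the standard library's http.client in place of requests. HelloHandler, fetch_from_local_server and the page content are made-up names for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><head><title>Hello</title></head></html>"
        self.send_response(200)  # status code 200: the request succeeded
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_from_local_server():
    """Start the server on a free local port, send one GET, return (status, text)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")       # client side: send the request
        response = conn.getresponse()  # client side: receive the response
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, text = fetch_from_local_server()
print(status)
print(text)
```

The status and text values play the same role as response.status_code and response.text in requests.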

Exercise 1

import requests

url = "http://www.daum.net"
print(requests.get(url))

# Result: <Response [200]>, which means the request succeeded.

Exercise 2

import requests

url = "http://www.daum.net"
response = requests.get(url)  # response is the container that holds the server's reply

print(response.text)

# Result: the entire HTML source of the page is printed.
# Other attributes of the response:

import requests

url = "http://www.daum.net"
response = requests.get(url)

print(response.text)

#print(response.url)

#print(response.content)

#print(response.encoding)  # the encoding used for the text

#print(response.headers)

#print(response.json())  # json is a method; it only works when the body is JSON

#print(response.links)

#print(response.ok)

#print(response.status_code)

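Beautiful Soup
The name with a space in it cannot be used as the module name.
A module has to be installed before you can use it (pip install beautifulsoup4).
BeautifulSoup (the class) is found in bs4 (the module).
Exercise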

import requests
from bs4 import BeautifulSoup

url = "http://www.daum.net/"
response = requests.get(url)
print(response.text)  # looks the same as the output of the line below

print(BeautifulSoup(response.text, 'html.parser'))  # looks the same as the output of the line above

So what difference does using BeautifulSoup make?
Exercise

import requests
from bs4 import BeautifulSoup

url = "http://www.daum.net/"
response = requests.get(url)
print(type(response.text))

print(type(BeautifulSoup(response.text, 'html.parser')))

Result:
<class 'str'>
<class 'bs4.BeautifulSoup'>

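We can see that the types differ.
Conclusion: BeautifulSoup takes the one big block of text and organizes it into a container called BeautifulSoup.
The information in this container is now easy to make use of.
BeautifulSoup
(data, parsing method): turns a lump of text into meaningful, structured data.
It parses HTML and XML.
BeautifulSoup(response.text, 'html.parser')
Let's use the data in BeautifulSoup!
Exercise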

import requests
from bs4 import BeautifulSoup

url = "http://www.daum.net/"
response = requests.get(url)
#print(type(response.text))

soup = BeautifulSoup(response.text, 'html.parser')
# Let's use the data stored in soup!

print(soup.title)

# Result:
<title>Daum</title>

The string is too long, so I want to print only the first 500 characters:

response.text[:500]

Exercise

import requests
from bs4 import BeautifulSoup

url = "https://www.daum.net/"
response = requests.get(url)
#print(response.text[:500])

soup = BeautifulSoup(response.text, 'html.parser')

print(soup.title)
print(soup.title.string)
print(soup.span)  # prints only the first span in the document
print(soup.findAll('span'))  # prints every span

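-> But the span values are not printed. Why? (study-group notes)

Why can't the span values be printed?

- Client-side rendering (REEX) <-> server-side rendering: for dynamically rendered pages, use a different Python library (for example Selenium).
- The page used in the lecture is different from the page we are studying now.

- Request headers...? They tell the site that the visitor named in the header is a person, which makes access easier.

- The spans are received asynchronously: similar to the first explanation.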
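
One way to picture the first explanation above: if the server sends only an empty container and a script fills it in inside the browser, the downloaded text has no span tags to find. The page below is invented for illustration, and the tag counting uses the standard library's html.parser rather than BeautifulSoup, so TagCounter and STATIC_HTML are hypothetical names.

```python
from html.parser import HTMLParser

# Made-up page: the server sends an empty list, and a script would fill it in later.
STATIC_HTML = """
<html><body>
  <ul id="ranking"></ul>
  <script>
    // runs only in a browser: adds span elements after the page has loaded
    document.getElementById("ranking").innerHTML = "<li><span>term</span></li>";
  </script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears in the downloaded HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


counter = TagCounter()
counter.feed(STATIC_HTML)
span_count = counter.counts.get("span", 0)  # the script body is text, not markup
print(span_count)
print(counter.counts.get("ul", 0))
```

A plain HTTP download never runs the script, which is why a browser-driving tool such as Selenium is suggested in the notes.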
Handling files in Python

from bs4 import BeautifulSoup
import requests

url = "http://www.daum.net/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

file = open("daum.html","w")
file.write(response.text)
file.close()

print(soup.title)
print(soup.title.string)
print(soup.span)
print(soup.findAll('span'))

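=> The HTML is now saved in the file.
Look for the real-time search terms in it.
What they have in common: 1. Each real-time search term is inside an a tag. 2. They all share class="link_favorsch".

Let's write code that fetches the a tags from the HTML,

keeping only those whose class is link_favorsch, and prints them.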

from bs4 import BeautifulSoup
import requests

url = "http://www.daum.net/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# file = open("daum.html","w")
# file.write(response.text)
# file.close()

# print(soup.title)
# print(soup.title.string)
# print(soup.span)
# print(soup.findAll('span'))

# Fetch every a tag whose class is link_favorsch from the HTML document
print(soup.findAll("a","link_favorsch"))
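
The same tag-plus-class lookup can be tried offline on a small hand-written snippet. This is a sketch under assumptions: SAMPLE_HTML is made up, and only the a tag and the link_favorsch class come from the notes. It uses find_all, the current spelling of findAll, and requires bs4 to be installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the saved page.
SAMPLE_HTML = """
<ol>
  <li><a class="link_favorsch" href="#1">first term</a></li>
  <li><a class="link_favorsch" href="#2">second term</a></li>
  <li><a class="other_link" href="#3">not a search term</a></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = soup.find_all("a", "link_favorsch")  # a string as second argument filters by class
terms = [link.get_text() for link in links]  # get_text() drops the surrounding tag
print(terms)
```

The third link is skipped because its class does not match, which is the filtering the real page relies on.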

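Next, extract only the real-time search terms!

Make the output neat:

- Calling get_text() on each result prints only the text.
- Put the rank in front of each term.
- Print a header saying which date the ranking is for.

Print the date neatly: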

from bs4 import BeautifulSoup
import requests
from datetime import datetime

url = "http://www.daum.net/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
rank = 1

results = soup.findAll('a','link_favorsch')

print(datetime.today().strftime("Real-time search ranking for %Y-%m-%d\n"))

for result in results:
    print(f"Rank {rank}: {result.get_text()}\n")
    rank += 1
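
The numbering and the date header can be checked without any network access. The sketch below is one possible variation, not the post's own code: format_ranking is a made-up helper, the terms are placeholders, and enumerate stands in for the manual rank += 1 counter.

```python
from datetime import datetime


def format_ranking(terms, when):
    """Return the header line plus one numbered line per search term."""
    lines = [when.strftime("Real-time search ranking for %Y-%m-%d")]
    for rank, term in enumerate(terms, start=1):  # counts 1, 2, 3, ...
        lines.append(f"Rank {rank}: {term}")
    return lines


# Hypothetical terms and a fixed date so the output is reproducible.
for line in format_ranking(["first term", "second term"], datetime(2021, 1, 15)):
    print(line)
```

Passing the date in as an argument keeps the function easy to test; the crawler itself would pass datetime.today().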

Opening a file
Writing the output to a file
open(file, mode)

from bs4 import BeautifulSoup
import requests
from datetime import datetime

url = "http://www.daum.net/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
rank = 1

results = soup.findAll('a','link_favorsch')

search_rank_file = open("rankresult.txt","w")  # change "w" to "a" to append to the existing file instead of overwriting it

print(datetime.today().strftime("Real-time search ranking for %Y-%m-%d\n"))

for result in results:
    search_rank_file.write(f"Rank {rank}: {result.get_text()}\n")
    print(f"Rank {rank}: {result.get_text()}\n")
    rank += 1

search_rank_file.close()  # close the file so everything is written to disk
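
The difference between the "w" and "a" modes is easy to see on a throwaway file. The sketch below is an illustration only: write_lines and read_text are made-up helpers, and the file lives in a temporary folder so nothing is left behind.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Write lines to path; the with-block closes the file even if an error occurs."""
    with open(path, mode, encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_text(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "rankresult.txt")
    write_lines(path, ["Rank 1: first"], "w")
    write_lines(path, ["Rank 1: second"], "w")  # "w" truncates: the first run is gone
    after_w = read_text(path)
    write_lines(path, ["Rank 2: third"], "a")   # "a" appends to what is already there
    after_a = read_text(path)

print(repr(after_w))
print(repr(after_a))
```

Using with open(...) also removes the need to remember a separate close() call.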

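Crawling NAVER's real-time search terms
NAVER's trending search terms for users in their 20s
What NAVER's trending search terms have in common:
Each one is in a span tag with the class item_title.
NAVER may be blocking access by robots.
In this case we have to tell them: I am not a robot, I am a human user.
Use headers!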

from bs4 import BeautifulSoup
import requests
from datetime import datetime

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = "https://datalab.naver.com/keyword/realtimeList.naver?age=20s"
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
rank = 1
# span - item_title
results = soup.findAll('span','item_title')

print(response.text)

search_rank_file = open("rankresult.txt","a")

print(datetime.today().strftime("Real-time search ranking for %Y-%m-%d\n"))

for result in results:
    search_rank_file.write(f"Rank {rank}: {result.get_text()}\n")
    print(f"Rank {rank}: {result.get_text()}\n")
    rank += 1

search_rank_file.close()  # close the file so everything is written to disk

-> Should we use headers for Daum as well?
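
To see what a server actually learns from the User-Agent header, the sketch below runs a tiny local server that echoes the header back. Everything here is an assumption made for illustration: EchoUserAgent and seen_by_server are made-up names, the browser string is shortened, and the standard library's urllib is used in place of requests.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical server that replies with the User-Agent header it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def seen_by_server(headers):
    """Send one GET with the given headers and return the User-Agent the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # An empty ProxyHandler keeps this local request away from any system proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        request = urllib.request.Request(
            f"http://127.0.0.1:{server.server_port}/", headers=headers
        )
        with opener.open(request, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


default_agent = seen_by_server({})
browser_agent = seen_by_server({"User-Agent": "Mozilla/5.0 (demo)"})
print(default_agent)  # the library announces itself as a Python client
print(browser_agent)  # the custom header makes the request look like a browser
```

requests behaves the same way in principle: without a headers argument it sends its own default User-Agent, which a site can use to recognize scripts.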
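Displaying today's weather
Getting an API key
https://home.openweathermap.org/users/sign_up
Sign up and log in beforehand to get an API key.
On your account page, click My API keys and make a note of the API key.
A program that fetches weather information
Understanding APIs
API: Application Programming Interface
What is an interface?
What connects a person to a computer = keyboard, mouse = an interface
A mobile phone == an interface that connects person to person
API = an interface that connects program to program
Clients send a great many requests to servers.
What lets the client and the server communicate smoothly = the API
Crawling can only obtain a limited amount of information.
OpenWeatherMap API == an API that provides weather information for the whole world
Open API = an API that is opened up for anyone to use
Understanding API keys
But only people who have signed the guest book (registered) can use it!
The Open API key shows who you are.
You can use the API server by presenting your key!
Building the API link
On openweathermap.org, go to API in the navbar.
In the list of data collections, open the API doc under "Current Weather Data".
Under "Call current weather data for one location", use the API call at the top!
We fill in the parts in braces ourselves.
Call = sending a request to the address of the API you need.
f-string
Exercise code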

city = "Seoul"
apikey = "################################" 
# Put the API key you were issued into the line above.
api = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apikey}"
# The request will be sent to this link.
print(api)

api = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apikey}"

Everything before the ? is the base URL; everything after it is the parameters that carry the information for the API request.
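
The split between base URL and parameters can also be built with urllib.parse, which escapes characters such as spaces that a plain f-string would leave untouched. This is a sketch under assumptions: build_weather_url is a made-up helper and YOUR_API_KEY is a placeholder, not a real key.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"  # the part before the ?


def build_weather_url(city, apikey):
    """Join the base URL and the query parameters, escaping special characters."""
    query = urlencode({"q": city, "appid": apikey})  # the part after the ?
    return f"{BASE_URL}?{query}"


url = build_weather_url("New York", "YOUR_API_KEY")  # placeholder key
print(url)

# Splitting the URL again recovers the original parameter values.
params = parse_qs(urlsplit(url).query)
print(params)
```

For a single-word city such as Seoul the result is identical to the f-string version in the notes.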
To send the request, import requests.
Exercise code

import requests

city = "Seoul"
apikey = "################################"
api = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apikey}"

result = requests.get(api)
print(result.text)
# print(result)

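Print it.

First check the type -> it is str.

- To work with it, use json (a built-in module).

JSON is short for JavaScript Object Notation.
It is a format used for exchanging data.
It looks similar to a Python dictionary.
json.loads(result.text)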

import requests
import json

city = "Seoul"
apikey = "################################"
api = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apikey}"

result = requests.get(api)
print(result.text)

data = json.loads(result.text)  # parse the JSON text into a Python dict

print(type(result.text)) 
print(type(data))


Result:
<class 'str'>
<class 'dict'>

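Which place is this weather information for?

The "name" key holds Seoul.

The weather: the "main" key inside element [0] of the list under the "weather" key.

The temperature: the "temp" key under the "main" key.

(In other words, look carefully at the data.)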
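
The key paths listed above can be practised on a hand-written string before calling the real API. The sample text below is made up; only the key names and the Seoul values quoted in these notes are taken from the document. The Celsius conversion assumes the temperatures are in Kelvin, which is OpenWeatherMap's default unit.

```python
import json

# Made-up response text shaped like the fields used in these notes.
SAMPLE_TEXT = (
    '{"name": "Seoul", "weather": [{"main": "Clear"}], '
    '"main": {"temp": 268.96, "feels_like": 268.96, "temp_min": 265.57}}'
)

data = json.loads(SAMPLE_TEXT)                  # str -> dict
city = data["name"]
condition = data["weather"][0]["main"]          # "weather" is a list, so index it first
temp_kelvin = data["main"]["temp"]
temp_celsius = round(temp_kelvin - 273.15, 2)   # Kelvin to Celsius

print(type(data).__name__, city, condition, temp_celsius)
```

Reading the structure level by level like this is what "look carefully at the data" amounts to in practice.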

import requests
import json

city = "Seoul"
apikey = "################################"
api = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={apikey}"

result = requests.get(api)
print(result.text)

data = json.loads(result.text)

# print(type(result.text))
# print(type(data))

print("Weather for", data["name"])
print("The weather is", data["weather"][0]["main"])
# Temperatures are in Kelvin unless a units parameter is added to the request
print("The current temperature is", data["main"]["temp"])
print("But it feels like", data["main"]["feels_like"])
# Minimum temperature: main - temp_min
print("The minimum temperature is", data["main"]["temp_min"])
print("The maximum temperature is", data["main"]["temp_max"])
# Humidity: main - humidity
print("The humidity is", data["main"]["humidity"])
# Pressure: main - pressure
print("The pressure is", data["main"]["pressure"])
# Wind direction: wind - deg
print("The wind direction is", data["wind"]["deg"])
# Wind speed: wind - speed
print("The wind speed is", data["wind"]["speed"])

Result
Weather for Seoul
The weather is Clear
The current temperature is 268.96
But it feels like 268.96
The minimum temperature is 265.57

Reference

For more on this topic ([python] Experience 1), see the original post: https://velog.io/@been_gam/python-심화-맛보기
