How to Use select_one in Beautiful Soup



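I used the following page as a reference:
Python Webスクレイピングテクニック集「取得できない値は無い」JavaScript対応@追記あり6/12 (Python web scraping techniques: "There is no value you cannot get", JavaScript support, updated 6/12)
I adapted the sample shown there to use requests and to run on Python 3.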

get_nikkei.py

#! /usr/bin/python
#
#   get_nikkei.py
#
#                   Jul/13/2018
# ------------------------------------------------------------------------
import requests
import sys
from bs4 import BeautifulSoup

sys.stderr.write("*** start ***\n")
# URL to access
url = "https://www.nikkei.com/markets/kabu/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"}

# Access the URL. The return value is a Response instance holding the result of the request, including the HTML
response = requests.get(url=url, headers=headers)
html = response.content

# Take the HTML out of the instance and parse it so that BeautifulSoup can work with it
soup = BeautifulSoup(html, "html.parser")

# Use a CSS selector to print the element at the specified location and its text

selector = "#CONTENTS_MARROW > div.mk-top_stock_average.cmn-clearfix > div.cmn-clearfix > div.mkc-guidepost > div.mkc-prices > span.mkc-stock_prices"

print(soup.select_one(selector))
print(soup.select_one(selector).text)
print(soup.select_one(selector).string)

sys.stderr.write("*** end ***\n")
# ------------------------------------------------------------------------

Output

$ ./get_nikkei.py 
*** start ***
<span class="mkc-stock_prices">28,317.83</span>
28,317.83
28,317.83
*** end ***

The Nikkei Stock Average (in yen) is retrieved successfully.
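
In the output above, .text and .string print the same value because the span contains only a single text node. The two differ once an element has child tags. The following is a minimal sketch on a made-up HTML fragment (the ids plain and nested are hypothetical and do not come from the Nikkei page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one element with text only, one with a child tag.
html = """
<p id="plain">28,317.83</p>
<p id="nested">28,317.83 <b>yen</b></p>
"""

soup = BeautifulSoup(html, "html.parser")
plain = soup.select_one("#plain")
nested = soup.select_one("#nested")

# A single text child: .text and .string hold the same value.
print(plain.text)
print(plain.string)

# Text plus a child tag: .text joins all descendant text,
# while .string is None because there is more than one child.
print(nested.text)
print(nested.string)
```

For elements that may contain nested tags, .text (or get_text()) is usually the safer choice.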

I confirmed this with the following version:

$ python --version
Python 3.9.5
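
One thing to keep in mind: select_one returns None when no element matches the selector, which can happen if the page layout changes. In that case the .text access in get_nikkei.py would raise AttributeError. The following is a rough sketch of one way to guard against that, run on an inline fragment instead of the live site. The helper name select_text, the sample HTML, and the shortened selectors are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded page.
SAMPLE_HTML = """
<div id="CONTENTS_MARROW">
  <div class="mkc-prices">
    <span class="mkc-stock_prices">28,317.83</span>
  </div>
</div>
"""


def select_text(soup, selector, default=None):
    """Return the stripped text of the first match, or default if nothing matches."""
    element = soup.select_one(selector)
    if element is None:
        return default
    return element.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A selector that matches, and one that does not.
found = select_text(soup, "#CONTENTS_MARROW span.mkc-stock_prices")
missing = select_text(soup, "#CONTENTS_MARROW span.no-such-class", default="(not found)")

print(found)
print(missing)
```

Returning a default keeps the script running; raising a custom error instead may be preferable when a missing value should stop processing.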

Author And Source

The original article, "Beautifulsoup の select_one の使い方", is available at https://qiita.com/ekzemplaro/items/6c8ded6f2a819a2c7818
