Automating Browser Operations with Selenium + Python


What is Selenium?

Selenium is a tool for automating browser operations.

https://docs.seleniumhq.org/
It is used together with the WebDriver that matches your OS and browser.

Environment

This article uses Windows, on the assumption that the script will run on an office PC.
Windows 10 Pro 1903

Preparing Python 3 (Anaconda)

A plain Python installation would also work, but this time we install Anaconda.

https://www.anaconda.com/distribution/
Download and install the Python 3.7 version.

Open Anaconda Prompt and install the packages.

conda install beautifulsoup4
conda install selenium
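
Before moving on, it can help to confirm that both packages are importable from the Anaconda environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 package
    missing = missing_packages(["selenium", "bs4"])
    print("missing:", missing if missing else "none")
```

If either name is reported as missing, rerun the corresponding conda install command in the same Anaconda Prompt.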

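Placing the WebDriver

https://sites.google.com/a/chromium.org/chromedriver/downloads
From Current Releases, download the ChromeDriver build that matches your installed version of Chrome, and place it in a directory that is on the path (a location listed in the Path environment variable).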
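
To check that the driver really is reachable through Path, a short standard-library sketch like the one below may be useful. The helper name `find_driver` is hypothetical; `shutil.which` performs the same lookup the OS would, including the `.exe` extension on Windows.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", location)
```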

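How Selenium works

The basic flow of a Selenium script is as follows.
- Open Chrome
- Find an element in the page
- Operate on the element

As an example, this article drives Chrome, searches Google for cat images, then creates an HTML file containing those images and opens it.

The code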

import os
import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver

# get current path
path = os.getcwd()

# open chrome
driver = webdriver.Chrome()
driver.get('https://www.google.com')

time.sleep(1)

# type the query into the search form and submit it
neko = driver.find_element_by_name("q")
neko.send_keys("猫")
neko.submit()

time.sleep(1)

# switch to the image search page
nekoimg = driver.find_element_by_xpath('//*[@id="hdtb-msb-vis"]/div[2]/a')
nekoimg.click()

time.sleep(1)

# get page source
html = driver.page_source
bs = BeautifulSoup(html, "html.parser")

# select the cat images (img tags whose src is a JPEG data URI)
rows = bs.findAll("img", src=re.compile("data:image/jpeg"))

with open("neko.html", "w", encoding='utf-8') as file:
    # write html header
    file.write('<!DOCTYPE html>\n')
    file.write('<html>\n')
    file.write('<body>\n')

    # write html body
    for row in rows:
        if row.get("src") is not None:
            img_tag = '<img src="' + row.get("src") + '">\n'
            file.write(img_tag)
    # write html footer
    file.write('</body>\n')
    file.write('</html>\n')

# open the generated file in the browser
driver.get('file:///' + path + '/neko.html')

Explanation

First, webdriver.Chrome() launches Chrome, and driver.get opens the site.

driver = webdriver.Chrome()
driver.get('https://www.google.com')

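Next, type "猫" (cat) into the search box on the Google page and run the search. driver.find_element_by_name selects the element by its name attribute (here q, the name of the search box). Note that Selenium 4 removed the find_element_by_* methods; on those versions the equivalent call is driver.find_element(By.NAME, "q"), with By imported from selenium.webdriver.common.by.

send_keys types the string into the form field, and submit runs the search.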

neko = driver.find_element_by_name("q")
neko.send_keys("猫")
neko.submit()

Once the search results page has opened, click "Images". The element is selected with an XPath expression.

nekoimg = driver.find_element_by_xpath('//*[@id="hdtb-msb-vis"]/div[2]/a')
nekoimg.click()
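
The full script pauses for a fixed one second before each step, which can be too short on a slow connection and needlessly long on a fast one. As a rough sketch of an alternative, the hypothetical helper `wait_until` below polls a condition until it becomes truthy or a timeout expires. It uses only the standard library; Selenium also ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Example use with the driver from the article (find_elements_by_xpath returns
# an empty list while the link is not yet present, so the loop keeps polling):
# links = wait_until(lambda: driver.find_elements_by_xpath(
#     '//*[@id="hdtb-msb-vis"]/div[2]/a'))
# links[0].click()
```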

driver.page_source returns the page's source, which is then parsed with BeautifulSoup.

html = driver.page_source
bs = BeautifulSoup(html, "html.parser")

findAll searches for img tags whose src attribute contains the string data:image/jpeg.

rows = bs.findAll("img", src=re.compile("data:image/jpeg"))
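
When a compiled regular expression is passed as an attribute filter, Beautiful Soup tests it against the attribute value with the pattern's search() method, so the text may appear anywhere in src. The small sketch below uses only `re` and three made-up src values to show which ones such a filter would keep.

```python
import re

pattern = re.compile("data:image/jpeg")

# Made-up src values for illustration only
candidates = [
    "data:image/jpeg;base64,/9j/4AAQSkZJRg==",
    "data:image/png;base64,iVBORw0KGgo=",
    "https://example.com/cat.jpg",
]

# Keep only the values the pattern matches somewhere in the string
selected = [src for src in candidates if pattern.search(src)]
print(selected)
```

Only the JPEG data URI is kept; PNG data URIs and images referenced by an ordinary URL are skipped by this filter.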

For each img tag found, write a new img tag with the same src value into the file.

for row in rows:
    if row.get("src") is not None:
        img_tag = '<img src="' + row.get("src") + '">\n'
        file.write(img_tag)
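
The same output can also be assembled in one place before it is written. The sketch below is one possible variation, not the article's method: the function name `build_gallery` is hypothetical, and it additionally escapes each src value with html.escape so that an unexpected quote character cannot break the attribute.

```python
import html


def build_gallery(srcs):
    """Return a complete HTML document with one <img> tag per src value."""
    lines = ["<!DOCTYPE html>", "<html>", "<body>"]
    for src in srcs:
        # quote=True also escapes double quotes so the attribute stays well-formed
        lines.append('<img src="%s">' % html.escape(src, quote=True))
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


# Example use with the rows found by BeautifulSoup:
# with open("neko.html", "w", encoding="utf-8") as file:
#     file.write(build_gallery(row.get("src") for row in rows))
```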

Finally, open the generated file in the browser.

driver.get('file:///' + path + '/neko.html')
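
Concatenating the path by hand leaves Windows backslashes and characters such as spaces unencoded. As a possible alternative, pathlib can produce the file URL directly; the helper name `file_url` below is made up for this sketch.

```python
from pathlib import Path


def file_url(filename):
    """Return a file:// URL for `filename` in the current working directory."""
    # as_uri() requires an absolute path, hence Path.cwd()
    return (Path.cwd() / filename).as_uri()


print(file_url("neko.html"))

# Example use with the driver from the article:
# driver.get(file_url("neko.html"))
```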

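Author and Source

Source for this article (Automating Browser Operations with Selenium + Python): https://qiita.com/icebird009/items/beff460839d797b4d171

Author attribution: information about the original author is available at the original URL. Copyright belongs to the original author.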
