Frequently Used Selenium and Beautiful Soup Methods

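I always get stuck on the syntax when writing Selenium and Beautiful Soup code, so these are notes for my own reference.
The jQuery equivalents are included as well.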

Environment

Python 3.7.4
selenium 3.141.0
beautifulsoup4 4.8.0
requests 2.22.0

pip

pip install selenium==3.141.0
pip install requests
pip install beautifulsoup4

Selenium (Python)

# Initialization
from selenium import webdriver
driver = webdriver.Chrome()
url = 'https://qiita.com/users'
driver.get(url)

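# Get a single element (CSS selector)
# Note: the find_element_by_* helpers are the Selenium 3 API (removed in Selenium 4.3)
element1 = driver.find_element_by_css_selector('.UsersPage__header')
# Get its text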
print(element1.text)

# Get multiple elements (CSS selector)
elements = driver.find_elements_by_css_selector('.UsersPage__user')
for elem in elements:
    # Get an attribute
    href = elem.find_element_by_tag_name('a').get_attribute('href')
    print('{}<{}>'.format(elem.text, href))

driver.quit()

Beautiful Soup (Python)

# Initialization
from bs4 import BeautifulSoup
import requests
url = 'https://qiita.com/users'
resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'})
# resp.encoding = resp.apparent_encoding  # uncomment if the text comes out garbled
html = resp.text
soup = BeautifulSoup(html, 'html.parser')

# Get a single element (CSS selector)
element1 = soup.select_one('.UsersPage__header')
# Get its text
print(element1.get_text())

# Get multiple elements (CSS selector)
elements = soup.select('.UsersPage__user')
for elem in elements:
    # Get an attribute
    href = elem.find('a').attrs['href']
    print('{}<{}>'.format(elem.get_text(), href))
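
The Beautiful Soup calls above can also be tried offline against an inline HTML string, which is handy when the live page changes or blocks the request. The sketch below assumes a made-up snippet (SAMPLE_HTML) that only mimics the class names used above, and a hypothetical helper named extract_users. It also guards against select_one and find returning None, and uses Tag.get so that a missing href does not raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names used above.
SAMPLE_HTML = """
<div class="UsersPage__header">Users</div>
<div class="UsersPage__user"><a href="/alice">alice</a></div>
<div class="UsersPage__user"><span>bob (no link)</span></div>
"""


def extract_users(html):
    """Return the header text and a list of (text, href) pairs."""
    soup = BeautifulSoup(html, 'html.parser')
    header = soup.select_one('.UsersPage__header')
    # select_one returns None when nothing matches, so guard before get_text().
    header_text = header.get_text(strip=True) if header is not None else None
    users = []
    for elem in soup.select('.UsersPage__user'):
        link = elem.find('a')
        # Tag.get returns None instead of raising KeyError for a missing attribute.
        href = link.get('href') if link is not None else None
        users.append((elem.get_text(strip=True), href))
    return header_text, users


header_text, users = extract_users(SAMPLE_HTML)
print(header_text)
for text, href in users:
    print('{}<{}>'.format(text, href))
```

Passing resp.text from the requests call above in place of SAMPLE_HTML should work the same way, provided the page still uses those class names.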

jQuery (JavaScript)

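// Initialization (run in the browser DevTools console)
// Navigate first, then run the remaining lines once the page has loaded
location.href = "https://qiita.com/users";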
var s=document.createElement("script");
s.setAttribute("src","https://code.jquery.com/jquery-2.2.4.min.js");
document.body.append(s);

// Get a single element (CSS selector)
const $element1 = $(".UsersPage__header");
// Get its text
console.log($element1.text());

// Get multiple elements (CSS selector)
const $elements = $(".UsersPage__user");
$elements.each(function(i,elem) {
  let $elem = $(elem);
  // Get an attribute
  let href = $elem.find("a").attr("href");
  console.log(`${$elem.text()}<${href}>`);
});

Author and Source

Original article: "SeleniumとBeautiful Soupのよく使うメソッド", https://qiita.com/goyae/items/ae5dcd1d94017c0db0f6

Author attribution: information about the original author is available at the URL above. Copyright belongs to the original author.
