[python] Scraping lens information from 価格.com (Kakaku.com)

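This script uses Python to scrape lens information from 価格.com (Kakaku.com).
It collects the price information and the full spec sheet for each lens and writes them to a CSV file.
The following fields are collected: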

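Camera name, rank, lowest price, lowest credit-card price, price URL, compatible mount, lens type, focus, detailed lens type, full-frame compatible, APS-C only, lens construction, number of aperture blades, focal length, minimum focusing distance, maximum magnification, maximum aperture (F-number), angle of view, image stabilization, drip-proof, dust-proof, wide-angle, telephoto, macro, high-magnification zoom, fisheye, tilt-shift shooting, mirror, large aperture, pancake, filter diameter, maximum diameter x length, weight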
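
Some values (a product name, for instance) may themselves contain commas, which would shift every later column in a hand-built CSV line. One way to keep the columns aligned is to let Python's standard `csv` module handle the quoting. The following is only a minimal sketch, not the approach the script takes: the function name `write_lens_rows`, the English column names, and the sample row are hypothetical, and only the first five columns are shown.

```python
import csv
import io

# First five columns of the output (names translated for illustration only).
HEADER = ["name", "rank", "lowest_price", "credit_lowest_price", "price_url"]


def write_lens_rows(stream, rows):
    """Write the header and the given rows to `stream` as CSV.

    csv.writer quotes any field that contains a comma, so a comma
    inside a value cannot shift the columns that follow it.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data, not real values from the site.
    sample = [["Example Lens 24-70mm, Kit", "1", "98000", "99000",
               "https://kakaku.com/item/EXAMPLE/"]]
    buffer = io.StringIO()
    write_lens_rows(buffer, sample)
    print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, the `csv` documentation recommends opening the file with `newline=""`.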

renzu.py

from bs4 import BeautifulSoup
import urllib.request
import re
import requests
import time
import datetime

# Target site URL
# Kakaku.com lens ranking
url = "https://kakaku.com/camera/camera-lens/ranking_1050/"
page_count = 1
linklist = []
# Collect the URL of each lens page from every page of the ranking
while True:
    category_res = requests.get(url + "?page=" + str(page_count)).text
    soup = BeautifulSoup(category_res, 'html.parser')  # Initialize BeautifulSoup
    print("Page {}".format(page_count))
    for elm in soup.find_all("a"):
        if 'href' in elm.attrs:
            link_url = elm.attrs['href']
            if "https://kakaku.com/item/" in link_url:
                linklist.append(link_url)
                # print(link_url)
    # Keep going until the page no longer has a "next" link
    a_next_tag = soup.find_all("li", {"class": "next"})
    if a_next_tag:
    # if page_count < 1:
        page_count += 1
        continue
    break
# Remove duplicates while preserving the original order
linklist = list(dict.fromkeys(linklist))
################################################################
# Output file name (includes the retrieval date and time)
now = datetime.datetime.now()
filename = "renzu" + now.strftime('%Y%m%d_%H%M%S') + '.csv'
f = open(filename, 'a', encoding='cp932', errors='ignore')
# Camera name, rank, lowest price, lowest credit-card price, price URL
f.write("カメラ名,順位,最低価格,クレジット最低価格,価格URL,")
# Open the spec page of the first lens
page_html = linklist[0] + "spec/#tab"
res = urllib.request.urlopen(page_html)
page_soup = BeautifulSoup(res, 'html.parser')
# Get the lens spec table
table = page_soup.find_all("table", {"class": "tblBorderGray mTop15"})[0]
rows = table.find_all("tr")
index = -1
# Write each heading of the spec table as a CSV column name
for row in rows:
    for cell in row.find_all('th'):
        index += 1
        # Skip the th cells at these positions
        if index in (0, 17, 26, 29):
            continue
        cell = cell.get_text()
        cell = re.sub(r"\s+", "", str(cell))  # Strip all whitespace
        f.write(cell)
        f.write(",")
f.write("\n")
# Write the price information for each lens
for page_url in linklist:
    page_html = page_url + "spec/#tab"
    res = urllib.request.urlopen(page_html)
    page_soup = BeautifulSoup(res, 'html.parser')
    # Required elements and their class names
    name = page_soup.find("h2", itemprop="name").text
    try:
        rank = page_soup.find("span", class_="rankNum").text
    except AttributeError:
        rank = ''
    try:
        low_price = page_soup.find("div", class_="priceWrap").find("span", class_="priceTxt").text
        low_price = low_price.replace(',', '')
    except AttributeError:
        low_price = ''
    try:
        cre_price = page_soup.find("div", class_="creditCard").find("span", class_="priceTxt").text
        cre_price = cre_price.replace(',', '')
    except AttributeError:
        cre_price = ''
    print(rank)
    print(low_price)
    f.write(",".join([name, rank, low_price, cre_price, page_url]) + ",")
    # Write the lens spec information
    # Select the spec table
    table = page_soup.find_all("table", {"class": "tblBorderGray mTop15"})[0]
    rows = table.find_all("tr")
    # Write every cell of the table
    for row in rows:
        for cell in row.find_all('td'):
            cell = cell.get_text()
            cell = re.sub(r"\s+", "", str(cell))  # Strip all whitespace
            f.write(cell)
            f.write(",")
    f.write("\n")
f.close()
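
The script imports `time` but never calls it, so the requests go out back to back. A common courtesy when crawling is to pause between requests. The sketch below shows one way that could be done; `fetch_with_delay`, its parameters, and the stand-in `fake_fetch` function are hypothetical names introduced for illustration, and the one-second default is an arbitrary assumption rather than a documented limit of the site.

```python
import time


def fetch_with_delay(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, sleeping between consecutive requests.

    `fetch` is any callable that takes a URL and returns the page body,
    for example `lambda u: requests.get(u, timeout=10).text`.
    Returns a dict mapping each URL to the body that was fetched for it.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # Pause before every request but the first
        pages[url] = fetch(url)
    return pages


if __name__ == "__main__":
    # Stand-in for a real HTTP call so the example runs offline.
    def fake_fetch(url):
        return "<html>" + url + "</html>"

    result = fetch_with_delay(["page1", "page2"], fake_fetch, delay_seconds=0.1)
    print(list(result))
```

Passing the fetch function as an argument keeps the delay logic separate from the HTTP library, so the same loop could wrap either `requests.get` or `urllib.request.urlopen`.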

Author And Source

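More information on this topic ([python] Scraping lens information from 価格.com) can be found at https://qiita.com/saber72237/items/f72ef4a7187b9ec7ebe0

Author attribution: information about the original author is available at the original URL. Copyright belongs to the original author.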
