Python Scraping


Goal

  • Site
    • eBay.com
  • Search keyword
    • Dragon Ball
  • Search filter
    • Sold items
  • Data to collect
    • Item name
    • Condition
    • Price
    • Shipping cost
    • First page only
  • Output file
    • eBayScraping_(date).csv
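
The search keyword and the sold-items filter above map onto query parameters of the eBay search URL. A minimal sketch of building that URL, where `build_search_url` and `BASE_URL` are hypothetical names, and where it is assumed that the other parameters in the script's URL (`_from`, `_sacat`, `rt`) can be left out:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword, sold_only=True):
    """Build a search URL for `keyword` (hypothetical helper)."""
    params = {"_nkw": keyword}
    if sold_only:
        # Same flags as the URL used in the script: sold + completed listings
        params["LH_Sold"] = 1
        params["LH_Complete"] = 1
    # urlencode escapes the space in the keyword as "+"
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("Dragon Ball"))
# → https://www.ebay.com/sch/i.html?_nkw=Dragon+Ball&LH_Sold=1&LH_Complete=1
```

Keeping the keyword out of a hard-coded string makes it easier to reuse the same script for other searches.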

Source code


from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
import datetime


# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("/Users/name/home/work/eBay/Python/chromedriver"))

# Sold, completed listings for "dragon ball"
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=dragon+ball&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1'
driver.get(url)
# Every element lookup below waits up to 10 seconds for a match
driver.implicitly_wait(10)

names = []
statuses = []
prices = []
shippings = []

# Matching two classes on one element needs a CSS selector
items = driver.find_elements(By.CSS_SELECTOR, '.s-item__info.clearfix')

for item in items:
    # get Name
    name = item.find_element(By.CLASS_NAME, 's-item__title').text
    name = name.replace("NEW LISTING", "")
    names.append(name)
    # get Status (not every item has one)
    try:
        status = item.find_element(By.CLASS_NAME, 'SECONDARY_INFO').text
        statuses.append(status)
    except NoSuchElementException:
        statuses.append(" ")
    # get Price
    price = item.find_element(By.CLASS_NAME, 's-item__price').text
    price = price.replace("JPY ", "")
    prices.append(price)

    # get ShippingCost (not every item has one)
    try:
        shipping = item.find_element(By.CLASS_NAME, 's-item__logisticsCost').text
        shipping = shipping.replace("+JPY ", "").replace(" shipping", "").replace("Free International Shipping", "0")
        shippings.append(shipping)
    except NoSuchElementException:
        shippings.append(" ")

# Assemble the collected columns into one table
df = pd.DataFrame({
    'name': names,
    'status': statuses,
    'price': prices,
    'shippingcost': shippings,
})

# Date-stamped output file: eBayScraping_YYYYMMDD.csv
csv_date = datetime.datetime.today().strftime("%Y%m%d")
csv_file_name = "eBayScraping_" + csv_date + ".csv"

df.to_csv(csv_file_name, index=False)

driver.quit()
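
Because the script sets a 10-second implicit wait, a lookup for an element that is absent from an item (the status or the shipping cost) blocks for up to 10 seconds before Selenium gives up. A minimal sketch of one way around this, assuming the result list has already finished loading by the time the loop starts: switch the implicit wait off while reading the optional fields. `no_implicit_wait` and `restore_seconds` are hypothetical names; the only Selenium call used is `driver.implicitly_wait`.

```python
from contextlib import contextmanager


@contextmanager
def no_implicit_wait(driver, restore_seconds=10):
    """Turn the implicit wait off inside the block, then restore it."""
    driver.implicitly_wait(0)
    try:
        yield driver
    finally:
        # Restore the value the script originally set (assumed to be 10 s)
        driver.implicitly_wait(restore_seconds)


# Usage sketch (needs a live driver, so it is left as a comment):
# with no_implicit_wait(driver):
#     for item in items:
#         ...  # optional lookups such as s-item__logisticsCost fail fast here
```

The first `find_elements` call for the item list should stay outside the `with` block so that it still benefits from the wait.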

Output

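Problems

  • The following two values cannot be clearly told apart in the output
    • Free Shipping
    • Free International Shipping
  • Retrieving the following field takes a considerable amount of time
    • Shipping cost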
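
For the first problem, one option is to classify the raw shipping label before any text is stripped from it, so that the two free-shipping cases end up with distinct categories. A minimal sketch, where `classify_shipping` and the category names are hypothetical and the label formats are assumed from the strings handled in the script:

```python
def classify_shipping(text):
    """Return (category, cost_text) for a raw shipping label (hypothetical helper)."""
    label = text.strip()
    lowered = label.lower()
    if lowered == "free shipping":
        return ("free", "0")
    if lowered == "free international shipping":
        return ("free_international", "0")
    if lowered.startswith("+jpy ") and lowered.endswith(" shipping"):
        # "+JPY 1,500 shipping" -> "1,500"
        return ("paid", label[len("+JPY "):-len(" shipping")])
    if not label:
        # The script stores " " when the element is missing
        return ("missing", "")
    # Anything unexpected is kept verbatim for later inspection
    return ("unknown", label)


print(classify_shipping("Free International Shipping"))
# → ('free_international', '0')
```

Storing the category in its own CSV column would keep the cost column numeric-looking while preserving the distinction.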

Improvements and extensions

  • Rework how the two shipping values are classified
  • Collect data from multiple pages
  • Extract frequently occurring words
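
The last two items can be sketched without a browser. The block below generates one URL per result page and counts words across item names. `page_urls` and `frequent_words` are hypothetical helpers, and `_pgn` is assumed to be the page-number parameter of the eBay search URL; the sample names are made up for illustration.

```python
import re
from collections import Counter

SEARCH_URL = ('https://www.ebay.com/sch/i.html?_from=R40&_nkw=dragon+ball'
              '&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1')


def page_urls(base_url, pages):
    """Yield one URL per result page (assumes `_pgn` selects the page)."""
    for page in range(1, pages + 1):
        yield f"{base_url}&_pgn={page}"


def frequent_words(names, top=10):
    """Count lowercase alphanumeric words across all item names."""
    counter = Counter()
    for name in names:
        counter.update(re.findall(r"[a-z0-9]+", name.lower()))
    return counter.most_common(top)


for page_url in page_urls(SEARCH_URL, 3):
    print(page_url)

# Made-up item names, only to show the shape of the result
sample_names = ["Dragon Ball Z Figure", "Dragon Ball Super Card", "Goku figure"]
print(frequent_words(sample_names, top=3))
```

In the scraper, each generated URL would be passed to `driver.get` in turn, and `frequent_words` would be applied to the collected `names` list.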