Python Scraping
Goal
- Site
  - eBay.com
- Search term
  - Dragon Ball
- Search filter
  - Sold items
- Data to collect
  - Item name
  - Condition
  - Price
  - Shipping cost
  - First page of results only
- Output file
  - eBayScraping_(date).csv
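Each goal item corresponds to part of the search URL or the output file name used in the source code. As a sketch, both can be built from parameters instead of being hard-coded. `build_search_url` and `csv_file_name` are hypothetical helper names. The query values are copied unchanged from the script's URL, and only `_nkw` (keyword), `LH_Sold` and `LH_Complete` are interpreted here.

```python
from datetime import date
from urllib.parse import urlencode

# Search endpoint taken from the URL in the original script
BASE_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword: str) -> str:
    """Build a search URL for sold, completed listings matching `keyword`."""
    params = {
        "_from": "R40",       # copied unchanged from the original URL
        "_nkw": keyword,      # search keyword; spaces are encoded as "+"
        "_sacat": "0",        # copied unchanged from the original URL
        "rt": "nc",           # copied unchanged from the original URL
        "LH_Sold": "1",       # sold items only
        "LH_Complete": "1",   # completed listings only
    }
    return BASE_URL + "?" + urlencode(params)


def csv_file_name(day: date) -> str:
    """Return the dated output file name, e.g. eBayScraping_20240101.csv."""
    return "eBayScraping_" + day.strftime("%Y%m%d") + ".csv"
```

With the keyword `"dragon ball"`, `build_search_url` reproduces the exact URL hard-coded in the script, so changing the search term only means changing the argument.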
Source code
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import datetime

# Start Chrome with a local chromedriver (Selenium 4 passes the path via Service)
service = Service(executable_path="/Users/name/home/work/eBay/Python/chromedriver")
driver = webdriver.Chrome(service=service)

# Search results for "dragon ball", restricted to sold and completed listings
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=dragon+ball&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1'
driver.get(url)
# Every element lookup below waits up to 10 seconds for the element to appear
driver.implicitly_wait(10)

names = []
statuses = []
prices = []
shippings = []

# Each listing's info block carries both classes "s-item__info" and "clearfix"
items = driver.find_elements(By.CSS_SELECTOR, '.s-item__info.clearfix')
for item in items:
    # get Name (drop the "NEW LISTING" label that can precede the title)
    name = item.find_element(By.CLASS_NAME, 's-item__title').text
    name = name.replace("NEW LISTING", "")
    names.append(name)
    # get Status (condition); some listings do not show one
    try:
        status = item.find_element(By.CLASS_NAME, 'SECONDARY_INFO').text
        statuses.append(status)
    except NoSuchElementException:
        statuses.append(" ")
    # get Price (strip the currency prefix)
    price = item.find_element(By.CLASS_NAME, 's-item__price').text
    price = price.replace("JPY ", "")
    prices.append(price)
    # get ShippingCost; some listings do not show one
    try:
        shipping = item.find_element(By.CLASS_NAME, 's-item__logisticsCost').text
        shipping = shipping.replace("+JPY ", "").replace(" shipping", "").replace("Free International Shipping", "0")
        shippings.append(shipping)
    except NoSuchElementException:
        shippings.append(" ")

# Collect the lists into a DataFrame and write eBayScraping_YYYYMMDD.csv
df = pd.DataFrame()
df['name'] = names
df['status'] = statuses
df['price'] = prices
df['shippingcost'] = shippings
csv_date = datetime.datetime.today().strftime("%Y%m%d")
csv_file_name = "eBayScraping_" + csv_date + ".csv"
df.to_csv(csv_file_name, index=False)
driver.quit()
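The loop stores price and shipping as display strings, and the chained `replace` calls fold several shipping formats into a single column. A small cleaning step could turn the amounts into numbers while keeping the kind of shipping in its own field. This is a sketch under assumptions: the text formats (`JPY 1,234`, `+JPY 1,500 shipping`, `Free shipping`, `Free International Shipping`) are inferred from the strings the script handles, and `parse_jpy` / `classify_shipping` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# First amount in a string, e.g. "1,234" or "1,234.50" (assumed display format)
_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_jpy(text: str) -> Optional[float]:
    """Return the first amount in text like 'JPY 1,234', or None if there is none.

    For a range such as 'JPY 1,000 to JPY 2,000' this returns the lower bound.
    """
    match = _AMOUNT.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def classify_shipping(text: str) -> Tuple[str, Optional[float]]:
    """Split raw shipping text into (kind, cost) instead of one overloaded string."""
    lowered = text.strip().lower()
    if not lowered:
        return ("unknown", None)  # the shipping element was missing
    if lowered.startswith("free international"):
        return ("free_international", 0.0)
    if lowered.startswith("free"):
        return ("free", 0.0)
    return ("paid", parse_jpy(text))
```

Applied to the raw `.text` of `s-item__logisticsCost` (before the `replace` chain), `kind, cost = classify_shipping(raw)` would yield two columns, so a free domestic and a free international listing no longer share one value.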
Output
Issues
- The following two shipping values cannot be clearly distinguished:
  - Free Shipping
  - Free International Shipping
- Retrieving the following field takes a long time:
  - Shipping cost
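A likely cause of the slow shipping lookup, not confirmed in the original, is `driver.implicitly_wait(10)`: Selenium applies the implicit wait to every element lookup, so each listing without an `s-item__logisticsCost` element can stall for up to 10 seconds before `find_element` raises. One way around this is to switch the implicit wait off while reading optional fields. `no_implicit_wait` is a hypothetical helper; it relies only on WebDriver's `implicitly_wait` method.

```python
from contextlib import contextmanager


@contextmanager
def no_implicit_wait(driver, restore_seconds: float = 10):
    """Temporarily disable Selenium's implicit wait.

    `restore_seconds` should match the value passed to driver.implicitly_wait()
    earlier (10 in the script); it is restored on exit, even after an error.
    """
    driver.implicitly_wait(0)
    try:
        yield driver
    finally:
        driver.implicitly_wait(restore_seconds)


# Possible use inside the scraping loop (needs a live WebDriver and By import):
#
# with no_implicit_wait(driver):
#     for item in items:
#         found = item.find_elements(By.CLASS_NAME, "s-item__logisticsCost")
#         shipping = found[0].text if found else " "
```

The listing blocks are still located with the 10-second wait, so the page has loaded before the wait is disabled; inside the block, `find_elements` returns an empty list immediately when a listing has no shipping element.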
Improvements and next steps
- Revisit how the two shipping values are classified
- Scrape data from multiple pages
- Extract frequently occurring words
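For the second and third items, a possible starting point is to generate one URL per result page and to count words across the collected item names. Both parts rest on assumptions rather than the author's plan: `_pgn` appears to be the query parameter eBay uses to select a result page, but it should be checked against the live site, and `page_urls`, `frequent_words` and the small stop-word set are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def page_urls(base_url: str, pages: int) -> List[str]:
    """Return search URLs for result pages 1..pages (assumes the `_pgn` parameter)."""
    return [f"{base_url}&_pgn={n}" for n in range(1, pages + 1)]


# Hypothetical stop words: the search term itself plus a few common English words
STOP_WORDS = {"dragon", "ball", "the", "and", "of", "a", "with", "from"}


def frequent_words(titles: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase alphanumeric tokens across titles, skipping stop words."""
    counter = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOP_WORDS:
                counter[word] += 1
    return counter.most_common(top)
```

With the script's URL, `page_urls(url, 3)` would give three URLs to visit with `driver.get`, appending to the same lists each time, and `frequent_words(df['name'])` would rank the words that appear most often in the scraped titles.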
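Author and Source
For more information on this topic (Python Scraping), see https://qiita.com/kganddl/items/cbf6dd838b25d6a51e8f. Author attribution: the original author's information is available at the original URL. Copyright belongs to the original author.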