Headless Chrome usage example (Python): opening Baidu

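It has been a while since Google added headless mode to Chrome, and it has become a favorite of crawler developers as a replacement for PhantomJS.
The code sample below is provided for reference.
1. References
2. Environment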
  • MacOS == 10.12.6 (16G29)
  • Chrome == 61.0.3163.100 (official build) (64-bit)
  • selenium == 3.6.0
  • Python == 2.7.14
  • ChromeDriver == 2.33.506106

3. Steps
3.1 Starting chromedriver
    $ chromedriver 
    Starting ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2) on port 9515
    Only local connections are allowed.
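
Before pointing a script at a manually started chromedriver, it can help to confirm that something is actually listening on the port from the startup log. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper name and the timeout values are assumptions, not part of the original setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False when
    `timeout` seconds have passed without a successful connection.
    """
    deadline = time.time() + timeout
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(interval)
        try:
            sock.connect((host, port))
            return True
        except socket.error:
            # Nothing is listening yet (or the attempt timed out).
            if time.time() >= deadline:
                return False
            time.sleep(interval)
        finally:
            sock.close()
```

With chromedriver running as shown above, `wait_for_port('127.0.0.1', 9515)` should return True. Note that `webdriver.Chrome(...)` launches its own chromedriver process, so a manually started one is mainly useful when connecting through `webdriver.Remote(command_executor='http://127.0.0.1:9515', ...)`.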

3.2 Code
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    
    from selenium import webdriver
    from selenium.webdriver.common.proxy import Proxy, ProxyType
    
    
    options = webdriver.ChromeOptions()
    
    # tell selenium to use the dev channel version of chrome
    # NOTE: only do this if you have a good reason to
    # options.binary_location = '/usr/bin/google-chrome-unstable'  # path to google Chrome bin
    
    options.add_argument('headless')
    
    # set the window size
    # Chrome's window-size switch expects "width,height"
    options.add_argument('window-size=1200,600')
    
    # route traffic through a proxy; replace the placeholder with a real host:port
    proxy_url = 'ip:port'
    proxy = Proxy({
        'proxyType': ProxyType.MANUAL,
        'httpProxy': proxy_url,
        'sslProxy': proxy_url  # HTTPS traffic uses the same proxy
    })
    
    # build the capabilities from the Chrome options, then merge in the proxy settings
    desired_capabilities = options.to_capabilities()
    proxy.add_to_capabilities(desired_capabilities)
    
    # initialize the driver
    # driver = webdriver.Chrome(chrome_options=options)
    driver = webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)
    
    
    driver.get('https://www.baidu.com')
    
    # wait up to 10 seconds for the elements to become available
    driver.implicitly_wait(10)
    driver.get_screenshot_as_file('baidu_index.png')
    
    # use css selectors to grab the search inputs
    text = driver.find_element_by_css_selector('#kw')
    search = driver.find_element_by_css_selector('#su')
    
    text.send_keys('headless chrome')
    
    driver.get_screenshot_as_file('baidu_main-page.png')
    
    
    # search
    search.click()
    driver.get_screenshot_as_file('search-result.png')
    
    results = driver.find_elements_by_xpath('//div[@class="result c-container "]')
    
    for result in results:
        res = result.find_element_by_css_selector('a')
        title = res.text
        link = res.get_attribute('href')
        print('Title: %s \nLink: %s\n' % (title, link))

Output:
    Title: Headless Chrome   -    
    Link: http://www.baidu.com/link?url=VxjEiEVtl5fZX-AhWqc-AuoRP9Xy_uXIG1cqs43UbiSacUTqH0j7lDYsnYUpOXrC
    
    Title:      ——Chrome Headless   -      - SegmentFault 
    Link: http://www.baidu.com/link?url=CDylpWK8vIuZ8p60MUi_3KlThi-zxPw3bSr5AGPg2QsmTfoathDvfZGnEV2IZejOjw0cF5N4o0exxX1cqf9R-q
    
    Title:   Headless Chrome        -      
    Link: http://www.baidu.com/link?url=IyI0z_PmzMzH6mrw0-YndTwp7WiKmhVF-_ZuXMuPnfyF2MEaBB0BCit0BXpcrfsX
    
    Title:   Headless Chrome - WEB   -      
    Link: http://www.baidu.com/link?url=sw2qqcurzmwTu9n0orvk_LKIvMmiaWlCxlPtvuyOgsKzzxaV3Car6zbRRdpZumDX
    
    Title:   Headless Chrome -      
    Link: http://www.baidu.com/link?url=6nOyOVHD5AoBjugMoJTxDXhw5EBSYpF9fQMQfbu8WgCf0E_Wbalq6Hbj-KqBGwgm
    
    Title:   Headless Chrome  Selenium   - CSDN
    Link: http://www.baidu.com/link?url=WSKRO7xRvGfbRIUKKnULwE0FeYNvyjLnEtiHWj108kxsQ7MUd1zPNXLph7WSkYXkiRLh8B3DBYSW8GNdI8wGBq

    Title: Web    Headless Chrome     -
    Link: http://www.baidu.com/link?url=jZletPMcLn7z_liopLphjzknRWshmbsrCUr0K25MY7pbk5smOObJahHbvUrHz_2qnZdEUzcEm8IK0QriythwZa

    Title:  headless     selenium -       - SegmentFault
    Link: http://www.baidu.com/link?url=jbe9GNh-2nDbd1KiMkh64EwQD6JvBXdQ_ndtkl-z_Hy2mn8GGnftg2BDnMn3x1rUMwkdwkwuo7dqMZMnVAtHGq

    Title: linux    Headless Chrome - bambooleaf - CSDN
    Link: http://www.baidu.com/link?url=jruVom6bFUCrLluHA4aN8ITgq3HlBlR3rYNYC36VlqIBjuFRocIewfKVvw6pleX3v1l2joOaO3-f9NxrVGjUdq
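
One thing worth noting about the script in 3.2: it takes the `search-result.png` screenshot and looks up the results immediately after `search.click()`, so the timing depends on how fast the results page renders. The sketch below shows one way to poll for a condition explicitly. `wait_until` is a hypothetical helper written for illustration, with assumed default timings; in real Selenium code, `WebDriverWait` from `selenium.webdriver.support.ui` serves the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError if `timeout` seconds
    pass without the condition becoming truthy.
    """
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() >= deadline:
            raise RuntimeError('condition not met within %.1f s' % timeout)
        time.sleep(interval)
```

In the script above it could be used as `results = wait_until(lambda: driver.find_elements_by_xpath('//div[@class="result c-container "]'))`, placed before the final screenshot so that the image is captured only after at least one result has appeared.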