Extracting links from a web page with Scrapy


I tried doing with Scrapy the same thing I had done with Beautiful Soup in this earlier article:
Extracting links from a web page with Beautiful Soup

Program

scrapy01.py

# -*- coding: utf-8 -*-
#
#   scrapy01.py
#
#
#                   Jul/11/2018
#
import scrapy

class FirstScrapySpider(scrapy.Spider):
    name = 'scrapy01'
    allowed_domains = ['ekzemplaro.org']
    start_urls = ['https://ekzemplaro.org']

    def parse(self, response):
        # Print the href attribute of every <a> element on the page.
        for unit in response.css('a::attr(href)').extract():
            print(unit)
#

Output

$ scrapy runspider --loglevel=WARN scrapy01.py
en/
ekzemplaro/
audio_books/
librivox/
./audio/
http://www.hi-ho.ne.jp/linux
./raspberry/
./storytelling/
./crowdsourcing/
https://twitter.com/ekzemplaro
https://github.com/ekzemplaro/
qiita/
./test_dir/
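
The output above mixes relative links (`en/`, `./audio/`) with absolute ones. As a minimal sketch, assuming the goal is a list of absolute URLs, the relative entries can be resolved against the start URL with the standard library's `urljoin`. The helper name `to_absolute` and the sample list are illustrative and not part of the original program. Inside a spider, `response.urljoin(href)` should serve the same purpose.

```python
from urllib.parse import urljoin

# Base URL taken from start_urls in scrapy01.py.
BASE_URL = "https://ekzemplaro.org"

# A few of the hrefs printed by the spider (sample chosen for illustration).
hrefs = [
    "en/",
    "./audio/",
    "http://www.hi-ho.ne.jp/linux",
    "https://twitter.com/ekzemplaro",
]


def to_absolute(base, links):
    """Resolve each link against base; absolute links pass through unchanged."""
    return [urljoin(base, link) for link in links]


for url in to_absolute(BASE_URL, hrefs):
    print(url)
```

With this base URL, `en/` resolves to `https://ekzemplaro.org/en/` and `./audio/` to `https://ekzemplaro.org/audio/`, while the two absolute links are left as they are.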

Installing Scrapy on Arch Linux

sudo pacman -S scrapy

Author and Source

Original article (Scrapy でWebページのリンクを抽出する): https://qiita.com/ekzemplaro/items/486d9fef782310991214

Attribution: information about the original author is available at the original URL. Copyright belongs to the original author.
