Scraping with Simple HTML DOM Parser


Installing Simple HTML DOM Parser

Download simple_html_dom.php from http://sourceforge.net/projects/simplehtmldom/files/
and save it in the same folder as your script.

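<?php
// Load the library saved next to this script
include "simple_html_dom.php";
// Fetch the page and parse it into a DOM
$html = file_get_html('http://www.google.com/');
// Print the src attribute of every img element
foreach ($html->find('img') as $element) {
    echo $element->src . "<br>";
}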

Using Simple HTML DOM Parser

http://simplehtmldom.sourceforge.net/manual.htm

1. Create a DOM from a URL

$html = file_get_html('http://www.google.com/');

2. Extract specific elements from the DOM

// Extract the source URL of each img element
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Extract the href attribute of each a element
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}


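// Select by attribute: a elements whose title is "top"
foreach ($html->find('a[title=top]') as $element) {
    echo $element->href . '<br>';
}

// Select by class: td elements inside a table with class "hello"
$es = $html->find('table.hello td');

// Select by hierarchy: td elements with align=center inside a table
$es = $html->find('table td[align=center]');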


// Nested search: find p elements with a title attribute inside div#new
foreach ($html->find('div[id=new]') as $div1) {
    foreach ($div1->find('p[title]') as $p1) {
        echo $p1->plaintext;
    }
}
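
The nested loops above can usually be collapsed into one descendant selector. The following is a minimal sketch, assuming simple_html_dom.php sits next to the script as described in the install step. The sample markup and the $texts variable are made up for illustration, and str_get_html() is used so the selector can be tried without network access.

```php
<?php
// Assumption: simple_html_dom.php was saved in the same folder as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical sample markup, used only for this illustration.
$markup = '<div id="new"><p title="intro">Hello</p><p>skipped</p></div>'
        . '<div id="old"><p title="x">ignored</p></div>';

// str_get_html() parses a string instead of fetching a URL.
$html = str_get_html($markup);

// One selector: p elements that have a title attribute, inside div[id=new].
$texts = [];
foreach ($html->find('div[id=new] p[title]') as $p) {
    $texts[] = trim($p->plaintext);
}

echo implode("\n", $texts), "\n";
```

The nested form is still useful when each outer element needs its own processing; the single selector is simply shorter when only the inner elements matter.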

3. Set attributes on the DOM

// Change the extension of each img element's src from jpg to png
foreach ($html->find('img') as $img) {
    $img->src = str_replace("jpg", "png", $img->src);
}
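
Note that str_replace() replaces every occurrence of "jpg", so a URL such as http://example.com/jpg/photo.jpg would also have its directory name changed. A minimal sketch of a stricter replacement follows; swap_jpg_to_png() is a hypothetical helper written for this illustration, not part of Simple HTML DOM, and the example URLs are made up.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): rewrite only a ".jpg"
// extension that ends the path, i.e. one followed by the end of the string
// or by a query string / fragment. Matching is case-insensitive.
function swap_jpg_to_png($src)
{
    return preg_replace('/\.jpg(?=$|[?#])/i', '.png', $src);
}

// The "jpg" directory name is left alone; only the extension changes.
echo swap_jpg_to_png('http://example.com/jpg/photo.jpg'), "\n";
// A query string after the extension is preserved.
echo swap_jpg_to_png('photo.jpg?v=2'), "\n";
// Non-jpg sources are returned unchanged.
echo swap_jpg_to_png('logo.gif'), "\n";
```

Inside the loop this would be used as $img->src = swap_jpg_to_png($img->src);. According to the manual, the modified document can then be written out with echo $html; or $html->save();.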

Author And Source

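More information on this topic (Scraping with Simple HTML DOM Parser) is available at https://qiita.com/chkk525@github/items/3d3fba394514fa2c4529

Author attribution: information about the original author is included at the original URL. Copyright belongs to the original author.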
