HttpClient: Capturing Web Content

This example uses HttpClient to fetch a web page and jsoup to extract information from the returned HTML.
 
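import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SnippetHtml {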
	
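	/**
	 * Fetches the HTML of the page at the given URL.
	 * @param url  the URL of the page to fetch
	 * @return the response body as a string, or an empty string if the request fails
	 */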
	public String parseHtml (String url) {
		// Create the HttpClient instance
		HttpClient client = new HttpClient();
		// The HTTP method to execute (a GET request, created below)
		HttpMethod method = null;
		String html = "";
		try {
			method = new GetMethod(url);
			client.executeMethod(method);
			html = method.getResponseBodyAsString(); // Read the response body as a string
		} catch (HttpException e) {
			// HTTP protocol error reported by HttpClient
			e.printStackTrace();
		} catch (IOException e) {
			// Transport error, such as a failed or dropped connection
			e.printStackTrace();
		} finally {
			// Release the connection whether or not the request succeeded
			if (method != null) {
				method.releaseConnection();  
			}
		}
		return html ;
	}
	
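	/**
	 * Parses the HTML and prints the text of every td cell found inside
	 * elements whose class attribute is "news-table".
	 * @param html  the HTML source to parse
	 */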
	public void getHtmlEarthBean (String html) {
		if (html != null && !"".equals(html)) {
			Document doc = Jsoup.parse(html);   
			// Select the elements whose class attribute equals "news-table"
			Elements linksElements = doc.getElementsByAttributeValue("class", "news-table");
			for (Element ele : linksElements) {
				// Collect every td cell inside the matched element
				Elements linksElements1 = ele.getElementsByTag("td");
				for (Element ele1 : linksElements1) {
					System.out.println(ele1.text());
				}
			}   
		}
	}
}
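
Commons HttpClient 3.x has reached end of life, so on JDK 11 or later the fetch step could instead be sketched with the built-in java.net.http.HttpClient. This is a swapped-in alternative, not the code used in this article: the class name JdkSnippetHtml, the helper buildRequest, and the 10-second timeout are all assumptions made for illustration. Unlike parseHtml above, this version also returns an empty string when the status code is not 200.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class JdkSnippetHtml {

    // Hypothetical helper: builds a GET request for the given URL.
    // The 10-second timeout is an arbitrary choice for this sketch.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Fetches the page and returns its body as a string.
    // Returns "" on a non-200 status or when the request fails.
    static String parseHtml(String url) {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> response =
                    client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 ? response.body() : "";
        } catch (IOException e) {
            // Transport error, such as a refused or dropped connection
            e.printStackTrace();
            return "";
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it
            Thread.currentThread().interrupt();
            return "";
        }
    }
}
```

The string returned by JdkSnippetHtml.parseHtml(url) can be passed to getHtmlEarthBean unchanged, since the jsoup step only needs the HTML text.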

You need to download two jar files: commons-httpclient-3.1.jar, which fetches the page, and jsoup-1.6.1.jar, which parses it.
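
If the project happens to be built with Maven (an assumption, since the article only mentions downloading the jars by hand), the same two libraries could be declared as dependencies instead. The fragment below is a sketch of what would go inside the dependencies element of a pom.xml; the versions match the jar names above.

```xml
<!-- Fetching: Commons HttpClient 3.1 -->
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>
<!-- Parsing: jsoup 1.6.1 -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.6.1</version>
</dependency>
```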