How to copy the contents of an HTML div from a website using Java (answers)

【Question title】: How to copy the contents of an HTML div from a website using Java
【Posted】: 2017-04-24 17:25:51
【Question description】:

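I'm trying to write a function in Java that copies the HTML code of a div from a URL. The data in question comes from http://cdn.espn.com/sports/scores#completed, but when I read the page into my function with I/O streams, that data isn't there. It is visible when I open the browser's Inspect view and press Ctrl+F for "completed-soccer", but my code doesn't retrieve it at all. Here is the code I'm using: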

package project;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class DownloadPage {

    public static void main(String[] args) throws IOException {

        // Make a URL to the web page
        URL url = new URL("http://cdn.espn.com/sports/scores#completed-soccer");

        // Get the input stream through the URL connection
        URLConnection con = url.openConnection();

        // try-with-resources closes the reader and the underlying stream when done
        try (InputStream is = con.getInputStream();
             BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String line;

            // Read each line and write it to System.out
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
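A note on the URL used above: everything after `#` is a URL fragment. The client keeps it and never sends it in the HTTP request, so `#completed-soccer` has no influence on what the server returns; in a browser it is handled on the client side (for example, by the page's own scripts). The following sketch uses only `java.net.URI` to show which part of that URL actually reaches the server. The class `FragmentCheck` and method `requestTarget` are hypothetical names chosen for illustration, and `requestTarget` is a simplified approximation of the request line, not an API of any library.

```java
import java.net.URI;

public class FragmentCheck {

    // Hypothetical helper: approximates the request target an HTTP client sends
    // (raw path plus optional query). The fragment after '#' is never part of it.
    static String requestTarget(URI uri) {
        String target = uri.getRawPath();
        if (target == null || target.isEmpty()) {
            target = "/"; // an empty path is requested as "/"
        }
        if (uri.getRawQuery() != null) {
            target += "?" + uri.getRawQuery();
        }
        return target;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://cdn.espn.com/sports/scores#completed-soccer");
        // prints "Sent to server: /sports/scores"
        System.out.println("Sent to server: " + requestTarget(uri));
        // prints "Fragment (not sent): completed-soccer"
        System.out.println("Fragment (not sent): " + uri.getFragment());
    }
}
```

Removing the fragment therefore does not change the downloaded HTML; whether the div's contents appear depends only on what the server puts in its response.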

【Question discussion】:

Tags: java html parsing web-scraping jsoup

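【Solution 1】:

If the data isn't in the response to a normal HTTP request (for example, because the page loads it with JavaScript after the initial HTML arrives), you need a more capable tool, such as Selenium with WebDriver.

Selenium drives a real browser, so it can navigate the page, execute its JavaScript and inspect all of the resulting elements.

There is plenty of documentation and many guides available for it.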

【Discussion】:

【Solution 2】:

Try this code:

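import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DownloadPage {

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://cdn.espn.com/sports/scores#completed-soccer");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        // Read from the same connection that is disconnected below;
        // url.openStream() would open a second, separate connection.
        // Assumes the page is UTF-8 encoded.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder result = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips line terminators, so add them back
                result.append(line).append('\n');
            }
            System.out.println(result);
        } finally {
            urlConnection.disconnect();
        }
    }
}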
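For comparison, on Java 11 or later the same download can be written with the built-in `java.net.http.HttpClient`. This is a minimal sketch, not part of the original answer; the class name `FetchPage` and method `fetch` are illustrative. Like the code above, it returns only the HTML the server sends and runs no JavaScript, so it cannot see content that the page adds later in the browser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchPage {

    // Downloads the raw HTML exactly as the server sends it (Java 11+).
    // No JavaScript is executed, so script-generated content will be missing.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // e.g. http -> https
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Treat anything other than a 2xx status as an error
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("http://cdn.espn.com/sports/scores"));
    }
}
```

Searching the returned `String` for the text you found in the browser's Inspect view is a quick way to tell whether that content is in the server's HTML or is added afterwards by scripts, in which case a browser-driving tool is needed.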

【Discussion】: