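Extracting a specific part of HTML code

【Question title】: Extract a specific part of HTML code
【Posted】: 2013-01-18 15:12:50
【Question】:

I am writing my first Android application and I need to fetch the source code of an HTML page.

This is how I currently do it: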

    private class NetworkOperation extends AsyncTask<Void, Void, String> {
        protected String doInBackground(Void... params) {
            try {
                URL oracle = new URL("http://www.nationalleague.ch/NL/fr/");
                URLConnection yc = oracle.openConnection();
                BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
                String inputLine;
                String s1 = "";
                // Read the page line by line and concatenate everything into one String
                while ((inputLine = in.readLine()) != null)
                    s1 = s1 + inputLine;
                in.close();

                return s1;
            }
            catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }

The problem is that this takes too much time. How can I get, for example, only lines 200 to 300 of the HTML?

Sorry for my bad English :$

【Comments】:

@user1965878: Much better.
Cool, that helps everyone, including you!

Tags: java html extract

【Solution 1】:

It is better to use readLine() rather than read(char[] cbuf, int off, int len). A quick and dirty way is to count the lines as you read them:

    int i = 0;
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        i++;
        // Only act on lines 200 through 300
        if (i >= 200 && i <= 300) {
            // DO SOMETHING with inputLine
        }
    }
    in.close();
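
The line-counting idea above can be wrapped in a small helper. The following is only a minimal sketch: the class name `LineRange` and the method `linesBetween` are hypothetical names chosen for illustration, not part of any library. It assumes 1-based, inclusive line numbers and stops reading once the last wanted line has been consumed, so the rest of the page is never pulled through the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original answer.
final class LineRange {

    /**
     * Returns lines first..last (1-based, inclusive) read from the given reader.
     * Reading stops as soon as line number `last` has been consumed.
     */
    static List<String> linesBetween(BufferedReader in, int first, int last) throws IOException {
        List<String> kept = new ArrayList<>();
        String line;
        int lineNumber = 0;
        // Check the counter first so that no line beyond `last` is read
        while (lineNumber < last && (line = in.readLine()) != null) {
            lineNumber++;
            if (lineNumber >= first) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

With the reader from the question, `LineRange.linesBetween(in, 200, 300)` would return at most 101 lines. Lines 1 to 199 still have to be read and discarded, so this saves memory and the tail of the download, not the beginning.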

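【Comments】:

【Solution 2】:

You are fetching the HTML document over HTTP, and HTTP normally runs on top of TCP. So... you cannot simply "skip lines"! The server will always try to send you all of the data that comes before the part you are interested in, and your side of the connection has to acknowledge receipt of that data.

【Comments】:

【Solution 3】:

Don't read line by line [use read(char[] cbuf, int off, int len)]
Don't concatenate Strings [use StringBuilder]

Open the buffered reader (as you already do):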

    URL oracle = new URL("http://www.nationalleague.ch/NL/fr/");
    BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));

Instead of reading line by line, read into a char[] (I would use a size of about 8192), then use a StringBuilder to append all the chars that were read.
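
The two recommendations above can be sketched as follows. This is an illustration under assumptions, not the answerer's exact code: the class name `BulkRead` and the method `readAll` are hypothetical, and the 8192 buffer size is simply the figure suggested in the answer.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not from the original answer.
final class BulkRead {

    /** Reads everything from the reader in 8192-char chunks and returns it as one String. */
    static String readAll(Reader in) throws IOException {
        char[] buffer = new char[8192];
        StringBuilder sb = new StringBuilder();
        int n;
        // read(...) returns the number of chars actually read, or -1 at end of stream
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            // Append only the n chars that were filled in this round
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }
}
```

Called as `BulkRead.readAll(in)` with the reader opened above, this avoids creating a new String on every iteration, which is what makes `s1 = s1 + inputLine` slow on large pages. Unlike the readLine() loop in the question, it also keeps the line terminators in the result.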

Reading specific lines of the HTML source seems a bit risky, because the formatting of the page's source code may change.
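
One way to depend less on line numbers, which the answer itself does not propose, is to cut the text between two marker strings. The sketch below is an assumption-laden illustration: `MarkerExtract` and `between` are hypothetical names, and the markers you pass in would have to be chosen by inspecting the actual page.

```java
// Hypothetical helper, not from the original answer.
final class MarkerExtract {

    /**
     * Returns the text between the first occurrence of `start` and the next
     * occurrence of `end` after it, or null if either marker is missing.
     */
    static String between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return html.substring(from, to);
    }
}
```

This still breaks if the markers themselves change, but it survives lines being added or removed elsewhere on the page. An HTML parser would be sturdier still.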

【Comments】: