
【Question title】: Is there a faster way to download a page from the net to a string?
【Posted】: 2011-03-04 07:44:31
【Question description】:

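I have tried other ways of downloading information from a URL, but I need a faster one. I have to download and parse about 250 separate pages, and I'd like the app not to feel unusually slow. This is the code I currently use to retrieve a single page; any insight would be great.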

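import android.util.Log;
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.http.util.ByteArrayBuffer;

String tempString = null;
try 
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(50); // starts at 50 bytes and has to grow repeatedly
    int current = 0;
    // Reads a single byte per call
    while ((current = bufferedInputStream.read()) != -1) {
        byteArrayBuffer.append((byte) current);
    }
    bufferedInputStream.close();
    tempString = new String(byteArrayBuffer.toByteArray());
} 
catch (Exception e) 
{
    Log.i("Error", e.toString());
}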

【Question discussion】:

250 pages? Are you building some kind of database?

Tags: android url download

【Solution 1】:

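If the requests go to the same server, try keeping the connection open. Also, try to avoid reallocations in the buffer, and read as much as possible in one call.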


import android.util.Log;
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.http.util.ByteArrayBuffer;

final int APPROX_MAX_PAGE_SIZE = 128 * 1024; // assumed rough upper bound on page size, in bytes
String tempString = null;
try 
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    ucon.setRequestProperty("Connection", "keep-alive"); // (1)
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(APPROX_MAX_PAGE_SIZE); // (2)
    byte[] buf = new byte[8192];
    int read;
    do {
       read = bufferedInputStream.read(buf, 0, buf.length); // (3)
       if (read > 0) byteArrayBuffer.append(buf, 0, read);
    } while (read >= 0);
    bufferedInputStream.close(); // a fully read, closed stream lets the connection be reused
    tempString = new String(byteArrayBuffer.toByteArray());
} 
catch (Exception e) 
{
    Log.i("Error", e.toString());
}

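(1) Set the Keep-alive header (not sure it is needed; on J2SE it is also a configurable property, http.keepAlive).
(2) Allocate a buffer that is "usually large enough" up front to avoid reallocations.
(3) Read many bytes per call instead of one.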

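Disclaimer: this was written "blind", without a Java compiler. It may be that setRequestHeader is only available on HttpURLConnection (requiring a cast), or that some arguments are wrong; if so, feel free to edit.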
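Points (2) and (3) do not depend on Android's ByteArrayBuffer. Below is a minimal sketch of the same idea using only JDK classes, assuming the caller supplies a size hint and knows the page's charset. The class name ChunkedReader and the 8 KB chunk size are illustrative choices, not something from the answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class ChunkedReader {
    // Reads the whole stream in fixed-size chunks into a buffer presized with
    // sizeHint, so a page that fits within the hint never forces the buffer to grow.
    static String readFully(InputStream in, int sizeHint, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(sizeHint, 32));
        byte[] chunk = new byte[8192]; // many bytes per read() call, not one
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```

Usage would look like `ChunkedReader.readFully(ucon.getInputStream(), 128 * 1024, StandardCharsets.UTF_8)`. Passing the charset explicitly avoids depending on the platform default that `new String(byte[])` uses.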

【Discussion】:

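Something like ucon.setRequestHeader("Connection", "keep-alive") isn't needed; connection reuse is handled internally. Also, try reading into the buffer in chunks rather than byte by byte.
The code above works with a few small modifications. I'll benchmark it against the old program and let you know. By the way, I parse one page of about 75K to get 250 URIs, and then parse each of those pages.
After running both versions over 10 tests, this one is 35% faster than my original. Any other suggestions would be great.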
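On the point that connection reuse is handled internally: with HttpURLConnection no manual keep-alive header should be needed, but reuse generally depends on the caller reading each response body to the end and closing the stream. A sketch, assuming UTF-8 pages; ReusingFetcher is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ReusingFetcher {
    // Fetches one page. HttpURLConnection keeps idle connections to a host in an
    // internal pool; a connection can only go back to that pool once its response
    // body has been read to the end and the stream has been closed.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // No "Connection: keep-alive" header is set here: HTTP/1.1 keep-alive is the default.
        // conn.disconnect() is deliberately not called either, because it may close the
        // underlying socket instead of leaving it idle for the next request.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) { // drain the body completely
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling fetch repeatedly for URLs on the same host then gives the platform the chance to reuse the socket; whether it actually does is up to the HttpURLConnection implementation.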

【Solution 2】:

Why not use the built-in Apache HTTP components?

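import java.io.ByteArrayOutputStream;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

HttpClient httpClient = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = httpClient.execute(request);

int status = response.getStatusLine().getStatusCode();

// Only read the body when the server answered 200 OK
if (status == HttpStatus.SC_OK) {
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    response.getEntity().writeTo(ostream);
    String page = ostream.toString(); // decoded with the platform default charset
}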

【Discussion】:

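IMHO, URLConnection is a higher-level abstraction than working with the HTTP protocol directly.
I get this error on HttpClient.execute(request);: "Cannot make a static reference to the non-static method execute(HttpUriRequest) from the type HttpClient"
The execute method in org.apache.http.client.HttpClient is not static. I updated the example above to include creating the HttpClient.
Thanks for the code correction, I got it working, but Krumelur's version above is about 1/3 faster on average. I don't know whether that's because the byte array is allocated at full size; I'll try a few changes and see how it goes. I do like how concise it is, though: more readable and easier to understand.
I tried preallocating ostream to the full page size, but it didn't improve the speed, although this is still faster than my original version. Any other ideas would be great.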
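The comment doesn't say how ostream was presized. One common source for a size hint is the response's Content-Length, which URLConnection.getContentLength() reports as -1 when the header is absent, so it can only ever be a hint. A sketch; PresizedDownload and the fallback parameter are illustrative, and UTF-8 is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class PresizedDownload {
    // Initial buffer capacity: the declared Content-Length when the server sends
    // a positive one, otherwise a caller-chosen fallback.
    static int initialCapacity(int contentLength, int fallback) {
        return contentLength > 0 ? contentLength : Math.max(fallback, 32);
    }

    // Downloads the body into a buffer presized from the Content-Length hint.
    static String download(URLConnection conn, int fallback) throws IOException {
        int capacity = initialCapacity(conn.getContentLength(), fallback);
        ByteArrayOutputStream out = new ByteArrayOutputStream(capacity);
        try (InputStream in = conn.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Since the poster saw no speedup from presizing, buffer growth is probably not the main cost here; network round trips are a likelier candidate.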

【Solution 3】:

Use a pooled HttpClient and try issuing 2 or 3 requests at a time. Also try creating a memory pool to avoid allocations and GC pauses.
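Here is a sketch of the concurrency part using a fixed thread pool from java.util.concurrent. This is a substitution: the answer suggests a pooled HttpClient without naming a specific setup. The sketch also reuses one read buffer per worker thread as a small-scale version of the "memory pool" idea. ParallelFetch, fetchAll, and the fetcher parameter are hypothetical names; the fetcher is passed in so any download method (URLConnection, HttpClient) can be plugged in, and UTF-8 pages are assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelFetch {
    // One 8 KB read buffer per worker thread, reused for every page that thread
    // downloads instead of being allocated per request.
    private static final ThreadLocal<byte[]> READ_BUF =
            ThreadLocal.withInitial(() -> new byte[8192]);

    // Drains a stream with the calling thread's reusable buffer (assumes UTF-8).
    static String readAll(InputStream in) throws IOException {
        byte[] buf = READ_BUF.get();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Example fetcher: a plain GET through URLConnection.
    static String httpGet(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies fetcher to every URL with at most `parallelism` calls in flight.
    // Results are returned in the same order as `urls`.
    static List<String> fetchAll(List<String> urls, int parallelism,
                                 Function<String, String> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetcher.apply(url)));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> f : futures) {
                pages.add(f.get()); // waits for each page in submission order
            }
            return pages;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Usage might look like `List<String> pages = ParallelFetch.fetchAll(uris, 3, ParallelFetch::httpGet);`. Keeping parallelism at 2 or 3, as the answer suggests, limits how hard a single server gets hit.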

【Discussion】: