【Question title】:java SocketTimeoutException
【Posted】:2018-02-18 20:17:48
【Question】:

I'm trying to read the title of the page https://www.groupon.pl/deals/ga-hotel-alpin-17 (the problem is specific to this particular site):

String address = "https://www.groupon.pl/deals/ga-hotel-alpin-17";
URL url = new URL(address);
URLConnection httpcon = url.openConnection();
httpcon.setConnectTimeout(5000); // ms allowed to establish the connection
httpcon.setReadTimeout(5000);    // ms allowed to wait for data once connected
httpcon.addRequestProperty("User-Agent", "Mozilla/4.0");
InputStream response = httpcon.getInputStream();
Scanner scanner = new Scanner(response);
String responseBody = scanner.useDelimiter("\\A").next();
String title = responseBody.substring(responseBody.toUpperCase().indexOf("<TITLE>") + 7, responseBody.toUpperCase().indexOf("</TITLE>"));
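A side note on the last line: if the response has no <title> (for example a 403 error page), indexOf returns -1 and substring throws StringIndexOutOfBoundsException. It also misses tags with attributes such as <title lang="pl">, and toUpperCase() can change the string length (e.g. "ß" becomes "SS"), which shifts the indices. Below is a minimal sketch of a more tolerant extractor using a case-insensitive regex; TitleExtractor and extractTitle are hypothetical names, and a regex is still no substitute for a real HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {
    // <title ...>...</title> in any letter case; DOTALL lets the title span several lines.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or Optional.empty() when the page has no <title> element.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><HEAD><Title lang=\"pl\"> Hotel Alpin </Title></HEAD></html>")
                .orElse("(no title)"));
        System.out.println(extractTitle("<html><body>403 Forbidden</body></html>").orElse("(no title)"));
    }
}
```

Returning Optional forces the caller to decide what to do with a page that has no title instead of failing with an index exception.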

I get either a 403 or a SocketTimeoutException:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
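The trace shows where it hangs: the TLS connection is already up and HttpClient.parseHTTPHeader is waiting for the HTTP status line, i.e. the server accepted the request but did not answer within the 5 s read timeout. A minimal local reproduction, assuming nothing about Groupon's server: a throwaway ServerSocket accepts the connection and never replies, and HttpURLConnection then fails with the same SocketTimeoutException. ReadTimeoutDemo and triggersReadTimeout are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URL;

class ReadTimeoutDemo {
    // Returns true if a GET against a server that accepts but never answers
    // fails with SocketTimeoutException ("Read timed out").
    static boolean triggersReadTimeout(int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread silentServer = new Thread(() -> {
                try (Socket ignored = server.accept()) {
                    Thread.sleep(readTimeoutMillis * 4L); // keep the connection open, send nothing
                } catch (IOException | InterruptedException e) {
                    // server socket closed or thread interrupted: nothing to clean up
                }
            });
            silentServer.setDaemon(true);
            silentServer.start();

            URL url = new URL("http", server.getInetAddress().getHostAddress(), server.getLocalPort(), "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(readTimeoutMillis);
            try {
                conn.getInputStream(); // blocks waiting for the status line, as in the trace above
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Read timeout reproduced: " + triggersReadTimeout(300));
    }
}
```

Raising the read timeout only makes the client wait longer for an answer that, in this setup, never comes.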

Fetching the site by other means works fine, e.g. with a simple wget command.

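I suspect the server doesn't want to be queried by Java, but why doesn't setting the User-Agent help? What else can I do to imitate real browser behavior? Any ideas?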
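One way to answer "why doesn't the User-Agent help?" is to look at the complete request Java actually sends and compare it with the browser's (for example in the network tab of the browser's developer tools). HttpURLConnection fills in its own defaults such as Host, Accept and Connection; headers like Accept-Encoding or Accept-Language go out only if you add them yourself. A sketch, assuming Java 9+: a one-shot local ServerSocket records the request line and headers, replies 204 No Content, and returns what it saw. RequestHeaderDump and capture are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class RequestHeaderDump {
    // Sends one GET with the given extra headers to a one-shot local server and returns
    // the request line plus header lines exactly as HttpURLConnection wrote them.
    static List<String> capture(Map<String, String> extraHeaders) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            CompletableFuture<List<String>> seen = CompletableFuture.supplyAsync(() -> {
                try (Socket s = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), StandardCharsets.ISO_8859_1));
                    List<String> lines = new ArrayList<>();
                    // The header block ends at the first empty line.
                    for (String line = in.readLine(); line != null && !line.isEmpty(); line = in.readLine()) {
                        lines.add(line);
                    }
                    OutputStream out = s.getOutputStream();
                    out.write("HTTP/1.1 204 No Content\r\nConnection: close\r\n\r\n"
                            .getBytes(StandardCharsets.ISO_8859_1));
                    out.flush();
                    return lines;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            URL url = new URL("http", server.getInetAddress().getHostAddress(), server.getLocalPort(), "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(2000);
            extraHeaders.forEach(conn::addRequestProperty);
            conn.getResponseCode(); // sends the request and reads the 204 reply
            conn.disconnect();
            return seen.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        capture(Map.of("User-Agent", "Mozilla/4.0")).forEach(System.out::println);
    }
}
```

Running main prints the request line followed by one header per line, which can be diffed line by line against the browser's request.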

【Question discussion】:

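  • There is no such exception as ReadTimeoutException. Read the stack trace. Your read timeout is too short. Obviously.
  • Not exactly... if I don't set a timeout, I just wait too long. I tried 60 seconds and still got the same problem...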

Tags: java user-agent urlconnection


【Solution 1】:

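I found the answer: the program below works! The key was to observe exactly which request headers the browser sends. The missing one was the "Accept-Encoding" header.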

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

public class Program {

    public static void main(String[] args) throws IOException {
        String address = "https://www.groupon.pl/deals/ga-hotel-alpin-17";
        URL url = new URL(address);
        URLConnection httpcon = url.openConnection();
        httpcon.setConnectTimeout(5000);
        httpcon.setReadTimeout(5000);

        // Headers copied from the browser's request; the commented-out ones turned out not to be needed.
//        httpcon.addRequestProperty("Host", "www.groupon.pl");
        httpcon.addRequestProperty("User-Agent", "Mozilla/5.0 (X11; Fedora; Lin… Gecko/20100101 Firefox/54.0");
//        httpcon.addRequestProperty("Accept", "text/html,application/xhtml+x…lication/xml;q=0.9,*/*;q=0.8");
//        httpcon.addRequestProperty("Accept-Language", "en-US,en;q=0.5");
        httpcon.addRequestProperty("Accept-Encoding", "utf-8"); // the header that was missing
//        httpcon.addRequestProperty("DNT", "1");
//        httpcon.addRequestProperty("Connection", "keep-alive");
//        httpcon.addRequestProperty("Upgrade-Insecure-Requests", "1");

        // "\\A" (start of input) as the delimiter makes next() return the whole body.
        try (InputStream response = httpcon.getInputStream();
             Scanner scanner = new Scanner(response, "UTF-8")) {
            String responseBody = scanner.useDelimiter("\\A").next();
            String upper = responseBody.toUpperCase();
            String title = responseBody.substring(upper.indexOf("<TITLE>") + 7, upper.indexOf("</TITLE>"));
            System.out.println("Title: " + title);
        }
    }
}

The commented-out headers are not required.
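About the value: "utf-8" is not a content coding. Accept-Encoding lists compression formats such as gzip, deflate, br or identity (character sets belong to a different header, Accept-Charset). So the header most likely helps simply by being present: presumably the server, or a bot filter in front of it, stalls requests without it, and with no recognizable coding it sends the body uncompressed. If you copy a real browser value such as "gzip, deflate" instead, expect a compressed body, which HttpURLConnection does not decompress for you. A minimal sketch of decoding based on the Content-Encoding response header; BodyDecoder and its methods are hypothetical names, and br (Brotli) is rejected because the JDK has no built-in decoder for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

class BodyDecoder {
    // Wraps the raw body according to Content-Encoding (as returned by
    // URLConnection.getContentEncoding(); null means the header was absent).
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // no Content-Encoding: body is sent as-is
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "identity":
                return raw;
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                return new InflaterInputStream(raw); // expects zlib-wrapped data, as HTTP specifies
            default:
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    // Convenience for byte arrays; wraps IOException so callers need no throws clause.
    static String decodeToString(byte[] body, String contentEncoding) {
        try (InputStream in = decode(new ByteArrayInputStream(body), contentEncoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: gzip a string in memory, standing in for a compressed server response.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decodeToString(gzip("<title>Hotel Alpin</title>"), "gzip"));
        // With the answer's connection this would be used as:
        //   httpcon.addRequestProperty("Accept-Encoding", "gzip");
        //   InputStream body = decode(httpcon.getInputStream(), httpcon.getContentEncoding());
    }
}
```

Failing loudly on an unknown encoding avoids silently parsing compressed bytes as HTML.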

Cheers! Lucas

PS. Too bad I got -2 on this question...

【Discussion】:
