【Posted】: 2013-05-27 03:00:34
【Question】:
I am writing a simple HTTPS client that pulls down a web page's HTML over HTTPS. I can connect to the page without trouble, but the HTML I pull down is gibberish.
public String GetWebPageHTTPS(String URI){
    BufferedReader read;
    URL inputURI;
    String line;
    String renderedPage = "";
    try{
        inputURI = new URL(URI);
        HttpsURLConnection connect;
        connect = (HttpsURLConnection)inputURI.openConnection();
        connect.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401");
        read = new BufferedReader (new InputStreamReader(connect.getInputStream()));
        while ((line = read.readLine()) != null)
            renderedPage += line;
        read.close();
    }
    catch (MalformedURLException e){
        e.printStackTrace();
    }
    catch (IOException e){
        e.printStackTrace();
    }
    return renderedPage;
}
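When I pass in a string such as https://kat.ph/, about 10,000 characters of gibberish are returned.
Edit: Here is my code modified to accept self-signed certificates, but I still get an encrypted stream: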
public String GetWebPageHTTPS(String URI){
    TrustManager[] trustAllCerts = new TrustManager[] {
        new X509TrustManager() {
            public java.security.cert.X509Certificate[] getAcceptedIssuers() {
                return null;
            }
            public void checkClientTrusted(
                java.security.cert.X509Certificate[] certs, String authType) {
            }
            public void checkServerTrusted(
                java.security.cert.X509Certificate[] certs, String authType) {
            }
        }
    };
    try {
        SSLContext sc = SSLContext.getInstance("SSL");
        sc.init(null, trustAllCerts, new java.security.SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
    } catch (GeneralSecurityException e) {
    }
    try {
        System.out.println("URI: " + URI);
        URL url = new URL(URI);
    } catch (MalformedURLException e) {
    }
    BufferedReader read;
    URL inputURI;
    String line;
    String renderedPage = "";
    try{
        inputURI = new URL(URI);
        HttpsURLConnection connect;
        connect = (HttpsURLConnection)inputURI.openConnection();
        read = new BufferedReader (new InputStreamReader(connect.getInputStream()));
        while ((line = read.readLine()) != null)
            renderedPage += line;
        read.close();
    }
    catch (MalformedURLException e){
        e.printStackTrace();
    }
    catch (IOException e){
        e.printStackTrace();
    }
    return renderedPage;
}
【Comments】:
- I think this is because the site's content is encrypted. Try a different "https" site and double-check.
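A second check may be worth making alongside the one suggested above. HttpsURLConnection already decrypts the TLS layer, so gibberish in the body could instead be a compressed response. The sketch below rests on that guess and is not a confirmed diagnosis: it unwraps gzip when the Content-Encoding value says the body is compressed. The class name GzipAwareReader, the helper decodeBody, and the UTF-8 charset are assumptions for illustration and do not appear in the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareReader {

    // Hypothetical helper: reads a response body as text, unwrapping gzip
    // when the Content-Encoding value reports a compressed body.
    static String decodeBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = raw;
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            in = new GZIPInputStream(raw);
        }
        StringBuilder page = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep line breaks so the HTML stays readable.
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a gzip-compressed response body (no network needed).
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        }
        InputStream body = new ByteArrayInputStream(buffer.toByteArray());
        System.out.println(decodeBody(body, "gzip", StandardCharsets.UTF_8));
    }
}
```

In the method from the question, the call would look like decodeBody(connect.getInputStream(), connect.getContentEncoding(), StandardCharsets.UTF_8). If getContentEncoding() returns null or something other than gzip, the body is read unchanged, and the cause of the gibberish would have to be sought elsewhere.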