【Posted】:2020-11-05 18:17:35
【Question】:
I want to read https://www.instagram.com/mobonews/?__a=1 using Java. Opened in a browser, the source of that URL is:
{"logging_page_id":"profilePage_1410389643","show_suggested_profiles":false,"show_follow_dialog":false,"graphql":{"user":{"biography":"\u200f\u200e\u0645\u0627\u062c\u0631\u0627\u062c\u0648\u06cc\u06cc\u200c\u0647\u0627\u06cc \u0645\u0646 \u062f\u0631 \u062
But the code below returns this instead:
<!DOCTYPE html><html lang="en" class="no-js not-logged-in client-root"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <title>Login • Instagram</title> <meta name="robots" content="noimageindex, noarchive"> <meta name="apple-mobile-web-app-status-bar-style" content="default"> <meta name="mobile-web-app-capable" content="yes"> <meta name="theme-color" content="#ffffff"> <meta id="viewport" name="viewport"
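This is the code I use:
// Needs java.io.BufferedReader, java.io.InputStreamReader, java.net.URL,
// java.net.URLConnection and java.nio.charset.StandardCharsets.
URL website = new URL(url);
URLConnection connection = website.openConnection();
StringBuilder response = new StringBuilder();
// try-with-resources closes the reader; the body is decoded explicitly as UTF-8.
try (BufferedReader in = new BufferedReader(new InputStreamReader(
        connection.getInputStream(), StandardCharsets.UTF_8))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
}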
Edit:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

public class TestReadurlInsta {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        URL u = new URL("https://www.instagram.com/mobonews/?__a=1");
        URLConnection con = u.openConnection();
        // getContentEncoding() reports the Content-Encoding header (e.g. gzip),
        // not the character set, so the body is decoded as UTF-8 directly.
        try (InputStream in = con.getInputStream()) {
            String body = IOUtils.toString(in, StandardCharsets.UTF_8);
            System.out.println(body);
        }
    }
}
Edit 2:
For some reason the request seems to end up on the Instagram login page:
<!DOCTYPE html> <html lang="en" class="no-js not-logged-in client-root"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <title> Login • Instagram </title>
I ran the same code on the same machine in the past and everything worked, but suddenly I ran into this problem.
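To confirm what the server actually sends back, it can help to look at the status code, the final URL after redirects, and the content type instead of only the body. The following is a minimal sketch, not a fix: the class name `LoginWallCheck` and its methods are hypothetical, and the JSON check is only a heuristic based on the two responses shown above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; not part of any library.
final class LoginWallCheck {

    /** Heuristic: true when the response looks like JSON rather than an HTML page. */
    static boolean looksLikeJson(String contentType, String body) {
        String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
        String start = body == null ? "" : body.trim();
        if (type.contains("json")) {
            return true;
        }
        return start.startsWith("{") || start.startsWith("[");
    }

    /** Performs a real request and summarizes status, final URL and body type. */
    static String inspect(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        int status = con.getResponseCode();        // sends the request
        String finalUrl = con.getURL().toString(); // typically the redirect target, if any
        if (status >= 400) {
            return status + " " + finalUrl;        // getInputStream() would throw here
        }
        String contentType = con.getContentType();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = con.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        String body = new String(out.toByteArray(), StandardCharsets.UTF_8);
        return status + " " + finalUrl + " json=" + looksLikeJson(contentType, body);
    }

    public static void main(String[] args) throws IOException {
        // Only touches the network when a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(inspect(args[0]));
        }
    }
}
```

If the final URL differs from the requested one, or `json=false` is printed, the program received a redirect or an HTML page rather than the JSON document.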
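Edit 3:
I ran the same code from an online IDE and got the following exception. The connection seems to be refused and, as @Holger said, Instagram may be blocking access to this resource: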
Exception in thread "main" java.net.UnknownHostException: www.instagram.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at java.net.URL.openStream(URL.java:1045)
at HelloWorld.main(HelloWorld.java:14)
But is there any solution?
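A note on the trace above: `java.net.UnknownHostException` is thrown when a host name cannot be resolved, so by itself it points to name resolution or network access in the environment where the code ran, not to a refusal by the server. The trace does not show why resolution failed. The sketch below separates the common failure types; the class `ConnectionFailure` and its category labels are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.net.UnknownHostException;

// Hypothetical helper for illustration; the labels are not standard names.
final class ConnectionFailure {

    /** Maps an I/O exception to a rough failure category. */
    static String describe(IOException e) {
        if (e instanceof UnknownHostException) {
            return "DNS_LOOKUP_FAILED";   // host name could not be resolved
        }
        if (e instanceof ConnectException) {
            return "CONNECT_FAILED";      // e.g. connection refused by the remote side
        }
        if (e instanceof SocketTimeoutException) {
            return "TIMED_OUT";
        }
        return "OTHER_IO_ERROR";
    }

    /** Tries to open the URL (real network access) and reports the outcome. */
    static String probe(String address) {
        try (InputStream in = new URL(address).openStream()) {
            return "OK";
        } catch (IOException e) {
            return describe(e) + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Only touches the network when a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(probe(args[0]));
        }
    }
}
```

Running `probe` in both environments would show whether the online IDE fails at name resolution while the local machine reaches the server and gets the login page.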
【Comments】:
-
I ran your code and it returned the output you want. Check that you are passing the correct string, or keep debugging your code.
-
@Raz I checked again, but still no luck.
-
But how are you passing the URL? Can you post more of the code, or add some data from a debugging session?
-
@Raz Please see my edit.
-
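Has Instagram published a policy on repeated automated access to this kind of feed resource? For its ordinary web pages I also get a forced login page after a few visits, and you said you had already run the same code on the same machine before. So this would not be that unusual (and it explains why other users, who are trying the URL for the first time, cannot reproduce the problem).
Tags: java json url stream httpurlconnection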