【Posted at】:2016-02-02 06:28:42
【Question】:
I have the following Java code to read a website's source:
URL url = new URL(urlToParse);
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
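urlToParse is passed to this function as a parameter and equals "http://www.omegatiming.com/file/download/?id=00010F0200FFFFFFFFFFFFFFFFFFFF03".
The code is taken from here.
The output is gibberish, full of question marks and unknown characters.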
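One way to narrow down this kind of output is to look at what the server says it is sending before decoding the body as text. The sketch below prints the Content-Type and Content-Encoding headers and only decodes the body when it is declared as text. The class name ResponseInspector and its helper methods are hypothetical, and the idea that the body might be gzip-compressed or binary is an assumption that the question does not confirm.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

public class ResponseInspector {

    // Wraps the stream in a GZIPInputStream when Content-Encoding is "gzip";
    // otherwise returns the stream unchanged.
    public static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the stream to the end and returns the raw bytes.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Rough heuristic (an assumption, not a standard): treat text/*, JSON and XML as text.
    public static boolean looksLikeText(String contentType) {
        if (contentType == null) {
            return false;
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        return ct.startsWith("text/") || ct.contains("json") || ct.contains("xml");
    }

    public static void main(String[] args) throws IOException {
        URLConnection con = new URL(args[0]).openConnection();
        String type = con.getContentType();
        String encoding = con.getContentEncoding();
        System.out.println("Content-Type: " + type);
        System.out.println("Content-Encoding: " + encoding);
        try (InputStream in = unwrap(con.getInputStream(), encoding)) {
            byte[] body = readAll(in);
            if (looksLikeText(type)) {
                // UTF-8 is assumed here; the real charset may differ.
                System.out.println(new String(body, StandardCharsets.UTF_8));
            } else {
                System.out.println("Non-text body, " + body.length + " bytes");
            }
        }
    }
}
```

If the reported Content-Type is not a text type, printing the body through a Reader would be expected to look like gibberish regardless of the charset chosen.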
I tried adding these 5 lines after the openConnection() line:
con.setRequestMethod("GET");
con.setDoOutput(true);
con.setReadTimeout(2000);
con.setChunkedStreamingMode(0);
con.connect();
as suggested in the solution offered here, but then I got this exception:
Exception in thread "main" java.io.FileNotFoundException: http://www.omegatiming.com/file/download/?id=00010F0200FFFFFFFFFFFFFFFFFFFF03
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1835)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
thrown from the line InputStream is = con.getInputStream();
Pasting this link into a browser takes me to the site, so the URL itself cannot be invalid, yet calling con.getResponseCode() returns 404.
When I read the error from getErrorStream(), it prints:
<!DOCTYPE html>
<html>
<head>
<title>The resource cannot be found.</title>
<meta name="viewport" content="width=device-width" />
<style>
body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;}
p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
.marker {font-weight: bold; color: black;text-decoration: none;}
.version {color: gray;}
.error {margin-bottom: 10px;}
.expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
@media screen and (max-width: 639px) {
pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
}
@media screen and (max-width: 479px) {
pre { width: 280px; }
}
</style>
</head>
<body bgcolor="white">
<span><H1>Server Error in '/' Application.<hr width=100% size=1 color=silver></H1>
<h2> <i>The resource cannot be found.</i> </h2></span>
<font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">
<b> Description: </b>HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly.
<br><br>
<b> Requested URL: </b>/file/download/<br><br>
<hr width=100% size=1 color=silver>
<b>Version Information:</b> Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.34248
</font>
</body>
HttpException: A public action method 'download' was not found on controller 'SwissTiming.DocMgmt.DMSWeb.Controllers.FileController'.
at System.Web.Mvc.Controller.HandleUnknownAction(String actionName)
at System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult)
at System.Web.Mvc.Controller.<BeginExecute>b__15(IAsyncResult asyncResult, Controller controller)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult)
at System.Web.Mvc.Controller.System.Web.Mvc.Async.IAsyncController.EndExecute(IAsyncResult asyncResult)
at System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult)
at System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
--><!--
This error page might contain sensitive information because ASP.NET is configured to show verbose error messages using <customErrors mode="Off"/>. Consider using <customErrors mode="On"/> or <customErrors mode="RemoteOnly"/> in production environments.-->
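The status check and error-stream read described above can be sketched roughly as follows. The class name StatusAwareFetch and its helpers are hypothetical, and decoding the error page as UTF-8 is an assumption. The sketch only reproduces the diagnosis step (status code plus error body); it does not claim to fix the 404.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StatusAwareFetch {

    // Returns the error stream for 4xx/5xx responses and the normal stream otherwise.
    // getErrorStream() may return null when the server sent no error body.
    public static InputStream bodyStream(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode();
        return status >= 400 ? con.getErrorStream() : con.getInputStream();
    }

    // Reads the whole stream as text; a null stream yields an empty string.
    public static String readText(InputStream in, Charset charset) throws IOException {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // setRequestMethod is declared on HttpURLConnection, so the cast is needed.
        HttpURLConnection con = (HttpURLConnection) new URL(args[0]).openConnection();
        con.setRequestMethod("GET");
        con.setReadTimeout(2000);
        System.out.println("HTTP " + con.getResponseCode());
        // UTF-8 is assumed for the body; the server may use another charset.
        System.out.println(readText(bodyStream(con), StandardCharsets.UTF_8));
        con.disconnect();
    }
}
```

Checking the status code first avoids the FileNotFoundException that getInputStream() throws for a 404 response.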
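This is basically where I am stuck; I cannot work out what the problem is. I don't even know where ASP.NET comes into it.
Other attempts to work around the problem, none of which helped:
1. Adding
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible )");
httpConnection.setRequestProperty("Accept","/");
as suggested here. I also tried, as suggested here, using the userAgent reported by this site.
I still get a FileNotFoundException from getInputStream().
2. Adding
System.setProperty("http.agent", "");
as mentioned here.
3. Going back to the original problem (the gibberish output), I tried changing the call to InputStreamReader like this:
new InputStreamReader(new URL("www.website.com").openStream(), "UTF-8"), as mentioned in a comment here, but it changed nothing.
4. Adding the lines:
con.setRequestMethod("POST");
con.setDoInput(true);
I still get the FileNotFoundException.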
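Attempts 1 and 3 above (custom request headers and an explicit charset) can be combined in one sketch that takes the charset from the Content-Type response header instead of hard-coding it. The class name CharsetAwareReader and the helper charsetFrom are hypothetical. The "*/*" Accept value is an assumption (the post shows "/"), and falling back to UTF-8 is also an assumption. Nothing here is known to resolve the 404; on a 404 the getInputStream() call still throws FileNotFoundException, as described in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareReader {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; returns the fallback when it is absent or unknown.
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(args[0]).openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible)");
        con.setRequestProperty("Accept", "*/*"); // assumed value, see the note above
        Charset cs = charsetFrom(con.getContentType(), StandardCharsets.UTF_8);
        try (BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream(), cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```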
I am confused.
I am not even sure whether I have an encoding problem (before I tried to fix things by adding settings to the connection there was no exception, "just" wrong output),
or whether something else is wrong with my connection so that I cannot read from it. If it is the latter, what is special about this particular URL, given that the page that led me to it, e.g. http://www.omegatiming.com/Competition?id=00010F0200FFFFFFFFFFFFFFFFFFFFFF&sport=AQ&year=2015, can be parsed without a problem?
[here][1]: Using Java to pull data from a webpage?
[here][2]: Trying to read from a URL(in Java) produces gibberish on certain occaisions
[here][3]: URLConnection FileNotFoundException for non-standard HTTP port sources
[here][4]: Setting "User-Agent" parameters for URLConnection for querying Google from a Java application
[here][5]: Setting user agent of a java URLConnection
[here][6]: Trying to read from a URL(in Java) produces gibberish on certain occaisions
[this][1]: http://www.whatsmyuseragent.com/
【Discussion】:
Tags: asp.net-mvc character-encoding html-parsing inputstream filenotfoundexception