Parse a URL and retrieve information: answers

【Question title】: Parse a URL and retrieve information
【Posted】: 2015-11-10 04:00:24
【Question】:

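I need to extract the category of a Google Play app. For example, Facebook is in the "Social" category.

So I need to extract "Social" from the link. With the code below I can get the page's HTML content in a string named "result", but I cannot find the tag that contains the category name. When I use Inspect Element in the browser instead of the code, I can see the category name. How can I get the complete HTML content of the URL above? The response I get in code does not contain the complete HTML. The category name is located at html, head, script, body, div, "Category Name".

When I read the complete HTML response, I only get the following markup elements: <html>, <head>, <script>. I do not get the <body> element or its contents. Why is the page body not returned?

The following code outputs the HTML response of the queried page.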

String url = "https://play.google.com/store/apps/details?id=com.kongregate.mobile.fly.google&hl=en";
InputStream inputStream = null;
String result = "";

try {

    // create HttpClient
    HttpClient httpclient = new DefaultHttpClient();

    // make GET request to the given URL
    HttpResponse httpResponse = httpclient.execute(new HttpGet(url));
    // read the entity exactly once: an earlier EntityUtils.toString(...) call
    // would consume the response stream before getContent() is used
    inputStream = httpResponse.getEntity().getContent();

    // convert InputStream to String
    if (inputStream != null) {
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
        String line = "";

        while((line = bufferedReader.readLine()) != null) {
            result += line;
        }
    }
    // ...
} catch(...) {...}

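【Question comments】:

What happens when you enter the URL in a web browser?

Tags: java

【Solution 1】:

Maybe this helps. The following code returns the whole page as a Document: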

org.jsoup.nodes.Document html = null;
try {
    html = Jsoup.connect(source).get();
} catch (final IOException e) {
    LOG.error(e.getMessage(), e);
}
LOG.info(html);

This uses Jsoup.

I did not find your "Category Name" node, but maybe you will find it again ;) You can search the document like this:

html.select("#Category Name");

more examples
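
If adding Jsoup is not an option, the same kind of lookup can be sketched with the JDK alone by running a regular expression over the HTML string. This is only a rough sketch under one loud assumption: that the category is rendered as a span carrying itemprop="genre". The thread does not confirm that markup, and the names CategoryExtractor and extractCategory are made up for illustration. It also cannot help if the <body> is missing from the response, because then there is nothing to match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or HttpClient.
class CategoryExtractor {

    // ASSUMPTION: the category appears as <span itemprop="genre">Social</span>.
    // Adjust the pattern to whatever markup the fetched page really contains.
    private static final Pattern GENRE = Pattern.compile(
            "<span[^>]*\\bitemprop=\"genre\"[^>]*>\\s*([^<]+?)\\s*</span>",
            Pattern.CASE_INSENSITIVE);

    // Returns the text of the first matching span, or empty if none is found.
    public static Optional<String> extractCategory(String html) {
        Matcher m = GENRE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-written sample standing in for the "result" string from the question.
        String sample = "<html><body><a class=\"category\">"
                + "<span itemprop=\"genre\">Social</span></a></body></html>";
        System.out.println(extractCategory(sample).orElse("(not found)")); // prints "Social"
    }
}
```

A regular expression is brittle against markup changes and does not decode HTML entities, so a real parser is usually the better choice. With Jsoup, the equivalent query for the same assumed element would be html.select("span[itemprop=genre]").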

【Comments】: