Jsoup, parse HTML table: answers

【Question title】: Jsoup, parse html table
【Posted】: 2015-08-17 12:21:01
【Question description】:

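This may be a stupid question, but I can't figure it out. I'm trying to parse the html output of this page: http://meteo.uwb.edu.pl/

So basically I need to extract the values from the table, with the left side (blue text) as the keys (headers) and the right side (brown text) as the values. I also need the title label ("Aktualna pogoda/Weather conditions:").

My intention was to get the html table from the html output and then parse its rows, but I can't figure it out because the html output is rather complex. I started with this: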

Document doc = Jsoup.connect("http://meteo.uwb.edu.pl/").get();
Elements tables = doc.select("table");
// select() on an Elements collection searches inside every matched table
for (Element row : tables.select("tr"))
{
  Elements tds = row.select("td:not([rowspan])");
  System.out.println(tds.get(0).text() + "->" + tds.get(1).text());
}

But my result is still a mess. Do you know how to parse it correctly?

【Question discussion】:

Tags: java web-scraping html-table jsoup

【Solution 1】:

The key data in the first table can be retrieved with the following code:

doc.select("table").get(1).select("tbody").get(1).select("tr").get(1).select("td").get(0).select("b")

and the value:

doc.select("table").get(1).select("tbody").get(1).select("tr").get(1).select("td").get(1).select("b")

For the second table:

doc.select("table").get(2).select("tbody").get(0).select("tr").get(1).select("td").get(0).select("b")

and

doc.select("table").get(2).select("tbody").get(0).select("tr").get(1).select("td").get(1).select("b")
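The index chains above depend on the exact nesting of the page, so they are hard to check without the live markup. As a minimal sketch of the same idea applied row by row, the example below parses an inline table with Jsoup. The HTML string, the class name `TableRowsSketch`, and the helper `parseRows` are invented for illustration and are not the real markup of meteo.uwb.edu.pl; the sketch also assumes a table without nested tables.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.LinkedHashMap;
import java.util.Map;

public class TableRowsSketch {

    // Hypothetical markup (NOT the real page): a title row followed by
    // rows with a bold key on the left and a bold value on the right.
    static final String SAMPLE_HTML =
        "<table>"
      + "<tr><td colspan=\"2\">Weather conditions:</td></tr>"
      + "<tr><td><b>Temperature</b></td><td><b>21.5 C</b></td></tr>"
      + "<tr><td><b>Humidity</b></td><td><b>60 %</b></td></tr>"
      + "</table>";

    // Collects "left cell -> right cell" pairs, keeping document order.
    static Map<String, String> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();
        for (Element row : doc.select("table tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 2) {
                continue; // skip rows without a key/value pair, e.g. the title row
            }
            result.put(cells.get(0).text(), cells.get(1).text());
        }
        return result;
    }

    public static void main(String[] args) {
        parseRows(SAMPLE_HTML).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Checking `cells.size()` before calling `get(1)` avoids an `IndexOutOfBoundsException` on rows that hold a single cell, which is one likely cause of trouble when iterating over every `tr` of a complex page.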

【Discussion】:

Thanks, your solution looks much better than mine. I'll take a look.

【Solution 2】:

I managed it like this:

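 Document doc = Jsoup.connect("http://meteo.uwb.edu.pl/").get();
 Elements cells = doc.select("td");            // every <td> on the page, in document order
 Elements headers = cells.get(2).select("b");  // bold labels inside the third cell
 Elements vals = cells.get(3).select("b");     // bold values inside the fourth cell
 Map<String, String> all = new HashMap<>();

 // pair the i-th label with the i-th value
 for (int i = 0; i < headers.size(); i++) all.put(headers.get(i).text(), vals.get(i).text());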

It seems to work.
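Pairing labels and values by index only works while the two lists line up. The sketch below isolates that pairing step with plain string lists standing in for the text of the Jsoup `Elements`, so it runs without any network access. The class name `PairHeaders`, the helper `zipToMap`, and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairHeaders {

    // Pairs headers.get(i) with values.get(i). Stops at the shorter list
    // so a missing value cannot cause an IndexOutOfBoundsException.
    static Map<String, String> zipToMap(List<String> headers, List<String> values) {
        Map<String, String> all = new LinkedHashMap<>(); // keeps insertion order
        int n = Math.min(headers.size(), values.size());
        for (int i = 0; i < n; i++) {
            all.put(headers.get(i), values.get(i));
        }
        return all;
    }

    public static void main(String[] args) {
        // Sample data invented for illustration; the third header has no value.
        List<String> headers = Arrays.asList("Temperature", "Humidity", "Wind");
        List<String> values = Arrays.asList("21.5 C", "60 %");
        zipToMap(headers, values).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A `LinkedHashMap` is used here instead of a `HashMap` so that the entries come back in the order they appear in the table; that is a design choice of this sketch, not part of the original answer.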

【Discussion】: