Jsoup - read one by one: answers

【Question title】: Jsoup - read one by one
【Posted】: 2013-03-01 17:22:46
【Question】:

I recently started using Jsoup. I need to list some elements from an HTML source. For example:

 <table class="list">
    <tr>
        <td class="year" colspan="5">2012</td>
    </tr>
    <tr>
        <td class="code">COMP0348</td>
        <td class="name">Software Engineering</td>
    </tr>
    <tr>
        <td class="code">COMP0734</td>
        <td class="name">System Information</td>
    </tr>
    <tr>
        <td class="year" colspan="5">2013</td>
    </tr>
    <tr>
        <td class="code">COMP999</td>
        <td class="name">Windows</td>
    </tr>
</table>

This is what I want:

2012 
COMP0348 Software Engineering
COMP0734 System Information
2013
COMP999 Windows

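In my code, however, the items are not listed one by one. Instead I get one string containing all the years, then another line with all the codes, then another line with all the names. Like this: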

2012 
COMP0348 COMP0734 COMP999
Software Engineering System Information Windows

How can I do this?

【Comments】:

Please show your Jsoup-related code.

Tags: java html html-parsing jsoup

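【Solution 1】:

My guess is that you are selecting the cells by class across the whole document rather than following the row structure of the table.

Take a look at this instead: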

Document doc = ...

Element table = doc.select("table.list").first(); // select the table


for( Element element : table.select("tr") ) // select all 'tr' of the table
{
    final Elements td = element.select("td.year"); // select the 'td' with 'year' class

    if( !td.isEmpty() ) // if it's the one with the 'year' class
    {
        final String year = td.first().text(); // get year

        System.out.println(year);
    }
    else // if it's another 'tr' tag containing the 'code' and 'name' element
    {
        final String code = element.select("td.code").first().text(); // get code
        final String name = element.select("td.name").first().text(); // get name

        System.out.println(code + " " + name);
    }
}

Output (using your HTML):

2012
COMP0348 Software Engineering
COMP0734 System Information
2013
COMP999 Windows
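
For anyone who wants to try this end to end, here is a minimal self-contained sketch that contrasts the two approaches. It assumes jsoup is on the classpath and inlines the table from the question as a string. The class name `CourseList` and the helpers `byClassOnly` and `rowByRow` are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class CourseList {

    // The table from the question, inlined so the sketch needs no network access.
    static final String HTML =
          "<table class=\"list\">"
        + "<tr><td class=\"year\" colspan=\"5\">2012</td></tr>"
        + "<tr><td class=\"code\">COMP0348</td><td class=\"name\">Software Engineering</td></tr>"
        + "<tr><td class=\"code\">COMP0734</td><td class=\"name\">System Information</td></tr>"
        + "<tr><td class=\"year\" colspan=\"5\">2013</td></tr>"
        + "<tr><td class=\"code\">COMP999</td><td class=\"name\">Windows</td></tr>"
        + "</table>";

    // Hypothetical helper: selects by class over the whole document.
    // Elements.text() joins the text of every match with a space,
    // which is what produces the grouped output described in the question.
    static String byClassOnly(Document doc, String cssClass) {
        return doc.select("td." + cssClass).text();
    }

    // Hypothetical helper: walks the table row by row, one output line per row.
    static List<String> rowByRow(Document doc) {
        List<String> lines = new ArrayList<>();
        Element table = doc.select("table.list").first();
        if (table == null) {
            return lines; // no matching table, nothing to list
        }
        for (Element row : table.select("tr")) {
            Elements year = row.select("td.year");
            if (!year.isEmpty()) {
                // Row holding a year heading
                lines.add(year.first().text());
            } else {
                // Row holding a code cell and a name cell
                String code = row.select("td.code").text();
                String name = row.select("td.name").text();
                lines.add(code + " " + name);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Grouped by class: prints "COMP0348 COMP0734 COMP999"
        System.out.println(byClassOnly(doc, "code"));

        // Grouped by row: prints the five lines shown in the output above
        for (String line : rowByRow(doc)) {
            System.out.println(line);
        }
    }
}
```

The difference is only in where the selector is applied: `doc.select(...)` collects matching cells from every row at once, while `row.select(...)` keeps each lookup inside a single `tr`, so the original ordering of the table is preserved.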

【Comments】: