【问题标题】:Problems using jsoup to parse HTML Table使用 jsoup 解析 HTML 表的问题
【发布时间】:2015-07-27 09:27:28
【问题描述】:

几天前,我问了一个用 JSOUP 解析 HTML 表格的问题。 @luksch 帮助了我,我可以解决我的问题。我的问题是如何解析具有许多 TR 和 TD 的 HTML 文件的一部分以选择其中的特定文本(组块表)。

HTML 代码:

<TABLE SUMMARY="Topline" WIDTH="100%">
<TR><TD HEIGHT=16>&nbsp;</TD></TR>  <!-- For the menu bar -->
<TR>
<TD VALIGN=MIDDLE ALIGN=LEFT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Xymon</B></FONT
</TD>
<TD VALIGN=MIDDLE ALIGN=CENTER WIDTH="40%">
<CENTER><FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Current Status</B></FONT></CENTER>
</TD>
<TD VALIGN=MIDDLE ALIGN=RIGHT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Thu Jul 23 16:05:06 2015</B></FONT>
</TD>
</TR>
<TR>
<TD COLSPAN=3> <HR WIDTH="100%"> </TD>
</TR>
</TABLE>
<BR>
<A NAME=hosts-blk>&nbsp;</A>

<CENTER><TABLE SUMMARY="Group Block" BORDER=0 CELLPADDING=2>
<TR><TD VALIGN=MIDDLE ROWSPAN=2><CENTER><FONT COLOR="#FFFFF0" SIZE="+1">&nbsp;</FONT></CENTER></TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45> 
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbd"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbgen"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbgen</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbtest"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbtest</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?conn"><FONT COLOR="#87a9e5" SIZE="-1"><B>conn</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?cpu"><FONT COLOR="#87a9e5" SIZE="-1"><B>cpu</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?disk"><FONT COLOR="#87a9e5" SIZE="-1"><B>disk</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?files"><FONT COLOR="#87a9e5" SIZE="-1"><B>files</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?hobbitd"><FONT COLOR="#87a9e5" SIZE="-1"><B>hobbitd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?http"><FONT COLOR="#87a9e5" SIZE="-1"><B>http</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?info"><FONT COLOR="#87a9e5" SIZE="-1"><B>info</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?memory"><FONT COLOR="#87a9e5" SIZE="-1"><B>memory</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?msgs"><FONT COLOR="#87a9e5" SIZE="-1"><B>msgs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?ports"><FONT COLOR="#87a9e5" SIZE="-1"><B>ports</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?procs"><FONT COLOR="#87a9e5" SIZE="-1"><B>procs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?trends"><FONT COLOR="#87a9e5" SIZE="-1"><B>trends</B></FONT></A> </TD>
</TR> 

<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>
<TR class=line>
<TD NOWRAP><A NAME="hostname1">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.1">hostname1</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1.&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbd:green:268d04h25m" TITLE="bbd:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbgen"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbgen:green:268d04h24m" TITLE="bbgen:green:268d04h24m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbtest"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbtest:green:268d04h25m" TITLE="bbtest:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:268d04h25m" TITLE="conn:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=cpu"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="cpu:green:169d00h15m" TITLE="cpu:green:169d00h15m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=disk"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="disk:green:268d04h25m" TITLE="disk:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=files"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="files:clear:268d04h25m" TITLE="files:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=hobbitd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="hobbitd:green:169d01h05m" TITLE="hobbitd:green:169d01h05m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.1" TITLE="info:green:127.0.0.1" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=memory"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="memory:green:268d04h25m" TITLE="memory:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=msgs"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="msgs:green:268d04h20m" TITLE="msgs:green:268d04h20m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=ports"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="ports:clear:268d04h25m" TITLE="ports:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=procs"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="procs:clear:268d04h25m" TITLE="procs:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

<TR class=line>
<TD NOWRAP><A NAME="hostname2">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.2">hostname2</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/red.gif" ALT="bbd:red:16d06h46m" TITLE="bbd:red:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:16d06h46m" TITLE="conn:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:16d06h46m" TITLE="http:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.2" TITLE="info:green:127.0.0.2" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

</TABLE></CENTER><BR>
<BR><BR>

第一部分(带有 bbd、bbdgen、bbtest 等的表组块)我修复了:

ArrayList<String> groupBlock = new ArrayList<String>();
Object[] objPlace;
Element table = document.select("TABLE").get(1); //select the second table:     "Group Block"
Elements rows = table.select("TR");             
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("TD");
       for (Element col : cols){
           switch(col.text()){
           case "bbd": 
           case "bbgen":
           case "bbtest":
           //...more cases
               groupBlock.add(col.text());
               break;
           default:
               break;
           }
       }      
}
objPlace = groupBlock.toArray();

现在我必须解析两个主机名(主机名 1 和主机名 2)以放入单独的 TextView,但问题是主机名将来可以更改其名称。另外,我还要解析每个TD中的“IMG SRC”,例如:

<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>

我只需要解析 IMG SRC /hobbit/gifs/static/green.gif 必须在开头附加 URL 的其余部分:http://example.com/hobbit/gifs/static/green.gif 以获取图像和将其放在 XML 布局中的另一个字段中。我必须对 HTML 文件中的所有 IMG SRC TD 执行此操作。

我知道,一旦我获得了图像,我必须执行以下操作:

InputStream input = new java.net.URL(imgSrc).openStream();
bitmap = BitmapFactory.decodeStream(input);
ImageView logoimg = (ImageView) findViewById(R.id.logo);
logoimg.setImageBitmap(bitmap);

imgSrc应该是所有IMG SRC的数组

我不知道如何从前面的步骤开始,我是 Jsoup 和 Android 的新手。

【问题讨论】:

  • @luksch 你能帮我解决这个问题吗?我完全迷路了:(
  • 你能告诉我你在 JSoup 中加载原始 URL 的那一行吗?
  • 我正在加载网址如下:String username = "user";String password = "pass";String login = username + ":" + password;String base64login = new String(android.util.Base64.encode(login.getBytes(), android.util.Base64.NO_WRAP));Document document = Jsoup.connect("http://example.com").header("Authorization", "Basic " + base64login).get();Document document = Jsoup.connect(url).header("Authorization", "Basic " + base64login).get();
  • 对不起,最后一条评论有错误:String url = "http://example.com";Jsoup.connect(url).header("Authorization", "Basic " + base64login).get();

标签: android html parsing html-table jsoup


【解决方案1】:

您可以使用主机名元素查询&lt;td&gt;。然后转到父级,即&lt;tr&gt;。从那时起再次得到所有孩子&lt;td&gt;。这些将是包含您想要获取的链接的条目。这方面的一些东西:

Document document = Jsoup.parse(html);
Element table = document.select("TABLE").get(1); 
Elements asWithName = table.select("tr>td a[name]");
for (Element aWithName : asWithName){
    String name = aWithName.attr("name");
    System.out.println("hostname="+name);
    Element tr = aWithName.parent().parent();
    for (Element td : tr.select("td")){
        Element img = td.select("img").first();
        if (img == null){
            continue;
        }
        String imgRelPath = img.attr("src");
        System.out.println("  imgRelPath="+imgRelPath);
    }
}

【讨论】:

  • 谢谢@luksch,我会试试看,我会告诉你一些事情
  • 有效!我必须测试一些东西,但我明白了。非常感谢!
  • 现在我在数组中的所有 url 都在做:images.add("http://hostname.com"+imgRelPath); 你知道我怎样才能获取图像,然后将它们全部设置在 imageView 或类似的东西中吗?谢谢
  • 如果遇到困难,请提出一个新问题。
猜你喜欢
  • 1970-01-01
  • 2015-06-06
  • 1970-01-01
  • 1970-01-01
  • 2014-01-01
  • 2019-10-10
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多