[Posted]: 2015-03-31 21:14:31
[Question]:
I am trying to parse a table that looks like this:
<table><tbody>
<tr><th><a href=""></a></th><th></th></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a"> </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="ttt"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a"> </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="eee"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a"> </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="rtr"></table></td></tr>
<tr><th><a href=""></a></th><th></th></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a"> </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="ouu"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a"> </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="teee"></table></td></tr>
</tbody></table>
I am using this code in ASP.NET to get the cells I want from each row:
var getHtmlWeb = new HtmlWeb();
var document = getHtmlWeb.Load(txtbox.Text);
// get tables
foreach (HtmlNode table in document.DocumentNode.SelectNodes("//table"))
{
    // get each table row
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        Outputlabel.Text += "row: <br />";
        // get table head tags that have a link, get the inner text
        if ((row.SelectSingleNode("//th//a").InnerText) != null)
        {
            Outputlabel.Text += row.SelectSingleNode("//th//a").InnerText + "<br />";
        }
        // get the cells with the classes I want
        string d = row.SelectSingleNode("//td[@class='d']").InnerText;
        Outputlabel.Text += row.SelectSingleNode("//td[@class='d']").InnerText + " ";
        string h = row.SelectSingleNode("//td[@class='h']").InnerText;
        Outputlabel.Text += row.SelectSingleNode("//td[@class='h']").InnerText + " ";
        string a = row.SelectSingleNode("//td[@class='a']").InnerText;
        Outputlabel.Text += row.SelectSingleNode("//td[@class='a']").InnerText + " ";
        string op = "";
        // there are 3 cells in each row that have class="o"
        if (row.SelectNodes("//td[@class='o']") != null)
        {
            foreach (HtmlNode o in row.SelectNodes("//td[@class='o']"))
            {
                op += o.InnerText;
            }
            Outputlabel.Text += op + " ";
        }
        // get the title of the nested table with class="p"
        var probability = row.SelectSingleNode("//td//table[@class='p']");
        string pr = probability.Attributes["title"].Value;
        Outputlabel.Text += pr + "<br />";
    }
}
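I only get the first row of the first table, repeated many times, and I get neither the class "o" cells nor the title of the class "p" table inside the class "p" cell.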
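One likely cause of the repeated first row: in XPath, an expression that starts with `//` is evaluated from the document root, even when it is called on a row node, so every `row.SelectSingleNode("//td[...]")` returns the first match in the whole document. The sketch below is a minimal illustration of row-relative queries (`./` and `.//`), not a definitive fix. The sample markup, the cell texts, the class name `RelativeXPathSketch`, and the helper `TextOrEmpty` are all hypothetical, and it assumes the Html Agility Pack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

class RelativeXPathSketch
{
    // Hypothetical helper: InnerText of a node, or "" when the node was not found.
    static string TextOrEmpty(HtmlNode node)
    {
        return node == null ? string.Empty : node.InnerText;
    }

    static void Main()
    {
        // Hypothetical sample markup, shaped like the table in the question.
        const string html = @"<table>
<tr><th><a href=''>Group 1</a></th><th></th></tr>
<tr><td class='d'>d1</td><td class='h'>h1</td><td class='o'>1</td><td class='o'>2</td><td class='p'><table class='p' title='ttt'></table></td></tr>
<tr><td class='d'>d2</td><td class='h'>h2</td><td class='o'>3</td><td class='o'>4</td><td class='p'><table class='p' title='eee'></table></td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//tr" is intentionally absolute here: it collects every row in the document.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return;
        }

        foreach (HtmlNode row in rows)
        {
            // ".//" searches below the current row only.
            HtmlNode link = row.SelectSingleNode(".//th//a");
            if (link != null)
            {
                Console.WriteLine("header: " + link.InnerText);
                continue;
            }

            // "./td" selects direct child cells of the current row.
            string d = TextOrEmpty(row.SelectSingleNode("./td[@class='d']"));
            string h = TextOrEmpty(row.SelectSingleNode("./td[@class='h']"));

            // Concatenate the text of all class "o" cells in this row.
            string op = string.Empty;
            HtmlNodeCollection oCells = row.SelectNodes("./td[@class='o']");
            if (oCells != null)
            {
                foreach (HtmlNode o in oCells)
                {
                    op += o.InnerText;
                }
            }

            // Read the title attribute of the nested table, if this row has one.
            HtmlNode nested = row.SelectSingleNode("./td[@class='p']/table[@class='p']");
            string title = nested == null ? string.Empty : nested.GetAttributeValue("title", string.Empty);

            Console.WriteLine(d + " " + h + " " + op + " " + title);
        }
    }
}
```

Note that `SelectSingleNode` and `SelectNodes` return null when nothing matches, so checking the returned node (rather than its `InnerText`) avoids a `NullReferenceException` on rows that lack a given cell. If the real page wraps its rows in `<tbody>`, a child query such as `table.SelectNodes("tr")` would also find nothing, which is why the sketch selects rows with `//tr` instead.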
[Comments]:
- Check my answer and let me know if it helps.
Tags: c# html asp.net html-agility-pack