【发布时间】:2016-09-07 15:18:45
【问题描述】:
我按照下面链接中的示例,成功地将 HTML 表解析为数据表。
http://blog.ditran.net/parsing-html-table-to-c-usable-datalist/
但是我无法解析多个表,当我遍历 TR 时,第一个 TR 总是有列名,其余的都有每个表中的数据。所以我使用这个逻辑并将表数据存储在字典中发送到我的 ToDataTable 函数。
有人可以帮助我如何循环多个表并实现相同的逻辑。感谢它。
var tRowList = doc.DocumentNode.SelectNodes("//tr");
foreach (HtmlNode tRow in tRowList)
{
if (previousRowSpanList.Count > 0)
{
theDict = previousRowSpanList[0];
previousRowSpanList.Remove(theDict); //remove it off the list
isWorkingWithRowSpan = true;
}
else
{
theDict = new List<KeyValuePair<string, string>>();
isWorkingWithRowSpan = false;
}
var tCellList = tRow.SelectNodes("td|th");
tCelCount = tCellList.Count;
if (tCelCount > 0 &&
!(tCelCount == 1 && string.IsNullOrEmpty(tCellList[0].InnerText.Trim()))
)
{
//colOrder = 1;
IsNullEntireRow = true;
for (int colIndex = 0; colIndex < tCelCount; colIndex++)
{
cell = tCellList[colIndex];
ColInnerText = cell.InnerText.Replace(" ", " ").Trim();
if (!string.IsNullOrEmpty(ColInnerText))
IsNullEntireRow = false;
//
static DataTable ToDataTable(List<List<KeyValuePair<string, string>>> list)
{
DataTable result = new DataTable();
if (list.Count == 0)
return result;
result.Columns.AddRange(
list.First().Select(r => new DataColumn(r.Value)).ToArray()
);
list= list.Skip(1).ToArray().ToList();
list.ForEach(r => result.Rows.Add(r.Select(c => c.Value).Cast<object>().ToArray()));
return result;
示例 HTML:
<table>
<tbody>
<tr><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Node</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Logtime</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Hardware</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Prcstate A</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Prcstate B</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">Cluster</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">RAID</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">AD replication A</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">AD replication B</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">File replication A</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">File replication B</td><td style="background-color:#A9F5A9;font-weight:bold;" class="center">hcstart RESULT</td></tr>
<tr><td class="center">DTMSCB1</td><td class="center">2016-08-26 16:40</td><td class="center">APG43L</td><td class="center">active</td><td class="center">passive</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td class="center">-</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">Not OK</td></tr>
<tr><td class="center">MSC9</td><td class="center">2016-08-26 16:40</td><td class="center">APG40C/4</td><td class="center">passive</td><td class="center">active</td><td class="center">OK</td><td class="center">OK</td><td class="center">OK</td><td class="center">OK</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">Not OK</td><td class="center">OK</td><td class="center">-</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><td style="background-color:#A9F5A9;" class="center">Node Type</td><td style="background-color:#A9F5A9;" class="center">Node</td><td style="background-color:#A9F5A9;" class="center">Log Time</td><td style="background-color:#A9F5A9;" class="center">New Mon. Alarms</td><td style="background-color:#A9F5A9;" class="center">Mon. Alarms Total</td><td style="background-color:#A9F5A9;" class="center">Other Alarms</td><td style="background-color:#A9F5A9;" class="center">MML</td></tr>
<tr><td class="center">BSC</td><td class="center">BMBSC1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">46</td><td class="center">445</td><td class="center">OK</td></tr>
<tr><td class="center">BSC</td><td class="center">BMBSC2C</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">27</td><td class="center">609</td><td class="center">OK</td></tr>
<tr><td class="center">BSC</td><td class="center">CYBSC1</td><td class="center">2016-08-26 16:45</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">1</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">45</td><td class="center">665</td><td class="center">OK</td></tr>
<tr><td class="center">BSC</td><td class="center">CYBSC2C</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">30</td><td class="center">849</td><td class="center">OK</td></tr>
<tr><td class="center">MSC-BC</td><td class="center">CYMSCB1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">38</td><td class="center">283</td><td class="center">OK</td></tr>
<tr><td class="center">BSC</td><td class="center">DTBSC1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">48</td><td class="center">201</td><td class="center">OK</td></tr>
<tr><td class="center">BSC</td><td class="center">DTBSC2</td><td class="center">2016-08-26 16:45</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">1</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">31</td><td class="center">310</td><td class="center">OK</td></tr>
<tr><td class="center">MSC-BC</td><td class="center">DTMSCB1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">25</td><td class="center">130</td><td class="center">OK</td></tr>
<tr><td class="center">HLR</td><td class="center">HLR1</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">16</td><td class="center">12</td><td class="center">OK</td></tr>
<tr><td class="center">HLR</td><td class="center">HLR2</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">24</td><td class="center">10</td><td class="center">OK</td></tr>
<tr><td class="center">MSC-S</td><td class="center">MSC10</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">48</td><td class="center">79</td><td class="center">OK</td></tr>
<tr><td class="center">MSC-S</td><td class="center">MSC9</td><td class="center">2016-08-26 16:45</td><td class="center">0</td><td style="background-color:#FF0000;color:#FFFFFF;font-weight:bold;" class="center">46</td><td class="center">131</td><td class="center">OK</td></tr>
</tbody>
</table>
【问题讨论】:
-
先找到表节点
var tableList = doc.DocumentNode.SelectNodes("//table");,遍历这个列表,找到对应的表行 -
如何只得到对应的tr和td..。下面的代码给出了 null.. var tCellList = tRow.SelectNodes("tr|td");
-
我想遍历一个表 tr 并获取字典对象,然后遍历另一个..
-
见下面提供的例子
标签: c# html xpath html-agility-pack