【Question title】: How to parse an HTML document using C# [duplicate]
【Posted】: 2017-04-30 20:19:04
【Question】:

I need to parse the document shown below. I am trying HtmlAgilityPack, but I find it very complicated. I need the inner text of this tag: <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td> and of its children:

<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">

<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <style>
        .table1 {
            width: 100%;
            margin: 0px;
            padding: 0px;
            border-collapse: collapse;
            padding: 0px;
        }

        .div1 {
            cursor: pointer;
            margin: 1px;
            border: 1px solid #999999;
            float: left;
            font-size: 12px;
        }

        .td1 {
            text-align: center;
            font-size: 20px;
            font-weight: bold;
            color: #33460E;
            height: 20px;
            padding: 0px;
        }

        .td2 {
            text-align: center;
            font-weight: bold;
            color: #808000;
            padding: 0px;
        }
    </style>
</head>
<body style="background: #FFFFCC;margin: 0px;padding: 0px;font-size: 12px;">
    <p></p>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">11.25</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">6.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.18</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Ilk Yari Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;8.50;1;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;3.05;0;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.05</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;1.50;2;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">İkinci Yarı Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;8.50;1;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;3.70;0;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.70</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;1.40;2;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.40</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <br />
    <br />
    <br />
</body>
</html>

【Comments】:

Tags: c# html html-parsing html-agility-pack


【Solution 1】:

You can use something like this:

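// Requires: using System.Linq; using HtmlAgilityPack;
var document = new HtmlDocument();
document.LoadHtml(text);
// Descendants is defined on HtmlNode, so start from DocumentNode.
var tables = document.DocumentNode.Descendants("table").ToList();
foreach (var table in tables)
{
    // The leading "." keeps each XPath query relative to the current table.
    var td = table.SelectSingleNode(".//td[@style='background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;']");
    if (td == null) continue; // nested tables (class table1) have no header cell
    ...
    // SelectNodes returns null when nothing matches, so check before use.
    var divs = table.SelectNodes(".//div[@class='div1']");
    ...
}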

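【Comments】:

  • Thanks for your attention. However, I can't get "Descendants" to resolve. I added the "System.Xml.Linq" namespace, but that did not help.
  • This method comes from the HtmlNode class, which lives in the HtmlAgilityPack namespace.
  • Sorry, it doesn't work. Visual Studio reports a missing reference. The HtmlAgilityPack namespace is already included.
  • I am using the latest version of HtmlAgilityPack. s29.postimg.org/ixuqb5gd3/Descedants.png
  • I have v1.4.9.5 and runtime v4.0.30319. What is your version? I installed it through the NuGet package manager.
【Solution 2】: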

This is how I did it, but it is a long way round. If there is a shorter or better approach, please post it.

HtmlWeb h = new HtmlWeb();
HtmlDocument doc = h.Load(Server.MapPath("xml/htmlpage.html"));
HtmlNodeCollection n = doc.DocumentNode.SelectNodes("//html/body/table");

string item;
string[] items;
string oran, oranadi;   // odds value, market name
int oran_id, secim;     // odds id, selection
for (int i = 1; i < n.Count + 1; i++)
{
    // Header cell and option divs, queried relative to the current table.
    HtmlNode ns = n[i - 1].SelectSingleNode("./tr[1]/td");
    HtmlNodeCollection nc = n[i - 1].SelectNodes("./tr[2]/td[1]/div");
    Response.Write(string.Format("{0} --> {1}<br/>", i, ns.InnerHtml));
    for (int j = 1; j < nc.Count + 1; j++)
    {
        // Read onclick by name rather than by attribute position.
        item = nc[j - 1].Attributes["onclick"].Value;
        items = item.Split(';');
        oran_id = Convert.ToInt32(items[7].Replace("')", ""));
        oranadi = items[6];
        secim = Convert.ToInt32(items[5]);
        oran = items[4];

        Response.Write(string.Format("{0} --> {1} - {2} - {3} - {4} <br/>", j, secim, oran_id, oranadi, oran));
    }
}
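
The semicolon split in this answer can be isolated into a small helper, which might make the loop easier to read and test. The following is only a sketch using the standard library: the names SlipEntry, SlipParser and Parse are hypothetical, and the field meanings (odds at index 4, selection at 5, market name at 6, id at 7) are inferred from the sample onclick values in the question, not from any documented format. It parses the numbers with CultureInfo.InvariantCulture, on the assumption that the code may run under a culture such as tr-TR where the decimal separator is a comma.

```csharp
using System;
using System.Globalization;

// Hypothetical holder for the fields the answer above extracts.
public sealed class SlipEntry
{
    public int OddsId { get; set; }
    public string MarketName { get; set; }
    public int Selection { get; set; }
    public decimal Odds { get; set; }
}

public static class SlipParser
{
    // Parses an onclick value such as
    // parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')
    public static SlipEntry Parse(string onclick)
    {
        // The payload is assumed to be the last single-quoted argument.
        int end = onclick.LastIndexOf('\'');
        int start = end > 0 ? onclick.LastIndexOf('\'', end - 1) : -1;
        if (start < 0) throw new FormatException("No quoted payload found.");

        string[] items = onclick.Substring(start + 1, end - start - 1).Split(';');
        if (items.Length < 8) throw new FormatException("Expected 8 fields.");

        return new SlipEntry
        {
            // Invariant culture: the page writes odds with a '.' separator.
            Odds = decimal.Parse(items[4], CultureInfo.InvariantCulture),
            Selection = int.Parse(items[5], CultureInfo.InvariantCulture),
            MarketName = items[6],
            OddsId = int.Parse(items[7], CultureInfo.InvariantCulture)
        };
    }
}

public static class Program
{
    public static void Main()
    {
        SlipEntry e = SlipParser.Parse(
            "parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')");
        Console.WriteLine("{0} - {1} - {2} - {3}", e.Selection, e.OddsId, e.MarketName,
            e.Odds.ToString(CultureInfo.InvariantCulture));
    }
}
```

Inside the inner loop, the value of the onclick attribute could then be passed to SlipParser.Parse instead of being split inline. Extracting the quoted payload first also removes the need for the Replace("')", "") cleanup on the last field.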

【Comments】:
