Question title: WebRequest multiple pages and load into StreamReader
Posted: 2011-10-11 10:32:37
Question:

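Using ASP.NET 4.0, I want to visit several pages, copy all of their HTML, and paste the combined result into a text box. From there I want to run my parsing function on it. What is the best way to handle this?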

    // Requires: using System; using System.IO; using System.Net;
    protected void goButton_Click(object sender, EventArgs e)
    {
        if (datacenterCombo.Text == "BL2")
        {
            fwURL = "http://website1.com/index.html";
            l2URL = "http://website2.com/index.html";
            lbURL = "http://website3.com/index.html";
            l3URL = "http://website4.com/index.html";
            coreURL = "http://website5.com/index.html";

            // Download the first page; each request/reader needs its own variable name
            WebRequest fwRequest = WebRequest.Create(fwURL);
            fwRequest.Credentials = CredentialCache.DefaultCredentials;
            using (WebResponse fwResponse = fwRequest.GetResponse())
            using (StreamReader fwReader = new StreamReader(fwResponse.GetResponseStream()))
            {
                originalBox.Text = fwReader.ReadToEnd();
            }

            // Download the second page and append its HTML to the same text box
            WebRequest layer2Request = WebRequest.Create(l2URL);
            layer2Request.Credentials = CredentialCache.DefaultCredentials;
            using (WebResponse layer2Response = layer2Request.GetResponse())
            using (StreamReader layer2Reader = new StreamReader(layer2Response.GetResponseStream()))
            {
                originalBox.Text += layer2Reader.ReadToEnd();
            }

            // Split the HTML into entries on the "<BR>&nbsp;" separator
            String[] crString = { "<BR>&nbsp;" };
            String[] aLines = originalBox.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
            String noHtml = String.Empty;

            // Keep only the entries that mention the IP address, with tags stripped
            for (int x = 0; x < aLines.Length; x++)
            {
                if (aLines[x].Contains(ipaddressBox.Text))
                {
                    noHtml += RemoveHTML(aLines[x]) + Environment.NewLine;
                }
            }

            // Print results to textbox
            resultsBox.Text = noHtml;
        }
    }

    // Replaces &nbsp; and <br> with whitespace, then strips every remaining tag
    public static string RemoveHTML(string text)
    {
        text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
        var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
        return oRegEx.Replace(text, string.Empty);
    }
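The split / filter / strip steps in the handler don't depend on the web request or the text boxes, so they can be pulled into a plain function and checked against a sample string. A minimal sketch: `PageParser` and `FindLines` are hypothetical names, and it assumes the pages separate entries with `<BR>&nbsp;` exactly as the split above does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the parsing steps from goButton_Click, without any UI controls.
public static class PageParser
{
    // Matches anything that looks like an HTML tag, as RemoveHTML does.
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    public static string RemoveHtml(string text)
    {
        text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
        return TagPattern.Replace(text, string.Empty);
    }

    // Splits the raw HTML on the assumed "<BR>&nbsp;" separator, keeps the chunks
    // that contain ipAddress, and returns each one with its tags stripped.
    public static List<string> FindLines(string html, string ipAddress)
    {
        string[] separator = { "<BR>&nbsp;" };
        string[] chunks = html.Split(separator, StringSplitOptions.RemoveEmptyEntries);
        var matches = new List<string>();
        foreach (string chunk in chunks)
        {
            if (chunk.Contains(ipAddress))
            {
                matches.Add(RemoveHtml(chunk));
            }
        }
        return matches;
    }
}
```

In the click handler this would become `resultsBox.Text = String.Join(Environment.NewLine, PageParser.FindLines(originalBox.Text, ipaddressBox.Text));`. Note that a plain `Contains` check also matches longer addresses (searching for `10.0.0.1` also hits `10.0.0.10`).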

Tags: c# asp.net html string parsing


    Answer 1:

    Instead of doing all of this by hand, you should use HtmlAgilityPack. Then you can do something like this:

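    // Requires the HtmlAgilityPack package, plus: using HtmlAgilityPack; using System.Linq;
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load("http://google.com");
    
    // Leaf nodes (no children) whose text contains the IP address you are looking for
    var targetNodes = doc.DocumentNode
                         .Descendants()
                         .Where(x=> x.ChildNodes.Count == 0 
                                &&  x.InnerText.Contains(someIpAddress));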
    
    foreach (var node in targetNodes)
    {
        //do something
    }
    

    If HtmlAgilityPack doesn't suit you, at least simplify the download part of your code by using WebClient:

    using (WebClient wc = new WebClient())
    {
        string html = wc.DownloadString("http://google.com");
    }
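Applied to the question's five URLs, the WebClient route could look roughly like this. It is a sketch under assumptions, not tested against the real sites: `MultiPageDownloader` and its methods are hypothetical names, and the download function is passed in so the concatenation logic can be tried without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: fetch several pages and join their HTML into one string.
public static class MultiPageDownloader
{
    // Calls download for each URL in order and appends each result plus a line break.
    public static string DownloadAll(IEnumerable<string> urls, Func<string, string> download)
    {
        var html = new StringBuilder();
        foreach (string url in urls)
        {
            html.AppendLine(download(url));
        }
        return html.ToString();
    }

    // Real implementation: one WebClient that sends the current Windows credentials,
    // similar to setting CredentialCache.DefaultCredentials on each WebRequest.
    public static string DownloadWithWebClient(IEnumerable<string> urls)
    {
        using (var wc = new WebClient())
        {
            wc.UseDefaultCredentials = true;
            return DownloadAll(urls, wc.DownloadString);
        }
    }
}
```

For example, `originalBox.Text = MultiPageDownloader.DownloadWithWebClient(new[] { fwURL, l2URL, lbURL, l3URL, coreURL });`. Because the pages are only joined with a line break, the last `<BR>&nbsp;` entry of one page and the first entry of the next may land in the same chunk after splitting; parsing each page separately avoids that.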
    

    Comments:

    • I don't think WebClient lets you add request headers or read the response headers.
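For reference, `WebClient` does expose both: request headers can be set through its `Headers` collection before a call, and the server's headers are available in `ResponseHeaders` after the call completes. A minimal sketch; the class name, the user-agent string and the `X-Example` header are placeholders.

```csharp
using System;
using System.Net;

// Hypothetical example class showing WebClient request and response headers.
public static class HeaderExample
{
    // Creates a client with a User-Agent and one custom (placeholder) request header.
    public static WebClient CreateClient(string userAgent)
    {
        var wc = new WebClient();
        wc.Headers[HttpRequestHeader.UserAgent] = userAgent;
        wc.Headers.Add("X-Example", "demo");
        return wc;
    }

    // Downloads a page, then reads the Content-Type header the server sent back.
    public static string GetContentType(string url)
    {
        using (WebClient wc = CreateClient("MyScraper/1.0"))
        {
            wc.DownloadString(url);
            return wc.ResponseHeaders[HttpResponseHeader.ContentType];
        }
    }
}
```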