[Question title]: How to time out a request using Html Agility Pack
[Posted]: 2011-09-28 06:46:04
[Question]:

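I'm making a request to a remote web server that is currently offline (on purpose).

I'd like to find the best way to time out the request. Basically, if the request runs longer than "X" milliseconds, abandon it and return a null response.

Currently the web request just sits there waiting for a response...

How would I best approach this?

Here's the current code snippet: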

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        if (HomePageUrl.RemoteFileExists())
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            { 
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
                //todo: look into whether this else statement is necessary
            else 
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

[Comments]:

    Tags: c# .net asp.net-mvc-3 timeout html-agility-pack


    [Solution 1]:

    Retrieve the web page at your URL with this method:

    // requires: using System.IO; using System.Net; using System.Text;
    private static string retrieveData(string url)
        {
            // used to build entire input
            StringBuilder sb = new StringBuilder();

            // used on each read operation
            byte[] buf = new byte[8192];

            // prepare the web page we will be asking for
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            // Timeout is in milliseconds; 10 ms is very short, so raise it for real use
            request.Timeout = 10;

            // execute the request; GetResponse throws a WebException if the timeout elapses
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            // we will read data via the response stream
            using (Stream resStream = response.GetResponseStream())
            {
                int count;
                do
                {
                    // fill the buffer with data
                    count = resStream.Read(buf, 0, buf.Length);

                    // make sure we read some data
                    if (count != 0)
                    {
                        // translate from bytes to ASCII text and keep building the string
                        sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                    }
                }
                while (count > 0); // any more data to read?
            }

            return sb.ToString();
        }
    

    Then use the HTML Agility Pack to retrieve the HTML markup like this:

    public static string htmlRetrieveInfo()
        {
            string htmlSource = retrieveData("http://example.com/test.html");
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(htmlSource);
            // SelectSingleNode returns null when nothing matches the XPath
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
            // return null instead of dereferencing a missing node
            return node != null ? node.InnerHtml : null;
        }
    

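    [Comments]:

    • +1 Thanks for the reply, it put me on the right track. Rather than reading the HTML through HttpWebRequest, I simply added a timeout to RemoteFileExists - see my answer
    • @reggie: Note that a production version of this code should use using to dispose of IDisposable objects and the like.
    [Solution 2]:

    Html Agility Pack is open source, so you can modify the source code yourself. First add this code to the HtmlWeb class: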

    private int _timeout = 20000;
    
    public int Timeout 
        { 
            get { return _timeout; } 
            set
            {
                // validate the incoming value, not the current field
                if (value < 1) 
                    throw new ArgumentException("Timeout must be greater than zero.");
                _timeout = value;
            }
        }
    

    Then find this method:

    private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)
    

    and modify it:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; //add this
    

    Or something like this:

    htmlWeb.PreRequest = request =>
                {
                    request.Timeout = 15000;
                    return true;
                };
    

    [Comments]:

      [Solution 3]:

      I had to make a small tweak to the code I originally posted:

          public JsonpResult About(string HomePageUrl)
          {
              Models.Pocos.About about = null;
              // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
              if (HomePageUrl.RemoteFileExists(1000))
              {
                  // Using the Html Agility Pack, we want to extract only the
                  // appropriate data from the remote page.
                  HtmlWeb hw = new HtmlWeb();
                  HtmlDocument doc = hw.Load(HomePageUrl);
                  HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
      
                  if (node != null)
                  { 
                      about = new Models.Pocos.About { html = node.InnerHtml };
                  }
                      //todo: look into whether this else statement is necessary
                  else 
                  {
                      about = null;
                  }
              }
      
              return this.Jsonp(about);
          }
      

      Then I modified my RemoteFileExists extension method to set a timeout:

          public static bool RemoteFileExists(this string url, int timeout)
          {
              try
              {
                  //Creating the HttpWebRequest
                  HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
      
                  // ************ ADDED HERE
                  // timeout the request after x milliseconds
                  request.Timeout = timeout;
                  // ************
      
                  //Set the request method to HEAD; GET would also work.
                  request.Method = "HEAD";
                  //Get the web response and dispose of it once the status has been read.
                  using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
                  {
                      //Returns TRUE if the Status code == 200
                      return (response.StatusCode == HttpStatusCode.OK);
                  }
              }
              catch
              {
                  //Any exception, including a timeout, returns false.
                  return false;
              }
          }
      

      With this approach, if my timeout fires before RemoteFileExists can determine the header response, my bool returns false.
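Because the bare catch above folds every failure into false, a timeout cannot be told apart from, say, a 404. The following is a minimal sketch of one way to keep that distinction, assuming it is useful to the caller. The names RemoteProbe, ProbeResult, Classify and Probe are hypothetical and not part of the original answer; only HttpWebRequest, WebException and WebExceptionStatus come from the .NET framework.

```csharp
using System;
using System.Net;

public static class RemoteProbe
{
    // Hypothetical result type: separates "too slow" from other failures.
    public enum ProbeResult { Ok, NotOk, TimedOut, Failed }

    // Maps a WebException status onto the result type above.
    public static ProbeResult Classify(WebExceptionStatus status)
    {
        switch (status)
        {
            case WebExceptionStatus.Timeout:
                return ProbeResult.TimedOut;   // request.Timeout elapsed
            case WebExceptionStatus.ProtocolError:
                return ProbeResult.NotOk;      // server answered with an error status such as 404 or 500
            default:
                return ProbeResult.Failed;     // DNS failure, refused connection, etc.
        }
    }

    // Sends a HEAD request and reports what happened within timeoutMs milliseconds.
    public static ProbeResult Probe(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD";
            request.Timeout = timeoutMs;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK
                    ? ProbeResult.Ok
                    : ProbeResult.NotOk;
            }
        }
        catch (WebException ex)
        {
            return Classify(ex.Status);
        }
    }
}
```

A caller could treat only ProbeResult.Ok as "exists" and log ProbeResult.TimedOut separately. Note that this still only bounds the HEAD check; the later hw.Load call issues its own request.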

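      [Comments]:

        [Solution 4]:

        You can use a standard HttpWebRequest to fetch the remote resource and set its Timeout property. If the request succeeds, feed the resulting HTML to the HTML Agility Pack for parsing.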
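As a rough sketch of that idea, the helper below downloads a page with a timeout and returns null on failure, matching the "return a null response" behaviour the question asks for. TimedFetch, CreateRequest and TryDownload are hypothetical names, and decoding the body as UTF-8 is an assumption; a real page may declare a different encoding.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class TimedFetch
{
    // Hypothetical helper: builds a request whose connect/response wait
    // and stream reads are both limited to timeoutMs milliseconds.
    public static HttpWebRequest CreateRequest(string url, int timeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = timeoutMs;          // applies to GetResponse()
        request.ReadWriteTimeout = timeoutMs; // applies to reads on the response stream
        return request;
    }

    // Returns the response body, or null if the request times out or otherwise fails.
    public static string TryDownload(string url, int timeoutMs)
    {
        try
        {
            var request = CreateRequest(url, timeoutMs);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, Encoding.UTF8)) // assumed encoding
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            return null; // timeout, refused connection, or HTTP error status
        }
        catch (IOException)
        {
            return null; // the stream failed or stalled while reading the body
        }
    }
}
```

If TryDownload returns a non-null string, it can be handed to HtmlDocument.LoadHtml, as in Solution 1, and queried with SelectSingleNode as before.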

        [Comments]:

        • What is the right way to convert a System.Net.WebRequest into an HtmlAgilityPack.HtmlDocument?