【Question title】: How to get HTML content from Amazon using HttpWebRequest
【Posted】: 2021-02-22 01:50:45
【Question】:

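I'm trying to get the HTML content of a page on the Amazon website. Here is my code for creating the request, getting the response, and reading it as a string: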

    public static HttpWebResponse GetHttpWebResponse(string url)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.ContentType = "text/xml";
        try
        {
            return (HttpWebResponse)webRequest.GetResponse();
        }
        catch (WebException e)
        {
            if (e.Response == null)
                throw new Exception("Cannot get response");
            return (HttpWebResponse)e.Response;
        }
    }

    public static string GetString(HttpWebResponse response)
    {
        Encoding encoding = Encoding.UTF8;
        using (var reader = new StreamReader(response.GetResponseStream(), encoding))
        {
            string responseText = reader.ReadToEnd();
            return responseText;
        }
    }

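It works fine on other websites. However, when I try to get content from Amazon, for example https://www.amazon.com/gp/product/B00AEISSHA/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1, I get garbled, encoded-looking content instead of HTML.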

I tried changing the Encoding and calling HttpUtility.HtmlDecode(html), but neither helped. Is there a simple way to get the content from Amazon?

【Discussion】:

    Tags: c# html amazon-web-services web-scraping


    【Solution 1】:

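    You aren't accounting for compression. The response comes back gzip-compressed, and HttpWebRequest does not decompress it unless you set AutomaticDecompression (the default is DecompressionMethods.None), so your StreamReader ends up decoding compressed bytes as if they were UTF-8 text. Changing the encoding or HTML-decoding the result cannot fix that. If you update your web request like this, it should solve the problem: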

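    public static HttpWebResponse GetHttpWebResponse(string url)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.ContentType = "text/xml";
        // Send "Accept-Encoding: gzip, deflate" and transparently decompress the
        // body, so GetResponseStream() yields the plain HTML bytes.
        webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        try
        {
            return (HttpWebResponse)webRequest.GetResponse();
        }
        catch (WebException e)
        {
            // Non-2xx status codes surface as WebException; return the error
            // response if there is one so the caller can still read it.
            if (e.Response == null)
                throw new Exception("Cannot get response", e);
            return (HttpWebResponse)e.Response;
        }
    }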
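To make the fix above less magical, here is a minimal sketch of what decompression amounts to, assuming the server labels the body with a Content-Encoding: gzip header. The class ResponseBodyReader and its methods ReadBody and LooksLikeGzip are hypothetical names for illustration, not part of any library. ReadBody wraps the raw body in a GZipStream before decoding it as UTF-8, and LooksLikeGzip checks for the gzip magic bytes 0x1F 0x8B, one way to confirm that "encoded" output is really compressed data. Only gzip is handled here, to keep the sketch small.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper (illustration only): manually does what
// HttpWebRequest.AutomaticDecompression does for a gzip-encoded body.
public static class ResponseBodyReader
{
    // Decodes a response body as UTF-8, first decompressing it when the
    // Content-Encoding header value is "gzip". Other encodings are read as-is.
    public static string ReadBody(Stream body, string contentEncoding)
    {
        string enc = (contentEncoding ?? string.Empty).Trim().ToLowerInvariant();
        Stream source = enc == "gzip"
            ? new GZipStream(body, CompressionMode.Decompress)
            : body;

        // Disposing the reader also disposes the wrapped stream(s).
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Every gzip stream starts with the magic bytes 0x1F 0x8B, so a body that
    // begins this way was compressed and must be inflated before decoding.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }
}
```

With a real HttpWebResponse you would pass response.GetResponseStream() and response.ContentEncoding; in practice, setting AutomaticDecompression as shown in the answer is simpler and also takes care of the Accept-Encoding request header.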
    

    【Discussion】:
