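【Posted】:2021-02-22 01:50:45
【Question】:
I am trying to fetch HTML content from the Amazon website. Here is my code for creating the request, getting the response, and reading it into a string: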
using System;
using System.IO;
using System.Net;
using System.Text;

public static HttpWebResponse GetHttpWebResponse(string url)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    // Sets the Content-Type request header.
    webRequest.ContentType = "text/xml";
    try
    {
        return (HttpWebResponse)webRequest.GetResponse();
    }
    catch (WebException e)
    {
        // Error status codes surface as WebException; return the error response when there is one.
        if (e.Response == null)
            throw new Exception("Cannot get response");
        return (HttpWebResponse)e.Response;
    }
}

public static string GetString(HttpWebResponse response)
{
    // The body is always decoded as UTF-8 here.
    Encoding encoding = Encoding.UTF8;
    using (var reader = new StreamReader(response.GetResponseStream(), encoding))
    {
        string responseText = reader.ReadToEnd();
        return responseText;
    }
}
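It works fine on other websites. However, when I try to fetch content from Amazon, for example: https://www.amazon.com/gp/product/B00AEISSHA/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 what I see is encoded content.
I tried changing the Encoding and using HttpUtility.HtmlDecode(html), but neither helped. Is there a simple way to get the content from Amazon?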
【Comments】:
Tags: c# html amazon-web-services web-scraping
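One possible explanation, which the post itself does not confirm, is that the server returns a gzip- or deflate-compressed body. Reading compressed bytes as UTF-8 text would produce unreadable output that neither a different Encoding nor HttpUtility.HtmlDecode can repair. The following is a minimal sketch under that assumption. The class name DecompressionSketch and the methods CreateRequest and Fetch are hypothetical names chosen for illustration; AutomaticDecompression and DecompressionMethods are existing members of System.Net.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class DecompressionSketch
{
    // Builds the request the same way as GetHttpWebResponse in the question,
    // with one extra setting.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        // Assumption: the server sends a gzip- or deflate-compressed body.
        // With this set, the framework decompresses the response stream itself.
        webRequest.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return webRequest;
    }

    // Sends the request and reads the (already decompressed) body as UTF-8 text.
    public static string Fetch(string url)
    {
        HttpWebRequest webRequest = CreateRequest(url);
        using (var response = (HttpWebResponse)webRequest.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If compression is the cause, calling DecompressionSketch.Fetch(url) with the product URL should return readable markup. If the output is still unreadable, the assumption does not hold and the cause lies elsewhere.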