【Question title】: Truncated Response for Web page with 0x00 character
【Posted】: 2009-11-16 11:01:22
【Question】:

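I wrote a program that downloads web pages. It works for most pages, but I found a few that it cannot handle correctly.

These pages contain 0x00 characters.

I can read the page content up to that character, but nothing after it.

I use this code to read the response: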

IAsyncResult ar = null;
HttpWebResponse resp = null;
Stream responseStream = null;
String content = null;
...
resp = (HttpWebResponse)req.EndGetResponse(ar);
responseStream = resp.GetResponseStream();
StreamReader sr = new StreamReader(responseStream, Encoding.UTF8);
content = sr.ReadToEnd();

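This example uses an asynchronous request, but I also tried a synchronous request and hit the same problem.

I also tried the following, with the same result: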

HttpWebResponse resp = null;
Stream responseStream = null;
String content = "";
...
responseStream = resp.GetResponseStream();
byte[] buffer = new byte[4096];
int bytesRead = 1;
while (bytesRead > 0)
{
    bytesRead = responseStream.Read(buffer, 0, 4096);
    content += Encoding.UTF8.GetString(buffer, 0, bytesRead);
}

For example, the problem occurs with this URL: http://www.daz3d.com/i/search/searchsub?sstring=ps_tx1662b&_m=dps_tx1662b

Thanks for your replies.

欧约素

【Discussion】:

    Tags: c# httpwebresponse truncated


    【Solution 1】:

    Your problem lies in converting the received content to a string; you need to strip those 0x00 bytes before decoding:

    // Requires: using System; using System.IO; using System.Linq;
    //           using System.Net; using System.Text; using System.Threading;
    AutoResetEvent sync = new AutoResetEvent(false);
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://...");
    request.Proxy.Credentials = CredentialCache.DefaultCredentials;
    request.BeginGetResponse((result) =>
    {
        StringBuilder content = new StringBuilder();
        using (HttpWebResponse response =
               request.EndGetResponse(result) as HttpWebResponse)
        using (Stream stream = response.GetResponseStream())
        {
            int read = 1;
            byte[] buffer = new byte[0x1000];
            while (read > 0)
            {
                read = stream.Read(buffer, 0, buffer.Length);
                // Keep only the bytes read in this pass, minus any 0x00.
                content.Append(Encoding.UTF8.GetString(buffer
                    .Take(read)
                    .Where(b => b != 0x00).ToArray()));
            }
            Console.WriteLine(content);
            sync.Set();
        }
    }, null);
    // Block until the callback has finished.
    sync.WaitOne();
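
One caveat when each chunk is decoded on its own: a multi-byte UTF-8 sequence can straddle two reads, and decoding the halves separately yields replacement characters. The following is a minimal sketch, not the answerer's code, that filters 0x00 bytes and decodes through a stateful `Decoder`. The class name `NullStrippingReader` and the method `ReadAll` are hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8, where 0x00 never occurs inside a multi-byte sequence.

```csharp
using System;
using System.IO;
using System.Text;

public static class NullStrippingReader
{
    // Hypothetical helper: reads the whole stream, drops 0x00 bytes,
    // and decodes the rest with the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding, int bufferSize = 4096)
    {
        // A Decoder remembers an incomplete byte sequence between calls.
        Decoder decoder = encoding.GetDecoder();
        byte[] buffer = new byte[bufferSize];
        char[] chars = new char[encoding.GetMaxCharCount(bufferSize)];
        StringBuilder content = new StringBuilder();

        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the chunk in place, skipping 0x00 bytes.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != 0x00) buffer[kept++] = buffer[i];
            }

            int charCount = decoder.GetChars(buffer, 0, kept, chars, 0);
            content.Append(chars, 0, charCount);
        }

        return content.ToString();
    }
}
```

With a response stream this might be called as `NullStrippingReader.ReadAll(stream, Encoding.UTF8)`. The small `bufferSize` parameter exists mainly so the chunk-boundary case can be exercised on short inputs.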
    

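    【Discussion】:

      【Solution 2】:

      What actually fails is the encoding step. To work around it, you have to filter out the 0x00 bytes. Something like this should do the trick: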

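      using System.Net;
      using System.IO;
      using System.Text;
      
      WebRequest request = WebRequest.Create("url here");
      
      string html;
      using (WebResponse response = request.GetResponse())
      using (Stream stream = response.GetResponseStream())
      using (MemoryStream filtered = new MemoryStream())
      {
          // ContentLength is -1 when the server omits the header, so collect
          // the bytes in a growable MemoryStream instead of a fixed-size array.
          int currentByte;
          while ((currentByte = stream.ReadByte()) > -1)
          {
              // Skip 0x00 bytes; keep everything else.
              if (currentByte > 0) filtered.WriteByte((byte)currentByte);
          }
      
          // ASCII as in the original answer; use Encoding.UTF8 for UTF-8 pages.
          html = Encoding.ASCII.GetString(filtered.ToArray());
      }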
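
If the content has already been decoded, the same filtering can be done on the string, since a .NET string can hold U+0000 characters. A minimal sketch, assuming the text was decoded with an encoding such as UTF-8; the names `NullCleanup` and `StripNulChars` are hypothetical and not from the answer above.

```csharp
using System;
using System.Text;

public static class NullCleanup
{
    // Hypothetical helper: removes U+0000 characters from a decoded string.
    // String.Replace(string, string) compares ordinally, so "\0" is matched literally.
    public static string StripNulChars(string text)
    {
        return text.Replace("\0", string.Empty);
    }
}
```

Comparing the `Length` of the string before and after the call is also a quick way to check how many 0x00 characters a page actually contained.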
      

      【Discussion】:
