C# converting binary to text -- question marks? (with answers)

【Question title】: C# converting binary to text -- question marks?
【Posted】: 2014-11-02 23:21:27
【Question】:

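I'm converting a binary file to text and dumping it into a PDF. I have this working, but I need to produce output identical to some samples from another program (it generates text and then converts it to binary, so I suppose I'm converting it back?).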

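My output matches except for one thing. There should be a row of dashes setting off the section headings, but I get question marks (?) instead. If I open the binary file in Notepad++, the question marks show up as a seemingly random Korean character (컴). I've tried result.Replace("?", "-"); and result.Replace("컴", "-");, and I've even tried checking with Contains(), but nothing triggers.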

How can I replace them?

Not sure whether it will help, but here is my code:

private void btnConvertBinaryToPDF_Click(object sender, EventArgs e)
    {
        PdfDocument document = new PdfDocument(); //make new pdf document
        PdfPage page = document.AddPage(); //add a page to the document
        XGraphics gfx = XGraphics.FromPdfPage(page); //use this to draw/write on the specified page
        XFont font = new XFont("Courier New", 10); //need a font to write with

        string result = "";
        string path = @"C:\Users\file";

        byte[] b = new byte[1024];
        UTF8Encoding temp = new UTF8Encoding(true);
        FileStream fs = File.OpenRead(path);
        int i = 1; 
        while (fs.Read(b, 0, b.Length) > 0)
        {
            string tmp = temp.GetString(b);
            result += tmp;
            b = new byte[1024]; //clear the buffer 
        }


        if (result.Contains("?"))
        {
            Console.WriteLine("contains!");
        }
        result.Replace("컴", "-");

        XTextFormatter tf = new XTextFormatter(gfx);
        XRect rect = new XRect(40, 100, 500, 100);
        tf.DrawString(result, font, XBrushes.Black, rect, XStringFormats.TopLeft);

        string filename = "HelloWorld.pdf"; //make the filename
        document.Save(filename); //save the document to the filename
        Process.Start(filename); //open the file to show the document
    }

Edit: path contains binary data. I need to get a text representation of its contents. The code above works fine, except for characters with codes above 127.

【Comments on the question】:

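Question marks tend to be caused by text encoding problems. This code starts off badly: UTF-8 is a variable-length encoding, and the way you use FileStream can cut an encoded character in half at a buffer boundary. You must use a StreamReader to read the file.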
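
A minimal sketch of the StreamReader approach this comment suggests, assuming the input really is UTF-8 text. The class and method names are illustrative and not from the original post:

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderSketch
{
    // Reads the whole stream as UTF-8 text. StreamReader keeps decoder state
    // between its internal buffer refills, so a multi-byte character that
    // straddles two refills is still decoded as a single character.
    public static string ReadAllUtf8(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For a file on disk, the stream could come from File.OpenRead(path). This only helps if the bytes are valid UTF-8 in the first place.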

Tags: c# text unicode binary converter

【Answer 1】:

It looks like you are simply reading data from a file. I'm assuming path contains text data; in that case, you are better off just using:

string result = File.ReadAllText(path);

Optionally specifying the encoding:

string result = File.ReadAllText(path, Encoding.UTF8);

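At the moment, you are:

- treating more bytes as data than were actually read in each iteration (the return value of Read is ignored, so the whole 1024-byte buffer is decoded every time)
- not handling characters whose bytes are split across two reads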
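
Both problems can be avoided in a manual read loop. The following is a sketch under the assumption that the data is UTF-8; DecodeInChunks and its bufferSize parameter are hypothetical names introduced here for illustration. It decodes only the bytes that Read reports, and uses a stateful Decoder so that a character split across two reads is reassembled:

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedDecodeSketch
{
    public static string DecodeInChunks(Stream stream, int bufferSize)
    {
        var bytes = new byte[bufferSize];
        // A Decoder remembers incomplete trailing bytes between calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        // Worst-case char count for one buffer, including carried-over bytes.
        var chars = new char[Encoding.UTF8.GetMaxCharCount(bufferSize)];
        var result = new StringBuilder();

        int bytesRead;
        while ((bytesRead = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            // Decode only the bytes actually read, not the whole buffer.
            int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
            result.Append(chars, 0, charCount);
        }

        // Flush: emits a replacement character if the data ended mid-sequence.
        int tail = decoder.GetChars(bytes, 0, 0, chars, 0, true);
        result.Append(chars, 0, tail);
        return result.ToString();
    }
}
```

Using StringBuilder instead of repeated string concatenation also addresses part of the inefficiency, although File.ReadAllText or StreamReader remains the simpler choice for real text files.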

(There are also some inefficiencies in the way you handle string, byte[] and FileStream, but frankly those are moot if you are getting the wrong answer anyway.)

Finally, your Replace call does nothing, because the string it returns is discarded:

result.Replace("컴", "-");

It should be:

result = result.Replace("컴", "-");

(if it is still needed)
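
The reason is that .NET strings are immutable: Replace returns a new string and leaves the original untouched. A small sketch of the difference, with illustrative class and method names:

```csharp
using System;

public static class ReplaceSketch
{
    // The return value of Replace is thrown away, so the input comes back unchanged.
    public static string ReplaceDiscarded(string input)
    {
        input.Replace("?", "-");
        return input;
    }

    // The new string returned by Replace is kept.
    public static string ReplaceAssigned(string input)
    {
        input = input.Replace("?", "-");
        return input;
    }
}
```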

【Comments】:

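path does not contain text data, it contains binary data, which is why I didn't use string result = File.ReadAllText(path); in the first place. The only thing I get wrong is that I should see ---------------Statistics-------------- and instead I see ?????????????Statistics???????????????.
@senschen If it doesn't contain text, why are you using Encoding.GetString()? That makes no sense. What is the data here?
The data is definitely binary. I'm trying to get a string representation of the contents of b there.
@senschen Right, and that is not a valid thing to do.
@senschen It can only "work" if the data is UTF-8. Otherwise the result is completely undefined.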
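
To illustrate the last comment: when bytes that are not valid UTF-8 are decoded as UTF-8, .NET substitutes the replacement character U+FFFD by default. Many fonts and consoles show it as "?", but it is not the ASCII question mark, which would explain why Contains("?") and Replace("?", "-") never match. The sketch below assumes, without confirmation from the thread, that the dashes are stored as the single byte 0xC4 (the horizontal line in DOS code page 437; two such bytes read as 컴 in the Korean EUC-KR code page). The sample bytes and all names are hypothetical:

```csharp
using System;
using System.Text;

public static class InvalidUtf8Sketch
{
    // Hypothetical sample: assumed 0xC4 "dash" bytes around the ASCII text "Stats".
    public static readonly byte[] Sample =
        { 0xC4, 0xC4, 0x53, 0x74, 0x61, 0x74, 0x73, 0xC4, 0xC4 };

    // 0xC4 is a UTF-8 lead byte; without a continuation byte it is invalid
    // and is decoded as U+FFFD.
    public static string DecodeAsUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // ISO-8859-1 maps every byte value to exactly one character,
    // so 0xC4 becomes U+00C4 and nothing is lost.
    public static string DecodeAsLatin1(byte[] data)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(data);
    }
}
```

With a one-to-one decoding like this, the assumed dash byte could then be replaced explicitly, for example DecodeAsLatin1(Sample).Replace('\u00C4', '-'). Whether ISO-8859-1, code page 437, or some other single-byte encoding is appropriate depends on how the other program wrote the file.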