Java - How to read a .dat file byte by byte

【Question title】: Java - How to read Byte by Byte a .dat file
【Posted】: 2023-04-10 15:47:02
【Question description】:

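I am trying to read a .dat file using Java with no other classes. This is the structure of the file:

Header
    Serial: Word;   //2 bytes
    FileName: String[255];   //1 byte
    Date: Word;   //2 bytes
    FieldNumbers: Word;   //2 bytes
    NumbersOfRecords: Word;   //2 bytes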

Info about Fields 
    FieldCode: Word;   //2 bytes
    FieldName: ShortString;   //1 byte

Info in Field 
    FieldCode: Word;  //2 bytes
    FieldText: String[255];  //1 byte

    DateTime = double

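What I need to know is how to use a BufferedReader to get each byte, read it as an int, then convert that int to a String and show it on screen. Can I create different methods to read each type of data? Can I make it read 2 bytes at once?

Update: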

package binarios5;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main5 
{

    public static void main(String[] args) throws FileNotFoundException, IOException 
    {

        try 
        {
            Path path = Paths.get("C:\\Dev-Pas\\EXAMEN2.dat");
            System.out.println("File open");
            byte[] bytes = Files.readAllBytes(path);
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            buffer.order(ByteOrder.BIG_ENDIAN);
            short serial = buffer.getShort();
            System.out.println("----[CONTENIDO DEL ARCHIVO]--------------------");
            System.out.println("Nro. de Serie: " + serial);
            int largoCadena = buffer.get() & 0xFF; // 1 byte: length of the string, read as unsigned
            //System.out.println("largoCadena: " + largoCadena);//33
            byte[] bytesChar = new byte[largoCadena]; // bytes of the string
            buffer.get(bytesChar);
            String nombre = new String(bytesChar, StandardCharsets.ISO_8859_1);
            System.out.println("Nombre: " + nombre);

            short date = buffer.getShort(); // TODO: the date still has to be decoded
            System.out.println("Fecha sin procesar. "+date); // TODO: the date still has to be decoded

            short cantCampos = buffer.getShort(); // number of fields the records have
            System.out.println("Cantidad de Campos Customizados: "+cantCampos); // should say 4
            int[] codCampo = new int[cantCampos];
            String[] nombreCampo = new String[10];


            for (int i = 0; i < cantCampos; i++) // read one RegType per field
            {
                codCampo[i] = buffer.getShort(); // 2 bytes: field code
                int largoCadena2 = buffer.get() & 0xFF; // 1 byte: length of the string, read as unsigned
                byte[] bytesChar2 = new byte[largoCadena2];
                buffer.get(bytesChar2);
                nombreCampo[i] = new String(bytesChar2, StandardCharsets.ISO_8859_1);
            }

            for (int i = 0; i < cantCampos; i++) // show field codes and names
            {
                System.out.println("Campo [codigo: " + codCampo[i] + ", descripcion: " + nombreCampo[i] + "]");
            }

            short cantRegistros = buffer.getShort(); // total number of records
            System.out.println("Cantidad de Registros: "+cantRegistros);
            System.out.println("-----------------------");//OK

            String[] contenidoCampo = new String[10];
            for (int i = 0; i < cantRegistros; i++) // read one RegData per record (5 in this file)
            {
                short cantCamposCompletos = buffer.getShort();

                for (int j = 0; j < cantCamposCompletos; j++)
                {
                    short codCampoInterno = buffer.getShort();
                    int largoCadena3 = buffer.get() & 0xFF; // length byte, read as unsigned
                    byte[] bytesChar3 = new byte[largoCadena3];
                    buffer.get(bytesChar3);
                    contenidoCampo[j] = new String(bytesChar3, StandardCharsets.ISO_8859_1);
                    System.out.println(nombreCampo[j]+": "+contenidoCampo[j]); 
                }
                System.out.println("-----------------------");
            }

            System.out.println("----[FIN CONTENIDO DEL ARCHIVO]-----------------");
        } 
        catch (IOException e)
        {
            System.out.println("File I/O error!");
        }

    }


}

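【Question comments】:

You read bytes with an InputStream, not a Reader. Readers are for characters, not bytes. Every InputStream has a read() method that returns a single byte (as an int). docs.oracle.com/javase/8/docs/api/java/io/…. Integer.toString() turns an int such as 234 into a String such as "234". Not sure whether that is what you mean, though. Could you give a concrete example? I don't understand how the file name can be both a String[255] and 1 byte.
Reader/Writer are for text, not binary. Use Streams for binary.
Why byte by byte? That is a job for DataInputStream. Note that String[255] is (at least) 255 bytes, not one.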
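
To illustrate the point made in these comments, here is a minimal sketch of reading a stream one byte at a time with InputStream.read() and printing each value with Integer.toString(). The class name ByteByByteDemo, the helper readAll and the three sample bytes are made up for illustration; a real program would open the .dat file with a FileInputStream instead of the in-memory stream used here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ByteByByteDemo {
    // Reads every byte of the stream. read() returns 0..255, or -1 at end of stream.
    static List<Integer> readAll(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data (assumption); replace with a FileInputStream on the real file.
        byte[] sample = {0x37, 0x3D, (byte) 0xEA};
        try (InputStream in = new ByteArrayInputStream(sample)) {
            for (int value : readAll(in)) {
                System.out.println(Integer.toString(value)); // 55, 61, 234
            }
        }
    }
}
```

Because read() returns an int, the value 0xEA comes back as 234 rather than as the negative byte -22.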

Tags: java byte

【Solution 1】:

In Java, Reader and Writer are for Unicode text: String and 2-byte chars.

For binary data (byte[]) you need an InputStream or OutputStream.

You could use an InputStream:

BufferedInputStream in = new BufferedInputStream(new FileInputStream(...));

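In your case you want to read shorts and the like. For that you can wrap a DataInputStream around it.

However, a ByteBuffer is the easiest way to start. It can be read from a file (FileChannel), but the simple case is: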
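
A minimal sketch of the DataInputStream route mentioned above, assuming the 2-byte Word values should be treated as unsigned. The class name DataInputStreamDemo, the helper readWord and the sample bytes are illustrative only. DataInputStream always reads multi-byte values in big-endian order.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class DataInputStreamDemo {
    // Reads one 2-byte value as an unsigned number (0..65535), big-endian.
    static int readWord(DataInputStream in) throws IOException {
        return in.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the first two bytes of a file (assumption).
        byte[] sample = {0x37, 0x3D};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(readWord(in)); // 0x373D = 14141
        }
    }
}
```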

Path path = Paths.get("C:/xxx/yyy.dat");
byte[] bytes = Files.readAllBytes(path);
ByteBuffer buffer = ByteBuffer.wrap(bytes);
//buffer.order(ByteOrder.LITTLE_ENDIAN); // So short is read as LSB,MSB

Solved:

// Header
short serial = buffer.getShort();
byte[] fileNameB = new byte[255];
buffer.get(fileNameB);
// If 0 terminated asciz string:
int len = fileNameB.length;
for (int i = 0; i < fileNameB.length; ++i) {
    if (fileNameB[i] == 0) {
        len = i;
        break;
    }
}
String fileName = new String(fileNameB, 0, len, StandardCharsets.ISO_8859_1);

short date = buffer.getShort();
short fieldNumbers = buffer.getShort();
short numbersOfRecords = buffer.getShort();

for (int fieldI = 0; fieldI < fieldNumbers; ++fieldI) {
    // Info about Fields 
    short fieldCode = buffer.getShort();
    //byte fieldName: ShortString;   //1 byte
}

Info in Field
    FieldCode: Word;  //2 bytes
    FieldText: String[255];  //1 byte

DateTime = double

String getPascalString(ByteBuffer buffer) {
    int length = buffer.get() & 0xFF;
    byte[] bytes = new byte[length];
    buffer.get(bytes);
    return new String(bytes, StandardCharsets.ISO_8859_1);
}

This will deliver: d:/documentos/te...
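
As a usage sketch for getPascalString, the example below builds a tiny header in memory (a 2-byte serial followed by a length-prefixed name) and parses it back. The helper sampleHeader, the class name PascalStringDemo and the sample values are hypothetical and only stand in for a real file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PascalStringDemo {
    // Same logic as the answer: one unsigned length byte, then that many text bytes.
    static String getPascalString(ByteBuffer buffer) {
        int length = buffer.get() & 0xFF; // mask so lengths 128..255 are not negative
        byte[] bytes = new byte[length];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: writes serial + Pascal string into a fresh buffer.
    static ByteBuffer sampleHeader(int serial, String name) {
        byte[] text = name.getBytes(StandardCharsets.ISO_8859_1);
        ByteBuffer buffer = ByteBuffer.allocate(2 + 1 + text.length);
        buffer.putShort((short) serial);
        buffer.put((byte) text.length);
        buffer.put(text);
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = sampleHeader(14141, "EXAMEN2.dat");
        System.out.println(buffer.getShort());       // 14141
        System.out.println(getPascalString(buffer)); // EXAMEN2.dat
    }
}
```

The mask with 0xFF matters once a string is longer than 127 bytes: without it the length byte would be read as a negative number and the array allocation would fail.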

short packedDate = buffer.getShort();
int year = packedDate & 0x7F; // + 1900?
int month = (packedDate >> 7) & 0xF;
int day = (packedDate >> 11) & 0x1F;
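
The masks above can be checked with a small round trip. This is only a sketch under the answer's own assumption about the layout (bits 0-6 year, bits 7-10 month, bits 11-15 day); the real file may pack its date differently, and the base year is not known from the thread. PackedDateDemo, pack and unpack are illustrative names.

```java
class PackedDateDemo {
    // Assumed layout (from the masks in the answer): bits 0-6 year, 7-10 month, 11-15 day.
    static int pack(int year, int month, int day) {
        return (year & 0x7F) | ((month & 0xF) << 7) | ((day & 0x1F) << 11);
    }

    // Returns {year, month, day}; the year is the raw 7-bit value, no base year added.
    static int[] unpack(short packedDate) {
        int bits = packedDate & 0xFFFF; // treat the short as unsigned
        return new int[] { bits & 0x7F, (bits >> 7) & 0xF, (bits >> 11) & 0x1F };
    }

    public static void main(String[] args) {
        short packed = (short) pack(23, 4, 10);
        int[] parts = unpack(packed);
        System.out.println(parts[0] + "-" + parts[1] + "-" + parts[2]); // 23-4-10
    }
}
```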

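【Comments】:

Are you sure Little Endian is right? I know the file's serial number is 14141, but it shows 15671. Edit: just tested buffer.order(ByteOrder.BIG_ENDIAN); and it gives me 14141. It reads the first line fine.
@GMP_47 That was my (wrong) assumption, since the Windows Intel platform usually uses little endian. BIG_ENDIAN is the default in Java.
Tell me more about that byte[]. Are all the bytes stored in that array, so that I can grab each byte in order and use it as needed?
Yes. Wrapping it in a ByteBuffer additionally allows positioning (seek) and reading/writing short/int, instead of juggling with bytes[i] << 8 and the like.
It could be: (1) a fixed-size block holding a \0-terminated string, (2) a variable-length field terminated by \0, or (3) a Pascal-style string: a length byte plus the text bytes (a maximum length of 255 fits).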
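
The two serial numbers quoted in these comments are the same two bytes read in opposite orders, which a short sketch can confirm. The bytes 0x37 0x3D are inferred from the value 14141 and are not taken from the actual file; ByteOrderDemo and readWord are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Reads the first two bytes as an unsigned 16-bit value in the given byte order.
    static int readWord(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] word = {0x37, 0x3D}; // inferred from 14141 = 0x373D (assumption)
        System.out.println(readWord(word, ByteOrder.BIG_ENDIAN));    // 14141
        System.out.println(readWord(word, ByteOrder.LITTLE_ENDIAN)); // 15671
    }
}
```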

【Solution 2】:

Readers are for reading streams of characters. To read a stream of raw bytes, consider using an InputStream and calling

public int read(byte[] b)

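To parse strings, pass the byte array to the String constructor that takes an explicit encoding (do not rely on the default encoding, because it may be UTF-8 depending on your environment, which is not suitable in your case).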
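
A minimal sketch of both points, with made-up sample bytes: read(byte[], int, int) may return fewer bytes than requested, so the hypothetical helper readFully loops until the array is full, and the same bytes decode differently depending on the charset passed to the String constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadBytesDemo {
    // Fills b completely; a single read call is allowed to deliver only part of it.
    static void readFully(InputStream in, byte[] b) throws IOException {
        int off = 0;
        while (off < b.length) {
            int n = in.read(b, off, b.length - off);
            if (n == -1) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // "café" in ISO-8859-1: the last byte 0xE9 is not valid UTF-8 on its own.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        byte[] text = new byte[raw.length];
        try (InputStream in = new ByteArrayInputStream(raw)) {
            readFully(in, text);
        }
        String latin1 = new String(text, StandardCharsets.ISO_8859_1);
        String utf8 = new String(text, StandardCharsets.UTF_8);
        System.out.println(latin1.equals("caf\u00e9")); // true
        System.out.println(utf8.equals(latin1));        // false
    }
}
```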

https://docs.oracle.com/javase/tutorial/essential/io/index.html

【Comments】: