【Question title】: Java: Javolution: How to use UTF8StreamReader properly? Error: Caused by: java.lang.ArrayIndexOutOfBoundsException: 2048
【Posted】: 2011-06-20 00:54:49
【Question】:

Here is the code:

public static void mergeAllFilesJavolution()throws FileNotFoundException, IOException {
    String fileDir = "C:\\TestData\\w12";
    File dirSrc = new File(fileDir);
    File[] list = dirSrc.listFiles();
    long start = System.currentTimeMillis();
    for(int j=0; j<list.length; j++){
        int chr;
        String srcFile = list[j].getPath();
        String outFile = fileDir + "\\..\\merged.txt";
        UTF8StreamReader inFile=new UTF8StreamReader().setInput(new FileInputStream(srcFile));
        UTF8StreamWriter outPut=new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true)); 
        while((chr=inFile.read()) != -1) {
            outPut.write(chr);
        }
        outPut.close();
        inFile.close();
    }
    System.out.println(System.currentTimeMillis()-start);
}

The UTF-8 file used as test data is 200 MB, but files of 800 MB are quite likely.

This is the source code of UTF8StreamReader.read():

/**
 * Holds the bytes buffer.
 */
private final byte[] _bytes;

/**
 * Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
 */
public UTF8StreamReader() {
    _bytes = new byte[2048];
}

/**
 * Reads a single character.  This method will block until a character is
 * available, an I/O error occurs or the end of the stream is reached.
 *
 * @return the 31-bits Unicode of the character read, or -1 if the end of
 *         the stream has been reached.
 * @throws IOException if an I/O error occurs.
 */
public int read() throws IOException {
    byte b = _bytes[_start];
    return ((b >= 0) && (_start++ < _end)) ? b : read2();
}

The error occurs at _bytes[_start], because _bytes = new byte[2048].
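One plausible reading of the quoted read(), offered as a sketch rather than a confirmed diagnosis: the array is indexed before `_start` is compared with `_end`, so once a completely filled buffer has been consumed, `_start` equals the array length and the lookup itself throws. The toy class below is hypothetical (`BoundsCheckOrderDemo` and its methods are not part of Javolution); it only mimics that check order on an ASCII-only buffer and uses -1 as a stand-in for the refill path that `read2()` presumably provides.

```java
import java.util.Arrays;

/** Hypothetical toy reader; it mimics the check order of the quoted read(), nothing more. */
final class BoundsCheckOrderDemo {
    private final byte[] bytes; // working buffer, filled with ASCII 'a'
    private final int end;      // number of valid bytes (the whole buffer here)
    private int start;          // index of the next byte to hand out

    BoundsCheckOrderDemo(int capacity) {
        bytes = new byte[capacity];
        Arrays.fill(bytes, (byte) 'a');
        end = capacity;
    }

    /** Indexes first and checks bounds second, like the quoted read(). */
    int readIndexFirst() {
        byte b = bytes[start]; // throws once start == bytes.length
        return (b >= 0 && start++ < end) ? b : -1; // -1 stands in for read2()
    }

    /** Checks bounds first, so a fully consumed buffer never gets indexed. */
    int readCheckFirst() {
        if (start >= end) {
            return -1; // a real reader would refill the buffer here
        }
        return bytes[start++];
    }

    /** Consumes the whole buffer, then reports whether one more read throws. */
    static boolean indexFirstThrowsAtEnd(int capacity) {
        BoundsCheckOrderDemo reader = new BoundsCheckOrderDemo(capacity);
        for (int i = 0; i < capacity; i++) {
            reader.readIndexFirst();
        }
        try {
            reader.readIndexFirst();
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Same scenario with the bounds check first; returns the extra read's result. */
    static int checkFirstResultAtEnd(int capacity) {
        BoundsCheckOrderDemo reader = new BoundsCheckOrderDemo(capacity);
        for (int i = 0; i < capacity; i++) {
            reader.readCheckFirst();
        }
        return reader.readCheckFirst();
    }

    public static void main(String[] args) {
        System.out.println(indexFirstThrowsAtEnd(2048));
        System.out.println(checkFirstResultAtEnd(2048));
    }
}
```

If this reading is right, passing a larger capacity to the constructor would probably only move the failing index rather than remove the exception.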

Here is another UTF8StreamReader constructor:

/**
 * Creates a UTF-8 reader having a byte buffer of specified capacity.
 * 
 * @param capacity the capacity of the byte buffer.
 */
public UTF8StreamReader(int capacity) {
    _bytes = new byte[capacity];
}

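Question: how do I specify the correct capacity for _bytes when creating the UTF8StreamReader?

I tried File.length(), but it returns a long (which I think is right, since I expect the files to be large), while the constructor only accepts an int.

Any guidance in the right direction is appreciated.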
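On the capacity question: a stream reader's buffer is normally a fixed-size working area that gets refilled chunk by chunk, so it does not need to match File.length(). The sketch below illustrates that idea with the standard java.io classes instead of Javolution; `ChunkedUtf8Append`, `appendUtf8`, and the 8192-char buffer are illustrative assumptions, not anything from the original code.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: appends one UTF-8 file to another through a small, reused buffer. */
final class ChunkedUtf8Append {

    /** Returns the number of chars copied from src to the end of dest. */
    static long appendUtf8(Path src, Path dest) throws IOException {
        char[] buffer = new char[8192]; // fixed size, independent of the file length
        long total = 0;
        try (Reader in = new InputStreamReader(
                     Files.newInputStream(src), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(
                     Files.newOutputStream(dest,
                             StandardOpenOption.CREATE, StandardOpenOption.APPEND),
                     StandardCharsets.UTF_8)) {
            int n;
            // Each pass reads at most buffer.length chars, however large the file is.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Calling `ChunkedUtf8Append.appendUtf8(src, dest)` once per source file would append each file to the merged output; memory use stays at one buffer regardless of whether the input is 200 MB or 800 MB.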

【Comments】:

    Tags: java byte javolution


    【Solution 1】:

    It seems nobody else has run into this situation.

    Anyway, I tried another solution: instead of the class above (UTF8StreamReader), I used a ByteBuffer (UTF8ByteBufferReader). It is far faster than the StreamReader.

    Faster Merging Files by using ByteBuffer
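The code from the linked post is not reproduced here. As a rough illustration of the ByteBuffer approach, the sketch below merges files with plain java.nio channels and a reusable direct buffer; it does not use Javolution's UTF8ByteBufferReader, and `ByteBufferMerge`, `merge`, and the 64 KiB buffer size are assumptions made for the example. It copies raw bytes without decoding, on the assumption that concatenating the UTF-8 files byte for byte is all that is needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical helper: concatenates files byte for byte through one ByteBuffer. */
final class ByteBufferMerge {

    /** Appends every source file to dest and returns the number of bytes written. */
    static long merge(List<Path> sources, Path dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024); // reused for all files
        long total = 0;
        try (FileChannel out = FileChannel.open(dest,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (Path src : sources) {
                try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
                    while (in.read(buffer) != -1) {
                        buffer.flip(); // switch the buffer from filling to draining
                        while (buffer.hasRemaining()) {
                            total += out.write(buffer);
                        }
                        buffer.clear(); // make the buffer ready for the next chunk
                    }
                }
            }
        }
        return total;
    }
}
```

Opening the output channel once for all inputs, rather than once per source file as in the question's loop, also avoids repeated open and close calls.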

    【Comments】:
