Reading an ASCII file line by line - Java answers

【Question title】: Reading ascii file line by line - Java
【Posted】: 2016-01-24 14:21:52
【Question description】:

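I am trying to read an ASCII file and find the positions of the newline character "\n", so that I know which characters, and how many, are on each line. The file is 538MB in size. When I run the code below, it never prints anything. I have searched a lot, but I did not find anything about ASCII files. I am using NetBeans and Java 8. Any ideas?

My code is below.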

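import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

String inputFile = "C:\\myfile.txt"; // the backslash has to be escaped in a Java string literal
FileInputStream in = new FileInputStream(inputFile);
FileChannel ch = in.getChannel();
int BUFSIZE = 512;
ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);
Charset cs = Charset.forName("ASCII");
int rd; // was used without a declaration

while ( (rd = ch.read( buf )) != -1 ) {
        buf.rewind(); // resets the position only; the limit stays at the capacity
        CharBuffer chbuf = cs.decode(buf);

        // chbuf.length() is the remaining count, so it shrinks with every get()
        for ( int i = 0; i < chbuf.length(); i++ ) {
             if (chbuf.get() == '\n'){
                System.out.println("PRINT SOMETHING");
             }
        }
        // buf is never cleared, so later read() calls find no free space and return 0
}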

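【Question discussion】:

Have you looked at stackoverflow.com/questions/4716503/…?
I have already seen that post, but with BufferedReader it throws a Java out-of-memory error, so I cannot use the readLine() function.
For large files, use RandomAccessFile instead of FileReaders.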

Tags: java netbeans ascii

【Solution 1】:

The number of characters in a line is the length of the string returned by a readLine call:

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    int iLine = 0;
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println( "Line " + iLine + " has " +
                            line.length() + " characters." );
        iLine++;
    }
} catch( IOException ioe ){
    // ...
}

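Note that readLine has already removed the (system-dependent) line terminator from the string.

If a very large file contains no line breaks, it is indeed possible to run out of memory. Reading character by character avoids this.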

File file = new File( "Z.java" );
Reader reader = new FileReader( file );
int len = 0;
int c;
int iLine = 0;
while( (c = reader.read()) != -1) {
    if( c == '\n' ){
        iLine++;
        System.out.println( "line " + iLine + " contains " + len + " characters" );
        len = 0;
    } else {
        len++;
    }
}
reader.close();
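If the character-by-character loop turns out to be too slow, one possible refinement is to wrap the reader in a BufferedReader, so that each read() is served from an in-memory buffer while still no line is ever stored. The sketch below is only an illustration under stated assumptions: the class name LineLengths, the method forEachLineLength and the IntConsumer callback are hypothetical names that do not appear in the original answer, and the choice to skip '\r' so that Windows line endings are not counted is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class LineLengths {
    // Reports the length of every line to 'sink' without keeping any line in memory.
    // (Hypothetical helper, not part of the original answer.)
    static void forEachLineLength(Reader source, IntConsumer sink) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            int len = 0;
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    sink.accept(len);   // end of line: report and reset
                    len = 0;
                } else if (c != '\r') { // assumption: do not count carriage returns
                    len++;
                }
            }
            if (len > 0) {
                sink.accept(len);       // last line without a trailing newline
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        forEachLineLength(new StringReader("0101\n01\n\n111"), lengths::add);
        System.out.println(lengths); // prints [4, 2, 0, 3]
    }
}
```

For the real file, a FileReader would be passed in place of the StringReader, and the callback could print each length instead of collecting it.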

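【Discussion】:

With BufferedReader it throws java.lang.OutOfMemoryError: Java heap space. That is why I am using ByteBuffer.
@Iostromos Is it possible that the whole file contains no line endings? Is this a "regular" text file, or some odd sequence of bytes?
@Iostromos I have added a version that does not store any file data; that should be fine. (If it is too slow: it can be improved.)
It really does give me an OOM. What kind of verifiable example would you like me to post? Here is a link to a screenshot I just captured, with the error message: link
The file does contain line breaks; it is a 65x65 array whose elements are 0 and 1, but in ASCII format.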
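For readers who prefer the FileChannel and ByteBuffer route mentioned in the comments, the following is a minimal sketch of how such a loop is usually written: flip() before draining the buffer and clear() before the next read. Because the file is pure ASCII, the bytes are compared with '\n' directly, without decoding. The class name ChannelLineCounter, the method countNewlines and the temporary demo file are assumptions made for illustration, not code from the thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelLineCounter {
    // Counts the '\n' bytes in a file, reading it in chunks of bufSize bytes.
    // (Hypothetical helper; only one buffer of bufSize bytes is held at a time.)
    static long countNewlines(Path path, int bufSize) throws IOException {
        long newlines = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(bufSize);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();                 // limit = bytes just read, position = 0
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        newlines++;
                    }
                }
                buf.clear();                // make room for the next read
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        // Small demo file standing in for the large input.
        Path tmp = Files.createTempFile("ascii-demo", ".txt");
        Files.write(tmp, "0 1 0\n1 1 0\n0 0 1\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countNewlines(tmp, 4)); // prints 3
        Files.delete(tmp);
    }
}
```

The same loop could track a per-line counter, as the character-by-character version does, if the length of each line is needed rather than only the number of lines.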

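【Solution 2】:

You should use FileReader, which is a convenience class for reading character files.

The FileInputStream Java docs clearly state:

FileInputStream is meant for reading streams of raw bytes such as image data. For reading streams of characters, consider using FileReader.

Try the following: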

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
       for (int pos = line.indexOf("\n"); pos != -1; pos = line.indexOf("\n", pos + 1)) {
        System.out.println("\\n at " + pos);
       }
    }
}

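【Discussion】:

It stops running because of a compile error. How can I ignore it??
It would be very surprising if this snippet printed anything.
@laune I have corrected a small mistake. Please let me know if you still think it does not work.
It throws a java.lang.OutOfMemoryError: Java heap space error.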

【Solution 3】:

A method that stores the contents of a file in a string:

static String readFile(String path, Charset encoding) throws IOException 
{
    byte[] encoded = Files.readAllBytes(Paths.get(path));
    return new String(encoded, encoding);
}

Here is a way to find every occurrence of a character in the whole string and record its index:

public static void main(String [] args) throws IOException
{
    List<Integer> indexes = new ArrayList<Integer>();
    String content = readFile("filetest", StandardCharsets.UTF_8);
    int index = content.indexOf('\n');
    while (index >= 0)
    {
        indexes.add(index);
        index = content.indexOf('\n', index + 1);
    }
}

Found here and here.

【Discussion】:

This method also throws an out-of-memory error. As I mentioned, the file is large: 538MB.