Grab a String from a file at a line? Answers

【Question title】: Grab a String from a file at a line?
【Posted】: 2016-04-14 05:21:47
【Question】:

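Currently, to find the line I want in a file, I read the file line by line until the current line matches the string I'm looking for.

This seems like bad coding practice, since my file has 1000+ lines; is there a way to tell a Scanner or BufferedReader (or something else?) to create a String from the characters at a given line?

Edit: As ajb pointed out, this seems to be physically impossible.

I guess the best solution is to read the whole file into a String[] of lines.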
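
A minimal sketch of that idea, assuming the file fits comfortably in memory and is UTF-8 encoded. The class and method names (`LineLookupSketch`, `readLines`, `lineAt`) are illustrative only and do not come from the original post:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LineLookupSketch {

    // Reads the whole file once and returns its lines as an array.
    // Assumption: the file is UTF-8 encoded and small enough to hold in memory.
    static String[] readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).toArray(new String[0]);
    }

    // Returns the line with the given 1-based line number.
    static String lineAt(String[] lines, int lineNumber) {
        if (lineNumber < 1 || lineNumber > lines.length) {
            throw new IndexOutOfBoundsException("No line " + lineNumber);
        }
        return lines[lineNumber - 1];
    }
}
```

The file is still scanned from start to end once, but every later lookup by line number is a plain array access.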

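【Question comments】:

I really don't see any way around scanning the whole file to find what you want.
Benchmark this in an editor such as Notepad++ and you'll see a delay for files with more than 100K lines.
Unless you're on a VMS system (if there still are any), a text file is stored as a sequence of characters with \n or \r\n between lines. There is no "index" or anything else that tells the system where each line begins, nor any other metadata that would help speed things up. It's a bit like me handing you a book and saying "find the 1000th 'e' in the book". There's no way to do it other than counting from the beginning.
If you create the text file yourself, you can add an index. If you're writing your own word processor that is optimized to handle very large files, maybe you could consider something like that. But you can't do this with an arbitrary text file obtained from anywhere. In effect, what you'd be creating isn't really a pure "text" file. It has a custom format, just as .docx or .odt files have their own formats.
If you have very large files, you can split one large file into several smaller files/chunks and use multiple threads for the search/match.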
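
The index idea from the comments above could be sketched roughly as follows: scan the file once, remember the byte offset at which each line starts, and later seek straight to a line. This is only an illustration under stated assumptions. `LineIndexSketch`, `buildIndex` and `readLine` are hypothetical names, the index is kept in memory rather than stored with the file, the file must not change after indexing, and `RandomAccessFile.readLine` decodes each byte as one character, so it is only reliable for ASCII/Latin-1 text:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class LineIndexSketch {

    // Scans the file once and records the byte offset where each line starts.
    static List<Long> buildIndex(String path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            long position = 0;
            while (position < length) {
                offsets.add(position);
                raf.readLine();                  // advance past the current line
                position = raf.getFilePointer(); // start of the next line
            }
        }
        return offsets;
    }

    // Jumps directly to the given 1-based line using the prebuilt index.
    static String readLine(String path, List<Long> offsets, int lineNumber) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offsets.get(lineNumber - 1));
            return raf.readLine();
        }
    }
}
```

Building the index still costs one full pass over the file, which is the point made in the comments; the benefit only appears when many lookups are made against the same unchanged file.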

Tags: java string file-io java.util.scanner bufferedreader

【Solution 1】:

Yes, you can set the offset at which a file is read or written by using the RandomAccessFile API. Sample code is included below.

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessFileDemo {

   public static void main(String[] args) {
      // open test.txt read-only; try-with-resources closes the file when done
      try (RandomAccessFile raf = new RandomAccessFile("F:/test.txt", "r")) {

         System.out.println("Output without setting offset, i.e. from start of file");
         // print every line, starting at the beginning of the file
         String temp;
         while ((temp = raf.readLine()) != null) {
            System.out.println(temp);
         }

         System.out.println();
         // move the file pointer to byte offset 20
         raf.seek(20);
         System.out.println("Output using seek and setting offset to 20");
         // print every line from that offset onward
         while ((temp = raf.readLine()) != null) {
            System.out.println(temp);
         }

      } catch (IOException ex) {
         ex.printStackTrace();
      }
   }
}

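Here is the sample test.txt that I placed on the F: drive (presumably saved with \r\n line endings, since the 18-character first line plus \r\n puts the start of the second line at byte offset 20)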

This is an example
Hello World
Trying RandomAccessFile

Here is the program's output

Output without setting offset, i.e. from start of file
This is an example
Hello World
Trying RandomAccessFile

Output using seek and setting offset to 20
Hello World
Trying RandomAccessFile

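【Comments】:

How does this help find the nth line?
@ajb It doesn't. The requirement in the question is to start reading after certain characters.
No, you misunderstood the question.
I guess so. If that were the requirement, the code above would work. He mentioned "create a String from the characters AT", which I misread as reading from/after certain characters.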

【Solution 2】:

Try using multithreading, since the file has a large number of lines.

import java.io.RandomAccessFile;
import java.util.concurrent.TimeUnit;

public class TesterClass {

    public TesterClass() {
        sequentialRead(1);
        multiThreadRead(1);
    }

    // Starts num threads; note that each thread reads the whole file by itself.
    private void multiThreadRead(int num) {
        for (int i = 1; i <= num; i++) {
            new Thread(readIndivColumn(i), "" + i).start();
        }
    }

    private Runnable readIndivColumn(final int colNum) {
        return new Runnable() {
            @Override
            public void run() {
                long startTime = System.currentTimeMillis();
                System.out.println("From Thread no:" + colNum + " Start time:" + startTime);

                // try-with-resources closes the file even if reading fails
                try (RandomAccessFile raf = new RandomAccessFile("./src/test/test1.csv", "r")) {
                    String line;
                    while ((line = raf.readLine()) != null) {
                        //System.out.println(line);
                        //System.out.println(StatUtils.getCellValue(line, colNum));
                    }

                    long elapsedTime = System.currentTimeMillis() - startTime;
                    System.out.println("From Thread no:" + colNum + " Finished Time:" + formatElapsed(elapsedTime));
                } catch (Exception e) {
                    System.out.println("From Thread no:" + colNum + "===>" + e.getMessage());
                    e.printStackTrace();
                }
            }
        };
    }

    // Reads the whole file num times, one pass after another, on the calling thread.
    private void sequentialRead(int num) {
        long startTime = System.currentTimeMillis();
        System.out.println("Start time:" + startTime);

        try {
            for (int i = 0; i < num; i++) {
                try (RandomAccessFile raf = new RandomAccessFile("./src/test/test1.csv", "r")) {
                    String line;
                    while ((line = raf.readLine()) != null) {
                        //System.out.println(line);
                    }
                }
            }

            long elapsedTime = System.currentTimeMillis() - startTime;
            System.out.println("Finished Time:" + formatElapsed(elapsedTime));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Formats a duration in milliseconds as "M min, S sec".
    private static String formatElapsed(long elapsedMillis) {
        long minutes = TimeUnit.MILLISECONDS.toMinutes(elapsedMillis);
        long seconds = TimeUnit.MILLISECONDS.toSeconds(elapsedMillis) - TimeUnit.MINUTES.toSeconds(minutes);
        return String.format("%d min, %d sec", minutes, seconds);
    }
}
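
Note that in the code above every thread reads the entire file, so adding threads does not divide the work. A rough sketch of a search that does split the lines among threads is shown below. It assumes the lines are already in memory, and the names `ChunkedSearchSketch` and `findLine` are hypothetical, not part of the answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSearchSketch {

    // Returns the 1-based number of the first line equal to target, or -1 if none matches.
    // threads must be at least 1.
    static int findLine(List<String> lines, String target, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Split the lines into roughly equal, contiguous chunks.
            int chunkSize = Math.max(1, (lines.size() + threads - 1) / threads);
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                final int from = start;
                final int to = Math.min(lines.size(), start + chunkSize);
                Callable<Integer> task = () -> {
                    for (int i = from; i < to; i++) {
                        if (lines.get(i).equals(target)) {
                            return i + 1;
                        }
                    }
                    return -1;
                };
                results.add(pool.submit(task));
            }
            // Results are inspected in chunk order, so the earliest match is returned.
            for (Future<Integer> result : results) {
                int lineNumber = result.get();
                if (lineNumber != -1) {
                    return lineNumber;
                }
            }
            return -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

For a file of a thousand or so lines the thread overhead will likely outweigh any gain; a split like this is only worth measuring on much larger inputs.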

【Comments】:

【Solution 3】:

Java NIO has a lot of new and simple ways to do what you want:

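// Files.readAllLines throws a checked IOException, so the method must declare it
public List<String> getLinesInFile(File f) throws IOException {
    return Files.readAllLines(f.toPath());
}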

Or you can read it into one big String and search that with the contains method:

    /**
     * Uses static methods in the Files class of NIO
     * Reads everything in a file, and puts it in a String
     * (the bytes are decoded with the platform's default charset)
     * @param file the file to read
     * @return a String representing the contents of the file
     * @throws IOException "if an I/O error occurs reading from the stream" (Files.readAllBytes javadoc)
     */
    public String readFileContents(File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()));
    }

    /**
     * Checks if a string contains another string, ignoring case
     * @param word the string to look for
     * @param contents the string to look for the other string in
     * @return If it does contain the word, returns true. Otherwise returns false. Ignoring case.
     */
    private boolean containsIgnoreCase(String word, String contents) {
        String w = word.toLowerCase();
        String c = contents.toLowerCase();
        return c.contains(w);
    }
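
If only the first matching line is needed, a lazier variant avoids holding the whole file in memory. The sketch below uses `Files.lines`, which reads lines on demand, and stops at the first match. The names `FirstMatchSketch` and `firstLineContaining` are made up for illustration, and UTF-8 encoding is an assumption:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

class FirstMatchSketch {

    // Returns the first line that contains word (ignoring case), if there is one.
    // Assumption: the file is UTF-8 encoded.
    static Optional<String> firstLineContaining(Path file, String word) throws IOException {
        String needle = word.toLowerCase();
        // The stream holds the file open, so it is closed with try-with-resources.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                    .filter(line -> line.toLowerCase().contains(needle))
                    .findFirst();
        }
    }
}
```

This is still a sequential scan from the top of the file; it simply stops reading once a match is found.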

【Comments】: