【Question title】: Merge CSV files into a single file with no repeated headers
【Posted】: 2013-08-02 15:10:47
【Question】:

I have some CSV files with the same column headers. For example:

File A

header1,header2,header3
one,two,three
four,five,six

File B

header1,header2,header3
seven,eight,nine
ten,eleven,twelve

I want to merge them so that all the data ends up in a single file, with the header at the top and nowhere else.

header1,header2,header3
one,two,three
four,five,six
seven,eight,nine
ten,eleven,twelve

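What is a good way to do this?

【Question comments】:

  • I assume you know how to read a file line by line. When writing the lines back out, skip the first line of every file after the first.
  • Either create a text file that contains only the header and append every CSV to it while skipping each one's first line, or read all the files and keep the first line only for the first file. The first is a bit easier; the second is still fairly easy and more portable if you have several sets of files with different headers.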
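
The approach from the comments can be sketched roughly as follows. This is only an illustration, not code from any of the answers: the class name `CsvAppendSketch` and the method `merge` are made up for the example, UTF-8 is assumed, and every record is assumed to occupy exactly one line (no quoted fields containing line breaks).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, named for this sketch only.
final class CsvAppendSketch {

    /** Writes the header once, then the data lines of every input in order. */
    static void merge(List<Path> inputs, Path output) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            boolean headerWritten = false;
            for (Path input : inputs) {
                try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                    String line = reader.readLine(); // first line = header of this file
                    if (line == null) {
                        continue; // empty file: nothing to copy
                    }
                    if (!headerWritten) {
                        // Keep the header of the first non-empty file only.
                        writer.write(line);
                        writer.newLine();
                        headerWritten = true;
                    }
                    // Copy the remaining lines unchanged.
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

The files are streamed line by line, so memory use does not grow with file size. The sketch does not check that the headers actually match.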

Tags: java csv


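【Solution 1】:

This should work. It checks that the files being merged have matching headers and throws an exception otherwise. Exception handling (closing streams, etc.) is left as an exercise.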

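import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Scanner;

// Read the header of the first file; all other files are appended to this file.
String[] headers = null;
String firstFile = "/path/to/firstFile.dat";
Scanner scanner = new Scanner(new File(firstFile));

if (scanner.hasNextLine())
    headers = scanner.nextLine().split(",");

scanner.close();

// listOfFilesToBeMerged must not contain firstFile itself,
// because firstFile is opened in append mode below.
Iterator<File> iterFiles = listOfFilesToBeMerged.iterator();
BufferedWriter writer = new BufferedWriter(new FileWriter(firstFile, true));

while (iterFiles.hasNext()) {
  File nextFile = iterFiles.next();
  BufferedReader reader = new BufferedReader(new FileReader(nextFile));

  // The first line of each file is its header; compare it, but do not copy it.
  String line = null;
  String[] firstLine = null;
  if ((line = reader.readLine()) != null)
    firstLine = line.split(",");

  // FileMergeException is a custom exception type that you have to define yourself.
  if (!Arrays.equals(headers, firstLine))
    throw new FileMergeException("Header mis-match between CSV files: '" +
              firstFile + "' and '" + nextFile.getAbsolutePath() + "'");

  // Append the remaining (data) lines.
  while ((line = reader.readLine()) != null) {
    writer.write(line);
    writer.newLine();
  }

  reader.close();
}
writer.close();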
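
For the part left as an exercise, one possible shape is try-with-resources, which closes the reader and writer even when the header check fails. The sketch below is an assumption-laden variant, not the answer's own code: it writes to a separate output file instead of appending to the first input, compares header lines as plain strings, and throws a plain `IOException` because `FileMergeException` is not defined in the answer. The name `CheckedCsvMerge` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, named for this sketch only.
final class CheckedCsvMerge {

    /** Merges the inputs into output and fails if any header differs from the first one. */
    static void merge(List<Path> inputs, Path output) throws IOException {
        String expectedHeader = null;
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : inputs) {
                try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                    String header = reader.readLine();
                    if (header == null) {
                        throw new IOException("No header line in '" + input + "'");
                    }
                    if (expectedHeader == null) {
                        // First file: remember its header and write it once.
                        expectedHeader = header;
                        writer.write(header);
                        writer.newLine();
                    } else if (!expectedHeader.equals(header)) {
                        throw new IOException("Header mismatch between '"
                                + inputs.get(0) + "' and '" + input + "'");
                    }
                    // Copy the data lines unchanged.
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

Note that if a mismatch is detected part-way through, the output file is left partially written; writing to a temporary file and renaming it on success would avoid that.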

【Comments】:

    【Solution 2】:

    Late to the party, but Fuzzy-Csv (https://github.com/kayr/fuzzy-csv/) is designed for exactly this.

    This is what the code looks like:

            String csv1 = "NAME,SURNAME,AGE\n" +
                    "Fred,Krueger,Unknown";
    
            String csv2 = "NAME,MIDDLENAME,SURNAME,AGE\n" +
                    "Jason,Noname,Scarry,16";
    
            FuzzyCSVTable t1 = FuzzyCSVTable.parseCsv(csv1);
            FuzzyCSVTable t2 = FuzzyCSVTable.parseCsv(csv2);
    
            FuzzyCSVTable output = t1.mergeByColumn(t2);
    
            output.printTable();
    

    Output:

    ╔═══════╤═════════╤═════════╤════════════╗
    ║ NAME  │ SURNAME │ AGE     │ MIDDLENAME ║
    ╠═══════╪═════════╪═════════╪════════════╣
    ║ Fred  │ Krueger │ Unknown │ -          ║
    ╟───────┼─────────┼─────────┼────────────╢
    ║ Jason │ Scarry  │ 16      │ Noname     ║
    ╚═══════╧═════════╧═════════╧════════════╝
    

    You can re-export the CSV with one of the helper methods:

    output.write("FilePath.csv");
    
    or 
    
    output.toCsvString()
    
    

    【Comments】:

      【Solution 3】:

      Doing this in Java seems a bit heavyweight. It is trivial in a Linux shell:

      (cat FileA ; tail --lines=+2 FileB) > FileC
      

      【Comments】:

        【Solution 4】:

        Here is an example:

        public static void main(String[] args) throws IOException {
            List<Path> paths = Arrays.asList(Paths.get("c:/temp/file1.csv"), Paths.get("c:/temp/file2.csv"));
            List<String> mergedLines = getMergedLines(paths);
            Path target = Paths.get("c:/temp/merged.csv");
            Files.write(target, mergedLines, Charset.forName("UTF-8"));
        }
        
        private static List<String> getMergedLines(List<Path> paths) throws IOException {
            List<String> mergedLines = new ArrayList<> ();
            for (Path p : paths){
                List<String> lines = Files.readAllLines(p, Charset.forName("UTF-8"));
                if (!lines.isEmpty()) {
                    if (mergedLines.isEmpty()) {
                        mergedLines.add(lines.get(0)); //add header only once
                    }
                    mergedLines.addAll(lines.subList(1, lines.size()));
                }
            }
            return mergedLines;
        }
        

        【Comments】:

        【Solution 5】:

        Before:

        idFile#x_y.csv

        After:

        idFile.csv

        For example:

        100#1_2.csv + 100#2_2.csv > 100.csv

        100#1_2.csv contains:

        "one","two","three"
        "a","b","c"
        "d","e","f"
        

        100#2_2.csv contains:

        "one","two","three"
        "g","h","i"
        "j","k","l"
        

        100.csv contains:

        "one","two","three"
        "a","b","c"
        "d","e","f"
        "g","h","i"
        "j","k","l"
        

        Source:

        //MergeDemo.java
        import java.io.BufferedReader;
        import java.io.BufferedWriter;
        import java.io.File;
        import java.io.FileReader;
        import java.io.FileWriter;
        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;

        public class MergeDemo {

            public static void main(String[] args) {

                String idFile = "100";
                int numFiles = 2; // matches the example above: 100#1_2.csv and 100#2_2.csv

                try {
                    mergeCsvFiles(idFile, numFiles);
                } catch (IOException e) {
                    e.printStackTrace();
                }

            }

            private static void mergeCsvFiles(String idFile, int numFiles) throws IOException {

                String csvFinal = "C:\\out\\" + idFile + ".csv";

                // Files: Input (idFile#x_y.csv with x = 1..numFiles and y = numFiles)
                List<File> files = new ArrayList<File>();
                for (int i = 1; i <= numFiles; i++) {
                    String csvFile = "C:\\in\\" + idFile + "#" + i + "_" + numFiles + ".csv";
                    files.add(new File(csvFile));
                }

                // Files: Output. FileWriter without the append flag replaces an existing file,
                // and try-with-resources closes the streams even if an exception is thrown.
                boolean headerWritten = false;
                try (BufferedWriter fileWriter = new BufferedWriter(new FileWriter(csvFinal))) {

                    for (File nextFile : files) {
                        try (BufferedReader fileReader = new BufferedReader(new FileReader(nextFile))) {

                            // The first line of every input file is its header
                            String line = fileReader.readLine();
                            if (line == null) {
                                continue; // empty file
                            }

                            // Headers: written once, taken from the first non-empty file
                            if (!headerWritten) {
                                fileWriter.write(line);
                                fileWriter.newLine();
                                headerWritten = true;
                            }

                            // Data lines are copied unchanged
                            while ((line = fileReader.readLine()) != null) {
                                fileWriter.write(line);
                                fileWriter.newLine();
                            }
                        }
                    }
                }

            }

        }
        

        【Comments】:
