【Question title】: How can Univocity Parsers read a .csv file when the headers are not on the first line?
【Posted】: 2016-02-06 03:44:30
【Question】:

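How can Univocity Parsers read a .csv file when the headers are not on the first line?

If the first line of the .csv file is not the header row, the parser throws an error.

The code and stack trace are below.

Any help would be greatly appreciated.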

import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.csv.*;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.lang.IllegalStateException;
import java.lang.String;
import java.util.List;


public class UnivocityParsers {

public Reader getReader(String relativePath) {
    try {
        return new InputStreamReader(this.getClass().getResourceAsStream(relativePath), "Windows-1252");
    } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException("Unable to read input", e);
    }
}


public void columnSelection() {
    RowListProcessor rowProcessor = new RowListProcessor();
    CsvParserSettings parserSettings = new CsvParserSettings();

    parserSettings.setRowProcessor(rowProcessor);
    parserSettings.setHeaderExtractionEnabled(true);
    parserSettings.setLineSeparatorDetectionEnabled(true);
    parserSettings.setSkipEmptyLines(true);

    // Here we select only the columns "AUTHOR" and "ISBN".
    // The parser just skips the other fields
    parserSettings.selectFields("AUTHOR", "ISBN");

    CsvParser parser = new CsvParser(parserSettings);
    parser.parse(getReader("list2.csv"));

    List<String[]> rows = rowProcessor.getRows();

    String[] strings = rows.get(0);

    System.out.print(strings[0]);

}


public static void main(String arg[]) {

    UnivocityParsers univocityParsers = new UnivocityParsers();

    univocityParsers.columnSelection();


}


}

Stack trace:

    Exception in thread "main" com.univocity.parsers.common.TextParsingException: Error processing input: java.lang.IllegalStateException - Unknown field names: [author, isbn]. Available fields are: [list of books by author - created today]

This is the file being parsed:

List of books by Author - Created today
"REVIEW_DATE","AUTHOR","ISBN","DISCOUNTED_PRICE"
"1985/01/21","Douglas Adams",0345391802,5.95
"1990/01/12","Douglas Hofstadter",0465026567,9.95
"1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99
"1999/12/03","Richard Friedman",0060630353,5.95
"2001/09/19","Karen Armstrong",0345384563,9.95
"2002/06/23","David Jones",0198504691,9.95
"2002/06/23","Julian Jaynes",0618057072,12.50
"2003/09/30","Scott Adams",0740721909,4.95
"2004/10/04","Benjamin Radcliff",0804818088,4.95
"2004/10/04","Randel Helms",0879755725,4.50

【Comments】:

    Tags: parsing csv


    【Solution 1】:

    As of today, you can do this on 2.0.0-SNAPSHOT:

    settings.setNumberOfRowsToSkip(1);
    

    On version 1.5.6, you can do this to skip the first line and pick up the headers correctly:

    RowListProcessor rowProcessor = new RowListProcessor(){
            @Override
            public void processStarted(ParsingContext context) {
                super.processStarted(context);
                context.skipLines(1);
            }
        };
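    If neither library option can be used (the discussion below reports trouble with both), one library-independent workaround is to consume the title line from the Reader before handing it to the parser. The following is only a sketch of that idea and is not part of the original answer; the class name SkipLeadingLines and the method skip are hypothetical names chosen for illustration, and it uses nothing but the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SkipLeadingLines {

    /**
     * Wraps the given reader and consumes its first `count` lines,
     * so that whatever reads from the result sees the header row first.
     * (Hypothetical helper, not part of univocity-parsers.)
     */
    public static BufferedReader skip(Reader source, int count) {
        BufferedReader buffered = new BufferedReader(source);
        try {
            for (int i = 0; i < count; i++) {
                if (buffered.readLine() == null) {
                    break; // input ended before `count` lines were read
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffered;
    }

    public static void main(String[] args) throws IOException {
        String sample = "List of books by Author - Created today\n"
                + "\"REVIEW_DATE\",\"AUTHOR\",\"ISBN\",\"DISCOUNTED_PRICE\"\n"
                + "\"1985/01/21\",\"Douglas Adams\",0345391802,5.95\n";

        BufferedReader reader = skip(new StringReader(sample), 1);
        // The next line available to a consumer is now the header row.
        System.out.println(reader.readLine());
    }
}
```

    In the question's code, the returned reader could be passed to parser.parse(...) in place of getReader("list2.csv"), for example parser.parse(SkipLeadingLines.skip(getReader("list2.csv"), 1)), leaving setHeaderExtractionEnabled(true) as it is.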
    

    Another option, if you control how the input file is generated, is to comment out the first line by adding # at the start of the line to be discarded:

    #List of books by Author - Created today
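    If the file comes from your own export code, the title can be emitted as a comment line at write time. The sketch below is illustrative only: the class name CommentedTitleWriter and the method withCommentedTitle are hypothetical, and it relies on the parser treating lines that start with # as comments, as the answer above implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommentedTitleWriter {

    /**
     * Returns the lines of the export with the free-text title
     * placed first and prefixed with '#', followed by the CSV lines unchanged.
     * (Hypothetical helper for illustration.)
     */
    public static List<String> withCommentedTitle(String title, List<String> csvLines) {
        List<String> out = new ArrayList<>();
        out.add("#" + title);
        out.addAll(csvLines);
        return out;
    }

    public static void main(String[] args) {
        List<String> csvLines = Arrays.asList(
                "\"REVIEW_DATE\",\"AUTHOR\",\"ISBN\",\"DISCOUNTED_PRICE\"",
                "\"1985/01/21\",\"Douglas Adams\",0345391802,5.95");

        // Print the file content as it would be written to disk.
        for (String line : withCommentedTitle("List of books by Author - Created today", csvLines)) {
            System.out.println(line);
        }
    }
}
```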
    

    【Discussion】:

    • When I tried the solution code above with univocity-parsers-1.5.6.jar, I got the error: java.lang.IllegalStateException: Unknown field names: [author, isbn, review_date]. Available fields are: [1985/01/21, douglas adams, 0345391802, 5.95]. When I tried the 2.0.0-SNAPSHOT solution by adding the pom.xml from that snapshot to my IntelliJ project, I got: Error:(53, 23) java: cannot find symbol symbol: method setNumberOfRowsToSkip(int) location: variable parserSettings of type com.univocity.parsers.csv.CsvParserSettings. Also, IntelliJ could not find the pom.xml plugin: <artifactId>maven-gpg-plugin</artifactId>.
    • The 1.5.6 solution works (i.e. overriding the processStarted method); I tested it myself using the code you posted. Evidently you applied other changes as well, because the parser picked up the third line as the headers instead of the second. For the 2.0.0-SNAPSHOT version, you should not copy the pom file over; that will never work. Either update your own pom.xml or get the jar from oss.sonatype.org/content/repositories/snapshots/com/univocity/…
    • I put univocity-parsers-2.0.0-20151111.095007-18.jar into the project libraries, but when I call the new method with parserSettings.setNumberOfRowsToSkip(1); I get the error: Cannot resolve method 'setNumberOfRowsToSkip(int)'
    • Sorry, I sent the wrong link. This is the one you need: oss.sonatype.org/content/repositories/snapshots/com/univocity/…
    • Hi, this link: oss.sonatype.org/content/repositories/snapshots/com/univocity/… actually leads to a 404. Is there a working link?