Parse CSV to multiple/nested bean types with OpenCSV?

【Question title】: Parse CSV to multiple/nested bean types with OpenCSV?
【Posted】: 2013-04-13 05:46:57
【Question】:

I have various CSVs that contain some standard columns plus some completely arbitrary fields:

firstname, lastname, dog_name, fav_hat, fav_color
bill,smith,fido,porkpie,blue
james,smith,rover,bowler,purple


firstname, lastname, car_type, floor_number
tom, collins, ford, 14
jim, jones, toyota, 120

So I'm trying to parse them into Person.class beans, which hold the first name and last name, and I have a second class called PersonAttribute.class to hold... whatever else there is.

A basic outline of the two classes:

class Person {
 public String firstname;
 public String lastname;
 public List<PersonAttribute> attribs;
}

class PersonAttribute {
 public Person p;
 public String key; // header name, ex. 'car_type'
 public String value; // column value, ex. 'ford'
}

I've been using CsvToBean from opencsv:

public static List<Person> parseToBeans(File csvFile, HashMap<String, String> mapStrategy, Class beanClass) throws IOException {
    CSVReader reader = null;
    try {
        reader = new CSVReader(new BufferedReader(new FileReader(csvFile)));

        HeaderColumnNameTranslateMappingStrategy<Person> strategy = new HeaderColumnNameTranslateMappingStrategy<>();
        strategy.setType(beanClass);
        strategy.setColumnMapping(mapStrategy);

        final CsvToBean<Person> csv = new CsvToBean<Person>() {
            @Override
            protected Object convertValue(String value, PropertyDescriptor prop) throws InstantiationException, IllegalAccessException {
                value = value.trim().replaceAll(" +", " ");
                return super.convertValue(value, prop);
            }
        };
        return csv.parse(strategy, reader);
    }
...

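However, while parsing the CSV into Person.class beans, I'm not sure how to handle creating the PersonAttribute.class beans. I came across this post, and I'm wondering whether I need to switch to supercsv to easily handle what I'm trying to do?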

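【Comments】:

The question/answer is a bit misleading: you asked about opencsv, added supercsv as a tag, and received a supercsv answer. Did you ever solve this with opencsv?
I ended up switching to SuperCSV, mainly because it is actively developed and because Hound Dog, one of its active developers, is on here. I'm glad I did; we're very happy with SuperCSV.

Tags: java opencsv supercsv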

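【Solution 1】:

You can certainly achieve this with Super CSV.

You can use either:

CsvBeanReader - doesn't support indexed mapping, so you'll need to create a helper method in your bean in order to use it
CsvDozerBeanReader - supports indexed mapping out of the box, so it will do exactly what you want (requires the recently released Super CSV 2.1.0)

Using CsvBeanReader

If you don't want to use Dozer and are able to modify your bean class, the easiest option is to add a dummy setter on your bean, which CsvBeanReader will use to populate the attributes. I'm assuming that your Person and PersonAttribute beans have a public no-arg constructor and getters/setters defined for each field (this is essential).

Add the following dummy setter to your Person bean: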

public void setAddAttribute(PersonAttribute attribute){
    if (attribs == null){
        attribs = new ArrayList<PersonAttribute>();
    }
    attribs.add(attribute);
}
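
To make the effect of this setter concrete, here is a minimal, library-free sketch of what happens when it is called once per attribute column. The nested classes are simplified stand-ins for your beans, and the `demo` helper is hypothetical; it only imitates the calls a bean reader would make for one row.

```java
import java.util.ArrayList;
import java.util.List;

public class VirtualSetterSketch {

    // Simplified stand-in for the PersonAttribute bean (no back-reference to Person).
    static class PersonAttribute {
        private String key;
        private String value;

        public String getKey() { return key; }
        public void setKey(String key) { this.key = key; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    // Simplified stand-in for the Person bean, showing only the attribute list.
    static class Person {
        private List<PersonAttribute> attribs;

        // The dummy setter: each call appends instead of overwriting.
        public void setAddAttribute(PersonAttribute attribute) {
            if (attribs == null) {
                attribs = new ArrayList<PersonAttribute>();
            }
            attribs.add(attribute);
        }

        public List<PersonAttribute> getAttribs() { return attribs; }
    }

    // Hypothetical helper: builds one attribute from a header name and a cell value.
    static PersonAttribute attribute(String key, String value) {
        PersonAttribute a = new PersonAttribute();
        a.setKey(key);
        a.setValue(value);
        return a;
    }

    // Imitates a reader invoking the same setter for two attribute columns of one row.
    static Person demo() {
        Person p = new Person();
        p.setAddAttribute(attribute("dog_name", "fido"));
        p.setAddAttribute(attribute("fav_hat", "porkpie"));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(demo().getAttribs().size());
    }
}
```

Because several columns map to the same setter name, the order of the list follows the order of the attribute columns in the row.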

Create a custom cell processor that populates a PersonAttribute with the appropriate key from the CSV header and the value from the CSV column.

package org.supercsv.example;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.util.CsvContext;

/**
 * Creates a PersonAttribute using the corresponding header as the key.
 */
public class ParsePersonAttribute extends CellProcessorAdaptor {

    private final String[] header;

    public ParsePersonAttribute(final String[] header) {
        this.header = header;
    }

    public Object execute(Object value, CsvContext context) {

        if( value == null ) {
            return null;
        }

        PersonAttribute attribute = new PersonAttribute();
        // columns start at 1
        attribute.setKey(header[context.getColumnNumber() - 1]);
        attribute.setValue((String) value);
        return attribute;
    }

}

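I think the following example mostly speaks for itself, but here are a few things I should point out:

I had to use custom preferences, because your CSV has spaces that aren't part of the data
I had to assemble the field mapping and cell processor arrays dynamically, as your data has an unknown number of attributes (this setup is not usually so complicated)
all of the field mappings for the attributes use addAttribute, which corresponds to the setAddAttribute() method we added to your bean
I've used our custom cell processor to create a PersonAttribute bean for each attribute column

Here's the code: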

package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvBeanReader {

    private static final String CSV = 
            "firstname, lastname, dog_name, fav_hat, fav_color\n"
            + "bill,smith,fido,porkpie,blue\n"
            + "james,smith,rover,bowler,purple";

    private static final String CSV2 = 
            "firstname, lastname, car_type, floor_number\n"
            + "tom, collins, ford, 14\n" + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains
    // spaces that aren't part of the data
    private static final CsvPreference PREFS = 
        new CsvPreference.Builder(
            CsvPreference.STANDARD_PREFERENCE)
            .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvBeanReader with first CSV input:");
        readWithCsvBeanReader(new StringReader(CSV));
        System.out.println("CsvBeanReader with second CSV input:");
        readWithCsvBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvBeanReader(final Reader reader)
            throws IOException {
        ICsvBeanReader beanReader = null;
        try {
            beanReader = new CsvBeanReader(reader, PREFS);

            final String[] header = beanReader.getHeader(true);

            // set up the field mapping and processors dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = 
                    new CellProcessor[header.length];
            for (int i = 0; i < header.length; i++) {
                if (i < ATT_START_INDEX) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = "addAttribute";
                    processors[i] = 
                            new Optional(new ParsePersonAttribute(header));
                }
            }

            Person person;
            while ((person = beanReader.read(Person.class, fieldMapping,
                    processors)) != null) {
                System.out.println(String.format(
                        "lineNo=%s, rowNo=%s, person=%s",
                        beanReader.getLineNumber(), beanReader.getRowNumber(),
                        person));
            }

        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }

}

Output (I added toString() methods to your beans):

CsvBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]

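Using CsvDozerBeanReader

If you can't, or don't want to, modify your bean, then I'd recommend using CsvDozerBeanReader from the Super CSV Dozer Extension project, as it supports nested and indexed field mappings. Check out some examples of it in use here.

Below is an example using CsvDozerBeanReader. You'll notice it's virtually identical to the CsvBeanReader example, but:

it uses a different reader (duh!)
it uses indexed mapping, e.g. attribs[0]
it sets up the mapping by calling configureBeanMapping() (instead of accepting a String array on the read() method like CsvBeanReader)
it also sets up some hints (more on this below)

Code: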

package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.dozer.CsvDozerBeanReader;
import org.supercsv.io.dozer.ICsvDozerBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvDozerBeanReader {

    private static final String CSV = 
            "firstname, lastname, dog_name, fav_hat, fav_color\n"
            + "bill,smith,fido,porkpie,blue\n" 
            + "james,smith,rover,bowler,purple";

    private static final String CSV2 = 
            "firstname, lastname, car_type, floor_number\n" 
            + "tom, collins, ford, 14\n"
            + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains spaces that aren't part of the data
    private static final CsvPreference PREFS = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
        .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvDozerBeanReader with first CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV));
        System.out.println("CsvDozerBeanReader with second CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvDozerBeanReader(final Reader reader) throws IOException {
        ICsvDozerBeanReader beanReader = null;
        try {
            beanReader = new CsvDozerBeanReader(reader, PREFS);

            final String[] header = beanReader.getHeader(true);

            // set up the field mapping, processors and hints dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = new CellProcessor[header.length];
            final Class<?>[] hintTypes = new Class<?>[header.length];
            for( int i = 0; i < header.length; i++ ) {
                if( i < ATT_START_INDEX ) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = String.format("attribs[%d]", i - ATT_START_INDEX);
                    processors[i] = new Optional(new ParsePersonAttribute(header));
                    hintTypes[i] = PersonAttribute.class;
                }
            }

            beanReader.configureBeanMapping(Person.class, fieldMapping, hintTypes);

            Person person;
            while( (person = beanReader.read(Person.class, processors)) != null ) {
                System.out.println(String.format("lineNo=%s, rowNo=%s, person=%s", 
                    beanReader.getLineNumber(),
                    beanReader.getRowNumber(), person));
            }

        }
        finally {
            if( beanReader != null ) {
                beanReader.close();
            }
        }
    }

}

Output:

CsvDozerBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvDozerBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]

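While putting this example together, I discovered a bug with CsvDozerBeanReader in Super CSV 2.0.1 when you combine a cell processor (such as the one I created in the example above to parse each person attribute key/value) with indexed mapping such as: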

"firstname","lastname","attribs[0]","attribs[1]"

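I have just released Super CSV 2.1.0, which fixes this. It turns out that Dozer needs a hint to be configured for the indexed mapping to work properly. I'm not 100% sure why, as it's perfectly capable of creating each PersonAttribute and adding it at the correct index when you get rid of the custom cell processor and use the following (deep) mapping: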

"firstname","lastname","attribs[0].value","attribs[1].value"

I hope this helps :)
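
For a header of arbitrary length, the two mapping styles shown above (indexed and deep) can be generated in a loop. The following is a small sketch in plain Java; the class and method names are hypothetical, and only the mapping strings themselves come from the examples above.

```java
import java.util.Arrays;

public class MappingStyles {

    // Builds a field mapping for a header whose attribute columns start at attStart.
    // deep == false yields "attribs[0]", "attribs[1]", ...
    // deep == true  yields "attribs[0].value", "attribs[1].value", ...
    static String[] buildMapping(String[] header, int attStart, boolean deep) {
        String[] mapping = new String[header.length];
        for (int i = 0; i < header.length; i++) {
            if (i < attStart) {
                // standard columns map straight to the bean field of the same name
                mapping[i] = header[i];
            } else {
                String indexed = String.format("attribs[%d]", i - attStart);
                mapping[i] = deep ? indexed + ".value" : indexed;
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] header = { "firstname", "lastname", "car_type", "floor_number" };
        System.out.println(Arrays.toString(buildMapping(header, 2, false)));
        System.out.println(Arrays.toString(buildMapping(header, 2, true)));
    }
}
```

Note that the deep variant, as written, only targets the value property of each PersonAttribute; populating key in that variant is not covered by this sketch.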

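【Comments】:

Wow, thanks for the detailed response, much appreciated. How difficult would it be if the "standard columns" could appear anywhere in the CSV, and not always in the first two columns?
Well, it depends. Can you tell from the header? It looks like the "attribute columns" have underscores, but that may be a coincidence. It's only really possible if you either a) know the format beforehand, or b) can tell from the header.
Yes, I can always tell by the header name. The user can define a column named "banana" as "first_name" or "title" etc., so I always know when to map it to the bean (and if a column isn't needed, it automatically becomes an attribute). I'm also fine with editing my Person bean as you suggested. Would you still recommend waiting for the Dozer fix and going that route instead of plain CsvBeanReader?
As long as you have a way of distinguishing standard fields from attributes, you can simply replace the condition in my code example (currently if (i < ATT_START_INDEX)) with an appropriate one (e.g. if ("firstname".equals(header[i]) || "lastname".equals(header[i])), or something more dynamic). The CsvDozerBeanReader solution is neat, and probably the better solution if you can't modify the bean. Otherwise the CsvBeanReader solution will always be faster and doesn't require any extra dependencies. You can always try it out when it's released and decide then!
ok @xref, I've updated the answer with details on using CsvDozerBeanReader, and have released the fixed version of Super CSV. You can find out more on the project website.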
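
The header-based condition suggested in the comments above could be sketched roughly as follows. This is plain Java with no CSV library involved; the class name, the method names and the fixed set of standard field names are assumptions for illustration, and the header is assumed to be already trimmed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HeaderDrivenMapping {

    // Assumption: the standard bean fields are known up front by header name.
    private static final Set<String> STANDARD_FIELDS =
            new HashSet<String>(Arrays.asList("firstname", "lastname"));

    // Mapping for the CsvBeanReader approach: every non-standard column
    // is routed to the addAttribute dummy setter, wherever it appears.
    static String[] beanReaderMapping(String[] header) {
        String[] mapping = new String[header.length];
        for (int i = 0; i < header.length; i++) {
            mapping[i] = STANDARD_FIELDS.contains(header[i]) ? header[i] : "addAttribute";
        }
        return mapping;
    }

    // Mapping for the CsvDozerBeanReader approach: attribute columns may be
    // non-contiguous, so a running counter supplies the list index.
    static String[] dozerMapping(String[] header) {
        String[] mapping = new String[header.length];
        int nextAttribute = 0;
        for (int i = 0; i < header.length; i++) {
            if (STANDARD_FIELDS.contains(header[i])) {
                mapping[i] = header[i];
            } else {
                mapping[i] = String.format("attribs[%d]", nextAttribute++);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] header = { "dog_name", "lastname", "fav_hat", "firstname" };
        System.out.println(Arrays.toString(beanReaderMapping(header)));
        System.out.println(Arrays.toString(dozerMapping(header)));
    }
}
```

The same condition would also have to drive the choice of cell processor (and, for the Dozer variant, the hint type) for each column, just as the index check does in the examples in the answer.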