【Question title】: Apex: parse a CSV in which every field is wrapped in double quotes
【Posted】: 2017-07-28 02:22:09
【Question】:
public static List<List<String>> parseCSV(String contents,Boolean skipHeaders) {
List<List<String>> allFields = new List<List<String>>();

// replace instances where a double quote begins a field containing a comma
// in this case you get a double quote followed by a doubled double quote
// do this for beginning and end of a field
contents = contents.replaceAll(',"""',',"DBLQT').replaceall('""",','DBLQT",');
// now replace all remaining double quotes - we do this so that we can reconstruct
// fields with commas inside assuming they begin and end with a double quote
contents = contents.replaceAll('""','DBLQT');
// we are not attempting to handle fields with a newline inside of them
// so, split on newline to get the spreadsheet rows
List<String> lines = new List<String>();
try {
    lines = contents.split('\n');
} catch (System.ListException e) {
    System.debug('Limits exceeded?' + e.getMessage());
}
Integer num = 0;
for(String line : lines) {
    // check for blank CSV lines (only commas)
    if (line.replaceAll(',','').trim().length() == 0) break;

    List<String> fields = line.split(',');  
    List<String> cleanFields = new List<String>();
    String compositeField;
    Boolean makeCompositeField = false;
    for(String field : fields) {
        if (field.startsWith('"') && field.endsWith('"')) {
            cleanFields.add(field.replaceAll('DBLQT','"'));
        } else if (field.startsWith('"')) {
            makeCompositeField = true;
            compositeField = field;
        } else if (field.endsWith('"')) {
            compositeField += ',' + field;
            cleanFields.add(compositeField.replaceAll('DBLQT','"'));
            makeCompositeField = false;
        } else if (makeCompositeField) {
            compositeField +=  ',' + field;
        } else {
            cleanFields.add(field.replaceAll('DBLQT','"'));
        }
    }

    allFields.add(cleanFields);

}


if(skipHeaders)allFields.remove(0);

return allFields;       
}

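I use this code to parse CSV files, but I found that it does not work when every field in the CSV is wrapped in double quotes.

For example, I have a record like this: "a","b","c","d,e,f","g"

After parsing, I want to get these values: a b c d,e,f g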

【Comments】:

  • d,e,f should stay together as a single field

Tags: string csv salesforce apex double-quotes


【Solution 1】:

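As far as I can tell, the first thing you do with each line read from the CSV file is split it on commas, using this line:

List<String> fields = line.split(',');

When you do that to your own example ("a","b","c","d,e,f","g"), the list of strings you get is:

fields = ("a" | "b" | "c" | "d | e | f" | "g"), where the bars separate list elements

The problem is that once you have split on commas, it is hard to tell the commas that belong to a field (because they appear inside quotes) from the commas that separate fields in the CSV.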
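To make the effect concrete, here is a small Java sketch of the comma-first split applied to the example record. The class and method names (`CommaSplitDemo`, `splitOnCommas`) are made up for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

class CommaSplitDemo {
    // Splits a raw CSV line on every comma, paying no attention to quotes.
    static List<String> splitOnCommas(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"";
        List<String> pieces = splitOnCommas(line);
        // The quoted field "d,e,f" is broken into three pieces,
        // so the line yields 7 elements instead of the intended 5.
        System.out.println(pieces.size());
        for (String piece : pieces) {
            System.out.println(piece);
        }
    }
}
```

The fourth, fifth and sixth pieces come back as `"d`, `e` and `f"`, which is why the original code then has to stitch them together again.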

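I suggest splitting the line on quotes instead, which gives you something like this:

fields = (a | , | b | , | c | , | d,e,f | , | g)

Then filter out any element of the list that contains only commas and/or whitespace, which leaves:

fields = (a | b | c | d,e,f | g)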
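The split-then-filter idea can be sketched in Java as follows. The names `QuoteSplitDemo` and `splitOnQuotes` are hypothetical. The sketch assumes every field is quoted and non-blank: an empty field (`""`) or a field holding only commas or spaces would be filtered out along with the separators.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteSplitDemo {
    // Splits a fully quoted CSV line on double quotes and keeps only the field values.
    static List<String> splitOnQuotes(String line) {
        List<String> fields = new ArrayList<>();
        for (String piece : line.split("\"")) {
            // Skip separator pieces: anything made up only of commas and/or whitespace.
            // This also skips the empty piece that precedes the first quote.
            if (piece.replace(",", "").trim().isEmpty()) {
                continue;
            }
            fields.add(piece);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\"c\",\"d,e,f\",\"g\"";
        for (String field : splitOnQuotes(line)) {
            System.out.println(field);
        }
    }
}
```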


(Edited)

Are you using Java? Either way, here is some Java code that does what you are trying to do:

import java.lang.*;

import java.util.*;

public class HelloWorld
{
    public static ArrayList<ArrayList<String>> parseCSV(String contents,Boolean skipHeaders) {
    ArrayList<ArrayList<String>> allFields = new ArrayList<ArrayList<String>>();

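    // separating the file in lines
    // (copied into an ArrayList because Arrays.asList returns a fixed-size
    // list, on which remove() throws UnsupportedOperationException)
    List<String> lines = new ArrayList<String>(Arrays.asList(contents.split("\n")));

    // ignoring header, if needed
    if(skipHeaders) lines.remove(0);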

    // for each line
    for(String line : lines) {
        List<String> fields = Arrays.asList(line.split("\""));  
        ArrayList<String> cleanFields = new ArrayList<String>();
        Boolean isComma = false; 
        for(String field : fields) {
          // ignore elements that don't have useful data
          // (every other element after splitting by quotes)
          isComma = !isComma;
          if (isComma) continue;

          cleanFields.add(field);
        }

        allFields.add(cleanFields);
    }

    return allFields;       
  }

  public static void main(String[] args)
  {
    // example of input file:
    // Line 1: "a","b","c","d,e,f","g"
    // Line 2: "a1","b1","c1","d1,e1,f1","g1"
    ArrayList<ArrayList<String>> strings = HelloWorld.parseCSV("\"a\",\"b\",\"c\",\"d,e,f\",\"g\"\n\"a1\",\"b1\",\"c1\",\"d1,e1,f1\",\"g1\"",false);
    System.out.println("Result:");
    for (ArrayList<String> list : strings) {
      System.out.println("  New List:");
      for (String str : list) {
        System.out.println("    - " + str);
      }
    }
  }
}
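The code above relies on every field being quoted and on no field containing a quote of its own. If the data can also contain unquoted fields, empty fields, or doubled quotes (`""`) standing for a literal quote, which is the case the `DBLQT` placeholder in the question is meant to cover, one possible alternative is to scan the line character by character. The following is only a sketch under those assumptions; `CsvLineScanner` and `parseLine` are made-up names, it handles a single line (no newlines inside fields), and it is written in Java, so it would need adapting for Apex.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineScanner {
    // Parses one CSV line, tracking whether the scanner is inside a quoted field.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        for (String field : parseLine("\"a\",\"b\",\"c\",\"d,e,f\",\"g\"")) {
            System.out.println(field);
        }
    }
}
```

Because the surrounding quotes are consumed by the scanner rather than copied, the returned values come back without them, and empty fields are kept as empty strings instead of being dropped.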

【Comments】:

  • How do I remove the double quotes from the strings? The code does not seem to work.
  • field.replaceAll('DBLQT','"')