【Question Title】: How to read a header from a specific line with CsvHelper?
【Posted】: 2016-09-22 14:01:04
【Question】:

I'm trying to read a CSV file whose header is on line 3:

some crap line
some empty line
COL1,COL2,COl3,...
val1,val2,val3
val1,val2,val3

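How can I tell CsvHelper that the header is not on the first line?

I tried skipping 2 lines with Read(), but the subsequent call to ReadHeader() throws an exception saying the header has already been read.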

using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration)) {
   csv.Read();
   csv.Read();
   csv.ReadHeader();
   .....

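If I set csvConfiguration.HasHeaderRecord to false, ReadHeader() fails again.

【Comments】:

  • Most CSV readers have an option to skip the first lines so you can get past headers and the like. If CsvReader has no such option, just read a few lines first, as Evk shows.

Tags: c# .net csv csvhelper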


【Solution 1】:

Try this:

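using (var reader = new StreamReader(stream)) {
    reader.ReadLine(); // Skip line 1 ("some crap line").
    reader.ReadLine(); // Skip line 2 ("some empty line").
    using (var csv = new CsvReader(reader)) {
        csv.ReadHeader(); // The reader now starts at line 3, the header.
    }
}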
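
The same idea can be sketched against a newer CsvHelper API. This is a minimal sketch under stated assumptions, not the answer author's code: it assumes CsvHelper 27 or later, where the CsvReader constructor takes a CultureInfo and ReadHeader() must be preceded by a Read(). The class and method names (SkipPreambleLines, ReadHeaderAfter) are hypothetical. Note that ReadLine() skips physical lines, so a preamble containing a quoted field that spans several lines would need a different count.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;

public static class SkipPreambleLines
{
    // Hypothetical helper: discards `linesToSkip` physical lines from the
    // reader, then lets CsvHelper treat the next row as the header.
    public static string[] ReadHeaderAfter(TextReader reader, int linesToSkip)
    {
        for (var i = 0; i < linesToSkip; i++)
        {
            // ReadLine() returns null at end of input: no header to read.
            if (reader.ReadLine() == null) return Array.Empty<string>();
        }

        // Assumption: CsvHelper 27+, where a culture is required.
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            if (!csv.Read()) return Array.Empty<string>();
            csv.ReadHeader(); // Marks the row just read as the header.
            return csv.HeaderRecord;
        }
    }

    public static void Main()
    {
        var text = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3";
        var header = ReadHeaderAfter(new StringReader(text), 2);
        Console.WriteLine(string.Join(",", header));
    }
}
```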

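【Comments】:

    【Solution 2】:

    This is no better than Evk's answer, but I was curious.

    The CsvConfiguration class appears to have a Func callback named ShouldSkipRecord that you can hook into to implement custom logic.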

    https://github.com/JoshClose/CsvHelper/tree/master/src/CsvHelper

    CsvConfiguration.cs

    /// <summary>
    /// Gets or sets the callback that will be called to
    /// determine whether to skip the given record or not.
    /// This overrides the <see cref="SkipEmptyRecords"/> setting.
    /// </summary>
    public virtual Func<string[], bool> ShouldSkipRecord { get; set; }
    

    CsvReader.cs

    /// <summary>
    /// Advances the reader to the next record.
    /// If HasHeaderRecord is true (true by default), the first record of
    /// the CSV file will be automatically read in as the header record
    /// and the second record will be returned.
    /// </summary>
    /// <returns>True if there are more records, otherwise false.</returns>
    public virtual bool Read()
    {
        if (doneReading)
        {
            throw new CsvReaderException(DoneReadingExceptionMessage);
        }
    
        if (configuration.HasHeaderRecord && headerRecord == null)
        {
            ReadHeader();
        }
    
        do
        {
            currentRecord = parser.Read();
        }
        while (ShouldSkipRecord());
    
        currentIndex = -1;
        hasBeenRead = true;
    
        if (currentRecord == null)
        {
            doneReading = true;
        }
    
        return currentRecord != null;
    }
    
    /// <summary>
    /// Checks if the current record should be skipped or not.
    /// </summary>
    /// <returns><c>true</c> if the current record should be skipped, <c>false</c> otherwise.</returns>
    protected virtual bool ShouldSkipRecord()
    {
        if (currentRecord == null)
        {
            return false;
        }
    
        return configuration.ShouldSkipRecord != null
            ? configuration.ShouldSkipRecord(currentRecord)
            : configuration.SkipEmptyRecords && IsRecordEmpty(false);
    }
    

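    Unfortunately, it looks like you would have to set HasHeaderRecord to false and then set it back to true before calling ReadHeader() or calling Read() on the third line, because the ShouldSkipRecord logic in Read() runs after the ReadHeader() logic.

    【Comments】:

      【Solution 3】:

      As of CsvHelper 27.0, the problem no longer reproduces. A header can now be read from any line. According to the change log, this may have been implemented as early as Release 3.0.0 from 2017: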

      3.0.0

      Read more than 1 header row.

      So the following code now works, and has worked for some time:

      var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n\n";
      using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText));
      
      var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
      {
          // Your settings here.
      };
      using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
      {
          csv.Read(); // Read in the first row "some crap line"
          csv.Read(); // Read in the second row "some empty line"
          csv.Read(); // Read in the third row which is the actual header.
          csv.ReadHeader(); // Process the currently read row as the header.
      
          Assert.AreEqual(3, csv.HeaderRecord.Length);
          Assert.AreEqual(@"COL1,COL2,COl3", String.Join(",", csv.HeaderRecord));
      }

      Successful demo fiddle #1 here.

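      Caveat: CsvHelper skips blank lines by default, so if some of the preliminary lines you want to skip may or may not be blank, csv.Read() may silently read past them and then consume your header as well, causing the wrong line to be used as the header row!

      Failing demo fiddle #2 here.

      To avoid this possibility and deterministically skip a fixed number of lines at the start of the file, you must set CsvConfiguration.IgnoreBlankLines = false. However, this property cannot be modified once the CsvReader has been created, so if you need to skip blank data lines, you can do so with the ShouldSkipRecord callback: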

      bool ignoreBlankLines = false;
      var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
      {
          IgnoreBlankLines = false,
          ShouldSkipRecord = args => ignoreBlankLines && (args.Record.Length == 0 || (args.Record.Length == 1 && string.IsNullOrEmpty(args.Record[0]))),
          // Your settings here.
      };
      using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
      {
          csv.Read(); // Read in the first row "some crap line"
          csv.Read(); // Read in the second row, which may be blank.
          csv.Read(); // Read in the third row which is the actual header.
          csv.ReadHeader(); // Process the currently read row as the header.
          ignoreBlankLines = true; // Now that the header has been read, ignore blank data lines.
      }

      Successful demo fiddle #3 here.
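
To show how the data rows might be consumed once the header has been read this way, here is a minimal sketch. It assumes CsvHelper 27 or later, and it assumes the data section contains no blank rows, so it sets IgnoreBlankLines = false without adding a ShouldSkipRecord callback. The class and method names (HeaderOnLaterLine, ReadColumn) are hypothetical and not part of CsvHelper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public static class HeaderOnLaterLine
{
    // Hypothetical helper: skips `rowsToSkip` rows, treats the next row as
    // the header, and returns every value of the named column.
    public static List<string> ReadColumn(TextReader reader, int rowsToSkip, string columnName)
    {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            // Count blank preamble lines as rows so the skip is deterministic.
            IgnoreBlankLines = false,
        };

        var values = new List<string>();
        using (var csv = new CsvReader(reader, config))
        {
            for (var i = 0; i < rowsToSkip; i++)
            {
                if (!csv.Read()) return values; // Input ended inside the preamble.
            }

            if (!csv.Read()) return values; // No header row present.
            csv.ReadHeader();               // Use the current row as the header.

            while (csv.Read())
            {
                // Look the field up by the header name read above.
                values.Add(csv.GetField(columnName));
            }
        }
        return values;
    }

    public static void Main()
    {
        var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3";
        var column = ReadColumn(new StringReader(csvText), 2, "COL2");
        Console.WriteLine(string.Join(",", column));
    }
}
```

If the data section can contain blank rows, the ShouldSkipRecord approach shown above would still be needed, since GetField by name would otherwise fail on a row that lacks that column.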

      【Comments】:
