【Question Title】: How to read tab-delimited lines by skipping alternate lines
【Posted】: 2014-04-22 01:51:32
【Question】:

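I can currently parse and extract data from a large tab-delimited file. I read the file line by line, parse each line, and add the split items to my DataTable (rows are added in chunks, with a limit of 3 rows at a time). I need to skip the even-numbered lines: read the first tab-delimited line that has the maximum number of items, skip the second, and go straight to the third.

Format of my tab-delimited source file: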

001Mean                   26.975                  1.1403                  910.45                   
001Stdev                  26.975                  1.1403                  910.45                   
002Mean                   26.975                  1.1403                  910.45                   
002Stdev                  26.975                  1.1403                  910.45                   

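I need to skip (not read) the tab-delimited Stdev lines.

C# code:

Get the maximum number of items in any tab-delimited line of the file by splitting each line: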

using (var reader = new StreamReader(sourceFileFullName))
{
    string line = null;
    line = reader.ReadToEnd();

    if (!string.IsNullOrEmpty(line))
    {
        var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
        foreach (var value in list_with_max_cols)
        {
            var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
            MAX_NO_OF_COLUMNS = values.Length;
        }
    }
}
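
For a large file, the same maximum can be found without loading the whole file into one string. The following is a minimal sketch, not the code from the question: `ColumnCounter`, `MaxColumns`, and `MaxColumnsInFile` are hypothetical names, and it assumes that a plain tab split per line is enough to count items.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ColumnCounter
{
    // Returns the largest number of tab-separated items found on any line,
    // or 0 when there are no lines at all.
    public static int MaxColumns(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('\t').Length)
            .DefaultIfEmpty(0)
            .Max();
    }

    // File.ReadLines streams the file lazily, one line at a time,
    // so the whole file is never held in memory at once.
    public static int MaxColumnsInFile(string path)
    {
        return MaxColumns(File.ReadLines(path));
    }
}
```

With this sketch, `MAX_NO_OF_COLUMNS = ColumnCounter.MaxColumnsInFile(sourceFileFullName);` would stand in for the `ReadToEnd` block above.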

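Read the file line by line until a line with the maximum number of tab-delimited items is reached; that line is the first one to parse and extract: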

using (var reader = new StreamReader(sourceFileFullName))
{
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;
        // The first line reached here is the column list; the DataTable is created from it.
        if (firstLineOfFile)
        {
            columnData = new_read_line;
            firstLineOfFile = false;
            continue;
        }
        if (firstLineOfChunk)
        {
            firstLineOfChunk = false;
            chunkDataTable = CreateEmptyDataTable(columnData);
        }
        AddRow(chunkDataTable, new_read_line);
        chunkRowCount++;

        if (chunkRowCount == _chunkRowLimit)
        {
            firstLineOfChunk = true;
            chunkRowCount = 0;
            yield return chunkDataTable;
            chunkDataTable = null;
        }
    }
}
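
The loop above returns a chunk only once it holds `_chunkRowLimit` rows; the excerpt does not show what happens to a final, partly filled chunk. As an illustration of the chunking pattern only, here is a sketch that batches plain lines and also returns the trailing partial batch. `LineBatcher` and `InChunks` are hypothetical names, and lists of strings stand in for the DataTable.

```csharp
using System;
using System.Collections.Generic;

public static class LineBatcher
{
    // Groups lines into lists of at most chunkSize items.
    // The last list may be shorter when the input does not divide evenly.
    public static IEnumerable<List<string>> InChunks(IEnumerable<string> lines, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        var chunk = new List<string>(chunkSize);
        foreach (var line in lines)
        {
            chunk.Add(line);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<string>(chunkSize); // start a fresh chunk
            }
        }

        // Return whatever is left over after the last full chunk.
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

Because this is an iterator method, the argument check runs when enumeration starts rather than when `InChunks` is called.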

Creating the DataTable:

private DataTable CreateEmptyDataTable(string firstLine)
    {

        IList<string> columnList = Split(firstLine);
        var dataTable = new DataTable("TableName");
        for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
        {
            string c_string = columnList[columnIndex];
            if (Regex.Match(c_string, "\\s").Success)
            {
                string tmp = Regex.Replace(c_string, "\\s", "");
                string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // To strip strings inside [] and inclusive [] alone
                columnList[columnIndex] = finaltmp;

            }
        }
        dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
        dataTable.Columns.Add("ID");
        return dataTable;

    }

How can I skip alternate lines while reading, then split the remaining lines and add them to my DataTable?

AddRow function: I met my requirement by adding the following change:

private void AddRow(DataTable dataTable, string line)
    {

        if (line.Contains("Stdev"))
        {
            return;
        }
        else
        {
          //Rest of Code
        }

    }

【Comments】:

    Tags: c# asp.net split skip csv


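    【Solution 1】:

    Given that every line holds tab-separated values, here is how to read the odd-numbered lines and split each one into an array. This is only a sample; you can extend it.

    Test data (file.txt)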

    luck    is  when    opportunity meets   preparation
    this    line    needs   to  be  skipped
    microsoft   visual  studio
    another line    to  be  skipped
    let us  all code
    

    Code

    var oddLines = File.ReadLines(@"C:\projects\file.txt").Where((item, index) => index%2 == 0);
    foreach (var line in oddLines)
    {
         var words = line.Split('\t');
    }
    


    Edit

    Get the lines that do not contain 'Stdev'

    var filteredLines = System.IO.File.ReadLines(@"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));
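
`Contains("Stdev")` matches the text anywhere in the line. If the marker only ever appears in the first field, as in the sample data from the question (`001Stdev`), a narrower check is possible. The following is a sketch under that assumption; `MeanRows`, `IsStdevRow`, and `ParseNonStdevRows` are hypothetical names, and the case-insensitive comparison is a choice made here, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MeanRows
{
    // True when the first tab-separated field ends with "Stdev", e.g. "001Stdev".
    public static bool IsStdevRow(string line)
    {
        var label = line.Split('\t')[0].Trim();
        return label.EndsWith("Stdev", StringComparison.OrdinalIgnoreCase);
    }

    // Drops blank lines and Stdev rows, then splits each remaining line on tabs.
    // The position of a row in the file plays no role in the filter.
    public static IEnumerable<string[]> ParseNonStdevRows(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line) && !IsStdevRow(line))
            .Select(line => line.Split('\t'));
    }
}
```

It could be fed from the file with `MeanRows.ParseNonStdevRows(File.ReadLines(path))`, which keeps the reading lazy.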
    

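    【Comments】:

    • @Prashanth Thanks for your comments! I have added the change to my code. I realized quite late that index % 2 == 0 may not meet my requirement, because Stdev lines can sit at both odd and even indexes among the tab-delimited lines of my source file.
    • @Shrivatsan My answer was based on your original requirement. Anyway, glad to know you found a workaround. You can still modify my query to filter only the data you want. See my edited query.
    【Solution 2】:

    Change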

    using (var reader = new StreamReader(sourceFileFullName))
    {
        string new_read_line = null;
        //Read and display lines from the file until the end of the file is reached.
        while ((new_read_line = reader.ReadLine()) != null)
        {
            var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
            if (items.Length != MAX_NO_OF_COLUMNS)
                continue;

    to

    using (var reader = new StreamReader(sourceFileFullName))
    {
        int cnt = 0;
        string new_read_line = null;
        //Read and display lines from the file until the end of the file is reached.
        while ((new_read_line = reader.ReadLine()) != null)
        {
            cnt++;

            // Skip every second line (cnt is 1 for the first line of the file).
            if (cnt % 2 == 0)
                continue;
            var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
            if (items.Length != MAX_NO_OF_COLUMNS)
                continue;


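    【Comments】:

    • @Gusman Thanks for your comments! I have added the change to my code. I realized quite late that cnt % 2 == 0 may not meet my requirement, because Stdev lines can sit at both odd and even indexes among the tab-delimited lines of my source file.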