【Question Title】: Writing Specific CSV Rows to a Dataframe
【Posted】: 2019-04-15 22:33:52
【Question】:

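I am using the csv library to read specific rows from several files I have. The problem I am running into is saving those rows to a dataframe. I keep hitting an indexing error that I cannot resolve.

The current version of the code finds the column names (located on the third line) and then starts looking for the data I need (which begins on the sixth line and continues until a blank line is found). Finding the column names works fine, but when I try to append the data I get the error: "InvalidIndexError: Reindexing only valid with uniquely valued Index objects"

My current code is as follows: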

    i=0
    import csv
    import pandas as pd
    df = pd.DataFrame()
    with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        for row in csvreader:
           if csvreader.line_num == 3:  #this is for the column names
               print(row)
               df = pd.DataFrame(columns = row)
               df.columns = row
           if csvreader.line_num >= 6:  #this is for the data
               if row: #checks for blank row
                   if i<10: #just printing the top ten rows for debugging purposes, theres thousands I need
                       print(i)
                       i+=1
                       df.append(row)  #this is where I get the indexing error
               else: # breaks out of loop if
                   break
    print(df) #for double checking if it worked

Edit: a sample of the data is here:

Devices

1680

Column Name 1,Column Name 2,Column Name 3,Column Name 4,Column Name 5,Column Name 6,Column Name 7,Column Name 8,Column Name 9,Column Name 10,Column Name 11,Column Name 12,Column Name 13,Column Name 14,Column Name 15,Column Name 16,Column Name 17,Column Name 18,Column Name 19,Column Name 20,Column Name 21

Frame,Sub Frame,Sync,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,FS,FS

,,,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V,V
1,0,0,1.28178e-005,-5.21866e-005,8.24e-006,1.19022e-005,1.00711e-005,3.02133e-005,2.83822e-005,0,6.40889e-006,-6.1037e-007,2.83822e-005,-6.40889e-006,2.65511e-005,1.46489e-005,1.73956e-005,1.09867e-005,0,0

1,1,0,9.82043e-006,-4.40121e-005,8.78497e-006,1.02673e-005,1.1706e-005,3.15758e-005,2.62023e-005,5.44972e-006,8.0438e-006,-1.06924e-005,2.91997e-005,-8.0438e-006,2.73686e-005,1.51939e-005,1.73956e-005,1.04417e-005,0,0

1,2,0,1.40167e-005,-3.27202e-005,1.00493e-005,1.22292e-005,1.33409e-005,3.55758e-005,2.57009e-005,6.58328e-006,9.67872e-006,-1.5499e-005,2.95376e-005,-8.47978e-006,2.98645e-005,1.47797e-005,1.42783e-005,9.89672e-006,0,0

1,3,0,1.83656e-005,-2.59735e-005,1.01692e-005,1.46816e-005,1.45617e-005,3.74506e-005,2.56355e-005,3.19357e-006,4.47972e-006,-1.95863e-005,2.93959e-005,-7.92392e-006,3.13469e-005,1.46489e-005,1.38423e-005,9.14466e-006,0,0

1,4,0,1.84419e-005,-2.20169e-005,8.5016e-006,1.52157e-005,1.46053e-005,3.87149e-005,2.44148e-005,6.53978e-007,-4.27252e-006,-1.96627e-005,2.87746e-005,-8.1528e-006,3.05185e-005,1.39513e-005,1.59568e-005,9.37354e-006,0,0

1,5,0,1.5837e-005,-1.80387e-005,7.46613e-006,1.39622e-005,1.40603e-005,4.07858e-005,2.10905e-005,0,-8.4253e-006,-1.45073e-005,2.88073e-005,-9.25364e-006,2.83277e-005,1.21529e-005,1.69705e-005,9.48254e-006,0,0

1,6,0,1.39295e-005,-1.44963e-005,7.52064e-006,1.24908e-005,1.42783e-005,4.23117e-005,1.63493e-005,0,-4.77405e-006,-9.22096e-006,2.98427e-005,-1.00711e-005,2.60933e-005,1.02455e-005,1.5935e-005,7.84765e-006,0,0

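I would like the output to be a dataframe with line 3 as the column names and line 6 through the blank line as the data filling the columns.

For example: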

    In[1]: csv file above
    Out[1]: [column Name 1]   [Column Name 2] ...
            [Data 1 in Row 6] [Data 2 in Row 6] ...
            [Data 1 in Row 7] [Data 2 in Row 7] ...
            [Data 1 in Row 8] [Data 2 in Row 8] ...

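【Comments】:

  • Can you add some sample data and the expected output?
  • Yes, I have added them.
  • I meant sample data as text; if you use a picture I cannot copy the data. Also, what is the expected output?
  • I added the text. The expected output is below the sample.
  • Sorry, could you check how to provide a great pandas example?

Tags: python pandas csv dataframe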


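【Solution 1】:

I love being downvoted without being given a reason why my question deserved it. I was able to figure it out myself. Hopefully this can answer someone else's question in the future.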

    import csv
    import pandas as pd

    temp = []  # collects the header row first, then the data rows
    with open('C:/Users/sword/Anaconda3/envs/exceltest/RF_SubjP02_Free_STATIC_TR01.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        for row in csvreader:
            if csvreader.line_num == 3:
                temp.append(row)  # line 3 holds the column names
            if csvreader.line_num >= 6:
                if row:
                    temp.append(row)  # data rows, stored as lists of strings
                else:  # stop at the first blank row
                    break
    df = pd.DataFrame(temp)  # build the dataframe once from the collected rows
    df.columns = df.iloc[0]  # make the top row the column names
    df = df.iloc[1:].reset_index(drop=True)  # drop the header row from the data
    print(df)
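
The same idea can be wrapped in a small reusable function. The following is only a sketch under a few assumptions: the helper name `read_block` and its parameters `header_line` and `first_data_line` are made up for illustration, every data row is assumed to have as many fields as the header line, and all data values are assumed to be numeric. It reads from an in-memory string here so it can be run without the original file; an open file object should work the same way.

```python
import csv
import io

import pandas as pd


def read_block(lines, header_line=3, first_data_line=6):
    """Build a DataFrame from one header line and one block of data lines.

    `lines` is any iterable of text lines (an open file works). Line numbers
    are 1-based, matching csv.reader.line_num. Reading stops at the first
    blank row once the data block has started.
    """
    reader = csv.reader(lines)
    columns, rows = None, []
    for row in reader:
        if reader.line_num == header_line:
            columns = row
        elif reader.line_num >= first_data_line:
            if not row:  # a blank line marks the end of the block
                break
            rows.append(row)
    # Assumption: every data row has the same number of fields as the header.
    df = pd.DataFrame(rows, columns=columns)
    # csv.reader yields strings, so convert each column to a numeric dtype.
    return df.apply(pd.to_numeric)


# Tiny made-up sample with the same layout as the file in the question.
sample = (
    "Devices\n"
    "1680\n"
    "A,B,C\n"
    "Frame,Sub Frame,v\n"
    ",,V\n"
    "1,0,1.5e-005\n"
    "1,1,-2.5e-005\n"
    "\n"
    "Other block\n"
)
df = read_block(io.StringIO(sample))
print(df)
```

Collecting the rows in a plain list and constructing the dataframe once avoids growing the dataframe row by row. Note that `DataFrame.append` returned a new object rather than modifying the dataframe in place, and it was removed in pandas 2.0, so the pattern from the question would not work as written on current versions either.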
