【Question title】: How to filter out data into unique pandas dataframes from a combined csv of multiple datatypes?
【Posted】: 2015-09-20 16:51:24
【Question description】:

Sample csv

time,type,-1,
time,type,0,w
time,type,1,a,12,b,13,c,15,name,apple
time,type,5,r,2,s,43,t,45,u,67,style,blue,font,13
time,type,11,a,12,c,15
time,type,5,r,2,s,43,t,45,u,67,style,green,font,15
time,type,1,a,12,b,13,c,15,name,apple
time,type,11,a,12,c,15
time,type,5,r,2,s,43,t,45,u,67,style,green,font,15
time,type,1,a,12,b,13,c,15,name,apple
time,type,5,r,2,s,43,t,45,u,67,style,yellow,font,9
time,type,19,b,12
type,19,b,42

I want to filter each of "type,1", "type,5", "type,11" and "type,19" into its own pandas DataFrame for further analysis. What is the best way to do this? [I will also ignore "type,0" and "type,-1".]

Sample code

import pandas as pd

type1_header = ['type','a','b','c','name']
type5_header = ['type','r','s','t','u','style','font']
type11_header = ['type','a','c']
type19_header = ['type','b']

type1_data = pd.read_csv(file_path_to_csv, usecols=[2,4,6,8,10] , names=type1_header)
type5_data = pd.read_csv(file_path_to_csv, usecols=[2,4,6,8,10,12,14] , names=type5_header)

【Question comments】:

    Tags: python numpy pandas matplotlib data-analysis


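    【Solution 1】:
    import pandas as pd
    
    # Column names for each record type; the key is the type ID in field 2.
    # Only types 1 and 5 are listed; add 11 and 19 the same way.
    headers = {1: ['a', 'b', 'c', 'name'],
               5: ['r', 's', 't', 'u', 'style', 'font'],
               }
    
    # Positions of the values in a split row (fields alternate label, value).
    usecols = {1: [4, 6, 8, 10],
               5: [4, 6, 8, 10, 12, 14],
               }
    
    # One empty frame per known type.
    frames = {}
    for h in headers:
        frames[h] = pd.DataFrame(columns=headers[h])
    
    with open('irreg.csv') as f:
        for count, line in enumerate(f, start=1):
            # Strip the newline so the last field is not stored as 'apple\n'.
            row = line.strip().split(',')
            try:
                ID = int(row[2])
            except (IndexError, ValueError):
                print('WARNING: line %d: malformed row' % count)
                continue
            if ID in frames:
                # Values are kept as strings; convert later if needed.
                row_subset = [row[col] for col in usecols[ID]]
                # Append the row at the next integer index.
                frames[ID].loc[len(frames[ID])] = row_subset
            else:
                print('WARNING: line %d: type %s not found' % (count, row[2]))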
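As a variation on the loop above, the sketch below reads the label/value pairs directly from each row instead of using hard-coded column positions, and builds each DataFrame once at the end rather than appending row by row. It assumes every row has the layout time, "type", ID, followed by alternating label and value fields, as in the sample csv. The function name `split_by_type` and the `skip_types` parameter are hypothetical, chosen only for illustration.

```python
import io
from collections import defaultdict

import pandas as pd


def split_by_type(lines, skip_types=(0, -1)):
    """Group rows by type ID and return a dict {type_id: DataFrame}."""
    records = defaultdict(list)
    for line in lines:
        fields = line.strip().split(',')
        # Assumed layout: time, 'type', <id>, then label, value, label, value, ...
        if len(fields) < 3 or fields[1] != 'type':
            continue  # blank or malformed line
        try:
            type_id = int(fields[2])
        except ValueError:
            continue  # the ID field is not an integer
        if type_id in skip_types:
            continue
        labels = fields[3::2]   # a, b, c, name, ...
        values = fields[4::2]   # 12, 13, 15, apple, ...
        records[type_id].append(dict(zip(labels, values)))
    # Building each frame once is cheaper than growing it row by row.
    return {t: pd.DataFrame(rows) for t, rows in records.items()}


sample = """time,type,-1,
time,type,0,w
time,type,1,a,12,b,13,c,15,name,apple
time,type,5,r,2,s,43,t,45,u,67,style,blue,font,13
time,type,11,a,12,c,15
time,type,19,b,12
"""

frames = split_by_type(io.StringIO(sample))
print(sorted(frames))
print(frames[5])
```

All values come back as strings here; numeric columns can be converted afterwards, for example with `pd.to_numeric`.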
    

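    That does the job, but how often will you run this, and how often does the data change? For a one-off, it is probably simplest to split the incoming csv file, e.g. with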

     grep type,19 irreg.csv > 19.csv
    

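    on the command line, and then import each csv with its own headers and usecols. For a type whose ID is a prefix of another (1 versus 11 and 19), include the trailing comma in the pattern, e.g. grep 'type,1,', so that only the intended rows match.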
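The split-then-import idea can also be sketched entirely in Python: filter the lines of one type (the role grep plays above) and hand them to `pd.read_csv` with that type's column positions and names. The `HEADERS` and `USECOLS` tables and the `load_type` helper are hypothetical names for illustration, and the sketch assumes all rows of a given type have the same number of fields.

```python
import io

import pandas as pd

# Hypothetical lookup tables: value positions and column names per type.
HEADERS = {1: ['a', 'b', 'c', 'name'], 19: ['b']}
USECOLS = {1: [4, 6, 8, 10], 19: [4]}


def load_type(lines, type_id):
    """Keep only the rows of one type, then parse them with read_csv."""
    # The trailing comma stops type 1 from also matching types 11 and 19.
    marker = 'type,%d,' % type_id
    kept = [ln for ln in lines if marker in ln]
    df = pd.read_csv(io.StringIO(''.join(kept)), header=None,
                     usecols=USECOLS[type_id])
    # Selected columns come back in file order, matching HEADERS.
    df.columns = HEADERS[type_id]
    return df


sample_lines = """time,type,1,a,12,b,13,c,15,name,apple
time,type,11,a,12,c,15
time,type,19,b,12
time,type,1,a,12,b,13,c,15,name,apple
""".splitlines(keepends=True)

type1_data = load_type(sample_lines, 1)
type19_data = load_type(sample_lines, 19)
print(type1_data)
print(type19_data)
```

Unlike the line-by-line loop, `read_csv` infers numeric dtypes here, so columns such as `a` and `b` arrive as integers rather than strings.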

    【Comments】:
