Posted: 2019-03-05 14:18:41
Question:
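I'm trying to extract data from a CSV file into a JSON file. The CSV has several columns, but I only need col1, col2 and col3. I've been experimenting with pandas, but I can't work out how to get rid of the other columns and keep only col1, col2 and col3. I know that calling iterrows in pandas loops over every row and therefore picks up all the columns. I also tried iloc, but didn't get the right output.
My code so far: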
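A minimal sketch of three ways to keep only col1, col2 and col3, assuming the three sample rows from the CSV in this question. io.StringIO stands in for test_old.csv, and the variable names (CSV_TEXT, by_usecols, by_label, by_position) are illustrative only:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for test_old.csv
CSV_TEXT = """col1,col2,state,col3,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6
"""

# Option 1: parse only the needed columns; the others are never loaded
by_usecols = pd.read_csv(io.StringIO(CSV_TEXT),
                         usecols=["col1", "col2", "col3"], dtype=str)

# Option 2: load everything, then select the columns by label
full = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
by_label = full[["col1", "col2", "col3"]]

# Option 3: select by position with iloc (col1, col2, col3 are at positions 0, 1, 3)
by_position = full.iloc[:, [0, 1, 3]]

print(list(by_usecols.columns))
print(by_label.values.tolist())
```

Selecting by label is usually safer than iloc, because it keeps working if the column order in the file changes.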
import json
from collections import OrderedDict

import pandas as pd

# Read the ID, date and count columns as strings so they appear quoted in the JSON
df = pd.read_csv('test_old.csv', dtype={
    "col1": str,
    "col2": str,
    "col3": str,
})

results = []
# Group by the column name itself; grouping by a one-element list makes newer pandas yield 1-tuples as keys
for col1, bag in df.groupby("col1"):
    # Only col1 is dropped here, so state and val2 ... val5 still end up in every row
    contents_df = bag.drop(["col1"], axis=1)
    labels = [OrderedDict(row) for _, row in contents_df.iterrows()]
    results.append(OrderedDict([("col1", col1),
                                ("subset", labels)]))

print(json.dumps(results[0], indent=4))
with open('ExpectedJsonFile.json', 'w') as outfile:
    outfile.write(json.dumps(results, indent=4))
Sample CSV:
col1,col2,state,col3,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6
Expected JSON:
{
    "col1": "95110",
    "subset": [
        {
            "col2": "2015-05-01",
            "col3": "50"
        },
        {
            "col2": "2015-06-01",
            "col3": "67"
        },
        {
            "col2": "2015-07-01",
            "col3": "97"
        }
    ]
}
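A sketch of one way to produce the expected structure, assuming the sample rows from the question (inlined with io.StringIO in place of test_old.csv). It swaps the iterrows/OrderedDict loop for column selection plus DataFrame.to_dict(orient="records"), so it is a substitute technique rather than a patch of the code above:

```python
import io
import json

import pandas as pd

# Sample rows from the question; io.StringIO stands in for test_old.csv
CSV_TEXT = """col1,col2,state,col3,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6
"""

# Load only the three needed columns, all as strings
df = pd.read_csv(io.StringIO(CSV_TEXT),
                 usecols=["col1", "col2", "col3"], dtype=str)

results = []
# Grouping by a scalar column name yields plain keys such as "95110"
for col1, group in df.groupby("col1", sort=False):
    results.append({
        "col1": col1,
        # One {"col2": ..., "col3": ...} dict per row; col1 is left out
        "subset": group[["col2", "col3"]].to_dict(orient="records"),
    })

print(json.dumps(results[0], indent=4))
```

Plain dicts keep insertion order in Python 3.7 and later, so OrderedDict is not needed here. To write the whole list to ExpectedJsonFile.json instead of printing the first group, open the file for writing and call json.dump(results, outfile, indent=4).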
Tags: python pandas pandas-groupby