【Posted】: 2020-07-01 23:04:37
【Question】:
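I am trying to automate the formatting of a large number of JSON files collected from sensors. I created an initial dataframe that holds the path information for each file, along with the labels for the sensor data. I am trying to loop over each JSON file, extract the sensor readings into a dataframe, and then join the result back to the original dataframe. The data is available at https://github.com/MJLongstreth/stackoverflow
Here is what I have so far.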
# Import necessary packages
import os
import pandas as pd
import json
data_files = []
for dirpath, subdirs, files in os.walk('.'):
    for x in files:
        if x.endswith(".json"):
            data_files.append(os.path.join(dirpath, x))
# Delete variable no longer needed
del dirpath, files, x, subdirs
# Read file paths into a dataframe
df = pd.DataFrame(data_files)
# Rename column to path
df.columns = ['path']
# Split path to extract labels, sensor type, date, filename and then join file path
df = pd.DataFrame(df.apply(lambda x: x.str.split('/'))['path'].to_list(),
                  columns=['delete', 'folder', 'label', 'sensor_type', 'collection_date', 'file']).join(df).drop(['delete', 'folder'], axis=1)
# Initialize empty list to store data from json files
data = []
# Loop over data files paths and add json file dictionary to list
for file in data_files:
    x = pd.read_json(file,
                     lines=True)
    data.append(x)
# Add data to dataframe
df['data'] = data
# Delete variable no longer needed
del data, data_files, x, file
# Split DF into dataframes by sensor type
acc_data = df[df['sensor_type'] == 'acc']
gyro_data = df[df['sensor_type'] == 'gyro']
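One caveat about the path split above: it splits on '/', while os.path.join uses the platform's own separator, so the split may not behave the same on Windows. A minimal sketch of a separator-independent alternative, assuming the layout <folder>/<label>/<sensor_type>/<collection_date>/<file> implied by the column names (the helper name describe_path is hypothetical):

```python
from pathlib import Path


def describe_path(path):
    """Split a data-file path into its labelled components."""
    # Path.parts splits on the platform's separator and drops a leading "./"
    parts = Path(path).parts
    # Assumed layout: the last four components are label, sensor type, date, file
    label, sensor_type, collection_date, file_name = parts[-4:]
    return {
        "label": label,
        "sensor_type": sensor_type,
        "collection_date": collection_date,
        "file": file_name,
        "path": str(path),
    }
```

A list of these dictionaries, one per file, can be passed directly to pd.DataFrame to get the same columns as the frame built above.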
This is what I want to do from there, but so far it only handles one of the JSON files:
# Unpack first level of dictionary
df_1 = acc_data['data'].iloc[0].apply(pd.Series)
temp_1 = []
for index, row in df_1.iterrows():
    temp_1.append(row.apply(pd.Series))
temp_2 = []
for i in temp_1:
    for index, row in i.iterrows():
        #row = row.drop('Timestamp')
        row = row.apply(pd.Series)
        temp_2.append(row)
temp_3 = []
for i in temp_2:
    y = i.stack().apply(pd.Series).mean()
    temp_3.append(y)
temp_4 = []
for i in temp_3:
    x = pd.DataFrame(i).transpose()
    temp_4.append(x)
# DataFrame.append was removed in pandas 2.0; concatenate the pieces once instead
empty_df = pd.concat(temp_4, ignore_index=True)
I started trying to combine my for loops, but the following froze my computer:
test = acc_data['data'].to_list()
temp = []
temp_2 = []
temp_3 = []
temp_4 = []
for i in test:
    for index, row in i.iterrows():
        temp.append(row.apply(pd.Series))
for i in temp:
    for index, row in i.iterrows():
        #row = row.drop('Timestamp')
        row = row.apply(pd.Series)
        temp_2.append(row)
Any suggestions on a more efficient way to accomplish what I am trying to do would be greatly appreciated. Thank you.
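For reference, one possible direction is to flatten each file in a single call rather than expanding it row by row with iterrows and apply(pd.Series). This is only a minimal sketch: the helper names summarize_lines and summarize_files are hypothetical, and it assumes each line of a file is a JSON object whose nested dictionaries hold numeric readings plus an optional top-level 'Timestamp' field, which may not match the real files.

```python
import json

import pandas as pd


def summarize_lines(lines):
    """Flatten JSON-lines records and return the mean of every numeric field."""
    records = [json.loads(line) for line in lines if line.strip()]
    # json_normalize flattens nested dicts into dotted column names in one call
    flat = pd.json_normalize(records)
    # Assumption: a top-level 'Timestamp' column, if present, should not be averaged
    flat = flat.drop(columns=["Timestamp"], errors="ignore")
    return flat.mean(numeric_only=True)


def summarize_files(paths):
    """Return one row of per-file means for each path, built in a single DataFrame."""
    rows = []
    for path in paths:
        with open(path) as fh:
            row = summarize_lines(fh).to_dict()
        row["path"] = path
        rows.append(row)
    # Building the frame once avoids growing a DataFrame inside the loop
    return pd.DataFrame(rows)
```

Under those assumptions, the per-file means could then be attached to the path dataframe with df.merge(summarize_files(df['path']), on='path').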
【Discussion】:
Tags: python json pandas dataframe dictionary