【Posted】:2018-02-02 02:52:51
【Question】:
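I have a program that takes a JSON file, reads it line by line, aggregates the check-in times into four bins based on time of day, and then writes the result to a file. However, my output file contains extra characters because I am concatenating a dictionary with a string.
For example, the output for one line looks like this: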
dwQEZBFen2GdihLLfWeexA<bound method DataFrame.to_dict of Friday Monday Saturday Sunday Thursday Tuesday Wednesday
Category
Afternoon 0 0 3 2 2 0 1
Evening 20 4 16 11 4 3 5
Night 16 1 19 5 2 5 3>
The memory address is also being concatenated into the output file.
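As an aside, the `<bound method DataFrame.to_dict of ...>` text in the sample above looks like what Python prints when a method object is converted to a string without being called, rather than a memory address. A minimal pure-Python sketch of that behaviour follows; the `Table` class is hypothetical and only stands in for a DataFrame.

```python
class Table:
    """Hypothetical stand-in for a DataFrame that has a to_dict method."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Like a DataFrame, this object has a readable repr, which ends up
        # embedded in the repr of any method bound to it.
        return f"Table({self.rows!r})"

    def to_dict(self):
        return dict(self.rows)


t = Table({"Evening": 20, "Night": 16})

not_called = str(t.to_dict)   # the method object itself (no parentheses)
called = str(t.to_dict())     # the value the method returns

print(not_called)  # starts with "<bound method Table.to_dict of "
print(called)
```

The first string wraps the whole object's repr in `<bound method ... of ...>`, which matches the shape of the stray text in the output file.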
Here is the code used to create this particular file:
import json
import ast
import pandas as pd
from datetime import datetime
def cleanStr4SQL(s):
    return s.replace("'","`").replace("\n"," ")
def parseCheckinData():
    #write code to parse yelp_checkin.JSON
    # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
    with open('yelp_checkin.JSON') as f:
        outfile = open('checkin.txt', 'w')
        line = f.readline()
        # print(line)
        count_line = 0
        while line:
            data = json.loads(line)
            # print(data)
            # jsontxt = cleanStr4SQL(str(data['time']))
            # Parse the json and convert to a dictionary object
            jsondict = ast.literal_eval(str(data))
            outfile.write(cleanStr4SQL(str(data['business_id'])))
            # Convert the "time" element in the dictionary to a pandas DataFrame
            df = pd.DataFrame(jsondict['time'])
            # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
            df['Time'] = df.index.str.rjust(5, '0')
            # Add a new column "Category" and the set the values based on the time slot
            df['Category'] = df['Time'].apply(cat)
            # Create a pivot table based on the "Category" column
            pt = df.pivot_table(index='Category', aggfunc=sum, fill_value=0)
            # Convert the pivot table to a dictionary to get the json output you want
            jsonoutput = pt.to_dict
            # print(jsonoutput)
            outfile.write(str(jsonoutput))
            line = f.readline()
            count_line+=1
        print(count_line)
        outfile.close()
    f.close()
# Define a function to convert the time slots to the categories
def cat(time_slot):
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'
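The string comparisons in `cat` depend on the time slots being zero-padded to `HH:MM` first, which is what the `rjust(5, '0')` step does. A small sketch of why that matters; `categorize` is just a copy of `cat` renamed for this illustration, and the sample inputs are made up.

```python
def categorize(time_slot):
    """Copy of cat(): bins a zero-padded 'HH:MM' string into four categories."""
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'


raw = '9:00'                   # made-up, unpadded time slot
padded = raw.rjust(5, '0')     # '09:00'

# Strings compare character by character, so '9:00' sorts after '23:00'
# and falls through to the final branch.
unpadded_result = categorize(raw)
padded_result = categorize(padded)

print(padded, padded_result)
print(raw, unpadded_result)
```

With padding, 9 a.m. lands in the morning bin; without it, the same slot falls into the night bin because `'9'` sorts after `'1'` and `'2'`.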
I would like to know whether there is some way to remove the memory location from the output file?
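One possible direction, sketched under assumptions: call the method (`pt.to_dict()` with parentheses) so that a real dictionary is produced, then serialize it with `json.dumps` instead of `str`. In the sketch below, `counts` is a hand-written dictionary standing in for what `pt.to_dict()` might return, and the `format_line` helper and the tab separator are hypothetical choices, not part of the original program.

```python
import json


def format_line(business_id, counts):
    """Join a business id and a nested counts dict into one output line."""
    # json.dumps produces valid JSON; sort_keys keeps the output stable.
    return business_id + "\t" + json.dumps(counts, sort_keys=True) + "\n"


# Stand-in for the result of pt.to_dict(): {day: {category: count}}
counts = {
    "Friday": {"Afternoon": 0, "Evening": 20, "Night": 16},
    "Monday": {"Afternoon": 0, "Evening": 4, "Night": 1},
}

line = format_line("dwQEZBFen2GdihLLfWeexA", counts)
print(line, end="")
```

Each line written this way can be split on the tab and the second part parsed back with `json.loads`.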
Any suggestions are appreciated. Please let me know if you need more information.
Thanks for reading.
【Comments】:
Tags: python json dictionary file-io formatting