[Posted]: 2018-12-08 00:17:07
[Question]:
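I have 13 CSV files that contain billing information in an unusual format. Multiple readings are recorded every 30 minutes of each day. Five days are recorded side by side (as columns), and the next five days are recorded below them. To make things more complicated, the day of the week, the date, and the billing day are shown above the first KVAR reading of each day.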
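The image shows a small example. In the real files, however, the KW, KVAR, and KVA columns repeat three more times across the page, and the readings continue down for roughly 50 rows.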
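The offsets hard-coded in `make_df` imply a fixed grid. The following is a minimal sketch of that mapping. It assumes that each band of five days is 55 rows tall, that the readings sit in rows 3 to 50 of a band, and that column 0 holds the time. It also assumes that each day spans three columns (KW, KVAR, KVA) and that the date and the value stored as DAY sit in the first two rows above that day's KVAR column. `day_location` and the constant names are hypothetical, chosen only for illustration.

```python
# Hypothetical coordinate map inferred from the offsets used in make_df.
BAND_HEIGHT = 55          # rows from one band of five days to the next (assumed)
DAYS_PER_BAND = 5         # five days side by side
FIRST_READING_ROW = 3     # first half-hourly reading within a band
READINGS_PER_DAY = 48     # 48 half-hour slots per day


def day_location(k: int) -> dict:
    """Return iloc positions for the k-th day (0-based) of one sheet."""
    band, slot = divmod(k, DAYS_PER_BAND)
    top = band * BAND_HEIGHT
    kw_col = 1 + 3 * slot                      # column 0 holds the time
    return {
        "date_cell": (top, kw_col + 1),        # date sits above the KVAR column
        "day_cell": (top + 1, kw_col + 1),     # DAY value sits just below it
        "rows": (top + FIRST_READING_ROW,      # half-open slice for .iloc
                 top + FIRST_READING_ROW + READINGS_PER_DAY),
        "cols": (kw_col, kw_col + 1, kw_col + 2),  # KW, KVAR, KVA
    }


print(day_location(5))
# {'date_cell': (55, 2), 'day_cell': (56, 2), 'rows': (58, 106), 'cols': (1, 2, 3)}
```

Day 0 maps to rows 3 to 50 and columns 1 to 3, which matches the starting values in the question. Day 5 (the first day of the second band) starts at row 58, which is where the posted loop's `val_start` lands after its first row movement.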
My goal is to write a simple Python script that converts the data into a DataFrame with the following columns: DATE, TIME, KW, KVAR, KVA, and DAY.
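The problem is that my script returns NaN for the KW, KVAR, and KVA data after the first five days, which coincides with a new iteration of the outer for loop. What seems strange to me is that when I print the same ranges, I get the data I expect.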
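One explanation that fits these symptoms is pandas index alignment. DAY and DATE are scalars, so they broadcast to every row. KW, KVAR, and KVA are Series, so they fail. When a Series is assigned to a DataFrame column, rows are matched by index label rather than by position. The following is a minimal sketch with made-up numbers. It is an inference from the posted code, not something the post itself confirms.

```python
import pandas as pd

# Four made-up readings that live in rows 58-61 of a sheet (a second band of days).
readings = pd.Series([1.0, 2.0, 3.0, 4.0], index=range(58, 62))

# A frame whose index came from a column sliced out of rows 3-6 (like `time`).
frame = pd.DataFrame({"TIME": ["00:00", "00:30", "01:00", "01:30"]}, index=range(3, 7))

# Series assignment aligns on index labels: 3-6 are not in `readings`, so all NaN.
frame["KW_aligned"] = readings

# Passing a plain NumPy array assigns by position instead, so the values survive.
frame["KW_positional"] = readings.to_numpy()

print(frame)
```

In the posted code, `temp_df` takes its index (labels 3 to 50) from `time`. Once the slices start at row 58, none of their labels match that index. Printing the slice on its own still shows the data because the Series keeps its own labels.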
My code is below. I have included comments to help explain things further. I also have an example of the function's output.
```python
import pandas as pd

def make_df(df):
    # starting values
    output = pd.DataFrame(columns=["DATE", "TIME", "KW", "KVAR", "KVA", "DAY"])
    time = df.loc[3:50, 0]
    val_start = 3
    val_end = 51
    date_val = [0, 2]
    day_type = [1, 2]
    # There are 7 row movements that need to take place.
    for row_move in range(1, 8):
        day = [1, 2, 3]
        date_val[1] = 2
        day_type[1] = 2
        # There are 5 column movements that take place.
        # The basic idea is that I cycle through the five days, grab their data in a
        # temporary DataFrame, and then append that DataFrame to the output DataFrame.
        for col_move in range(1, 6):
            temp_df = pd.DataFrame(columns=["DATE", "TIME", "KW", "KVAR", "KVA", "DAY"])
            temp_df['TIME'] = time
            # These are the 3 values that stop working after the first row movement.
            # I get the values that I expect for the first 5 days.
            temp_df['KW'] = df.iloc[val_start:val_end, day[0]]
            temp_df['KVAR'] = df.iloc[val_start:val_end, day[1]]
            temp_df['KVA'] = df.iloc[val_start:val_end, day[2]]
            # These 2 values work perfectly for the entire data set.
            temp_df['DAY'] = df.iloc[day_type[0], day_type[1]]
            temp_df['DATE'] = df.iloc[date_val[0], date_val[1]]
            # troubleshooting
            print(df.iloc[val_start:val_end, day[0]])
            print(temp_df)
            # DataFrame.append was removed in pandas 2.0; concat does the same thing here.
            output = pd.concat([output, temp_df])
            # Increase values for each iteration of the column loop.
            # Seems to work perfectly when I print the data.
            day = [x + 3 for x in day]
            date_val[1] = date_val[1] + 3
            day_type[1] = day_type[1] + 3
        # Increase values for each iteration of the row loop.
        # Seems to work perfectly when I print the data.
        date_val[0] = date_val[0] + 55
        day_type[0] = day_type[0] + 55
        val_start = val_start + 55
        val_end = val_end + 55
    return output

test = make_df(df1)
```
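Below is some sample output. It shows where the data starts to break down after the fifth day, which is the first time the for loop shifts to the next block of days. What am I doing wrong?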
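The following is a minimal sketch of a loop that sidesteps index alignment. It passes pandas plain NumPy arrays (`.to_numpy()`) instead of labelled Series, and it collects the per-day pieces for a single `pd.concat` at the end. The layout parameters default to the offsets in the question's code and are assumptions about the real files. The small synthetic sheet at the bottom is made up purely for demonstration.

```python
import numpy as np
import pandas as pd


def make_df(df, n_bands=7, days_per_band=5, band_height=55,
            first_row=3, n_readings=48):
    """Reshape one billing sheet into DATE, TIME, KW, KVAR, KVA, DAY rows.

    Defaults mirror the offsets in the question; they are layout assumptions.
    """
    # Read the time column once (rows 3-50 by default), positionally.
    times = df.iloc[first_row:first_row + n_readings, 0].to_numpy()
    frames = []
    for band in range(n_bands):                      # bands stacked vertically
        top = band * band_height
        start, end = top + first_row, top + first_row + n_readings
        for slot in range(days_per_band):            # days side by side
            kw = 1 + 3 * slot
            block = df.iloc[start:end, kw:kw + 3].to_numpy()  # KW, KVAR, KVA
            frames.append(pd.DataFrame({
                "DATE": df.iloc[top, kw + 1],        # scalar, broadcast to all rows
                "TIME": times,
                "KW": block[:, 0],
                "KVAR": block[:, 1],
                "KVA": block[:, 2],
                "DAY": df.iloc[top + 1, kw + 1],
            }))
    return pd.concat(frames, ignore_index=True)


# Tiny synthetic sheet: 2 bands x 2 days, 2 readings per day, bands 6 rows tall.
sheet = pd.DataFrame(np.nan, index=range(12), columns=range(7), dtype=object)
for band in range(2):
    top = band * 6
    sheet.iloc[top + 3:top + 5, 0] = ["00:00", "00:30"]
    for slot in range(2):
        kw = 1 + 3 * slot
        day_no = band * 2 + slot
        sheet.iloc[top, kw + 1] = f"day{day_no}"     # date cell above KVAR
        sheet.iloc[top + 1, kw + 1] = "WEEKDAY"      # DAY cell below it
        sheet.iloc[top + 3:top + 5, kw:kw + 3] = day_no  # every reading = day number

result = make_df(sheet, n_bands=2, days_per_band=2, band_height=6, n_readings=2)
print(result)
```

Calling `.reset_index(drop=True)` on both `time` and each slice in the original loop would also make the labels line up. Collecting the frames in a list also avoids copying `output` on every iteration.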
[Discussion]: