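【Posted】: 2019-10-12 04:11:01
【Question】:
I am reading CSV files from two directories, one holding the actual results and one holding the expected results. I iterate over each CSV in actual_results and compare it with its counterpart in expected_results. I would then like to display all of the data as HTML, as shown below.
I have already written some code that cleans the data and then compares the data frames of the actual and expected CSVs.
Here is the whole code: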
import pandas as pd
import sys
from glob import glob
import os
import itertools

# compare takes two lists of csv file names (expected, actual) to compare pairwise
def compare(expectedList,actualList):
    ctr=0
    dfList = list()
    for (csv1,csv2) in itertools.zip_longest(expectedList,actualList):
        df1_ctr=pd.read_csv(csv1,sep=',')
        df1_ctr[df1_ctr.columns[1:]] = [x.split('\t') for x in df1_ctr['mean(ms)']]
        df1=df1_ctr.apply(pd.to_numeric,errors='coerce')
        df2_ctr=pd.read_csv(csv2,sep=',')
        df2_ctr[df2_ctr.columns[1:]] = [x.split('\t') for x in df2_ctr['mean(ms)']]
        df2=df2_ctr.apply(pd.to_numeric,errors='coerce')
        print("Dataframe for Expected List for file : {} is \n {}".format(csv1,df1))
        print("Dataframe for Actual List for file: {} is \n {}".format(csv2,df2))
        d3=df1.loc[:,:] # Dataframe 1
        d4=df2.loc[:,:] # Dataframe 2
        d5=abs(((d3.subtract(d4))/d3)*100)
        print("Deviation between file {} and {} is :\n {}".format(csv1,csv2,d5))
        ctr=ctr+1
        #Final Data frame
        df=pd.concat([df1,df2,d5])
        #print("{}".format(df))
        dfList.append(df)
    #print("Final Data frame: \n{}".format(dfList))
    # for data in dfList:
    #     print("data at index: \n{}".format(data))

if __name__ == "__main__":
    #file1=sys.argv[1] # FileName1
    #file2=sys.argv[2] #FileName2
    #compareCSV(file1,file2) # Compare CSV files passed in as parameters
    os.chdir("expected_results")
    expectedCSVs=glob("*.csv")
    #print(expectedCSVs)
    os.chdir("../actual_results")
    actualCSVs=glob("*.csv")
    #print(actualCSVs)
    compare(expectedCSVs,actualCSVs)
For now I have some redundant print statements. The output of the code above is as follows:
Dataframe for Expected List for file : CT_QRW_25.csv is
   100%Q   mean(ms)   P50(ms)     P99(ms)   p99.9(ms)  #Samples
0    NaN   0.038973  0.044939    0.091076    0.363859   1760108
1    NaN   0.050652  0.044963    0.094738    0.402525   1354233
2    NaN   0.046500  0.045020    0.108138    0.320636    123448
3    NaN   1.872630  0.599966   33.313200  172.040000  21954617
4    NaN  37.752900  0.600484  603.063000  805.340000   2708258
Dataframe for Actual List for file: CT_QRW_25.csv is
   100%Q   mean(ms)   P50(ms)     P99(ms)   p99.9(ms)  #Samples
0    NaN   0.038973  0.044939    0.091076    0.363859   1760108
1    NaN   0.050652  0.044963    0.094738    0.402525   1354233
2    NaN   0.046500  0.045020    0.108138    0.320636    123448
3    NaN   1.872630  0.599966   33.313200  172.040000  21954617
4    NaN  37.752900  0.600484  603.063000  805.340000   2708258
Deviation between file CT_QRW_25.csv and CT_QRW_25.csv is :
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)  #Samples
0    NaN       0.0      0.0      0.0        0.0       0.0
1    NaN       0.0      0.0      0.0        0.0       0.0
2    NaN       0.0      0.0      0.0        0.0       0.0
3    NaN       0.0      0.0      0.0        0.0       0.0
4    NaN       0.0      0.0      0.0        0.0       0.0
Dataframe for Expected List for file : CT_W_14.csv is
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)   #Samples
0    NaN       NaN      NaN      NaN        NaN        NaN
1    NaN       NaN      NaN      NaN        NaN        NaN
2    NaN       NaN      NaN      NaN        NaN        NaN
3    NaN       NaN      NaN      NaN        NaN        NaN
4    NaN   97.8025  17.8492  725.619    891.455  5304765.0
Dataframe for Actual List for file: CT_W_14.csv is
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)   #Samples
0    NaN       NaN      NaN      NaN        NaN        NaN
1    NaN       NaN      NaN      NaN        NaN        NaN
2    NaN       NaN      NaN      NaN        NaN        NaN
3    NaN       NaN      NaN      NaN        NaN        NaN
4    NaN   97.8025  17.8492  725.619    891.455  5304765.0
Deviation between file CT_W_14.csv and CT_W_14.csv is :
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)  #Samples
0    NaN       NaN      NaN      NaN        NaN       NaN
1    NaN       NaN      NaN      NaN        NaN       NaN
2    NaN       NaN      NaN      NaN        NaN       NaN
3    NaN       NaN      NaN      NaN        NaN       NaN
4    NaN       0.0      0.0      0.0        0.0       0.0
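One thing worth checking in the output above: the script calls os.chdir a second time before any file is read, so the bare names collected from expected_results are opened relative to actual_results, which may be why the expected and actual frames are identical. In addition, itertools.zip_longest pairs files by position in the glob result, not by name. A minimal sketch of pairing by file name with full paths, assuming both folders use the same file names (the helper name pair_csvs is hypothetical, not part of the original code):

```python
from pathlib import Path


def pair_csvs(expected_dir, actual_dir):
    """Return (name, expected_path, actual_path) for CSVs present in both folders."""
    # Index each folder by bare file name, keeping the full path as the value.
    expected = {p.name: p for p in Path(expected_dir).glob("*.csv")}
    actual = {p.name: p for p in Path(actual_dir).glob("*.csv")}
    # Only names found in both folders can be compared; sort for a stable order.
    common = sorted(expected.keys() & actual.keys())
    return [(name, expected[name], actual[name]) for name in common]
```

Each returned path can be passed straight to pd.read_csv, so no os.chdir call is needed. Files that exist in only one folder are skipped here; reporting them instead would be an equally reasonable choice.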
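Goal: Because everything I have so far is print statements, I cannot generate the output dynamically if I want it as HTML. My goal is to write the result to an HTML file. Alternatively, a custom way to add a row to the data frame as a title would also work. If a deviation is greater than 10%, I want that cell shown in red. If anyone has dealt with this before, please help. Any help would be greatly appreciated.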
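A minimal sketch of one way to reach that goal, assuming the expected and actual frames are already numeric as in the output above. It renders each frame with DataFrame.to_html, puts a heading in front of every table as its title, and uses the formatters argument to wrap deviations above 10% in a tag that a small stylesheet colors red. The names THRESHOLD, CSS, mark_deviation, table_with_title and build_report are all hypothetical, and this colors the cell text, not the cell background.

```python
import html

import pandas as pd

THRESHOLD = 10.0  # percent; the limit named in the question
CSS = "<style>b.over { color: red; }</style>"


def mark_deviation(value):
    """Format one deviation value and flag it when it exceeds THRESHOLD."""
    if pd.isna(value):
        return "NaN"
    text = "{:.2f}".format(value)
    if value > THRESHOLD:
        return '<b class="over">{}</b>'.format(text)
    return text


def table_with_title(title, frame, cell_formatter=None):
    """Render one DataFrame as an HTML table preceded by a heading."""
    if cell_formatter is None:
        table = frame.to_html()
    else:
        # escape=False lets the markup produced by cell_formatter through as HTML.
        formatters = {col: cell_formatter for col in frame.columns}
        table = frame.to_html(formatters=formatters, escape=False)
    return "<h3>{}</h3>\n{}".format(html.escape(title), table)


def build_report(results):
    """Build one HTML page from (name, expected_df, actual_df) tuples."""
    parts = ["<html><head>", CSS, "</head><body>"]
    for name, expected, actual in results:
        # Same percentage deviation as in the question's compare function.
        deviation = ((expected - actual) / expected * 100).abs()
        parts.append(table_with_title("Expected: " + name, expected))
        parts.append(table_with_title("Actual: " + name, actual))
        parts.append(
            table_with_title("Deviation (%): " + name, deviation, mark_deviation)
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string to a file, for example with open("report.html", "w"), gives the HTML output. pandas also offers the DataFrame.style interface for per-cell CSS, which can color the cell background but needs the optional jinja2 dependency; the formatter route is used here only to keep the sketch dependency-free.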
【Comments】:
Tags: html python-3.x pandas csv data-science