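[Posted]: 2019-12-09 05:10:56
[Question]:
I have two different CSV files that I have combined into a single DataFrame and grouped by the "class_name" column. The groupby works as expected, but I can't figure out how to perform operations that compare one group against another. Relative to r1.csv, the algebra class lost 5 students, so I want -5; calculus gained 5, so it should be +5. These values must be added as a new column in a separate DataFrame. The same applies to the date arithmetic.
Here is what I have tried so far: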
import pandas as pd
report_1_df = pd.read_csv('r1.csv')
report_2_df = pd.read_csv('r2.csv')
# Stack both reports, then print the rows that belong to each class
for group, elements in pd.concat([report_1_df, report_2_df], axis=0, sort=False).groupby('class_name'):
    print(elements)
I can see that the grouping works. I tried .sum() and .diff(), but neither does what I want. What can I do here? Thanks.
r1.csv
class_name,student_count,start_time,end_time
algebra,15,"2019,Dec,08","2019,Dec,09"
calculus,10,"2019,Dec,08","2019,Dec,09"
statistics,12,"2019,Dec,08","2019,Dec,09"
r2.csv
class_name,student_count,start_time,end_time
calculus,15,"2019,Dec,09","2019,Dec,10"
algebra,10,"2019,Dec,09","2019,Dec,10"
trigonometry,12,"2019,Dec,09","2019,Dec,10"
Desired output:
class_name,student_count,student_count_change,start_time,start_time_delay,end_time,end_time_delay
algebra,10,-5,"2019,Dec,09",1,"2019,Dec,10",1
calculus,15,5,"2019,Dec,09",1,"2019,Dec,10",1
statistics,12,-12,"2019,Dec,08",0,"2019,Dec,09",0
trigonometry,12,12,"2019,Dec,09",0,"2019,Dec,10",0
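One possible way to get this table, offered as a sketch rather than a definitive answer, is to skip concat plus groupby and instead do an outer merge of the two reports on class_name, then subtract the r1 columns from the r2 columns. The helper names load_report and compare_reports are hypothetical, and the sample data is inlined through io.StringIO so the example runs on its own. Three assumptions are inferred from the desired output and not stated in the question: a class missing from one report counts as 0 students in that report, its delays are reported as 0, and student_count, start_time and end_time come from r2 when the class exists there and from r1 otherwise.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so the sketch is self-contained.
R1_CSV = '''class_name,student_count,start_time,end_time
algebra,15,"2019,Dec,08","2019,Dec,09"
calculus,10,"2019,Dec,08","2019,Dec,09"
statistics,12,"2019,Dec,08","2019,Dec,09"
'''
R2_CSV = '''class_name,student_count,start_time,end_time
calculus,15,"2019,Dec,09","2019,Dec,10"
algebra,10,"2019,Dec,09","2019,Dec,10"
trigonometry,12,"2019,Dec,09","2019,Dec,10"
'''

DATE_FORMAT = "%Y,%b,%d"  # matches strings such as "2019,Dec,08"


def load_report(source):
    """Read one report and parse its date columns (hypothetical helper)."""
    df = pd.read_csv(source)
    for col in ("start_time", "end_time"):
        df[col] = pd.to_datetime(df[col], format=DATE_FORMAT)
    return df


def compare_reports(old, new):
    """Compare two reports class by class (hypothetical helper)."""
    # The outer merge keeps classes that appear in only one report;
    # their columns from the other report are NaN / NaT.
    merged = old.merge(new, on="class_name", how="outer",
                       suffixes=("_old", "_new"))
    result = pd.DataFrame({"class_name": merged["class_name"]})

    old_count = merged["student_count_old"]
    new_count = merged["student_count_new"]
    # Assumption: prefer the newer count, fall back to the older one.
    result["student_count"] = new_count.fillna(old_count).astype(int)
    # Assumption: a class absent from a report has 0 students there.
    result["student_count_change"] = (
        new_count.fillna(0) - old_count.fillna(0)
    ).astype(int)

    for col in ("start_time", "end_time"):
        old_t, new_t = merged[f"{col}_old"], merged[f"{col}_new"]
        result[col] = new_t.fillna(old_t)
        # Assumption: the delay is 0 days when either side is missing.
        delay_days = (new_t - old_t).dt.days
        result[f"{col}_delay"] = delay_days.fillna(0).astype(int)

    return result.sort_values("class_name").reset_index(drop=True)


report_1_df = load_report(io.StringIO(R1_CSV))
report_2_df = load_report(io.StringIO(R2_CSV))
result = compare_reports(report_1_df, report_2_df)
print(result)
```

With real files, load_report('r1.csv') and load_report('r2.csv') would replace the StringIO calls. The date columns in the result are datetimes; if the original "2019,Dec,09" text form is needed, result[col].dt.strftime(DATE_FORMAT) should convert them back.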
[Comments]:
- Should the student_count column in the result come from r1 or r2?
Tags: python pandas compare pandas-groupby