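【Posted】: 2020-10-01 18:18:44
【Question】:
I want to compare two DataFrames and output a DataFrame that shows their differences. However, I can tolerate a date difference of up to 2 days and a score difference of up to 5 points. When a value in df2 falls within the acceptable range of the df1 value, I keep the df1 value.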
df1
id  group  date        score
10  A      2020-01-10  50
29  B      2020-01-01  80
39  C      2020-01-21  84
38  A      2020-02-02  29
df2
id  group  date        score
10  B      2020-01-11  56
29  B      2020-01-01  81
39  C      2020-01-22  85
38  A      2020-02-12  29
My expected output:
id  group   date                      score
10  A -> B  2020-01-10                50 -> 56
29  B       2020-01-01                80
39  C       2020-01-21                84
38  A       2020-02-02 -> 2020-02-12  29
So I want to compare the DataFrames cell by cell, with tolerance conditions on certain columns.
I started with this:
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
result = []
for col in df1.columns:
    for index, row in df1.iterrows():
        diff = []
        compare_item = row[col][index]
        for index, row in df2.iterrows():
            if col == 'date':
                # acceptable if it's within 2 days differences
            if col == 'score':
                # acceptable if it's within 5 points differences
            if compare_item == row[col][index]:
                diff.append(compare_item)
            else:
                diff.append('{} --> {}'.format(compare_item, row[col]))
    result.append(diff)
df = pd.DataFrame(result, columns = [df1.columns])
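For comparison, here is one possible way to get the described result without nested loops. It is only a minimal sketch under a few assumptions: both frames contain the same set of ids, the date column can be parsed as datetimes, and a difference of exactly 2 days or exactly 5 points counts as acceptable. The helper name compare_frames and its parameters date_tol_days and score_tol are hypothetical, made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; 'date' is parsed to datetime
# so that differences can be measured in days.
df1 = pd.DataFrame({
    "id": [10, 29, 39, 38],
    "group": ["A", "B", "C", "A"],
    "date": pd.to_datetime(["2020-01-10", "2020-01-01", "2020-01-21", "2020-02-02"]),
    "score": [50, 80, 84, 29],
}).set_index("id")
df2 = pd.DataFrame({
    "id": [10, 29, 39, 38],
    "group": ["B", "B", "C", "A"],
    "date": pd.to_datetime(["2020-01-11", "2020-01-01", "2020-01-22", "2020-02-12"]),
    "score": [56, 81, 85, 29],
}).set_index("id")


def compare_frames(left, right, date_tol_days=2, score_tol=5):
    """Hypothetical helper: flag cells of `left` that differ from `right` beyond tolerance."""
    # Align `right` to the row order of `left` so rows are compared id by id.
    # Assumption: every id in `left` also exists in `right`.
    right = right.reindex(left.index)

    # Boolean masks, True where the difference is NOT acceptable.
    differs = pd.DataFrame({
        "group": left["group"] != right["group"],
        "date": (left["date"] - right["date"]).abs() > pd.Timedelta(days=date_tol_days),
        "score": (left["score"] - right["score"]).abs() > score_tol,
    })

    # Render both sides as strings so "old -> new" can be built by concatenation.
    left_txt = left.assign(date=left["date"].dt.strftime("%Y-%m-%d")).astype(str)
    right_txt = right.assign(date=right["date"].dt.strftime("%Y-%m-%d")).astype(str)

    # Keep the value from `left` where acceptable, otherwise show "old -> new".
    return left_txt.where(~differs[left_txt.columns], left_txt + " -> " + right_txt)


result = compare_frames(df1, df2)
print(result)
```

With the sample data above, this should flag group and score for id 10 and date for id 38, and leave the other cells with their df1 values. The resulting frame holds strings only, so it is suited to reporting rather than further numeric work.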
【Comments】: