【Posted】:2019-07-01 17:46:16
【Question】:
Given two pandas DataFrames:
Med_DF
Key Med
1 A
1 B
1 C
2 A
2 F
3 A
3 C
3 E
4 A
4 B
4 C
4 D
Key_DF
Key ID
1 A1
2 A2
3 A3
4 A4
5 A5
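How can I merge the two without duplicating Keys, matching an ID to each Key and creating a derived variable in a new column? The derived variable should return the number of medications for each ID, or blank/NaN if that number is 0, as shown in Result_DF below.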
Result_DF
Key ID Med
1 A1 3
2 A2 2
3 A3 3
4 A4 4
5 A5
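One way to get this result directly in pandas is sketched below. It assumes the frames look exactly like the samples above (they are rebuilt inline so the snippet runs on its own), and `med_counts` is just an illustrative name. The idea is to count the rows per Key with `groupby`, then left-merge the counts onto Key_DF so that Key 5 survives with NaN.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
Med_DF = pd.DataFrame({
    "Key": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "Med": ["A", "B", "C", "A", "F", "A", "C", "E", "A", "B", "C", "D"],
})
Key_DF = pd.DataFrame({
    "Key": [1, 2, 3, 4, 5],
    "ID": ["A1", "A2", "A3", "A4", "A5"],
})

# Count the medication rows for each Key (one output row per Key).
med_counts = Med_DF.groupby("Key")["Med"].count().reset_index()

# A left merge keeps every Key in Key_DF; keys with no meds get NaN.
Result_DF = Key_DF.merge(med_counts, on="Key", how="left")
print(Result_DF)
```

Because of the NaN for Key 5, the Med column comes back as floats. If whole numbers are preferred, converting it with `.astype("Int64")` (pandas' nullable integer type) should keep the missing value while dropping the decimals.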
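My attempt
I'm sure my solution is dated and inefficient, which is why I'm asking for a cleaner and possibly faster one. Still, here is what I did: I loop to build Excel formulas that populate the derived column and look up the ID that matches each Key.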
import pandas as pd

# read in Med and Key files into dataframes
Med_DF = pd.read_csv(med_file, usecols=['Key', 'Med'], encoding='utf-8', keep_default_na=False, na_values=[''])
Key_DF = pd.read_csv(key_file, usecols=['Key', 'ID'], encoding='utf-8', keep_default_na=False, na_values=[''])
# add empty ID column to Med_DF
Med_DF.insert(0, "ID", "")
# assign length of dataframes
length_of_med = len(Med_DF)
length_of_key = len(Key_DF)
# create empty lists for formulas
med_countif = []
med_vlookup = []
# med VLOOKUP formulas, one per Med_DF row (Excel data rows start at 2)
for i in range(2, length_of_med + 2):
    formula = '=VLOOKUP($B{0},Sheet1!$A:$B,2,FALSE)'.format(i)
    med_vlookup.append(formula)
# med COUNTIF formulas, one per Key_DF row; blank when the count is 0
for i in range(2, length_of_key + 2):
    formula = '=IF(COUNTIF(Sheet1!$A:$A,$B{0})=0,"",COUNTIF(Sheet1!$A:$A,$B{0}))'.format(i)
    med_countif.append(formula)
# write formulas to columns
Key_DF.loc[:, "Meds"] = med_countif
Med_DF.loc[:, "ID"] = med_vlookup
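For comparison, the two Excel formulas above have close pandas counterparts. The sketch below runs on the sample data rebuilt inline; the lowercase names `med`, `key`, `id_by_key` and `counts` are illustrative and not from the original code. `Series.map` against a Key-indexed lookup plays roughly the role of VLOOKUP, and `value_counts` plays roughly the role of COUNTIF.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
med = pd.DataFrame({
    "Key": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "Med": ["A", "B", "C", "A", "F", "A", "C", "E", "A", "B", "C", "D"],
})
key = pd.DataFrame({
    "Key": [1, 2, 3, 4, 5],
    "ID": ["A1", "A2", "A3", "A4", "A5"],
})

# VLOOKUP counterpart: look up each row's ID through a Key-indexed Series.
id_by_key = key.set_index("Key")["ID"]
med["ID"] = med["Key"].map(id_by_key)

# COUNTIF counterpart: how often each Key occurs in med.
# Keys that never occur are absent from counts, so map() leaves them as NaN.
counts = med["Key"].value_counts()
key["Meds"] = key["Key"].map(counts)

print(med)
print(key)
```

Both steps are vectorized, so no explicit loop over rows is needed and nothing depends on Excel recalculating the formulas afterwards.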
【Discussion】:
Tags: python-3.x pandas merge