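【Posted】: 2021-09-10 23:50:55
【Question】:
I have two CSV files loaded in memory as DataFrames: df1 and df2.
df1 has a column "OOSCUSTID"; df2 has a column "FORCUSTID".
For each row in df1:
where the OOSCUSTID value in df1 equals a FORCUSTID value in df2, take the corresponding value from df2['KKLM'] and store it in df1['FOREIGN-KKLM'].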
df1:
NO.     OOSCUSTID  # TRADES  AVG PROFIT/LOSS
648500  -17        103       1305914.12
648483  -16        103       1305914.12
648502  -15        103       1305914.12
df2:
NO.     FORCUSTID  KKLM   AVG PROFIT/LOSS
648495  0          6      1305914.12
648500  -17        3      1305914.12
648483  -16        5      1305914.12
648502  -15        6      1305914.12
648484  -14        7      1305914.12
648482  -13        8      1305914.12
648501  -12        20.34  1305914.12
648486  -9         4534   1305914.12
648487  -8         103    1305914.12
The code below raises this error:
ValueError: Length mismatch: Expected 9 rows, received array of length 1
checkstats = ["FOREIGN-KKLM"]
c = ["KKLM"]
ooscolfor = ["FORCUSTID"]
ooscolmain = ["OOSCUSTID"]
df1[checkstats] = df2.set_index([ooscolfor])[c].reindex(df1[ooscolmain]).array
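A likely cause, offered as a sketch rather than a confirmed diagnosis: ooscolfor is already a list, so set_index([ooscolfor]) receives a nested list, and pandas treats the inner ["FORCUSTID"] as a one-element array of index values instead of a column label. The example below rebuilds the sample frames from the tables above and uses plain string column names. The construction code and the names lookup and error_message are illustrative and not from the original post, and it assumes FORCUSTID values are unique in df2.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (illustrative).
df1 = pd.DataFrame({
    "NO.": [648500, 648483, 648502],
    "OOSCUSTID": [-17, -16, -15],
    "# TRADES": [103, 103, 103],
    "AVG PROFIT/LOSS": [1305914.12] * 3,
})
df2 = pd.DataFrame({
    "NO.": [648495, 648500, 648483, 648502, 648484,
            648482, 648501, 648486, 648487],
    "FORCUSTID": [0, -17, -16, -15, -14, -13, -12, -9, -8],
    "KKLM": [6, 3, 5, 6, 7, 8, 20.34, 4534, 103],
    "AVG PROFIT/LOSS": [1305914.12] * 9,
})

# A list nested inside a list is read as an array of index values
# (length 1), not as a column label, so the lengths do not match.
error_message = None
try:
    df2.set_index([["FORCUSTID"]])
except ValueError as exc:
    error_message = str(exc)

# Using string labels: build a FORCUSTID -> KKLM lookup Series,
# then pull the values in the order of df1's OOSCUSTID column.
# Assumption: FORCUSTID is unique in df2 (reindex requires that).
lookup = df2.set_index("FORCUSTID")["KKLM"]
df1["FOREIGN-KKLM"] = lookup.reindex(df1["OOSCUSTID"]).to_numpy()
```

Converting with to_numpy() drops the reindexed labels, so the values are assigned to df1 by position rather than by index.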
Edit 2: after modifying df1 and df2 and using this code:
df1['FOREIGN-KKLM'] = df1.merge(df2, left_on='OOSCUSTID',
right_on='FORCUSTID')['KKLM']
the result is inconsistent: row #3 should be NaN and row #4 should be 4534:
   NO.     OOSCUSTID  # TRADES  AVG PROFIT/LOSS  FOREIGN-KKLM
0  648500  -17        103       1305914.12       3.0
1  648483  -16        103       1305914.12       5.0
2  648502  -15        103       1305914.12       6.0
3  545     4          44        44.00            4534.0
4  22      -9         22        22.00            NaN
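One possible reading of this output: merge defaults to an inner join, so the row with the unmatched OOSCUSTID 4 is dropped and the four remaining rows are renumbered 0 to 3. Assigning the resulting KKLM column back to df1 then aligns on that new index, which moves 4534 up to row 3 and leaves row 4 empty. A minimal sketch of an alternative, assuming FORCUSTID is unique in df2, maps each OOSCUSTID through a lookup Series so the result keeps df1's own index. The frame construction and the name inner are illustrative, based on the modified frames listed in this post.

```python
import pandas as pd

# Modified frames rebuilt from the question (illustrative).
df1 = pd.DataFrame({
    "NO.": [648500, 648483, 648502, 545, 22],
    "OOSCUSTID": [-17, -16, -15, 4, -9],
    "# TRADES": [103, 103, 103, 44, 22],
    "AVG PROFIT/LOSS": [1305914.12, 1305914.12, 1305914.12, 44.0, 22.0],
})
df2 = pd.DataFrame({
    "NO.": [648495, 648500, 648483, 648502, 648484,
            648482, 648501, 648486, 648487],
    "FORCUSTID": [0, -17, -16, -15, -14, -13, -12, -9, -8],
    "KKLM": [6, 3, 5, 6, 7, 8, 20.34, 4534, 103],
    "AVG PROFIT/LOSS": [1305914.12] * 9,
})

# The default (inner) merge keeps only the matching rows and gives
# them a fresh 0..3 index, so it no longer lines up with df1.
inner = df1.merge(df2, left_on="OOSCUSTID", right_on="FORCUSTID")

# Map each OOSCUSTID to its KKLM value; unmatched keys become NaN
# and the result carries df1's index, so assignment stays aligned.
# Assumption: FORCUSTID is unique in df2.
df1["FOREIGN-KKLM"] = df1["OOSCUSTID"].map(df2.set_index("FORCUSTID")["KKLM"])
```

A left merge (how='left') followed by to_numpy() on the KKLM column would be another option under the same uniqueness assumption, since it keeps one output row per df1 row in df1's order.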
Modified DataFrames:
df1:
NO.     OOSCUSTID  # TRADES  AVG PROFIT/LOSS
648500  -17        103       1305914.12
648483  -16        103       1305914.12
648502  -15        103       1305914.12
545     4          44        44
22      -9         22        22
df2:
NO.     FORCUSTID  KKLM   AVG PROFIT/LOSS
648495  0          6      1305914.12
648500  -17        3      1305914.12
648483  -16        5      1305914.12
648502  -15        6      1305914.12
648484  -14        7      1305914.12
648482  -13        8      1305914.12
648501  -12        20.34  1305914.12
648486  -9         4534   1305914.12
648487  -8         103    1305914.12
【Discussion】: