Create duplicate rows in two data frames based on column values (pandas & numpy)

【Posted】: 2021-05-12 03:26:19
【Question】:

Suppose I have two data frames, DF1 and DF2:

no1  quantity    no2
abc      3       123
pqr      5       NaN

and

no1    serial
abc      10
pqr      20

I want to create the following outputs, DF3 and DF4:

no1     quantity  
abc         3      
123         3      
pqr         5

and

no1       serial
abc         10
123         10
pqr         20

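Please help me create DF3. My idea for DF3 was to duplicate the rows of DF1 where DF1['no2'] is not NaN and then drop the no2 column. DF4 can be created with pd.merge, but the serial for 123 has to be 10 (the serial of abc, the row it came from), and that is the part I need.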
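The approach described in the question (duplicate the rows that have a no2 value, move no2 into no1, drop no2) can be sketched as follows. This is a minimal sketch, not a confirmed solution: it assumes no2 holds whole numbers, and the helper columns pos, dup and key are hypothetical names introduced here to keep each copy directly after its source row and to look up serial through the original no1.

```python
import numpy as np
import pandas as pd

# Sample data from the question; no2 is float64 because of the NaN.
df1 = pd.DataFrame({"no1": ["abc", "pqr"], "quantity": [3, 5], "no2": [123, np.nan]})
df2 = pd.DataFrame({"no1": ["abc", "pqr"], "serial": [10, 20]})

# Tag each original row with its position, a copy flag, and its original no1 (hypothetical helpers).
base = df1.assign(pos=np.arange(len(df1)), dup=0, key=df1["no1"])

# Duplicate only the rows whose no2 is present and move no2 into no1.
# Casting through Int64 turns 123.0 into "123" (assumes no2 holds whole numbers).
extra = base[base["no2"].notna()].copy()
extra["no1"] = extra["no2"].astype("Int64").astype(str)
extra["dup"] = 1

# Place each duplicate right after its source row.
stacked = pd.concat([base, extra]).sort_values(["pos", "dup"]).reset_index(drop=True)

df3 = stacked[["no1", "quantity"]]
# Join on the original no1, so "123" inherits the serial of abc.
df4 = stacked[["no1", "key"]].merge(
    df2.rename(columns={"no1": "key"}), on="key", how="left"
)[["no1", "serial"]]
print(df3)
print(df4)
```

Because serial is looked up through the original identifier rather than through quantity, this does not depend on quantities being distinct.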

【Discussion】:

Tags: python-3.x pandas dataframe numpy

【Answer 1】:

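For df3, you can use pd.concat() (older answers use Series.append(), which was removed in pandas 2.0), the to_frame() method and the assign() method. assign() aligns quantity on the original row labels, so each no2 value picks up the quantity of its source row; dropna() then removes the row created from the missing no2: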

import pandas as pd
df3 = pd.concat([df1['no1'], df1['no2']]).to_frame(name='no1').assign(quantity=df1['quantity']).reset_index(drop=True).dropna()

Output of df3:

    no1     quantity
0   abc     3
1   pqr     5
2   123.0   3
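The 123.0 in this output comes from no2 being float64 (because of the NaN). A minimal variation, assuming no2 only ever holds whole numbers, drops the missing values first and casts the rest through the nullable Int64 dtype to strings, so the new row reads 123 as in the desired DF3. The sample frame is rebuilt from the question so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data from the question; no2 is float64 because of the NaN.
df1 = pd.DataFrame({"no1": ["abc", "pqr"], "quantity": [3, 5], "no2": [123, np.nan]})

# Keep only present no2 values and turn 123.0 into "123" (assumes whole numbers).
no2 = df1["no2"].dropna().astype("Int64").astype(str)

# concat keeps the original row labels, so assign() can align quantity to them.
df3 = (
    pd.concat([df1["no1"], no2])
    .to_frame(name="no1")
    .assign(quantity=df1["quantity"])
    .reset_index(drop=True)
)
print(df3)
```

Since the missing no2 is dropped before concatenating, the trailing dropna() is no longer needed.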

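For df4, you can use the merge(), groupby() and ffill() methods. The left merge leaves serial empty for 123, because 123 does not appear in df2; grouping by quantity puts that row in the same group as abc, and ffill() copies abc's serial into it: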

df4=df3.merge(df2,on='no1',how='left').groupby('quantity').ffill()

Output of df4:

    no1     serial
0   abc     10.0
1   pqr     20.0
2   123.0   10.0
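Grouping by quantity works here only because each source row has a distinct quantity; two rows sharing a quantity would fill serials across each other. An alternative sketch, not part of the answer, attaches serial before duplicating and uses Index.repeat (the pandas counterpart of numpy.repeat) to emit each row once, or twice when no2 is present. It assumes no2 holds whole numbers and that df2 has one row per no1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"no1": ["abc", "pqr"], "quantity": [3, 5], "no2": [123, np.nan]})
df2 = pd.DataFrame({"no1": ["abc", "pqr"], "serial": [10, 20]})

# Attach serial while every row still carries its original no1.
merged = df1.merge(df2, on="no1", how="left")

# Each row is emitted once, plus one extra copy when no2 is present.
counts = 1 + merged["no2"].notna().astype(int)
rep = merged.loc[merged.index.repeat(counts.to_numpy())]

# cumcount numbers the copies of each source row: 0 = original, 1 = duplicate.
is_copy = rep.groupby(level=0).cumcount().to_numpy() == 1
rep = rep.reset_index(drop=True)

# The duplicate takes its identifier from no2 (123.0 -> "123"; assumes whole numbers).
rep.loc[is_copy, "no1"] = rep.loc[is_copy, "no2"].astype("Int64").astype(str)

df3 = rep[["no1", "quantity"]]
df4 = rep[["no1", "serial"]]
print(df3)
print(df4)
```

Each duplicate lands directly after its source row, matching the order in the question, and serial stays an integer because every row found a match in df2.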

【Discussion】: