【Question title】: Merging Dataframes With Some Matching Column Names Results in Duplicate Columns
【Posted】: 2020-02-19 22:00:07
【Question】:

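I have two dataframes that share some columns, and I want to merge them on Symbol and Date. But when I do, new columns with suffixes are added instead of the missing data being filled in.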

df1

  Investor     Date   Name Symbol  Price  Amount  Income
0     Mike  2019 Q4  A Inc    AAA    NaN     100     NaN
1     Bill  2019 Q4  C Inc    CCC    NaN     200     NaN
2     John  2018 Q4  A Inc    AAA    NaN     200     NaN
3     Faye  2018 Q4  D Inc    DDD    NaN     300     NaN
4      Joe  2019 Q2  A Inc    AAA    NaN     300     NaN
5     Hank  2019 Q2  S Inc    SSS    NaN     100     NaN

df2

      Date   Name Symbol  Price  Income
0  2019 Q4  A Inc    AAA      5      10
1  2019 Q4  B Inc    BBB      3      20
2  2019 Q4  C Inc    CCC     33      30
3  2019 Q4  D Inc    DDD     30      40
4  2018 Q4  A Inc    AAA     23      20
5  2018 Q4  B Inc    BBB      4      30
6  2018 Q4  C Inc    CCC    136      40
7  2018 Q4  D Inc    DDD      6      50
8  2018 Q4  E Inc    EEE      1      90

I would like my output to look like this:

  Investor     Date   Name Symbol  Price  Amount  Income
0     Mike  2019 Q4  A Inc    AAA    5.0     100    10.0
1     Bill  2019 Q4  C Inc    CCC   33.0     200    30.0
2     John  2018 Q4  A Inc    AAA   23.0     200    20.0
3     Faye  2018 Q4  D Inc    DDD    6.0     300    50.0
4      Joe  2019 Q2  A Inc    AAA    NaN     300     NaN
5     Hank  2019 Q2  S Inc    SSS    NaN     100     NaN

But when I run df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left'), I get:

  Investor     Date Name_x Symbol  ...  Income_x  Name_y  Price_y Income_y
0     Mike  2019 Q4  A Inc    AAA  ...       NaN   A Inc      5.0     10.0
1     Bill  2019 Q4  C Inc    CCC  ...       NaN   C Inc     33.0     30.0
2     John  2018 Q4  A Inc    AAA  ...       NaN   A Inc     23.0     20.0
3     Faye  2018 Q4  D Inc    DDD  ...       NaN   D Inc      6.0     50.0
4      Joe  2019 Q2  A Inc    AAA  ...       NaN     NaN      NaN      NaN
5     Hank  2019 Q2  S Inc    SSS  ...       NaN     NaN      NaN      NaN

What am I doing wrong?

import pandas as pd
from numpy import nan

df1 = pd.DataFrame({'Investor': {0: 'Mike', 1: 'Bill', 2: 'John', 3: 'Faye', 4: 'Joe', 5: 'Hank'}, 'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2018 Q4', 3: '2018 Q4', 4: '2019 Q2', 5: '2019 Q2'}, 'Name': {0: 'A Inc', 1: 'C Inc', 2: 'A Inc', 3: 'D Inc', 4: 'A Inc', 5: 'S Inc'}, 'Symbol': {0: 'AAA', 1: 'CCC', 2: 'AAA', 3: 'DDD', 4: 'AAA', 5: 'SSS'}, 'Price': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'Amount': {0: 100, 1: 200, 2: 200, 3: 300, 4: 300, 5: 100}, 'Income': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}})
df2 = pd.DataFrame({'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2019 Q4', 3: '2019 Q4', 4: '2018 Q4', 5: '2018 Q4', 6: '2018 Q4', 7: '2018 Q4', 8: '2018 Q4'}, 'Name': {0: 'A Inc', 1: 'B Inc', 2: 'C Inc', 3: 'D Inc', 4: 'A Inc', 5: 'B Inc', 6: 'C Inc', 7: 'D Inc', 8: 'E Inc'}, 'Symbol': {0: 'AAA', 1: 'BBB', 2: 'CCC', 3: 'DDD', 4: 'AAA', 5: 'BBB', 6: 'CCC', 7: 'DDD', 8: 'EEE'}, 'Price': {0: 5, 1: 3, 2: 33, 3: 30, 4: 23, 5: 4, 6: 136, 7: 6, 8: 1}, 'Income': {0: 10, 1: 20, 2: 30, 3: 40, 4: 20, 5: 30, 6: 40, 7: 50, 8: 90}})
df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left')

【Question comments】:

    Tags: python pandas join merge


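    【Solution 1】:

    That happens because Name, Income, and Price exist in both dataframes, and merge adds suffixes to every shared column that is not a join key. If you don't want the duplicates, select only the columns you need from each side before merging: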

    (df1[['Investor', 'Name', 'Date','Symbol','Amount']]
       .merge(df2.drop('Name', axis=1),
              on=['Date','Symbol'],
              how='left')
    )
    

    Output:

      Investor   Name     Date Symbol  Amount  Price  Income
    0     Mike  A Inc  2019 Q4    AAA     100    5.0    10.0
    1     Bill  C Inc  2019 Q4    CCC     200   33.0    30.0
    2     John  A Inc  2018 Q4    AAA     200   23.0    20.0
    3     Faye  D Inc  2018 Q4    DDD     300    6.0    50.0
    4      Joe  A Inc  2019 Q2    AAA     300    NaN     NaN
    5     Hank  S Inc  2019 Q2    SSS     100    NaN     NaN
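
To see in advance which columns will pick up suffixes, you can list the non-key columns that both frames share. This is a minimal sketch on a cut-down version of the frames from the question; the variable names `keys` and `overlap` are only illustrative.

```python
import numpy as np
import pandas as pd

# Cut-down versions of df1 and df2 from the question.
df1 = pd.DataFrame({
    'Investor': ['Mike', 'Joe'],
    'Date': ['2019 Q4', '2019 Q2'],
    'Name': ['A Inc', 'A Inc'],
    'Symbol': ['AAA', 'AAA'],
    'Price': [np.nan, np.nan],
    'Amount': [100, 300],
    'Income': [np.nan, np.nan],
})
df2 = pd.DataFrame({
    'Date': ['2019 Q4'],
    'Name': ['A Inc'],
    'Symbol': ['AAA'],
    'Price': [5],
    'Income': [10],
})

keys = ['Date', 'Symbol']

# Columns present in both frames that are not join keys:
# these are the ones merge renames with _x / _y.
overlap = [c for c in df1.columns if c in df2.columns and c not in keys]
print(overlap)

merged = df1.merge(df2, on=keys, how='left')
print(list(merged.columns))
```

Any column in `overlap` has to be dropped from one side, as in the answer above, or reconciled after the merge.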
    

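    【Comments】:

    • So I need to list every column I want to keep from df1, as in (df1[['Investor', 'Name', 'Date','Symbol','Amount']]?
    • Yes, or you can drop columns instead, along the lines of df2.drop(['Name'], axis=1)
    • Is there no way to keep every column without listing them explicitly? Basically, df2 has no columns that are not already in df1, and I want to keep all the columns in df1; where a column also exists in df2, just fill in the missing values.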
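
One possible way to do what the last comment asks, offered as a sketch rather than the answerer's method: merge with an empty suffix for the left frame so df1's column names are left unchanged, fill each shared column from its df2 counterpart, then drop the helper columns. The `_df2` suffix and the names `keys`, `overlap`, and `result` are made up for this example. It assumes df2 has at most one row per (Date, Symbol), which `validate='many_to_one'` checks.

```python
import numpy as np
import pandas as pd

# Cut-down versions of df1 and df2 from the question.
df1 = pd.DataFrame({
    'Investor': ['Mike', 'Faye', 'Joe'],
    'Date': ['2019 Q4', '2018 Q4', '2019 Q2'],
    'Name': ['A Inc', 'D Inc', 'A Inc'],
    'Symbol': ['AAA', 'DDD', 'AAA'],
    'Price': [np.nan, np.nan, np.nan],
    'Amount': [100, 300, 300],
    'Income': [np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    'Date': ['2019 Q4', '2019 Q4', '2018 Q4'],
    'Name': ['A Inc', 'D Inc', 'D Inc'],
    'Symbol': ['AAA', 'DDD', 'DDD'],
    'Price': [5, 30, 6],
    'Income': [10, 40, 50],
})

keys = ['Date', 'Symbol']
# Shared non-key columns, computed instead of listed by hand.
overlap = [c for c in df1.columns if c in df2.columns and c not in keys]

# Keep df1's names as-is; df2's shared columns get the (arbitrary) '_df2' suffix.
merged = df1.merge(df2, on=keys, how='left',
                   suffixes=('', '_df2'), validate='many_to_one')

# Fill only the gaps in df1's columns, then discard the helper columns.
for col in overlap:
    merged[col] = merged[col].fillna(merged[col + '_df2'])
result = merged.drop(columns=[col + '_df2' for col in overlap])

print(result)
```

Unlike dropping the columns before the merge, this keeps any non-missing values df1 already holds and uses df2 only to fill the gaps.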