Pandas: How to update a dataframe and also append new entries? [duplicate] (Answers)

【Question title】: Pandas: How to update a dataframe and also append new entries? [duplicate]
【Posted】: 2021-10-09 05:00:34
【Question description】:

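Using pandas==1.1.5. I want to update the values in df1 with the values from df2. However, df2 contains new index labels, and these are not appended to df1 when I use update. Details below. Thanks.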

df1

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  1    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |

df2

       | Sales |
0       | 10   |
6       |  12    |

Desired df

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  10    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |
6      |  NaN    |  NaN      |  12    |

df after using update

df1.update(df2)

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  10    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |

【Question comments】:

This is a classic use case for combine_first: df2.combine_first(df1).loc[:,[*df1]]
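
The one-liner in the comment above can be expanded into a runnable sketch. It assumes the same df1 and df2 as in the question; the variable name `combined` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Revenue': [100, 500, 200],
                    'Profit': [300, 900, 100],
                    'Sales': [1, 3, 4]})
df2 = pd.DataFrame({'Sales': [10, 12]}, index=[0, 6])

# df2's values take priority; missing cells are filled from df1.
# The result is indexed by the union of both indexes.
combined = df2.combine_first(df1)

# combine_first may reorder the columns, so select them in df1's order.
# [*df1] unpacks the column labels; list(df1.columns) is equivalent.
combined = combined.loc[:, [*df1]]
print(combined)
```

Calling combine_first on df2 (rather than df1) is what makes df2's values win where both frames have an entry.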

Tags: python python-3.x pandas

【Solution 1】:

Using join:

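import pandas as pd

df1 = pd.DataFrame(data={'Revenue': [100, 500, 200], 'Profit': [300, 900, 100], 'Sales': [1, 3, 4]})
df2 = pd.DataFrame(data={'Sales': [10, 12]}, index=[0, 6])
# Outer join on the index; df1's overlapping column is renamed to 'Sales_df1'
df1 = df1.join(df2, how='outer', lsuffix='_df1')
# Where df2 had no value, fall back to df1's original Sales
# (assign the result instead of calling fillna(inplace=True) on a column selection)
df1['Sales'] = df1['Sales'].fillna(df1['Sales_df1'])
df1 = df1.drop(columns=['Sales_df1'])
print(df1)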

Using merge:

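# Alternative to the join above; start again from the original df1 and df2
df1 = df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=('_df1', ''))
# Where df2 had no value, fall back to df1's original Sales
df1['Sales'] = df1['Sales'].fillna(df1['Sales_df1'])
df1 = df1.drop(columns=['Sales_df1'])
print(df1)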

Output:

   Revenue  Profit  Sales
0    100.0   300.0   10.0
1    500.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0

【Discussion】:

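【Solution 2】:

Unfortunately, the update method is limited in how it joins: other join types are not implemented, so the join parameter can only be "left".

As a result, you have to use update together with concat: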

import pandas as pd

df1 = pd.DataFrame({'Revenue': [100,5000,200], 'Profit': [300,900,100], 'Sales': [1,3,4]})
df2 = pd.DataFrame({'Sales': [10,12]}, index=[0,6])

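# Overwrite the rows of df1 whose index labels also appear in df2
df1.update(df2, overwrite=True)
# Rows of df2 whose index labels are missing from df1
to_be_added = df2.loc[df2.index.difference(df1.index)]
dd = pd.concat([df1, to_be_added])
print(dd)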

Result:

   Revenue  Profit  Sales
0    100.0   300.0   10.0
1   5000.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0
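
The join limitation mentioned at the start of this answer can be checked directly. The following is a minimal sketch using the question's data; the flag name `outer_supported` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Revenue': [100, 500, 200],
                    'Profit': [300, 900, 100],
                    'Sales': [1, 3, 4]})
df2 = pd.DataFrame({'Sales': [10, 12]}, index=[0, 6])

# Any join other than 'left' is rejected by update.
try:
    df1.update(df2, join='outer')
    outer_supported = True
except NotImplementedError:
    outer_supported = False
print(outer_supported)

# The default left join only touches labels already present in df1,
# so index label 6 from df2 is ignored.
df1.update(df2)
print(df1)
```

This is why a separate step (concat, reindex, or a join) is needed to bring in the new index labels.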

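【Discussion】:

I noticed that using update messes up some of the formatting. An original value in df2, "01-10-2019 12:00:00 am", becomes "1569888000000000000" in df1 after the update. Is there any way to fix this? Thanks.
Apparently you have a typed column. You can use astype to restore the column's type: stackoverflow.com/questions/15891038/…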
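
As a rough illustration of the exchange above: the integer looks like a timestamp stored as nanoseconds since the Unix epoch (an assumption, since the original column is not shown). The reply suggests astype; this sketch uses pd.to_datetime instead, which reads integers as nanoseconds by default.

```python
import pandas as pd

# The value reported in the comment, assumed to be nanoseconds since the epoch.
raw = pd.Series([1569888000000000000])

# Convert the integers back to datetimes (unit defaults to nanoseconds).
restored = pd.to_datetime(raw)
print(restored.iloc[0])
```

On a real frame, the same conversion would be applied to the affected column after the update.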

【Solution 3】:

You can reindex your dataframe before calling update:

out = df1.reindex(df1.index.union(df2.index))
out.update(df2)
print(out)

# Output:
   Revenue  Profit  Sales
0    100.0   300.0   10.0
1    500.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0
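
The reindex-then-update pattern can be wrapped in a small function if it is needed in several places. This is only a sketch: `upsert` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def upsert(target: pd.DataFrame, source: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: update target from source and add new index labels.

    Returns a new frame; target itself is left unchanged.
    """
    # Extend the index so that labels found only in source get (empty) rows.
    out = target.reindex(target.index.union(source.index))
    # Overwrite with the non-NA values from source, aligned on index and columns.
    out.update(source)
    return out


df1 = pd.DataFrame({'Revenue': [100, 500, 200],
                    'Profit': [300, 900, 100],
                    'Sales': [1, 3, 4]})
df2 = pd.DataFrame({'Sales': [10, 12]}, index=[0, 6])

result = upsert(df1, df2)
print(result)
```

Note that update skips NaN values in the source and ignores columns that exist only in the source, so this helper inherits both behaviours.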

【Discussion】: