How to fill in data in a DataFrame while keeping the existing values

【Question title】: How to fill in data in a DataFrame while keeping the existing values
【Posted】: 2017-04-06 08:34:13
【Question】:

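I have a script that fills values from a file (df4) into an existing DataFrame (df3). However, df3 already contains columns with values, and the following script sets those existing values to NaN: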

df5 = df4.pivot_table(index='source', columns='plasmidgene', values='identity').reindex(index=df3.index, columns=df3.columns)

How can I avoid overwriting my existing values? Thanks.

For example, I have df1:

    a   b    c    d    e    f
1   1  30  NaN  NaN  NaN  NaN
2   2   3  NaN  NaN  NaN  NaN
3  16   1  NaN  NaN  NaN  NaN

df2

   source plasmidgene  identity
1       1           d        80
2       2           e        90
3       3           c        60

I want to create this:

    a   b    c    d    e    f
1   1  30  NaN   80  NaN  NaN
2   2   3  NaN  NaN   90  NaN
3  16   1   60  NaN  NaN  NaN
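
The behaviour described in the question can be reproduced with the sample data. This is a minimal sketch, assuming df2 has the columns source, plasmidgene and identity (names taken from the pivot_table call; the df2 listing itself shows no header) and that the first column of both listings is the index:

```python
import numpy as np
import pandas as pd

# Sample data from the question. The df2 column names are an assumption
# based on the pivot_table call.
df1 = pd.DataFrame(
    {"a": [1, 2, 16], "b": [30, 3, 1],
     "c": np.nan, "d": np.nan, "e": np.nan, "f": np.nan},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"source": [1, 2, 3],
     "plasmidgene": ["d", "e", "c"],
     "identity": [80, 90, 60]},
    index=[1, 2, 3],
)

pivoted = df2.pivot_table(index="source", columns="plasmidgene", values="identity")
reindexed = pivoted.reindex(index=df1.index, columns=df1.columns)

# The pivot only has columns c, d and e, so reindex creates a, b and f
# as new all-NaN columns. Nothing from df1 is carried over.
print(reindexed)
print(reindexed[["a", "b"]].isna().all().all())
```

reindex only aligns the pivoted frame to the labels of df1; it never reads the values of df1, which is why the existing columns come out empty.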

【Comments】:

Can you add a data sample and the desired output?
See: How to make good reproducible pandas examples

Tags: python pandas dataframe

【Solution 1】:

I think you can use combine_first:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns=df1.columns) \
        .combine_first(df1)

print(df)
      a     b     c     d     e   f
1   1.0  30.0   NaN  80.0   NaN NaN
2   2.0   3.0   NaN   NaN  90.0 NaN
3  16.0   1.0  60.0   NaN   NaN NaN

print(df.dtypes)
a    float64
b    float64
c    float64
d    float64
e    float64
f    float64
dtype: object
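
For anyone who wants to run the answer end to end, here is a self-contained sketch. The construction of df1 and df2 is an assumption based on the sample tables in the question (the df2 column names come from the pivot_table call):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's sample data.
df1 = pd.DataFrame(
    {"a": [1, 2, 16], "b": [30, 3, 1],
     "c": np.nan, "d": np.nan, "e": np.nan, "f": np.nan},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"source": [1, 2, 3],
     "plasmidgene": ["d", "e", "c"],
     "identity": [80, 90, 60]},
    index=[1, 2, 3],
)

df = (
    df2.pivot_table(index="source", columns="plasmidgene", values="identity")
       .reindex(index=df1.index, columns=df1.columns)
       .combine_first(df1)  # fill the NaN cells of the pivot from df1
)

print(df)
print(df.dtypes)
```

combine_first keeps the non-null values of the frame it is called on and only fills its NaN cells from the argument. Here the pivoted values take priority where both frames have data; calling df1.combine_first(...) on the reindexed pivot instead would give priority to the values already in df1.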

fillna is problematic here: it does not convert the dtypes to float64 (the columns end up as object), so I would not use it. It looks like a bug:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns=df1.columns) \
        .fillna(df1)

print(df)
    a   b    c    d    e    f
1   1  30  NaN   80  NaN  NaN
2   2   3  NaN  NaN   90  NaN
3  16   1   60  NaN  NaN  NaN

print(df.dtypes)
a    object
b    object
c    object
d    object
e    object
f    object
dtype: object
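
Whether fillna still yields object columns may depend on the pandas version, so it is worth checking the dtypes on your own installation. The sketch below does that and, as one possible workaround (not part of the original answer), converts the columns back with pd.to_numeric. The sample frames are an assumed reconstruction of the question's data:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's sample data.
df1 = pd.DataFrame(
    {"a": [1, 2, 16], "b": [30, 3, 1],
     "c": np.nan, "d": np.nan, "e": np.nan, "f": np.nan},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"source": [1, 2, 3],
     "plasmidgene": ["d", "e", "c"],
     "identity": [80, 90, 60]},
    index=[1, 2, 3],
)

filled = (
    df2.pivot_table(index="source", columns="plasmidgene", values="identity")
       .reindex(index=df1.index, columns=df1.columns)
       .fillna(df1)
)
# The answer above reports object here; other pandas versions may differ.
print(filled.dtypes)

# Convert every column to a numeric dtype; columns that are already
# numeric pass through unchanged.
numeric = filled.apply(pd.to_numeric)
print(numeric.dtypes)
```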

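【Comments】:

Yes, the last option works great! Thank you very much!
In my opinion it is better to use combine_first, because mixed types are problematic: some pandas functions fail on them.
If I use combine_first, I get the following error: [AttributeError: 'DataFrame' object has no attribute 'dtype']
That is probably a typo: it should be print(df.dtypes), with an s. It is only there as a check.
Are all the dtypes object?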
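
The AttributeError discussed in these comments comes from the difference between a DataFrame and a single column. A small sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the attributes.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [1, 2]})

print(df.dtypes)      # DataFrame: one dtype per column, returned as a Series
print(df["a"].dtype)  # Series (single column): exactly one dtype

# A DataFrame has no .dtype attribute (unless a column is named "dtype"),
# so df.dtype raises AttributeError.
print(hasattr(df, "dtype"))

# One way to answer "are all the dtypes object?" for a frame:
print((df.dtypes == object).all())
```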