Merging rows containing blank cells and duplicates in Pandas: answers

【Question title】: Merging rows containing blank cells and duplicates in Pandas
【Posted】: 2019-10-30 19:14:24
【Question description】:

I'd like to use Python Pandas to merge rows in a large Excel file. Suppose that in an Excel or CSV file I have:

Kelly | $400 |      |      | $20 |
Kelly |      | $200 |      |     |
Kelly |      |      | $500 |     |
John  |      |  $2  | ($7) |     |
John  |      |      |      | $10 |

I want to end up with:

Kelly | $400 | $200 | $500 | $20 |
John  |      |  $2  | ($7) | $10 |

Is there a simple solution? Thanks in advance.

【Question discussion】:

Tags: python excel pandas csv

【Solution 1】:

It sounds like you're looking for groupby:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
          'col1': [400, np.nan, np.nan, np.nan, np.nan],
          'col2': [np.nan, 200, np.nan, 2, np.nan],
          'col3': [np.nan, np.nan, 500, -7, np.nan],
          'col4': [20, np.nan, np.nan, np.nan, 10]})

print(df)

    Name   col1   col2   col3  col4
0  Kelly  400.0    NaN    NaN  20.0
1  Kelly    NaN  200.0    NaN   NaN
2  Kelly    NaN    NaN  500.0   NaN
3   John    NaN    2.0   -7.0   NaN
4   John    NaN    NaN    NaN  10.0


print(df.groupby('Name').sum())

Output:

        col1   col2   col3  col4
Name                            
John     0.0    2.0   -7.0  10.0
Kelly  400.0  200.0  500.0  20.0
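`sum` treats a group with no values as 0, which is why John's `col1` prints `0.0` here instead of staying blank as in the desired output. A minimal sketch of two alternatives on the same sample data: `GroupBy.first()` takes the first non-null value in each column, and `sum(min_count=1)` returns NaN when a group has no non-null values. `sort=False` keeps the groups in order of first appearance (Kelly before John) rather than sorting them alphabetically.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
    'col1': [400, np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 200, np.nan, 2, np.nan],
    'col3': [np.nan, np.nan, 500, -7, np.nan],
    'col4': [20, np.nan, np.nan, np.nan, 10],
})

# first(): the first non-null value of each column within each name.
merged_first = df.groupby('Name', sort=False).first()

# sum(min_count=1): a column with no values for a name stays NaN instead of 0.
merged_sum = df.groupby('Name', sort=False).sum(min_count=1)

print(merged_first)
print(merged_sum)
```

Both give the same result when each name has at most one value per column. If a name has several non-blank entries in the same column, `first()` keeps the earliest one while `sum` adds them, so pick whichever matches what "merging" should mean for your data.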

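Edit: If you only get the sum for the first column, your other columns probably have a non-numeric dtype. When you apply groupby to the whole DataFrame, every column gets the result of the aggregation function. Try df.info() to check the dtypes of your columns.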
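In the question's file, cells such as `$400` and `($7)` are likely read as text rather than numbers, which would match the non-numeric dtypes described above. A minimal sketch, assuming the cells are strings with a leading `$`, optional thousands separators, and parentheses for negative amounts (accounting format); the sample `raw` frame and the `to_number` helper are hypothetical. It strips the formatting and converts each value column with `pd.to_numeric` before grouping.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data as it might be read from the spreadsheet: currency text, blanks as NaN.
raw = pd.DataFrame({
    'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
    'col1': ['$400', np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, '$200', np.nan, '$2', np.nan],
    'col3': [np.nan, np.nan, '$500', '($7)', np.nan],
    'col4': ['$20', np.nan, np.nan, np.nan, '$10'],
})
print(raw.dtypes)  # value columns hold text, not numbers


def to_number(s: pd.Series) -> pd.Series:
    """Hypothetical helper: '$1,200' -> 1200, '($7)' -> -7, blank -> NaN."""
    if pd.api.types.is_numeric_dtype(s):
        return s  # already numeric (e.g. an all-blank column read as float)
    cleaned = (
        s.str.strip()
         .str.replace(r'^\((.*)\)$', r'-\1', regex=True)  # accounting negative: (x) -> -x
         .str.replace(r'[$,]', '', regex=True)            # drop '$' and thousands separators
    )
    # Anything that still cannot be parsed becomes NaN.
    return pd.to_numeric(cleaned, errors='coerce')


value_cols = ['col1', 'col2', 'col3', 'col4']
clean = raw.copy()
clean[value_cols] = clean[value_cols].apply(to_number)

result = clean.groupby('Name', sort=False).sum()
print(result)
```

The regular expressions only cover the formats seen in the question; other currency symbols or a trailing minus sign would need extra patterns.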

【Discussion】:

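The groupby method only returns the names and the values of my first column. I tried df.groupby(["col1", "col2"])["Name"].sum(); however, that only returned the rows in which both col1 and col2 were filled in.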
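The behavior described in the comment is expected: `groupby(["col1", "col2"])` uses the value columns as group keys, and by default pandas drops every row whose key contains NaN, so only rows with both columns filled survive. A minimal sketch on the answer's sample data, where no row has both `col1` and `col2` filled and the result is therefore empty; grouping by the name column instead is what merges each person's rows.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({
    'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
    'col1': [400, np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 200, np.nan, 2, np.nan],
    'col3': [np.nan, np.nan, 500, -7, np.nan],
    'col4': [20, np.nan, np.nan, np.nan, 10],
})

# Value columns as group keys: rows with NaN in col1 or col2 are dropped,
# and here every row has a blank in at least one of the two.
by_values = df.groupby(['col1', 'col2'])['Name'].sum()
print(len(by_values))

# The name column as the group key: one merged row per person.
by_name = df.groupby('Name', sort=False).sum()
print(by_name)
```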