Merged cells in Excel become NaN in pandas

【Question title】: Merged cells in Excel become NaN in pandas
【Posted】: 2020-01-05 09:27:08
【Question】:

How can I read an Excel file laid out like this into a pandas DataFrame?

a       b   c    d       e    f
Type    1   22   Car     Yes  2019
                 Train   Yes  
Type    2   25   Car     No   2018
Notype  1        Car     Yes  2019
                 Train

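The first row contains merged cells (each spanning 2 rows) in three columns, while the remaining cells are separate rows.

The problem is that if I use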

data = pd.read_excel("excel.xls").fillna(method='ffill')

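then the value "25" from the third row and the "Yes" from the fourth row fill the NaN values below them, which is not what I want. Each merged column should instead repeat its exact value across both rows of the merged cell. In this case, "a", "b", "c" and "f" are the merged columns.

It should load like this: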

a       b   c    d       e   f
Type    1   22   Car     Yes 2019
Type    1   22   Train   Yes 2019
Type    2   25   Car     No  2018
Notype  1   NaN  Car     Yes 2019
Notype  1   NaN  Train   NaN 2019

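【Comments】:

Is there a pattern in how these merged cells appear in the DataFrame that distinguishes them from "truly" empty cells? I mean, if you saw only the DataFrame and knew nothing about the Excel file, could you tell whether a given cell should be filled?
Only specific columns contain merged cells. If that is what you are asking, I do know which columns contain them.
@jezrael I uploaded a small part of the data to Google Sheets: docs.google.com/spreadsheets/d/…
@jezrael Try this one: docs.google.com/spreadsheets/d/…
Create them in the pandas DF?

Tags: python excel pandas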

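【Solution 1】:

If you need to forward-fill every column except a few listed by name, use Index.difference to select the remaining columns and forward-fill their missing values: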

cols_excluded = ['c','e']
cols = df.columns.difference(cols_excluded)

df[cols] = df[cols].ffill()
print(df)
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0   NaN  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN

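If needed, you can also forward-fill the missing values in the remaining columns (here cols_excluded) while leaving the trailing missing values at the end of each column untouched: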

df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
                                            df[cols_excluded].ffill())
print(df)

        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0  22.0  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
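
As an alternative that is not part of the answer above, forward-filling can be restricted to each merged block, so that a value never leaks into the next record. The sketch below is a minimal example under two assumptions: the DataFrame is a hand-built stand-in for what pd.read_excel returns for the sample sheet, and column "a" is non-empty on the first row of every merged block. The names merged_cols and block_id are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for what pd.read_excel returns for the sample sheet.
df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})

# Columns the asker says contain merged cells.
merged_cols = ["a", "b", "c", "f"]

# Assumption: every non-null value in "a" starts a new merged block.
# The running count of non-null values gives each block its own id.
block_id = df["a"].notna().cumsum().rename("block")

# Forward-fill only inside each block; an empty merged cell stays NaN.
df[merged_cols] = df.groupby(block_id)[merged_cols].ffill()
print(df)
```

With this grouping, the empty merged cell in column "c" of the Notype block stays NaN for both of its rows, and column "e" is left as read because it is not listed in merged_cols. If the block-defining column can itself be empty on the first row of a block, this approach does not apply as written.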

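【Comments】:

So the second option only leaves the last values in a column alone? It would not preserve NaNs in the middle of a column?
@AlexT - It skips only the trailing NaN values at the end of each column; all other missing values are replaced. If you do not need this, remove df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(), df[cols_excluded].ffill())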
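
The point raised in these comments can be checked on a small example. The sketch below uses a made-up Series (not data from the question) with NaN values both in the middle and at the end, and applies the same where/bfill/ffill combination to it.

```python
import numpy as np
import pandas as pd

# Made-up column with gaps in the middle and at the end.
s = pd.Series([22, np.nan, 25, np.nan, 30, np.nan, np.nan], name="c")

# After a backward fill, only positions with no later value are still NaN,
# so this mask marks the trailing NaN values.
trailing = s.bfill().isna()

# Keep the original (NaN) where the mask is True, otherwise take the
# forward-filled value.
filled = s.where(trailing, s.ffill())
print(filled)
```

The NaN values at positions 1 and 3 are filled from the value above them, while the two trailing positions remain NaN. An empty cell in the middle of a column is therefore filled by this expression, which is the limitation the first comment asks about.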