【Question title】: Replace NaNs in pandas DataFrame based on row entries
【Posted】: 2018-04-24 22:28:35
【Question】:

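I have a DataFrame in which each row represents one doctor's visit and each column holds the data from one diagnostic test. The data is incomplete, and missing values are filled with NaN.

Here is a simplified example: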

   AGE Height     SEX Weight
0   79     40    Male     90
1   79     21    Male     20
2   79    NaN    Male     50
3   79     89    Male    NaN
4   79     90    Male     57
5   81     87  Female    NaN
6   81    NaN  Female     89
7   81     54  Female     79
8   81     21  Female    NaN
9   81     23  Female     23
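
For anyone who wants to reproduce the example, the table above can be rebuilt roughly as follows. This is a sketch that assumes the numeric columns are stored as floats with `np.nan` marking the missing entries; the original post does not show how the frame was created.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks the missing tests.
df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Count the missing values per column.
print(df.isna().sum())
```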

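I would like to replace each NaN with the overall mean for patients of the same sex and age. I have been able to create a DataFrame containing the means for each AGE and SEX combination using the following:

age_sex_means = df.groupby(['SEX', 'AGE'])[['Height', 'Weight']].mean()

This produces the following DataFrame: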

                Height  Weight
SEX    AGE                
Female 81     37.0    38.2
Male   79     48.0    43.4

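However, I can't find a way to replace the NaNs in the first DataFrame with the means contained in the second one. Related questions such as "Using Pandas to fill NaN entries based on values in a different column, using a dictionary as a guide" seem to address situations similar to mine, but they use only a single index, so they do not obviously apply to my exact case.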
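
One way to use the table of means directly with two keys is to join it back onto the original rows and then fill from the aligned result. The sketch below assumes the sample data shown earlier; `row_means` and `filled` are illustrative names, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Per-group means, indexed by (SEX, AGE); mean() skips NaN by default.
age_sex_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].mean()

# Look up each row's group mean: the two key columns of df are matched
# against the two index levels of age_sex_means. The result keeps df's index.
row_means = df[["SEX", "AGE"]].join(age_sex_means, on=["SEX", "AGE"])

# fillna aligns on index and column labels, so only the NaN cells change.
filled = df.fillna(row_means[["Height", "Weight"]])
print(filled)
```

Because the join preserves the original index, the same pattern should extend to any number of key columns.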

【Comments】:

    Tags: python pandas dataframe


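    【Solution 1】:

    Option 1
    You can use apply together with fillna. Passing numeric_only=True keeps the string column SEX out of the mean, which recent pandas versions would otherwise reject.

    df.groupby(['AGE', 'SEX'], group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))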
    
       AGE  Height     SEX     Weight
    0   79   40.00    Male  90.000000
    1   79   21.00    Male  20.000000
    2   79   60.00    Male  50.000000
    3   79   89.00    Male  54.250000
    4   79   90.00    Male  57.000000
    5   81   87.00  Female  63.666667
    6   81   46.25  Female  89.000000
    7   81   54.00  Female  79.000000
    8   81   21.00  Female  63.666667
    9   81   23.00  Female  23.000000
    

    Option 2
    Use transform with combine_first to produce a copy

    df.combine_first(df.groupby(['SEX', 'AGE']).transform('mean'))
    
       AGE  Height     SEX     Weight
    0   79   40.00    Male  90.000000
    1   79   21.00    Male  20.000000
    2   79   60.00    Male  50.000000
    3   79   89.00    Male  54.250000
    4   79   90.00    Male  57.000000
    5   81   87.00  Female  63.666667
    6   81   46.25  Female  89.000000
    7   81   54.00  Female  79.000000
    8   81   21.00  Female  63.666667
    9   81   23.00  Female  23.000000
    

    Option 3
    The same idea, using fillna

    df.fillna(df.groupby(['SEX', 'AGE']).transform('mean'))
    
       AGE  Height     SEX     Weight
    0   79   40.00    Male  90.000000
    1   79   21.00    Male  20.000000
    2   79   60.00    Male  50.000000
    3   79   89.00    Male  54.250000
    4   79   90.00    Male  57.000000
    5   81   87.00  Female  63.666667
    6   81   46.25  Female  89.000000
    7   81   54.00  Female  79.000000
    8   81   21.00  Female  63.666667
    9   81   23.00  Female  23.000000
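
Options 2 and 3 both rely on `transform('mean')` returning a frame with the same index as `df`, so the fill can align row by row. The sketch below makes that intermediate result visible; it assumes the sample data from the question, and `group_means`, `filled` and `combined` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# One row per original row (same index as df), each holding the mean of
# that row's (SEX, AGE) group. Only the selected value columns are returned.
group_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].transform("mean")
print(group_means)

# Both calls return a new frame and leave df untouched.
filled = df.fillna(group_means)
combined = df.combine_first(group_means)[df.columns]  # restore column order
```

Selecting the value columns explicitly is a choice made here for clarity; the answer's shorter form lets pandas drop the key columns on its own.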
    

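    Option 4
    Or use update to edit in place; overwrite=False restricts the update to entries that are NaN in df

    df.update(df.groupby(['SEX', 'AGE']).transform('mean'), overwrite=False)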
    df
    
       AGE  Height     SEX     Weight
    0   79   40.00    Male  90.000000
    1   79   21.00    Male  20.000000
    2   79   60.00    Male  50.000000
    3   79   89.00    Male  54.250000
    4   79   90.00    Male  57.000000
    5   81   87.00  Female  63.666667
    6   81   46.25  Female  89.000000
    7   81   54.00  Female  79.000000
    8   81   21.00  Female  63.666667
    9   81   23.00  Female  23.000000
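
One detail worth checking with `update` is its `overwrite` parameter. It defaults to True, in which case every cell that has a non-NaN counterpart in the passed frame is replaced, not only the NaNs. The sketch below compares both settings on copies of the sample data; `overwritten` and `preserved` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

group_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].transform("mean")

# Default overwrite=True: every Height/Weight cell becomes its group mean.
overwritten = df.copy()
overwritten.update(group_means)

# overwrite=False: only cells that are NaN in the target are filled.
preserved = df.copy()
preserved.update(group_means, overwrite=False)

print(overwritten.loc[0, "Height"], preserved.loc[0, "Height"])
```

Note that `update` returns None and modifies the frame it is called on, which is why the comparison works on copies.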
    

    【Comments】:
