【Question title】: Create an indicator variable in Python using a threshold value leaving NaN's as NaN
【Posted】: 2020-08-20 15:33:22
【Question】:

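I have some float data from a conductivity probe that contains some NaNs. I want to convert the probe data to an indicator variable based on an empirical threshold, but I want NaN values to stay NaN. Converting to an indicator seems straightforward; the problem is handling the NaNs. Below is an example with a threshold of 50: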

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = (df.x <=50)*1

This yields:

      x  indicator
0   0.0          1
1   NaN          0
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

But I want the indicator for the NaN row to be NaN, like this:

      x  indicator
0   0.0          1
1   NaN        NaN  
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

Any help is appreciated. Thanks.
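For reference, here is a minimal sketch of why the NaN row turns into 0. Any ordered comparison against NaN evaluates to False, so the boolean mask is False on that row and multiplying by 1 gives 0. The series `s` is an illustrative stand-in for the probe column, not part of the original data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the probe column: one low value, one NaN, one high value.
s = pd.Series([0, np.nan, 51])

# NaN <= 50 is False, so the missing row is indistinguishable from "above threshold".
mask = s <= 50
print(mask.tolist())        # [True, False, False]

# Multiplying the boolean mask by 1 turns False into 0, which is where the NaN is lost.
print((mask * 1).tolist())  # [1, 0, 0]
```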

【Question comments】:

  • Where is the code that is giving you trouble? Many tutorials cover DataFrame filtering.

Tags: python pandas nan


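【Solution 1】:

You can try this. It keeps NaN as NaN, but where x <= 50 the column holds the value of x rather than 1: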

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = df.x*(df.x <=50)

Output:

      x  indicator
0   0.0        0.0
1   NaN        NaN
2   2.0        2.0
3   3.0        3.0
4   4.0        4.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0

For the exact output requested:

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = np.where(df.x.isnull(), np.nan, df.x <= 50)

Output:

      x  indicator
0   0.0        1.0
1   NaN        NaN
2   2.0        1.0
3   3.0        1.0
4   4.0        1.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
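A possible pandas-only variant of the same idea, offered as a sketch rather than a definitive implementation: build the 0/1 column from the comparison, then use `Series.where` to blank out the rows where x is missing. The extra `indicator_int` column is an assumption added for illustration; it uses the nullable `Int64` dtype so the indicator stays integer-valued and the missing row shows as `<NA>`.

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Compare, cast the booleans to float (True -> 1.0, False -> 0.0),
# then keep the result only where x is not missing; other rows become NaN.
df["indicator"] = (df["x"] <= 50).astype(float).where(df["x"].notna())

# Optional (illustrative column name): nullable integer dtype keeps 0/1 as integers.
df["indicator_int"] = df["indicator"].astype("Int64")

print(df)
```

The float column matches the output above; the `Int64` column is only worth adding if downstream code needs integers.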

【Comments】:

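    【Solution 2】:
    In [1829]: df['indicator'] = df[df.x <=50]*1

    The indicator is set only for rows where x <= 50; every other row, including the NaN row, is left as NaN. Note that it holds the value of x rather than 1: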

    In [1830]: df
    Out[1830]:
          x  indicator
    0   0.0        0.0
    1   NaN        NaN
    2   2.0        2.0
    3   3.0        3.0
    4   4.0        4.0
    5  51.0        NaN
    6  61.0        NaN
    7  71.0        NaN
    8  81.0        NaN
    9  91.0        NaN
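If a 0/1 indicator is wanted from this row-selection style, one option is to assign through `.loc` only on the rows where x is present, so the missing row is never written and stays NaN. This is a sketch under that assumption; the name `valid` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Boolean mask of rows that actually have a reading (illustrative name).
valid = df["x"].notna()

# Write 0/1 only on valid rows; the new column is created with NaN everywhere else.
df.loc[valid, "indicator"] = (df.loc[valid, "x"] <= 50).astype(int)

print(df)
```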
    

    【Comments】:

      【Solution 3】:

      I thought I'd try applying a lambda to the column :)

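      import numpy as np
      import pandas as pd

      x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
      df = pd.DataFrame({"x":x})
      # Return NaN for missing values, otherwise 1 if x <= 50 and 0 if not
      indicator = lambda x: np.nan if (np.isnan(x)) else (x<=50)*1
      df['indicator'] = df['x'].apply(indicator)
      print(df)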
      

      Prints:

            x  indicator
      0   0.0        1.0
      1   NaN        NaN
      2   2.0        1.0
      3   3.0        1.0
      4   4.0        1.0
      5  51.0        0.0
      6  61.0        0.0
      7  71.0        0.0
      8  81.0        0.0
      9  91.0        0.0
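The same element-wise approach can be written as a named function with the threshold as a parameter, which may be easier to reuse and test than a lambda. This is a sketch: `to_indicator`, its `threshold` argument, and the `indicator_60` column are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def to_indicator(value, threshold=50):
    """Return NaN for a missing value, else 1 if value <= threshold, else 0."""
    if pd.isna(value):
        return np.nan
    return int(value <= threshold)

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Default threshold of 50, as in the question.
df["indicator"] = df["x"].apply(to_indicator)

# Extra keyword arguments to Series.apply are forwarded to the function.
df["indicator_60"] = df["x"].apply(to_indicator, threshold=60)

print(df)
```

Note that `apply` calls the function once per row, so on large frames a vectorized comparison will usually be faster.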
      

      【Comments】:
