【Question Title】: pandas data frame transform INT64 columns to boolean
【Posted】: 2013-09-15 21:21:20
【Question】:

Some columns df.column in the data frame df are stored with dtype int64.

The values are all 1s or 0s.

Is there a way to replace these values with boolean values?

【Comments】:

    Tags: python numpy boolean pandas


    【Answer 1】:
    df['column_name'] = df['column_name'].astype('bool')
    

    For example:

    import pandas as pd
    import numpy as np
    # np.random.random_integers is deprecated; randint's upper bound is
    # exclusive, so (0, 2) draws 0 or 1.
    df = pd.DataFrame(np.random.randint(0, 2, size=5),
                      columns=['foo'])
    print(df)
    #    foo
    # 0    0
    # 1    1
    # 2    0
    # 3    1
    # 4    1
    
    df['foo'] = df['foo'].astype('bool')
    print(df)
    

    yields (for the random values shown above)

         foo
    0  False
    1   True
    2  False
    3   True
    4   True
    

    Given a list of column_names, you can convert multiple columns to bool dtype using:

    df[column_names] = df[column_names].astype(bool)
    

    If you don't have a list of column names but want to convert all numeric columns, you can use

    column_names = df.select_dtypes(include=[np.number]).columns
    df[column_names] = df[column_names].astype(bool)
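
Keep in mind that `astype(bool)` maps every nonzero value to `True`, so converting all numeric columns at once also changes columns that are not 0/1 flags. A minimal sketch of that effect, assuming a made-up frame (the column names `flag`, `age`, and `name` are hypothetical and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 0/1 flag column, an ordinary numeric column, a text column.
df = pd.DataFrame({"flag": [0, 1, 1], "age": [31, 0, 47], "name": ["a", "b", "c"]})

# Select every numeric column, as in the snippet above.
column_names = df.select_dtypes(include=[np.number]).columns

# Work on a copy so the original frame stays available for comparison.
converted = df.copy()
converted[column_names] = converted[column_names].astype(bool)

print(converted.dtypes)
print(converted)
```

Here `age` becomes boolean as well (any nonzero age turns into `True`), while the text column is left alone because it is not numeric.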
    

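    【Comments】:

    • How can I have pandas detect this automatically? If a column contains only 0 and 1, make it boolean?
    • How can I do this for all applicable columns?
    • Tried df['column_name'] = df['column_name'].astype('bool'), but the boolean values all default to True. How can I make the boolean default to False?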
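
On the last comment: it does not say what the column contained, so the following is only a guess at one possible cause. If the column holds the strings "0" and "1" rather than integers, every non-empty string is truthy and `astype(bool)` returns `True` for all of them. A small sketch under that assumption (the Series `s` is made up for illustration):

```python
import pandas as pd

# Hypothetical column holding the *strings* "0" and "1" (object dtype), not integers.
s = pd.Series(["0", "1", "0"], dtype=object)

# Every non-empty string is truthy, so this comes out all True.
naive = s.astype(bool)

# Converting to integers first gives the intended result.
fixed = s.astype(int).astype(bool)

print(naive.tolist())
print(fixed.tolist())
```

Checking `df.dtypes` before converting shows whether a column is really int64 or an object column of strings.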
    【Answer 2】:

    References: Stack Overflow, unutbu (Jan 9 at 13:25), BrenBarn (Sep 18, 2017)

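    I have numeric columns, such as age and ID, that I don't want converted to booleans. So after identifying the numeric columns the way unutbu showed, I filtered out the columns whose maximum value exceeds 1 (the code keeps only the columns whose maximum is exactly 1).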

    import numpy as np

    # Numeric columns, as in unutbu's answer.
    column_names = df.select_dtypes(include=[np.number]).columns

    # Maximum of each numeric column. reset_index gives the rows positional
    # labels 0..k-1, in the same order as column_names.
    m = df[column_names].max().reset_index(name='max')

    # Keep only the rows whose maximum is exactly 1
    # (a .loc filter like the one BrenBarn showed in another post).
    n = m.loc[m['max'] == 1, 'max']

    # n.index holds the positions of those columns within column_names.
    p = column_names[n.index]

    # Convert only those columns. Using column_names here instead of p
    # would turn every numeric column into booleans.
    df[p] = df[p].astype(bool)
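
A maximum of exactly 1 also matches columns such as [-3, 1] or [0.5, 1.0]. As a possible stricter variant (a sketch, not part of either answer), one can test that every value is exactly 0 or 1 with `isin`. The helper name `zero_one_columns` and the example frame are hypothetical:

```python
import pandas as pd

def zero_one_columns(df: pd.DataFrame) -> pd.Index:
    """Return the labels of numeric columns whose values are all exactly 0 or 1."""
    numeric = df.select_dtypes(include="number")
    # One boolean per column: True when every value in it is 0 or 1.
    mask = numeric.isin([0, 1]).all()
    return numeric.columns[mask.to_numpy()]

# Hypothetical example frame.
df = pd.DataFrame({
    "id": [7, 8, 9],
    "age": [31, 0, 47],
    "member": [1, 0, 1],
    "ratio": [0.5, 1.0, 0.0],
})

flags = zero_one_columns(df)
df[flags] = df[flags].astype(bool)

print(list(flags))
print(df.dtypes)
```

Only `member` is converted here. Columns with missing values are skipped, because `isin([0, 1])` is `False` for NaN.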
    
