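【Question title】: Create a new variable from existing variables efficiently in python
【Posted】: 2021-01-29 06:31:32
【Question description】:

I am trying to recode a variable. I have been able to do this with map, but I am looking for an efficient way to recode multiple values (a, b, c) into a single value. In the example below there are three different categories for Asian, and I want to recode them into one. I tried combining the values with a boolean operator (|), but got the error below.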

df['Race'] = df['Race'].map({ 
    'Black or African American' : 'Black', 
    'White' : 'White', 
    'Hispanic or Latino': 'Non-White Hispanic', 
    ('Asian' | 'Asian/Indian/Pacific Islander' | 'Native Hawaiian or Other Pacific Islander') : 'Asian/Pacific Islander', 
    ('American Indian or Alaska Native' | 'Other/Mixed') : 'Multiracial/other', 
    'Unspecified' : np.nan
})

TypeError: unsupported operand type(s) for |: 'str' and 'str'

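Is there a simpler but still efficient way to recode multiple values into a single value? It does not have to be map; that is just what I am most familiar with.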

【Question discussion】:

    Tags: python recode


    【Solution 1】:

    How about using dict comprehensions and unpacking:

    df['Race'] = df['Race'].map({ 
        'Black or African American' : 'Black', 
        'White' : 'White', 
        'Hispanic or Latino': 'Non-White Hispanic', 
        **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')}, 
        **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Other/Mixed')}, 
        'Unspecified' : np.nan
    })
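
A minimal, self-contained sketch of the same idea, in case it helps to run it. The sample DataFrame and the names asian_labels, other_labels, recode and Race_recoded are made up for illustration and are not from the question. The two comprehensions are unpacked into a single dict before calling map. Keep in mind that Series.map with a dict returns NaN for any value that is not a key of the dict.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df comes from the asker's dataset.
df = pd.DataFrame({'Race': ['Asian', 'White', 'Other/Mixed', 'Unspecified',
                            'Native Hawaiian or Other Pacific Islander']})

# Source labels that should collapse into one target category each.
asian_labels = ('Asian', 'Asian/Indian/Pacific Islander',
                'Native Hawaiian or Other Pacific Islander')
other_labels = ('American Indian or Alaska Native', 'Other/Mixed')

recode = {
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    # ** unpacks each comprehension's key/value pairs into the outer dict.
    **{label: 'Asian/Pacific Islander' for label in asian_labels},
    **{label: 'Multiracial/other' for label in other_labels},
    'Unspecified': np.nan,
}

# Written to a new column here so the original labels stay visible.
df['Race_recoded'] = df['Race'].map(recode)
print(df)
```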
    

    【Discussion】:

      【Solution 2】:

      Use dict.fromkeys:

      df['Race'] = df['Race'].map({ 
          'Black or African American' : 'Black', 
          'White' : 'White', 
          'Hispanic or Latino': 'Non-White Hispanic',  
          'Unspecified' : np.nan,
          **dict.fromkeys(['Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander'], 'Asian/Pacific Islander'), 
          **dict.fromkeys(['American Indian or Alaska Native', 'Other/Mixed'], 'Multiracial/other'),
      })
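
If there are many groups, one possible variation is to write the mapping the other way round (target category first, then the list of source labels) and invert it with dict.fromkeys in a loop. This is only a sketch: the names groups, recode, s and result are hypothetical, and the sample Series stands in for df['Race'].

```python
import numpy as np
import pandas as pd

# Hypothetical layout: each target category lists the source labels it absorbs.
groups = {
    'Black': ['Black or African American'],
    'White': ['White'],
    'Non-White Hispanic': ['Hispanic or Latino'],
    'Asian/Pacific Islander': ['Asian', 'Asian/Indian/Pacific Islander',
                               'Native Hawaiian or Other Pacific Islander'],
    'Multiracial/other': ['American Indian or Alaska Native', 'Other/Mixed'],
}

# Invert to the source -> target dict that Series.map expects.
recode = {}
for target, sources in groups.items():
    recode.update(dict.fromkeys(sources, target))

# 'Unspecified' is treated as missing, as in the question.
recode['Unspecified'] = np.nan

s = pd.Series(['Asian', 'Hispanic or Latino', 'Other/Mixed', 'Unspecified'])
result = s.map(recode)
print(result.tolist())
```

Writing it this way keeps each target category in one place, which may be easier to maintain than repeating the target string for every source label.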
      

      【Discussion】:

        【Solution 3】:

        Actually, simply listing every key explicitly works:

        df['Race'] = df['Race'].map({ 
            'Black or African American' : 'Black', 
            'White' : 'White', 
            'Hispanic or Latino': 'Non-White Hispanic', 
        
            'Asian': 'Asian/Pacific Islander',
            'Asian/Indian/Pacific Islander': 'Asian/Pacific Islander',
            'Native Hawaiian or Other Pacific Islander': 'Asian/Pacific Islander', 
        
            'American Indian or Alaska Native': 'Multiracial/other',
            'Other/Mixed': 'Multiracial/other', 
        
            'Unspecified' : np.nan
        })
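
With an explicit dict like this, a typo in the data or in a key silently becomes NaN, because map returns NaN for values that are not keys. A small sanity check before mapping can catch that. The following is a sketch under assumptions: the sample Series s and the misspelled label 'Whte' are invented for the example.

```python
import numpy as np
import pandas as pd

recode = {
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    'Asian': 'Asian/Pacific Islander',
    'Asian/Indian/Pacific Islander': 'Asian/Pacific Islander',
    'Native Hawaiian or Other Pacific Islander': 'Asian/Pacific Islander',
    'American Indian or Alaska Native': 'Multiracial/other',
    'Other/Mixed': 'Multiracial/other',
    'Unspecified': np.nan,
}

# Hypothetical input containing one misspelled label ('Whte').
s = pd.Series(['White', 'Whte', 'Unspecified', 'Asian'])

# Labels present in the data but missing from the dict would turn into NaN.
unmapped = sorted(set(s.dropna()) - set(recode))
print(unmapped)

# 'Whte' and 'Unspecified' both end up as NaN, for different reasons.
recoded = s.map(recode)
print(recoded.isna().tolist())
```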
        

        【Discussion】:

          【Solution 4】:

          Using apply can also improve readability.

          import numpy as np
          import pandas as pd

          # Sample data covering every label that appears in the question.
          race = [
              'Black or African American',
              'White',
              'Hispanic or Latino',
              'Asian',
              'Asian/Indian/Pacific Islander',
              'Native Hawaiian or Other Pacific Islander',
              'American Indian or Alaska Native',
              'Other/Mixed',
              'Unspecified'
          ]

          df = pd.DataFrame({'Race': race})

          def lookup(x):
              # Translate one original label into its recoded category.
              dictLookup = {
                  'Black or African American': 'Black',
                  'White': 'White',
                  'Hispanic or Latino': 'Non-White Hispanic',
                  'Unspecified': np.nan,
                  **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
                  **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Alaska Native', 'Other/Mixed')}
              }
              return dictLookup[x]

          # apply calls lookup once per row.
          df['Race'] = df['Race'].apply(lookup)

          print(df.head(20))
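
One difference from the map-based answers is worth noting: indexing the dict directly raises KeyError for a label that is not in it, while map would give NaN. A possible variation, sketched below with hypothetical names (RECODE, s, result), builds the dict once outside the function and uses dict.get with a default so that unknown labels become NaN instead of stopping the run.

```python
import numpy as np
import pandas as pd

# Built once at module level so it is not recreated for every row.
RECODE = {
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    'Unspecified': np.nan,
    **dict.fromkeys(('Asian', 'Asian/Indian/Pacific Islander',
                     'Native Hawaiian or Other Pacific Islander'),
                    'Asian/Pacific Islander'),
    **dict.fromkeys(('American Indian or Alaska Native', 'Other/Mixed'),
                    'Multiracial/other'),
}

def lookup(label):
    # dict.get returns the default instead of raising KeyError for unknown labels.
    return RECODE.get(label, np.nan)

# Hypothetical input; the last value is deliberately not a key of RECODE.
s = pd.Series(['White', 'Asian', 'Not a listed label'])
result = s.apply(lookup)
print(result.tolist())
```

Whether an unknown label should fail loudly (KeyError) or become NaN depends on the data; the sketch only shows the second option.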
          

          【Discussion】:
