【Question title】: Creating a new column based on conditions of first and last value of groupby Python
【Posted】: 2021-09-16 08:40:41
【Question description】:

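I have a pandas DataFrame and want to create a new column whose value depends on the first and last row of each group. The conditions are:

mgr to mgr = hired as mgr

emp to mgr = promoted to mgr

emp to emp = hired as emp

mgr to emp = status change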

date        email          level 
01/01/2000  john@abc.com   mgr
05/06/2000  john@abc.com   mgr     
10/01/2001  john@abc.com   mgr     
14/02/2000  kimdo@abc.com  emp     
19/10/2001  kimdo@abc.com  mgr     
12/05/2000  waint@abc.com  emp  
08/08/2000  waint@abc.com  emp  
14/04/2001  waint@abc.com  emp     
22/05/2000  neds@abc.com   mgr
08/11/2000  neds@abc.com   mgr     
12/06/2001  neds@abc.com   emp

I would like to get the following result:

date        email          level   status
01/01/2000  john@abc.com   mgr     hired as mgr
10/01/2001  john@abc.com   mgr     hired as mgr
14/02/2000  kimdo@abc.com  emp     promoted to mgr
19/10/2001  kimdo@abc.com  mgr     promoted to mgr
12/05/2000  waint@abc.com  emp     hired as emp
14/04/2001  waint@abc.com  emp     hired as emp
22/05/2000  neds@abc.com   mgr     status change
12/06/2001  neds@abc.com   emp     status change

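So far I can select the first and last row of each group with groupby, but I am not sure how to apply the conditions to get the new "status" column. Any help is appreciated, thanks.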

df2 = df.groupby('email', as_index=False).nth([0,-1])
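To make the example reproducible, the table above can be built as follows. This is a minimal sketch that assumes all three columns are plain strings and that the rows of each email are already in chronological order, since "first" and "last" follow row order rather than the `date` values.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "date": ["01/01/2000", "05/06/2000", "10/01/2001",
             "14/02/2000", "19/10/2001",
             "12/05/2000", "08/08/2000", "14/04/2001",
             "22/05/2000", "08/11/2000", "12/06/2001"],
    "email": ["john@abc.com"] * 3 + ["kimdo@abc.com"] * 2
             + ["waint@abc.com"] * 3 + ["neds@abc.com"] * 3,
    "level": ["mgr", "mgr", "mgr",
              "emp", "mgr",
              "emp", "emp", "emp",
              "mgr", "mgr", "emp"],
})

# Keep only the first and last row of every email group
df2 = df.groupby("email", as_index=False).nth([0, -1])
print(df2)
```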

【Question comments】:

    Tags: python pandas dataframe group-by conditional-statements


    【Solution 1】:
    df2 = df.groupby('email', as_index=False).nth([0,-1])
    

    You can try:

    # dict mapping each 'first:last' level pair to its status
    d={'mgr:mgr':'hired as mgr','emp:mgr':'promoted to mgr','emp:emp':'hired as emp','mgr:emp':'status change'}
    

    Finally:

    # join the two remaining levels of each email into e.g. 'emp:mgr', then look up the status
    df2.loc[:,'status']=df2.groupby('email')['level'].transform(':'.join).map(d)
    

    Output of df2:

        date        email           level   status
    0   01/01/2000  john@abc.com    mgr     hired as mgr
    2   10/01/2001  john@abc.com    mgr     hired as mgr
    3   14/02/2000  kimdo@abc.com   emp     promoted to mgr
    4   19/10/2001  kimdo@abc.com   mgr     promoted to mgr
    5   12/05/2000  waint@abc.com   emp     hired as emp
    7   14/04/2001  waint@abc.com   emp     hired as emp
    8   22/05/2000  neds@abc.com    mgr     status change
    10  12/06/2001  neds@abc.com    emp     status change
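The `':'.join` step above depends on `df2` holding exactly two rows per email. As a variant (not the answer's code), the first and last level can be broadcast over the full frame with `transform`, and the rows filtered afterwards. This is a minimal sketch that rebuilds the question's sample data, assumes `level` has no missing values, and uses illustrative names such as `from_start` and `from_end`.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the block runs on its own)
df = pd.DataFrame({
    "date": ["01/01/2000", "05/06/2000", "10/01/2001",
             "14/02/2000", "19/10/2001",
             "12/05/2000", "08/08/2000", "14/04/2001",
             "22/05/2000", "08/11/2000", "12/06/2001"],
    "email": ["john@abc.com"] * 3 + ["kimdo@abc.com"] * 2
             + ["waint@abc.com"] * 3 + ["neds@abc.com"] * 3,
    "level": ["mgr", "mgr", "mgr", "emp", "mgr",
              "emp", "emp", "emp", "mgr", "mgr", "emp"],
})

d = {"mgr:mgr": "hired as mgr", "emp:mgr": "promoted to mgr",
     "emp:emp": "hired as emp", "mgr:emp": "status change"}

g = df.groupby("email")

# Broadcast each group's first and last level to every row of that group
first = g["level"].transform("first")
last = g["level"].transform("last")
df["status"] = (first + ":" + last).map(d)

# Position of each row counted from the start and from the end of its group
from_start = g.cumcount()
from_end = g.cumcount(ascending=False)

# Keep only the first and last row per email
df2 = df[(from_start == 0) | (from_end == 0)]
print(df2)
```

Because the key is built from `first` and `last` directly, an email with a single row still gets a valid key such as `'mgr:mgr'` instead of an unmapped `'mgr'`.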
    

    【Comments】:

      【Solution 2】:

      Try creating a mapping dictionary to map the status.

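      # select the first and last row of a group
      fl = lambda s: s.iloc[[0,-1]]
      d = {'mgr-mgr': 'hired as mgr', 'emp-mgr': 'promoted to mgr', 'emp-emp': 'hired as emp', 'mgr-emp': 'status change'}
      # per email: build 'first-last' on the last row via shift, back-fill it onto the first row, then map it
      res = df.groupby('email', as_index=False)['level'].apply(lambda x: (fl(x).shift(1) + "-" + (fl(x))).bfill()).map(d)
      # drop the group level so that res lines up with df's original index
      res.index= res.index.droplevel()
      df['status'] = res
      # rows between the first and last have no status; drop only those
      df.dropna(subset=['status'], inplace=True)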
      

          date        email           level   status
      0   01/01/2000  john@abc.com    mgr     hired as mgr
      2   10/01/2001  john@abc.com    mgr     hired as mgr
      3   14/02/2000  kimdo@abc.com   emp     promoted to mgr
      4   19/10/2001  kimdo@abc.com   mgr     promoted to mgr
      5   12/05/2000  waint@abc.com   emp     hired as emp
      7   14/04/2001  waint@abc.com   emp     hired as emp
      8   22/05/2000  neds@abc.com    mgr     status change
      10  12/06/2001  neds@abc.com    emp     status change

      【Comments】:

      • The 'res' object that gets created seems to contain only NaNs
      • @wjie08: Are you using the same dataframe?
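The cause of the NaNs reported above is not established in this thread. One possibility (an assumption, not verified here) is that the index returned by `groupby(...).apply` has a different shape on the commenter's setup, so `res` no longer aligns with `df`. A sketch that avoids `apply` and index alignment altogether computes one status per email and maps it back through the `email` column; `summary` and `status_by_email` are illustrative names, and the sample data is rebuilt from the question.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the block runs on its own)
df = pd.DataFrame({
    "date": ["01/01/2000", "05/06/2000", "10/01/2001",
             "14/02/2000", "19/10/2001",
             "12/05/2000", "08/08/2000", "14/04/2001",
             "22/05/2000", "08/11/2000", "12/06/2001"],
    "email": ["john@abc.com"] * 3 + ["kimdo@abc.com"] * 2
             + ["waint@abc.com"] * 3 + ["neds@abc.com"] * 3,
    "level": ["mgr", "mgr", "mgr", "emp", "mgr",
              "emp", "emp", "emp", "mgr", "mgr", "emp"],
})

d = {"mgr-mgr": "hired as mgr", "emp-mgr": "promoted to mgr",
     "emp-emp": "hired as emp", "mgr-emp": "status change"}

# One row per email holding its first and last level
summary = df.groupby("email")["level"].agg(["first", "last"])

# One status per email, indexed by email
status_by_email = (summary["first"] + "-" + summary["last"]).map(d)

# Look the status up by email value, so df's own index plays no role
df["status"] = df["email"].map(status_by_email)

# Keep a row if it is the first or the last occurrence of its email
is_first = ~df.duplicated("email", keep="first")
is_last = ~df.duplicated("email", keep="last")
df2 = df[is_first | is_last]
print(df2)
```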