【Question title】: How to groupby column value year and month to get previous month salary?
【Posted】: 2020-06-29 23:34:13
【Question description】:

I have employee data in the shape below, where each employee's salary increases over the months:

Employee    year    month     Salary
PersonA     2001    1         $50000 
PersonB     2001    5         $65000 
PersonB     2002    1         $75000 
PersonB     2002    3         $100000 
PersonC     2002    5         $75000 
PersonC     2002    6         $100000 
PersonC     2003    3         $110000 
PersonC     2003    9         $130000 
PersonC     2004    3         $150000 
PersonC     2005    3         $200000

I want to produce the same shape, but with an extra column called previous month salary:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     0
PersonB     2001    5         $65000     0
PersonB     2002    1         $75000     $65000
PersonB     2002    3         $100000    $75000
PersonC     2002    5         $75000     0
PersonC     2002    6         $100000    $75000
PersonC     2003    3         $110000    $100000
PersonC     2003    9         $130000    $110000
PersonC     2004    3         $150000    $130000
PersonC     2005    3         $200000    $150000

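I tried groupby in pandas, but I can't just subtract one from the month value, because this is only a sample of the real data. What I need is the salary from the previous month for each row.

When I try groupby, though, I can't work out how to do the subtraction: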

df["previous_salary"]=df.groupby(['year',"month"])['salary'].transform('mean').astype(np.float16)

df["previous_salary"]=df.groupby(['year',"month"])['salary']

The result is the mean (or simply the value) for the same month:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     $50000
PersonB     2001    5         $65000     $65000
PersonB     2002    1         $75000     $75000
PersonB     2002    3         $100000    $100000
PersonC     2002    5         $75000     $75000 
PersonC     2002    6         $100000    $100000
PersonC     2003    3         $110000    $110000
PersonC     2003    9         $130000    $130000
PersonC     2004    3         $150000    $150000
PersonC     2005    3         $200000    $200000

Is there a way to subtract from the month value before grouping, or is there another approach?

【Comments】:

    Tags: python pandas


    【Solution 1】:

    IIUC, you can try groupby followed by shift:

    df["prev"] = (
        df.sort_values(["Employee", "year", "month"]).groupby("Employee")["Salary"].shift(1)
    )
    
    
    print(df)
      Employee  year  month   Salary     prev
    0  PersonA  2001      1   $50000      NaN
    1  PersonB  2001      5   $65000      NaN
    2  PersonB  2002      1   $75000   $65000
    3  PersonB  2002      3  $100000   $75000
    4  PersonC  2002      5   $75000      NaN
    5  PersonC  2002      6  $100000   $75000
    6  PersonC  2003      3  $110000  $100000
    7  PersonC  2003      9  $130000  $110000
    8  PersonC  2004      3  $150000  $130000
    9  PersonC  2005      3  $200000  $150000
    

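    【Comments】:

    • Thanks, I was missing shift(1). May I ask what argument shift takes?
    • The argument I passed is periods, which can be positive or negative. See Pandas Shift for more information. @kspmm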
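
To illustrate the periods argument mentioned in the comment above, here is a minimal sketch on a made-up three-element Series (the values and variable names are illustrative only, not from the question's data). A positive period pulls values from earlier rows, a negative one from later rows, and positions with no source row become NaN.

```python
import pandas as pd

# Toy data, purely for illustration.
s = pd.Series([10, 20, 30])

# periods=1: each row receives the value of the row before it.
prev_values = s.shift(1)    # first row has no predecessor, so it is NaN

# periods=-1: each row receives the value of the row after it.
next_values = s.shift(-1)   # last row has no successor, so it is NaN

print(prev_values)
print(next_values)
```
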
    【Solution 2】:

    You can use groupby().shift() to get the previous row's data:

    prev_salaries = df.groupby(['Employee']).Salary.shift()
    
    # fill with current month
    df['prev_salary'] = prev_salaries.fillna(df['Salary'])
    

    Output:

      Employee  year  month   Salary prev_salary
    0  PersonA  2001      1   $50000      $50000
    1  PersonB  2001      5   $65000      $65000
    2  PersonB  2002      1   $75000      $65000
    3  PersonB  2002      3  $100000      $75000
    4  PersonC  2002      5   $75000      $75000
    5  PersonC  2002      6  $100000      $75000
    6  PersonC  2003      3  $110000     $100000
    7  PersonC  2003      9  $130000     $110000
    8  PersonC  2004      3  $150000     $130000
    9  PersonC  2005      3  $200000     $150000
    

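    【Comments】:

    • But it produces a value for rows that have no earlier record. I mean, if there is no month before it, it gives that same month's value. It does almost everything else, though.
    • Thanks, this helped. I think I can handle the zero issue myself, and even if not, it is still a big help. Thanks.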
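
One possible way to handle the zero issue raised in the comments is to pass fill_value=0 to the grouped shift, so that an employee's first recorded row gets 0 instead of NaN or the current salary. This is only a sketch under two assumptions that are not in the original thread: Salary is stored as plain integers (the question shows strings such as "$50000"), and only the first six sample rows are used.

```python
import pandas as pd

# Assumption: salaries are numeric; the "$" prefix from the question is dropped.
df = pd.DataFrame({
    "Employee": ["PersonA", "PersonB", "PersonB", "PersonB", "PersonC", "PersonC"],
    "year": [2001, 2001, 2002, 2002, 2002, 2002],
    "month": [1, 5, 1, 3, 5, 6],
    "Salary": [50000, 65000, 75000, 100000, 75000, 100000],
})

# Sort so that the "previous row" within each employee is the previous recorded month.
df = df.sort_values(["Employee", "year", "month"])

# Shift within each employee; rows with no earlier record are filled with 0.
df["prev_salary"] = df.groupby("Employee")["Salary"].shift(1, fill_value=0)

print(df)
```

Note that this returns the previous recorded row for each employee, which is not necessarily the previous calendar month when months are missing from the data.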