Difference between rows within groups in pandas

【Question title】: difference between rows within groups pandas
【Posted】: 2021-01-24 06:31:52
【Question】:

DataFrame:

col1   col2
a       50
b       40
a       40
a       30
b       20
a       20
b       30
b       50

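I need to group the rows by col1, sort each group by col2 from high to low, and find the difference between consecutive rows within each group. Expected output: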

col1  col_entity col2   diff   
a        a1       50     10     
b        a2       40     10     
a        a3       30     10    
a        a4       20     nan    
b        b1       40     10     
a        b4       50     10     
b        b3       30     10     
b        b2       20     nan

Please help me solve this. Thanks in advance.

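【Comments on the question】:

What is your expected output?
Please don't change your question after answers have been posted. If you want to ask a different question, do so in a new post.
@MichaelSzczesny I'm sorry, I'm new to posting questions here. I'll do that next time.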

Tags: python python-3.x pandas dataframe pandas-groupby

【Solution 1】:

See if this helps:

import pandas as pd

# Convert col2 to numeric; values that cannot be parsed become NaN and are then replaced with 0
df['col2'] = pd.to_numeric(df['col2'], errors='coerce').fillna(0)
# Sort ascending within each group and take the difference between consecutive col2 values
df['diff'] = df.sort_values(['col1', 'col2'], ascending=[True, True]).groupby('col1')['col2'].diff()
# Display the DataFrame sorted by col1 ascending and col2 descending
df.sort_values(['col1', 'col2'], ascending=[True, False])
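
To make the steps above concrete, here is a minimal self-contained sketch run on the sample data from the question. The DataFrame construction is reconstructed from the question's first table, and selecting `['col2']` before `diff()` is an assumption added here so that only the numeric column is differenced.

```python
import pandas as pd

# Sample data reconstructed from the question's first table
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a", "b", "a", "b", "b"],
    "col2": [50, 40, 40, 30, 20, 20, 30, 50],
})

# Sort ascending within each group, then take each value minus the previous (lower) one.
# The result is aligned back to df by index, so the original row order is preserved.
df["diff"] = (
    df.sort_values(["col1", "col2"])
      .groupby("col1")["col2"]
      .diff()
)

# Show each group from high to low; the lowest row of each group has no lower neighbour (NaN)
result = df.sort_values(["col1", "col2"], ascending=[True, False])
print(result)
```

Because the assignment aligns on the index, the sort used for computing the difference and the sort used for display are independent of each other.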

【Discussion】:

Thank you, sir. Your answer is good, but I have since edited the question.
I am getting the error TypeError: unsupported operand type(s) for -: 'str' and 'str'
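
The TypeError reported here most likely arises because `diff()` is applied to every non-key column, including a string column such as `col_entity` from the edited question. The sketch below assumes that is the cause and restricts the difference to the numeric column. The four-row frame is hypothetical and only serves as an illustration.

```python
import pandas as pd

# Hypothetical frame with a string column next to the numeric one
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col_entity": ["a1", "a2", "b1", "b2"],
    "col2": [50, 30, 40, 20],
})

# df.groupby("col1").diff() would also try to difference col_entity (strings).
# Selecting col2 first restricts the subtraction to the numeric column.
df["diff"] = (
    df.sort_values(["col1", "col2"])
      .groupby("col1")["col2"]
      .diff()
)
print(df)
```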

【Solution 2】:

You can use assign together with a groupby on col1, then use diff to compute the differences.

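(
    df
    # Sort first so that diff compares neighbours in sorted order
    .sort_values(['col1', 'col2'], ascending=False)
    # diff(-1) subtracts the next (lower) row within each group
    .assign(diff=lambda x: x.groupby('col1')['col2'].diff(-1))
)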
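
As a sketch of the same assign/groupby/diff idea on the question's sample data: sorting before the difference is taken and using `diff(-1)` are assumptions made here so that each row is compared with the next lower value in its group. The variable name `out` is illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's first table
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a", "b", "a", "b", "b"],
    "col2": [50, 40, 40, 30, 20, 20, 30, 50],
})

out = (
    df
    # Sort first so that consecutive rows within a group are neighbours by value
    .sort_values(["col1", "col2"], ascending=[True, False])
    # periods=-1 subtracts the next row in the group from the current one
    .assign(diff=lambda x: x.groupby("col1")["col2"].diff(-1))
)
print(out)
```

The chain leaves `df` itself unchanged, since both `sort_values` and `assign` return new DataFrames.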

【Discussion】: