[Title]: Transform rows with stage changes and dates into "to" and "from"
[Posted]: 2022-08-13 21:29:11
[Question]:

I have the following dataset showing when a person enters a new stage:

Name   Stage  Amount  Date
Karen  One    $1      01/01/21
Karen  Two    $1      08/12/21
Karen  Three  $1      05/03/22
Jaren  Three  $4      02/02/21
Jaren  One    $4      07/19/22
Laren  One    $5      04/07/21
Laren  Two    $5      08/17/22
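To make the sample reproducible, the table above can be entered as a DataFrame. A minimal sketch, assuming pandas, with `Amount` and `Date` kept as strings exactly as displayed (both answers below start from strings such as `$1`):

```python
import pandas as pd

# The question's sample table, typed in as strings exactly as shown.
df = pd.DataFrame({
    "Name": ["Karen", "Karen", "Karen", "Jaren", "Jaren", "Laren", "Laren"],
    "Stage": ["One", "Two", "Three", "Three", "One", "One", "Two"],
    "Amount": ["$1", "$1", "$1", "$4", "$4", "$5", "$5"],
    "Date": ["01/01/21", "08/12/21", "05/03/22", "02/02/21",
             "07/19/22", "04/07/21", "08/17/22"],
})
print(df)
```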

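I want to understand how people move between stages and the amounts involved (each person's amount is constant). So I need to transform the dataset into the following table: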

From Stage  To Stage  Amount  Record Count
One         Two       $6      2
One         Three     $0      0
Two         One       $0      0
Two         Three     $1      1
Three       One       $4      1
Three       Two       $0      0

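I want to sum the amount per person making each move, and count how many people moved from each stage to each other stage (covering all possible moves).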

  • Can you explain more about From Stage and To Stage? I still don't see how to derive these two columns.
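  • Say there are 3 stages: One, Two and Three. A person can move from any stage to any other, which gives six possible moves between stages: One-Two, One-Three, Two-One, Two-Three, Three-One and Three-Two. The columns represent these possibilities over time, so Karen and Laren both moving from stage One to stage Two means a record count of 2 and an amount of 1+5=6. Please let me know if this makes sense. Thanks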
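The count of six moves is the number of ordered pairs of distinct stages, 3 × 2 = 6 (order matters, since One to Two and Two to One are different moves). A small sketch using Python's `itertools.permutations`, assuming the three stage labels from the sample:

```python
from itertools import permutations

stages = ["One", "Two", "Three"]
# Ordered (from, to) pairs of distinct stages; ("One", "Two") != ("Two", "One").
pairs = list(permutations(stages, 2))
print(len(pairs))  # 6
print(pairs)
```

The pairs come out in the same order as the rows of the desired table, which makes them convenient as a target index (an illustrative choice, not something the question requires).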

Tags: python python-3.x pandas dataframe


[Solution 1]:
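import pandas as pd

# Assumes df already holds the question's data (columns Name, Stage, Amount, Date).

# Convert Amount to numbers: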
df.Amount = df.Amount.str.lstrip('$').astype(int)

# Make Stage Categorical:
df.Stage = df.Stage.astype('category')

# Optional: Make sure Dates are sorted within each group.
# Wasn't needed in your sample data.
# df.Date = pd.to_datetime(df.Date)
# df = df.sort_values(['Name', 'Date'])

# Find Next Stage for each Stage: 
df['Next_Stage'] = df.groupby('Name')['Stage'].shift(-1)

# Now when we pivot, all categories are represented:
# observed=False keeps unused category pairs regardless of the pandas default.
out = (df.pivot_table(index=['Stage', 'Next_Stage'], values='Amount',
                      aggfunc=['sum', 'count'], observed=False)
         .droplevel(1, axis=1)  # Get rid of the "Amount" header.
         .reset_index()         # Reset the index.
         [lambda x: x['Stage'].ne(x['Next_Stage'])])  # Remove rows where both stages are the same.
print(out)

Output:

   Stage Next_Stage  sum  count
1    One      Three    0      0
2    One        Two    6      2
3  Three        One    4      1
5  Three        Two    0      0
6    Two        One    0      0
7    Two      Three    1      1
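The output above lists Three before Two because `astype('category')` orders categories alphabetically. A sketch of the same approach that instead fixes the order to One, Two, Three (matching the question's table), sorts by parsed dates, and passes `observed=False` explicitly. It assumes the sample data and the MM/DD/YY date format seen in the question; `stage_order` is an illustrative name:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Karen", "Karen", "Karen", "Jaren", "Jaren", "Laren", "Laren"],
    "Stage": ["One", "Two", "Three", "Three", "One", "One", "Two"],
    "Amount": ["$1", "$1", "$1", "$4", "$4", "$5", "$5"],
    "Date": ["01/01/21", "08/12/21", "05/03/22", "02/02/21",
             "07/19/22", "04/07/21", "08/17/22"],
})

# Numeric amounts; dates parsed assuming the MM/DD/YY format of the sample.
df["Amount"] = df["Amount"].str.lstrip("$").astype(int)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Explicit category order instead of the alphabetical default.
stage_order = ["One", "Two", "Three"]
df["Stage"] = pd.Categorical(df["Stage"], categories=stage_order)

# Each person's next stage, taken in date order.
df = df.sort_values(["Name", "Date"])
df["Next_Stage"] = df.groupby("Name")["Stage"].shift(-1)

# observed=False keeps every (Stage, Next_Stage) category pair, including unused ones.
out = (
    df.pivot_table(index=["Stage", "Next_Stage"], values="Amount",
                   aggfunc=["sum", "count"], observed=False)
      .droplevel(1, axis=1)  # drop the "Amount" column level
      .reset_index()
)
# Keep only real moves (the two stages differ) and renumber the rows.
out = out[out["Stage"] != out["Next_Stage"]].reset_index(drop=True)
print(out)
```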

[Comments]:

  • Wow, this approach is great! You deserve my upvote too! :-)

[Solution 2]:

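I took a brute-force approach, using df[col].shift() to define the "From Stage" and "To Stage" columns. This relies on each person's rows being contiguous and already in date order, as they are in the sample.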

#redefine column as float, thanks @BeRT2me
df['Amount'] = df['Amount'].str[1:].astype('float')

#define new columns
df['From Name'] = df['Name'].shift(1)
df['From Stage'] = df['Stage'].shift(1)
df['To Stage'] = df['Stage']
df.drop(df.index[df['From Name']!=df['Name']], inplace=True)
print(df)

df1 = df.pivot_table(index=['From Stage', 'To Stage'], values='Amount', aggfunc=['sum', 'count'])
print(df1)

Output:

    Name  Stage  Amount      Date From Name From Stage To Stage
1  Karen    Two     1.0  08/12/21     Karen        One      Two
2  Karen  Three     1.0  05/03/22     Karen        Two    Three
4  Jaren    One     4.0  07/19/22     Jaren      Three      One
6  Laren    Two     5.0  08/17/22     Laren        One      Two

                       sum  count
                    Amount Amount
From Stage To Stage              
One        Two         6.0      2
Three      One         4.0      1
Two        Three       1.0      1
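This pivot lists only the moves that actually occur, while the question also asks for the zero rows. A minimal sketch, assuming the same sample data and the answer's row-pairing logic, that reindexes the result against all six ordered stage pairs (generated with `itertools.permutations`, which is an addition, not part of the answer):

```python
from itertools import permutations

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Karen", "Karen", "Karen", "Jaren", "Jaren", "Laren", "Laren"],
    "Stage": ["One", "Two", "Three", "Three", "One", "One", "Two"],
    "Amount": ["$1", "$1", "$1", "$4", "$4", "$5", "$5"],
    "Date": ["01/01/21", "08/12/21", "05/03/22", "02/02/21",
             "07/19/22", "04/07/21", "08/17/22"],
})
df["Amount"] = df["Amount"].str[1:].astype(float)

# Pair each row with the row above it, as in the answer.
# Assumes each person's rows are contiguous and already in date order.
df["From Name"] = df["Name"].shift(1)
df["From Stage"] = df["Stage"].shift(1)
df["To Stage"] = df["Stage"]
moves = df[df["From Name"] == df["Name"]]

df1 = moves.pivot_table(index=["From Stage", "To Stage"], values="Amount",
                        aggfunc=["sum", "count"])
df1.columns = df1.columns.droplevel(1)  # drop the "Amount" column level

# Reindex against all six ordered stage pairs; moves nobody made get zeros.
all_pairs = pd.MultiIndex.from_tuples(
    list(permutations(["One", "Two", "Three"], 2)),
    names=["From Stage", "To Stage"],
)
df1 = df1.reindex(all_pairs, fill_value=0)
print(df1)
```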

[Comments]:

  • .apply(lambda x: x[1:]) can be simplified to .str[1:]
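Both spellings give the same strings; the `.str` accessor applies the slice element-wise without a Python lambda. A small check on a few illustrative amounts in the question's `$N` format:

```python
import pandas as pd

amounts = pd.Series(["$1", "$4", "$5"])
# Element-wise slicing through the .str accessor ...
via_str = amounts.str[1:]
# ... matches a Python-level apply with the same slice.
via_apply = amounts.apply(lambda x: x[1:])
print(via_str.equals(via_apply))  # True
print(via_str.astype(float).tolist())  # [1.0, 4.0, 5.0]
```

Note that `.str[1:]` drops the first character whatever it is, whereas `.str.lstrip('$')` (used in the first answer) removes only leading dollar signs.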