【问题标题】:How to assign values to multiple columns using conditions for values from other multiple columns?如何使用来自其他多列的值的条件将值分配给多列?
【发布时间】:2022-10-07 19:20:19
【问题描述】:

数据集是这样的(原来会有重复的行):

代码:

import pandas as pd

df_in = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
  1: 'sachinlaltaprayoohoo',
  2: 'sachinlaltaprayoohoo',
  3: 'sachinlaltaprayoohoo',
  4: 'sachinlaltaprayoohoo',
  5: 'sachinlaltaprayoohoo',
  6: 'sheldon.yokoohoo',
  7: 'sheldon.yokoohoo',
  8: 'sheldon.yokoohoo',
  9: 'sheldon.yokoohoo',
  10: 'sheldon.yokoohoo',
  11: 'sheldon.yokoohoo'},
 'time_stamp': {0: '2021-09-10 09:01:56.340259',
  1: '2021-09-10 09:01:56.672814',
  2: '2021-09-10 09:01:57.471423',
  3: '2021-09-10 09:01:57.480891',
  4: '2021-09-10 09:01:57.484644',
  5: '2021-09-10 09:01:57.984644',
  6: '2021-09-10 09:01:56.340259',
  7: '2021-09-10 09:01:56.672814',
  8: '2021-09-10 09:01:57.471423',
  9: '2021-09-10 09:01:57.480891',
  10: '2021-09-10 09:01:57.484644',
  11: '2021-09-10 09:01:57.984644'},
 'screen': {0: 'rewardapp.SplashActivity',
  1: 'i1',
  2: 'rewardapp.Signup_in',
  3: 'rewardapp.PaymentFinalConfirmationActivity',
  4: 'rewardapp.Signup_in',
  5: 'i1',
  6: 'rewardapp.SplashActivity',
  7: 'i1',
  8: 'rewardapp.Signup_in',
  9: 'i1',
  10: 'rewardapp.Signup_in',
  11: 'rewardapp.PaymentFinalConfirmationActivity'}})

df_in['time_stamp'] = df_in['time_stamp'].astype('datetime64[ns]')

df_in

输出应该是这样的:

代码:

import pandas as pd

df_out = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
  1: 'sachinlaltaprayoohoo',
  2: 'sachinlaltaprayoohoo',
  3: 'sachinlaltaprayoohoo',
  4: 'sachinlaltaprayoohoo',
  5: 'sachinlaltaprayoohoo',
  6: 'sheldon.yokoohoo',
  7: 'sheldon.yokoohoo',
  8: 'sheldon.yokoohoo',
  9: 'sheldon.yokoohoo',
  10: 'sheldon.yokoohoo',
  11: 'sheldon.yokoohoo'},
 'time_stamp': {0: '2021-09-10 09:01:56.340259',
  1: '2021-09-10 09:01:56.672814',
  2: '2021-09-10 09:01:57.471423',
  3: '2021-09-10 09:01:57.480891',
  4: '2021-09-10 09:01:57.484644',
  5: '2021-09-10 09:01:57.984644',
  6: '2021-09-10 09:01:56.340259',
  7: '2021-09-10 09:01:56.672814',
  8: '2021-09-10 09:01:57.471423',
  9: '2021-09-10 09:01:57.480891',
  10: '2021-09-10 09:01:57.484644',
  11: '2021-09-10 09:01:57.984644'},
 'screen': {0: 'rewardapp.SplashActivity',
  1: 'i1',
  2: 'rewardapp.Signup_in',
  3: 'rewardapp.PaymentFinalConfirmationActivity',
  4: 'rewardapp.Signup_in',
  5: 'i1',
  6: 'rewardapp.SplashActivity',
  7: 'i1',
  8: 'rewardapp.Signup_in',
  9: 'i1',
  10: 'rewardapp.Signup_in',
  11: 'rewardapp.PaymentFinalConfirmationActivity'},
 'series1': {0: 0,
  1: 1,
  2: 2,
  3: 3,
  4: 0,
  5: 1,
  6: 0,
  7: 1,
  8: 2,
  9: 3,
  10: 4,
  11: 5},
 'series2': {0: 0,
  1: 0,
  2: 0,
  3: 0,
  4: 1,
  5: 1,
  6: 2,
  7: 2,
  8: 2,
  9: 2,
  10: 2,
  11: 2}})

df_out['time_stamp'] = df['time_stamp'].astype('datetime64[ns]')

df_out

'series1' 列值以 0、1、2 等逐行开始,但在以下情况下重置为 0:

  1. “email_ID”列值更改。
  2. 'screen' 列值 == 'rewardapp.PaymentFinalConfirmationActivity'

    'series 2' 列值从 0 开始,每当 'series 1' 重置时递增 1。

    我的进步:

    series1 = [0]
    
    x = 0
    
    for index in df[1:].index:
    
      if ((df._get_value(index - 1, 'email_ID')) == df._get_value(index, 'email_ID')) and (df._get_value(index - 1, 'screen') != 'rewardapp.PaymentFinalConfirmationActivity'):
    
        x += 1
    
        series1.append(x)
      
      else:
        x = 0
    
        series1.append(x)
    
    
    df['series1'] = series1
    df
    
    series2 = [0]
    
    x = 0
    
    for index in df[1:].index:
    
      if df._get_value(index, 'series1') - df._get_value(index - 1, 'series1') == 1:
    
        series2.append(x)
      
      else:
        
        x += 1
    
        series2.append(x)
    
    
    df['series2'] = series2
    df
    

    我认为上面的代码是有效的,我会在几个小时内测试回答的代码并选择最好的,谢谢。

【问题讨论】:

    标签: python pandas dataframe numpy


    【解决方案1】:

    我们试试看

    m = (df_in['email_ID'].ne(df_in['email_ID'].shift().bfill()) |
         df_in['screen'].shift().eq('rewardapp.PaymentFinalConfirmationActivity'))
    
    df_in['series1'] = df_in.groupby(m.cumsum()).cumcount()
    df_in['series2'] = m.cumsum()
    
    print(df_in)
    
                    email_ID                 time_stamp                                      screen  series1  series2
    0   sachinlaltaprayoohoo 2021-09-10 09:01:56.340259                    rewardapp.SplashActivity        0        0
    1   sachinlaltaprayoohoo 2021-09-10 09:01:56.672814                                          i1        1        0
    2   sachinlaltaprayoohoo 2021-09-10 09:01:57.471423                         rewardapp.Signup_in        2        0
    3   sachinlaltaprayoohoo 2021-09-10 09:01:57.480891  rewardapp.PaymentFinalConfirmationActivity        3        0
    4   sachinlaltaprayoohoo 2021-09-10 09:01:57.484644                         rewardapp.Signup_in        0        1
    5   sachinlaltaprayoohoo 2021-09-10 09:01:57.984644                                          i1        1        1
    6       sheldon.yokoohoo 2021-09-10 09:01:56.340259                    rewardapp.SplashActivity        0        2
    7       sheldon.yokoohoo 2021-09-10 09:01:56.672814                                          i1        1        2
    8       sheldon.yokoohoo 2021-09-10 09:01:57.471423                         rewardapp.Signup_in        2        2
    9       sheldon.yokoohoo 2021-09-10 09:01:57.480891                                          i1        3        2
    10      sheldon.yokoohoo 2021-09-10 09:01:57.484644                         rewardapp.Signup_in        4        2
    11      sheldon.yokoohoo 2021-09-10 09:01:57.984644  rewardapp.PaymentFinalConfirmationActivity        5        2
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2016-11-07
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2023-03-20
      相关资源
      最近更新 更多