Question title: Reset Pandas Cumsum for every multiple of 1000
Posted: 2019-11-28 16:49:01
Question:

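I currently have a dataframe like the one below. Every time the cumsum passes a multiple of 1000 (e.g. 2000, 3000, etc.), I need to reset it, and record how many multiples of 1000 were generated.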

                    Production    ID  cumsum  
     2017-10-19        1054  1323217    1054     
     2017-10-20           0  1323217    1054     
     2017-10-21           0  1323217    1054     
     2017-10-22           0  1323217    1054     
     2017-10-23           0  1323217    1054  

For example, I need a df like the following:

                 Production    ID      cumsum  adjCumsum numberGenerated
      2017-10-19        1054  1323217    1054     1000      1
      2017-10-20           0  1323217    1054     54        0
      2017-10-21           0  1323217    1054     54        0
      2017-10-22        3054  1323217    4108     4000      4
      2017-10-23           0  1323217    4108     108       0
      2017-10-23         500  1323218    500      500       0

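The code below correctly resets the value every 1000, but I can't quite figure out how to translate it so that it works grouped by ID and rounded down to the nearest 1000.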

maxvalue = 1000

lastvalue = 0
newcum = []
for _, row in df.iterrows():
    # add this row's cumsum to the value carried over from the previous row
    thisvalue = row['cumsum'] + lastvalue
    if thisvalue > maxvalue:
        thisvalue = 0  # reset once the limit is exceeded
    newcum.append(thisvalue)
    lastvalue = thisvalue
df['newcum'] = newcum
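For comparison, the "reset per ID" part does not strictly need a loop: grouping by ID restarts the running total, and integer division and modulo by the threshold give the rounded-down value and the leftover. A minimal sketch, assuming the sample rows from the expected-output table above (dates dropped for brevity); the column names `floored` and `remainder` are hypothetical.

```python
import pandas as pd

thresh = 1000  # reset point from the question

# Sample rows from the question's expected-output table (dates dropped).
df = pd.DataFrame({
    "Production": [1054, 0, 0, 3054, 0, 500],
    "ID": [1323217] * 5 + [1323218],
})

# Running total that restarts for every ID.
df["cumsum"] = df.groupby("ID")["Production"].cumsum()

# Running total rounded down to the last completed multiple of thresh.
df["floored"] = (df["cumsum"] // thresh) * thresh

# What is left once every full multiple is taken out: the running total
# effectively "reset" each time it passes a multiple of thresh.
df["remainder"] = df["cumsum"] % thresh

print(df)
```

The `remainder` column reproduces the 54, 108 and 500 values from the expected table on the rows where no new multiple is crossed.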

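Thanks to the answer below, I can now calculate the cumulative number generated, but I need to calculate the incremental number generated.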

import numpy as np

df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000
multiple = (df['cumsum'] // thresh )
mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh

df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()

My initial thinking was to try something similar to:

df['numGen1'] = df['cumsum2'].diff()
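One reason `df['cumsum2'].diff()` does not give the increment is that `cumsum2` adds up counts that are already cumulative. A minimal sketch comparing three options on the question's sample rows (dates dropped); the names `naive`, `plain` and `increment` are hypothetical. Differencing `multiple` itself gives the increment, but only if the difference is taken within each ID.

```python
import numpy as np
import pandas as pd

thresh = 1000

# Sample rows from the question (dates dropped for brevity).
df = pd.DataFrame({
    "Production": [1054, 0, 0, 3054, 0, 500],
    "ID": [1323217] * 5 + [1323218],
})

df["cumsum"] = df.groupby("ID")["Production"].cumsum()
multiple = df["cumsum"] // thresh  # whole multiples reached so far

# The question's cumulative count and its running total per ID.
mask = multiple.diff().ne(0)
df["numberGenerated"] = np.where(mask, multiple, 0)
df["cumsum2"] = df.groupby("ID")["numberGenerated"].cumsum()

# Differencing cumsum2 overshoots: it sums counts that are already cumulative.
naive = df["cumsum2"].diff()

# Differencing multiple works within an ID, but goes negative at the ID boundary.
plain = multiple.diff()

# Per-ID difference; the first row of each ID has no predecessor (NaN),
# so every multiple it has reached counts as newly generated.
increment = multiple.groupby(df["ID"]).diff().fillna(multiple).astype(int)

print(pd.DataFrame({"naive": naive, "plain": plain, "increment": increment}))
```

On the 2017-10-22 row, `naive` gives 4 while `increment` gives the expected 3.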

Final edit: tested and working. Thanks for your help.

I was overthinking it; below is how I was able to do it:

import numpy as np

df['cumsum'] = df.groupby('ID')['Production'].cumsum()
thresh = 1000

multiple = (df['cumsum'] // thresh )

mask = multiple.diff().ne(0)
df['numberGenerated'] = np.where(mask, multiple, 0)
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh

df['cumsum2'] = df.groupby('ID')['numberGenerated'].cumsum()

numgen = []

for i in range(len(df)):
    current = df['cumsum'].iloc[i]
    # i > 0 guard: without it, row 0 would be compared with the last row (index -1)
    if i > 0 and df['ID'].iloc[i] == df['ID'].iloc[i - 1]:
        # same ID as the previous row: count only the multiples crossed on this row
        numgenv = (current // thresh) - (df['cumsum'].iloc[i - 1] // thresh)
    else:
        # first row of an ID: every multiple reached so far is new
        numgenv = current // thresh
    numgen.append(numgenv)

df['numgen2.0'] = numgen
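The same per-row count can also be computed without iterating: subtract the multiples reached on the previous row of the same ID. A minimal sketch on the question's sample rows (dates dropped), assuming `Production` is never negative; `previous` is a hypothetical helper name. It also yields the adjusted cumsum described in the comments below (3054 + 54 = 3108, with 3 generated).

```python
import pandas as pd

thresh = 1000

# Sample rows from the question (dates dropped for brevity).
df = pd.DataFrame({
    "Production": [1054, 0, 0, 3054, 0, 500],
    "ID": [1323217] * 5 + [1323218],
})

df["cumsum"] = df.groupby("ID")["Production"].cumsum()
multiple = df["cumsum"] // thresh

# Multiples reached on the previous row of the same ID (0 for an ID's first row).
previous = multiple.groupby(df["ID"]).shift(fill_value=0)

# New multiples crossed on this row: the same count the loop builds in numgen.
df["numberGenerated"] = multiple - previous

# Multiples generated on this row plus the leftover carried forward.
df["adjCumsum"] = df["numberGenerated"] * thresh + df["cumsum"] % thresh

print(df)
```

`groupby(...).shift(fill_value=0)` covers both the first row and the ID boundary that the loop handles with its `df['ID']` comparison.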


Tags: python python-3.x pandas


Answer 1:

IIUC, this is just an integer-division problem with a few tricks:

import numpy as np

thresh = 1000
df['cumsum'] = df['Production'].cumsum()

# how many whole multiples of thresh cumsum has reached
multiple = (df['cumsum'] // thresh )

# detect where thresh is passed
mask = multiple.diff().ne(0)

# update the number generated:
df['numberGenerated'] = np.where(mask, multiple, 0)

# then the adjusted cumsum
df['adjCumsum'] = (df['numberGenerated'].mul(thresh)) + df['cumsum'] % thresh

Output:

            Production       ID  cumsum  adjCumsum  numberGenerated
2017-10-19        1054  1323217    1054       1054                1
2017-10-20           0  1323217    1054         54                0
2017-10-21           0  1323217    1054         54                0
2017-10-22        3054  1323217    4108       4108                4
2017-10-23           0  1323217    4108        108                0
2017-10-23         500  1323218    4608        608                0

Comments:

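  • Actually, I may have misrepresented the problem. I need to calculate numberGenerated from adjCumsum. In my example I wrote adjCumsum = 4108; it should be 3054 + 54 = 3108, and the number generated = 3.
  • That may be simpler, i.e. your numberGenerated is just multiple.diff().
  • Could you explain that? This is the first time I've run into this kind of problem.
  • multiple.diff() gives you the difference between an item and the one before it. Here, multiple is more or less numberSoFar (like cumsum, but rounded down to thousands). So if you are looking for the incremental numberGenerated, isn't that exactly the change in multiple?
  • I'm still trying to wrap my head around this. Where would you suggest I look for documentation? I'm new to Python, so this is still well above my level.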