【Question title】: Turning loop comprehensions into numpy form
【Posted】: 2021-04-27 23:41:00
【Question description】:

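Is there any way I could convert the standard deviation function so that it is computed the same way as the y_mean and xy_mean functions? I don't want to use a for loop to calculate the standard deviation, or a function that takes up a lot of RAM. I am trying to use the np.convolve() function to calculate the standard deviation, std.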

Variables:

number = 5
PC_list= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Original Python functions:

y_mean = sum(PC_list[i:i+number])/number
xy_mean = sum([x * (i + 1) for i, x in enumerate(PC_list[i:i+number])])/number
std = (sum([(k - y_mean)**2 for k in PC_list[i:i+number]])/(number-1))**0.5

NumPy version:

y_mean = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1]
xy_mean = (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid'))[:-1]
std = ?

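【Comments】:

  • It's called a list comprehension, by the way.
  • Before trying to answer this question, it may be worth scanning the OP's previous questions for a moving-window std like this one. Over more than half a dozen questions he has been trying to speed up an iterative calculation.

Tags: python function numpy iterator slice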


【Solution 1】:

You can use np.lib.stride_tricks.as_strided together with np.std and ddof=1:

>>> np.std(
        np.lib.stride_tricks.as_strided(
            PC_list, 
            shape=(PC_list.shape[0] - number + 1, number), 
            strides=PC_list.strides*2
        ), 
        axis=-1, 
        ddof=1
    )
array([25.35313557, 11.6209317 , 16.32415133, 15.46019574, 15.29513506,
       14.02947067, 14.68620846, 17.04664993, 16.38348865, 12.9925946 ,
        9.58525968,  5.32623099, 10.61466493, 23.71209646, 27.85489139,
       23.31091745, 14.78211757, 12.11214834, 17.90301391, 15.42895731,
       11.7602241 ,  9.27171536, 12.57714149, 17.25865608, 15.2717403 ,
        9.02825105])

Otherwise you can use pandas.Series.rolling.std, followed by pandas.Series.dropna and then pandas.Series.to_numpy:

>>> pd.Series(PC_list).rolling(number).std().dropna().to_numpy()
 
array([25.35313557, 11.6209317 , 16.32415133, 15.46019574, 15.29513506,
       14.02947067, 14.68620846, 17.04664993, 16.38348865, 12.9925946 ,
        9.58525968,  5.32623099, 10.61466493, 23.71209646, 27.85489139,
       23.31091745, 14.78211757, 12.11214834, 17.90301391, 15.42895731,
       11.7602241 ,  9.27171536, 12.57714149, 17.25865608, 15.2717403 ,
        9.02825105])

Explanation: np.lib.stride_tricks.as_strided is used here to reshape the array in a special way that resembles a rolling window:

>>> np.lib.stride_tricks.as_strided(
            PC_list, 
            shape=(PC_list.shape[0] - number + 1, number), 
            strides=PC_list.strides*2
        )

array([[457.334015, 424.440002, 394.79599 , 408.903992, 398.821014],   #index: 0,1,2,3,4
       [424.440002, 394.79599 , 408.903992, 398.821014, 402.152008],   #index: 1,2,3,4,5
       [394.79599 , 408.903992, 398.821014, 402.152008, 435.790985],   #index: 2,3,4,5,6
       [408.903992, 398.821014, 402.152008, 435.790985, 423.204987],   # ... and so on
       [398.821014, 402.152008, 435.790985, 423.204987, 411.574005],
       [402.152008, 435.790985, 423.204987, 411.574005, 404.424988],
       [435.790985, 423.204987, 411.574005, 404.424988, 399.519989],
       [423.204987, 411.574005, 404.424988, 399.519989, 377.181   ],
       [411.574005, 404.424988, 399.519989, 377.181   , 375.46701 ],
       [404.424988, 399.519989, 377.181   , 375.46701 , 386.944   ],
       [399.519989, 377.181   , 375.46701 , 386.944   , 383.61499 ],
       [377.181   , 375.46701 , 386.944   , 383.61499 , 375.071991],
       [375.46701 , 386.944   , 383.61499 , 375.071991, 359.511993],
       [386.944   , 383.61499 , 375.071991, 359.511993, 328.865997],
       [383.61499 , 375.071991, 359.511993, 328.865997, 320.51001 ],
       [375.071991, 359.511993, 328.865997, 320.51001 , 330.07901 ],
       [359.511993, 328.865997, 320.51001 , 330.07901 , 336.187012],
       [328.865997, 320.51001 , 330.07901 , 336.187012, 352.940002],
       [320.51001 , 330.07901 , 336.187012, 352.940002, 365.026001],
       [330.07901 , 336.187012, 352.940002, 365.026001, 361.562012],
       [336.187012, 352.940002, 365.026001, 361.562012, 362.299011],
       [352.940002, 365.026001, 361.562012, 362.299011, 378.549011],
       [365.026001, 361.562012, 362.299011, 378.549011, 390.414001],
       [361.562012, 362.299011, 378.549011, 390.414001, 400.869995],
       [362.299011, 378.549011, 390.414001, 400.869995, 394.77301 ],
       [378.549011, 390.414001, 400.869995, 394.77301 , 382.556   ]])

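Now, if we take the std of the above array along the last axis, we get the rolling std. By default numpy uses ddof=0, i.e. Delta Degrees of Freedom = 0, which means that for a sample of size number the divisor is number - 0. Since you need number - 1 as the divisor, you need ddof=1.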
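As a small illustration of the ddof point, here is a sketch that builds the same windows with np.lib.stride_tricks.sliding_window_view instead of as_strided. This is a swapped-in helper, not the call used in the answer; it assumes NumPy 1.20 or later and uses only the first ten values of PC_list to keep the example short.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

number = 5
# First ten values of PC_list from the question (shortened for the example).
PC_list = np.array([457.334015, 424.440002, 394.795990, 408.903992, 398.821014,
                    402.152008, 435.790985, 423.204987, 411.574005, 404.424988])

# Read-only view of shape (len(PC_list) - number + 1, number); nothing is copied yet.
windows = sliding_window_view(PC_list, number)

# ddof=0 divides by number, ddof=1 divides by number - 1.
std_population = windows.std(axis=-1, ddof=0)
std_sample = windows.std(axis=-1, ddof=1)

# Cross-check the first window against the pure-Python formula from the question.
y_mean = sum(PC_list[:number]) / number
expected = (sum((k - y_mean) ** 2 for k in PC_list[:number]) / (number - 1)) ** 0.5
print(std_sample[0], expected)
```

The two results differ only by the factor sqrt(number / (number - 1)), which is why the ddof=1 values are slightly larger.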

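【Comments】:

  • We have already recommended as_strided in previous questions. Although the result is a view, std computes X - X.mean() as part of its calculation (as in his (k - y_mean)**2). That makes a copy and will blow up his memory. The OP is not very good at explaining what he has learned from previous questions. stackoverflow.com/questions/65768068/…, stackoverflow.com/questions/65757073/…
  • @hpaulj I see. That is very important information that was left out. If it seems hard to explain, I would advise the OP to at least link his previous posts in the question.
  • @SayanipDutta Thank you, pd.Series(PC_list).rolling(number).std().dropna().to_numpy() works brilliantly and is exactly what I was looking for. Could it also be applied to xy_mean and y_mean?
  • @SayanipDutta I have successfully applied this function to my y_mean and std, but I can't find a way to implement it for xy_mean. I have made a post about it at xy_mean issue, asking whether it can be shortened to that function as well. I would be glad if you could take a look at that post.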
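Regarding the memory concern raised in the comments, and the question's wish to use np.convolve(), the following is a minimal sketch of a rolling std built from running sums. It is not taken from the answer: rolling_std_convolve is a hypothetical helper name, and it relies on the identity sum((x - mean)**2) = sum(x**2) - sum(x)**2 / n within each window. Only 1-D temporaries of roughly the input length are created, rather than a (windows x number) matrix. The sum-of-squares formula can lose precision through cancellation, so the data is shifted by its global mean first as a mitigation.

```python
import numpy as np

def rolling_std_convolve(values, window, ddof=1):
    """Rolling standard deviation from running sums (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Shifting by a constant leaves the variance unchanged but keeps the
    # squares small, which limits cancellation error in the formula below.
    shifted = values - values.mean()
    kernel = np.ones(window)
    s1 = np.convolve(shifted, kernel, mode='valid')       # sum of x per window
    s2 = np.convolve(shifted ** 2, kernel, mode='valid')  # sum of x**2 per window
    # sum((x - mean)**2) == s2 - s1**2 / window
    var = (s2 - s1 ** 2 / window) / (window - ddof)
    # Rounding can push a tiny variance just below zero; clip before the root.
    return np.sqrt(np.clip(var, 0.0, None))

number = 5
# First ten values of PC_list from the question (shortened for the example).
PC_list = np.array([457.334015, 424.440002, 394.795990, 408.903992, 398.821014,
                    402.152008, 435.790985, 423.204987, 411.574005, 404.424988])

result = rolling_std_convolve(PC_list, number)

# Reference: the slice-by-slice computation the question wants to avoid.
reference = np.array([np.std(PC_list[i:i + number], ddof=1)
                      for i in range(len(PC_list) - number + 1)])
print(np.allclose(result, reference))
```

Note that this returns one value per full window, including the last one, whereas the y_mean and xy_mean expressions in the question drop the final element with [:-1].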