Python numpy percentile vs scipy percentileofscore: answers

【Question title】: Python numpy percentile vs scipy percentileofscore
【Posted】: 2020-01-06 14:31:45
【Question description】:

I am confused about what I am doing wrong.

I have the following code:

import numpy as np
from scipy import stats

df
Out[29]: array([66., 69., 67., 75., 69., 69.])

val = 73.94
z1 = stats.percentileofscore(df, val)
print(z1)
Out[33]: 83.33333333333334

np.percentile(df, z1)
Out[34]: 69.999999999

I expected np.percentile(df, z1) to give me back val = 73.94.

【Question comments】:

Tags: numpy scipy python-3.5

【Solution 1】:

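I think you have misunderstood what percentileofscore and percentile actually do. They are not inverses of each other.

From the documentation for scipy.stats.percentileofscore:

The percentile rank of a score relative to a list of scores.

A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.

So when you supply the value 73.94, 5 of the 6 elements of df are below that score, and 5/6 gives you the 83.3333% result.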
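The count can be checked directly. The following is a minimal sketch that assumes the same df and val as in the question; the variable names below, manual, strict and weak are only illustrative. It also shows how kind changes the result when the score ties with elements of the array.

```python
import numpy as np
from scipy import stats

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

# Count the elements strictly below val and turn that into a percentage.
below = np.count_nonzero(df < val)
manual = below / df.size * 100

z1 = stats.percentileofscore(df, val)
print(below, manual, z1)

# 69 ties with three elements, so the choice of `kind` matters here.
strict = stats.percentileofscore(df, 69, kind='strict')  # share strictly below 69
weak = stats.percentileofscore(df, 69, kind='weak')      # share at or below 69
print(strict, weak)
```

Because 73.94 does not tie with any element, every kind gives the same answer for it; only the score 69 separates 'strict' from 'weak'.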

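Now, from the notes for numpy.percentile:

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The default interpolation parameter (renamed method in NumPy 1.22) is 'linear', so:

'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

Since you passed z1 (about 83.33) as the input argument, you are asking for the value that sits 83.33/100 of the way from the minimum to the maximum, by position, in the sorted array. That position falls between two data points, so the result is interpolated between them.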
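A short sketch of that reading, assuming the same array as in the question and the default linear interpolation: q = 0 and q = 100 return the minimum and maximum, q = 50 returns the median, and the rank from the question lands between the two largest sorted values.

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])

p_min = np.percentile(df, 0)     # start of the sorted array
p_med = np.percentile(df, 50)    # halfway through the sorted array
p_max = np.percentile(df, 100)   # end of the sorted array
print(p_min, p_med, p_max)

# The rank that percentileofscore returned in the question (5/6 as a percentage).
q = 100 * 5 / 6
p = np.percentile(df, q)
print(p)  # close to 70, between the sorted neighbours 69 and 75
```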

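If you are interested in digging into the source code, you can find it in the NumPy repository, but here is a simplified view of the calculation: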

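import numpy as np
from scipy import stats

# Same data as in the question
df = np.array([66., 69., 67., 75., 69., 69.])
z1 = stats.percentileofscore(df, 73.94)   # 83.33333333333334

ap = np.asarray(sorted(df))   # sorted copy of the data
Nx = df.shape[0]

# Fractional position of the z1-th percentile in the sorted array
indices = z1 / 100 * (Nx - 1)
indices_below = np.floor(indices).astype(int)
indices_above = indices_below + 1

# Linear interpolation weights for the two neighbouring elements
weight_above = indices - indices_below
weight_below = 1 - weight_above

x1 = ap[indices_below] * weight_below   # 57.50000000000004
x2 = ap[indices_above] * weight_above   # 12.499999999999956

x1 + x2

70.0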
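To make the "not inverses" point concrete, the interpolation can be run backwards by hand to find the q for which np.percentile would return 73.94. This is only an illustrative sketch, not part of either library: the variable names are hypothetical, and it assumes the default linear interpolation, a val strictly inside the range of the data, and no tie at val.

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

ap = np.sort(df)
n = ap.size

# Position of the last sorted element that is <= val
# (assumes ap[0] <= val < ap[-1], so ap[lo + 1] exists and differs from ap[lo]).
lo = np.searchsorted(ap, val, side='right') - 1

# How far val lies between its two sorted neighbours, as a fraction.
fraction = (val - ap[lo]) / (ap[lo + 1] - ap[lo])

# Convert the fractional index back into a percentile.
q = (lo + fraction) / (n - 1) * 100
print(q)
print(np.percentile(df, q))
```

The q found this way is roughly 96.5, not the 83.33 that percentileofscore reports, which is why feeding one function's output into the other does not round-trip.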

【Comments】: