Efficient way of extracting the last two digits of every element in a numpy array: answers

【Question title】: Efficient way of extracting the last two digits of every element in a numpy array
【Posted】: 2020-05-21 22:31:45
【Question description】:

Consider this sample:

sample = np.array([0, 1, 2, 3, 4])

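I need the fastest possible way to produce a list/array holding the last 2 digits of the binary representation of each value in sample. This is how I get the binary representations: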

bin_sample = [bin(x) for x in sample]
>>> ['0b0', '0b1', '0b10', '0b11', '0b100']

I parse them like this and get the correct output:

output = [bin(x)[-2:].replace('b','0') for x in sample]
>>> ['00', '01', '10', '11', '00']

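The problem is that it is too slow, and I am working with large arrays. Any suggestions? Thanks.

Edit: processing 5 million elements takes about 5 seconds. I need it to take ~1 second.

Edit #2: any optimization that achieves a ~500% speedup compared to the previous algorithm will do.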

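【Question comments】:

"The problem is that it is too slow, and I am working with large arrays." Can you be more specific? How slow is too slow?
@AMC I have edited in the timing information.
Timing requirements are meaningless without details about your hardware.
@VictorDeleau You can extrapolate: I need a speedup of roughly 500%. The numbers are only there for context.
bin (and related methods) operates on a single number and produces a string, so you are stuck with iteration. To do this with fast compiled numpy code, you have to use numeric methods such as modulus. You are only producing 4 distinct strings, right?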
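
The numeric route suggested in the last comment can be sketched as follows. This is only an illustration, not code from the thread, and the names `via_mod` and `via_mask` are made up here. It shows that the remainder modulo 4 and a bit mask with 3 give the same value in the range 0..3, which is all that is needed to pick one of the four possible strings.

```python
import numpy as np

sample = np.array([0, 1, 2, 3, 4, 13, 28])

# Both expressions run in compiled numpy loops, with no per-element Python call.
via_mod = sample % 4    # remainder after division by 4
via_mask = sample & 3   # keep only the two lowest bits (3 == 0b11)

print(via_mod)
print(via_mask)
```

Either result can then be used as an index into a 4-entry array of strings.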

Tags: python arrays python-3.x numpy binary

【Solution 1】:

Here is a somewhat playful solution:

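def pp():
    # `a` is the input array (a global here); an int64 item has the same 8-byte size as a 'U2' item
    a64 = a.astype(np.int64)
    # low 32 bits become the first character, high 32 bits the second (little-endian layout)
    return (((a64&1)<<32)+((a64&2)>>1)+ord('0')*0x100000001).view('U2')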

The lookup approach done right:

bits_map = np.array(['00', '01', '10', '11'])
def AMC_pp():
    return bits_map[a & 3]
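
To make the first function less opaque, here is a step-by-step sketch of the same packing idea. It assumes a little-endian machine (as the original expression implicitly does), and the intermediate names `hi_bit`, `lo_bit`, `packed` and `code_points` are introduced here for illustration only.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4], dtype=np.int64)

hi_bit = (a & 2) >> 1   # second-to-last bit, becomes the first character
lo_bit = a & 1          # last bit, becomes the second character
zero = ord('0')         # 48, the code point of '0'

# A 'U2' item is two 4-byte code points. On a little-endian machine the low
# 32 bits of each int64 hold the first character and the high 32 bits the second.
packed = (lo_bit << 32) + hi_bit + zero * 0x100000001

code_points = packed.view(np.uint32).reshape(-1, 2)  # the two characters as integers
as_text = packed.view('U2')                          # same bytes, read as strings

print(code_points)
print(as_text)
```

Each row of `code_points` is a pair of values from {48, 49}, that is '0' or '1', which is why reinterpreting the same memory as 'U2' yields two-character strings without any string formatting.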

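【Comments】:

What is a.astype(np.int64) for? Without it, the first method seems to be 60 ms faster than mine!
@PaulPanzer Indeed, the second one is at least 2x faster than @AMC's!
@AMC It is just a precaution. I seem to remember that on Windows machines numpy ints default to int32, and I was not sure whether that would be promoted by the later operations.
@Marcos Yes, I found that the second one brings the runtime down from ~160 ms to ~60 ms. I have added it to my benchmark.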

【Solution 2】:

Quick benchmark

Setup

import numpy as np
test_arr = np.random.randint(0, 10000000, 10000000)

1. Original solution

def last_two_bits(arr_in):
    return [bin(num)[-2:].replace('b','0') for num in arr_in]

Time: ~5200 ms

2. Solution by @aminrd

bits_map = ['00','01','10','11']
def last_two_bits_nv(arr_in):
    return bits_map[arr_in % 4]

last_two_bits = np.vectorize(last_two_bits_nv)

Time: ~2600 ms

3. My tweak of @aminrd's solution

bits_map = np.array(['00', '01', '10', '11'])
def last_two_bits(arr_in):
    return bits_map[arr_in % 4]

Time: ~170 ms

4. First solution by @Paul Panzer

def last_two_bits(arr_in):
    return (((arr_in & 1) << 32) + ((arr_in & 2) >> 1) + ord('0') * 0x100000001).view('U2')

Time: ~100 ms

5. Optimized version of method 3, by Paul Panzer

bits_map = np.array(['00', '01', '10', '11'])
def last_two_bits(arr_in):
    return bits_map[arr_in & 3]

Time: ~60 ms

6. Solution by @Mad Physicist

def last_two_bits(arr_in):
    output = np.empty((arr_in.size, 2), dtype=np.uint8)
    np.bitwise_and(arr_in >> 1, 1, out=output[:, 0], casting='unsafe')
    np.bitwise_and(arr_in, 1, out=output[:, 1], casting='unsafe')
    output += 48
    return output.view(dtype='S2').ravel()

Time: ~60 ms
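
The timings above depend on the machine, so it may help to rerun them locally. The following is a rough harness sketch, not the code the benchmark author used: the function names `via_bin`, `via_lookup` and `via_bytes` are made up here, the array is smaller than the 10 million elements above so it finishes quickly, and it also checks that methods 5 and 6 agree with the original approach before timing anything.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
test_arr = rng.integers(0, 10_000_000, 100_000)  # int64 values, smaller than the 10M above

bits_map = np.array(['00', '01', '10', '11'])

def via_bin(arr_in):
    # Method 1: per-element string handling in Python
    return [bin(num)[-2:].replace('b', '0') for num in arr_in]

def via_lookup(arr_in):
    # Method 5: mask, then index into a 4-entry string array
    return bits_map[arr_in & 3]

def via_bytes(arr_in):
    # Method 6: store each bit as an ASCII byte, then view byte pairs as 'S2'
    output = np.empty((arr_in.size, 2), dtype=np.uint8)
    np.bitwise_and(arr_in >> 1, 1, out=output[:, 0], casting='unsafe')
    np.bitwise_and(arr_in, 1, out=output[:, 1], casting='unsafe')
    output += 48
    return output.view(dtype='S2').ravel()

# Correctness first: compare against the original list comprehension.
reference = via_bin(test_arr)
agrees = {
    'lookup': via_lookup(test_arr).tolist() == reference,
    'bytes': via_bytes(test_arr).astype(str).tolist() == reference,  # 'S2' -> str
}
print(agrees)

# Best of three single runs per method.
for name, fn in [('bin', via_bin), ('lookup', via_lookup), ('bytes', via_bytes)]:
    seconds = min(timeit.repeat(lambda: fn(test_arr), number=1, repeat=3))
    print(f'{name}: {seconds * 1000:.1f} ms')
```

Note that method 6 returns byte strings (dtype 'S2') rather than unicode strings, which is why the comparison converts them first.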

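【Comments】:

I see an incorrect output for the test case: [28 3 1 13 14] gets ['00', '11', '10', '10', '01']. 13 is an odd number, so it should be 01, not 10.
@Marcos Fixed, I had the bit order wrong. By the way, do you know a good way to name them?
Thanks for trying, but this solution is 1.4x slower than mine.
With such a small sample, a simple list comprehension using bin is bound to be faster. The array methods only pay off on large samples. Unfortunately, the join zip is a list comprehension. Getting the bits is probably already fast enough. The problem is combining them quickly.
@Marcos I may have found a better solution.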

【Solution 3】:

If you are looking for the last two bits of the binary representation, why not map each element to ['00','01','10','11'] according to element % 4?

import numpy as np
sample = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

map_list = ['00','01','10','11']

def f(x):
    return map_list[x % 4]

f = np.vectorize(f)

output = f(sample)

#['00', '01', '10', '11', '00', '01', '10', '11', '00', '01', '10']

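【Comments】:

For a sample list of length 10000000, this goes from 4.73 seconds down to 2.1 seconds on my laptop.
np.vectorize does not help with speed. Fortunately, as @AMC shows, you can do this indexing directly on an array version of map_list.
@aminrd Still, thanks for your help, you almost nailed it!
Vectorizing cancels out all the benefit you just got from the indexing operation, which is already fully vectorized.
@MadPhysicist Yes, hpauli mentioned that. Should I update my answer when another answer has already been posted?!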
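
The point made in these comments can be checked with a small sketch. It is not from the thread, and the `stats` counter is a made-up helper. It counts how often the Python function runs under np.vectorize and compares the result with direct array indexing.

```python
import numpy as np

sample = np.arange(1000)
map_list = ['00', '01', '10', '11']
stats = {'calls': 0}

def f(x):
    stats['calls'] += 1          # count every Python-level call
    return map_list[x % 4]

vectorized = np.vectorize(f)(sample)
indexed = np.array(map_list)[sample % 4]   # a single compiled indexing operation

print(stats['calls'])                       # at least one call per element
print(np.array_equal(vectorized, indexed))
```

The call count is at least the number of elements, so np.vectorize still pays the cost of a Python function call per element, while the indexed version produces the same array without calling any Python function in the loop.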

【Solution 4】:

I do not have access to a benchmarking tool, but I wonder whether this would help:

output = np.empty((sample.size, 2), dtype=np.uint8)
np.bitwise_and(sample >> 1, 1, out=output[:, 0], casting='unsafe')
np.bitwise_and(sample, 1, out=output[:, 1], casting='unsafe')
output += 48
output = output.view(dtype='S2').ravel()

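【Comments】:

I tried running it and it raised an error: TypeError: ufunc 'bitwise_and' output (typecode 'l') could not be coerced to provided output parameter (typecode 'B') according to the casting rule ''same_kind''
It seems to be around 60 ms. I am running them all again.
@AMC. So it is comparable to Paul's optimized version? Wow. I did not expect that.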
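
The TypeError quoted above is what the default casting rule produces when an int64 result is written into a uint8 `out` array. The sketch below is an illustration only, not taken from the thread. It reproduces the failure on a tiny array and shows that passing casting='unsafe', as the code in the answer does, allows the store.

```python
import numpy as np

sample = np.array([0, 1, 2, 3, 4], dtype=np.int64)
out = np.empty(sample.size, dtype=np.uint8)

try:
    # The default casting='same_kind' refuses to store a signed int64 result in uint8.
    np.bitwise_and(sample, 1, out=out)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# casting='unsafe' permits the narrowing store; the values 0 and 1 fit in uint8.
np.bitwise_and(sample, 1, out=out, casting='unsafe')
print(out)
```

The narrowing is harmless here because the masked values are only ever 0 or 1.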

【Solution 5】:

Using Numpy may give a cleaner and faster solution, but I cannot vouch for how much of a marginal performance gain you would get.

import numpy as np

sample = np.array([0, 1, 2, 3, 4])
# mask to the two lowest bits first, so that width=2 is always sufficient
print([np.binary_repr(x & 3, width=2) for x in sample])

This returns the following output:

['00', '01', '10', '11', '00']

【Comments】:

binary_repr uses bin and adds a layer that guarantees the width, which makes it slower. On this small sample it takes 2x as long.