Performance considerations while accessing array elements (answer)

【Question title】: Performance considerations while accessing array elements
【Posted】: 2015-11-27 11:40:19
【Question】:

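I'm currently learning some bash in my spare time and have solved a few simple coding challenges with it. While solving the last challenge, I ran into a strange optimization issue: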

# read the number of input values
read N

# read all input values into an array
readarray -t -n "$N" P

# sort the input array in ascending order
PS=($(printf "%s\n" "${P[@]}" | sort -n))

# initialize the current minimum distance between two horses to the max input value plus one
Min=10000001

# iterate over the ascending sorted array
# and find the minimum distance
for ((i = 1; i < N; i++)); do
  # compute the difference between the current and the previous element
  #D=$((PS[i]-PS[i-1]))
  D=$((-PS[i-1]+PS[i]))

  if [ $D -le $Min ]; then
    Min=$D
  fi
done

# finally print the minimum distance
echo $Min

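For some reason, accessing PS[i] first and then PS[i-1] makes the test case with 100,000 input values time out. However, accessing exactly the same array elements in the reverse order lets the test case pass. Accessing a non-associative array should take O(1) time, so how can the access order affect runtime performance? Is there some internal bash magic I'm not aware of (for example, arrays being implemented as singly linked lists or something similar)?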
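To reproduce the observation, the sketch below times both access orders on an already sorted array. `N`, the fill values `PS[i] = 3*i`, and both function names are arbitrary choices made for this illustration. One explanation consistent with the `array.c` source of some bash 4.x releases (an assumption here, not checked against every version) is that an indexed-array lookup resumes from the most recently referenced element only when the requested index is not smaller than it, and otherwise walks again from the head of the list. Newer bash releases may behave differently, so treat the timings as something to measure rather than a given.

```bash
#!/usr/bin/env bash
# Compare the two access orders from the question on a sorted array.
# N is an arbitrary size; raise it (e.g. to 100000) to get closer to
# the input size that timed out.
N=5000
PS=()
for ((i = 0; i < N; i++)); do
  PS[i]=$(( i * 3 ))
done

# Reads PS[i] first, then PS[i-1] (the order that timed out).
current_then_previous() {
  local i s=0
  for ((i = 1; i < N; i++)); do
    s=$(( s + PS[i] - PS[i-1] ))
  done
  echo "$s"
}

# Reads PS[i-1] first, then PS[i] (the order that passed).
previous_then_current() {
  local i s=0
  for ((i = 1; i < N; i++)); do
    s=$(( s - PS[i-1] + PS[i] ))
  done
  echo "$s"
}

# Both print the same sum; only the reported times may differ.
time current_then_previous
time previous_then_current
```

Both functions print 14997 for `N=5000`; any difference shows up only in the `time` output on stderr.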

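【Question discussion】:

A glib comment: if you care about performance, don't write your programs in bash. It is notoriously slow.
stackoverflow.com/questions/32592662/… - According to the answers to that question, arrays in Bash are linked lists, just as you suspected.
Bash arrays and maps are associative arrays, i.e. linked lists of key-value pairs. So arrays take O(n) time in the worst case, and so do maps.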
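If lookups do behave like walks over a linked list, a loop that reads each element exactly once, front to back, sidesteps the access-order question altogether. A minimal sketch, assuming the values are passed as arguments rather than read from stdin; `min_gap` is a hypothetical helper name, and a single expansion of `"${sorted[@]}"` is assumed to traverse the array storage only once.

```bash
#!/usr/bin/env bash
# min_gap (hypothetical helper): print the smallest difference between
# neighbouring values once they are sorted numerically.
min_gap() {
  local -a sorted
  # mapfile -t stores one sorted value per element, without the newline
  mapfile -t sorted < <(printf '%s\n' "$@" | sort -n)

  local v prev min first=1
  # Iterate over the values directly: there are no PS[i-1]-style lookups,
  # the previous value is carried in a scalar instead.
  for v in "${sorted[@]}"; do
    if (( first )); then
      first=0
    elif [[ -z $min ]] || (( v - prev < min )); then
      min=$(( v - prev ))
    fi
    prev=$v
  done
  printf '%s\n' "$min"
}

min_gap 5 1 9 3   # prints 2
```

With the question's variables, it could be called as `min_gap "${P[@]}"`, provided `P` was filled with `readarray -t` so the elements carry no trailing newlines.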

Tags: bash

【Solution 1】:

Bash arrays are not arrays in the C sense, because you can have sparse arrays without wasting memory:

a=([1]=0 [100000000]=2 [10000000000]=3)
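The sparseness is easy to observe: only the three assigned indices exist, and an index inside a gap has no element at all. A short demonstration using the answer's array:

```bash
#!/usr/bin/env bash
# Sparse indexed array from the answer: only three elements are stored,
# even though the largest index is 10000000000.
a=([1]=0 [100000000]=2 [10000000000]=3)

echo "${#a[@]}"          # prints 3 (number of elements actually set)
echo "${!a[@]}"          # prints 1 100000000 10000000000 (existing indices, ascending)
echo "${a[5]:-unset}"    # prints unset (index 5 lies in a gap)
```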

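Ordinary arrays allow O(1) access because the memory location given by an index can be computed with an O(1) formula. This is usually possible because the array is stored in contiguous memory, so you only need to compute an offset. Sparse arrays are typically implemented with linked lists, and even when they are implemented with something else, such as a hash map, access may not be O(1).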
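To put numbers on this, assume for illustration that a contiguous array with base address $b$ and element size $s$ needs only one offset computation per lookup, while a linked-list lookup of index $i$ that starts from the head costs about $i$ steps:

$$\operatorname{addr}(A[i]) = b + i \cdot s$$

$$\sum_{i=1}^{N-1} i = \frac{N(N-1)}{2}, \qquad N = 100\,000 \;\Rightarrow\; \frac{100\,000 \cdot 99\,999}{2} = 4\,999\,950\,000 \approx 5 \times 10^{9}$$

Under that assumption, a loop like the one in the question performs on the order of $5 \times 10^{9}$ list steps when every lookup restarts from the head, which is consistent with a timeout.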

【Discussion】: