Find the permutation of a vector that minimizes the std of a matrix-vector product: answers

【Question title】: Find permutation of a vector that minimizes std of matrix vector product
【Posted】: 2021-08-05 05:08:28
【Question】:

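I am trying to solve the following problem:

Given a symmetric matrix A (12x12) that shows the grid of games, and a vector x (12) of team rankings.

Their product gives a vector whose i-th entry is the total ranking of all the teams that team i plays against, weighted by the corresponding entries of A.

For example, with 3 teams, rankings x = [1, 2, 3], and matrix A: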

      0 2 1
      2 0 4
      1 4 0

Matrix A is fixed. We need to find the permutation of x that minimizes STD(Ax).
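To make the objective concrete, here is a minimal sketch that enumerates all 3! = 6 orderings of the example rankings and reports the one with the smallest standard deviation of Ax. The helper name `best_permutation` is hypothetical, and `np.std` is assumed to be used with its default `ddof=0` (population standard deviation).

```python
import itertools
import numpy as np

# Example schedule matrix and rankings from the text above.
A = np.array([[0, 2, 1],
              [2, 0, 4],
              [1, 4, 0]])
rankings = [1, 2, 3]

def best_permutation(A, rankings):
    """Hypothetical helper: exhaustive search, only viable for tiny inputs."""
    best_std, best_perm = float("inf"), None
    for perm in itertools.permutations(rankings):
        cur = np.std(A @ np.array(perm))  # population std (ddof=0)
        if cur < best_std:
            best_std, best_perm = cur, perm
    return best_std, best_perm

best_std, best_perm = best_permutation(A, rankings)
print(best_perm, best_std)
```

For this toy input the search picks x = (1, 3, 2), where Ax = [8, 10, 13].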

My previous attempt was to check every permutation, but with 12! (about 479 million) permutations it runs for a very long time.

import itertools
import numpy as np

A = np.matrix('0,3,1,2,2,2,2,2,2,1,2,2;3,0,3,1,2,3,1,1,2,2,1,2;1,3,0,2,2,2,2,2,2,1,2,2;2,1,2,0,2,1,2,2,2,2,3,2;2,2,2,2,0,2,1,2,2,3,2,1;2,3,2,1,2,0,3,2,1,2,1,2;2,1,2,2,1,3,0,2,1,1,3,3;2,1,2,2,2,2,2,0,3,2,2,1;2,2,2,2,2,1,1,3,0,3,1,2;1,2,1,2,3,2,1,2,3,0,2,2;2,1,2,3,2,1,3,2,1,2,0,2;2,2,2,2,1,2,3,1,2,2,2,0')
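best_std = float('inf')  # renamed from `min` to avoid shadowing the built-in
for perm in itertools.permutations([2433,2057,1935,1927,1870,1841,1818,1770,1680,1497,1435,1289]):
    x = np.matrix(perm).T   # column vector holding one ordering of the rankings
    b = A.dot(x)            # b[i]: weighted total ranking of team i's opponents
    cur = np.std(b)
    if cur < best_std:
        best_std = cur
        res = x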

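I know there is scipy minimize, but I don't know whether it can handle permutations of x rather than continuous optimization.

The question is how to solve this task as quickly and accurately as possible.

Thanks.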

【Question comments】:

Tags: python performance numpy optimization

【Solution 1】:

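You can write a much faster brute-force implementation.

First, you can use one matrix multiplication instead of many dot products by processing a large batch of permutations at once. Matrix multiplication kernels are heavily optimized, so they run much faster than the same number of separate dot products.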
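The batching idea can be sketched on a toy problem. The 4x4 matrix below is random and purely illustrative (it is not the schedule matrix from the question); the point is only that stacking the permutations as columns of one matrix gives the same standard deviations as looping over them.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.integers(0, 4, size=(n, n))
A = A + A.T                      # symmetric, like the schedule matrix
np.fill_diagonal(A, 0)
values = np.array([10, 20, 30, 40])

# One row per permutation: shape (n!, n).
perms = np.array(list(itertools.permutations(values)))

# Loop version: one matrix-vector product per permutation.
loop_std = np.array([np.std(A @ p) for p in perms])

# Batched version: column j of B is A @ perms[j].
B = A @ perms.T                  # shape (n, n!)
batch_std = np.std(B, axis=0)

assert np.allclose(loop_std, batch_std)
print(perms[np.argmin(batch_std)], batch_std.min())
```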

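Moreover, you can partially precompute the permutations to speed up the computation further, by splitting each permutation into two parts. The idea is to first build an index containing all the ordered selections of 5 items out of 12 (the high part). Then build all the permutations of an array of 7 items (the low part), stored as indices rather than the values themselves. Finally, every full permutation can be built by combining the two indices.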
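A small sketch of this split, using toy sizes (5 items, high part of length 2) instead of 12 and 5, checks that combining the two index tables produces every permutation exactly once. The variable names are illustrative and not taken from the answer's code.

```python
import itertools
import numpy as np

n, k = 5, 2   # toy sizes; the answer uses n=12, k=5

lo_perms = np.array(list(itertools.permutations(range(n - k))))  # (n-k)! rows
hi_perms = np.array(list(itertools.permutations(range(n), k)))   # n!/(n-k)! rows

chunks = []
for hi in hi_perms:
    # Indices not used by the high part, in a fixed (sorted) order.
    rest = np.array(sorted(set(range(n)) - set(hi.tolist())))
    # Reindex: every ordering of the remaining indices.
    lo = rest[lo_perms]                           # shape ((n-k)!, n-k)
    hi_block = np.broadcast_to(hi, (len(lo), k))  # same high part on every row
    chunks.append(np.hstack([hi_block, lo]))
all_perms = np.vstack(chunks)

# Every permutation of range(n) appears exactly once.
assert len(all_perms) == 120
assert {tuple(p) for p in all_perms.tolist()} == set(itertools.permutations(range(n)))
```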

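Note that a further optimization becomes possible when the two optimizations above are applied together: if part of each permutation in a batch is constant, the matrix multiplication can be computed more efficiently.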
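One way to read this: when the first k entries of every vector in a batch are identical, A @ x splits into a contribution from the first k columns of A (computed once) plus a contribution from the remaining columns (computed per permutation). The sketch below checks that identity on made-up data; the matrix and values are assumptions for illustration only.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
A = rng.integers(0, 4, size=(n, n))
hi = np.array([7, 3])                                      # fixed first k entries
lo = np.array(list(itertools.permutations([1, 2, 4, 5])))  # (24, 4): varying tails

# Full product: build every complete vector, then multiply.
X = np.hstack([np.broadcast_to(hi, (len(lo), k)), lo])     # (24, 6)
B_full = A @ X.T                                           # (6, 24)

# Split product: the high part is computed once and broadcast over all columns.
hi_B = A[:, :k] @ hi[:, None]                              # (6, 1)
lo_B = A[:, k:] @ lo.T                                     # (6, 24)
B_split = hi_B + lo_B

assert np.array_equal(B_full, B_split)
```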

The resulting algorithm is complex, but much more efficient than the original one. Here is the code:

import itertools
import numpy as np

def computeOptim(A):
    # A is expected to be an np.matrix: the [0, minPos] indexing below relies
    # on np.std(B, axis=0) returning a 1 x N matrix.
    mini = float('inf')
    res = None

    permValues = np.array([2433, 2057, 1935, 1927, 1870, 1841, 1818, 1770, 1680, 1497, 1435, 1289])

    # Precompute partial permutations: high and low part of all the permutations.
    loPerms = np.array(list(itertools.permutations(range(7))))
    hiPerms = np.array(list(itertools.permutations(range(12), 5)))

    # Iterate over chunks (of 7! = 5040 permutations)
    for hiPerm in hiPerms:
        # Find the remaining indices to include in the low-part permutations
        loPermIndices = np.array(sorted(set(range(12)) - set(hiPerm)))

        # Find all the possible low-part permutations for the current
        # high-part permutation by reindexing the values.
        curLoPerms = loPermIndices[loPerms]

        # Compute the chunks of possible x values
        loPermValues = permValues[curLoPerms]
        hiPermValues = permValues[hiPerm]

        # A matrix multiplication is used to compute many dot products at once.
        # Efficiently compute B = A @ X, with X the matrix containing all the permutations
        hiB = A[:,:len(hiPermValues)] @ hiPermValues[None,:].T
        loB = A[:,len(hiPermValues):] @ loPermValues.T
        B = hiB + loB

        multiCur = np.std(B, axis=0)
        minPos = np.argmin(multiCur)

        if multiCur[0,minPos] < mini:
            mini = multiCur[0,minPos]
            res = np.concatenate((hiPermValues, loPermValues[minPos]))

    # Return the best standard deviation and the ordering that achieves it.
    return mini, res

A = np.matrix('0,3,1,2,2,2,2,2,2,1,2,2;3,0,3,1,2,3,1,1,2,2,1,2;1,3,0,2,2,2,2,2,2,1,2,2;2,1,2,0,2,1,2,2,2,2,3,2;2,2,2,2,0,2,1,2,2,3,2,1;2,3,2,1,2,0,3,2,1,2,1,2;2,1,2,2,1,3,0,2,1,1,3,3;2,1,2,2,2,2,2,0,3,2,2,1;2,2,2,2,2,1,1,3,0,3,1,2;1,2,1,2,3,2,1,2,3,0,2,2;2,1,2,3,2,1,3,2,1,2,0,2;2,2,2,2,1,2,3,1,2,2,2,0')
mini, res = computeOptim(A)
print("mini =", mini)
print("res =", res)

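On my machine, it finds the optimal solution in 50 seconds, whereas the original code takes about 5h30. This code is therefore roughly 400 times faster.

The optimal solution found is: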

mini = 291.80729942892106
res = [2433 1841 1289 1818 2057 1927 1770 1870 1497 1680 1935 1435]
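A reported optimum like this is cheap to sanity-check independently of the search code. The sketch below recomputes the standard deviation for the ordering reported above, using the matrix from the question rewritten as a plain NumPy array; the variable names are illustrative.

```python
import numpy as np

# Schedule matrix from the question, as a plain 2-D array.
A = np.array([
    [0, 3, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2],
    [3, 0, 3, 1, 2, 3, 1, 1, 2, 2, 1, 2],
    [1, 3, 0, 2, 2, 2, 2, 2, 2, 1, 2, 2],
    [2, 1, 2, 0, 2, 1, 2, 2, 2, 2, 3, 2],
    [2, 2, 2, 2, 0, 2, 1, 2, 2, 3, 2, 1],
    [2, 3, 2, 1, 2, 0, 3, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 3, 0, 2, 1, 1, 3, 3],
    [2, 1, 2, 2, 2, 2, 2, 0, 3, 2, 2, 1],
    [2, 2, 2, 2, 2, 1, 1, 3, 0, 3, 1, 2],
    [1, 2, 1, 2, 3, 2, 1, 2, 3, 0, 2, 2],
    [2, 1, 2, 3, 2, 1, 3, 2, 1, 2, 0, 2],
    [2, 2, 2, 2, 1, 2, 3, 1, 2, 2, 2, 0],
])
res = np.array([2433, 1841, 1289, 1818, 2057, 1927, 1770, 1870, 1497, 1680, 1935, 1435])

assert np.array_equal(A, A.T)   # the question states that A is symmetric
b = A @ res                     # weighted opponent totals, one entry per team
std_b = float(np.std(b))        # population std (ddof=0), as in the answer
print(std_b)
```

The printed value should agree with the `mini` reported above. This confirms only that the reported pair is consistent, not that it is the global optimum.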

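【Comments】:

Thanks! Could you provide a link that explains the permutation trick?
How did you choose the 7/5 split?
Well, I am not sure there is a link that explains this specifically. I created it to solve this problem. However, the mathematical idea behind it is simple: picking 12 balls from a bag is strictly equivalent to picking 5 balls and then picking the 7 others. From a programming point of view, I guess this algorithm can be deduced from the standard recursive one (thanks to symbolic expression computing, which are the indices here).
For 7/5, I tested different values. I first chose 6/6 and found that 7/5 is faster and uses less memory. The idea is that the matrix multiplication should work on matrices big enough to be much faster than dot products. However, the matrices should not be too big, since they would then no longer fit in the CPU caches, which slows the computation down. The sizes of loPerms and hiPerms also matter. Precomputing all the permutations takes too much memory (and is slow too). Ideally, the two tables should be about equally big to minimize the memory footprint. 7/5 and 8/4 fit well.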
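The size trade-off described in this comment can be tabulated directly. The sketch below counts the rows of the two index tables for the 8/4, 7/5 and 6/6 splits; `split_sizes` is a hypothetical helper, and the count covers only the index tables, not the intermediate matrix B.

```python
import math

def split_sizes(n, k):
    """Hypothetical helper: table sizes for a split with a high part of length k."""
    hi_rows = math.perm(n, k)            # ordered selections of k items out of n
    lo_rows = math.factorial(n - k)      # orderings of the remaining n-k items
    index_entries = hi_rows * k + lo_rows * (n - k)
    return hi_rows, lo_rows, index_entries

for k in (4, 5, 6):
    hi_rows, lo_rows, index_entries = split_sizes(12, k)
    # Together the two tables always cover all 12! permutations.
    assert hi_rows * lo_rows == math.factorial(12)
    print(f"{12 - k}/{k}: hiPerms has {hi_rows} rows, loPerms has {lo_rows} rows "
          f"(= columns of B per chunk), {index_entries} index entries in total")
```

With these counts, the 6/6 split stores roughly eight times as many index entries as 7/5 and multiplies only 720 columns per chunk, while 8/4 multiplies 40320 columns per chunk.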