Sorting an Array Alongside a 2d Array: Answers

【Question title】: Sorting an Array Alongside a 2d Array
【Posted】: 2014-09-17 11:54:04
【Question】:

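I am using NumPy's linear algebra routines to do some basic computational quantum mechanics. Say I have a matrix, hamiltonian, and I want its eigenvalues and eigenvectors: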

import numpy as np
from numpy import linalg as la

hamiltonian = np.zeros((N, N)) # N is some constant I have defined
# fill up hamiltonian here
energies, states = la.eig(hamiltonian)

Now I want to sort energies in ascending order, and I want states sorted along with them. For example, if I do this:

groundStateEnergy = min(energies)
groundStateIndex = np.where(energies == groundStateEnergy)
groundState = states[groundStateIndex, :]

I correctly plot the ground state (the eigenvector with the lowest eigenvalue). However, if I try something like this:

energies, states = zip(*sorted(zip(energies, states)))

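or even

energies, states = zip(*sorted(zip(energies, states), key=lambda pair: pair[0]))

then plotting in the same way no longer plots the correct state. So how can I sort states alongside energies, but only by row? (That is, I want to associate each row of states with a value in energies, and rearrange the rows so that their order corresponds to the sorted order of energies.)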

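【Comments】:

You should check out numpy.argsort
Use the indices from numpy.argsort(energies) to sort the states array.
So, once I have the indices from numpy.argsort(energies), how do I rearrange just the rows of states using them?

Tags: python arrays sorting numpy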

【Solution 1】:

You can use argsort as follows:

>>> x = np.random.random(10)

>>> x
array([ 0.69719108,  0.75828237,  0.79944838,  0.68245968,  0.36232211,
        0.46565445,  0.76552493,  0.94967472,  0.43531813,  0.22913607])
>>> y = np.random.random((10))
>>> y
array([ 0.64332275,  0.34984653,  0.55240204,  0.31019789,  0.96354724,
        0.76723872,  0.25721343,  0.51629662,  0.13096252,  0.86220311])
>>> idx = np.argsort(x)
>>> idx
array([9, 4, 8, 5, 3, 0, 1, 6, 2, 7])
>>> xsorted = x[idx]
>>> xsorted
array([ 0.22913607,  0.36232211,  0.43531813,  0.46565445,  0.68245968,
        0.69719108,  0.75828237,  0.76552493,  0.79944838,  0.94967472])
>>> ysortedbyx = y[idx]
>>> ysortedbyx
array([ 0.86220311,  0.96354724,  0.13096252,  0.76723872,  0.31019789,
        0.64332275,  0.34984653,  0.25721343,  0.55240204,  0.51629662])

And, as suggested in the comments, here we sort a 2D array by its first column:

>>> x=np.random.random((10,2))
>>> x
array([[ 0.72789275,  0.29404982],
       [ 0.05149693,  0.24411234],
       [ 0.34863983,  0.58950756],
       [ 0.81916424,  0.32032827],
       [ 0.52958012,  0.00417253],
       [ 0.41587698,  0.32733306],
       [ 0.79918377,  0.18465189],
       [ 0.678948  ,  0.55039723],
       [ 0.8287709 ,  0.54735691],
       [ 0.74044999,  0.70688683]])
>>> idx = np.argsort(x[:,0])
>>> idx
array([1, 2, 5, 4, 7, 0, 9, 6, 3, 8])
>>> xsorted = x[idx,:]
>>> xsorted
array([[ 0.05149693,  0.24411234],
       [ 0.34863983,  0.58950756],
       [ 0.41587698,  0.32733306],
       [ 0.52958012,  0.00417253],
       [ 0.678948  ,  0.55039723],
       [ 0.72789275,  0.29404982],
       [ 0.74044999,  0.70688683],
       [ 0.79918377,  0.18465189],
       [ 0.81916424,  0.32032827],
       [ 0.8287709 ,  0.54735691]])
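
Applied to the eigenvalue problem in the question, the same idea might look like the sketch below. The 3x3 matrix is only an illustrative stand-in for hamiltonian, and the names idx, energies_sorted, states_sorted, ground_state and rows_sorted are made up for this example. One thing to keep in mind: numpy.linalg.eig returns the eigenvectors as columns, so states[:, i] is the vector that belongs to energies[i], and the permutation is applied along axis 1 rather than to the rows.

```python
import numpy as np
from numpy import linalg as la

# Small symmetric stand-in for the Hamiltonian (illustrative values only).
hamiltonian = np.array([[ 2.0, -1.0,  0.0],
                        [-1.0,  2.0, -1.0],
                        [ 0.0, -1.0,  2.0]])

energies, states = la.eig(hamiltonian)

# Indices that put the eigenvalues in ascending order.
idx = np.argsort(energies)

energies_sorted = energies[idx]
# la.eig stores eigenvectors as columns: states[:, i] pairs with energies[i],
# so the same permutation is applied to the columns.
states_sorted = states[:, idx]

# Column 0 now holds the eigenvector of the lowest eigenvalue.
ground_state = states_sorted[:, 0]

# If the vectors are kept one per row instead, reorder the rows with the
# same index array.
rows_sorted = states.T[idx]
```

If the states are stored one per row, as the question describes, the last line is the relevant form: indexing the first axis with idx rearranges whole rows.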

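【Comments】:

Once you have the indices that sort the array, use np.take(), which is faster than fancy indexing...
The answer is correct, but in the OP's example, y is 2D. It might be better to show an example of using idx on a 2D array.
@SaulloCastro Using take is probably still faster, but fancy indexing performance was greatly improved in numpy 1.9, so this little trick may no longer be worth it. For any older version, yes, absolutely: a 2x to 10x speedup is well worth it.
@Jaime Thanks for the update, NumPy's improvements are amazing!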
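
For reference, a minimal sketch of the two spellings discussed in these comments, using a separate 1D key array to reorder the rows of a 2D array. The names keys, table, rows_fancy and rows_take are hypothetical, and the data is random filler. The sketch only shows that the two forms select the same rows; it makes no claim about which is faster on a given NumPy version.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
keys = rng.random(5)            # 1D array to sort by
table = rng.random((5, 3))      # 2D array whose rows follow the keys

# Permutation that sorts the keys in ascending order.
idx = np.argsort(keys)

rows_fancy = table[idx]                  # fancy (integer array) indexing
rows_take = np.take(table, idx, axis=0)  # same row selection via np.take

# Both spellings produce the same reordered rows.
same = np.array_equal(rows_fancy, rows_take)
```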