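I ended up with the following solution:
- First, based on the scipy documentation, the LIL (list of lists) format seems to be the ideal choice for this kind of operation. (However, I never ran any actual comparison!)
- I used the functions described here to swap rows and columns.
- Following the suggestion of elyase, I defined a 200×200 "window" in the top-left corner of the matrix and implemented a "window score", which is simply the number of non-zero elements inside the window.
- To determine which columns to swap, I check which column inside the window has the fewest non-zero elements in the window's rows, and which column outside the window has the most non-zero elements in those same rows. Ties are broken by the number of non-zero elements in the whole column (if that is also a tie, I pick randomly).
- The rows to swap are chosen the same way.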
import numpy as np
import scipy.sparse
import operator


def swap_rows(mat, a, b):
    ''' See link in description'''


def swap_cols(mat, a, b):
    ''' See link in description'''


def windowScore(lilmatrix, window):
    ''' Return no. of non-zero elements inside window. '''
    a = lilmatrix.nonzero()
    return sum(1 for i, j in zip(a[0], a[1]) if i < window and j < window)

def colsToSwap(lilmatrix, window):
    ''' Determine columns to be swapped.
    In: lil_matrix, window (to what col_no is it considered "left")
    Out: (minColumnLeft,maxColumnRight) columns inside/outside of window w/ least/most NZ elements'''
    # Locate non-zero elements
    a = lilmatrix.nonzero()
    totalCols = lilmatrix.shape[1]
    # Store no. of NZ elements for each column {in the window, in the whole table}, initialize with zeros
    colScoreWindow = np.zeros(totalCols)
    colScoreWhole = np.zeros(totalCols)
    ### Set colScoreWindow scores
    # Unique row indices that fall inside the window
    rows_uniq = {k for k in a[0] if k < window}
    for k in rows_uniq:
        # Location of each NZ element in current row
        gen = ((row, col) for row, col in zip(a[0], a[1]) if row == k)
        for row, col in gen:
            # Increment no. of NZ elements in current column in colScoreWindow
            colScoreWindow[col] += 1
    ### Set colScoreWhole scores
    # Unique row indices over the whole matrix
    rows_uniq = {k for k in a[0]}
    for k in rows_uniq:
        # Location of each NZ element in current row
        gen = ((row, col) for row, col in zip(a[0], a[1]) if row == k)
        for row, col in gen:
            # Increment no. of NZ elements in current column in colScoreWhole
            colScoreWhole[col] += 1
    # Sort key: window score, then whole-column score, then a random tie-breaker
    key = operator.itemgetter(1, 2, 3)
    # Column inside of window w/ least NZ elements
    scored = list(zip(np.arange(totalCols), colScoreWindow, colScoreWhole, np.random.rand(totalCols)))
    minColumnLeft = sorted(scored[:window], key=key)[0][0]
    # Column outside of window w/ most NZ elements
    scored = list(zip(np.arange(totalCols), colScoreWindow, colScoreWhole, np.random.rand(totalCols)))
    maxColumnRight = sorted(scored[window:], key=key)[-1][0]
    return (minColumnLeft, maxColumnRight)


def rowsToSwap(lilmatrix, window):
    ''' Same as colsToSwap, adjusted for rows.'''
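The two swap helpers are only stubs in the listing above. For completeness, here is a minimal sketch of how they could look for a lil_matrix. This is not the linked code: the names swap_rows_lil and swap_cols_lil are made up for illustration, and returning the matrix (rather than only modifying it in place) is an assumption. It relies on the documented rows and data attributes of lil_matrix, which hold one list of column indices and one list of values per row.

```python
import numpy as np
import scipy.sparse


def swap_rows_lil(mat, a, b):
    """Swap rows a and b of a lil_matrix in place and return it.

    Hypothetical helper, not the function from the linked answer.
    """
    # Each row is stored as a pair of Python lists (column indices, values),
    # so swapping two rows only means swapping two list references.
    mat.rows[a], mat.rows[b] = mat.rows[b], mat.rows[a]
    mat.data[a], mat.data[b] = mat.data[b], mat.data[a]
    return mat


def swap_cols_lil(mat, a, b):
    """Return a new lil_matrix with columns a and b swapped.

    Hypothetical helper: it transposes, swaps rows, and transposes back,
    which is simple but costs format conversions.
    """
    transposed = mat.T.tolil()
    swap_rows_lil(transposed, a, b)
    return transposed.T.tolil()
```

Note that the column swap returns a new matrix, so the caller would have to write mat = swap_cols_lil(mat, a, b). The transpose round trip is exactly the kind of conversion that is worth avoiding in a tight loop.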
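After running colsToSwap and rowsToSwap, together with the actual swap functions, for an appropriate number of iterations, the number of non-zero elements inside the window converges to a maximum. Note that the method is not optimized at all and leaves a lot of room for improvement. For example, I suspect that reducing the number of sparse matrix type conversions and/or the number of a=lilmatrix.nonzero() calls would speed it up significantly.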
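As an illustration of that last point, the per-column counts can be computed from a single nonzero() call with np.bincount instead of the nested Python loops. This is a sketch, not something I have benchmarked: the names col_scores and cols_to_swap_fast are hypothetical, the selection rule is meant to mirror colsToSwap (window score, then whole-column score, then a random tie-breaker), and it assumes 0 < window < number of columns.

```python
import numpy as np
import scipy.sparse


def col_scores(mat, window):
    """Count non-zeros per column: (in rows < window, in the whole matrix)."""
    rows, cols = mat.nonzero()  # the only nonzero() call
    n_cols = mat.shape[1]
    whole = np.bincount(cols, minlength=n_cols)
    in_window = np.bincount(cols[rows < window], minlength=n_cols)
    return in_window, whole


def cols_to_swap_fast(mat, window, rng=None):
    """Pick (column inside window with lowest score, column outside with highest).

    Hypothetical counterpart of colsToSwap; assumes 0 < window < n_cols.
    """
    rng = np.random.default_rng() if rng is None else rng
    in_window, whole = col_scores(mat, window)
    tie = rng.random(mat.shape[1])
    # np.lexsort treats the LAST key as the primary one, so the keys are
    # passed in reverse priority: random tie-breaker, whole score, window score.
    left = np.lexsort((tie[:window], whole[:window], in_window[:window]))
    right = np.lexsort((tie[window:], whole[window:], in_window[window:]))
    min_col_left = int(left[0])
    max_col_right = int(right[-1]) + window  # shift back to full-matrix index
    return min_col_left, max_col_right
```

The same function works for rows if it is called on the transposed matrix, which would replace rowsToSwap at the cost of one transpose per call.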