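Dense array solution
Converting the a_ form (a list of indices) to the a form (a 0/1 indicator array) is easy with a function like this: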
import numpy as np

def foo(a_, n):
    a = np.zeros(n, int)   # length-n array of zeros
    a[a_] = 1              # set the listed positions to 1
    return a
In [1565]: foo([1,2],5)
Out[1565]: array([0, 1, 1, 0, 0])
In [1566]: foo([0,1],5)
Out[1566]: array([1, 1, 0, 0, 0])
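With plain lists, pairwise.cosine_distances (pairwise imported from sklearn.metrics) gives the desired value, but with a warning.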
In [1572]: a1=[0,1,1,0,0];a2= [1,1,0,0,0]
In [1573]: pairwise.cosine_distances(a1,a2)
/usr/lib/python3/dist-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
Out[1573]: array([[ 0.5]])
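The warning's own suggestion, X.reshape(1, -1), can be applied to a plain list before it is passed on. A minimal sketch, assuming only NumPy; the helper name as_row is hypothetical and not part of scikit-learn:

```python
import numpy as np

def as_row(values):
    """Return `values` as a 2-D array holding a single sample (one row)."""
    return np.asarray(values).reshape(1, -1)

a1 = [0, 1, 1, 0, 0]
row = as_row(a1)
print(row.shape)  # (1, 5): one sample with five features
```

Passing as_row(a1) and as_row(a2) instead of the bare lists should avoid the 1d-array warning, since each input is then an explicit single-sample 2-D array.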
So I need to modify my foo so that it creates a (1,5) array instead:
def foo(a_, n):
    a = np.zeros((1, n), int)   # 2-D: one sample (row) with n features
    a[:, a_] = 1                # set the listed columns to 1
    return a
In [1575]: pairwise.cosine_distances(foo([1,2],5),foo([0,1],5))
Out[1575]: array([[ 0.5]])
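The 0.5 can be checked by hand, since cosine distance is 1 minus the dot product divided by the product of the norms. The two vectors share one position, and each has norm sqrt(2), so the result is 1 - 1/2. A rough sketch using only NumPy; cosine_distance_rows is a hypothetical helper written for this check, not the scikit-learn implementation:

```python
import numpy as np

def foo(a_, n):
    a = np.zeros((1, n), int)
    a[:, a_] = 1
    return a

def cosine_distance_rows(x, y):
    """Cosine distance between two (1, n) arrays: 1 - x.y / (|x| * |y|)."""
    x = x.ravel().astype(float)
    y = y.ravel().astype(float)
    return 1.0 - x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))

d = cosine_distance_rows(foo([1, 2], 5), foo([0, 1], 5))
print(d)  # approximately 0.5, up to floating-point rounding
```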
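Sparse matrix solution
cosine_distances accepts sparse matrix input.
The easiest way to make a sparse matrix is simply to pass it a dense array, or even the a1 list: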
In [1580]: from scipy import sparse
In [1592]: sparse.csr_matrix(a1)
Out[1592]:
<1x5 sparse matrix of type '<class 'numpy.int32'>'
with 2 stored elements in Compressed Sparse Row format>
In [1593]: sparse.csr_matrix(a1).A # view it as a dense array
Out[1593]: array([[0, 1, 1, 0, 0]], dtype=int32)
In [1594]: pairwise.cosine_distances( sparse.csr_matrix(a1), sparse.csr_matrix(a2))
Out[1594]: array([[ 0.5]])
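The .A attribute used above is shorthand for the .toarray() method. A small sketch of the same conversion in method form, assuming SciPy is installed; whether the .A shorthand is available depends on the SciPy version, so the method form may be the safer spelling:

```python
from scipy import sparse

a1 = [0, 1, 1, 0, 0]
m = sparse.csr_matrix(a1)   # a flat list is treated as a single row
dense = m.toarray()         # back to a dense 2-D NumPy array

print(m.shape, m.nnz)       # shape of the matrix and count of stored elements
print(dense)
```

Only the two nonzero entries are stored, which is what makes the sparse form worthwhile when n is large and the index lists are short.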
So as an intermediate step I could do:
In [1581]: sparse.csr_matrix(foo([1,2],5))
Out[1581]:
<1x5 sparse matrix of type '<class 'numpy.int32'>'
with 2 stored elements in Compressed Sparse Row format>
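The next step is to make the sparse matrix directly from the a_ form. That takes a bit more knowledge of sparse matrices.
Using the sparse coo style of input, (data, (row, col)):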
In [1601]: sparse.csr_matrix(([1,1],([0,0],[1,2])), shape=(1,5)).A
Out[1601]: array([[0, 1, 1, 0, 0]], dtype=int32)
def mkcsr(a_, n):
    col = np.array(a_)           # column index of each nonzero
    row = np.zeros_like(col)     # every entry goes in row 0
    data = np.ones_like(col)     # stored value is 1 at each position
    return sparse.csr_matrix((data, (row, col)), shape=(1, n))
In [1611]: mkcsr([1,2],5).A
Out[1611]: array([[0, 1, 1, 0, 0]], dtype=int32)
In [1614]: pairwise.cosine_distances(mkcsr([1,2],5), mkcsr([0,1],5))
Out[1614]: array([[ 0.5]])
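mkcsr builds one row at a time. The same coo-style input extends to several index lists at once, giving one matrix with a row per list. A hedged sketch, assuming NumPy and SciPy; mkcsr_many is a hypothetical name introduced here and is not part of the answer above:

```python
import numpy as np
from scipy import sparse

def mkcsr_many(index_lists, n):
    """Build a (len(index_lists), n) CSR indicator matrix, one row per index list."""
    # Column indices of all nonzeros, concatenated list after list.
    col = np.concatenate([np.asarray(idx, dtype=int) for idx in index_lists])
    # Row number i repeated once for each index in list i.
    row = np.repeat(np.arange(len(index_lists)), [len(idx) for idx in index_lists])
    data = np.ones_like(col)
    return sparse.csr_matrix((data, (row, col)), shape=(len(index_lists), n))

m = mkcsr_many([[1, 2], [0, 1]], 5)
print(m.toarray())
```

One difference from the dense foo is worth keeping in mind: if an index list repeats a position, the coo-style construction sums the duplicates, so that entry becomes 2 rather than 1.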