【发布时间】:2019-10-27 16:58:46
【问题描述】:
我正在尝试计算邻接矩阵的加权拓扑重叠,但我无法弄清楚如何使用numpy 正确计算。正确实现的R 函数来自WGCNA (https://www.rdocumentation.org/packages/WGCNA/versions/1.67/topics/TOMsimilarity)。 equation 4 中详细说明了计算此 (I THINK) 的公式,我相信它在下面正确复制。
有谁知道如何正确实现它以反映 WGCNA 版本?
是的,我知道rpy2,但如果可能的话,我会尝试轻量化。
对于初学者,我的对角线不是 1,并且这些值与原始值没有一致的错误(例如,并非全部被 x 关闭)。
当我在R 中计算这个时,我使用了以下内容:
> library(WGCNA, quiet=TRUE)
> df_adj = read.csv("https://pastebin.com/raw/sbAZQsE6", row.names=1, header=TRUE, check.names=FALSE, sep="\t")
> df_tom = TOMsimilarity(as.matrix(df_adj), TOMType="unsigned", TOMDenom="min")
# ..connectivity..
# ..matrix multiplication (system BLAS)..
# ..normalization..
# ..done.
# I've uploaded it to this url: https://pastebin.com/raw/HT2gBaZC
我不确定我的代码在哪里不正确。 R 版本的源代码是here,但它使用的是C 后端脚本?这对我来说很难解释。
这是我在Python 中的实现:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
def get_iris_data():
iris = load_iris()
# Iris dataset
X = pd.DataFrame(iris.data,
index = [*map(lambda x:f"iris_{x}", range(150))],
columns = [*map(lambda x: x.split(" (cm)")[0].replace(" ","_"), iris.feature_names)])
y = pd.Series(iris.target,
index = X.index,
name = "Species")
return X, y
# Get data
X, y = get_iris_data()
# Create an adjacency network
# df_adj = np.abs(X.T.corr()) # I've uploaded this part to this url: https://pastebin.com/raw/sbAZQsE6
df_adj = pd.read_csv("https://pastebin.com/raw/sbAZQsE6", sep="\t", index_col=0)
A_adj = df_adj.values
# Correct TOM from WGCNA for the A_adj
# See above for code
# https://www.rdocumentation.org/packages/WGCNA/versions/1.67/topics/TOMsimilarity
df_tom__wgcna = pd.read_csv("https://pastebin.com/raw/HT2gBaZC", sep="\t", index_col=0)
# My attempt
A = A_adj.copy()
dimensions = A.shape
assert dimensions[0] == dimensions[1]
d = dimensions[0]
# np.fill_diagonal(A, 0)
# Equation (4) from http://dibernardo.tigem.it/files/papers/2008/zhangbin-statappsgeneticsmolbio.pdf
A_tom = np.zeros_like(A)
for i in range(d):
a_iu = A[i]
k_i = a_iu.sum()
for j in range(i+1, d):
a_ju = A[:,j]
k_j = a_ju.sum()
l_ij = np.dot(a_iu, a_ju)
a_ij = A[i,j]
numerator = l_ij + a_ij
denominator = min(k_i, k_j) + 1 - a_ij
w_ij = numerator/denominator
A_tom[i,j] = w_ij
A_tom = (A_tom + A_tom.T)
有一个名为GTOM (https://github.com/benmaier/gtom) 的包,但它不适用于加权邻接。 The author of GTOM also took a look at this problem(这是一个更加复杂/高效的 NumPy 实现,但仍然没有产生预期的结果)。
有人知道如何重现 WGCNA 实现吗?
编辑:2019.06.20
我已经改编了来自@scleronomic 和@benmaier 的一些代码,并在文档字符串中添加了学分。该功能在soothsayer 从v2016.06 及以后可用。希望这将使人们能够更轻松地在 Python 中使用拓扑重叠,而不仅仅是能够使用 R。
https://github.com/jolespin/soothsayer/blob/master/soothsayer/networks/networks.py
import numpy as np
import soothsayer as sy
df_adj = sy.io.read_dataframe("https://pastebin.com/raw/sbAZQsE6")
df_tom = sy.networks.topological_overlap_measure(df_adj)
df_tom__wgcna = sy.io.read_dataframe("https://pastebin.com/raw/HT2gBaZC")
np.allclose(df_tom, df_tom__wgcna)
# True
【问题讨论】:
标签: python arrays r statistics adjacency-matrix