如果您将协方差矩阵C 分解为L L^T,并生成一个
独立随机向量x,那么Lx 将是一个具有协方差的随机向量
C.
import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg
np.random.seed(1)
num_samples = 1000
num_variables = 2
cov = [[0.3, 0.2], [0.2, 0.2]]
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((num_variables, num_samples))
mean = [1, 1]
correlated = np.dot(L, uncorrelated) + np.array(mean).reshape(2, 1)
# print(correlated.shape)
# (2, 1000)
plt.scatter(correlated[0, :], correlated[1, :], c='green')
plt.show()
参考:见Cholesky decomposition
如果您想生成两个系列,X 和 Y,带有特定的 (Pearson) correlation coefficient(例如 0.2):
rho = cov(X,Y) / sqrt(var(X)*var(Y))
你可以选择协方差矩阵
cov = [[1, 0.2],
[0.2, 1]]
这使得cov(X,Y) = 0.2 和方差var(X) 和var(Y) 都等于1。所以rho 等于0.2。
例如,下面我们生成一对相关序列,X 和 Y,1000 次。然后我们绘制相关系数的直方图:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
linalg = np.linalg
np.random.seed(1)
num_samples = 1000
num_variables = 2
cov = [[1.0, 0.2], [0.2, 1.0]]
L = linalg.cholesky(cov)
rhos = []
for i in range(1000):
uncorrelated = np.random.standard_normal((num_variables, num_samples))
correlated = np.dot(L, uncorrelated)
X, Y = correlated
rho, pval = stats.pearsonr(X, Y)
rhos.append(rho)
plt.hist(rhos)
plt.show()
如您所见,相关系数通常接近 0.2,但对于任何给定的样本,相关系数很可能不会恰好为 0.2。