Let's put @Daniel's suggestion into code.
Step 1
Let's import multivariate_normal:
import numpy as np
import pandas as pd  # needed later for the DataFrame
from scipy.stats import multivariate_normal as mvn
Step 2
Let's construct the covariance matrix and generate the data:
cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
cov
array([[ 1. , 0.8, 0.7, 0.6],
[ 0.8, 1. , 0.5, 0.5],
[ 0.7, 0.5, 1. , 0.5],
[ 0.6, 0.5, 0.5, 1. ]])
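This is the key step. Note that the covariance matrix has 1s on the diagonal (unit variances, so each covariance is also a correlation), and that the covariances in the first row decrease as you move from left to right: the first variable is correlated 0.8, 0.7 and 0.6 with the second, third and fourth.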
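Not every symmetric matrix with 1s on the diagonal is a valid covariance matrix: it must also be positive semi-definite, or sampling will fail. A minimal check, assuming the same `cov` as above, looks at the eigenvalues (all must be non-negative) and confirms that with unit variances the implied correlation matrix is `cov` itself:

```python
import numpy as np

# Same covariance matrix as in Step 2
cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])

# A valid covariance matrix is symmetric ...
is_symmetric = np.allclose(cov, cov.T)

# ... and positive semi-definite: every eigenvalue is >= 0
eigenvalues = np.linalg.eigvalsh(cov)
is_psd = bool(np.all(eigenvalues >= 0))

# Correlation = covariance / (std_i * std_j); with unit variances this is cov itself
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

print(is_symmetric, is_psd)
print(np.allclose(corr, cov))
```

If you change the off-diagonal values and `is_psd` turns `False`, that combination of correlations is impossible and needs adjusting.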
Now we're ready to generate the data. Let's say 1,000 points:
scores = mvn.rvs(mean = [60.,60.,60.,60.], cov=cov, size = 1000)
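The call above returns a different sample on every run. `multivariate_normal.rvs` also accepts a `random_state` argument, so passing a fixed seed (42 here is an arbitrary choice) makes the sample reproducible:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
mean = [60.0] * 4

# Two draws with the same integer seed produce identical samples
scores_a = mvn.rvs(mean=mean, cov=cov, size=1000, random_state=42)
scores_b = mvn.rvs(mean=mean, cov=cov, size=1000, random_state=42)

print(scores_a.shape)  # one row per observation, one column per variable
print(np.array_equal(scores_a, scores_b))
```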
Sanity check (going from the covariance matrix back to plain correlations):
np.corrcoef(scores.T)
array([[ 1. , 0.78886583, 0.70198586, 0.56810058],
[ 0.78886583, 1. , 0.49187904, 0.45994833],
[ 0.70198586, 0.49187904, 1. , 0.4755558 ],
[ 0.56810058, 0.45994833, 0.4755558 , 1. ]])
Note that np.corrcoef expects each variable in its own row, which is why scores (one row per observation) is transposed.
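Instead of transposing, you can tell `np.corrcoef` that the variables are in columns with `rowvar=False`; both calls below return the same matrix. So that the snippet runs on its own, `scores` here is a stand-in drawn with NumPy's `Generator.multivariate_normal` (an assumption, not the `mvn.rvs` call above):

```python
import numpy as np

cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
rng = np.random.default_rng(0)

# Stand-in for `scores`: 1000 rows (observations) x 4 columns (variables)
scores = rng.multivariate_normal(mean=[60.0] * 4, cov=cov, size=1000)

corr_transposed = np.corrcoef(scores.T)           # variables as rows
corr_columns = np.corrcoef(scores, rowvar=False)  # variables as columns

print(np.allclose(corr_transposed, corr_columns))
print(np.round(corr_columns, 2))
```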
Finally, let's put the data into a pandas DataFrame:
df = pd.DataFrame(data = scores, columns = ["Math", "Science","History", "Art"])
df.head()
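Once the data is in a DataFrame, pandas can repeat the sanity check: `DataFrame.corr()` computes pairwise Pearson correlations between columns and labels the result with the column names. The sample below is again a stand-in generated with a fixed seed:

```python
import numpy as np
import pandas as pd

cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
rng = np.random.default_rng(1)
scores = rng.multivariate_normal(mean=[60.0] * 4, cov=cov, size=1000)
df = pd.DataFrame(data=scores, columns=["Math", "Science", "History", "Art"])

# Pairwise Pearson correlations, labelled by column name
corr = df.corr()
print(corr.round(2))

# Same numbers as np.corrcoef on the underlying array
print(np.allclose(corr.to_numpy(), np.corrcoef(scores, rowvar=False)))
```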
Math Science History Art
0 60.629673 61.238697 61.805788 61.848049
1 59.728172 60.095608 61.139197 61.610891
2 61.205913 60.812307 60.822623 59.497453
3 60.581532 62.163044 59.277956 60.992206
4 61.408262 59.894078 61.154003 61.730079
Step 3
Let's visualize some of the data we just generated:
ax = df.plot(x = "Math",y="Art", kind="scatter", color = "r", alpha = .5, label = "Art, $corr_{Math}$ = .6")
df.plot(x = "Math",y="Science", kind="scatter", ax = ax, color = "b", alpha = .2, label = "Science, $corr_{Math}$ = .8")
ax.set_ylabel("Art and Science");
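Because every variable has variance 1, the least-squares slope of one score on another, cov(x, y) / var(x), estimates their correlation. That is why the Science cloud should look tighter and steeper than the Art cloud. A quick numeric check with `np.polyfit`, using a stand-in sample:

```python
import numpy as np

cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
rng = np.random.default_rng(2)
scores = rng.multivariate_normal(mean=[60.0] * 4, cov=cov, size=1000)
math, science, art = scores[:, 0], scores[:, 1], scores[:, 3]

# A degree-1 fit returns [slope, intercept]; with unit variances, slope ~ correlation
slope_science = np.polyfit(math, science, 1)[0]
slope_art = np.polyfit(math, art, 1)[0]
print(round(slope_science, 2), round(slope_art, 2))
```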