numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)[source]

Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, numpy协方差矩阵numpy.cov, then the covariance matrix element numpy协方差矩阵numpy.cov is the covariance of numpy协方差矩阵numpy.cov and numpy协方差矩阵numpy.cov. The element numpy协方差矩阵numpy.cov is the variance of numpy协方差矩阵numpy.cov.

See the notes for an outline of the algorithm.

Parameters:

m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row (行) of m represents a variable(变量), and each column(列) a single observation of all those variables(样本). Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : bool, optional

Default normalization (False) is by (N 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof : int, optional

If not None the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

New in version 1.5.

fweights : array_like, int, optional

1-D array of integer freguency weights; the number of times each observation vector should be repeated.

New in version 1.10.

aweights : array_like, optional

1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

New in version 1.10.

Returns:

out : ndarray

The covariance matrix of the variables.

See also

corrcoef
Normalized covariance matrix

Notes

Assume that the observations are in the columns of the observation array m and let fweights and aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=1, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

Note that when == 1, the normalization factor v1 (v1**2 ddof v2) goes over to (np.sum(f) ddof) as it should.

Examples

Consider two variables, numpy协方差矩阵numpy.cov and numpy协方差矩阵numpy.cov, which correlate perfectly, but in opposite directions:

>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
>>> x
array([[0, 1, 2],
       [2, 1, 0]])

Note how numpy协方差矩阵numpy.cov increases while numpy协方差矩阵numpy.cov decreases. The covariance matrix shows this clearly:

>>> np.cov(x)
array([[ 1., -1.],
       [-1.,  1.]])

Note that element numpy协方差矩阵numpy.cov, which shows the correlation between numpy协方差矩阵numpy.cov and numpy协方差矩阵numpy.cov, is negative.

Further, note how x and y are combined:

>>> x = [-2.1, -1,  4.3]
>>> y = [3,  1.1,  0.12]
>>> X = np.stack((x, y), axis=0)
>>> print(np.cov(X))
[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]
>>> print(np.cov(x, y))
[[ 11.71        -4.286     ]
 [ -4.286        2.14413333]]
>>> print(np.cov(x))
11.71


理解协方差矩阵的关键就在于牢记它的计算是不同维度之间的协方差,而不是不同样本之间。拿到一个样本矩阵,最先要明确的就是一行是一个样本还是一个维度,心中明确整个计算过程就会顺流而下,这么一来就不会迷茫了。

相关文章: