Correlation

Are there correlations between variables?

Correlation measures the strength of the linear association between two numerical variables. For example, you could imagine that for children, age correlates with height: the older the child, the taller he or she is. You could reasonably expect to get a straight line or upward curve with a positive slope when you plot age against height.
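As a rough illustration of the age and height example, the sketch below computes a Pearson correlation coefficient with NumPy. The numbers are invented for illustration and are not taken from any dataset.

```python
import numpy as np

# Hypothetical ages (years) and heights (cm), invented for illustration.
age = np.array([2, 4, 6, 8, 10, 12], dtype=float)
height = np.array([86, 102, 115, 128, 138, 149], dtype=float)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two inputs.
r = np.corrcoef(age, height)[0, 1]
print(r)
```

A value close to 1 indicates a strong positive linear association.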

Definition

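A living organism is an organic whole whose parts are all interrelated: by studying an organism's teeth, claws, or bones, we can reconstruct the organism itself.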

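Covariance:

Definition:

Cov(X, Y) = E[(X - E(X))(Y - E(Y))]

For discrete random variables:

Cov(X, Y) = Σ_i Σ_j (x_i - E(X))(y_j - E(Y)) p_ij, where p_ij = P(X = x_i, Y = y_j)

For continuous random variables:

Cov(X, Y) = ∫∫ (x - E(X))(y - E(Y)) f(x, y) dx dy, where f(x, y) is the joint density of X and Y

Simplified form of the covariance:

Cov(X, Y) = E(XY) - E(X)E(Y)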
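The identity Cov(X, Y) = E(XY) - E(X)E(Y) can be checked numerically. The following is a minimal sketch on synthetic data (the random series and the seed are assumptions made for illustration), using the population form that divides by N.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # y depends partly on x

# Definition: mean of the product of deviations (population form, divide by N).
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# Simplified form: E(XY) - E(X)E(Y).
cov_short = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_short)
```

The two values agree up to floating-point rounding.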

When X and Y are independent, Cov(X, Y) = 0.
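The converse does not hold: zero covariance does not imply independence. A small sketch, using a made-up three-point distribution in which Y is completely determined by X:

```python
import numpy as np

# X takes the values -1, 0, 1 with equal probability; Y = X**2 depends on X.
x = np.array([-1.0, 0.0, 1.0])
y = x ** 2

# Cov(X, Y) = E(XY) - E(X)E(Y), with equal weight on the three outcomes.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # 0.0
```

Covariance only captures the linear part of a relationship, which is why it vanishes here.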

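Basic properties of covariance:

Cov(X, Y) = Cov(Y, X)
Cov(aX, bY) = ab Cov(X, Y), for constants a and b
Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y)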

Relationship between the variance of a sum (or difference) of random variables and their covariance:

D(X +/- Y) = D(X) + D(Y) +/- 2Cov(X, Y)
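A quick numerical check of this relationship, as a sketch on synthetic data (the series and seed are assumptions for illustration). Variance and covariance both use the population form (ddof=0) so that the two sides are comparable.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + rng.normal(size=500)  # correlated with x on purpose

var_x, var_y = x.var(), y.var()        # population variances (ddof=0)
cov_xy = np.cov(x, y, ddof=0)[0, 1]    # population covariance

lhs_plus, rhs_plus = (x + y).var(), var_x + var_y + 2 * cov_xy
lhs_minus, rhs_minus = (x - y).var(), var_x + var_y - 2 * cov_xy
print(lhs_plus, rhs_plus)
print(lhs_minus, rhs_minus)
```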

Boundedness of covariance:

[Cov(X, Y)]^2 <= D(X) D(Y)

Correlation coefficient:

Definition:

ρ(X, Y) = Cov(X, Y) / (sqrt(D(X)) sqrt(D(Y)))
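Because covariance is bounded by the variances, the correlation coefficient always lies between -1 and 1. The sketch below computes it from the definition for three simple relationships; the helper name rho is hypothetical and the inputs are chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def rho(a, b):
    """Correlation coefficient from the definition (population form)."""
    cov_ab = np.mean((a - a.mean()) * (b - b.mean()))
    return cov_ab / (a.std() * b.std())

print(rho(x, 3 * x + 2))   # increasing linear relation, close to 1
print(rho(x, -x))          # decreasing linear relation, close to -1
print(rho(x, x ** 2))      # nonlinear relation, strictly between -1 and 1
```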

Correlation and covariance with NumPy in Python 3

Import the required modules

import numpy as np
from matplotlib.pyplot import plot
from matplotlib.pyplot import show
import matplotlib.pyplot as plt

Load the data

bhp = np.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)
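To see what usecols=(6,) selects, here is a small sketch that feeds the first three rows of BHP.csv to np.loadtxt through an in-memory buffer instead of a file. It assumes the file is plain comma-separated text with an empty third field and no header row; the name sample is hypothetical.

```python
import io
import numpy as np

# First three rows of BHP.csv, written out as comma-separated text.
sample = io.StringIO(
    "BHP,11-02-2011,,93.11,94.26,92.9,93.72,1741900\n"
    "BHP,14-02-2011,,94.57,96.23,94.39,95.64,2620800\n"
    "BHP,15-02-2011,,94.45,95.47,93.91,94.56,2461300\n"
)

# usecols=(6,) keeps only the seventh field (zero-based index 6);
# the text and empty fields in the other columns are never converted.
closes = np.loadtxt(sample, delimiter=',', usecols=(6,), unpack=True)
print(closes)
```

The seventh field is the second-to-last number in each row, which is presumably the closing price.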

The BHP.csv file contains the following data:

BHP,11-02-2011,,93.11,94.26,92.9,93.72,1741900
BHP,14-02-2011,,94.57,96.23,94.39,95.64,2620800
BHP,15-02-2011,,94.45,95.47,93.91,94.56,2461300
BHP,16-02-2011,,92.67,93.58,92.56,93.3,3270900
BHP,17-02-2011,,92.65,93.98,92.58,93.93,2650200
BHP,18-02-2011,,92.34,93,92,92.39,4667300
BHP,22-02-2011,,93.14,93.98,91.75,92.11,5359800
BHP,23-02-2011,,91.93,92.46,91.05,92.36,7768400
BHP,24-02-2011,,92.42,92.71,90.93,91.76,4799100
BHP,25-02-2011,,93.48,94.04,92.44,93.91,3448300
BHP,28-02-2011,,94.81,95.11,94.1,94.6,4719800
BHP,01-03-2011,,95.05,95.2,93.13,93.27,3898900
BHP,02-03-2011,,93.89,94.89,93.54,94.43,3727700
BHP,03-03-2011,,95.9,96.11,95.18,96.02,3379400
BHP,04-03-2011,,96.12,96.44,95.08,95.76,2463900
BHP,07-03-2011,,96.51,96.66,94.03,94.47,3590900
BHP,08-03-2011,,93.72,94.47,92.9,94.34,3805000
BHP,09-03-2011,,92.94,93.13,91.86,92.22,3271700
BHP,10-03-2011,,89,89.17,87.93,88.31,5507800
BHP,11-03-2011,,88.24,89.8,88.16,89.59,2996800
BHP,14-03-2011,,88.17,89.06,87.82,89.02,3434800
BHP,15-03-2011,,84.58,87.32,84.35,86.95,5008300
BHP,16-03-2011,,86.31,87.28,83.85,84.88,7809799
BHP,17-03-2011,,87.32,88.29,86.89,87.38,3947100
BHP,18-03-2011,,89.53,89.58,88.05,88.56,3809700
BHP,21-03-2011,,90.13,90.16,88.88,89.59,3098200
BHP,22-03-2011,,89.5,89.59,88.42,88.71,3500200
BHP,23-03-2011,,89.57,90.32,88.85,90.02,4285600
BHP,24-03-2011,,90.86,91.35,89.7,91.26,3918800
BHP,25-03-2011,,90.42,91.09,90.07,90.67,3632200

vale = np.loadtxt('VALE.csv', delimiter=',', usecols=(6,), unpack=True)

The VALE.csv file contains the following data:

VALE,11-02-2011,,33.88,34.54,33.63,34.37,18433500
VALE,14-02-2011,,34.53,35.29,34.52,35.13,20780700
VALE,15-02-2011,,34.89,35.31,34.82,35.14,17756700
VALE,16-02-2011,,35.16,35.4,34.81,35.31,16792800
VALE,17-02-2011,,35.18,35.6,35.04,35.57,24088300
VALE,18-02-2011,,35.31,35.37,34.89,35.03,21286600
VALE,22-02-2011,,33.94,34.57,33.36,33.44,28364700
VALE,23-02-2011,,33.43,34.12,33.1,33.94,22559300
VALE,24-02-2011,,34.3,34.3,33.56,34.21,20591900
VALE,25-02-2011,,34.67,34.95,34.05,34.27,20151500
VALE,28-02-2011,,34.34,34.51,33.7,34.23,16126000
VALE,01-03-2011,,34.39,34.44,33.68,33.76,17282400
VALE,02-03-2011,,33.61,34.5,33.57,34.32,15870900
VALE,03-03-2011,,34.77,34.89,34.53,34.87,14648200
VALE,04-03-2011,,34.67,34.83,34.04,34.5,15330800
VALE,07-03-2011,,34.43,34.53,32.97,33.23,25040500
VALE,08-03-2011,,33.22,33.7,32.55,33.29,17093000
VALE,09-03-2011,,33.23,33.44,32.68,32.88,20026300
VALE,10-03-2011,,32.17,32.4,31.68,31.91,30803900
VALE,11-03-2011,,31.53,32.42,31.49,32.17,24429900
VALE,14-03-2011,,32.03,32.45,31.74,32.44,15525500
VALE,15-03-2011,,30.99,31.93,30.79,31.91,24767700
VALE,16-03-2011,,31.99,32.03,30.68,31.04,30394153
VALE,17-03-2011,,31.44,31.82,31.32,31.51,24035000
VALE,18-03-2011,,32.17,32.39,31.98,32.14,19740600
VALE,21-03-2011,,32.81,32.85,32.26,32.42,18923700
VALE,22-03-2011,,32.13,32.32,31.74,32.25,18934200
VALE,23-03-2011,,32.39,32.91,32.22,32.7,18359900
VALE,24-03-2011,,32.82,32.94,32.12,32.36,25894100
VALE,25-03-2011,,32.26,32.74,31.93,32.34,16688900

Data processing (convert the price series to simple returns):

bhp_returns = np.diff(bhp) / bhp[:-1]
vale_returns = np.diff(vale) / vale[:-1]
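The expression np.diff(p) / p[:-1] divides each day-to-day price change by the previous day's price, giving simple returns. A short sketch on the first three values read from BHP.csv (the variable names are hypothetical):

```python
import numpy as np

prices = np.array([93.72, 95.64, 94.56])  # first three values from column 6

changes = np.diff(prices)        # prices[1:] - prices[:-1], one element shorter
returns = changes / prices[:-1]  # each change relative to the previous price

print(changes)
print(returns)
```

A series of N prices therefore yields N - 1 returns, so the 30 rows in each file produce 29 returns.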

Compute the covariance matrix of bhp_returns and vale_returns:

covariance = np.cov(bhp_returns, vale_returns)
print(covariance)

Result:

[[0.00028179 0.00019766]
 [0.00019766 0.00030123]]
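By default np.cov returns the sample covariance, which divides by N - 1 rather than N. The sketch below reproduces the entries of the matrix by hand on two short return series that are invented for illustration.

```python
import numpy as np

# Small hypothetical return series, invented for illustration.
a = np.array([0.02, -0.01, 0.015, -0.005, 0.01])
b = np.array([0.018, -0.012, 0.01, 0.002, 0.007])

n = len(a)
# Sample covariance: sum of products of deviations divided by N - 1.
cov_ab = np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)

c = np.cov(a, b)
print(cov_ab, c[0, 1])                            # off-diagonal: covariance
print(a.var(ddof=1), c[0, 0])                     # diagonal: sample variances
print(c.trace(), a.var(ddof=1) + b.var(ddof=1))   # trace: sum of the variances
```

The off-diagonal entries are the covariance of the two series, and the diagonal entries are their individual variances.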

Take the diagonal elements of the covariance matrix (the variances of the two return series):

print(covariance.diagonal())

Result:

[0.00028179 0.00030123]

Print the trace of the covariance matrix (the sum of the diagonal elements):

print(covariance.trace())

Result:

0.000583023549920278

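Compute the correlation coefficient of bhp_returns and vale_returns. np.cov normalizes by N - 1 by default, so the standard deviations must be computed with ddof=1 to match. With the default ddof=0 the quotient comes out about 3.6% too large (0.70264666 here) and disagrees with np.corrcoef.

print(covariance[0, 1] / (bhp_returns.std(ddof=1) * vale_returns.std(ddof=1)))

Result (to eight decimal places):

0.67841747

np.corrcoef returns the full correlation matrix directly:

print(np.corrcoef(bhp_returns, vale_returns))

Result:

[[1.         0.67841747]
 [0.67841747 1.        ]]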
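One way to turn a whole covariance matrix into a correlation matrix is to divide each entry (i, j) by the product of the standard deviations of series i and j. The following sketch does that with np.outer on two short return series invented for illustration, and compares the result with np.corrcoef.

```python
import numpy as np

# Small hypothetical return series, invented for illustration.
a = np.array([0.02, -0.01, 0.015, -0.005, 0.01])
b = np.array([0.018, -0.012, 0.01, 0.002, 0.007])

c = np.cov(a, b)              # sample covariance matrix (divides by N - 1)
sd = np.sqrt(np.diag(c))      # matching sample standard deviations
corr = c / np.outer(sd, sd)   # entry (i, j) divided by sd[i] * sd[j]

print(corr)
print(np.corrcoef(a, b))
```

Taking the standard deviations from the diagonal of the same matrix keeps the normalization consistent, so the diagonal of the result is exactly 1.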

Plot bhp_returns and vale_returns:

t = np.arange(len(bhp_returns))
plot(t, bhp_returns, lw=1)
plot(t, vale_returns, lw=2)
show()

Result:

[Figure: bhp_returns and vale_returns plotted against the trading-day index]

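Notes on the functions used

np.diff(a, n=1, axis=-1)

Computes the n-th discrete difference along the given axis.
Parameters:
a: input array
n: optional, the number of times the values are differenced
axis: the axis along which the difference is taken; defaults to the last axis
Example: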
import numpy as np
A = np.arange(2, 14).reshape((3, 4))
A[1, 1] = 8
print('A:', A)
# A: [[ 2  3  4  5]
#  [ 6  8  8  9]
#  [10 11 12 13]]
print(np.diff(A))
# [[1 1 1]
#  [2 0 1]
#  [1 1 1]]
As the output shows, diff simply subtracts each element from the one that follows it.
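The n and axis parameters can be illustrated on the same array. This is a small sketch; the names down_rows and second are hypothetical.

```python
import numpy as np

A = np.arange(2, 14).reshape((3, 4))
A[1, 1] = 8

# axis=0 subtracts each row from the row below it.
down_rows = np.diff(A, axis=0)
print(down_rows)   # rows: [4, 5, 4, 4] and [4, 3, 4, 4]

# n=2 applies the difference twice along the last axis.
second = np.diff(A, n=2)
print(second)      # rows: [0, 0], [-2, 1], [0, 0]
```

Each application of the difference shortens the chosen axis by one, so n=2 on a row of four values leaves two.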
