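【Posted】: 2020-07-02 05:37:24
【Question】:
Below is some simple code that computes a correlation matrix and sorts it. How can I loop over the sorted result and also get the names of each column pair?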
import pandas as pd
import numpy as np
d = {
'x1': [1, 4, 4, 5, 6],
'x2': [0, 0, 8, 2, 4],
'x3': [2, 8, 8, 10, 12],
'x4': [-1, -4, -4, -4, -5]
}
df = pd.DataFrame(data=d)
print(df)
print('---')
print(df.corr())
print('---')
corr_matrix = df.corr().abs()
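# keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once;
# the np.bool alias was removed in NumPy 1.24, so the builtin bool is used here
sol = (corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)).stack().sort_values(ascending=False))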
print(sol)
print('---')
for s in sol:
    print(s)
    # how to print column 1 and 2 pair names with this "s" corr?
Output:
   x1  x2  x3  x4
0   1   0   2  -1
1   4   0   8  -4
2   4   8   8  -4
3   5   2  10  -4
4   6   4  12  -5
---
          x1        x2        x3        x4
x1  1.000000  0.399298  1.000000 -0.969248
x2  0.399298  1.000000  0.399298 -0.472866
x3  1.000000  0.399298  1.000000 -0.969248
x4 -0.969248 -0.472866 -0.969248  1.000000
---
x1  x3    1.000000
x3  x4    0.969248
x1  x4    0.969248
x2  x4    0.472866
    x3    0.399298
x1  x2    0.399298
dtype: float64
---
1.0
0.9692476431690819
0.9692476431690819
0.4728662437434603
0.39929785312496247
0.39929785312496247
What I would like is something like this:
for (column1, column2, s) in sol:
    print(column1 + ',' + column2 + ',' + str(s))
Output:
x1, x3, 1.000000
x3, x4, 0.969248
x1, x4, 0.969248
x2, x4, 0.472866
x1, x2, 0.399298
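One possible way to get the loop described above, offered as a sketch rather than the definitive answer: the stacked Series is indexed by a two-level MultiIndex, so `Series.items()` yields `((column1, column2), value)` pairs that can be unpacked directly. The names `upper` and `pairs` are introduced here for illustration only, and the extra `.dropna()` is an assumption added so that the lower-triangle NaN values are discarded even on pandas versions where `stack()` does not drop them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()

# Boolean mask for the strict upper triangle (illustrative name).
upper = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)

# Stack into a Series keyed by (row label, column label) and sort it.
# dropna() is a no-op where stack() already drops NaN values.
sol = corr_matrix.where(upper).stack().dropna().sort_values(ascending=False)

pairs = []  # illustrative: collected (column1, column2, correlation) tuples
for (column1, column2), s in sol.items():
    pairs.append((column1, column2, s))
    print(f"{column1}, {column2}, {s:.6f}")
```

Iterating over a Series directly yields only its values, which is why the original loop printed bare numbers; `.items()` also exposes the index labels. Pairs with equal correlation may come out in either order, since the default sort does not guarantee a stable order for ties.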
【Discussion】:
Tags: python pandas numpy correlation