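【Question title】: How to loop over a sorted correlation list?
【Posted】: 2020-07-02 05:37:24
【Question】:

Below is simple code that computes a correlation matrix and sorts it. How can I loop over the result and get the names of each column pair along with its correlation?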

import pandas as pd
import numpy as np

d = {
    'x1': [1, 4, 4, 5, 6], 
    'x2': [0, 0, 8, 2, 4], 
    'x3': [2, 8, 8, 10, 12], 
    'x4': [-1, -4, -4, -4, -5]
}
df = pd.DataFrame(data=d)
print(df)
print('---')
print(df.corr())
print('---')

corr_matrix = df.corr().abs()
sol = (corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)).stack().sort_values(ascending=False))
print(sol)
print('---')

for s in sol:
    print(s)
    # how to print column 1 and 2 pair names with this "s" corr?

Result:

   x1  x2  x3  x4
0   1   0   2  -1
1   4   0   8  -4
2   4   8   8  -4
3   5   2  10  -4
4   6   4  12  -5
---
          x1        x2        x3        x4
x1  1.000000  0.399298  1.000000 -0.969248
x2  0.399298  1.000000  0.399298 -0.472866
x3  1.000000  0.399298  1.000000 -0.969248
x4 -0.969248 -0.472866 -0.969248  1.000000
---
x1  x3    1.000000
x3  x4    0.969248
x1  x4    0.969248
x2  x4    0.472866
    x3    0.399298
x1  x2    0.399298
dtype: float64
---
1.0
0.9692476431690819
0.9692476431690819
0.4728662437434603
0.39929785312496247
0.39929785312496247

What I expect is something like this:

for (column1, column2, s) in sol:
    print(column1 + ',' + column2 + ',' + str(s))

Result:

x1, x3, 1.000000
x3, x4, 0.969248
x1, x4, 0.969248
x2, x4, 0.472866
x1, x2, 0.399298

【Comments】:

    Tags: python pandas numpy correlation


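    【Solution 1】:

    You can use DataFrame.itertuples to iterate over the dataframe rows as tuples (with name=None they are plain tuples rather than named tuples):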

    pairs = sol.reset_index().itertuples(index=False, name=None)
    print('\n'.join(str(p).strip('()') for p in pairs))
    

    Alternatively, use Series.items (the older Series.iteritems alias was removed in pandas 2.0):

    for item in sol.items():
        print(str(item).replace('(', '').replace(')', ''))
    

    Result:

    'x1', 'x3', 1.0
    'x3', 'x4', 0.9692476431690819
    'x1', 'x4', 0.9692476431690819
    'x2', 'x4', 0.4728662437434603
    'x2', 'x3', 0.39929785312496247
    'x1', 'x2', 0.39929785312496247
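
Because each row produced by itertuples(index=False, name=None) is a plain 3-tuple, it can also be unpacked directly in the loop header, which is close to the form the question asks for. The following is a minimal sketch under that assumption. The intermediate names `mask` and `rows` are illustrative, and the `.dropna()` call is a defensive addition in case the installed pandas version keeps NaN rows when stacking.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()
# Boolean mask for the strict upper triangle, so each pair appears once.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Each row is a plain tuple: (first column name, second column name, correlation).
rows = list(sol.reset_index().itertuples(index=False, name=None))

for column1, column2, s in rows:
    print(column1 + ',' + column2 + ',' + str(s))
```

Pairs with equal correlation may come out in either order, since the sort does not promise a particular order for ties.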
    

    【Comments】:

      【Solution 2】:

      Is this what you are looking for:

      print(sol.reset_index())
      
        level_0 level_1         0
      0      x1      x3  1.000000
      1      x3      x4  0.969248
      2      x1      x4  0.969248
      3      x2      x4  0.472866
      4      x2      x3  0.399298
      5      x1      x2  0.399298
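
The default column names level_0, level_1 and 0 are not very descriptive. One possible refinement, sketched here with the illustrative names `column1`, `column2` and `corr` (they are assumptions, not part of the original answer), is to name the index levels before resetting the index and then read the fields by attribute:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()
# Keep only the strict upper triangle so each pair appears once.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Name the two index levels and the value column (names are illustrative).
pairs = sol.rename_axis(['column1', 'column2']).reset_index(name='corr')
print(pairs)

# With named columns, itertuples yields named tuples with matching fields.
for row in pairs.itertuples(index=False):
    print(f"{row.column1}, {row.column2}, {row.corr:.6f}")
```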
      

      【Comments】:

        【Solution 3】:

        You were close: loop over Series.items and unpack each MultiIndex key as (column1, column2):

        for ((column1, column2), s) in sol.items():
            print(column1 + ',' + column2 + ',' + str(s))
            
        x1,x3,1.0
        x3,x4,0.9692476431690819
        x1,x4,0.9692476431690819
        x2,x4,0.4728662437434603
        x2,x3,0.39929785312496247
        x1,x2,0.39929785312496247
        

        A similar solution with f-strings:

        for ((column1, column2), s) in sol.items():
            print(f"{column1},{column2},{s}")
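
To reproduce the six-decimal formatting shown in the question's expected output, a format specifier can be added to the f-string. This is a minimal self-contained sketch. The `mask` and `lines` names are illustrative, and the `.dropna()` call is a defensive assumption for pandas versions whose stack() keeps NaN entries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x1': [1, 4, 4, 5, 6],
    'x2': [0, 0, 8, 2, 4],
    'x3': [2, 8, 8, 10, 12],
    'x4': [-1, -4, -4, -4, -5],
})

corr_matrix = df.corr().abs()
# Strict upper triangle: drops the diagonal and the mirrored duplicates.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Each item is ((column1, column2), value); ':.6f' fixes six decimal places.
lines = [f"{column1}, {column2}, {s:.6f}" for (column1, column2), s in sol.items()]
print('\n'.join(lines))
```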
            
        

        【Comments】:
