【Question title】: Binning data and plotting
【Posted】: 2019-06-21 03:17:21
【Question】:

I have a DataFrame that is essentially random numbers (apart from one column), some of which are NaN. MWE:

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd

randomNumberGenerator = np.random.RandomState(1000)
z = 5 * randomNumberGenerator.rand(101)
A = 4 * z - 3+ randomNumberGenerator.randn(101)
B = 4 * z - 2+ randomNumberGenerator.randn(101)
C = 4 * z - 1+ randomNumberGenerator.randn(101)
D = 4 * z - 4+ randomNumberGenerator.randn(101)

A[50] = np.nan
A[:3] = np.nan
B[12:20] = np.nan

sources= pd.DataFrame({'z': z})
sources['A'] = A
sources['B'] = B
sources['C'] = C
sources['D'] = D
#sources= sources.dropna()
x = sources.z
y1 = sources.A
y2 = sources.B
y3 = sources.C
y4 = sources.D

for i in [y1, y2, y3, y4]:
    count = np.count_nonzero(~np.logical_or(np.isnan(x), np.isnan(i)))
    label = 'Points plotted: %d'%count
    plt.scatter(x, i, label = label)

plt.legend()

I need to bin the data by x and plot different columns in each bin, in 3 side-by-side subplots:

x_1 <= 1 plot A-B  |  1 < x_2 < 3 plot B+C  |  3 < x_3 plot C-D

I have tried binning the data with

x1 = sources[sources['z']<1]      # z < 1
x2 = sources[sources['z']<3]
x2 = x2[x2['z']>=1]               # 1<= z < 3
x3 = sources[sources['z']<max(z)] 
x3 = x3[x3['z']>=3]               # 3 <= z <= max(z)
x1 = x1['z']
x2 = x2['z']
x3 = x3['z']

but there must be a better way to do it. What is the best way to produce something like this?

【Question comments】:

    Tags: pandas matplotlib subplot binning


    【Solution 1】:

    For binning in pandas, use cut, so a solution is:

    sources= pd.DataFrame({'z': z})
    sources['A'] = A
    sources['B'] = B
    sources['C'] = C
    sources['D'] = D
    #sources= sources.dropna()
    
    bins = pd.cut(sources['z'], [-np.inf, 1, 3, max(z)], labels=[1,2,3])
    
    m1 = bins == 1
    m2 = bins == 2
    m3 = bins == 3
    
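    # x values: the binning column z restricted to each bin
    # (taking A/B/C/D here as well would plot each column against itself)
    x11 = sources.loc[m1, 'z']
    x12 = sources.loc[m1, 'z']
    
    x21 = sources.loc[m2, 'z']
    x22 = sources.loc[m2, 'z']
    
    x31 = sources.loc[m3, 'z']
    x32 = sources.loc[m3, 'z']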
    
    y11 = sources.loc[m1, 'A']
    y12 = sources.loc[m1, 'B']
    
    y21 = sources.loc[m2, 'B']
    y22 = sources.loc[m2, 'C']
    
    y31 = sources.loc[m3, 'C']
    y32 = sources.loc[m3, 'D']
    

    tups = [(x11, x12, y11, y12), (x21, x22,y21, y22),(x31, x32, y31, y32)]
    
    fig, ax = plt.subplots(1,3)
    ax = ax.flatten()
    
    for k, (i1, i2, j1, j2) in enumerate(tups):
    
        count1 = np.count_nonzero(~np.logical_or(np.isnan(i1), np.isnan(j1)))
        count2 = np.count_nonzero(~np.logical_or(np.isnan(i2), np.isnan(j2)))
    
        label1 = 'Points plotted: %d'%count1
        label2 = 'Points plotted: %d'%count2
    
        ax[k].scatter(i1, j1, label = label1)
        ax[k].scatter(i2, j2, label = label2)
    
        ax[k].legend()
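
The same idea can be written more compactly by storing the bin label as a column and looping over a mapping from bin to columns. This is only a sketch under a few assumptions: the names `rng`, `columns_per_bin` and `counts` are made up for illustration, the upper bin edge is `np.inf` rather than `max(z)`, and the A/B, B/C, C/D pairing is read off the layout described in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the question's sample data (same seed, same NaN positions).
rng = np.random.RandomState(1000)
z = 5 * rng.rand(101)
sources = pd.DataFrame({'z': z})
for name, offset in [('A', -3), ('B', -2), ('C', -1), ('D', -4)]:
    sources[name] = 4 * z + offset + rng.randn(101)
sources.loc[[0, 1, 2, 50], 'A'] = np.nan
sources.loc[12:19, 'B'] = np.nan  # .loc slices include the end label

# Right-closed bins: z <= 1, 1 < z <= 3, z > 3.
sources['bin'] = pd.cut(sources['z'], [-np.inf, 1, 3, np.inf], labels=[1, 2, 3])

# Hypothetical mapping: which columns to draw in which bin.
columns_per_bin = {1: ['A', 'B'], 2: ['B', 'C'], 3: ['C', 'D']}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
counts = {}
for ax, (label, cols) in zip(axes, columns_per_bin.items()):
    subset = sources[sources['bin'] == label]
    for col in cols:
        # z has no NaNs in this data, so only the y column can drop points.
        n = int(subset[col].notna().sum())
        counts[(label, col)] = n
        ax.scatter(subset['z'], subset[col],
                   label='%s, points plotted: %d' % (col, n))
    ax.set_xlabel('z')
    ax.legend()
```

Keeping the bin label in the DataFrame avoids the twelve intermediate variables, and changing the edges or the column pairing only means editing the `pd.cut` call or the dictionary. Call `plt.show()` afterwards if the figure does not appear on its own.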
    
