【问题标题】:Plot scaled and rotated bivariate distribution using matplotlib使用 matplotlib 绘制缩放和旋转的二元分布
【发布时间】:2019-01-17 09:43:13
【问题描述】:

我正在尝试使用matplotlib plot bivariate gaussian distribution。我想使用两个scatter 点(A 组)、(B 组)的xy 坐标来执行此操作。

我想通过调整COV matrix 来调整distribution 以考虑每个组velocity 及其与用作参考点的附加xy 坐标的距离。

我计算了每组xy 坐标到参考点的距离。距离表示为radius,标记为[GrA_Rad],[GrB_Rad]

所以它们离参考点越远,radius 就越大。我还计算了velocity 标记为[GrA_Vel],[GrB_Vel]。每个组的direction 表示为orientation。这被标记为[GrA_Rotation],[GrB_Rotation]

关于我希望如何针对velocity 和距离(radius) 调整distribution 的问题:

我希望使用SVD。具体来说,如果我有每个scatterrotation angle,这将提供directionvelocity 可用于描述scaling matrix [GrA_Scaling],[GrB_Scaling]。所以这个scalingmatrix可以用来扩展x-direction中的radius和收缩y-direction中的radius。这表示COV matrix

最后,distributionmean 值是通过将location (x,y) 组转换为velocity 的一半来找到的。

简单地说radius 应用于每个组的scatter 点。 COV 矩阵由radiusvelocity 调整。所以使用scalingmatrix 扩展radius 中的x-direction 并收缩y-directiondirection 是从 rotation angle 测量的。然后通过将组位置(x,y) 转换为velocity 的一半来确定distribution mean 的值。

下面是这些变量的df

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation

d = ({
    'Time' : [1,2,3,4,5,6,7,8],       
    'GrA_X' : [10,12,17,16,16,14,12,8],                 
    'GrA_Y' : [10,12,13,7,6,7,8,8], 
    'GrB_X' : [5,8,13,16,19,15,13,5],                 
    'GrB_Y' : [6,15,12,7,8,9,10,8],   
    'Reference_X' : [6,8,14,18,13,11,16,15],                 
    'Reference_Y' : [10,12,8,12,15,12,10,8],                  
    'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],  
    'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],               
    'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
    'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],               
    'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
    'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],                   
    'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135], 
    'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],                       
     })

df = pd.DataFrame(data = d)

我已经为每个xy 坐标创建了一个animated plot

GrA_X = [10,12,17,16,16,14,12,8]
GrA_Y = [10,12,13,7,6,7,8,8]

GrB_X = [5,8,13,16,19,15,13,5]                 
GrB_Y = [6,15,12,10,8,9,10,8]

Item_X = [6,8,14,18,13,11,16,15]  
Item_Y = [10,12,8,12,15,12,10,8]

scatter_GrA = ax.scatter(GrA_X, GrA_Y) 
scatter_GrB = ax.scatter(GrB_X, GrB_Y) 
scatter_Item = ax.scatter(Item_X, Item_Y) 

def animate(i) :
    scatter_GrA.set_offsets([[GrA_X[0+i], GrA_Y[0+i]]])
    scatter_GrB.set_offsets([[GrB_X[0+i], GrB_Y[0+i]]])
    scatter_Item.set_offsets([[Item_X[0+i], Item_Y[0+i]]])    

ani = animation.FuncAnimation(fig, animate, np.arange(0,9),
                              interval = 1000, blit = False)

【问题讨论】:

  • 我把问题重读了好几遍,用代码做了实验,但恐怕还是没有抓住问题的本质。例如,Item 是什么?动画显示ItemGrAGrB 随时​​间移动,但三者之间没有明显的关系。您能否改写和/或简化问题,并提供由特定示例输入产生的特定输出?
  • 简单地说部分就是我所追求的。我已将项目更改为参考点。我基本上想 1)将半径应用于每个组。 2) 使用方向来提供方向。 3)使用缩放因子在x方向扩大半径并在y方向收缩。
  • 感谢您查看此内容。 3之间的关系是两组之间的距离和参考的距离。这就是决定半径的因素。为了提供一个真实世界的应用程序,两个组分散是人。半径表示对某个区域的影响。这个影响应该根据它们的速度和到参考点的距离进行调整。这有意义吗?
  • 我也不明白这个问题。看来你文中说的所有变量在代码中都不存在?
  • @ImportanceOfBeingErnest。我没有在code 中包含我是如何解决它们的。我刚刚将它们添加到df。我不想将读者与与问题无关的code 混淆(我没有做的事情)。但我想提供有关我如何计算变量的背景信息。我应该马上把问题去掉吗?

标签: python pandas matplotlib covariance gaussian


【解决方案1】:

更新

问题已更新,并且变得更加清晰。我已经更新了我的代码以匹配。这是最新的输出:

除了样式,我认为这与 OP 描述的相符。

这是用于生成上述图的代码:

dfake = ({    
    'GrA_X' : [15,15],                 
    'GrA_Y' : [15,15], 
    'Reference_X' : [15,3],                 
    'Reference_Y' : [15,15],                  
    'GrA_Rad' : [15,25],                 
    'GrA_Vel' : [0,10],
    'GrA_Scaling' : [0,0.5],
    'GrA_Rotation' : [0,45]                     
})

dffake = pd.DataFrame(dfake)
fig,axs = plt.subplots(1, 2, figsize=(16,8))
fig.subplots_adjust(0,0,1,1)
plotone(dffake, 'A', 0, xlim=(0,30), ylim=(0,30), fig=fig, ax=axs[0])
plotone(dffake, 'A', 1, xlim=(0,30), ylim=(0,30), fig=fig, ax=axs[1])
plt.show()

我使用的plotone 函数的完整实现在下面的代码块中。如果您只想了解用于生成和转换二维高斯 PDF 的数学运算,请查看 mvpdf 函数(以及它所依赖的 rotgetcov 函数):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts

def rot(theta):
    theta = np.deg2rad(theta)
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)]
    ])

def getcov(radius=1, scale=1, theta=0):
    cov = np.array([
        [radius*(scale + 1), 0],
        [0, radius/(scale + 1)]
    ])

    r = rot(theta)
    return r @ cov @ r.T

def mvpdf(x, y, xlim, ylim, radius=1, velocity=0, scale=0, theta=0):
    """Creates a grid of data that represents the PDF of a multivariate gaussian.

    x, y: The center of the returned PDF
    (xy)lim: The extent of the returned PDF
    radius: The PDF will be dilated by this factor
    scale: The PDF be stretched by a factor of (scale + 1) in the x direction, and squashed by a factor of 1/(scale + 1) in the y direction
    theta: The PDF will be rotated by this many degrees

    returns: X, Y, PDF. X and Y hold the coordinates of the PDF.
    """
    # create the coordinate grids
    X,Y = np.meshgrid(np.linspace(*xlim), np.linspace(*ylim))

    # stack them into the format expected by the multivariate pdf
    XY = np.stack([X, Y], 2)

    # displace xy by half the velocity
    x,y = rot(theta) @ (velocity/2, 0) + (x, y)

    # get the covariance matrix with the appropriate transforms
    cov = getcov(radius=radius, scale=scale, theta=theta)

    # generate the data grid that represents the PDF
    PDF = sts.multivariate_normal([x, y], cov).pdf(XY)

    return X, Y, PDF

def plotmv(x, y, xlim=None, ylim=None, radius=1, velocity=0, scale=0, theta=0, xref=None, yref=None, fig=None, ax=None):
    """Plot an xy point with an appropriately tranformed 2D gaussian around it.
    Also plots other related data like the reference point.
    """
    if xlim is None: xlim = (x - 5, x + 5)
    if ylim is None: ylim = (y - 5, y + 5)

    if fig is None:
        fig = plt.figure(figsize=(8,8))
        ax = fig.gca()
    elif ax is None:
        ax = fig.gca()

    # plot the xy point
    ax.plot(x, y, '.', c='C0', ms=20)

    if not (xref is None or yref is None):
        # plot the reference point, if supplied
        ax.plot(xref, yref, '.', c='w', ms=12)

    # plot the arrow leading from the xy point
    if velocity > 0:
        ax.arrow(x, y, *rot(theta) @ (velocity, 0), 
                 width=.4, length_includes_head=True, ec='C0', fc='C0')

    # fetch the PDF of the 2D gaussian
    X, Y, PDF = mvpdf(x, y, xlim=xlim, ylim=ylim, radius=radius, velocity=velocity, scale=scale, theta=theta)

    # normalize PDF by shifting and scaling, so that the smallest value is 0 and the largest is 1
    normPDF = PDF - PDF.min()
    normPDF = normPDF/normPDF.max()

    # plot and label the contour lines of the 2D gaussian
    cs = ax.contour(X, Y, normPDF, levels=6, colors='w', alpha=.5)
    ax.clabel(cs, fmt='%.3f', fontsize=12)

    # plot the filled contours of the 2D gaussian. Set levels high for smooth contours
    cfs = ax.contourf(X, Y, normPDF, levels=50, cmap='viridis', vmin=-.9, vmax=1)

    # create the colorbar and ensure that it goes from 0 -> 1
    cbar = fig.colorbar(cfs, ax=ax)
    cbar.set_ticks([0, .2, .4, .6, .8, 1])

    # add some labels
    ax.grid()
    ax.set_xlabel('X distance (M)')
    ax.set_ylabel('Y distance (M)')

    # ensure that x vs y scaling doesn't disrupt the transforms applied to the 2D gaussian
    ax.set_aspect('equal', 'box')

    return fig, ax

def fetchone(df, l, i, **kwargs):
    """Fetch all the needed data for one xy point
    """
    keytups = (
        ('x', 'Gr%s_X'%l),
        ('y', 'Gr%s_Y'%l),
        ('radius', 'Gr%s_Rad'%l),
        ('velocity', 'Gr%s_Vel'%l),
        ('scale', 'Gr%s_Scaling'%l),
        ('theta', 'Gr%s_Rotation'%l),
        ('xref', 'Reference_X'),
        ('yref', 'Reference_Y')
    )

    ret = {k:df.loc[i, l] for k,l in keytups}
    # add in any overrides
    ret.update(kwargs)

    return ret

def plotone(df, l, i, xlim=None, ylim=None, fig=None, ax=None, **kwargs):
    """Plot exactly one point from the dataset
    """
    # look up all the data to plot one datapoint
    xydata = fetchone(df, l, i, **kwargs)

    # do the plot
    return plotmv(xlim=xlim, ylim=ylim, fig=fig, ax=ax, **xydata)

旧答案-2

我已经调整了答案以匹配 OP 发布的示例:

这是生成上图的代码:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts

def rot(theta):
    theta = np.deg2rad(theta)
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)]
    ])

def getcov(radius=1, scale=1, theta=0):
    cov = np.array([
        [radius*(scale + 1), 0],
        [0, radius/(scale + 1)]
    ])

    r = rot(theta)
    return r @ cov @ r.T

def datalimits(*data, pad=.15):
    dmin,dmax = min(d.min() for d in data), max(d.max() for d in data)
    spad = pad*(dmax - dmin)
    return dmin - spad, dmax + spad

d = ({
    'Time' : [1,2,3,4,5,6,7,8],       
    'GrA_X' : [10,12,17,16,16,14,12,8],                 
    'GrA_Y' : [10,12,13,7,6,7,8,8], 
    'GrB_X' : [5,8,13,16,19,15,13,5],                 
    'GrB_Y' : [6,15,12,7,8,9,10,8],   
    'Reference_X' : [6,8,14,18,13,11,16,15],                 
    'Reference_Y' : [10,12,8,12,15,12,10,8],                  
    'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],  
    'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],               
    'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
    'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],               
    'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
    'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],                   
    'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135], 
    'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],                       
     })

df = pd.DataFrame(data=d)

limitpad = .5
clevels = 5
cflevels = 50

xmin,xmax = datalimits(df['GrA_X'], df['GrB_X'], pad=limitpad)
ymin,ymax = datalimits(df['GrA_Y'], df['GrB_Y'], pad=limitpad)

X,Y = np.meshgrid(np.linspace(xmin, xmax), np.linspace(ymin, ymax))

fig = plt.figure(figsize=(10,6))
ax = plt.gca()

Zs = []
for l,color in zip('AB', ('red', 'yellow')):
    # plot all of the points from a single group
    ax.plot(df['Gr%s_X'%l], df['Gr%s_Y'%l], '.', c=color, ms=15, label=l)

    Zrows = []
    for _,row in df.iterrows():
        x,y = row['Gr%s_X'%l], row['Gr%s_Y'%l]

        cov = getcov(radius=row['Gr%s_Rad'%l], scale=row['Gr%s_Scaling'%l], theta=row['Gr%s_Rotation'%l])
        mnorm = sts.multivariate_normal([x, y], cov)
        Z = mnorm.pdf(np.stack([X, Y], 2))
        Zrows.append(Z)

    Zs.append(np.sum(Zrows, axis=0))

# plot the reference points

# create Z from the difference of the sums of the 2D Gaussians from group A and group B
Z = Zs[0] - Zs[1]

# normalize Z by shifting and scaling, so that the smallest value is 0 and the largest is 1
normZ = Z - Z.min()
normZ = normZ/normZ.max()

# plot and label the contour lines
cs = ax.contour(X, Y, normZ, levels=clevels, colors='w', alpha=.5)
ax.clabel(cs, fmt='%2.1f', colors='w')#, fontsize=14)

# plot the filled contours. Set levels high for smooth contours
cfs = ax.contourf(X, Y, normZ, levels=cflevels, cmap='viridis', vmin=0, vmax=1)
# create the colorbar and ensure that it goes from 0 -> 1
cbar = fig.colorbar(cfs, ax=ax)
cbar.set_ticks([0, .2, .4, .6, .8, 1])


ax.set_aspect('equal', 'box')

旧答案-1

很难准确地说出你在追求什么。可以通过其协方差矩阵来缩放和旋转多元高斯分布。以下是如何根据您的数据执行此操作的示例:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts

def rot(theta):
    theta = np.deg2rad(theta)
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)]
    ])

def getcov(scale, theta):
    cov = np.array([
        [1*(scale + 1), 0],
        [0, 1/(scale + 1)]
    ])

    r = rot(theta)
    return r @ cov @ r.T

d = ({
    'Time' : [1,2,3,4,5,6,7,8],       
    'GrA_X' : [10,12,17,16,16,14,12,8],                 
    'GrA_Y' : [10,12,13,7,6,7,8,8], 
    'GrB_X' : [5,8,13,16,19,15,13,5],                 
    'GrB_Y' : [6,15,12,7,8,9,10,8],   
    'Reference_X' : [6,8,14,18,13,11,16,15],                 
    'Reference_Y' : [10,12,8,12,15,12,10,8],                  
    'GrA_Rad' : [8.3,8.25,8.2,8,8.15,8.15,8.2,8.3],  
    'GrB_Rad' : [8.3,8.25,8.3,8.4,8.6,8.4,8.3,8.65],               
    'GrA_Vel' : [0,2.8,5.1,6.1,1.0,2.2,2.2,4.0],
    'GrB_Vel' : [0,9.5,5.8,5.8,3.16,4.12,2.2,8.2],               
    'GrA_Scaling' : [0,0.22,0.39,0.47,0.07,0.17,0.17,0.31],
    'GrB_Scaling' : [0,0.53,0.2,0.2,0.06,0.1,0.03,0.4],                   
    'GrA_Rotation' : [0,45,23.2,-26.56,-33.69,-36.86,-45,-135], 
    'GrB_Rotation' : [0,71.6,36.87,5.2,8.13,16.70,26.57,90],                       
     })

df = pd.DataFrame(data=d)
xmin,xmax = min(df['GrA_X'].min(), df['GrB_X'].min()), max(df['GrA_X'].max(), df['GrB_X'].max())
ymin,ymax = min(df['GrA_Y'].min(), df['GrB_Y'].min()), max(df['GrA_Y'].max(), df['GrB_Y'].max())

X,Y = np.meshgrid(
    np.linspace(xmin - (xmax - xmin)*.1, xmax + (xmax - xmin)*.1),
    np.linspace(ymin - (ymax - ymin)*.1, ymax + (ymax - ymin)*.1)
)

fig,axs = plt.subplots(df.shape[0], sharex=True, figsize=(4, 4*df.shape[0]))
fig.subplots_adjust(0,0,1,1,0,-.82)

for (_,row),ax in zip(df.iterrows(), axs):
    for c in 'AB':
        x,y = row['Gr%s_X'%c], row['Gr%s_Y'%c]

        cov = getcov(scale=row['Gr%s_Scaling'%c], theta=row['Gr%s_Rotation'%c])
        mnorm = sts.multivariate_normal([x, y], cov)
        Z = mnorm.pdf(np.stack([X, Y], 2))

        ax.contour(X, Y, Z)

        ax.plot(row['Gr%s_X'%c], row['Gr%s_Y'%c], 'x')
        ax.set_aspect('equal', 'box')

这个输出:

【讨论】:

  • @JPeter 我再次更新了答案,以反映问题描述/示例中的更新。让我知道这是否符合您的目标。
  • 这真是太棒了@tel。很抱歉交换了原始问题中的数字。我试图简化问题以达到目标。关于您的第二个输出,轮廓是否提供范围为 0-1 的影响概率。您能否简要说明您是如何计算的。我还将把它应用到 df 中的大量数据点。你预见到这有什么问题吗?再次感谢您。这太棒了!
  • 对不起@tel。我在edit 2edit 3 上都遇到了错误:它在cfs = ax.contour(X,Y, normPDF, levels = 50, cmap = 'viridis', vmin = -.9, vmax = 1) 上。错误是if self.filled and len(self.levels) < 2: TypeError: len() of unsized object
  • 看起来您使用的是旧版本的 Matplotlib(引发错误的代码行在 latest version 中看起来不同。首先要尝试的是升级您的 Matplotlib 包(他们修复大量的错误)。
  • 谢谢@tel。我刚刚安装了一个新的env 并开始运行。只是一个快速的。底层math 是否与edit 23 相同。你是plotting 的分布probability。我发现很难概念化scatter 点的density 如何影响normalised output 概率
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 2022-08-14
  • 1970-01-01
  • 1970-01-01
  • 2017-06-10
  • 2022-01-10
  • 2019-10-13
  • 2018-11-10
相关资源
最近更新 更多