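【Question title】: Slow browser rendering of plotly heatmap
【Posted】: 2022-01-25 17:49:26
【Question】:

I am rendering a 212 row x 64 column DataFrame of integers (final_df), with values ranging from 0 to 6, as an annotated heatmap with no annotations. I am viewing it in my browser (Microsoft Edge) using the file produced by fig.write_html(). The final heatmap renders so slowly in the browser that I get a "page unresponsive" warning, and zooming in or out of the plot is also very slow. This is surprising given that the df is not that large.

Can anyone suggest why this happens and how to speed it up?

Thanks, Tim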

import numpy as np
import plotly.figure_factory as ff

def discrete_colorscale(bvals, colors):
    #https://chart-studio.plotly.com/~empet/15229/heatmap-with-a-discrete-colorscale/#/
    """
    bvals - list of values bounding intervals/ranges of interest
    colors - list of rgb or hex colorcodes for values in [bvals[k], bvals[k+1]],0<=k < len(bvals)-1
    returns the plotly  discrete colorscale
    """
    if len(bvals) != len(colors)+1:
        raise ValueError('len(boundary values) should be equal to  len(colors)+1')
    bvals = sorted(bvals)     
    nvals = [(v-bvals[0])/(bvals[-1]-bvals[0]) for v in bvals]  #normalized values
    
    dcolorscale = [] #discrete colorscale
    for k in range(len(colors)):
        dcolorscale.extend([[nvals[k], colors[k]], [nvals[k+1], colors[k]]])
    return dcolorscale


#final_df is a 212 row x 64 col df of ints ranging from 0 to 6
#cell_df is an empty 212x64 df of empty strings to remove cell labelling behaviour
cell_df = final_df.applymap(lambda x: annot_map.get(x, x)) 
cell_labels = cell_df.values.tolist()
bvals = [0,1,2,3,4,5,6,7]

colors_map = ['rgb(244,244,255)', #whiteish 
              'rgb(255, 128, 0)', #orange 
              'rgb(255,0,0)', #red 
              'rgb(0, 0, 255)', #blue 
              'rgb(128, 128, 128)', #grey 
              'rgb(0, 255, 0)', #green 
              'rgb(192, 192, 192)'] #light grey

dcolorsc = discrete_colorscale(bvals, colors_map)
bvals = np.array(bvals)
tickvals = [np.mean(bvals[k:k+2]) for k in range(len(bvals)-1)]
ticktext  = ['param 1', 
             'param 2',
             'param 3',
             'param 4',
             'param 5',
             'param 6',
             'param 7']

fig_df = ff.create_annotated_heatmap(final_df.values.tolist(), 
                                      x= list(final_df.columns), 
                                      y=list(final_df.index), 
                                      annotation_text  = cell_labels, 
                                      colorscale=dcolorsc,
                                      colorbar = dict(thickness=25, 
                                                      tickvals=tickvals, 
                                                      ticktext=ticktext),
                                      showscale  = True,
                                      zmin=0, zmax=7,
                                      ygap = 1,
                                      xgap = 1,
                                      )
fig_df.update_layout(
    xaxis={'title' : 'ID 1'},
    yaxis = {'title' : 'ID 2'},
    yaxis_nticks = len(final_df.index),
    xaxis_nticks = len(final_df.columns)
    )

fig_df.write_html(results_file_df)

【Comments on the question】:

    Tags: python dataframe performance plotly heatmap


    【Solution 1】:

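    I suspect annotations are very expensive for plotly to render. Even if you pass a 212x64 array of empty strings to the annotation_text parameter, plotly probably still has to go through every one of them to determine that there is no annotation to add.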

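    I created a 212x64 array of random integers from 0 to 6. It also rendered slowly in my browser, and I got the same "page unresponsive" warning as you.

    When I used go.Heatmap instead, I got the same plot as with ff.create_annotated_heatmap. This cut the execution time from 5-6 seconds to 0.66 seconds, and the plot is responsive in the browser.

    This also seems simpler than creating an annotated heatmap and then not using the annotations. (Is there a particular reason you need ff.create_annotated_heatmap rather than go.Heatmap?)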

    import numpy as np
    import pandas as pd
    import plotly.figure_factory as ff
    import plotly.graph_objects as go
    
    import time
    start_time = time.time()
    
    def discrete_colorscale(bvals, colors):
        #https://chart-studio.plotly.com/~empet/15229/heatmap-with-a-discrete-colorscale/#/
        """
        bvals - list of values bounding intervals/ranges of interest
        colors - list of rgb or hex colorcodes for values in [bvals[k], bvals[k+1]],0<=k < len(bvals)-1
        returns the plotly  discrete colorscale
        """
        if len(bvals) != len(colors)+1:
            raise ValueError('len(boundary values) should be equal to  len(colors)+1')
        bvals = sorted(bvals)     
        nvals = [(v-bvals[0])/(bvals[-1]-bvals[0]) for v in bvals]  #normalized values
        
        dcolorscale = [] #discrete colorscale
        for k in range(len(colors)):
            dcolorscale.extend([[nvals[k], colors[k]], [nvals[k+1], colors[k]]])
        return dcolorscale
    
    
    #final_df is a 212 row x 64 col df of ints ranging from 0 to 6
    #cell_df is an empty 212x64 df of empty strings to remove cell labelling behaviour
    
    ## recreate your dfs
    
    np.random.seed(42)
    # randint's upper bound is exclusive, so 7 is needed to get values 0-6
    final_df = pd.DataFrame(np.random.randint(0,7,size=(212, 64)), columns=list(range(64)))
    
    # cell_df = final_df.applymap(lambda x: annot_map.get(x, x)) 
    cell_df = pd.DataFrame(np.array(['']*212*64).reshape(212,64), columns=list(range(64)))
    cell_labels = cell_df.values.tolist()
    bvals = [0,1,2,3,4,5,6,7]
    
    colors_map = ['rgb(244,244,255)', #whiteish 
                  'rgb(255, 128, 0)', #orange 
                  'rgb(255,0,0)', #red 
                  'rgb(0, 0, 255)', #blue 
                  'rgb(128, 128, 128)', #grey 
                  'rgb(0, 255, 0)', #green 
                  'rgb(192, 192, 192)'] #light grey
    
    dcolorsc = discrete_colorscale(bvals, colors_map)
    bvals = np.array(bvals)
    tickvals = [np.mean(bvals[k:k+2]) for k in range(len(bvals)-1)]
    ticktext  = ['param 1', 
                 'param 2',
                 'param 3',
                 'param 4',
                 'param 5',
                 'param 6',
                 'param 7']
    
    # fig_df = ff.create_annotated_heatmap(final_df.values.tolist(), 
    #                                       x= list(final_df.columns), 
    #                                       y=list(final_df.index), 
    #                                       annotation_text  = cell_labels, 
    #                                       colorscale=dcolorsc,
    #                                       colorbar = dict(thickness=25, 
    #                                                       tickvals=tickvals, 
    #                                                       ticktext=ticktext),
    #                                       showscale  = True,
    #                                       zmin=0, zmax=7,
    #                                       ygap = 1,
    #                                       xgap = 1,
    #                                       )
    
    fig_df = go.Figure([go.Heatmap(
        z=final_df,
        colorscale=dcolorsc,
        colorbar=dict(
            thickness=25, 
            tickvals=tickvals, 
            ticktext=ticktext),
        showscale=True,
        zmin=0, zmax=7,
        ygap=1,
        xgap=1,
        )
    ])
    
    fig_df.update_layout(
        xaxis={'title' : 'ID 1'},
        yaxis = {'title' : 'ID 2'},
        yaxis_nticks = len(final_df.index),
        xaxis_nticks = len(final_df.columns)
        )
    
    fig_df.show()
    
    print(f"Program executed in {time.time() - start_time} seconds")
    
    ## original code with figure_factory annotated heatmap: Program executed in 5.351915121078491 seconds
    ## modified code with graph_objects heatmap: Program executed in 0.6627509593963623 seconds
    # fig_df.write_html(results_file_df)
    

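    【Comments】:

    • You're right, this is indeed better! I should have mentioned this in the question, but one of the problems I ran into when I tried go.Heatmap was that the hover text doesn't carry over well. With the annotated heatmap, the hover text automatically included the point's x- and y-axis values (i.e. the values of ID 1 and ID 2), but I think the go.Heatmap hover text only contains the x/y list index positions, not the values at those positions.
    • Also, to make matters worse, I tried repeating the process with the go.Heatmap option (with the poor hover labels) on a 1894x7675 df. It loads (a remarkable achievement, considering it didn't with the smaller df), but performance is at a snail's pace. Maybe plotly isn't powerful enough :(
    • @TimKirkwood Yes, unfortunately a heatmap with 14.5 million cells will most likely run into performance problems in plotly. You might have better luck creating the heatmap in matplotlib and adding the zoom and hover functionality yourself, but I'm not sure that guarantees better performance.
    • The dfs I'm working with will usually be very sparse, so hopefully I can cut them down! For anyone else running into this, imshow is also a good option, based on this small thread - community.plotly.com/t/…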
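
On the sparse-data point in the last comment, one possible way to cut the frame down before plotting is to drop rows and columns that contain only the background value. The sketch below assumes that 0 means "empty" (an assumption; the thread does not say which value is the sparse one). `drop_empty_rows_and_cols` is a hypothetical helper, not part of pandas or plotly.

```python
import pandas as pd

def drop_empty_rows_and_cols(df, empty_value=0):
    """Return df without rows and columns that hold only empty_value."""
    occupied = df.ne(empty_value)  # True where a cell carries information
    return df.loc[occupied.any(axis=1), occupied.any(axis=0)]

# Tiny illustrative frame: rows r0/r2 and columns c0/c2 are entirely 0
sparse_df = pd.DataFrame(
    [[0, 0, 0, 0],
     [0, 3, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 5]],
    index=["r0", "r1", "r2", "r3"],
    columns=["c0", "c1", "c2", "c3"],
)

reduced_df = drop_empty_rows_and_cols(sparse_df)
print(reduced_df.shape)  # (2, 2)
```

The surviving index and column labels can then be passed as y and x to the heatmap, so hover text still refers to the original IDs.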