【Question title】: How to plot a multi-index DataFrame as a stacked bar chart in Plotly
【Posted】: 2021-09-27 11:01:17
【Question】:

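I have the DataFrame below, which I process further into a pivot table. Now I am trying to plot the multi-index pivot data in Plotly, but the plot somehow doesn't pick up the values and raises an error instead.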

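I need the categories "develop" and "developing" on the x-axis, with the related "employee" data plotted inside each category. The y-axis must be "gdp", and the bars must be stacked by "cond_cat". The code is below for reference.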

Sample DataFrame

import pandas as pd
import numpy as np

s = 200
np.random.seed(365)  # so the data is the same each time
df = pd.DataFrame({"Country": np.random.choice(["USA America", "JPY one two", "MEX", "IND", "AUS"], s),
                   "employee": np.random.choice(["Bob", "Sam", "John", "Tom", "Harry"], s),
                   "economy_cat": np.random.choice(["developing", "develop"], s),
                   "cond_cat": np.random.choice(["good", "bad", "worse", "better", "average"], s),
                   "gdp": np.random.randint(5, 75, s),
                   })
df = df[df.Country=='USA America']

# print(df.head())
        Country employee economy_cat cond_cat  gdp
9   USA America      Sam  developing   better   30
11  USA America      Bob  developing  average   45
21  USA America     John     develop      bad   29
22  USA America      Sam     develop      bad   73
30  USA America    Harry     develop      bad   25

Reshape

df_pivot = df.pivot_table(index=['economy_cat','employee'],columns=['cond_cat'],values='gdp',aggfunc='sum')

# print(df_pivot)
cond_cat              average    bad  better  good  worse
economy_cat employee                                     
develop     Bob           6.0    NaN    46.0   NaN    NaN
            Harry         NaN   25.0     9.0   NaN    NaN
            John         37.0   29.0     NaN   NaN    NaN
            Sam           NaN   82.0     NaN   NaN   60.0
            Tom          48.0    NaN     NaN  51.0    NaN
developing  Bob          45.0    NaN     NaN  45.0    NaN
            Harry        75.0  183.0   113.0   NaN    NaN
            John         16.0   36.0    27.0  67.0    NaN
            Sam           NaN    NaN    30.0   NaN   43.0
            Tom         111.0    NaN     NaN  77.0   73.0

Plot

from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=1, cols=1)
fig.add_trace(
    go.Bar(
        x= df_pivot["economy_cat","employee"],
        y= df_pivot["cond_cat"], marker_color="#1f77b4", showlegend=False,
        marker_line_color="#1f77b4",
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Bar(
        x= df_pivot["economy_cat","employee"],
        y= df_pivot["cond_cat"], marker_color="rgba(255, 0, 0, 0.6)", showlegend=False,
        marker_line_color="rgba(255, 0, 0, 0.6)",
    ),
    row=1,
    col=1,
)
fig.update_layout(barmode="stack")
fig.show()

Error when plotting

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
e:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

e:\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

e:\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('economy_cat', 'employee')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
C:\Users\TRENTO~1.MCK\AppData\Local\Temp/ipykernel_18596/2928341867.py in <module>
     14 fig.add_trace(
     15 go.Bar(
---> 16     x= df_pivot["economy_cat","employee"],
     17     y= df_pivot["cond_cat"],marker_color = "#1f77b4",showlegend=False,
     18     marker_line_color = '#1f77b4',

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

e:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: ('economy_cat', 'employee')

【Question discussion】:

    Tags: python pandas plotly multi-index


    【Answer 1】:

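    If I understand correctly, this is the complete code you are looking for.

    Note that df_pivot["economy_cat","employee"] looks up a column labelled ('economy_cat', 'employee'). In the pivot table, economy_cat and employee are index levels, not columns, which is why pandas raises the KeyError before Plotly is even involved. So the pivot DataFrame's index is reset, turning the two levels into ordinary columns, which can then be passed together to x= to get a two-level x-axis.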
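    A minimal sketch of the point above, on a small hand-made frame (the data is hypothetical; only the column names come from the question). It shows that both lookups used in the question fail on the pivot, that reset_index() moves the index levels into columns, and how the two columns combine into the x value for a multicategory axis.

```python
import pandas as pd

# Hypothetical miniature of the question's long-format data
df_long = pd.DataFrame({
    "employee": ["Bob", "Sam", "Sam"],
    "economy_cat": ["develop", "develop", "developing"],
    "cond_cat": ["good", "bad", "bad"],
    "gdp": [10, 30, 40],
})
df_pivot = df_long.pivot_table(index=["economy_cat", "employee"],
                               columns=["cond_cat"], values="gdp", aggfunc="sum")

# df_pivot[...] only looks up column labels: economy_cat/employee are
# index levels and cond_cat is just the name of the columns axis
failed = []
for key in [("economy_cat", "employee"), "cond_cat"]:
    try:
        df_pivot[key]
    except KeyError:
        failed.append(key)
print(failed)  # both lookups fail

# reset_index() turns the two index levels into ordinary columns
df = df_pivot.reset_index()
print(list(df.columns))  # ['economy_cat', 'employee', 'bad', 'good']

# Passed together, the two columns give a two-level (multicategory) x-axis:
# the first sequence is the outer category, the second the inner one
x = [df["economy_cat"].tolist(), df["employee"].tolist()]
print(x)
```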

    Imports and DataFrame

    import pandas as pd
    import numpy as np
    from plotly.subplots import make_subplots
    import plotly.graph_objects as go
    import plotly.express as px
    from itertools import cycle
    
    # beginning with df_pivot from the OP, reset the index
    df = df_pivot.reset_index()
    
    # print(df)
    cond_cat economy_cat employee  average    bad  better  good  worse
    0            develop      Bob      6.0    NaN    46.0   NaN    NaN
    1            develop    Harry      NaN   25.0     9.0   NaN    NaN
    2            develop     John     37.0   29.0     NaN   NaN    NaN
    3            develop      Sam      NaN   82.0     NaN   NaN   60.0
    4            develop      Tom     48.0    NaN     NaN  51.0    NaN
    5         developing      Bob     45.0    NaN     NaN  45.0    NaN
    6         developing    Harry     75.0  183.0   113.0   NaN    NaN
    7         developing     John     16.0   36.0    27.0  67.0    NaN
    8         developing      Sam      NaN    NaN    30.0   NaN   43.0
    9         developing      Tom    111.0    NaN     NaN  77.0   73.0
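    The NaN cells above are (economy_cat, employee) pairs with no rows for that cond_cat. If explicit zeros are preferred (for example in hover text or later arithmetic), pivot_table's fill_value parameter can write them during the reshape. A sketch on hypothetical data, assuming the same column names as the question:

```python
import pandas as pd

# Hypothetical miniature of the question's long-format data
df_long = pd.DataFrame({
    "employee": ["Bob", "Bob", "Sam"],
    "economy_cat": ["develop", "develop", "developing"],
    "cond_cat": ["good", "bad", "bad"],
    "gdp": [10, 20, 40],
})

# fill_value=0 puts 0 where a (row, column) combination has no data,
# instead of NaN; reset_index() then flattens the index as in the answer
df = df_long.pivot_table(index=["economy_cat", "employee"],
                         columns=["cond_cat"], values="gdp",
                         aggfunc="sum", fill_value=0).reset_index()
print(df)
```

    Either way the stacked chart looks the same, since a NaN segment is simply not drawn and a 0 segment has no height.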
    

    Plot

    # data and colors
    columns = df.columns[2:]
    palette = cycle(px.colors.qualitative.Alphabet)
    # palette = cycle(px.colors.sequential.PuBu)
    colors = {c:next(palette) for c in columns}
    
    # subplot setup
    fig = make_subplots(rows=1, cols=1)
    
    # add bars
    for cols in columns:
        fig.add_trace(go.Bar(x=[df['economy_cat'], df['employee']],
                             y=df[cols],
                             name=cols,
                             legendgroup=cols,
                             marker_color=colors[cols],
                             showlegend=True),
                      row=1, col=1)
    
    fig.update_layout(barmode='stack')
    fig.show()
    
