【Question title】: Re-shape a dataframe for plotting in python
【Posted】: 2021-10-15 22:59:46
【Question】:

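I have a table of data that I want to plot as a line chart in python, but I can't work out how to reshape it into the form I need.

  • I want 10 lines on the chart, one per cost range (hue='Cost_Range_', as I understand it).
  • The x-axis has 6 points: Actual_Cost_7 through Actual_Cost_12.
  • The y-axis is the numeric cost scale.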
import pandas as pd

data = {'Cost_Range_': ['0%-10% ', '10%-20% ', '20%-30% ', '30%-40% ', '40%-50% ', '50%-60% ', '60%-70% ', '70%-80% ', '80%-90% ', '90%-100% '], 'Actual_Cost_7': [862524.95, 2103586.97, 4787800.34, 7566317.48, 9234256.79, 8089529.83, 8404393.25, 7914513.2, 7295344.15, 28349329.54], 'Actual_Cost_8': [774796.59, 1926257.27, 4500831.22, 6996465.31, 8649839.07, 7637765.74, 7894598.06, 7507044.41, 6992852.64, 26793378.2], 'Actual_Cost_9': [735822.59, 1950636.11, 4630232.69, 7258691.84, 8876958.69, 7871133.78, 8190628.12, 7780306.34, 7112415.57, 27498755.9], 'Actual_Cost_10': [767162.99, 2005088.29, 4724279.94, 7518358.25, 9308307.16, 8099628.47, 8566112.41, 8012750.97, 7596157.21, 28593451.29], 'Actual_Cost_11': [723476.43, 2011078.37, 4642348.79, 7258459.29, 9032494.08, 7969659.06, 8334203.4, 7882871.23, 7327570.81, 27901242.65], 'Actual_Cost_12': [760538.83, 1940075.69, 4371315.1, 6953409.04, 8695870.37, 7647885.77, 7871087.21, 7517534.52, 6971178.15, 26562179.63]}
df = pd.DataFrame(data)

  Cost_Range_  Actual_Cost_7  Actual_Cost_8  Actual_Cost_9  Actual_Cost_10  Actual_Cost_11  Actual_Cost_12
0     0%-10%       862524.95      774796.59      735822.59       767162.99       723476.43       760538.83
1    10%-20%      2103586.97     1926257.27     1950636.11      2005088.29      2011078.37      1940075.69
2    20%-30%      4787800.34     4500831.22     4630232.69      4724279.94      4642348.79      4371315.10
3    30%-40%      7566317.48     6996465.31     7258691.84      7518358.25      7258459.29      6953409.04
4    40%-50%      9234256.79     8649839.07     8876958.69      9308307.16      9032494.08      8695870.37
5    50%-60%      8089529.83     7637765.74     7871133.78      8099628.47      7969659.06      7647885.77
6    60%-70%      8404393.25     7894598.06     8190628.12      8566112.41      8334203.40      7871087.21
7    70%-80%      7914513.20     7507044.41     7780306.34      8012750.97      7882871.23      7517534.52
8    80%-90%      7295344.15     6992852.64     7112415.57      7596157.21      7327570.81      6971178.15
9   90%-100%     28349329.54    26793378.20    27498755.90     28593451.29     27901242.65     26562179.63

I get this:

But I want something like this:

Any suggestions would be appreciated.

【Comments】:

  • The desired plot is the wrong type of visualization, because the x-axis is discrete, not continuous. In this case the visualization should be a bar chart. (1) df.set_index('Cost_Range_', inplace=True) (2) ax = df.T.plot(kind='bar', figsize=(12, 6), rot=0) (3) ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left') (4) ax.set_yscale('log'). Using pandas 1.3.1
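
For readers who want to try the bar-chart suggestion from the comment above, here is a minimal sketch of those four steps as one script. It assumes matplotlib is installed, uses only a three-row, three-column subset of the question's data (with the trailing spaces in the labels removed) to keep it short, and uses the non-inplace form of set_index; the name `wide` is just an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove in a notebook
import pandas as pd

# Subset of the question's data, shortened for illustration
data = {
    "Cost_Range_": ["0%-10%", "10%-20%", "90%-100%"],
    "Actual_Cost_7": [862524.95, 2103586.97, 28349329.54],
    "Actual_Cost_8": [774796.59, 1926257.27, 26793378.2],
    "Actual_Cost_9": [735822.59, 1950636.11, 27498755.9],
}
df = pd.DataFrame(data)

# (1) Use the cost range as the index so it becomes the legend after transposing
wide = df.set_index("Cost_Range_")

# (2) Transpose: each Actual_Cost_* column becomes one group of bars on the x-axis
ax = wide.T.plot(kind="bar", figsize=(12, 6), rot=0)

# (3) Move the legend outside the plotting area
ax.legend(bbox_to_anchor=(1, 1.02), loc="upper left")

# (4) Log scale, since the 90%-100% range is far larger than the others
ax.set_yscale("log")
```

With the full table from the question, the same calls should give six groups of ten bars each.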

Tags: python pandas dataframe matplotlib seaborn


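【Solution 1】:
  • Use plotly to generate the figure
  • Start by reshaping the data into long form with pd.melt()
  • The x-axis needs to be continuous rather than categorical to produce the plot you want, so a second dataframe is built that assigns a consecutive number to each category
  • After creating the figure, update the x-axis ticks to show the values you want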
import pandas as pd
import plotly.express as px

data = {'Cost_Range_': ['0%-10% ', '10%-20% ', '20%-30% ', '30%-40% ', '40%-50% ', '50%-60% ', '60%-70% ', '70%-80% ', '80%-90% ', '90%-100% '], 'Actual_Cost_7': [862524.95, 2103586.97, 4787800.34, 7566317.48, 9234256.79, 8089529.83, 8404393.25, 7914513.2, 7295344.15, 28349329.54], 'Actual_Cost_8': [774796.59, 1926257.27, 4500831.22, 6996465.31, 8649839.07, 7637765.74, 7894598.06, 7507044.41, 6992852.64, 26793378.2], 'Actual_Cost_9': [735822.59, 1950636.11, 4630232.69, 7258691.84, 8876958.69, 7871133.78, 8190628.12, 7780306.34, 7112415.57, 27498755.9], 'Actual_Cost_10': [767162.99, 2005088.29, 4724279.94, 7518358.25, 9308307.16, 8099628.47, 8566112.41, 8012750.97, 7596157.21, 28593451.29], 'Actual_Cost_11': [723476.43, 2011078.37, 4642348.79, 7258459.29, 9032494.08, 7969659.06, 8334203.4, 7882871.23, 7327570.81, 27901242.65], 'Actual_Cost_12': [760538.83, 1940075.69, 4371315.1, 6953409.04, 8695870.37, 7647885.77, 7871087.21, 7517534.52, 6971178.15, 26562179.63]}
df = pd.DataFrame(data)

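# Long form: one row per (Cost_Range_, column name) pair
dfp = pd.melt(
    df, id_vars="Cost_Range_", value_vars=[c for c in df.columns if c != "Cost_Range_"]
)
# Lookup table: column 0 holds each column name, "index" its integer position
dfpcat = (
    df.columns.to_series()
    .reset_index(drop=True)
    .reset_index()
    .loc[lambda d: d[0].str.contains("Actual")]
)

# Plot against the numeric position, then relabel the ticks with the column names
px.line(
    dfp.merge(dfpcat, left_on="variable", right_on=0),
    x="index",
    y="value",
    color="Cost_Range_",
).update_layout(
    xaxis={"tickmode": "array", "tickvals": dfpcat["index"], "ticktext": dfpcat[0]}
)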
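
If plotly is not an option, the same idea (melt to long form, plot against a numeric position, then relabel the ticks) can probably be done with pandas and matplotlib alone, which are among the question's tags. This is only a sketch under those assumptions, not the answerer's method: it uses a three-row, three-column subset of the question's data with the trailing spaces stripped from the labels, and the names `long`, `period`, `cost`, `x` and `pos` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's data, shortened for illustration
data = {
    "Cost_Range_": ["0%-10%", "10%-20%", "90%-100%"],
    "Actual_Cost_7": [862524.95, 2103586.97, 28349329.54],
    "Actual_Cost_8": [774796.59, 1926257.27, 26793378.2],
    "Actual_Cost_9": [735822.59, 1950636.11, 27498755.9],
}
df = pd.DataFrame(data)

# Long form: one row per (cost range, period) pair
long = df.melt(id_vars="Cost_Range_", var_name="period", value_name="cost")

# Map each period column to a consecutive integer so the x-axis is numeric
periods = [c for c in df.columns if c != "Cost_Range_"]
pos = {name: i for i, name in enumerate(periods)}
long["x"] = long["period"].map(pos)

# One line per cost range, in the order the ranges first appear
fig, ax = plt.subplots(figsize=(10, 5))
for label, grp in long.groupby("Cost_Range_", sort=False):
    ax.plot(grp["x"], grp["cost"], marker="o", label=label)

# Show the original column names at the numeric tick positions
ax.set_xticks(list(pos.values()))
ax.set_xticklabels(list(pos.keys()))
ax.set_ylabel("cost")
ax.legend(title="Cost_Range_", bbox_to_anchor=(1, 1), loc="upper left")
```

With the full table this should give the 10 lines over 6 x positions that the question asks for; ax.set_yscale('log') may help, since the 90%-100% range is several times larger than the rest.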

【Comments】:
