Is there a technique to create column headers for a Plotly Sankey Diagram, similar to Tableau

【Question title】: Is there a technique to create column headers for a Plotly Sankey Diagram, similar to Tableau
【Posted】: 2022-01-04 19:03:54
【Question】:

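Is there a technique to plot a time-series flow diagram in which the column nodes represent the start date of each month, the values represent the count of each type, and the labels represent the type (i.e. Consumer, Home Office, Corporate, Small Business in the example diagram below)?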

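Plotly provides some examples of how to create a Sankey Diagram in Python. Adding dates as column headers, similar to the Tableau example Superstore Interactive Demo, would make the Sankey diagram clearer. For example, "Level 0 Region" would be replaced with "January 1, 2022", and "Level 2 Customer Segment" would be replaced with "February 1, 2022".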

【Comments】:

Tags: python plotly tableau-api sankey-diagram

【Answer 1】:

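This is an enhancement of Plotly.py Sankey Diagrams - Controlling Node Destination.
Your stated requirement is to create columns by date. The date part of each concatenated Sankey node key is used to place the node in a column.
The formatting can clearly be polished further. This shows how the columns can be defined and annotated.

Sample data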

from_date	to_date	from_type	to_type	value	source	target
2022-01-01 00:00:00	2022-02-01 00:00:00	Consumer	Home Office	3	Consumer_20220101	Home Office_20220201
2022-01-01 00:00:00	2022-03-01 00:00:00	Consumer	Corporate	6	Consumer_20220101	Corporate_20220301
2022-01-01 00:00:00	2022-03-01 00:00:00	Small Business	Corporate	21	Small Business_20220101	Corporate_20220301
2022-01-01 00:00:00	2022-04-01 00:00:00	Consumer	Home Office	14	Consumer_20220101	Home Office_20220401
2022-02-01 00:00:00	2022-03-01 00:00:00	Corporate	Consumer	20	Corporate_20220201	Consumer_20220301
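
The `source` and `target` keys in the sample above carry both pieces of information the layout needs: the type (used as the node label) and the date (used as the column). The following is a minimal pure-Python sketch of that idea, not the answer's code. The helper names `split_key` and `column_positions` are hypothetical, and it splits at the last underscore (`rsplit`) rather than the first, which is an assumption made so that a type name containing an underscore would still work.

```python
# Node keys as they appear in the sample data: "<type>_<YYYYMMDD>".
node_keys = [
    "Consumer_20220101",
    "Small Business_20220101",
    "Corporate_20220201",
    "Home Office_20220201",
    "Consumer_20220301",
    "Corporate_20220301",
    "Home Office_20220401",
]


def split_key(key):
    """Split a node key into (type label, date string) at the last underscore."""
    label, date = key.rsplit("_", 1)
    return label, date


def column_positions(keys):
    """Map each distinct date to an x position strictly between 0 and 1.

    Uses the same (code + 0.01) / (max_code + 0.1) scaling as the
    factorize() helper in the solution.
    """
    # YYYYMMDD strings sort chronologically, so sorted() gives column order.
    dates = sorted({split_key(k)[1] for k in keys})
    top = len(dates) - 1
    return {d: (i + 0.01) / (top + 0.1) for i, d in enumerate(dates)}


positions = column_positions(node_keys)
```

Each distinct date ends up with one x value, which is what later allows one annotation per column to act as a header.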

Solution

import pandas as pd
import numpy as np
import plotly.graph_objects as go

ms = pd.date_range("1-jan-2022", freq="MS", periods=4)
types = ["Consumer", "Home Office", "Corporate", "Small Business"]

# simulate some data, date and type to date and type
s = 50
df = pd.DataFrame(
    {
        "from_date": np.random.choice(ms, s),
        "to_date": np.random.choice(ms, s),
        "from_type": np.random.choice(types, s),
        "to_type": np.random.choice(types, s),
        "value": np.random.randint(1, 20, s),
    }
).loc[
    # remove invalid combinations produced by the random generation
    lambda d: (d["to_date"] > d["from_date"]) & (d["from_type"] != d["to_type"])
].groupby(
    ["from_date", "to_date", "from_type", "to_type"], as_index=False
).sum()

# start of solution: build sankey source and target keys as "<type>_<YYYYMMDD>"
df = df.assign(
    source=lambda d: d["from_type"] + "_" + d["from_date"].dt.strftime("%Y%m%d"),
    target=lambda d: d["to_type"] + "_" + d["to_date"].dt.strftime("%Y%m%d"),
)


def factorize(s):
    # encode sorted unique values as 0..n-1, then scale into (0, 1) for use as node positions
    a = pd.factorize(s, sort=True)[0]
    return (a + 0.01) / (max(a) + 0.1)


# unique nodes
nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))
# work out positioning of nodes
nodes = (
    nodes.to_frame("id")
    .assign(
        y=lambda d: factorize(d.index.to_series().apply(lambda s: s.split("_")[0])),
        x=lambda d: factorize(d.index.to_series().apply(lambda s: s.split("_")[1])),
    )
)

# now simple job of building sankey
fig = go.Figure(
    go.Sankey(
        arrangement="snap",
        node={"label": nodes.index.to_series().apply(lambda s: s.split("_")[0]), "x": nodes["x"], "y": nodes["y"]},
        link={
            "source": nodes.loc[df["source"], "id"],
            "target": nodes.loc[df["target"], "id"],
            "value": df["value"],
        },
    )
)

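# annotate each distinct x position (one per date column) with its date
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for i, x in nodes["x"].drop_duplicates().items():
    fig.add_annotation(x=x, y=1.4, text=i.split("_")[1], showarrow=False)

fig.show()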
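
The annotations above show the raw `YYYYMMDD` part of the node key, while the question asks for headers such as "2022-02-01" written as a readable date. One possible way to polish this is sketched below, using only the standard library. The helper `header_text` and its default format are assumptions for illustration, not part of the answer, and the month abbreviation produced by `%b` depends on the active locale.

```python
from datetime import datetime


def header_text(node_key, fmt="%d %b %Y"):
    """Turn a node key such as 'Consumer_20220201' into a header like '01 Feb 2022'."""
    # The date is the part after the last underscore, encoded as YYYYMMDD.
    date_part = node_key.rsplit("_", 1)[1]
    return datetime.strptime(date_part, "%Y%m%d").strftime(fmt)


# Possible use with the figure built in the solution (not executed here):
# for key, x in nodes["x"].drop_duplicates().items():
#     fig.add_annotation(x=x, y=1.4, text=header_text(key), showarrow=False)
```

Passing a different `fmt`, for example `"%B %d, %Y"`, would give long month names instead.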

【Comments】:

This is awesome! It also covers the challenge of building the from and to columns in a DataFrame. Thank you!