Lookup value by index and name in Pandas (answers)

[Question title]: Lookup value by index and name in Pandas
[Posted]: 2023-01-09 15:41:54
[Question]:

I have a pandas DataFrame with a flattened hierarchy:

Level 1 ID	Level 2 ID	Level 3 ID	Level 4 ID	Name	Path
1	null	null	null	Finance	Finance
1	4	null	null	Reporting	Finance > Reporting
1	4	5	null	Tax Reporting	Finance > Reporting > Tax Reporting

What I would like to do is add 4 Level Name columns based on the Level [] ID columns, or replace the Level ID columns with them, like so:

Level 1 Name	Level 2 Name	Level 3 Name	Level 4 Name	Name	Path
Finance	null	null	null	Finance	Finance
Finance	Reporting	null	null	Reporting	Finance > Reporting
Finance	Reporting	Tax Reporting	null	Tax Reporting	Finance > Reporting > Tax Reporting

I would split the Path column on the delimiter, but in the real dataframe Path contains IDs rather than names.

How should I approach this?

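[Comments]:

"I would split the Path column on the delimiter, but in the real dataframe Path contains IDs rather than names." What does this mean? Do you have something like "Finance > 4 > 5" in the Path column?

Tags: python pandas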

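[Solution 1]:

The logic is unclear. In particular, what is the source of the final values? See two different options below.

Assuming the source is `df['Name']`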

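# Select the "Level N ID" columns
cols = df.filter(like='Level ').columns
names = df['Name'].values
# True where a level ID is present (only the first len(names) level columns are used)
mask = df[cols[:len(names)]].notna()

# Multiply each column's mask by one name (the i-th name labels the i-th level column),
# then restore NaN where the ID was missing
df[cols[:len(names)]] = mask.mul(names, axis=1).where(mask)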

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance        NaN            NaN         NaN        Finance                              Finance
1    Finance  Reporting            NaN         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting

If you want to extract from `Path` instead

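cols = df.filter(like='Level ').columns
# Split each path into one column per level
names = df['Path'].str.split(' > ', expand=True)

# Overwrite as many level columns as the deepest path has parts
df.loc[:, cols[:names.shape[1]]] = names.to_numpy()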

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance       None           None         NaN        Finance                              Finance
1    Finance  Reporting           None         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting
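
If `Path` holds IDs rather than names, as the question suggests for the real data, neither option above can read the names directly. One possible alternative is to build an ID-to-name lookup from the rows themselves. The sketch below rests on an assumption that is not confirmed by the question: the right-most non-null Level ID in a row is that row's own ID, and `Name` is the name of that node. The sample frame and the new `Level N Name` column names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (missing IDs as NaN).
df = pd.DataFrame({
    "Level 1 ID": [1, 1, 1],
    "Level 2 ID": [np.nan, 4, 4],
    "Level 3 ID": [np.nan, np.nan, 5],
    "Level 4 ID": [np.nan, np.nan, np.nan],
    "Name": ["Finance", "Reporting", "Tax Reporting"],
})

id_cols = [c for c in df.columns if c.startswith("Level ") and c.endswith(" ID")]

# Use one homogeneous float dtype so the row-wise forward fill is well defined.
ids = df[id_cols].astype(float)

# Assumption: the right-most non-null Level ID in a row is the row's own ID.
own_id = ids.ffill(axis=1).iloc[:, -1]

# Lookup table: node ID -> node name (IDs are assumed to be unique).
id_to_name = dict(zip(own_id, df["Name"]))

# Map every ID column through the lookup; missing IDs stay missing.
for col in id_cols:
    df[col.replace(" ID", " Name")] = ids[col].map(id_to_name)

print(df.filter(like=" Name"))
```

Unlike the `df['Name']` option, this does not depend on the rows being ordered by level, but every ancestor ID must have its own row in the frame for its name to be found.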

[Comments]:

df['Path'].str.split(' > ', expand=True).reindex(columns=range(4)).fillna('').rename(columns=lambda x: f'Level {x+1} Name')
@Corralien I thought of something like this, but it depends on the columns being formatted correctly. But yes, this should work too ;)
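
For reference, the one-liner from the comment above can be run on its own as follows. This is a minimal sketch: the sample frame is illustrative, and joining the result back onto `df` is an added step that is not part of the comment. Note that this version fills missing levels with empty strings rather than NaN.

```python
import pandas as pd

# Illustrative sample with only the Path column from the question.
df = pd.DataFrame({
    "Path": [
        "Finance",
        "Finance > Reporting",
        "Finance > Reporting > Tax Reporting",
    ]
})

level_names = (
    df["Path"].str.split(" > ", expand=True)   # one column per path part
    .reindex(columns=range(4))                 # always keep 4 level columns
    .fillna("")                                # missing levels become ""
    .rename(columns=lambda x: f"Level {x + 1} Name")
)

# Added step (not in the comment): attach the new columns to the original frame.
out = level_names.join(df)
print(out)
```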
