【Question title】: Lookup value by index and name in Pandas
【Posted】: 2023-01-09 15:41:54
【Question】:

I have a pandas DataFrame with a flattened hierarchy:

Level 1 ID  Level 2 ID  Level 3 ID  Level 4 ID  Name           Path
1           null        null        null        Finance        Finance
1           4           null        null        Reporting      Finance > Reporting
1           4           5           null        Tax Reporting  Finance > Reporting > Tax Reporting
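
For experimenting, the sample table can be rebuilt as a DataFrame. This is only a sketch of the data shown above: it assumes that "null" means a missing value (represented here as NaN) and that the IDs are numeric.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question; "null" is stored as NaN.
df = pd.DataFrame(
    {
        "Level 1 ID": [1, 1, 1],
        "Level 2 ID": [np.nan, 4, 4],
        "Level 3 ID": [np.nan, np.nan, 5],
        "Level 4 ID": [np.nan, np.nan, np.nan],
        "Name": ["Finance", "Reporting", "Tax Reporting"],
        "Path": [
            "Finance",
            "Finance > Reporting",
            "Finance > Reporting > Tax Reporting",
        ],
    }
)
print(df)
```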

What I want to do is add four Level Name columns based on the Level [] ID columns, or replace the Level ID columns with them, like this:

Level 1 Name  Level 2 Name  Level 3 Name   Level 4 Name  Name           Path
Finance       null          null           null          Finance        Finance
Finance       Reporting     null           null          Reporting      Finance > Reporting
Finance       Reporting     Tax Reporting  null          Tax Reporting  Finance > Reporting > Tax Reporting

I would split the Path column on the delimiter, but in the real DataFrame it contains IDs rather than names.

How should I approach this?

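【Comments】:

  • "I would split the Path column on the delimiter, but in the real DataFrame it contains IDs rather than names." What does this mean? Do you have something like "Finance > 4 > 5" in the Path column?

Tags: python pandas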


【Solution 1】:

The logic is unclear. In particular, what is the source of the final values? See the two different options below.

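Assuming the source is df['Name']:
# Relies on row k holding the name for level k+1, as in the sample data
cols = df.filter(like='Level ').columns   # the "Level N ID" columns
names = df['Name'].values                 # one name per row
mask = df[cols[:len(names)]].notna()      # True where an ID is present

# bool * str gives the string for True and '' for False; where() then sets the False cells to NaN
df[cols[:len(names)]] = mask.mul(names, axis=1).where(mask)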

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance        NaN            NaN         NaN        Finance                              Finance
1    Finance  Reporting            NaN         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting
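
The same masking idea can be written with `numpy.where`, which some readers may find easier to follow than multiplying booleans by strings. This is a sketch rather than the answer's own code. It carries the same assumption as the option above: row k supplies the name for level k+1, so there are no more rows than Level columns. The `filled` and `used` names are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "null" is stored as NaN.
df = pd.DataFrame(
    {
        "Level 1 ID": [1, 1, 1],
        "Level 2 ID": [np.nan, 4, 4],
        "Level 3 ID": [np.nan, np.nan, 5],
        "Level 4 ID": [np.nan, np.nan, np.nan],
        "Name": ["Finance", "Reporting", "Tax Reporting"],
    }
)

level_cols = df.filter(like="Level ").columns
names = df["Name"].to_numpy(dtype=object)  # assumption: names[k] names level k+1
used = level_cols[: len(names)]            # only as many columns as there are names
mask = df[used].notna().to_numpy()         # True where an ID is present

# names is broadcast across the columns: column k receives names[k]
# wherever an ID is present, and NaN everywhere else.
filled = pd.DataFrame(np.where(mask, names, np.nan), columns=used, index=df.index)
print(filled)
```

Building a separate `filled` frame leaves the original ID columns untouched, which makes it easier to compare IDs and names side by side before overwriting anything.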
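If you want to extract from 'Path' instead:
cols = df.filter(like='Level ').columns
# Split each path into one column per level; missing levels become None
names = df['Path'].str.split(' > ', expand=True)

# Overwrite as many Level columns as the deepest path has parts
df.loc[:, cols[:names.shape[1]]] = names.to_numpy()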

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance       None           None         NaN        Finance                              Finance
1    Finance  Reporting           None         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting
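
Neither option covers the case the question mentions, where the real Path holds IDs instead of names. One possible approach is an ID-to-name lookup, sketched below. It is not part of the original answer and rests on two assumptions that may not hold for the real data: the deepest non-null Level ID in a row is that row's own ID, and IDs are unique across the whole hierarchy. The names `own_id`, `id_to_name`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "null" is stored as NaN.
df = pd.DataFrame(
    {
        "Level 1 ID": [1, 1, 1],
        "Level 2 ID": [np.nan, 4, 4],
        "Level 3 ID": [np.nan, np.nan, 5],
        "Level 4 ID": [np.nan, np.nan, np.nan],
        "Name": ["Finance", "Reporting", "Tax Reporting"],
        "Path": [
            "Finance",
            "Finance > Reporting",
            "Finance > Reporting > Tax Reporting",
        ],
    }
)

id_cols = list(df.filter(like="Level ").columns)

# Assumption: the deepest non-null ID in each row is that row's own ID.
# Forward-filling along the row pushes it into the last column.
own_id = df[id_cols].ffill(axis=1).iloc[:, -1]

# Assumption: IDs are unique, so each ID maps to exactly one name.
id_to_name = dict(zip(own_id, df["Name"]))

# Translate every ID column through the lookup; missing IDs stay NaN.
names = df[id_cols].apply(lambda col: col.map(id_to_name))
names.columns = [c.replace(" ID", " Name") for c in id_cols]

result = pd.concat([names, df[["Name", "Path"]]], axis=1)
print(result)
```

If an ancestor ID never appears as a row of its own, its name cannot be recovered this way and the corresponding cell stays NaN.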

【Comments】:

  • df['Path'].str.split(' > ', expand=True).reindex(columns=range(4)).fillna('').rename(columns=lambda x: f'Level {x+1} Name')
  • @Corralien I thought of something like that, but it depends on the columns being formatted correctly. But yes, that should work too ;)
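
To see what the one-liner from the first comment produces, here is a small sketch that runs it on the sample paths and joins the result back onto the frame. The variable names `level_names` and `result` are illustrative, and the fixed `range(4)` assumes the hierarchy is at most four levels deep.

```python
import pandas as pd

# Only the Path column is needed for this variant.
df = pd.DataFrame(
    {
        "Name": ["Finance", "Reporting", "Tax Reporting"],
        "Path": [
            "Finance",
            "Finance > Reporting",
            "Finance > Reporting > Tax Reporting",
        ],
    }
)

level_names = (
    df["Path"].str.split(" > ", expand=True)  # one column per path part
    .reindex(columns=range(4))                # always four level columns
    .fillna("")                               # empty string for unused levels
    .rename(columns=lambda x: f"Level {x + 1} Name")
)

# Put the new name columns in front of the original ones.
result = level_names.join(df)
print(result)
```

Unlike the options in the answer, this fills unused levels with empty strings rather than NaN. Drop the `fillna("")` step if missing values are preferred.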