【Question title】: Get node ancestors in a Pandas dataframe
【Posted】: 2019-11-13 00:06:27
【Question】:

I have a dataframe that looks like this:

name               parent_id       id
languages                  0        1
cyrillic script            1        2       
latin script               1        3
bulgarian                  2        4
russian                    2        5
czech                      3        6
polish                     3        7

I use this command to get the parent name from the parent ID:

df['parent_name'] = df['parent_id'].map(df.set_index('id')['name'])
print(df)

name               parent_id       id            parent_name
russian                    2        5            cyrillic script
czech                      3        6            latin script
polish                     3        7            latin script
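
For readers unfamiliar with the idiom, here is a minimal, self-contained sketch of why that one-liner works. It assumes the sample data from the table above; the variable name `id_to_name` is only illustrative. `set_index('id')['name']` yields a Series keyed by id, and `Series.map` looks up each `parent_id` in it. An id with no match (the root's `parent_id` of 0 here) comes back as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['languages', 'cyrillic script', 'latin script',
             'bulgarian', 'russian', 'czech', 'polish'],
    'parent_id': [0, 1, 1, 2, 2, 3, 3],
    'id': [1, 2, 3, 4, 5, 6, 7],
})

# Series indexed by id, holding each node's name (illustrative variable name)
id_to_name = df.set_index('id')['name']

# Look up every parent_id in that Series; unmatched ids become NaN
df['parent_name'] = df['parent_id'].map(id_to_name)
print(df)
```

This only resolves one level of the hierarchy, which is why a loop or recursion is needed for the full ancestor list.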

However, I would also like to recursively get a list of all ancestors of each node, for example:

name               parent_id       id            path
languages                  0        1            []
...
russian                    2        5            ['languages', 'cyrillic script']
czech                      3        6            ['languages', 'latin script']
polish                     3        7            ['languages', 'latin script']

In my case, the order of the ancestors in the list does not matter.

Is this possible?

【Comments】:

    Tags: python pandas numpy


    【Solution 1】:

    I suggest using a recursive function to build the path of ids, then applying it to the dataframe's id column.

    import pandas as pd

    df = pd.DataFrame({'name': ['languages',
                                'cyrillic script',
                                'latin script',
                                'bulgarian',
                                'russian',
                                'czech',
                                'polish'],
                       'parent_id': [0, 1, 1, 2, 2, 3, 3],
                       'id': [1, 2, 3, 4, 5, 6, 7]})

    # Lookup tables: id -> parent id, and id -> name
    dict_id = df.set_index('id')['parent_id'].to_dict()
    dict_name = df.set_index('id')['name'].to_dict()

    def get_parent_id(anc):
        # Accept either a single id or a partial path (a list of ids)
        anc = [anc] if not isinstance(anc, list) else anc

        # 0 means "no parent", so the path is complete
        if anc[-1] == 0:
            return anc

        # Otherwise append the path that starts at the last id's parent
        return anc + get_parent_id([dict_id[anc[-1]]])

    # path_id includes the node's own id and the terminating 0
    df['path_id'] = df['id'].apply(get_parent_id)
    # Convert ids to names, dropping the node itself and the 0 sentinel
    df['path'] = df.apply(lambda x: [dict_name[id_] for id_ in x['path_id']
                                     if not (id_ == x['id'] or id_ == 0)], axis=1)
    
    Out[237]: 
                  name  parent_id  id       path_id                          path
    0        languages          0   1        [1, 0]                            []
    1  cyrillic script          1   2     [2, 1, 0]                   [languages]
    2     latin script          1   3     [3, 1, 0]                   [languages]
    3        bulgarian          2   4  [4, 2, 1, 0]  [cyrillic script, languages]
    4          russian          2   5  [5, 2, 1, 0]  [cyrillic script, languages]
    5            czech          3   6  [6, 3, 1, 0]     [latin script, languages]
    6           polish          3   7  [7, 3, 1, 0]     [latin script, languages]
    
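The output above lists ancestors nearest-first, while the question's example lists them root-first. The following is a minimal iterative sketch of the same idea, not part of the original answer. It assumes the same sample data and that `parent_id` 0 (or any id absent from the `id` column) marks the root; the helper `ancestor_names` and the dictionaries `parent_of` and `name_of` are hypothetical names. It avoids recursion and raises on a cyclic parent chain instead of looping forever.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['languages', 'cyrillic script', 'latin script',
             'bulgarian', 'russian', 'czech', 'polish'],
    'parent_id': [0, 1, 1, 2, 2, 3, 3],
    'id': [1, 2, 3, 4, 5, 6, 7],
})

parent_of = df.set_index('id')['parent_id'].to_dict()  # id -> parent id
name_of = df.set_index('id')['name'].to_dict()         # id -> name

def ancestor_names(node_id):
    """Return the names of node_id's ancestors, root first (hypothetical helper)."""
    path = []
    seen = {node_id}
    current = parent_of.get(node_id)
    # Walk upward until we reach an id that is not in the table (0 here)
    while current in name_of:
        if current in seen:
            raise ValueError(f'cycle detected at id {current}')
        seen.add(current)
        path.append(name_of[current])
        current = parent_of.get(current)
    return path[::-1]  # collected nearest-first, so reverse to get root-first

df['path'] = df['id'].map(ancestor_names)
print(df[['name', 'path']])
```

Since the question states that order does not matter, the final reversal is optional; it is included only to match the example layout in the question.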

    【Comments】:
