【Question title】: Recurse Over Excel File To Find Top Level Item From Tree Structure
【Posted】: 2018-08-26 18:43:21
【Question description】:

I'm trying to recurse over a data set to find the highest-level item, i.e. the item that has no parent.

The structure looks like this:

╔════════════╦════════════╗
║    Item    ║  Material  ║
╠════════════╬════════════╣
║ 2094-00003 ║ MHY00007   ║
║ 2105-0001  ║ 2105-0002  ║
║ 2105-0002  ║ 2105-1000  ║
║ 2105-1000  ║ 2105-1003  ║
║ 2105-1003  ║ 7547-122   ║
║ 7932-00001 ║ 7932-00015 ║
║ 7932-00002 ║ 7932-00015 ║
║ 7932-00010 ║ MHY00007   ║
║ 7932-00015 ║ 7932-05000 ║
║ 7932-05000 ║ MHY00007   ║
╚════════════╩════════════╝

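So, for example, if I select 7547-122, the function should return 2105-0001. The function walks up the tree recursively: 7547-122 -> 2105-1003 -> 2105-1000 -> ... -> 2105-0001.

When I run my code, I can only get it to return a single top level, but as the MHY00007 case shows, there are sometimes several. How can I return a list of all the top levels that a given material has?

My code: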

import pandas as pd


class BillOfMaterials:

    def __init__(self, bom_excel_path):
        self.df = pd.read_excel(bom_excel_path)
        self.df = self.df[['Item', 'Material']]

    def find_parents(self, part_number):
        material_parent_search = self.df[self.df.Material == part_number]

        parents = list(set(material_parent_search['Item']))

        return parents

    def find_top_levels(self, parents):

        top_levels = self.__ancestor_finder_([parents])

        print(f'{parents} top level is {top_levels}')
        return {parents: top_levels}

    def __ancestor_finder_(self, list_of_items):

        for ancestor in list_of_items:
            print(f'Searching for ancestors of {ancestor}')
            ancestors = self.find_parents(ancestor)
            print(f'{ancestor} has ancestor(s) {ancestors}')

            if not ancestors:
                return ancestor
            else:
                highest_level = self.__ancestor_finder_(ancestors)
        return highest_level


BOM = BillOfMaterials(bom_excel_path="Path/To/Excel/File/BOM.xlsx")

ItemsToSearch = ['7547-122', 'MHY00007']

top_levels = []
for item in ItemsToSearch:
    top_levels.append(BOM.find_top_levels(item))

【Comments】:

    Tags: python python-3.x pandas recursion tree


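    【Solution 1】:

    Recursing over a pandas DataFrame will be slower than using a dict.

    For performance, I suggest building a dictionary and writing a simple function that walks your tree structure iteratively. Here is an example.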

    import pandas as pd
    
    df = pd.DataFrame({'Item': ['2094-00003', '2105-0001', '2105-0002', '2105-1000',
                                '2105-1003', '7932-00001', '7932-00002', '7932-00010',
                                '7932-00015', '7932-05000'],
                       'Material': ['MHY00007', '2105-0002', '2105-1000', '2105-1003',
                                    '7547-122', '7932-00015', '7932-00015', 'MHY00007',
                                    '7932-05000', 'MHY00007']})
    
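    # Item -> Material, i.e. parent -> child
    parent_child = df.set_index('Item')['Material'].to_dict()
    # Invert to Material -> Item, i.e. child -> parent.
    # Note: a material with several parents keeps only the last one seen.
    child_parent = {v: k for k, v in parent_child.items()}
    
    def get_all_parents(x):
        # Walk up the tree, yielding each ancestor in turn
        while x in child_parent:
            x = child_parent[x]
            yield x
    
    def get_grand_parent(x):
        last = x  # fall back to x itself when it has no parent
        for last in get_all_parents(x):
            pass
        return last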
    
    get_grand_parent('7547-122')
    # '2105-0001'
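
The inverted dict above holds one parent per material, so for MHY00007 it follows a single branch. A minimal sketch of a variant that keeps every parent is given here, assuming the same Item/Material rows as the question. The names `ROWS`, `build_parent_lookup` and `get_top_levels` are made up for illustration, and the rows are written as plain tuples so the sketch runs without pandas (with a DataFrame they could come from `zip(df['Item'], df['Material'])`).

```python
from collections import defaultdict

# (Item, Material) rows copied from the question's table.
ROWS = [
    ('2094-00003', 'MHY00007'),
    ('2105-0001', '2105-0002'),
    ('2105-0002', '2105-1000'),
    ('2105-1000', '2105-1003'),
    ('2105-1003', '7547-122'),
    ('7932-00001', '7932-00015'),
    ('7932-00002', '7932-00015'),
    ('7932-00010', 'MHY00007'),
    ('7932-00015', '7932-05000'),
    ('7932-05000', 'MHY00007'),
]


def build_parent_lookup(rows):
    """Map each material to the list of items that use it directly."""
    parents_of = defaultdict(list)
    for item, material in rows:
        parents_of[material].append(item)
    return dict(parents_of)


def get_top_levels(x, parents_of):
    """Return the sorted top-level ancestors of x (x itself if it has no parent)."""
    top_levels = set()
    seen = set()       # nodes already expanded; avoids repeated work and loops
    stack = [x]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = parents_of.get(node, [])
        if parents:
            stack.extend(parents)   # keep climbing
        else:
            top_levels.add(node)    # nothing above this node
    return sorted(top_levels)


parents_of = build_parent_lookup(ROWS)
print(get_top_levels('7547-122', parents_of))
# ['2105-0001']
print(get_top_levels('MHY00007', parents_of))
# ['2094-00003', '7932-00001', '7932-00002', '7932-00010']
```

The lookup is still a plain dict, so each step up the tree is a constant-time access. If the data contained a cycle with no top-level item, this sketch would return an empty list for the items inside it.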
    

    【Comments】:

      【Solution 2】:

      Yes, you can do this recursively, for example:

      import pandas as pd
      
      
      class BillOfMaterials:
      
          def __init__(self, bom_excel_path):
              self.df = pd.read_excel(bom_excel_path)
              self.df = self.df[['Item', 'Material']]
      
          def find_parents(self, part_number):
              return list(set(self.df[self.df.Material == part_number]['Item']))
      
          def find_top_levels(self, item):
              parents = self.find_parents(item)
              if not parents:
                  # there are no parent items => this item is a top-level item
                  return [item]
              else:
                  # there are parent items => recursively find grandparents
                  grandparents = []
                  for parent in parents:
                      grandparents = grandparents + self.find_top_levels(parent)
                  return grandparents
      
      
      if __name__ == '__main__':
          BOM = BillOfMaterials(bom_excel_path="testdata.xlsx")
          ItemsToSearch = ['7547-122', 'MHY00007']
      
          for i in ItemsToSearch:
              print('')
              print('The top levels of ' + i + ' are: ')
              print(BOM.find_top_levels(i))
      

      Note the recursive call to self.find_top_levels(parent). This gives the output:

      The top levels of 7547-122 are: 
      ['2105-0001']
      
      The top levels of MHY00007 are: 
      ['2094-00003', '7932-00001', '7932-00002', '7932-00010']
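
One thing to watch with list concatenation: if a material reaches the same top-level item through two different branches, that item shows up twice in the result. A rough sketch of the same recursion that collects a set instead is shown here. It works on a plain dict rather than the DataFrame, and the function signature, the `_path` argument and the `diamond` data are assumptions made up for illustration, not part of the class above.

```python
def find_top_levels(item, parents_of, _path=frozenset()):
    """Recursively collect the top-level ancestors of item as a set.

    parents_of maps a material to the items that use it directly.
    _path holds the items on the current branch and is used to detect cycles.
    """
    if item in _path:
        raise ValueError(f'cycle detected at {item!r}')
    parents = parents_of.get(item, ())
    if not parents:
        # no parent items => this item is a top-level item
        return {item}
    top_levels = set()
    for parent in parents:
        # set union drops duplicates reached through different branches
        top_levels |= find_top_levels(parent, parents_of, _path | {item})
    return top_levels


# Hypothetical data: D is used in both B and C, and both are used in A.
diamond = {'D': ['B', 'C'], 'B': ['A'], 'C': ['A']}
print(find_top_levels('D', diamond))
# {'A'}
```

With the list-based version the same input would give ['A', 'A']. The cycle check is optional for a well-formed bill of materials, but it turns an endless recursion into a clear error if the data is inconsistent.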
      

      【Comments】:
