【Question Title】:Python pandas: How to create a list by matching elements in one column with another column
【Posted】:2019-06-10 19:16:49
【Question】:

Given a DataFrame with:

     Locations      Locations 
        1              2
        1              3
        2              7
        2              8
        7              11

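The locations come in pairs: for example, birds at location 1 fly to location 2, but they can also fly to location 3. From location 2 they then fly to location 7, and from there to 11.

I want to create lists that chain these pairs together, without repeated elements, in an efficient way.

Expected sample output: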

     [1,2,7,11]
     [1,3]
     [2,8]
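For anyone who wants to reproduce the answers below, the sample frame can be rebuilt roughly like this. This is only a sketch: the second column name `Locations.1` is an assumption, chosen because one of the answers refers to it and because it is the name pandas assigns to a duplicated header when reading a file.

```python
import pandas as pd

# Sample pairs from the question; "Locations.1" is an assumed name
# for the second (duplicate) "Locations" column.
df = pd.DataFrame(
    [[1, 2], [1, 3], [2, 7], [2, 8], [7, 11]],
    columns=["Locations", "Locations.1"],
)
print(df)
```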

【Comments】:

  • networkx

Tags: python pandas list dataframe linked-list


【Solution 1】:

Create a dictionary of lists to represent the graph

g = {}
for _, l0, l1 in df.itertuples():
    g.setdefault(l0, []).append(l1)

print(g)

{1: [2, 3], 2: [7, 8], 7: [11]}
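As a possible alternative to the explicit loop, the same adjacency dictionary can probably be built with a `groupby`. A minimal sketch, assuming the two columns are named `Locations` and `Locations.1` (the second name is an assumption, not something stated in the question):

```python
import pandas as pd

# Assumed column names; adjust to match the real frame.
df = pd.DataFrame(
    [[1, 2], [1, 3], [2, 7], [2, 8], [7, 11]],
    columns=["Locations", "Locations.1"],
)

# Collect every destination per source location into a list,
# then turn the resulting Series into a plain dict.
g = df.groupby("Locations")["Locations.1"].apply(list).to_dict()
print(g)
```

The loop version has the advantage of not depending on the column names at all.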

Then define a recursive function to traverse the graph

def paths(graph, nodes, path=None):
    if path is None:
        path = []

    for node in nodes:
        new_path = path + [node]
        if node not in graph:
            yield new_path
        else:
            yield from paths(graph, graph[node], new_path)

roots = g.keys() - set().union(*g.values())

p = [*paths(g, roots)]
print(*p, sep='\n')

[1, 2, 7, 11]
[1, 2, 8]
[1, 3]
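If the chains can get long enough to hit Python's recursion limit, the same traversal can be written with an explicit stack. This is a sketch rather than part of the original answer; the name `iter_paths` is made up for illustration, and like the recursive version it assumes the graph has no cycles.

```python
def iter_paths(graph, roots):
    """Yield every root-to-leaf path using an explicit stack instead of recursion."""
    # Push roots in reverse sorted order so the smallest root is popped first.
    stack = [[root] for root in reversed(sorted(roots))]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1], [])
        if not children:
            # No outgoing edges: the path ends at a leaf.
            yield path
        else:
            # Reverse so children are visited in their stored order.
            for child in reversed(children):
                stack.append(path + [child])


g = {1: [2, 3], 2: [7, 8], 7: [11]}
roots = g.keys() - set().union(*g.values())
print(list(iter_paths(g, roots)))
# [[1, 2, 7, 11], [1, 2, 8], [1, 3]]
```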

【Comments】:

    【Solution 2】:

    You may want to use a DiGraph from networkx

    import networkx as nx
    
    # The frame only has the source and target columns, so no edge_attr is passed.
    G = nx.from_pandas_edgelist(df, source='Locations',
                                target='Locations.1',
                                create_using=nx.DiGraph())
    roots = list(v for v, d in G.in_degree() if d == 0)
    leaves = list(v for v, d in G.out_degree() if d == 0)
    
    [nx.shortest_path(G, x, y) for y in leaves for x in roots]
    
    Out[58]: [[1, 3], [1, 2, 8], [1, 2, 7, 11]]
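One caveat: `nx.shortest_path` raises `NetworkXNoPath` when a leaf cannot be reached from a given root, which can happen once the data contains more than one disconnected group of locations. A hedged sketch of guarding against that with `nx.has_path`; the extra edge `(20, 21)` is hypothetical and only there to create a second component:

```python
import networkx as nx

# Sample edges plus one hypothetical, disconnected pair (20, 21).
G = nx.DiGraph([(1, 2), (1, 3), (2, 7), (2, 8), (7, 11), (20, 21)])

roots = [v for v, d in G.in_degree() if d == 0]
leaves = [v for v, d in G.out_degree() if d == 0]

# Only ask for a path when the leaf is actually reachable from the root.
paths = [
    nx.shortest_path(G, r, l)
    for r in roots
    for l in leaves
    if nx.has_path(G, r, l)
]
print(paths)
```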
    

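    【Comments】:

    • Nice answer @WeNYoBen
    【Solution 3】:

    I found this way to solve your problem without involving any graphs. However, the loop drops rows from the DataFrame as it goes, so work on a copy if you need the original afterwards. Your data must also be sorted the way it is in your example.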

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame(columns=["loc1","Loc2"],data=[[1,2],[1,3],[2,7],[2,8],[7,11]])
    
    res = []
    n = -1
    m = -1
    x = 0
    for i in df.values:
        if x in df.index:  # test whether row x has already been deleted
            res.append(i.tolist())  # start a new chain with this pair

            m = m + 1  # m is the index of the current chain in res
            tmp = i[1]
            for j in df.values:
                n = n + 1  # n is the position of row j in the current df
                if j[0] == tmp:
                    res[m].append(j[1])
                    df = df.drop(df.index[n])  # delete the row the value was taken from
                    tmp = res[m][-1]
                    n = n - 1  # account for the row that was just dropped

            n = -1
        x = x + 1
    print(res)
    
    [[1, 2, 7, 11], [1, 3], [2, 8]]
    

    I know it is not the prettiest, but it does work.
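A similar greedy chaining can be sketched without mutating the DataFrame, by tracking which pairs have already been used. This is written in the same spirit as the loop above and is not a line-for-line equivalent; `chain_pairs` is a made-up name, and it relies on the same assumption that the pairs are sorted as in the example.

```python
def chain_pairs(pairs):
    """Greedily chain (source, target) pairs, using each pair at most once."""
    pairs = list(pairs)
    used = [False] * len(pairs)
    chains = []
    for i, (src, dst) in enumerate(pairs):
        if used[i]:
            continue  # this pair was already absorbed into an earlier chain
        used[i] = True
        chain = [src, dst]
        # Single pass over the pairs: extend the chain whenever an unused
        # pair starts where the chain currently ends.
        for j, (s, t) in enumerate(pairs):
            if not used[j] and s == chain[-1]:
                chain.append(t)
                used[j] = True
        chains.append(chain)
    return chains


pairs = [(1, 2), (1, 3), (2, 7), (2, 8), (7, 11)]
print(chain_pairs(pairs))
# [[1, 2, 7, 11], [1, 3], [2, 8]]
```

With a DataFrame, the pairs could be passed in as `df.itertuples(index=False)`.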

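    【Comments】:

      【Solution 4】:

      This may be more than you asked for, but this problem lends itself well to a graph built with networkx. You can search for all simple paths between every pair of nodes (locations) in the directed graph defined by the DataFrame: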

      import networkx as nx
      from itertools import combinations
      
      # Create graph from dataframe of pairs (edges)
      G = nx.DiGraph()
      G.add_edges_from(df.values)
      
      # Find paths
      paths = []
      for pair in combinations(G.nodes(), 2):
          paths.extend(nx.all_simple_paths(G, source=pair[0], target=pair[1]))
          paths.extend(nx.all_simple_paths(G, source=pair[1], target=pair[0]))
      

      paths:

      [[1, 2],
       [1, 3],
       [1, 2, 7],
       [1, 2, 8],
       [1, 2, 7, 11],
       [2, 7],
       [2, 8],
       [2, 7, 11],
       [7, 11]]
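If only the complete routes are wanted, the list above can be narrowed to paths that start at a location with no incoming pair and end at a location with no outgoing pair. A small sketch in plain Python, with the edge list and the paths copied from this thread; the variable names are illustrative only:

```python
edges = [(1, 2), (1, 3), (2, 7), (2, 8), (7, 11)]
paths = [[1, 2], [1, 3], [1, 2, 7], [1, 2, 8], [1, 2, 7, 11],
         [2, 7], [2, 8], [2, 7, 11], [7, 11]]

sources = {s for s, _ in edges}
targets = {t for _, t in edges}
roots = sources - targets    # locations nothing flies into
leaves = targets - sources   # locations nothing flies out of

# Keep only the paths that run from a root all the way to a leaf.
full_routes = [p for p in paths if p[0] in roots and p[-1] in leaves]
print(full_routes)
# [[1, 3], [1, 2, 8], [1, 2, 7, 11]]
```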
      

      【Comments】:
