【Question title】: Pair-wise set intersection between column values in a data frame
【Posted】: 2021-08-02 22:09:39
【Question】:

I have a data frame with a single column. Each value in this column is a list. For example,

     A
0   [1, 3, 4]
1   [43, 1, 42]
2   [50, 3]

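I want to perform a set intersection between every pair of lists to find their common elements, and produce a data frame like the following.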

    0           1           2 
0   [1, 3, 4]   [1]         [3]
1   [1]         [43, 1, 42] []
2   [3]         []          [50, 3]

Is there an elegant way to do this without looping?

【Comments】:

    Tags: python pandas dataframe set set-intersection


    【Solution 1】:

    We can use apply(set) to convert every value in column A to a set, then broadcast the set intersection:

    import pandas as pd
    
    df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
    
    # Convert to set
    a = df['A'].apply(set).values
    # Broadcast set intersection
    new_df = pd.DataFrame(a[:, None] & a)
    

    new_df:

               0            1        2
    0  {1, 3, 4}          {1}      {3}
    1        {1}  {1, 42, 43}       {}
    2        {3}           {}  {50, 3}
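
    To make the broadcasting step easier to follow, here is a small sketch that compares it with an explicit pairwise computation. The names broadcast and explicit are only illustrative and are not part of the answer above. The nested comprehension is for comparison only; it still loops in Python.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Same conversion as in the answer: a 1-D object array of sets
a = df['A'].apply(set).values

# a[:, None] has shape (3, 1) and a has shape (3,), so `&` is applied
# element-wise over the broadcast (3, 3) grid of set pairs
broadcast = a[:, None] & a

# Explicit pairwise intersections (illustrative only)
explicit = [[x & y for y in a] for x in a]

print(broadcast.shape)                 # (3, 3)
print(broadcast.tolist() == explicit)  # True
```

    Because the array has object dtype, NumPy still calls the set `&` operator once per pair, so this mainly shortens the code rather than changing the amount of work done.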
    

    Alternatively, if lists are needed, np.vectorize can be used to convert the results back to list (it can also replace apply for the conversion to set):

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
    
    # Convert to set (using vectorize instead of apply):
    a = np.vectorize(set, otypes=['O'])(df['A'])
    # Broadcast set intersection and convert back to list
    new_df = pd.DataFrame(
        np.vectorize(list, otypes=['O'])(a[:, None] & a)
    )
    

    new_df:

               0            1        2
    0  [1, 3, 4]          [1]      [3]
    1        [1]  [1, 42, 43]       []
    2        [3]           []  [50, 3]
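
    Converting a set to a list does not guarantee any particular element order. If a deterministic order matters, one possible variation (an assumption on my part, not something the question asked for) is to use sorted in place of list, and to reuse the original index as row and column labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# Convert each list to a set, as in the answer
a = np.vectorize(set, otypes=['O'])(df['A'])

# Broadcast the intersection, then turn each set into a sorted list.
# Labelling rows and columns with df.index is an illustrative choice.
new_df = pd.DataFrame(
    np.vectorize(sorted, otypes=['O'])(a[:, None] & a),
    index=df.index,
    columns=df.index,
)

print(new_df.loc[2, 2])  # [3, 50]
```

    This assumes the list elements are mutually comparable, which holds for the integers in the example.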
    

    【Comments】:
