【Question Title】: Combining Pandas values into membership groups
【Posted】: 2018-03-22 23:29:22
【Question Description】:

I'm trying to group the numbers from two Pandas columns into membership groups. Here is what I have so far:

import pandas as pd
df = pd.DataFrame({'A':[0, 1, 3, 4, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 11, 12, 13, 14, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 21, 22, 24, 25, 26, 27, 28, 29, 29],
               'B':[1, 0, 4, 3, 7, 6, 112, 9, 114, 134, 135, 112, 8, 114, 14, 13, 12, 11, 16, 17, 18, 17, 15, 18, 19, 16, 18, 15, 19, 17, 16, 15, 19, 20, 20, 18, 17, 16, 19, 18, 22, 21, 25, 24, 27, 26, 29, 28, 30]})   

df = df.groupby('A')['B'].apply(lambda x: list(set(x))).reset_index()

^ Credit to jezrael

df['A'] = df['A'].apply(lambda x: [x])                  # wrap each A value in a one-element list
df_new = pd.DataFrame({'Combined': df['A'] + df['B']})  # list + list concatenates row by row
df_new['Combined'] = df_new['Combined'].apply(sorted)   # sort each combined list

This combines each number in column A with its grouped values from column B and sorts the result:

                       Combined
0                       [0, 1]
1                       [0, 1]
2                       [3, 4]
3                       [3, 4]
4                       [6, 7]
5                       [6, 7]
6             [8, 9, 112, 114]
7   [8, 9, 112, 114, 134, 135]
8                     [11, 14]
9                     [12, 13]
10                    [12, 13]
11                    [11, 14]
12            [15, 16, 17, 18]
13        [15, 16, 17, 18, 19]
14        [15, 16, 17, 18, 19]
15    [15, 16, 17, 18, 19, 20]
16        [16, 17, 18, 19, 20]
17                [18, 19, 20]
18                    [21, 22]
19                    [21, 22]
20                    [24, 25]
21                    [24, 25]
22                    [26, 27]
23                    [26, 27]
24                    [28, 29]
25                [28, 29, 30]
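The combining steps above can also be folded into a single pass over the groups. This is a minimal sketch, assuming the same input as in the question; `combined` is a hypothetical name, and the result should hold the same 26 sorted lists as df_new['Combined']:

```python
import pandas as pd

# Same input as in the question.
df = pd.DataFrame({'A': [0, 1, 3, 4, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 11, 12, 13, 14, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 21, 22, 24, 25, 26, 27, 28, 29, 29],
                   'B': [1, 0, 4, 3, 7, 6, 112, 9, 114, 134, 135, 112, 8, 114, 14, 13, 12, 11, 16, 17, 18, 17, 15, 18, 19, 16, 18, 15, 19, 17, 16, 15, 19, 20, 20, 18, 17, 16, 19, 18, 22, 21, 25, 24, 27, 26, 29, 28, 30]})

# Iterating a SeriesGroupBy yields (key, group) pairs: merge each A value
# with its set of B partners and sort, giving one list per distinct A value.
combined = pd.Series(
    [sorted(set(b) | {a}) for a, b in df.groupby('A')['B']],
    name='Combined',
)
print(combined)
```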

How can I remove the duplicate lists from df_new? Could I perhaps convert the lists to string values?
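Converting to strings can work as a deduplication key, provided every list is already sorted and uses consistent element types ([0, 1] and [1, 0], or [1] and [1.0], would give different strings). A minimal sketch on a small hypothetical Series standing in for df_new['Combined']: the strings only decide which rows to drop, and the original lists are kept:

```python
import pandas as pd

# Hypothetical stand-in for df_new['Combined'].
combined = pd.Series([[0, 1], [0, 1], [3, 4], [8, 9, 112, 114], [3, 4]], name='Combined')

# Lists are unhashable, so build a hashable string key per row ...
key = combined.astype(str)
# ... and keep the first row for each distinct key; the lists themselves are untouched.
unique_lists = combined[~key.duplicated()]
print(unique_lists)
```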

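Most importantly, I want to take each value from the original col_A and associate it with the most inclusive of the combined lists it belongs to. For example, the number 8 in df's col_A should be associated with row 7 of df_new's Combined column, since that row holds the most inclusive list containing 8: [8, 9, 112, 114, 134, 135].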
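One way to do this association is sketched below, assuming "most inclusive" means "longest list that contains the value". The `groups` list is the set of distinct lists from the Combined output above, and `most_inclusive` is a hypothetical helper, not a pandas function:

```python
import pandas as pd

# Distinct lists from the Combined output above (duplicates removed).
groups = [
    [0, 1], [3, 4], [6, 7],
    [8, 9, 112, 114], [8, 9, 112, 114, 134, 135],
    [11, 14], [12, 13],
    [15, 16, 17, 18], [15, 16, 17, 18, 19], [15, 16, 17, 18, 19, 20],
    [16, 17, 18, 19, 20], [18, 19, 20],
    [21, 22], [24, 25], [26, 27], [28, 29], [28, 29, 30],
]

def most_inclusive(value, groups):
    """Hypothetical helper: the longest list in `groups` that contains `value`.

    Ties go to the first such list; returns None if no list contains the value.
    """
    candidates = [g for g in groups if value in g]
    return max(candidates, key=len) if candidates else None

# Column A of the original input.
df = pd.DataFrame({'A': [0, 1, 3, 4, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 11, 12, 13, 14,
                         15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18,
                         19, 19, 19, 19, 20, 20, 21, 22, 24, 25, 26, 27, 28, 29, 29]})

# Attach the most inclusive list to every row of the original column A.
df['Group'] = df['A'].map(lambda v: most_inclusive(v, groups))
print(df[df['A'].isin([8, 16, 29])])
```

Picking the longest list is only one reading of "most inclusive". If two lists containing a value overlap without either containing the other, the longest one is not the full membership group; in that case a connected-components pass over the (A, B) pairs, for example union-find or networkx.connected_components on a graph built from the pairs, would be the safer tool.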

Thanks for your help.

【Question Discussion】:

    Tags: python pandas


    【Solution 1】:

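    I'd suggest converting the column to a NumPy array, using np.unique to get the unique lists, and wrapping the result in a new DataFrame (rather than assigning it back to df_new["Combined"], whose 26-row index would not line up with the 17 unique lists), like this:

    import numpy as np
    unique_df = pd.DataFrame(np.unique(df_new["Combined"].to_numpy()))  # to_numpy() replaces as_matrix(), removed in pandas 1.0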
    
    #                              0
    # 0                       [0, 1]
    # 1                       [3, 4]
    # 2                       [6, 7]
    # 3             [8, 9, 112, 114]
    # 4   [8, 9, 112, 114, 134, 135]
    # 5                     [11, 14]
    # 6                     [12, 13]
    # 7             [15, 16, 17, 18]
    # 8         [15, 16, 17, 18, 19]
    # 9     [15, 16, 17, 18, 19, 20]
    # 10        [16, 17, 18, 19, 20]
    # 11                [18, 19, 20]
    # 12                    [21, 22]
    # 13                    [24, 25]
    # 14                    [26, 27]
    # 15                    [28, 29]
    # 16                [28, 29, 30]
    

    【Discussion】:

      【Solution 2】:

      You can convert each list to a tuple, call drop_duplicates, and then convert back to a list.

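      This is needed because pandas deduplicates with hash tables, which require hashable (immutable) elements. Tuples are hashable as long as their elements are; lists are not.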

      res = df_new['Combined'].map(tuple).drop_duplicates().map(list)
      
      # 0                         [0, 1]
      # 2                         [3, 4]
      # 4                         [6, 7]
      # 6               [8, 9, 112, 114]
      # 7     [8, 9, 112, 114, 134, 135]
      # 8                       [11, 14]
      # 9                       [12, 13]
      # 12              [15, 16, 17, 18]
      # 13          [15, 16, 17, 18, 19]
      # 15      [15, 16, 17, 18, 19, 20]
      # 16          [16, 17, 18, 19, 20]
      # 17                  [18, 19, 20]
      # 18                      [21, 22]
      # 20                      [24, 25]
      # 22                      [26, 27]
      # 24                      [28, 29]
      # 25                  [28, 29, 30]
      # Name: Combined, dtype: object
      

      【Discussion】:
