【Question title】: How to find all combinations of DataFrame rows?
【Posted】: 2020-12-24 13:02:44
【Question description】:

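I apologize if this is similar to other questions on this forum, but I could not find one that was close enough. I have a df with 9 columns and 3 rows, and I want to find all possible combinations across these rows. I tried using combinations from the itertools package, but I can't seem to get it to work. My desired output is a list of all possible combinations. Thanks, and sorry again if this duplicates another question.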

import pandas as pd
from itertools import combinations

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main3": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main4": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main5": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main6": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main7": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main8": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main9": ["Outcome1", "Outcome2", "Outcome3"]})

    Main1   Main2   Main3   Main4   Main5   Main6   Main7   Main8   Main9
0   Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1    Outcome1
1   Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2    Outcome2
2   Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3    Outcome3

all_combinations = list(combinations(df1, 3))

Edit: a smaller sample and the desired output:

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]}) 

The desired output looks like this:

[["Outcome1","Outcome1"], ["Outcome1","Outcome2"], ["Outcome1","Outcome3"], ["Outcome2","Outcome1"], ["Outcome2","Outcome2"], ["Outcome2","Outcome3"], ["Outcome3","Outcome1"], ["Outcome3","Outcome2"], ["Outcome3","Outcome3"]] 

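【Question discussion】:

  • What is your expected output?
  • Hi! A list of all outcome combinations. For example: the first combination would be only Outcome1 in every row, the second would be only Outcome2 in every row, the third would be Outcome2 in the first row and Outcome1 in every other row, and so on. Sorry for being unclear.
  • Maybe post a smaller sample and show the result for that sample.
  • Good idea. @RichieV I updated the post with a smaller sample and example output. Thanks!
  • Thanks. It looks like you want each output to contain as many items as there are columns in your df... but in your first example you were trying to get outputs of 3 items (as many as there are rows in the df)... which is it? Or it may be clearer if you post a real sample from your dataset.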

Tags: python pandas combinations


【Solution 1】:

You are looking for the Cartesian product of the list with itself.

from itertools import product

options = ['Outcome1', 'Outcome2', 'Outcome3']

result = product(options, options)
print(*result, sep='\n')

Output

('Outcome1', 'Outcome1')
('Outcome1', 'Outcome2')
('Outcome1', 'Outcome3')
('Outcome2', 'Outcome1')
('Outcome2', 'Outcome2')
('Outcome2', 'Outcome3')
('Outcome3', 'Outcome1')
('Outcome3', 'Outcome2')
('Outcome3', 'Outcome3')
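
Applied to the smaller sample from the question, the same idea can be sketched as below. This is a minimal sketch that assumes the goal is one value from each column; converting each tuple to a list is only there to match the list-of-lists output the question asks for. It also hints at why the `combinations(df1, 3)` attempt did not help: iterating over a DataFrame yields its column labels, not its values.

```python
from itertools import product

import pandas as pd

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"]})

# Iterating over a DataFrame walks its column labels, which is what
# combinations(df1, 3) was combining in the question.
column_labels = list(df1)

# Cartesian product of the two columns, with each tuple converted to a list.
all_combinations = [list(pair) for pair in product(df1["Main1"], df1["Main2"])]
print(all_combinations)
```

Because `product` takes the columns as separate arguments, the two columns do not need to hold the same values for this to work.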

【Discussion】:

    【Solution 2】:

    Use a list comprehension

    >>> [[i,j] for i in df1.Main1 for j in df1.Main2]
    [['Outcome1', 'Outcome1'], ['Outcome1', 'Outcome2'], ['Outcome1', 'Outcome3'],
     ['Outcome2', 'Outcome1'], ['Outcome2', 'Outcome2'], ['Outcome2', 'Outcome3'],
     ['Outcome3', 'Outcome1'], ['Outcome3', 'Outcome2'], ['Outcome3', 'Outcome3']]
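
If the combinations are wanted back as a DataFrame rather than a plain list, one pandas-native alternative is `pd.MultiIndex.from_product`. This is a different technique from the comprehension in this answer, shown only as a sketch; the variable names `index`, `combos`, and `as_lists` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"]})

# MultiIndex.from_product builds the Cartesian product of the given iterables.
index = pd.MultiIndex.from_product(
    [df1["Main1"].tolist(), df1["Main2"].tolist()],
    names=["Main1", "Main2"],
)

# Turn the index into a regular DataFrame with one row per combination.
combos = index.to_frame(index=False)
print(combos)

# The same rows as a list of lists, matching the comprehension's result.
as_lists = combos.values.tolist()
```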
    

    【Discussion】:

      【Solution 3】:

      Use itertools.product

      For the smaller DataFrame

      import numpy as np
      import pandas as pd
      from itertools import product
      from pprint import pprint as pp
      
      # Define the DataFrame
      df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"], "Main2": ["Outcome1", "Outcome2", "Outcome3"]})
      
      # Take the product of the column values.
      # After transposing, each row of the array is one of the original columns,
      # and all of them hold the same values, so we take the first one
      # and repeat it to get the desired product.
      all_combinations = list(product(np.transpose(df1.values)[0], repeat=2))
      
      # Show the result
      pp(all_combinations)
      

      Output

      [('Outcome1', 'Outcome1'),
       ('Outcome1', 'Outcome2'),
       ('Outcome1', 'Outcome3'),
       ('Outcome2', 'Outcome1'),
       ('Outcome2', 'Outcome2'),
       ('Outcome2', 'Outcome3'),
       ('Outcome3', 'Outcome1'),
       ('Outcome3', 'Outcome2'),
       ('Outcome3', 'Outcome3')]
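
The `repeat=2` shortcut relies on every column holding the same values. A minimal sketch of that assumption follows; the check `columns_identical` and the other variable names are illustrative and not part of the original answer.

```python
from itertools import product

import pandas as pd

df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                    "Main2": ["Outcome1", "Outcome2", "Outcome3"]})

first_column = df1["Main1"].tolist()

# The repeat shortcut is only valid when every column holds the same values.
columns_identical = all(df1[name].tolist() == first_column for name in df1.columns)

# repeat=2 is shorthand for passing the same iterable twice.
with_repeat = list(product(first_column, repeat=2))
spelled_out = list(product(first_column, first_column))
print(columns_identical, with_repeat == spelled_out)
```

If the columns differed, each column would have to be passed to `product` as its own argument instead.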
      

      For the original DataFrame

      df1 = pd.DataFrame({"Main1": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main2": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main3": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main4": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main5": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main6": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main7": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main8": ["Outcome1", "Outcome2", "Outcome3"],
                          "Main9": ["Outcome1", "Outcome2", "Outcome3"]})
      all_combinations = list(product(np.transpose(df1.values)[0], repeat=3))
      
      pp(all_combinations)
      

      Output

      [('Outcome1', 'Outcome1', 'Outcome1'),
       ('Outcome1', 'Outcome1', 'Outcome2'),
       ('Outcome1', 'Outcome1', 'Outcome3'),
       ('Outcome1', 'Outcome2', 'Outcome1'),
       ('Outcome1', 'Outcome2', 'Outcome2'),
       ('Outcome1', 'Outcome2', 'Outcome3'),
       ('Outcome1', 'Outcome3', 'Outcome1'),
       ('Outcome1', 'Outcome3', 'Outcome2'),
       ('Outcome1', 'Outcome3', 'Outcome3'),
       ('Outcome2', 'Outcome1', 'Outcome1'),
       ('Outcome2', 'Outcome1', 'Outcome2'),
       ('Outcome2', 'Outcome1', 'Outcome3'),
       ('Outcome2', 'Outcome2', 'Outcome1'),
       ('Outcome2', 'Outcome2', 'Outcome2'),
       ('Outcome2', 'Outcome2', 'Outcome3'),
       ('Outcome2', 'Outcome3', 'Outcome1'),
       ('Outcome2', 'Outcome3', 'Outcome2'),
       ('Outcome2', 'Outcome3', 'Outcome3'),
       ('Outcome3', 'Outcome1', 'Outcome1'),
       ('Outcome3', 'Outcome1', 'Outcome2'),
       ('Outcome3', 'Outcome1', 'Outcome3'),
       ('Outcome3', 'Outcome2', 'Outcome1'),
       ('Outcome3', 'Outcome2', 'Outcome2'),
       ('Outcome3', 'Outcome2', 'Outcome3'),
       ('Outcome3', 'Outcome3', 'Outcome1'),
       ('Outcome3', 'Outcome3', 'Outcome2'),
       ('Outcome3', 'Outcome3', 'Outcome3')]
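
The comments on the question ask whether each result should have as many items as there are rows (3) or columns (9). The output above contains the 27 triples produced by `repeat=3`. If the intent is instead one outcome per column, a sketch under that assumption is to pass every column to `product`, which gives 3**9 = 19683 combinations. Since `product` is lazy, they can be consumed without building the full list.

```python
from itertools import islice, product

import pandas as pd

outcomes = ["Outcome1", "Outcome2", "Outcome3"]
df1 = pd.DataFrame({f"Main{i}": outcomes for i in range(1, 10)})

# One iterable per column; product(*columns) picks one value from each.
columns = [df1[name].tolist() for name in df1.columns]

# Peek at the first few combinations without materializing all of them.
first_three = list(islice(product(*columns), 3))

# Count the combinations by iterating lazily.
total = sum(1 for _ in product(*columns))
print(total)
```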
      

      【Discussion】:
