Producing all permutations for a dataset: answers

【Question title】: Producing all permutations for a dataset
【Posted】: 2021-02-20 17:33:13
【Question description】:

I have a data frame that looks like this:

df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})

    Gene    test    Healthy
0   TP53    1       0
1   COX5    3       0
2   P16     0       2

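I have been trying to create every possible pairing of these values. The idea is to map the first gene, "TP53", together with its value in the "test" column, onto every gene in the table (itself included), and to record that second gene's value from the "Healthy" column.

For example, TP53 is first mapped to itself: TP53:TP53:1:0. Then TP53 is mapped to COX5, taking COX5's value from the Healthy column: TP53:COX5:1:0, followed by the next gene: TP53:P16:1:2. Next, gene COX5 is mapped using its own value from the "test" column and compared against the "Healthy" column: COX5:TP53:3:0, then COX5:COX5:3:0.

So the end result would be the following table: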

All_combinations
TP53:TP53:1:0
TP53:COX5:1:0
TP53:P16:1:2
COX5:TP53:3:0
COX5:COX5:3:0
COX5:P16:3:2
P16:TP53:0:0
P16:COX5:0:0
P16:P16:0:2

I tried the following code, but got stuck.

import pandas as pd
df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})
df2 = df1.transpose()
df2.columns = df2.iloc[0]
df2 = df2.iloc[1:]

from itertools import product
uniques = [df1[i].unique().tolist() for i in df1.iloc[:,[1,2]]]
pd.DataFrame(product(*uniques), columns = df2.iloc[:,])

The real dataset has more than 32,000 rows, so something that runs fast would be great. Thanks for your help.

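【Question comments】:

Please provide the expected MRE (Minimal, Reproducible Example). Show where the intermediate results deviate from the ones you expect. We should be able to paste your code block into a file, run it, and reproduce your problem. This also lets us test any suggestions in your context. "Having difficulty" is not a problem specification.
You realize that all pairwise combinations of 32,000 rows will give you a data frame with over 1 billion rows...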
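
To put a number on that last comment, here is a quick back-of-the-envelope sketch. The figure of exactly 32,000 rows and the 8 bytes per integer value are assumptions for illustration; the real table has "more than 32,000" rows, and the gene-name columns would add further memory on top.

```python
n_rows = 32_000

# Every row is paired with every row, itself included
n_pairs = n_rows ** 2
print(n_pairs)  # 1024000000

# Lower bound for just the two integer columns (test, Healthy),
# assuming 8 bytes per value (int64)
bytes_needed = n_pairs * 2 * 8
print(bytes_needed / 1e9)  # 16.384 (GB)
```

So even before the gene names are stored, the full result is unlikely to fit comfortably in memory on a typical machine; processing it in chunks or filtering pairs early may be needed.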

Tags: python pandas combinations permutation

【Solution 1】:

Does this code solve your problem?

import pandas as pd
df1 = pd.DataFrame({'Gene':['TP53', 'COX5', 'P16'], 'test':[1,3,0], 'Healthy':[0,0,2]})

# Create all the combinations as tuples. 
# Note that test is taken from gene1 but Healthy from gene2
# The enumerate is used to get the row number related to that gene
row_list = []
for i, gene1 in enumerate(df1.Gene):
    for j, gene2 in enumerate(df1.Gene):
        row_list.append((gene1, gene2, df1.iloc[i].test, df1.iloc[j].Healthy))

# Now create a new dataframe with the results
df2 = pd.DataFrame(row_list, columns=['Gene1', 'Gene2', 'test', 'Healthy'])

This produces:

  Gene1 Gene2  test  Healthy
0  TP53  TP53     1        0
1  TP53  COX5     1        0
2  TP53   P16     1        2
3  COX5  TP53     3        0
4  COX5  COX5     3        0
5  COX5   P16     3        2
6   P16  TP53     0        0
7   P16  COX5     0        0
8   P16   P16     0        2
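
The nested loop above calls df1.iloc once per pair, which may become slow on a large table. As a possible alternative (not part of the original answer), the same pairing can be sketched with NumPy index arrays. The names left, right, and pairs are made up for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Gene': ['TP53', 'COX5', 'P16'],
                    'test': [1, 3, 0],
                    'Healthy': [0, 0, 2]})

n = len(df1)
# Row positions of the first gene (each repeated n times)
# and of the second gene (the whole range tiled n times)
left = np.repeat(np.arange(n), n)   # 0,0,0,1,1,1,2,2,2
right = np.tile(np.arange(n), n)    # 0,1,2,0,1,2,0,1,2

# test comes from the first gene, Healthy from the second
pairs = pd.DataFrame({
    'Gene1': df1['Gene'].to_numpy()[left],
    'Gene2': df1['Gene'].to_numpy()[right],
    'test': df1['test'].to_numpy()[left],
    'Healthy': df1['Healthy'].to_numpy()[right],
})

# Optional: the colon-separated strings shown in the question
pairs['All_combinations'] = (
    pairs['Gene1'] + ':' + pairs['Gene2'] + ':'
    + pairs['test'].astype(str) + ':' + pairs['Healthy'].astype(str)
)
print(pairs)
```

This avoids the Python-level double loop, but the result still has n * n rows, so memory remains the limiting factor for tens of thousands of genes.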

【Comments】:

【Solution 2】:

Since a pandas solution has already been given, this just shows how product works.

from itertools import product

a = [1, 3, 0]   # the "test" column
b = [0, 0, 2]   # the "Healthy" column
list(product(a, b))

[(1, 0), (1, 0), (1, 2), (3, 0), (3, 0), (3, 2), (0, 0), (0, 0), (0, 2)]
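
To connect this back to the table asked for in the question, one way to carry the gene names along is to pair each gene with its value before taking the product. This is only a sketch; the variable names genes, test, healthy, and combos are chosen here for illustration.

```python
from itertools import product

genes = ['TP53', 'COX5', 'P16']
test = [1, 3, 0]
healthy = [0, 0, 2]

# (gene, test) pairs for the first gene, (gene, Healthy) pairs for the second
first = list(zip(genes, test))
second = list(zip(genes, healthy))

combos = [f"{g1}:{g2}:{t}:{h}" for (g1, t), (g2, h) in product(first, second)]
print(combos[:3])  # ['TP53:TP53:1:0', 'TP53:COX5:1:0', 'TP53:P16:1:2']
```

Because product is lazy, the pairs could also be consumed one at a time in a loop instead of being collected into a list, which may help when the full result is too large to hold at once.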

【Comments】: