Python: How to iterate through a range of columns in a dataframe, check for specific values and store column name in a list

【Question title】: Python: How to iterate through a range of columns in a dataframe, check for specific values and store column name in a list
【Posted】: 2020-01-02 20:29:14
【Question】:

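I am trying to iterate through a range of columns in a dataframe and check every row for specific values. The values should match entries in my list. If a row contains a value from my list, the name of the column where the first match is found should be appended to my new list. How can this be achieved? I tried the following for loop, but couldn't get it to work.

I looked at a few examples but couldn't find what I was looking for.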

iterating through a column in dataframe and creating a list with name of the column + str

How to get the column name for a specific values in every row of a dataframe


import pandas as pd

random = {
        'col1': ['45c','5v','27','k22','wh','u5','36'],
        'col2': ['abc','bca','cab','bac','cab','aab','ccb'],
        'col3': ['xyz','zxy','yxz','zzy','yyx','xyx','zzz'],
        'col4': ['52','75c','k22','d2','3n','4b','cc'],
        'col5': ['tuv','vut','tut','vtu','uvt','uut','vvt'],
        'col6': ['la3','pl','5v','45c','3s','k22','9i']
        }

df = pd.DataFrame(random)

"""
Only 1 value from this list should match with the values in each row of the df
i.e if '45c' is in row 3, then it's a match. place the name of column where '45c' is found in the df in the new list
"""
list = ['45c','5v','d2','3n','k22',]

"""
empty list that should be populated with df column names if there is a single match
"""
rand = []
for row in df.iloc[:,2:5]:
    for x in row:
        if df[x] in list:
            rand.append(df[row][x].columns)
            break

print(rand)

#this is what my df looks like when I print it
  col1 col2 col3 col4 col5 col6
0  45c  abc  xyz   52  tuv  la3
1   5v  bca  zxy  75c  vut   pl
2   27  cab  yxz  k22  tut   5v
3  k22  bac  zzy   d2  vtu  45c
4   wh  cab  yyx   3n  uvt   3s
5   u5  aab  xyx   4b  uut  k22
6   36  ccb  zzz   cc  vvt   9i

The output I would like to get is:

rand = ['col1','col4','col1','col6']

【Comments】:

Tags: python python-3.x pandas for-loop nested-loops

【Solution 1】:

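First compare all values against the list with DataFrame.isin, then get the column of the first matching value in each row with DataFrame.idxmax. Because idxmax returns the first column when a row has no match at all, add a condition with DataFrame.any to test for that case: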

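import numpy as np

L = ['45c','5v','d2','3n','k22']
# Boolean mask: True where the cell value is in L
m = df.isin(L)
# First True column per row; rows without any match get 'no match'
out = np.where(m.any(axis=1), m.idxmax(axis=1), 'no match').tolist()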
print (out)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6', 'no match']

If only the rows that have a match are needed:

out1 = m.idxmax(axis=1)[m.any(axis=1)].tolist()
print (out1)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']
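
The question mentions a range of columns (df.iloc[:,2:5]), while the snippets above check the whole frame. A minimal sketch of how the same approach might be limited to a positional slice, assuming the sample data from the question; the names sub and out_range are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['45c', '5v', '27', 'k22', 'wh', 'u5', '36'],
    'col2': ['abc', 'bca', 'cab', 'bac', 'cab', 'aab', 'ccb'],
    'col3': ['xyz', 'zxy', 'yxz', 'zzy', 'yyx', 'xyx', 'zzz'],
    'col4': ['52', '75c', 'k22', 'd2', '3n', '4b', 'cc'],
    'col5': ['tuv', 'vut', 'tut', 'vtu', 'uvt', 'uut', 'vvt'],
    'col6': ['la3', 'pl', '5v', '45c', '3s', 'k22', '9i'],
})
L = ['45c', '5v', 'd2', '3n', 'k22']

# Positional slice 2:5 selects col3, col4 and col5 only
sub = df.iloc[:, 2:5]
m_sub = sub.isin(L)

# First matching column per row, keeping only rows that have a match
out_range = m_sub.idxmax(axis=1)[m_sub.any(axis=1)].tolist()
print(out_range)
```

With this data only col4 holds list values inside that slice, so matches in col1 and col6 are ignored.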

Details:

print (m)
    col1   col2   col3   col4   col5   col6
0   True  False  False  False  False  False
1   True  False  False  False  False  False
2  False  False  False   True  False   True
3   True  False  False   True  False   True
4  False  False  False   True  False  False
5  False  False  False  False  False   True
6  False  False  False  False  False  False
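
Row 6 of the mask is all False, which is the case the DataFrame.any test guards against. A small sketch on a hypothetical two-row mask (not the question's data) showing what idxmax returns for such a row, with Series.where used here as an alternative to np.where:

```python
import pandas as pd

# Hypothetical mask: row 0 has a match in col2, row 1 has no match at all
mask = pd.DataFrame({
    'col1': [False, False],
    'col2': [True, False],
})

# idxmax returns the first column label holding the row maximum;
# for an all-False row the maximum is False, so the first column is returned
first_col = mask.idxmax(axis=1)

# Keep the label only where the row really contains a True
has_match = mask.any(axis=1)
result = first_col.where(has_match, 'no match').tolist()
print(first_col.tolist())
print(result)
```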

A loop solution is possible, but not recommended:

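rand = []
for i, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row
    for col, x in row.items():
        if x in L:
            rand.append(col)
            # stop at the first matching column in this row
            break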
print(rand)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']
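
The comments in the question's code say "only 1 value from this list should match" in each row. If that is read strictly, as exactly one match per row, one possible sketch is to count the True values per row and keep the rows where the count is 1. This reading is an assumption, and it does not reproduce the expected output shown in the question; exactly_one and out_single are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['45c', '5v', '27', 'k22', 'wh', 'u5', '36'],
    'col2': ['abc', 'bca', 'cab', 'bac', 'cab', 'aab', 'ccb'],
    'col3': ['xyz', 'zxy', 'yxz', 'zzy', 'yyx', 'xyx', 'zzz'],
    'col4': ['52', '75c', 'k22', 'd2', '3n', '4b', 'cc'],
    'col5': ['tuv', 'vut', 'tut', 'vtu', 'uvt', 'uut', 'vvt'],
    'col6': ['la3', 'pl', '5v', '45c', '3s', 'k22', '9i'],
})
L = ['45c', '5v', 'd2', '3n', 'k22']

m = df.isin(L)

# True for rows where exactly one cell matches a value in L
exactly_one = m.sum(axis=1) == 1

# Column of the single match in those rows
out_single = m.idxmax(axis=1)[exactly_one].tolist()
print(out_single)
```

Rows 2 and 3 contain more than one list value, so they are dropped under this reading.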

【Comments】:

Thanks! This is exactly what I needed :)