Check if nth value in DataFrame is equal to nth character in a string

【Question title】: Check if nth value in DataFrame is equal to nth character in a string
【Posted】: 2020-06-15 16:29:06
【Question】:

I have a df:

df =
     c1  c2   c3   c4  c5
  0  K   6    nan  Y   V
  1  H   nan  g    5   nan
  2  U   B    g    Y   L

And a string:

s = 'HKg5'

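I want to return the rows where s[0] equals the value in c1, s[1] equals the value in c2, and so on, while also treating a cell as a match when it is nan.

For example, row 1 in the df above matches the string: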

    row 1=
           c1  c2   c3   c4  c5
        1  H   nan  g    5   nan
                                                match=True,   regardless of s[1,4]=nan
     s   = H   K    g    5

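The string length is dynamic, so my df columns can go beyond c10.

I am using df.apply, but I can't figure it out. I want to write a function to pass to df.apply and pass the string along with it.

Thanks for your help!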

Output of Chris's answer

  df=  
        c1  c2  c3  c4  c5 
     0  K   6  NaN  Y   V
     1  H  NaN  g   5  NaN
     2  U   B   g   Y   L

  s = 'HKg5'
  s1 = pd.Series(list(s), index=[f'c{x+1}' for x in range(len(s))])
  df.loc[((df == s1) | (df.isna())).all(1)]

Output

  `c1  c2  c3  c4  c5`

【Question comments】:

Tags: python pandas numpy data-structures data-science

【Solution 1】:

Create a helper Series from your string and filter with boolean logic:

import pandas as pd

s1 = pd.Series(list(s), index=[f'c{x+1}' for x in range(len(s))])

# print(s1)    
# c1    H
# c2    K
# c3    g
# c4    5
# dtype: object

The logic is: df equals (==) this value OR (|) is nan (isna).
Use all along axis 1 to return the rows where every value is True:

df.loc[((df == s1) | (df.isna())).all(1)]

[out]

  c1   c2 c3 c4   c5
1  H  NaN  g  5  NaN
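
Depending on the pandas version, comparing a DataFrame to a Series with `==` may warn or raise when the Series index (c1 to c4 here) does not exactly match the frame's columns (c1 to c5). A minimal sketch of the same filter using `DataFrame.eq`, which aligns the Series on the column labels; it assumes every cell is either a string or NaN, as in the example data:

```python
import numpy as np
import pandas as pd

# Example frame from the question, with all non-missing cells stored as strings
df = pd.DataFrame(
    {
        "c1": ["K", "H", "U"],
        "c2": ["6", np.nan, "B"],
        "c3": [np.nan, "g", "g"],
        "c4": ["Y", "5", "Y"],
        "c5": ["V", np.nan, "L"],
    }
)
s = "HKg5"

# One character per column label: c1, c2, ...
s1 = pd.Series(list(s), index=[f"c{i + 1}" for i in range(len(s))])

# eq() aligns s1 on the column labels; c5 has no counterpart in s1,
# so it is compared against NaN and yields False there.
matches = df.eq(s1, axis="columns") | df.isna()
result = df.loc[matches.all(axis=1)]
print(result)
```

With this approach a row only matches if the columns beyond the end of the string (c5 here) are NaN.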

So, as a function, you could do:

def df_match_string(frame, string):
    s1 = pd.Series(list(string), index=[f'c{x+1}' for x in range(len(string))])
    return ((frame == s1) | (frame.isna())).all(1)

df_match_string(df, s)

[out]

0    False
1     True
2    False
dtype: bool
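
The question asked specifically about passing a function and the string to `df.apply`. A row-wise sketch of that idea follows; `row_matches` is a hypothetical helper name, it compares by column position rather than by column label, and it converts each cell with `str()` on the assumption that a numeric 5 should match the character '5'. It will usually be slower than the vectorised version above on large frames.

```python
import numpy as np
import pandas as pd


def row_matches(row, string):
    """Return True if every non-missing cell equals the character at its position."""
    for pos, value in enumerate(row):
        if pd.isna(value):
            continue  # NaN counts as a match
        # A non-missing cell beyond the end of the string cannot match
        if pos >= len(string) or str(value) != string[pos]:
            return False
    return True


df = pd.DataFrame(
    {
        "c1": ["K", "H", "U"],
        "c2": ["6", np.nan, "B"],
        "c3": [np.nan, "g", "g"],
        "c4": ["Y", "5", "Y"],
        "c5": ["V", np.nan, "L"],
    }
)
s = "HKg5"

# args forwards the string to row_matches after the row itself
mask = df.apply(row_matches, axis=1, args=(s,))
print(df.loc[mask])
```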

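Update

I cannot reproduce your problem with the example provided. My guess is that some values in your DataFrame may have leading/trailing whitespace?

Before trying the solution above, try this preprocessing step: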

for col in df:
    df[col] = df[col].str.strip()
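
Note that the `.str` accessor only works on string-like columns, and in a mixed object column it returns NaN for non-string cells, which the filter above would then count as a match. A hedged sketch that strips whitespace from string cells only and leaves everything else untouched; `strip_strings` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def strip_strings(frame):
    """Return a copy of frame with whitespace stripped from string cells only."""

    def strip_cell(value):
        # Numbers and NaN pass through unchanged
        return value.strip() if isinstance(value, str) else value

    return frame.apply(lambda col: col.map(strip_cell))


# Small illustrative frame with a trailing space and a numeric cell
raw = pd.DataFrame({"c1": ["K", "H ", "U"], "c2": [6, np.nan, "B"]})
clean = strip_strings(raw)
print(clean)
```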

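【Comments】:

Hi, I get a warning here: -----> FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y). The result is incorrect.
It looks like that is only a warning, caused by a bug in numpy; see this answer here. If the result is incorrect, could you provide a reproducible example that does not work as expected, along with the expected result?
Please check: I edited my original post and included your output. It returns an empty df @Chris A
I can't replicate the problem; the code works for me on this example. The only thing I can think of is that some of your columns may have leading or trailing whitespace..? For example, the value in row 1 at c1 might actually be "H " (note the space after the H).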
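
One way to check for the suspected stray whitespace, and for numeric cells that would never equal a string character (another possible cause of an empty result), is to print the `repr` of every cell. This is only a diagnostic sketch on made-up data, not the asker's actual frame:

```python
import numpy as np
import pandas as pd

# Illustrative data: a trailing space in c1 and an integer in c4
df = pd.DataFrame({"c1": ["K", "H ", "U"], "c4": ["Y", 5, "Y"]})

# repr() puts quotes around strings, so trailing spaces become visible
# and numbers can be told apart from their string form
shown = df.apply(lambda col: col.map(repr))
print(shown)
```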