【Question title】: Highlight pandas df errors based on conditions
【Posted】: 2019-03-23 22:46:58
【Question】:

Good day SO community,

I'm having trouble highlighting errors in my df, row by row.

import pandas as pd

reference_dict = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
# renamed from `dict` to avoid shadowing the built-in
data_dict = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
df = pd.DataFrame(data=data_dict)

def highlight_rows(df):
  for i in df.index:
    if df.jobclass[i] in reference_dict['jobclass']:
      print(df.jobclass[i])
      return 'background-color: green'

df.style.apply(highlight_rows, axis = 1)

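I get the error: TypeError: ('string indices must be integers', 'occurred at index 0')

What I hope to get is my df with values not found in my reference_dict being highlighted.

Any help would be greatly appreciated.. Cheers!

Edit: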

x = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
d = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
df = pd.DataFrame(data=d) 
print(df)
def highlight_rows(s):
  ret = ["" for i in s.index]
  for i in df.index:
    if df.jobclass[i] not in x['jobclass']:
      ret[s.index.get_loc('Jobs')] = "background-color: yellow"
      return ret
df.style.apply(highlight_rows, axis = 1)

I tried this, and it highlights the whole column instead of the specific row value I want.. =/

【Comments】:

    Tags: python pandas dataframe highlight


    【Solution 1】:

    You can use merge with the indicator parameter to find the unmatched values, and then create the styles for the DataFrame:

    x = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
    d = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
    df = pd.DataFrame(data=d) 
    print (df)
      jobclass       Jobs
    0        A    Teacher
    1        C    Plumber
    2        A  Policeman
    

    Details:

    print (df.merge(pd.DataFrame(x) , on='jobclass', how='left', indicator=True))
      jobclass     Jobs_x   Jobs_y     _merge
    0        A    Teacher  Teacher       both
    1        C    Plumber      NaN  left_only
    2        A  Policeman  Teacher       both
    

    def highlight_rows(s):
        c1 = 'background-color: yellow'
        c2 = '' 
    
        df1 = pd.DataFrame(x)
        m = s.merge(df1, on='jobclass', how='left', indicator=True)['_merge'] == 'left_only'
        df2 = pd.DataFrame(c2, index=s.index, columns=s.columns)
        df2.loc[m, 'Jobs'] = c1
        return df2
    
    df.style.apply(highlight_rows, axis = None)
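For comparison, a row-wise variant is sketched below; it is not part of the answer above. With `axis=1`, `Styler.apply` passes each row to the function as a Series whose index holds the column labels, so the function should inspect only that row and return one style string per column. Looping over the index inside the function and evaluating `row.jobclass[i]` is what produces the `string indices must be integers` error, because `row.jobclass` is already a single string. The name `highlight_row` is made up for this illustration.

```python
import pandas as pd

x = {'jobclass': ['A', 'B'], 'Jobs': ['Teacher', 'Plumber']}
d = {'jobclass': ['A', 'C', 'A'], 'Jobs': ['Teacher', 'Plumber', 'Policeman']}
df = pd.DataFrame(data=d)

def highlight_row(row):
    # `row` is one row of df as a Series; row.index holds the column labels.
    styles = ['' for _ in row.index]
    if row['jobclass'] not in x['jobclass']:
        # Mark only the 'Jobs' cell of this row.
        styles[row.index.get_loc('Jobs')] = 'background-color: yellow'
    return styles

# Check the per-row output in plain Python, without rendering anything.
row_styles = [highlight_row(row) for _, row in df.iterrows()]
print(row_styles)

# In a notebook (Styler needs jinja2 installed):
# df.style.apply(highlight_row, axis=1)
```

In a notebook, `df.style.apply(highlight_row, axis=1)` should then color only the Jobs cell of the row whose jobclass is C.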
    

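    【Comments】:

    • Hi jezrael, thanks! I based my solution on yours and have accepted it.
    • @hakkonen - You are welcome! Feel free to upvote my solution too. Thanks.
    【Solution 2】:

    Good day to you too!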

    What i hope to get is my df with values not found in my reference_dict being highlighted.
    

    If you want the values that are not in reference_dict to be highlighted, do you mean the function to be as follows?

    def highlight_rows(df):
      for i in df.index:
        if df.jobclass[i] not in reference_dict['jobclass']:
          print(df.jobclass[i])
          return 'background-color: green'
    

    Either way, why highlight the rows when you can isolate them? It seems like you want to see all the job classes in df that are not in reference_dict.

    import pandas as pd
    
    
    reference_dict = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
    
    data_dict = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
    
    
    
    ref_df = pd.DataFrame(reference_dict)
    df = pd.DataFrame(data_dict)
    
    # Merge the two tables. how='outer' also keeps jobclasses the DataFrames do not
    # have in common. Because both have a 'Jobs' column, the merge generates
    # Jobs_x and Jobs_y automatically.
    outliers = df.merge(ref_df, how='outer', on='jobclass')
    # Jobs_y is null when there is no matching jobclass in the reference DataFrame,
    # so filter on that.
    outliers = outliers[outliers['Jobs_y'].isnull()]
    # Drop the junk column now that it has been used for filtering.
    outliers = outliers.drop('Jobs_y', axis=1)
    
    print("The reference DataFrame is:")
    print(ref_df,'\n')
    
    print("The input DataFrame is:")
    print(df,'\n')
    
    print("The result is a list of all the jobclasses not in the reference DataFrame and what job is with it:")
    print(outliers)
    

    The result is:

    The reference DataFrame is:
      jobclass     Jobs
    0        A  Teacher
    1        B  Plumber 
    
    The input DataFrame is:
      jobclass       Jobs
    0        A    Teacher
    1        C    Plumber
    2        A  Policeman 
    
    The result is a list of all the jobclasses not in the reference DataFrame and what job is with it:
      jobclass   Jobs_x
    2        C  Plumber
    

    This may be a tangent, but it's what I would do. I had no idea you could highlight rows in pandas, cool trick.
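If only the isolated rows are needed, a boolean mask built with `Series.isin` is a shorter alternative to the merge. This is a swapped-in technique, not the code from the answer above, and the variable name `known` is illustrative. Since no merge takes place, the original row index and column names are kept.

```python
import pandas as pd

reference_dict = {'jobclass': ['A', 'B'], 'Jobs': ['Teacher', 'Plumber']}
data_dict = {'jobclass': ['A', 'C', 'A'], 'Jobs': ['Teacher', 'Plumber', 'Policeman']}
df = pd.DataFrame(data_dict)

# True where the row's jobclass appears in the reference list.
known = df['jobclass'].isin(reference_dict['jobclass'])

# Keep only the rows whose jobclass is missing from the reference.
outliers = df[~known]
print(outliers)
```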

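    【Comments】:

    • Thanks Jess, it's useful, but my task is to create a function that highlights the values. Thank you very much for your answer though! =))
    • Ah, okay! Glad you found it useful, and I hope you figure out the highlighting.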