Is there any Python function to check for NaN values in a list comprehension without changing them? (Answers)

【Question title】: Is there any Python function to check for NaN values in a list comprehension without changing them
【Posted】: 2019-08-29 07:29:18
【Question description】:

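I am writing code that takes the values from each column of a DataFrame and does some processing on them. Whenever there is a NaN value, I get an exception. I do not want to drop the columns that contain NaN. Previously I worked around the problem by simply catching the exception, but I cannot do the same here because I am using a list comprehension. Can someone suggest the right way to do this? Previously I solved it like this: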

for index, row in df_work.iterrows():
    descrip = row['description']
    try:
        r = Rake()
        r.extract_keywords_from_text(descrip)
        key_words_dict_scores = r.get_word_degrees()
        row['Key_words'] = list(key_words_dict_scores.keys())
    except Exception as e:
        print(e)
        row['Key_words'] = ''

I want to do the same thing here:

df_work['specialties'] = [','.join(x) for x in df_work['specialties'].map(lambda x: x.lower().replace(' ','').split(',')).values]
df_work['industry'] = [','.join(x) for x in df_work['industry'].map(lambda x: x.lower().replace(' ','').split(',')).values]
df_work['type'] = [','.join(x) for x in df_work['type'].map(lambda x: x.lower().replace(' ','').split(',')).values]

With the code above I get this error:

'float' object has no attribute 'lower'
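
For context, the message most likely comes from the missing values themselves: in an object column pandas stores a missing entry as the float NaN, and floats have no string methods. A minimal sketch reproducing this, assuming a small made-up Series in place of the real column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column: a string and a missing value.
s = pd.Series(['Upstream', np.nan], dtype=object)

missing = s.iloc[1]
print(type(missing))  # the missing entry is a plain Python float

try:
    missing.lower()  # floats have no .lower(), so this raises
except AttributeError as e:
    message = str(e)
    print(message)
```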

The specialties column contains data like this:

df_work.loc['TOTAL', 'specialties']

Output>>'Oil & Gas - Exploration & Production,Upstream,Refining,Trading,Shipping,Marketing,Energy,Crude Oil,Petroleum,Petrochemicals,Liquified Natural Gas,Renewable Energy,Drilling Engineering,Completion & Intervention Engineering,Geology,Geoscientists,IT'

type(df_work.loc['TOTAL', 'specialties'])

Output>>str

After running my code above, the expected output should be:

Output>>'oil&gas-exploration&production,upstream,refining,trading,shipping,marketing,energy,crudeoil,petroleum,petrochemicals,liquifiednaturalgas,renewableenergy,drillingengineering,completion&interventionengineering,geology,geoscientists,it'

type(df_work.loc['TOTAL', 'specialties'])

Output>>str

【Comments】:

Could you add some sample data, e.g. 3 rows for the specialties column?
Added. Please check again.
Can you check my solution?

Tags: python-3.x pandas numpy dataframe nan

【Solution 1】:

You can use the pandas string methods here, which handle NaNs nicely:

df_work['specialties'] = df_work['specialties'].str.lower().str.replace(' ','')
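
As a quick check of that behaviour, here is a minimal sketch on a made-up Series (the sample values are assumptions, not the asker's data). The `.str` accessor applies each method element-wise and passes missing values through as NaN instead of raising:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two strings and one missing value.
s = pd.Series(['First spec, Sec spec', 'A Vb,ds RT', np.nan])

# regex=False treats ' ' as a literal string rather than a pattern.
cleaned = s.str.lower().str.replace(' ', '', regex=False)
print(cleaned.tolist())
```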

If you need to work around the NaNs yourself, test each value with isinstance() in an if-else expression:

df_work['specialties'] = (df_work['specialties']
        .map(lambda x: x.lower().replace(' ','') if isinstance(x, str) else x))

And the list comprehension solution:

df_work['specialties'] = [x.lower().replace(' ','') 
                          if isinstance(x, str) 
                          else x 
                          for x in df_work['specialties']]
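
Since the question applies the same cleanup to three columns, the comprehension can be wrapped in a small helper and looped over them. This is only a sketch: the `normalize` function name and the sample rows are made up for illustration, and only the column names come from the question.

```python
import numpy as np
import pandas as pd

def normalize(value):
    """Lower-case and strip spaces from strings; return anything else unchanged."""
    if isinstance(value, str):
        return value.lower().replace(' ', '')
    return value  # NaN (a float) and other non-strings pass through

# Hypothetical stand-in for df_work with missing values in two columns.
df_work = pd.DataFrame({
    'specialties': ['Oil & Gas, Upstream', np.nan],
    'industry': ['Oil & Energy', 'Retail'],
    'type': [np.nan, 'Public Company'],
})

for col in ['specialties', 'industry', 'type']:
    df_work[col] = [normalize(x) for x in df_work[col]]

print(df_work)
```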

Sample:

import numpy as np
import pandas as pd

df_work = pd.DataFrame({'specialties':['First spec, Sec spec','A Vb,ds RT', np.nan]})
print (df_work)
            specialties
0  First spec, Sec spec
1            A Vb,ds RT
2                   NaN

df_work['specialties'] = [x.lower().replace(' ','') 
                          if isinstance(x, str) 
                          else x 
                          for x in df_work['specialties']]
print (df_work)
         specialties
0  firstspec,secspec
1           avb,dsrt
2                NaN
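
If an explicit NaN check reads better than isinstance(), pd.isna() can play the same role inside the comprehension. A minimal sketch on made-up values; note that it assumes every non-missing value is a string, so a stray number in the column would still raise:

```python
import numpy as np
import pandas as pd

# Hypothetical values: a string, a float NaN and None.
values = ['First spec, Sec spec', np.nan, None]

# pd.isna() is True for NaN and None, so those are kept unchanged.
cleaned = [x if pd.isna(x) else x.lower().replace(' ', '') for x in values]
print(cleaned)
```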

【Discussion】:

Yes, it works now. I did get an error, but it was caused by something else. Thanks :)