Numpy float object not iterable while running a function

【Question title】: Numpy float object not iterable while running a function
【Posted】: 2021-04-16 11:19:51
【Question description】:

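I am trying to assign names to clusters based on some comparison conditions, but I get an error saying that a numpy float object is not iterable. I would also prefer not to subset the dataset from df to df1, as shown below, and then concatenate it back. Here is the code: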

import pandas as pd
df = pd.DataFrame({'cluster':[0, 1, 2, 9999], 'earlypc':[88.943,4.034,6.839,0.488],'C':[3.491,8.306,75.329,34.5],'D':[14.548,87.66,17.832,65.012]})
df1=df[df['cluster']!=999]
def cluster(a,b,c,d):
    if(max(b)==b):
        return 'high'
    elif (max(c)==c):
         return 'low'
    elif (max(d)==d):
        return 'medium'
    else: return 'medium'

df1['Vendor_Segmentation']=df1.apply(lambda x:cluster(x['cluster'],x['earlypc'],x['C'],x['D']),axis=1)

TypeError: 'numpy.float64' object is not iterable
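
For context, here is a minimal sketch of why this error appears. With `axis=1`, `apply` passes one row at a time, so `x['earlypc']` is a single `numpy.float64` rather than a column, and the built-in `max` needs an iterable. The two-column frame below is a trimmed copy of the question's data, used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'cluster': [0, 1, 2, 9999],
                   'earlypc': [88.943, 4.034, 6.839, 0.488]})

# A single row, as apply(..., axis=1) would hand it to the function
row = df.iloc[0]
value = row['earlypc']      # a scalar numpy.float64, not a Series

try:
    max(value)              # built-in max() expects an iterable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Comparing against the column-wide maximum avoids iterating over a scalar
column_max = df['earlypc'].max()
is_row_max = value == column_max
```

The comparison the function seems to intend is between each row's value and the maximum of the whole column, which is what the answers below compute directly on the columns.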

【Comments】:

Tags: python pandas

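【Solution 1】:

I agree with the other answer's suggestion not to use apply, and offer two alternatives from the numpy package that are designed for cases like yours: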

import numpy as np
import pandas as pd

1. numpy.select

# Specify the conditions
conditions = [ 
    (df1['earlypc'] == df1['earlypc'].max()),
    (df1['C'] == df1['C'].max()),
    (df1['D'] == df1['D'].max())     
    ]

# What each condition should return
choices =['high','low','medium']

# Return the array as a column
df1['Vendor_segmentation'] = np.select(conditions, choices,default='medium')

2. numpy.where

df1['Vendor_segmentation'] = np.where(df1['earlypc'].eq(df1['earlypc'].max()),'high',
                              np.where(df1['C'].eq(df1['C'].max()),'low',
                              np.where(df1['D'].eq(df1['D'].max()),'medium',
                                                                   'medium')))

Output:

Out[531]: 

   cluster  earlypc       C       D Vendor_segmentation
0        0   88.943   3.491  14.548                high
1        1    4.034   8.306  87.660              medium
2        2    6.839  75.329  17.832                 low
3     9999    0.488  34.500  65.012              medium

Series.eq and == are equivalent.
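
As a quick check of that equivalence, the sketch below builds the same boolean mask both ways. The Series `s` simply reuses the `earlypc` values from the question:

```python
import pandas as pd

s = pd.Series([88.943, 4.034, 6.839, 0.488], name='earlypc')

mask_eq = s.eq(s.max())     # method form
mask_op = s == s.max()      # operator form

# Both produce the same element-wise boolean Series
same = mask_eq.equals(mask_op)
print(same)
```

The method form can be convenient when chaining calls; the operator form is usually shorter to read.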

【Comments】:

【Solution 2】:

How about not using apply at all:

df1['Vendor_Segmentation'] = 'medium'
df1.loc[df1.earlypc==df1.earlypc.max(), "Vendor_Segmentation"] = 'high'
df1.loc[df1.C==df1.C.max(), "Vendor_Segmentation"] = 'low'
df1.loc[df1.D==df1.D.max(), "Vendor_Segmentation"] = 'medium'

This gives the expected result.

   cluster  earlypc       C       D Vendor_Segmentation
0        0   88.943   3.491  14.548                high
1        1    4.034   8.306  87.660              medium
2        2    6.839  75.329  17.832                 low
3     9999    0.488  34.500  65.012              medium
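
One difference between the approaches may be worth keeping in mind: with sequential `.loc` assignments the last matching write wins, whereas `np.select` takes the first matching condition. On the question's data no row satisfies two conditions, so both give the same labels. The sketch below uses a small hypothetical frame (`demo`, not from the question) in which row 0 holds the maximum of both `earlypc` and `C`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has the largest 'earlypc' AND the largest 'C'
demo = pd.DataFrame({'earlypc': [9.0, 1.0, 2.0],
                     'C': [8.0, 3.0, 4.0]})

is_high = demo['earlypc'] == demo['earlypc'].max()
is_low = demo['C'] == demo['C'].max()

# np.select: the first condition that is True decides the label
demo['via_select'] = np.select([is_high, is_low], ['high', 'low'],
                               default='medium')

# Sequential .loc: later assignments overwrite earlier ones
demo['via_loc'] = 'medium'
demo.loc[is_high, 'via_loc'] = 'high'
demo.loc[is_low, 'via_loc'] = 'low'

print(demo)
```

If overlapping conditions are possible in the real data, ordering the `.loc` assignments from lowest to highest priority should reproduce the `np.select` behaviour.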

【Comments】: