Mapping a dataframe with a def function

【Question title】: Mapping a dataframe with a def function
【Posted】: 2017-06-30 17:03:23
【Question】:

Error: AttributeError: 'DataFrame' object has no attribute 'map'

Code: I wrote a function that classifies some time events (rows) in a dataframe based on the values in several columns.

def usage(x):
    if x['Dest']==x['Origin']: return 'round'
    elif x['Origin']==x['next_dest']:
        if x['Dest']==x['next_origin']: return 'perfectsym'
        else: return 'nonperfectsym'
    else: return 'None'

With this, I would like to use the map function to classify the entries into a new column, like this:

All_Data['usagetype'] = All_Data.map(usage)

But this does not work.

Thanks for your help.

【Question comments】:

Tags: python-3.x pandas anaconda

【Solution 1】:

Solution:

The dataframe-level equivalent of map is apply:

All_Data['usagetype'] = All_Data.apply(usage, axis=1)
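
To make the row-wise call concrete, here is a minimal sketch on a small made-up dataframe. The name `sample` and its values are assumptions chosen so that each branch of `usage` is hit once; they are not the asker's data.

```python
import pandas as pd

def usage(x):
    # With axis=1, x is a single row (a Series indexed by column name)
    if x['Dest'] == x['Origin']:
        return 'round'
    elif x['Origin'] == x['next_dest']:
        if x['Dest'] == x['next_origin']:
            return 'perfectsym'
        else:
            return 'nonperfectsym'
    else:
        return 'None'

# Hypothetical sample data: one row per branch of usage()
sample = pd.DataFrame({
    'Dest':        [1, 2, 2, 3],
    'Origin':      [1, 1, 1, 0],
    'next_dest':   [0, 1, 1, 2],
    'next_origin': [0, 2, 3, 1],
})

# axis=1 passes each row to usage and collects the returned labels
sample['usagetype'] = sample.apply(usage, axis=1)
print(sample['usagetype'].tolist())
# ['round', 'perfectsym', 'nonperfectsym', 'None']
```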

Alternative and remarks:

For something like this, though, a fairly simple row-wise condition, you can get much faster computation with np.where:

import numpy as np

def usage2(df):
    return np.where(df['Dest'] == df['Origin'], 'round',
                    np.where(df['Origin'] == df['next_dest'],
                             np.where(df['Dest'] == df['next_origin'],
                                      'perfectsym', 'nonperfectsym'),
                             None))

All_Data['usagetype'] = usage2(All_Data)
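
As a quick sanity check, the sketch below runs the nested np.where version on a small made-up dataframe (the name `sample` and its values are assumptions, one row per branch). It returns a NumPy array with one label per row, which can be assigned directly to a new column.

```python
import numpy as np
import pandas as pd

def usage2(df):
    # Each comparison works on whole columns at once; np.where picks
    # the label element by element, nesting mirrors the if/elif/else
    return np.where(df['Dest'] == df['Origin'], 'round',
                    np.where(df['Origin'] == df['next_dest'],
                             np.where(df['Dest'] == df['next_origin'],
                                      'perfectsym', 'nonperfectsym'),
                             None))

# Hypothetical sample data: one row per branch
sample = pd.DataFrame({
    'Dest':        [1, 2, 2, 3],
    'Origin':      [1, 1, 1, 0],
    'next_dest':   [0, 1, 1, 2],
    'next_origin': [0, 2, 3, 1],
})

labels = usage2(sample)          # NumPy array, one entry per row
sample['usagetype'] = labels
print(sample[['Dest', 'Origin', 'usagetype']])
```

The last row matches none of the conditions, so it gets None rather than a string label.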

On 1000 rows it is roughly a hundred times faster:

df = pd.DataFrame(np.random.randint(0, 4, size=(1000, 4)),
                  columns=['Dest', 'Origin', 'next_dest', 'next_origin'])

%timeit usage2(df)
1000 loops, best of 3: 463 µs per loop

%timeit df.apply(usage, axis=1)
10 loops, best of 3: 46.1 ms per loop

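I would also suggest removing the quotes around None, as I did in usage2 above, unless you explicitly want the string 'None' rather than a missing value (None, which pandas treats like NaN).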
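
The difference is easy to see with isna. This is a small illustration with made-up Series (the names `with_string` and `with_missing` are hypothetical):

```python
import pandas as pd

# The string 'None' is ordinary text; None is treated as missing
with_string = pd.Series(['round', 'None'])
with_missing = pd.Series(['round', None])

print(with_string.isna().tolist())   # [False, False]
print(with_missing.isna().tolist())  # [False, True]
```

With a real missing value, methods such as dropna and fillna act on the unclassified rows as expected; with the string 'None' they do not.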

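【Discussion】:

Thanks for your help! When I use apply, I get this error: KeyError: ('Dest', 'occurred at index Dest'). The alternative works, though!
@Helk Glad it worked. When using apply, did you specify axis=1 as I did above? That is the error I get if I let it default to axis=0, in which case it iterates over the columns and tries to look up rows labeled Dest, Origin, etc.
Oh, right! I forgot to specify the axis. :)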
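
The axis behavior discussed in these comments can be reproduced with a short sketch. The dataframe `sample` is made up for illustration, and the exact wording of the KeyError message varies between pandas versions.

```python
import pandas as pd

def usage(x):
    if x['Dest'] == x['Origin']:
        return 'round'
    elif x['Origin'] == x['next_dest']:
        if x['Dest'] == x['next_origin']:
            return 'perfectsym'
        else:
            return 'nonperfectsym'
    else:
        return 'None'

# Hypothetical sample data
sample = pd.DataFrame({
    'Dest':        [1, 2, 2, 3],
    'Origin':      [1, 1, 1, 0],
    'next_dest':   [0, 1, 1, 2],
    'next_origin': [0, 2, 3, 1],
})

caught = None
try:
    # Default axis=0: usage receives each column, whose index holds
    # the row labels 0..3, so the lookup x['Dest'] fails
    sample.apply(usage)
except KeyError as err:
    caught = err
print('axis=0 raised:', repr(caught))

# axis=1: usage receives each row, whose index holds the column names
by_row = sample.apply(usage, axis=1)
print(by_row.tolist())
```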