【Question title】: Creating a new column in a dataframe based on the result of the addition of three others
【Posted】: 2018-01-27 06:31:10
【Question】:

I have written the following code:

data['Customer_segment'] = np.where(((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=5,1),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>5 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=8,2),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>8 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=11,3),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>11 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=14,4),5)

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help toward the best solution is much appreciated; I suspect the approach I am attempting may not be optimal.

Sample input:

MOVC % segment  order_size_seg  Order frequency segment
1                      2                 3
5                      2                 1
5                      5                 5

I am trying to add a column based on the sum of each row, as follows:

If 3-5 then 1, if 6-8 then 2, if 9-11 then 3, if 12-14 then 4, if 15+ then 5.

Any help solving this would be much appreciated.

【Comments】:

  • Every condition has the same problem: you are missing the parentheses. It should be () & () & () ...
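A minimal sketch of why the missing parentheses trigger the error, assuming a hypothetical Series `total` standing in for the summed segment columns. In Python, `&` binds more tightly than `>` and `<=`, so the unparenthesized form turns into a chained comparison that needs the truth value of a Series:

```python
import pandas as pd

# Hypothetical stand-in for the summed segment scores.
total = pd.Series([6, 8, 15])

# Without parentheses, `&` is evaluated before the comparisons, so Python reads
#   total > 5 & total <= 8   as   total > (5 & total) <= 8
# That is a chained comparison, which calls bool() on a Series and fails.
try:
    total > 5 & total <= 8
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison produces a boolean Series first,
# and `&` then combines them element by element.
mask = (total > 5) & (total <= 8)

print(raised)         # True
print(mask.tolist())  # [True, True, False]
```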

Tags: python pandas conditional where


【Solution 1】:

How about trying pd.cut?

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3],[5,2,1],[5,5,5]], columns=['M','O','F'])

pd.cut(df.T.sum(),[5, 8, 11, 14,np.inf],labels=[1,2,3,4])

Out[1180]: 
0    1
1    1
2    4
dtype: category
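
Note that with these bins a total of 6 or 8 gets label 1 rather than the 2 requested in the question, and any total of 5 or less falls outside every bin. The sketch below illustrates how pd.cut treats the edges, using a hypothetical Series `totals` with one value from each range in the question. Adding -np.inf as the first edge, with five labels, appears to reproduce the requested mapping:

```python
import numpy as np
import pandas as pd

# Hypothetical row sums, one per range in the question (3-5, 6-8, 9-11, 12-14, 15+).
totals = pd.Series([3, 6, 8, 9, 12, 15])

# Bins as in the answer above; intervals are right-closed:
# (5, 8], (8, 11], (11, 14], (14, inf]. The value 3 matches none and becomes NaN.
four_bins = pd.cut(totals, [5, 8, 11, 14, np.inf], labels=[1, 2, 3, 4])

# A leading -inf edge gives five intervals, one per requested segment.
five_bins = pd.cut(totals, [-np.inf, 5, 8, 11, 14, np.inf], labels=[1, 2, 3, 4, 5])

print(four_bins.isna().tolist())  # [True, False, False, False, False, False]
print(five_bins.tolist())         # [1, 2, 2, 3, 4, 5]
```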

【Comments】:

    【Solution 2】:

    I think that instead of multiple np.where calls you need a single numpy.select:

    # sum the values only once
    a = data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment']
    # wrap each condition in ()
    m1 = a<=5
    m2 = (a>5) & (a<=8)
    m3 = (a>8) & (a<=11)
    m4 = (a>11) & (a<=14)
    
    data['Customer_segment'] = np.select([m1, m2, m3, m4],[1,2,3,4], default=5)
    

    Another solution is to use cut:

    bins = [-np.inf,5,8,11,14, np.inf]
    labels = [1,2,3,4,5]
    
    # bin the row sums computed above (a)
    data['Customer_segment'] = pd.cut(a, bins=bins, labels=labels)
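
As a quick check, here is a sketch that runs both approaches on the sample input from the question. The names `conditions`, `by_select` and `by_cut` are introduced here for illustration only. Because pd.cut returns a categorical, the sketch casts it to int so the two results can be compared directly:

```python
import numpy as np
import pandas as pd

# Sample input from the question.
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

# Row totals: 6, 8, 15.
a = data['Order frequency segment'] + data['order_size_seg'] + data['MOVC % segment']

# np.select returns the choice for the first condition that is True,
# and the default where none is.
conditions = [a <= 5, (a > 5) & (a <= 8), (a > 8) & (a <= 11), (a > 11) & (a <= 14)]
by_select = np.select(conditions, [1, 2, 3, 4], default=5)

# pd.cut with right-closed bins; cast the categorical result to plain integers.
bins = [-np.inf, 5, 8, 11, 14, np.inf]
by_cut = pd.cut(a, bins=bins, labels=[1, 2, 3, 4, 5]).astype(int)

data['Customer_segment'] = by_select
print(data['Customer_segment'].tolist())  # [2, 2, 5]
print(by_cut.tolist())                    # [2, 2, 5]
```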
    

    【Comments】:

    • We had the same idea :), I'll delete my answer and upvote yours~
    【Solution 3】:

    How about the query method? It seems to have a very powerful syntax:

    import pandas as pd
    d = pd.DataFrame([[1,2,3],[5,2,1],[5,5,5]], columns=['M','O','F'])
    d.query("5 < M+O+F < 8")
    
    Out[4]: 
       M  O  F
    0  1  2  3
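
query only filters rows, so on its own it does not create the requested column. One possible way to build on it, sketched below under the assumption that a new column named `segment` (a hypothetical name) is acceptable, is to use the index of the rows each query returns to assign a label with .loc:

```python
import pandas as pd

d = pd.DataFrame([[1, 2, 3], [5, 2, 1], [5, 5, 5]], columns=['M', 'O', 'F'])

# Default label for totals of 15 or more.
d['segment'] = 5

# (upper bound, label) pairs following the ranges in the question.
lower = 0
for upper, label in [(5, 1), (8, 2), (11, 3), (14, 4)]:
    # query returns the matching rows; their index says where to write the label.
    rows = d.query(f"{lower} < M + O + F <= {upper}").index
    d.loc[rows, 'segment'] = label
    lower = upper

print(d['segment'].tolist())  # [2, 2, 5]
```

This runs one query per range, so for larger data the np.select or pd.cut approaches are probably the simpler choice.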
    

    【Comments】:
