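【Question title】: Search and process data from multi-index DataFrames
【Posted】: 2020-10-24 13:30:15
【Question】:

I have two DataFrames: df2, which holds payment statistics (the probability that a client pays a given share of their debt), and df3, which holds data on new clients.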

import pandas as pd

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
     'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}

df = pd.DataFrame(data=d)

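# For each (City, Card, Colateral) group: share of clients whose DebtPaid falls in each bin
df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
           value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)

# New clients whose expected payments should be estimated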
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
     'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}

df3 = pd.DataFrame(data=d)

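I would like to calculate the estimated amount each client will pay. For example, if a client is from Lisbon and has a Visa card and collateral (Colateral = 'Yes'), they have a 0.3333 chance of paying 0% of the debt, a 0.3333 chance of paying ]0%, 25%] of the debt, and a 0.3333 chance of paying ]75%, 100%] of the debt. So if this client has a debt of 100, the expected value should range from

(0.33 * 0 * 100) + (0.33 * 0 * 100) + (0.33 * 0.75 * 100) to (0.33 * 0 * 100) + (0.33 * 0.25 * 100) + (0.33 * 1 * 100).

So this client will pay between €24.75 and €41.25.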
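
The arithmetic above can be reproduced with a small sketch. It assumes the three historical Lisbon / Visa / Colateral = 'Yes' clients from df (DebtPaid of 1, 0 and 0.1) and a hypothetical new client with a debt of 100; the names `paid`, `debt`, `lower` and `upper` are illustrative only. With exact probabilities of 1/3 instead of the rounded 0.33, the range comes out slightly wider than 24.75 to 41.25.

```python
import pandas as pd

# DebtPaid of the three historical Lisbon / Visa / Colateral='Yes' clients
paid = pd.Series([1.0, 0.0, 0.1])
bins = [-0.001, 0, 0.25, 0.5, 0.75, 1, 1.001, 2]

# Share of clients in each bin (empty bins get probability 0)
probs = paid.value_counts(bins=bins, normalize=True)

debt = 100  # hypothetical debt of the new client

# Expected payment using each bin's lower edge (floored at 0) and upper edge
lower = float(sum(max(iv.left, 0) * p * debt for iv, p in probs.items()))
upper = float(sum(iv.right * p * debt for iv, p in probs.items()))

print(round(lower, 2), round(upper, 2))  # roughly 25.0 and 41.67
```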

Then the same calculation should be done for all the other clients.

Any ideas on how to solve this?

【Comments】:

    Tags: python python-3.x pandas dataframe multi-index


    【Answer 1】:

    You can do it like this:

    import numpy as np

    # First, merge the bin probabilities with the new clients
    df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])
    
    # Use the left and right attributes of pd.Interval to get each bin's bounds
    df_out['lower'] = [x.left for x in df_out['DebtPaid']]
    df_out['upper'] = [x.right for x in df_out['DebtPaid']]
    
    # Calculate the lower and upper partial amounts (bound * probability * debt)
    df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
    df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)
    
    # Sum the partial amounts per Client Number to get the lower and upper estimates
    df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
          .agg(lower_price=('l_partial', 'sum'),
               upper_price=('u_partial', 'sum')).clip(0,np.inf)
    

    Output:

                  lower_price  upper_price
    Client Number                          
    23               37.500000    62.500000
    24                0.000000    15.000000
    25               50.000000   100.000000
    26              187.000000   375.000000
    27                0.000000     3.125000
    28                0.000000     5.000000
    29              143.750000   230.000000
    30                2.493333     4.166667
    31              225.000000   375.000000
    32               16.666667    26.666667
    33               18.750000    31.250000
    34                7.500000    12.500000
    35                0.000000     6.250000
    36                7.500000    15.000000
    37               18.700000    37.500000
    38                0.000000    37.500000
    39                0.000000    50.000000
    40               49.866667    83.333333
    41               37.400000    62.500000
    42              130.900000   262.500000
    43               20.833333    33.333333
    44               22.440000    45.000000
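
Some lower estimates in this output sit slightly below a round figure (for example 187.0 rather than 187.5 for client 26), which is consistent with the lowest bin having a slightly negative left edge that enters the sum before the final clip. A possible variation, sketched below on a hypothetical miniature of df_out for a single client, is to clip each bin's lower edge at 0 before multiplying. The frame `mini` and its values are assumptions made for illustration, not data taken from the merged result.

```python
import pandas as pd

# Hypothetical miniature of df_out for one client with Total Debt = 1000:
# four bins, each with probability 0.25; the lowest bin has a negative left edge.
mini = pd.DataFrame({
    'Client Number': [26, 26, 26, 26],
    'lower': [-0.001, 0.0, 0.25, 0.5],
    'upper': [0.0, 0.25, 0.5, 0.75],
    'Prob': [0.25, 0.25, 0.25, 0.25],
    'Total Debt': [1000, 1000, 1000, 1000],
})

# Clip each bin's lower edge at 0 before computing the partial amounts
mini['lower'] = mini['lower'].clip(lower=0)
mini['l_partial'] = mini['lower'] * mini['Prob'] * mini['Total Debt']
mini['u_partial'] = mini['upper'] * mini['Prob'] * mini['Total Debt']

# Sum the partial amounts per client
result = mini.groupby('Client Number').agg(
    lower_price=('l_partial', 'sum'),
    upper_price=('u_partial', 'sum'),
)
print(result)
```

With the edge clipped per bin, the negative contribution of the lowest bin disappears from the lower estimate, while the upper estimate is unaffected.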
    

    【Comments】:
