【Posted】: 2020-10-24 13:30:15
【Question】:
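I have two dataframes: df2, which contains payment statistics (the probability that a client pays a given share of their debt), and df3, which contains data on new clients.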
import pandas as pd
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}
df = pd.DataFrame(data=d)
df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}
df3 = pd.DataFrame(data=d)
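The grouped `value_counts(bins=...)` call stores each bin as a pandas `Interval`, so the probabilities and the bin edges can both be read from the result. A minimal sketch on a single group, assuming the three Lisbon / Visa / Colateral 'Yes' rows of `df` (clients 8, 18 and 19, copied by hand) are representative of what df2 holds for that group:

```python
import pandas as pd

# DebtPaid values of the three Lisbon / Visa / Colateral == 'Yes' clients
# in df (clients 8, 18 and 19), copied by hand from the question's data
debt_paid = pd.Series([1.0, 0.0, 0.1], name='DebtPaid')
bins = [-0.001, 0, 0.25, 0.5, 0.75, 1, 1.001, 2]

# Same binning as df2, applied to one group; empty bins get probability 0
probs = debt_paid.value_counts(bins=bins, normalize=True, sort=False)
observed = probs[probs > 0]

# Each index entry is a pandas Interval, so the bin edges can be read directly
edges = pd.IntervalIndex(observed.index)
for left, right, p in zip(edges.left, edges.right, observed):
    print(f"({left}, {right}]: {p:.4f}")
```

Note that pandas may widen the lowest edge slightly below -0.001 because the bins are built with `include_lowest=True`; that edge stands for "paid 0%".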
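I want to calculate the amount each client is expected to pay. For example: a client from Lisbon with a Visa card and collateral has a 0.3333 probability of paying 0% of the debt, a 0.3333 probability of paying ]0-25%] of the debt and a 0.3333 probability of paying ]75-100%] of the debt. So if this client's debt is 100, the expected value should range from
(0.33 * 0 * 100) + (0.33 * 0 * 100) + (0.33 * 0.75 * 100) to (0.33 * 0 * 100) + (0.33 * 0.25 * 100) + (0.33 * 1 * 100).
So this client will pay between €24.75 and €41.25.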
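The range above weights each bin's lower edge (for the minimum) and upper edge (for the maximum) by the bin's probability and multiplies by the debt. A small sketch of that arithmetic; the helper name `expected_payment_range` is hypothetical. With the rounded probabilities 0.33 it reproduces €24.75 to €41.25; with exact thirds the range is €25 to roughly €41.67.

```python
def expected_payment_range(probs, edges, debt):
    """Hypothetical helper: return the (min, max) expected payment.

    probs: probability of each debt-paid bin
    edges: (lower, upper) share of the debt paid for each bin
    debt:  the client's total debt
    """
    low = sum(p * lo * debt for p, (lo, _) in zip(probs, edges))
    high = sum(p * hi * debt for p, (_, hi) in zip(probs, edges))
    return low, high

# Lisbon / Visa / Colateral == 'Yes': bins 0%, ]0-25%] and ]75-100%]
edges = [(0.0, 0.0), (0.0, 0.25), (0.75, 1.0)]

rounded = expected_payment_range([0.33, 0.33, 0.33], edges, 100)  # rounded probabilities
exact = expected_payment_range([1 / 3, 1 / 3, 1 / 3], edges, 100)  # exact probabilities
print(rounded, exact)
```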
Then do the same calculation for all the other clients.
Any ideas on how to solve this?
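One possible vectorized approach, sketched under the assumption that the minimum uses each bin's left edge (clipped at 0) and the maximum its right edge, as in the worked example: multiply df2 by the edges of its interval level, sum per (City, Card, Colateral) group, then merge the resulting shares into df3. The column names `Low Share`, `High Share`, `Min Paid` and `Max Paid` are made up for illustration, and df3 is rebuilt from df's City/Card/Colateral columns because the question's lists for those columns are identical.

```python
import pandas as pd

# Historical clients (same data as df in the question)
df = pd.DataFrame({
    'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
    'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
    'Colateral': ['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
    'Client Number': list(range(1, 23)),
    'DebtPaid': [0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]})

# New clients (same data as df3): the question reuses df's City/Card/Colateral lists
df3 = df[['City', 'Card', 'Colateral']].copy()
df3['Client Number'] = list(range(23, 45))
df3['Total Debt'] = [100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]

bins = [-0.001, 0, 0.25, 0.5, 0.75, 1, 1.001, 2]
keys = ['City', 'Card', 'Colateral']

# P(DebtPaid falls in each bin) per (City, Card, Colateral) group, as in the question
df2 = df.groupby(keys)['DebtPaid'].value_counts(bins=bins, normalize=True)

# The last index level holds the bins; turn their edges into Series aligned with df2.
# The first bin's left edge is below 0, so clip it (nobody pays less than 0%).
intervals = pd.IntervalIndex(df2.index.get_level_values(-1))
lower_edge = pd.Series(intervals.left.to_numpy(), index=df2.index).clip(lower=0)
upper_edge = pd.Series(intervals.right.to_numpy(), index=df2.index)

# Probability-weighted share of debt paid, per group, at each end of the bins
low_share = (df2 * lower_edge).groupby(level=keys).sum()
high_share = (df2 * upper_edge).groupby(level=keys).sum()
shares = pd.DataFrame({'Low Share': low_share, 'High Share': high_share}).reset_index()

# Attach the group shares to every new client and scale by that client's debt
result = df3.merge(shares, on=keys, how='left')
result['Min Paid'] = result['Total Debt'] * result['Low Share']
result['Max Paid'] = result['Total Debt'] * result['High Share']
print(result[['Client Number', 'Total Debt', 'Min Paid', 'Max Paid']])
```

Doing the weighting on df2 before the merge means each group's shares are computed once, and the merge is then a plain lookup per client. Note that the over-payment bins (1, 1.001] and (1.001, 2] contribute their right edges to the maximum if any client ever lands in them.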
【Discussion】:
Tags: python python-3.x pandas dataframe multi-index