【Question title】: Make this code run for several Excel files
【Posted】: 2020-07-09 03:47:14
【Question description】:

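I want to run this script for several Excel files: instead of df3, I would import each Excel file in turn and combine all the results into a single data frame.

This is the main code example: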


import numpy as np  # needed for np.inf in the clip() call below
import pandas as pd

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
     'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}

df = pd.DataFrame(data=d)

df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
           value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
     'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}

df3 = pd.DataFrame(data=d)

#First merge dataframes
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])

#Use the right and left attributes of pd.Interval
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]

#Calculate lower and upper partial prices
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)

#Sum partial prices to get lower and upper price grouped on Client Number
final = df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
      .agg(lower_price=('l_partial', 'sum'),
           upper_price=('u_partial', 'sum')).clip(0,np.inf)



w = (final['upper_price'].sum() + final['lower_price'].sum()) / 2 
y = 1000
z = ((w/y)-1)*100

d1 = {'1': [w,y,z],
     'Index':['Estimate','Real','Error']}
     
results = pd.DataFrame(data=d1).set_index('Index')
print(results)

This is what I tried, without success, to run the script over several Excel files:

files = [1,2,3,4,5]

for x in files:
    df3 = pd.read_excel(str(x) + '.xlsx')

#First merge dataframes
    df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])

#Use the right and left attributes of pd.Interval
    df_out['lower'] = [x.left for x in df_out['DebtPaid']]
    df_out['upper'] = [x.right for x in df_out['DebtPaid']]

#Calculate lower and upper partial prices
    df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
    df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)

#Sum partial prices to get lower and upper price grouped on Client Number
    final = df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
      .agg(lower_price=('l_partial', 'sum'),
           upper_price=('u_partial', 'sum')).clip(0,np.inf)


    w = (final['upper_price'].sum() + final['lower_price'].sum()) / 2 
    y = 1000
    z = ((w/y)-1)*100

    d1 = {x : [w,y,z],
     'Index':['Estimate','Real','Error']}
     
    results = pd.DataFrame(data=d1).set_index('Index')

results

It only shows the results for one Excel file. Do you know how I can fix this?

【Question comments】:

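  • Indent the print() statement?
  • "Without success" is a very limited diagnostic description. It is like telling a doctor "I'm sick": they cannot prescribe anything from that. We need more information.
  • I get the error: unsupported operand type(s) for &: 'int' and 'str'
  • No, I only know a little Python.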

Tags: python pandas numpy dataframe


【Solution 1】:

The first problem is here:

df3 = pd.read_excel(x &".xlsx").format(x)

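In Visual Basic and VBA, strings are concatenated with &.

In Python, the operator is +, but you need to make sure both sides are strings.

Since files contains only numbers, x will also be a number. To convert it to a string, use str(x):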

df3 = pd.read_excel(str(x) + ".xlsx")  # the trailing .format(x) is dropped: a DataFrame has no format method
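
As a quick illustration of the point about & versus + (a sketch, not part of the original answer), the snippet below shows the TypeError from the comments and two ways to build the file names. The list files mirrors the one in the question; the names error_message, names_plus and names_fstring are made up for the example.

```python
files = [1, 2, 3, 4, 5]  # same list as in the question

# In Python, & is the bitwise AND operator, so int & str raises TypeError
try:
    files[0] & ".xlsx"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Convert the number to a string first, then concatenate with +
names_plus = [str(x) + ".xlsx" for x in files]

# An f-string performs the conversion implicitly
names_fstring = [f"{x}.xlsx" for x in files]

print(error_message)
print(names_plus)
```

Either form yields a plain string that can be passed to pd.read_excel.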

The next problem is most likely here:

results = pd.DataFrame(data=d1).set_index('Index')

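On the second file, this assignment replaces the results of the first file, and so on for each later file, so only the last file's results remain after the loop. You need to find a way to combine your data, possibly as described here.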
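
One common way to combine the per-file results is to collect one column per file in a list and concatenate once after the loop. The following is a minimal sketch under stated assumptions, not the original answer's code: compute_summary is a hypothetical stand-in for the merge and partial-price steps from the question (here it just sums 'Total Debt'), and frames holds small in-memory data frames in place of the Excel files so the example runs without any files on disk.

```python
import pandas as pd


def compute_summary(df3, real=1000):
    """Hypothetical stand-in for the per-file calculation in the question.

    The estimate here is simply the sum of 'Total Debt'; in the real script
    this is where the merge with df2 and the partial-price steps would go.
    """
    w = df3['Total Debt'].sum()
    z = ((w / real) - 1) * 100
    return pd.Series([w, real, z], index=['Estimate', 'Real', 'Error'])


# In-memory stand-ins for the Excel files, keyed by file number (assumption).
# With real files this would be: df3 = pd.read_excel(str(x) + '.xlsx')
frames = {
    1: pd.DataFrame({'Total Debt': [100, 240, 200]}),
    2: pd.DataFrame({'Total Debt': [1000, 50, 20]}),
}

columns = []  # one Series per file, collected across iterations
for x, df3 in frames.items():
    # Name each Series after its file so it becomes that file's column
    columns.append(compute_summary(df3).rename(x))

# Combine once, after the loop, so no file overwrites another
results = pd.concat(columns, axis=1)
results.index.name = 'Index'
print(results)
```

The key design choice is that results is built only after the loop; inside the loop, each iteration appends to the list instead of reassigning results.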

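【Comments】:

  • Hello! Thanks for your time and the explanation. The script still produces the result for only one Excel file; I would like a single data frame containing the calculations for all the Excel files.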