【Question Title】: Calculate a specific function from the Date column and user-specified input parameters in Pandas
【Posted】: 2020-07-14 06:51:27
【Question】:

I have a df as shown below.

Date                t_factor     
2020-02-01             5             
2020-02-03             23              
2020-02-06             14           
2020-02-09             23
2020-02-10             23  
2020-02-11             23          
2020-02-13             30            
2020-02-20             29            
2020-02-29             100
2020-03-01             38
2020-03-10             38               
2020-03-11             38                    
2020-03-26             70           
2020-03-29             70 

       

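From this, I want to create a function that computes a column named t_function from the calculated values t1, t2, and t3.

The user will enter the following parameters.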

Step1:
Enter start_date1 = 2020-02-01
Enter end_date1 =  2020-02-06
Enter a0 = 3
Enter a1 = 1
Enter a2 = 0

Calculate t1 as the number of days from start_date1 (2020-02-01) to each value in the Date column, up to end_date1, so that t1 is 1 on start_date1.
t_function = a0 + a1*t1 + a2*(t1)**2

Step2:
Enter start_date2 = 2020-02-13
Enter end_date2 =  2020-02-29
Enter a0 = 2
Enter a1 = 0
Enter a2 = 1
calculate time_in_days as t2, which is 1 on start_date2 = 2020-02-13 and so on till end_date2
t_function = a0 + a1*t2 + a2*(t2)**2


Step3:
Enter start_date3 = 2020-03-11
Enter end_date3 =  2020-03-29
Enter a0 = 4
Enter a1 = 0
Enter a2 = 0
calculate time_in_days as t3, which is 1 on start_date3 = 2020-03-11 and so on till end_date3
t_function = a0 + a1*t3 + a2*(t3)**2

Expected output:

Date                t_factor     t1         t2         t3       t_function
2020-02-01             5          1         NaN        NaN      4
2020-02-03             23         3         NaN        NaN      6
2020-02-06             14         6         NaN        NaN      9
2020-02-09             23         NaN       NaN        NaN      NaN
2020-02-10             23         NaN       NaN        NaN      NaN
2020-02-11             23         NaN       NaN        NaN      NaN
2020-02-13             30         NaN        1         NaN      3   
2020-02-20             29         NaN        8         NaN      66
2020-02-29             100        NaN        17        NaN      291
2020-03-01             38         NaN       NaN        NaN      NaN
2020-03-10             38         NaN       NaN        NaN      NaN
2020-03-11             38         NaN       NaN        1        4 
2020-03-26             70         NaN       NaN        15       4
2020-03-29             70         NaN       NaN        18       4

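Note: The initial start date, start_date1, should be the first date in the Date column. The final end date, end_date3, should be the last date in the Date column. The t_factor column is not used.

I then tried the code below to calculate t1, but I got confused, since I am new to Python and pandas.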

df['t1'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1

【Comments】:

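  • Could you clarify "calculate t1 as number of days from start_date1 (2020-02-01) to the values in date column till end_date1"?
  • @quest start_date1 is 2020-02-01, so t1 is 1 there. For the second row, Date = 2020-02-03, which lies between start_date1 and end_date1, so t1 = (2020-02-03 - 2020-02-01) days + 1.
  • If I understand correctly, t1, t2, and t3 are calculated as the difference from the first day in the group, plus 1. But how do you calculate t_function? What is the logic?
  • @Roy2012 You are right. t_function = a0 + a1*t1 + a2*(t1)**2, and the user can change a0, a1, and a2 in each step.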
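
The rule clarified in these comments (t is 1 on the start date, and undefined outside the range) can be sketched without a row-wise loop. This is only a minimal illustration for Step 1, assuming the Date column is already a datetime column; the variable name in_range and the four-row sample frame are made up for the example.

```python
import pandas as pd

# Small sample: three dates inside Step 1's range and one outside it.
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09"])})

start, end = pd.Timestamp("2020-02-01"), pd.Timestamp("2020-02-06")

# Rows whose Date falls inside [start, end], both ends inclusive.
in_range = df["Date"].between(start, end)

# Day count starting at 1 on `start`; rows outside the range become NaN.
df["t1"] = ((df["Date"] - start).dt.days + 1).where(in_range)

# Quadratic from Step 1: a0 + a1*t + a2*t**2 with a0=3, a1=1, a2=0.
a0, a1, a2 = 3, 1, 0
df["t_function"] = a0 + a1 * df["t1"] + a2 * df["t1"] ** 2
print(df)
```

With these inputs t1 comes out as 1, 3, 6, NaN and t_function as 4, 6, 9, NaN, which matches the first rows of the expected output in the question.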

Tags: python python-3.x pandas dataframe


【Solution 1】:

Here is how I would do it:

import pandas as pd
from io import StringIO
from datetime import datetime
import numpy as np

df = pd.read_csv(StringIO("""Date                t_factor     
2020-02-01             5             
2020-02-03             23              
2020-02-06             14           
2020-02-09             23           
2020-02-13             30            
2020-02-20             29            
2020-02-29             100               
2020-03-11             38                    
2020-03-26             70           
2020-03-29             70 """), sep=r"\s+", parse_dates=[0])
df

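def fun(x, start="2020-02-01", end="2020-02-06", a0=3, a1=1, a2=0):
    # Evaluate a0 + a1*t + a2*t**2 for one row, where t is 1 on `start`.
    start = datetime.strptime(start, "%Y-%m-%d")
    end = datetime.strptime(end, "%Y-%m-%d")
    if start <= x.Date <= end:
        # Days elapsed since `start`, counting the start date as day 1.
        t = (x.Date - start)/np.timedelta64(1, 'D') + 1
        diff = a0 + a1*t + a2*(t)**2
    else:
        # Rows outside [start, end] get no value for this step.
        diff = np.nan
    return diff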

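# Note: t1, t2 and t3 hold each step's function value, not the day count.
df["t1"] = df.apply(lambda x: fun(x), axis=1)
df["t2"] = df.apply(lambda x: fun(x, "2020-02-13", "2020-02-29", 2, 0, 1), axis=1)
df["t3"] = df.apply(lambda x: fun(x, "2020-03-11", "2020-03-29", 4, 0, 0), axis=1)
# The ranges do not overlap, so summing with NaN treated as 0 combines the steps.
df["t_function"] =  df["t1"].fillna(0) + df["t2"].fillna(0) + df["t3"].fillna(0)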

df

Here is the output:

 Date   t_factor    t1  t2    t3    t_function
0   2020-02-01  5   4.0 NaN   NaN   4.0
1   2020-02-03  23  6.0 NaN   NaN   6.0
2   2020-02-06  14  9.0 NaN   NaN   9.0
3   2020-02-09  23  NaN NaN   NaN   0.0
4   2020-02-13  30  NaN 3.0   NaN   3.0
5   2020-02-20  29  NaN 66.0  NaN   66.0
6   2020-02-29  100 NaN 291.0 NaN   291.0
7   2020-03-11  38  NaN NaN   4.0   4.0
8   2020-03-26  70  NaN NaN   4.0   4.0
9   2020-03-29  70  NaN NaN   4.0   4.0
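
In the output above, the row for 2020-02-09 gets t_function = 0.0, while the expected output in the question shows NaN for dates that fall in no range. One possible way to keep NaN there, offered as a sketch rather than as part of the original answer, is to sum the step columns with min_count=1. The three sample rows below are copied from rows 2 to 4 of the output.

```python
import numpy as np
import pandas as pd

# Per-step values copied from rows 2-4 of the output above.
df = pd.DataFrame({
    "t1": [9.0, np.nan, np.nan],
    "t2": [np.nan, np.nan, 3.0],
    "t3": [np.nan, np.nan, np.nan],
})

# min_count=1 requires at least one non-NaN value per row;
# a row with no valid values sums to NaN instead of 0.
df["t_function"] = df[["t1", "t2", "t3"]].sum(axis=1, min_count=1)
print(df)
```

The middle row, which belongs to no step, then stays NaN instead of becoming 0.0.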

【Comments】:

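  • Please share your output.
  • Thanks a lot, this calculates t1, t2, and t3. Please help me calculate t_function.
  • Ah, no worries. Here is the full solution. I wanted to let you do the missing part. I think there is a calculation error in the desired output in your question.
  • Is there a way to automate this with a for loop? For example, if the user wants to calculate t1, t2, t3, t4, t5, and t6, it should determine automatically from the user input how many t values to calculate.
  • Instead of -> df["t_function"] = col1.fillna(0) + col2.fillna(0) + col3.fillna(0), we have to use df["t_function"] = df["t1"].fillna(0) + df["t2"].fillna(0) + df["t3"].fillna(0)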
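
The for-loop question in the comments could be approached roughly as follows. This is a sketch under assumptions, not the answerer's code: the helper add_t_function and the steps list of dicts (keys start, end, a0, a1, a2) are hypothetical names, the steps are assumed not to overlap, and here t1..tN hold the day counts as in the question's expected output.

```python
import pandas as pd

def add_t_function(df, steps):
    """Return a copy of df with t1..tN and a combined t_function column.

    `steps` is a hypothetical list of dicts with keys
    start, end, a0, a1, a2 (one dict per user-entered step).
    """
    out = df.copy()
    total = pd.Series(float("nan"), index=out.index)
    for i, step in enumerate(steps, start=1):
        start = pd.Timestamp(step["start"])
        end = pd.Timestamp(step["end"])
        in_range = out["Date"].between(start, end)
        # Day count: 1 on the step's start date, NaN outside the step.
        t = ((out["Date"] - start).dt.days + 1).where(in_range)
        out[f"t{i}"] = t
        value = step["a0"] + step["a1"] * t + step["a2"] * t ** 2
        # Fill only rows not already covered by an earlier step.
        total = total.fillna(value)
    out["t_function"] = total
    return out

df = pd.DataFrame({"Date": pd.to_datetime([
    "2020-02-01", "2020-02-06", "2020-02-09",
    "2020-02-13", "2020-02-29", "2020-03-11", "2020-03-29"])})
steps = [
    {"start": "2020-02-01", "end": "2020-02-06", "a0": 3, "a1": 1, "a2": 0},
    {"start": "2020-02-13", "end": "2020-02-29", "a0": 2, "a1": 0, "a2": 1},
    {"start": "2020-03-11", "end": "2020-03-29", "a0": 4, "a1": 0, "a2": 0},
]
result = add_t_function(df, steps)
print(result)
```

The number of t columns follows from len(steps), so adding t4, t5 or t6 only means appending more dicts. How the dicts are collected (for example with input() prompts) is left open here.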