【Question title】: Python - take the time difference from the first date in a column
【Posted】: 2019-03-30 18:05:30
【Question】:

Given a column of dates, I want to create another column, diff, that holds the number of days elapsed since the first date.

date                    diff
2011-01-01 00:00:10      0
2011-01-01 00:00:11      0.000011 days
2011-02-01 00:00:11      30.000011 days 
2013-02-01 00:00:11      395.000011 days
2014-02-01 00:00:11      760.000011 days

The date column is of datetime type. What I have tried so far:

df = df.sort_values(['date'], ascending=True)
df.set_index('date', inplace = True)
first = df.index[0]
df['diff'] = (first - df.index.shift()).fillna(0)

【Question comments】:

    Tags: python pandas datetime dataframe


    【Solution 1】:

    You can use this approach without setting a new index.

    Original DataFrame

    df
                     date        diff
    0 2011-01-01 00:00:10    0.000000
    1 2011-01-01 00:00:11    0.000011
    2 2011-02-01 00:00:11   30.000011
    3 2013-02-01 00:00:11  395.000011
    4 2014-02-01 00:00:11  760.000011
    

    Possible answer

    df['diff_new'] = df['date'] - df.loc[0,'date']
    
                     date        diff           diff_new
    0 2011-01-01 00:00:10    0.000000    0 days 00:00:00
    1 2011-01-01 00:00:11    0.000011    0 days 00:00:01
    2 2011-02-01 00:00:11   30.000011   31 days 00:00:01
    3 2013-02-01 00:00:11  395.000011  762 days 00:00:01
    4 2014-02-01 00:00:11  760.000011 1127 days 00:00:01
    

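    By the way, I get different date differences from the ones shown in your original data, from the third row onward. You can check them manually against an online tool that calculates date differences in days.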
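
The day counts can also be checked in code rather than by hand. The following is a minimal sketch using only the standard library; the list `stamps` simply repeats the dates from the question, and the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# The five timestamps from the question, as strings.
stamps = [
    "2011-01-01 00:00:10",
    "2011-01-01 00:00:11",
    "2011-02-01 00:00:11",
    "2013-02-01 00:00:11",
    "2014-02-01 00:00:11",
]

first = datetime.strptime(stamps[0], FMT)

# timedelta.days is the whole-day part of each difference from the first date.
whole_days = [(datetime.strptime(s, FMT) - first).days for s in stamps]
print(whole_days)  # [0, 0, 31, 762, 1127]
```

January has 31 days and 2012 is a leap year, which accounts for 31, 762 and 1127 rather than the 30, 395 and 760 shown in the question.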

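    【Comments】:

    • My first answer here was wrong because it gave the difference between consecutive rows. I have updated it. Apologies for the confusion. Hope this helps.
    【Solution 2】:

    Here is how I would get the number of days as a float: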

    import pandas as pd
    dates = pd.to_datetime(df.date)  # make sure we are working with dates and not strings
    df["diff"] = (dates - dates[0]).apply(lambda x: x.total_seconds() / 86400)  # seconds -> fractional days
    

    Resulting df

                      date         diff
    0  2011-01-01 00:00:10     0.000000
    1  2011-01-01 00:00:11     0.000012
    2  2011-02-01 00:00:11    31.000012
    3  2013-02-01 00:00:11   762.000012
    4  2014-02-01 00:00:11  1127.000012
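
The same float result can probably be obtained without a Python-level `apply`. This is a sketch under the assumption that `date` holds the five timestamps from the question; the sample DataFrame is rebuilt here only to make the snippet self-contained.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: same five timestamps).
df = pd.DataFrame({"date": [
    "2011-01-01 00:00:10",
    "2011-01-01 00:00:11",
    "2011-02-01 00:00:11",
    "2013-02-01 00:00:11",
    "2014-02-01 00:00:11",
]})

dates = pd.to_datetime(df["date"])
elapsed = dates - dates.iloc[0]          # timedelta64 Series, relative to the first row

# Dividing by a one-day Timedelta yields fractional days as float64.
df["diff"] = elapsed / pd.Timedelta(days=1)

# Equivalent route through the .dt accessor.
alt = elapsed.dt.total_seconds() / 86400

print(df)
```

Both forms operate on the whole column at once; `.iloc[0]` is used instead of `[0]` so the first row is picked by position even if the index labels are not 0, 1, 2, ...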
    

    【Comments】:

      【Solution 3】:

      This is the data you are starting from...

      >>> df
                        date
      0  2011-01-01 00:00:10
      1  2011-01-01 00:00:11
      2  2011-02-01 00:00:11
      3  2013-02-01 00:00:11
      4  2014-02-01 00:00:11
      

      First convert the values to timestamps so that the DataFrame holds proper datetime data; once converted, simply difference the DataFrame:

      >>> df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])
      >>> df['diff']  = (df2 - df2.shift()).fillna(0)
      >>> df
                        date              diff
      0  2011-01-01 00:00:10   0 days 00:00:00
      1  2011-01-01 00:00:11   0 days 00:00:01
      2  2011-02-01 00:00:11  31 days 00:00:00
      3  2013-02-01 00:00:11 731 days 00:00:00
      4  2014-02-01 00:00:11 365 days 00:00:00
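
Note that the output above holds the gap between consecutive rows, whereas the question asks for the time elapsed since the first date. The sketch below puts the two side by side on the question's sample data; the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2011-01-01 00:00:10",
    "2011-01-01 00:00:11",
    "2011-02-01 00:00:11",
    "2013-02-01 00:00:11",
    "2014-02-01 00:00:11",
]))

# Row-to-row gap: the first row has no predecessor, so fill its NaT with zero.
step = dates.diff().fillna(pd.Timedelta(0))

# Elapsed time measured from the first row.
since_first = dates - dates.iloc[0]

print(pd.DataFrame({"date": dates, "step": step, "since_first": since_first}))
```

The two columns agree only for the first two rows; from the third row on, `step` restarts at each row while `since_first` keeps accumulating.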
      

      【Comments】:

        【Solution 4】:

        You can try

        df['diff'] = df.date - df.date.min()
        
        df
                         date               diff
        0 2011-01-01 00:00:10    0 days 00:00:00
        1 2011-01-01 00:00:11    0 days 00:00:01
        2 2011-02-01 00:00:11   31 days 00:00:01
        3 2013-02-01 00:00:11  762 days 00:00:01
        4 2014-02-01 00:00:11 1127 days 00:00:01
        

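        【Comments】:

        • @nimrodz min() may fail if another date is earlier than the first one. It is better to just use df.date - df.date[0]
        • @pygo added an example of the df
        • @SaiKumar min() should work here, because df.date[0] is the minimum date, since the values are sorted with df.sort_values(['date'], ascending=True)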
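
The disagreement in the comments above comes down to whether the data is sorted. Here is a small sketch on a deliberately unsorted, made-up three-row frame (not the question's data) showing where the two approaches diverge and where they coincide.

```python
import pandas as pd

# Hypothetical unsorted data: the first row is NOT the earliest date.
df = pd.DataFrame({"date": pd.to_datetime([
    "2011-02-01 00:00:11",
    "2011-01-01 00:00:10",
    "2013-02-01 00:00:11",
])})

from_min = df["date"] - df["date"].min()            # relative to the earliest date
from_first_row = df["date"] - df["date"].iloc[0]    # relative to the row in position 0

# After sorting, position 0 holds the earliest date, so both agree.
sorted_dates = df["date"].sort_values()
sorted_from_min = sorted_dates - sorted_dates.min()
sorted_from_first = sorted_dates - sorted_dates.iloc[0]

print(from_min.tolist())
print(from_first_row.tolist())
```

On the unsorted frame, `from_first_row` contains a negative timedelta while `from_min` does not. One caveat: after `sort_values` without resetting the index, `df.date[0]` looks up the label 0 rather than the first position, so `.iloc[0]` is the safer way to say "first row".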