【Question title】: Calculate date differences in months between dates in a groupby
【Posted】: 2021-05-17 22:26:40
【Question description】:

I have the following DataFrame:

Id, country, date
1, ar, 2019-01-01
1, ar, 2019-02-01
1, ar, 2019-03-01
1, it, 2019-01-01
1, it, 2019-02-01
1, it, 2019-03-01
1, it, 2019-04-01
1, it, 2019-03-01
2, ar, 2019-01-01
2, ar, 2019-02-01
2, ar, 2019-03-01
2, it, 2019-01-01
2, it, 2019-02-01
3, it, 2019-03-01
3, it, 2019-04-01
4, it, 2019-05-01

I need to group by id and country and calculate the difference between consecutive dates within each group, in months.

I tried:

df['daysdiff'] = df.sort_values('date').groupby(['id','country'])['date'].diff()

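But this gives the difference in days, and I need it in months. I don't think dividing 'daysdiff' by 30 is accurate, because months have different numbers of days... and there are leap years...

Any help is welcome!

【Discussion】:

    Tags: python pandas datetime pandas-groupby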


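    【Solution 1】:

    I adapted this approach to your case.

    Basically, you have to deal with the NaT values. I chose to treat them as 0.

    If you like, you can round the months to integers.

    In your example there is a duplicated row: "1", "it", "2019-03-01"

    The result for that row is 7, 1, it, 2019-03-01, 0 days, 0 (because the sort places it next to the identical row, so the difference is 0 days).

    It seems to work for this case, although I haven't tested it on others.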

    import pandas as pd
    
    df = pd.DataFrame(columns=["id", "country", "date"]
        , data=[
        ["1", "ar", "2019-01-01"],
        ["1", "ar", "2019-02-01"],
        ["1", "ar", "2019-03-01"],
        ["1", "it", "2019-01-01"],
        ["1", "it", "2019-02-01"],
        ["1", "it", "2019-03-01"],
        ["1", "it", "2019-04-01"],
        ["1", "it", "2019-03-01"],
        ["2", "ar", "2019-01-01"],
        ["2", "ar", "2019-02-01"],
        ["2", "ar", "2019-03-01"],
        ["2", "it", "2019-01-01"],
        ["2", "it", "2019-02-01"],
        ["3", "it", "2019-03-01"],
        ["3", "it", "2019-04-01"],
        ["4", "it", "2019-05-01"]
    ])
    df["date"] = pd.to_datetime(df["date"])
    
    df['daysdiff'] = df.sort_values('date').groupby(['id','country'])['date'].diff()
    df['monthsdiff'] = (
        df
        .sort_values('date')
        .groupby(['id','country'])['date']
        .diff()
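        # 365.25 [days/year] / 12 [months/year] = 30.4375 [days/month]
        .div(pd.Timedelta(days=365.25/12))
        # Treat NaT (the first row of each group) as 0 months. Filling after the
        # division avoids fill_value="0", a string, which can raise UFuncTypeError.
        .fillna(0)
        .round()
        .astype(int)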
        )
    print(df)
    #    id country       date daysdiff  monthsdiff
    # 0   1      ar 2019-01-01      NaT           0
    # 1   1      ar 2019-02-01  31 days           1
    # 2   1      ar 2019-03-01  28 days           1
    # 3   1      it 2019-01-01      NaT           0
    # 4   1      it 2019-02-01  31 days           1
    # 5   1      it 2019-03-01  28 days           1
    # 6   1      it 2019-04-01  31 days           1
    # 7   1      it 2019-03-01   0 days           0
    # 8   2      ar 2019-01-01      NaT           0
    # 9   2      ar 2019-02-01  31 days           1
    # 10  2      ar 2019-03-01  28 days           1
    # 11  2      it 2019-01-01      NaT           0
    # 12  2      it 2019-02-01  31 days           1
    # 13  3      it 2019-03-01      NaT           0
    # 14  3      it 2019-04-01  31 days           1
    # 15  4      it 2019-05-01      NaT           0
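
As an alternative to dividing by an average month length, here is a minimal sketch that counts calendar months directly, which is one way to address the concern about months of different lengths and leap years. It is not part of the original answer. The sample rows are made up for illustration (one gap of two months and one gap that crosses a year boundary), and the name `month_index` is my own. It assumes that only the year and month matter, so the day of the month is ignored: 2019-01-31 to 2019-02-01 would count as 1 month.

```python
import pandas as pd

# Illustrative data only, not the rows from the question.
df = pd.DataFrame(
    columns=["id", "country", "date"],
    data=[
        ["1", "ar", "2019-01-01"],
        ["1", "ar", "2019-02-01"],
        ["1", "ar", "2019-04-01"],
        ["2", "it", "2019-12-01"],
        ["2", "it", "2020-03-01"],
    ],
)
df["date"] = pd.to_datetime(df["date"])

# Sort so that diff() compares each row with the previous date in its group.
df = df.sort_values(["id", "country", "date"])

# A running month number (year * 12 + month) that ignores the day of the month.
month_index = df["date"].dt.year * 12 + df["date"].dt.month

df["monthsdiff"] = (
    month_index
    .groupby([df["id"], df["country"]])
    .diff()
    .fillna(0)      # the first row of each group has no previous date
    .astype(int)
)
print(df)
```

Because the difference is taken on integers rather than on Timedelta values, no rounding is needed and the result does not depend on how many days each month has.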
    

    【Discussion】:

    • Thanks for your help! Running your code, I get the following error: UFuncTypeError: ufunc 'true_divide' cannot use operands with types dtype('O') and dtype('<m8[ns]')