【Question title】: Finding the amount of time difference between dates in Python
【Posted】: 2011-01-27 15:37:24
【Question】:

Suppose I have 2 lists like these:

L1=['Smith, John, 2008,  12,  10,  Male', 'Bates, John,  2006,  1,  Male', 'Johnson, John,  2009,  1,  28,  Male', 'James,  John,  2008,  3,  Male']

L2=['Smith,  Joy, 2008,  12,  10,  Female', 'Smith,  Kevin,  2008,  12,  10,  Male', 'Smith,  Matt,  2008,  12,  10,  Male', 'Smith,  Carol,  2000,  12,  11,  Female', 'Smith,  Sue,  2000,  12,  11,  Female', 'Johnson,  Alex,  2008,  3,  Male', 'Johnson,  Emma,  2008,  3,  Female', 'James,  Peter,  2008,  3,  Male', 'James,  Chelsea,  2008,  3,  Female'] 

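I want to use them to compare the date of each person in a family (same last name) with the date of the "John" in that family. The dates range from having year, month, and day, to just year and month, to just year. I want to find the difference between John's date and each of his family members' dates, down to the most specific point I can (if one date has all 3 parts and the other has only month and year, then find the difference in months and years only). This is what I have tried so far, but it doesn't work: it doesn't use the right names and dates (it only gives each John one sibling), and the way it computes the time between dates is confusing and wrong: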

for line in L1:
    type=line.split(',')
    if len(type)>=1:
        family=type[0]
        if len(type)==6:
            yearA=type[2]
            monthA=type[3]
            dayA=type[4]
            sex=type[5]
            print '%s, John Published in %s, %s, %s, %s' %(family, yearA, monthA, dayA, sex)
        elif len(type)==5:
            yearA=type[2]
            monthA=type[3]
            sex=type[4]
            print '%s, John Published in %s, %s, %s' %(family, yearA, monthA, sex)
        elif len(type)==4:
            yearA=type[2]
            sex=type[3]
            print '%s, John Published in %s, %s' %(family, yearA, sex)
    for line in L2:
        if re.search(family, line):
            word=line.split(',')
            name=word[1]
            if len(word)==6:
                yearB=word[2]
                monthB=word[3]
                dayB=word[4]
                sex=word[5]
            elif len(word)==5:
                yearB=word[2]
                monthB=word[3]
                sex=word[4]
            elif len(word)==4:
                yearB=word[2]
                sex=word[3]
    if dayA and dayB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        daydiff=int(dayA)-int(dayB)
        print'%s, %s Published %s year(s), %s month(s), %s day(s) before/after John, %s' %(family, name, yeardiff, monthdiff, daydiff, sex)
    elif not dayA and not dayB  and monthA and monthB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        print'%s, %s Published %s year(s), %s month(s), before/after John, %s' %(family, name, yeardiff, monthdiff, sex)
    elif not monthA and not monthB and yearA and yearB:
        yeardiff= int(yearA)-int(yearB)
        print'%s, %s Published %s year(s), before/after John, %s' %(family, name, yeardiff, sex)

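I would like to end up with something that looks like this and, if possible, have the program tell whether each sibling came before or after John, printing months and days only when both of the dates being compared have them: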

Smith, John Published in  2008,  12,  10,  Male 
Smith,  Joy Published _ year(s) _month(s) _day(s) before/after John, Female 
Smith,  Kevin Published _ year(s) _month(s) _day(s) before/after John,  Male
Smith,  Matt Published _ year(s) _month(s) _day(s) before/after John,  Male
Smith,  Carol Published _ year(s) _month(s) _day(s) before/after John,  Female
Smith,  Sue Published _ year(s) _month(s) _day(s) before/after John,  Female
Bates, John Published in  2006,  1,  Male
Johnson, John Published in  2009,  1,  28,  Male
Johnson,  Alex Published _ year(s) _month(s) _day(s) before/after John,  Male
Johnson,  Emma Published _ year(s) _month(s) _day(s) before/after John,  Female
James,  John Published in  2008,  3,  Male
James,  Peter Published _ year(s) _month(s) _day(s) before/after John,  Male
James,  Chelsea Published _ year(s) _month(s) _day(s) before/after John,  Female

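【Comments】:

  • Expressing a time interval as years + months + days is not trivial. Example: how many years, months, and days are there between 2007-01-31 and 2008-03-01?
  • That's what I want the program to find out between "John" and each of his siblings, and also whether the sibling came before or after.
  • Before/after and the number of days are not the problem. The problem is converting them to years + months + days.
  • Is this something I should use the datetime module for? I had never heard of it, but I'm reading a bit about it now. Not sure how to feed the information into it correctly.
  • What you want is a relative delta. Look at the datetime and dateutil modules: niemeyer.net/python-dateutil dateutil provides relativedelta, which takes things like leap years and February into account (i.e. "what is the date 1 month from now?" or "how many months and weeks are there between these two dates?").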
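To illustrate the first comment, here is a small sketch using only the standard library (the variable names are mine, not from the thread). The day count between two dates is unambiguous; it is the split into months that depends on a convention, because month lengths vary.

```python
import calendar
from datetime import date

start = date(2007, 1, 31)
end = date(2008, 3, 1)

# Subtracting two dates yields a timedelta; its .days attribute is exact.
total_days = (end - start).days

# Month lengths differ, so "one month" is not a fixed number of days.
feb_2007 = calendar.monthrange(2007, 2)[1]  # days in February 2007
feb_2008 = calendar.monthrange(2008, 2)[1]  # days in February 2008 (leap year)

print(total_days, feb_2007, feb_2008)
```

The total is 395 days, but whether that reads as "1 year, 1 month, 1 day" or something else depends on how a month starting from January 31 is counted.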

Tags: python list time


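【Answer 1】:

As Joe Kington suggested, the dateutil module is useful here. In particular, it can tell you the difference between two dates in years, months, and days. (Doing the calculation yourself means accounting for leap years and so on. Using a well-tested module is much better than reinventing this wheel.)

This problem lends itself to classes.

Let's make a Person class to keep track of a person's name, gender, and publication date: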

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender

Publication dates may have missing data, so let's make a special class to handle dates with missing parts:

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)

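The datetime module defines datetime.datetime objects and uses datetime.timedelta objects to represent the difference between two datetime.datetime objects. Analogously, let's define a VagueDateDelta to represent the difference between two VagueDates: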

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

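Now that we've built ourselves some handy tools, solving the problem isn't hard.

The first step is to parse the list of strings and convert them into Person objects: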

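def parse_person(text):
    # Split on commas and strip the whitespace around each field.
    # (A list, not map(), so indexing also works on Python 3.)
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    # Whatever sits between first name and gender is year[, month[, day]].
    ymd=[int(field) for field in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)
johns=[parse_person(line) for line in L1]
peeps=[parse_person(line) for line in L2]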

Next we reorganize peeps into a dict of family members, keyed by last name:

family=collections.defaultdict(list)
for person in peeps:
    family[person.lastname].append(person)

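Finally, you just loop over johns and each john's family members, compare the publication dates, and report the results.

The complete script could look like this: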

import datetime as dt
import dateutil.relativedelta as relativedelta
import pprint
import collections

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
    def asdate(self):
        # You've got to make some kind of arbitrary decision when comparing
        # vague dates. Here I make the arbitrary decision that missing info
        # will be treated like 1s for the purpose of calculating differences.
        return dt.date(self.year,self.month or 1,self.day or 1)
    def __str__(self):
        if self.day is not None and self.month is not None:
            return '{s.year}, {s.month}, {s.day}'.format(s=self)
        elif self.month is not None:
            return '{s.year}, {s.month}'.format(s=self)
        else:
            return '{s.year}'.format(s=self)

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
    def age_diff(self,other):
        return self.ymd-other.ymd
    def __str__(self):
        fmt='{s.lastname}, {s.firstname} ({s.gender}) ({d.year},{d.month},{d.day})'
        return fmt.format(s=self,d=self.ymd)
    __repr__=__str__
    def __lt__(self,other):
        d1=self.ymd.asdate()
        d2=other.ymd.asdate()
        return d1<d2

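def parse_person(text):
    # Split on commas and strip the whitespace around each field.
    # (A list, not map(), so indexing also works on Python 3.)
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    # Whatever sits between first name and gender is year[, month[, day]].
    ymd=[int(field) for field in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)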

def main():
    L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male',
        'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']

    L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male',
        'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female',
        'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male',
        'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male',
        'James, Chelsea, 2008, 3, Female']

    johns=[parse_person(line) for line in L1]
    peeps=[parse_person(line) for line in L2]

    print(pprint.pformat(johns))
    print('')
    print(pprint.pformat(peeps))
    print('')

    family=collections.defaultdict(list)
    for person in peeps:
        family[person.lastname].append(person)

    # print(family)
    pub_fmt='{j.lastname}, {j.firstname} Published in {j.ymd}, {j.gender}'
    rel_fmt='  {r.lastname}, {r.firstname} Published {d} {ba} John, {r.gender}'
    for john in johns:
        print(pub_fmt.format(j=john))
        for relative in family[john.lastname]:
            diff=john.ymd-relative.ymd
            ba='before' if relative<john else 'after'
            print(rel_fmt.format(
                r=relative,
                d=diff,
                ba=ba,                
                ))

if __name__=='__main__':
    main()

which yields

[Smith, John (Male) (2008,12,10),
 Bates, John (Male) (2006,1,None),
 Johnson, John (Male) (2009,1,28),
 James, John (Male) (2008,3,None)]

[Smith, Joy (Female) (2008,12,10),
 Smith, Kevin (Male) (2008,12,10),
 Smith, Matt (Male) (2008,12,10),
 Smith, Carol (Female) (2000,12,11),
 Smith, Sue (Female) (2000,12,11),
 Johnson, Alex (Male) (2008,3,None),
 Johnson, Emma (Female) (2008,3,None),
 James, Peter (Male) (2008,3,None),
 James, Chelsea (Female) (2008,3,None)]

Smith, John Published in 2008, 12, 10, Male
  Smith, Joy Published 0 years, 0 months, 0 days after John, Female
  Smith, Kevin Published 0 years, 0 months, 0 days after John, Male
  Smith, Matt Published 0 years, 0 months, 0 days after John, Male
  Smith, Carol Published 7 years, 11 months, 29 days before John, Female
  Smith, Sue Published 7 years, 11 months, 29 days before John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
  Johnson, Alex Published 0 years, 10 months before John, Male
  Johnson, Emma Published 0 years, 10 months before John, Female
James, John Published in 2008, 3, Male
  James, Peter Published 0 years, 0 months after John, Male
  James, Chelsea Published 0 years, 0 months after John, Female

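【Comments】:

  • This seems to be the way to go. +1!
【Answer 2】:

As mentioned in the comments (on @Matt's answer), you need at least year, month, and day to use datetime.date and datetime.timedelta. From the sample data above, it looks like some entries may be missing the day, which makes it trickier.

If you don't mind using default values for a missing month/day (e.g. January 1st), then you can quite quickly convert those dates to datetime.date instances.

As a quick example: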

from datetime import date

johns = []
for s in L1:
    # NOTE: not the most robust parsing method.
    v = [x.strip() for x in s.split(",")]
    data = {
        "gender": v[-1],
        "last_name": v[0],
        "first_name": v[1],
    }

    # build keyword args for datetime.date()
    v = v[2:-1] # remove parsed data
    kwargs = { "year": int(v.pop(0)), "month": 1, "day":1 }
    try:
        kwargs["month"] = int(v.pop(0))
        kwargs["day"] = int(v.pop(0))
    except IndexError:
        # month and/or day missing: keep the defaults set above
        pass

    data["date"] = date(**kwargs)
    johns.append(data)

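This gives you a list of dicts containing the name, gender, and date. You can do the same for L2 and compute the date difference by subtracting one date from the other (which yields a timedelta object).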

>>> from datetime import date
>>> a = date(2008, 12,12)
>>> b = date(2010, 1, 13)
>>> delta = b - a
>>> print(delta.days)
397
>>> print("%d years, %d days" % divmod(delta.days, 365))
1 years, 32 days

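I deliberately left out months, since it is not as simple as equating 30 days to a month. Arguably, assuming 365 days in a year is just as inaccurate once you take leap years into account.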
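A quick sketch of that leap-year inaccuracy, using dates I picked for illustration (they are not from the question's data):

```python
from datetime import date

# Exactly four calendar years apart; the span includes Feb 29, 2008.
delta = date(2012, 1, 1) - date(2008, 1, 1)
years, days = divmod(delta.days, 365)

print(delta.days, years, days)
```

The span is 1461 days, so dividing by 365 reports 4 years and 1 day even though the two dates are exactly four calendar years apart.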

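Update: displaying the time delta in years, months, and days

If you need to show the delta in years, months, and days, doing a divmod on the days returned by timedelta would not be accurate, since that does not account for leap years or the differing number of days in each month. You would have to compare the year, month, and day of each date by hand.

Here's my attempt at such a function. (Only lightly tested, so use with caution.)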

from datetime import date, timedelta
def my_time_delta(d1,d2):
    """
    Returns time delta as the following tuple:
        ("before|after|same", "years", "months", "days")
    """
    if d1 == d2:
        return ("same",0,0,0)

    # d1 before or after d2?
    if d1 > d2:
        ba = "after"
        d1,d2 = d2,d1 # swap so d2 > d1
    else:
        ba = "before"

    years  = d2.year - d1.year
    months = d2.month - d1.month
    days   = d2.day - d1.day

    # adjust for -ve days/months
    if days < 0:
        # get last day of month for month before d1
        pre_d1 = d1 - timedelta(days=d1.day)
        days = days + pre_d1.day
        months = months - 1

    if months < 0:
        months = months + 12
        years  = years - 1

    return (ba, years, months, days)

Example usage:

>>> my_time_delta(date(2003,12,1), date(2003,11,2))
('after', 0, 0, 30)
>>> my_time_delta(date(2003,12,1), date(2004,11,2))
('before', 0, 11, 1)
>>> my_time_delta(date(2003,2,1), date(1992,3,10))
('after', 10, 10, 20)
>>> p,y,m,d = my_time_delta(date(2003,2,1), date(1992,3,10))
>>> print "%d years, %d months, %d days %s" % (y,m,d,p)
10 years, 10 months, 20 days after
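The function above borrows days from the month before the earlier date, which can leave the day count off by a day or two. As an alternative sketch (not the answer author's method; `add_months` and `ymd_between` are hypothetical helper names), one can count whole months first, clamping the day to the length of the target month, and then count the leftover days. This uses only the standard library:

```python
import calendar
from datetime import date

def add_months(d, n):
    """Return d shifted by n months, clamping the day to the month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def ymd_between(d1, d2):
    """Return (years, months, days) from the earlier date to the later one."""
    start, end = sorted((d1, d2))
    # Calendar-month difference; may overshoot by one if the day has not been reached.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:
        months -= 1
    # Leftover days after stepping forward by the whole months.
    days = (end - add_months(start, months)).days
    years, months = divmod(months, 12)
    return years, months, days

print(ymd_between(date(2003, 12, 1), date(2003, 11, 2)))
print(ymd_between(date(2003, 2, 1), date(1992, 3, 10)))
print(ymd_between(date(2007, 1, 31), date(2008, 3, 1)))
```

Under this convention the first two example pairs come out as (0, 0, 29) and (10, 10, 22) rather than 30 and 20 days, and the pair from the first comment on the question gives (1, 1, 1). Before/after can still be decided separately by comparing the two dates.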

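【Comments】:

  • Thanks! Just one question: when I do date1-date2 to get the timedelta object, it gives me the answer in a form like "333 days, 0:00:00". Is there a way to turn it into a years/months/days format?
  • I've updated the answer, which should address this: use the timedelta.days attribute.
  • If you were to add months, just using the standard 30 days = 1 month, how would you write it?
  • years, days = divmod(delta.days, 365); months, days = divmod(days, 30);
  • How would I do that if my format is "%d years, %d months, %d days"? Sorry for asking so many questions!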
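Putting the last two comments together, a minimal sketch might look like this (`approx_ymd` is a name I made up; the 365-day year and 30-day month are the approximations from the comment above, not calendar-accurate values):

```python
from datetime import date

def approx_ymd(delta_days):
    """Split a day count using 365-day years and 30-day months (approximate)."""
    years, days = divmod(delta_days, 365)
    months, days = divmod(days, 30)
    return years, months, days

delta = date(2010, 1, 13) - date(2008, 12, 12)
text = "%d years, %d months, %d days" % approx_ymd(delta.days)
print(text)
```

For the 397-day example this prints "1 years, 1 months, 2 days". Note one quirk of the approximation: a remainder of 360 to 364 days comes out as 12 months rather than rolling over into another year.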
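【Answer 3】:

There are probably existing modules for this kind of thing, but I would first convert the dates to a common unit of time (i.e. days since Jan 1, 19XX in your example). Then you can easily compare them, subtract them, etc., and convert them back for display as you see fit. If days are all you want, this should be quite easy.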
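One way to sketch this idea with the standard library is `date.toordinal()`, which counts days from 0001-01-01 rather than from a 19XX epoch (that substitution is my assumption, not something this answer specifies). The two dates are taken from the question's Smith family:

```python
from datetime import date

john = date(2008, 12, 10)
carol = date(2000, 12, 11)

# toordinal() maps a date to a day number (0001-01-01 is day 1).
diff_days = john.toordinal() - carol.toordinal()

# Same result as subtracting the dates directly.
same_as_timedelta = diff_days == (john - carol).days

# fromordinal() is the inverse, for converting back to a date.
round_trip = date.fromordinal(carol.toordinal() + diff_days)

print(diff_days, same_as_timedelta, round_trip)
```

Plain integers like these are easy to compare and subtract, but they only give a day count; splitting that into years and months still needs a calendar-aware step.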

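【Comments】:

  • That would work if I only needed this small data set, but this is being applied to a much larger data set; the data above is just a sample.
  • You could also try datetime.timedelta, which is what you get when you subtract two date or two datetime instances, but again you'd have to convert your dates to those types first.
  • AFAIK, you need at least year, month, and day to use datetime.date and datetime.timedelta. From the sample data above, some entries may be missing the day, which makes it trickier.
  • Is this something I could do by saying: if len(type)==6: yearA=type[2] monthA=type[3] dayA=type[4] date=datetime.date(yearA, monthA, dayA)
  • Yes, I think that's right. Where a value is missing you could always use a default/average date, to keep the comparisons as standard as possible. It's all pretty ugly, though. Edit: this was in response to Sean's comment.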
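A minimal sketch of what the last two comments describe, assuming missing parts default to 1 (`to_date` and its parameters are hypothetical names). Note that datetime.date needs integers, so the split strings have to be converted first:

```python
from datetime import date

def to_date(fields, default_month=1, default_day=1):
    """Build a date from fields like ['2008', '3']; also return how many parts were given."""
    parts = [int(f) for f in fields]  # datetime.date() rejects strings
    year = parts[0]
    month = parts[1] if len(parts) > 1 else default_month
    day = parts[2] if len(parts) > 2 else default_day
    return date(year, month, day), len(parts)

record = [x.strip() for x in 'James,  John,  2008,  3,  Male'.split(',')]
d, precision = to_date(record[2:-1])
print(d, precision)
```

Returning the number of parts alongside the date lets the caller report a difference only down to the coarser of the two precisions being compared.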