【Question title】: What is the most pythonic way to sort dates sequences?
【Posted】: 2014-04-15 05:40:20
【Question description】:

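I have a list of strings, each representing a month of a year (unsorted and not consecutive): ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013', '10/2013', '11/2013', '1/2014', '2/2014']

I'm looking for a Pythonic way to sort all of these and separate each consecutive sequence, as suggested below: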

[ ['1/2013', '2/2013', '3/2013', '4/2013'], 
  ['7/2013'], 
  ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'] 
]

Any ideas?

【Question discussion】:

  • How are they grouped, by consecutive months?

Tags: python algorithm python-2.7 sequence


【Solution 1】:

Based on the example from the docs that shows how to find runs of consecutive numbers using itertools.groupby():

from itertools import groupby
from pprint import pprint

def month_number(date):
    month, year = date.split('/')
    return int(year) * 12 + int(month)

L = [[date for _, date in run]
     for _, run in groupby(enumerate(sorted(months, key=month_number)),
                           key=lambda (i, date): (i - month_number(date)))]
pprint(L)

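The key to the solution is taking the difference between each item's position in the range generated by enumerate() and its month number, so that consecutive months all end up in the same group (run).

Output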

[['1/2013', '2/2013', '3/2013'],
 ['7/2013'],
 ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
 ['4/2014']]
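
The lambda (i, date) form above relies on tuple parameter unpacking, which exists only in Python 2. For anyone on Python 3, here is a minimal sketch of the same index-minus-value trick; the wrapper name consecutive_runs is my own and not part of the answer.

```python
from itertools import groupby

def month_number(date):
    # '7/2013' -> 2013 * 12 + 7, so consecutive months differ by exactly 1
    month, year = date.split('/')
    return int(year) * 12 + int(month)

def consecutive_runs(months):
    # Hypothetical wrapper name, used here for illustration only.
    ordered = sorted(months, key=month_number)
    # Within a run, position and month number grow in lockstep,
    # so (position - month number) stays constant for the whole run.
    return [[date for _, date in run]
            for _, run in groupby(enumerate(ordered),
                                  key=lambda pair: pair[0] - month_number(pair[1]))]

months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
print(consecutive_runs(months))
```

The only change is that the key function takes the (index, date) pair as a single argument and indexes into it.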

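【Discussion】:

  • months is undefined?
  • @RedBaron: months is the original list from the question.
  • Although it is the shortest, this solution doesn't look like very idiomatic Python, though Haskell or F# fans would love it :) It can be simplified a bit by converting the dates to month numbers: pastebin.com/8KR8Ayzc
  • @9000: Using month numbers is definitely an improvement. I've updated the answer.
【Solution 2】:

The groupby example is cute, but too dense, and it breaks on the following input: ['1/2013', '2/2017'], i.e. when there are adjacent months from non-adjacent years.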

from datetime import datetime
from dateutil.relativedelta import relativedelta

def areAdjacent(old, new):
    return old + relativedelta(months=1) == new

def parseDate(s):
    return datetime.strptime(s, '%m/%Y')

def generateGroups(seq):
    group = []
    last = None
    for (current, formatted) in sorted((parseDate(s), s) for s in seq):
        if group and last is not None and not areAdjacent(last, current):
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group

Result:

[['1/2013', '2/2013', '3/2013'], 
 ['7/2013'],
 ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
 ['4/2014']]
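
Note that dateutil is a third-party package. If it isn't installed, the same generator can be sketched with plain integer arithmetic in place of relativedelta. The names month_index and generate_groups below are my own, chosen for illustration.

```python
def month_index(s):
    # '2/2017' -> months since year 0; adjacent months differ by exactly 1
    month, year = s.split('/')
    return int(year) * 12 + int(month) - 1

def generate_groups(seq):
    group = []
    last = None
    for current, formatted in sorted((month_index(s), s) for s in seq):
        # Start a new group whenever the gap to the previous month is not 1.
        if group and current != last + 1:
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group

print(list(generate_groups(['1/2013', '2/2017'])))
```

Because the year is folded into the index, '1/2013' and '2/2017' are 49 months apart and land in separate groups.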

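【Discussion】:

    【Solution 3】:

    If you only want to sort the list, use the sorted function and pass as the key a function that converts the date string to a Python datetime object: lambda d: datetime.strptime(d, '%m/%Y'). See the following example, with your list as L: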

    >>> from datetime import datetime
    >>> sorted(L, key = lambda d: datetime.strptime(d, '%m/%Y'))
    ['1/2013', '2/2013', '3/2013', '7/2013', '10/2013', 
     '11/2013', '12/2013', '1/2014', '2/2014', '4/2014'] # indented by hand
    

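    To split the list of 'month/year' strings into a list of lists of consecutive months, you can use the following script (read the comments). I first sort the list L, then group the strings by consecutive months (I wrote a function to check whether two months are consecutive):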

    def is_cm(d1, d2):
        """ is consecutive month pair?
            : Assumption: d1 is an earlier date than d2
        """
        d1 = datetime.strptime(d1, '%m/%Y')
        d2 = datetime.strptime(d2, '%m/%Y')
    
        y1, y2 = d1.year, d2.year
        m1, m2 = d1.month, d2.month
    
        if y1 == y2: # if years are same d2 should be in next month
            return (m2 - m1) == 1
        elif (y2 - y1) == 1: # if years are consecutive
            return (m1 == 12 and m2 == 1)
        return False # any other pair of years is not consecutive
    

    It works as follows:

    >>> is_cm('1/2012', '2/2012')
    True # yes, consecutive
    >>> is_cm('12/2012', '1/2013')
    True # yes, consecutive
    >>> is_cm('1/2015', '12/2012')
    False # not consecutive
    >>> is_cm('12/2012', '2/2013')
    False # not consecutive
    
    

    Code to split the list:

    def result(dl):
        """
        dl: dates list - an iterable of 'month/year' strings
        type: list of strings
    
        returns: list of lists of strings
        """
        # Sort list:
        s_dl = sorted(dl, key=lambda d: datetime.strptime(d, '%m/%Y'))
        if not s_dl: # nothing to group
            return []
        r_dl = [] # list to be returned
        # split list into list of lists
        t_dl = [s_dl[0]] # temp list
        for d in s_dl[1:]:
            if not is_cm(t_dl[-1], d): # check if months are not consecutive
                r_dl.append(t_dl)
                t_dl = [d]
            else:
                t_dl.append(d)
        r_dl.append(t_dl) # the last run is still pending, so append it too
        return r_dl
    
    result(L)
    

    Don't forget to include from datetime import datetime. With this technique, I believe you can easily adapt the code to a new list of dates in some other format.
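
As a rough illustration of adapting this to another format, the strptime format string can be made a parameter. The function name sort_month_strings and the '%Y-%m' sample format are assumptions of mine, not part of the answer.

```python
from datetime import datetime

def sort_month_strings(dates, fmt='%m/%Y'):
    # fmt is any strptime format describing the input strings;
    # the strings themselves are returned unchanged, only reordered.
    return sorted(dates, key=lambda d: datetime.strptime(d, fmt))

print(sort_month_strings(['10/2013', '2/2014', '1/2013']))
print(sort_month_strings(['2013-10', '2014-02', '2013-01'], fmt='%Y-%m'))
```

The same fmt argument could be threaded through the consecutive-month check so that only one place knows about the string layout.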

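    After @9000's hint I was able to simplify my sort function and removed the old answer; if you want to see the old script, check @codepad.

    【Discussion】:

    • Added a link to the script that generates the final nested list @codepad
    • If you stop thinking in terms of pairs of numbers taken from the dates, you can simplify your solution :)
    • @9000 Thanks. If you have further suggestions, please help me improve my answer.
    【Solution 4】:

    A simple solution for this specific case (not many elements) is just to iterate over all the months: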

    year = dates[0].split('/')[1]
    result = []
    current = []
    for i in range(1, 13):
        x = "%i/%s" % (i, year)
        if x in dates:
            current.append(x)
            if len(current) == 1:
                result.append(current)
        else:
            current = []
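
The loop above only looks at the twelve months of the first entry's year, so runs that cross into another year are missed. A sketch of the same "walk every month" idea extended from the earliest to the latest entry follows. The name group_by_walking is my own, and the sketch assumes the inputs are written without leading zeros, because it rebuilds the strings from numbers.

```python
def group_by_walking(dates):
    def to_index(s):
        # '1/2013' -> 2013 * 12 + 0; one step per calendar month
        month, year = s.split('/')
        return int(year) * 12 + int(month) - 1

    present = {to_index(s) for s in dates}
    if not present:
        return []

    result, current = [], []
    for idx in range(min(present), max(present) + 1):
        if idx in present:
            # Rebuild the 'month/year' string from the index.
            current.append('%d/%d' % (idx % 12 + 1, idx // 12))
        elif current:
            # A missing month closes the current run.
            result.append(current)
            current = []
    result.append(current)  # the latest month is always present, so this is non-empty
    return result

print(group_by_walking(['1/2013', '7/2013', '2/2013', '3/2013', '4/2014',
                        '12/2013', '10/2013', '11/2013', '1/2014', '2/2014']))
```

This stays cheap as long as the span between the earliest and latest month is small, which matches the "not many elements" assumption of this answer.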
    

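    【Discussion】:

      【Solution 5】:

      Well, here is one without itertools, as short as I could make it without hurting readability. The trick is to use zip. It's basically @moe's answer unrolled a bit.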

      def parseAsPair(piece):
        """Transforms things like '7/2014' into (2014, 7) """
        m, y = piece.split('/')
        return (int(y), int(m))
      
      def goesAfter(earlier, later):
        """Returns True iff later goes right after earlier."""
        earlier_y, earlier_m = earlier
        later_y, later_m = later
        if earlier_y == later_y:  # same year?
          return later_m == earlier_m + 1 # next month
        else: # next year? must be Dec -> Jan
          return later_y == earlier_y + 1 and earlier_m == 12 and later_m == 1
      
      def groupSequentially(months):
        result = []  # final result
        if months:
          sorted_months = sorted(months, key=parseAsPair)
          span = [sorted_months[0]]  # current span; has at least the first month
          for earlier, later in zip(sorted_months, sorted_months[1:]):
            if not goesAfter(parseAsPair(earlier), parseAsPair(later)):
              # current span is over
              result.append(span)
              span = []
            span.append(later)
          # last span was not appended because sequence ended without breaking
          result.append(span)
        return result
      

      Trying it out:

      months =['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
               '10/2013', '11/2013', '1/2014', '2/2014']
      
      print groupSequentially(months)  # output wrapped manually
      
      [['1/2013', '2/2013', '3/2013'], 
       ['7/2013'], 
       ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'], 
       ['4/2014']]
      

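      If we mapped parseAsPair over the list once, we could save some performance and cognitive load. Every call to parseAsPair could then be removed from groupSequentially, but we would have to convert the result back to strings.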
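
A minimal sketch of that variant, written for Python 3: parse every string once, group the (year, month) pairs, and format them back at the end. The snake_case names and the helpers format_pair and is_next_month are hypothetical, and the formatting step assumes the inputs had no leading zeros.

```python
def parse_as_pair(piece):
    """Transforms things like '7/2014' into (2014, 7)."""
    m, y = piece.split('/')
    return (int(y), int(m))

def format_pair(pair):
    """Transforms (2014, 7) back into '7/2014'."""
    y, m = pair
    return '%d/%d' % (m, y)

def is_next_month(earlier, later):
    """True iff later is the calendar month right after earlier."""
    return earlier[0] * 12 + earlier[1] + 1 == later[0] * 12 + later[1]

def group_sequentially(months):
    pairs = sorted(parse_as_pair(p) for p in months)  # parse once, up front
    spans = []
    for pair in pairs:
        if spans and is_next_month(spans[-1][-1], pair):
            spans[-1].append(pair)   # extends the current span
        else:
            spans.append([pair])     # starts a new span
    # Convert back to strings only at the very end.
    return [[format_pair(p) for p in span] for span in spans]

print(group_sequentially(['1/2013', '7/2013', '2/2013', '3/2013', '4/2014',
                          '12/2013', '10/2013', '11/2013', '1/2014', '2/2014']))
```

Tuples of (year, month) sort naturally, so no key function is needed once the strings are parsed.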

      【Discussion】:
