【Question title】: Detecting if for-loop item is the last when yielding items?
【Posted】: 2017-02-15 11:36:49
【Question description】:

I am working with a huge PostgreSQL database, for which I created a "fetch" function.

def fetch(cursor, batch_size=1000):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        records = cursor.fetchmany(int(batch_size))
        if not records:
            break
        yield from records

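For each item I do some processing, but I currently have a problem: in some cases the last item is omitted, because I am doing some comparison between the items. When the last item has nothing left to be compared against, nothing is done with it.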

import psycopg2

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = None  # datetime of the first row; None until a row is seen

for row in fetch(cursor):
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today is None:
        # do something with first row
        temp_today = date
    # -----------------------------------------
    # I feel like I am missing a statement here
    # something like:
    # if row == rows[-1]:
    #     do something with last row..
    # -----------------------------------------
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass

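When I am using yield, how can I handle the last item?

Using yield is very important to me, because I am processing a very large dataset and would run out of memory otherwise.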

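【Comments】:

  • It should be possible to get the number of rows in the result set from the cursor, right? Then you can compare a counter (enumerate) against that number.
  • "... as I am doing some comparison between the items": you could do this in the database (by using window functions, or through some self-joins).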
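
The counter idea from the first comment could be sketched roughly as follows. The helper name `flag_last_by_count` is made up for illustration, and the sketch assumes the total row count is known up front. With a real cursor that total would come from `cursor.rowcount`, which the DB-API allows to be -1 when the driver cannot determine it, so this only works when the driver actually reports the count.

```python
def flag_last_by_count(rows, total):
    """Yield (row, is_last) pairs, using a row count known in advance."""
    for index, row in enumerate(rows, start=1):
        # The row whose 1-based position equals the total is the last one.
        yield row, index == total


# Stand-in for fetch(cursor); with a real cursor, total would be cursor.rowcount.
rows = iter(["a", "b", "c"])
flags = list(flag_last_by_count(rows, total=3))
```

Only the current row is held at any time, so the memory benefit of the generator is kept.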

Tags: python postgresql python-3.x iterator yield


【Solution 1】:

You can define another generator, so that you can iterate over each returned item together with the previous one (if any):

def pair(sequence):
    """Yield (item, previous) pairs; previous is None for the first item."""
    previous = None
    for item in sequence:
        yield (item, previous)
        previous = item

for item, previous_item in pair(mygenerator(args)):
    if previous_item is None:
        # process item: first one returned
        pass
    else:
        # you can compare item and previous_item
        pass
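
The same wrapping idea can be turned around to look one item ahead, which tells the loop directly whether the current item is the last one. This is only a sketch; the name `with_last_flag` is hypothetical and not part of the answer above.

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs while buffering only a single item."""
    iterator = iter(iterable)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in iterator:
        # Another item exists, so `current` cannot be the last one.
        yield current, False
        current = upcoming
    # The iterator is exhausted, so `current` is the final item.
    yield current, True


flagged = list(with_last_flag(n for n in range(3)))
```

In the question's loop this would be used as `for row, is_last in with_last_flag(fetch(cursor)):`, without needing to know the row count beforehand.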

【Comments】:

    【Solution 2】:

    Thanks to @Peter Smit in the comments, I used the following solution:

    import psycopg2

    connection = psycopg2.connect(<url>)
    cursor = connection.cursor()
    
    cursor.execute(<some query>)
    
    temp_today = None  # datetime of the first row; None until a row is seen
    parsed_count = 0
    cursor_count = cursor.rowcount  # number of rows in the result set
    
    for row in fetch(cursor):
        parsed_count += 1  # count the rows seen so far
        item = extract_variables(row)
        date = item['datetime']
        today = date.date()
        if temp_today is None:
            # do something with first row
            temp_today = date
        elif parsed_count == cursor_count:
            # do something with the last row
            pass
        elif temp_today.date() == today:
            # do something with every row where
            # the date is the same
            pass
        else:
            # do something with every row where
            # the dates ain't the same
            pass
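
Since the branches above only compare each row's date with the previous row's date, a different technique may avoid the last-row special case altogether: grouping consecutive rows by day with `itertools.groupby`. This is a sketch of an alternative, not the approach used in this answer. It assumes the rows arrive ordered by date and that each item is a dict with a `'datetime'` key, as `extract_variables` appears to return; `rows_by_day` is a hypothetical helper name.

```python
from datetime import datetime
from itertools import groupby


def rows_by_day(items):
    """Yield (day, rows) for each run of consecutive items sharing a date."""
    for day, group in groupby(items, key=lambda item: item["datetime"].date()):
        # Only the rows of the current day are materialized at once.
        yield day, list(group)


# Stand-in data; in the question this would be
# (extract_variables(row) for row in fetch(cursor)).
sample = [
    {"datetime": datetime(2017, 2, 14, 9, 0)},
    {"datetime": datetime(2017, 2, 14, 17, 30)},
    {"datetime": datetime(2017, 2, 15, 8, 15)},
]
summary = [(day.isoformat(), len(rows)) for day, rows in rows_by_day(sample)]
```

The final group is yielded like any other, so the last row is never skipped. The trade-off is that one day's worth of rows is held in memory at a time.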
    

    【Comments】:
