【Question Title】: Iterate through S3 objects under a specific key, rather than all of the keys in a bucket
【Posted】: 2021-11-27 23:44:46
【Question】:

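Below is a commonly shared function for iterating over all objects in a bucket. But what if I only want to iterate over the objects under a specific key? For example, suppose the S3 URI is: s3://test-data-lake/test1/test2/

There are five JSON files under test2, e.g. s3://test-data-lake/test1/test2/test1.json..

How can I change this code to handle the case above?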

import boto3


def iterate_bucket_items(bucket):
    """
    Generator that iterates over all objects in a given s3 bucket

    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2 
    for return data format
    :param bucket: name of s3 bucket
    :return: dict of metadata for an object
    """


    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)

    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


for i in iterate_bucket_items(bucket='my_bucket'):
    print(i)

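【Comments】:

  • To avoid the need for pagination, you can use the Bucket resource interface instead of the client interface. For example: objects = s3.Bucket('mybucket').objects.filter(Prefix='test1/test2/')
  • The answer below seems to work, thank you!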
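
The resource-based approach from the first comment could look roughly like the sketch below. The helper name iter_keys_with_resource is hypothetical, and the resource object is passed in as an argument (an assumption made here so the helper can be exercised without AWS credentials). It relies on the boto3 resource collection handling pagination internally.

```python
def iter_keys_with_resource(s3_resource, bucket_name, prefix=""):
    """Yield the key of every object in the bucket whose key starts with prefix.

    s3_resource is expected to be the object returned by boto3.resource('s3').
    The objects collection fetches further pages on demand, so no explicit
    paginator is needed here.
    """
    bucket = s3_resource.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        yield obj.key


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   s3 = boto3.resource("s3")
#   for key in iter_keys_with_resource(s3, "my_bucket", "test1/test2/"):
#       print(key)
```

Each item in the collection is an object summary rather than the plain dict that list_objects_v2 returns, so fields are read as attributes (obj.key) instead of dictionary entries (item['Key']).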

Tags: python python-3.x amazon-web-services amazon-s3 boto3


【Solution 1】:

You can use the Prefix parameter:

import boto3


def iterate_bucket_items(bucket, prefix=''):
    """
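    Generator that iterates over all objects in a given s3 bucket
    whose keys start with the given prefix

    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2 
    for return data format
    :param bucket: name of s3 bucket
    :param prefix: key prefix to filter on, e.g. 'test1/test2/' (default '' lists the whole bucket)
    :return: dict of metadata for an object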
    """


    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


for i in iterate_bucket_items(bucket='my_bucket', prefix='test1/test2/'):
    print(i)
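
Since the question starts from a full S3 URI rather than a separate bucket and prefix, a small helper can split one into the other. This is only a sketch: split_s3_uri is a hypothetical name, not part of boto3, and it assumes the key contains no '?' or '#' characters, which urlparse would treat as query or fragment separators.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: %r" % uri)
    # S3 keys have no leading slash, so drop the one that urlparse keeps in the path
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://test-data-lake/test1/test2/")
print(bucket, prefix)

# The two values can then be passed to the generator above:
#   for item in iterate_bucket_items(bucket=bucket, prefix=prefix):
#       print(item['Key'])
```

Keeping the trailing slash in the prefix matters: Prefix is a plain string match on the start of the key, so 'test1/test2' without the slash would also match keys such as 'test1/test20/...'.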
