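[Posted]: 2021-11-27 23:44:46
[Question]:
Below is a commonly shared function that iterates over all objects in a bucket. But what if I only want to iterate over the objects under a specific key? Say the S3 URI is: s3://test-data-lake/test1/test2/
Under test2 there are five JSON files, e.g. s3://test-data-lake/test1/test2/test1.json..
How can I change this code to handle the case above?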
import boto3


def iterate_bucket_items(bucket):
    """
    Generator that iterates over all objects in a given s3 bucket

    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    for return data format
    :param bucket: name of s3 bucket
    :return: dict of metadata for an object
    """
    client = boto3.client('s3')
    # The paginator issues follow-up requests for buckets with more than one page of keys
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)

    for page in page_iterator:
        # 'Contents' is only present when the page holds at least one key
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


for i in iterate_bucket_items(bucket='my_bucket'):
    print(i)
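One possible way to restrict the generator to a key prefix is to pass `Prefix` to the paginator, which `list_objects_v2` accepts alongside `Bucket`. The following is only a sketch: the function name `iterate_prefix_items` and the optional `client` argument are hypothetical additions (the latter exists so a stub client can be supplied), and the bucket and prefix are taken from the S3 URI in the question.

```python
def iterate_prefix_items(bucket, prefix, client=None):
    """
    Yield the metadata dict of every object in `bucket` whose key starts
    with `prefix`. `client` is a hypothetical optional argument; when it
    is omitted, a default boto3 S3 client is created.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed instead
        client = boto3.client("s3")

    paginator = client.get_paginator("list_objects_v2")
    # Prefix limits the listing to keys that begin with the given string
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page with no matching keys
        for item in page.get("Contents", []):
            yield item
```

For the URI in the question this would be called as `iterate_prefix_items("test-data-lake", "test1/test2/")`, printing `item["Key"]` for each result. Note that the prefix carries no leading slash and that S3 matches it as a plain string, so the trailing slash matters if sibling keys such as `test1/test20/` exist.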
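[Comments]:
- To avoid handling pagination yourself, you can use the Bucket Resource interface instead of the Client interface. For example: objects = s3.Bucket('mybucket').objects.filter(Prefix='test1/test2/')
- The below seems to work, thank you!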
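The Resource-based suggestion in the comments could be sketched roughly as follows. The function name `iterate_json_keys`, the optional `s3` argument, and the `.json` suffix check are hypothetical additions for illustration; the suffix check is only there because the question mentions JSON files under the prefix.

```python
def iterate_json_keys(bucket, prefix, s3=None):
    """
    Yield the key of every .json object under `prefix` in `bucket`,
    using the boto3 Resource interface. `s3` is a hypothetical optional
    argument; when omitted, a default S3 resource is created.
    """
    if s3 is None:
        import boto3  # imported lazily so a stub resource can be passed instead
        s3 = boto3.resource("s3")

    # objects.filter() returns a lazy collection; boto3 requests further
    # pages behind the scenes as the loop advances
    for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix):
        # Skip anything that is not a JSON file, e.g. a "folder" placeholder key
        if obj.key.endswith(".json"):
            yield obj.key
```

With the URI from the question, `list(iterate_json_keys("test-data-lake", "test1/test2/"))` would be expected to return the keys of the JSON files under that prefix, assuming valid AWS credentials and read access to the bucket.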
Tags: python python-3.x amazon-web-services amazon-s3 boto3