【Posted】: 2021-12-11 23:26:27
【Question】:
I have a Kinesis stream that pushes data into Amazon Redshift through a Lambda function.
Currently my Lambda code looks like this:
import boto3

client = boto3.client('redshift-data')

for tx in txs:
    query = ...  # prepare an INSERT query here
    # One Data API call (and one active statement) per record
    resp = client.execute_statement(
        ClusterIdentifier=redshift_cluster_id,
        Database=redshift_db,
        DbUser=redshift_user,
        Sql=query
    )
The problem is that as soon as I try to scale up Kinesis (more shards) or Lambda (concurrent processing of a single shard), I get this:
[ERROR] ActiveStatementsExceededException: An error occurred (ActiveStatementsExceededException) when calling the ExecuteStatement operation: Active statements exceeded the allowed quota (200).
Traceback (most recent call last):
  File "/opt/python/lib/python3.8/site-packages/codeguru_profiler_agent/aws_lambda/profiler_decorator.py", line 52, in profiler_decorate
    return function(event, context)
  File "/opt/python/lib/python3.8/site-packages/codeguru_profiler_agent/aws_lambda/lambda_handler.py", line 91, in call_handler
    return handler_function(event, context)
  File "/var/task/lambda_function.py", line 71, in lambda_handler
    resp = client.execute_statement(
  File "/var/runtime/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
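From the AWS docs, I gather this means I am trying to run too many execute_statement calls in parallel.
How do I work around this? Is batching the records and inserting them all together the only way to use Redshift?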
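To make the batching idea concrete, here is a minimal sketch that folds a whole batch of records into one multi-row INSERT, so each Lambda invocation issues a single execute_statement call instead of one per record. The table name `transactions`, the columns `id` and `amount`, and both helper functions are hypothetical, and the sketch assumes every record is a dict with non-NULL values for those columns. It reduces the number of active statements but does not remove the quota.

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, parameters) for one multi-row INSERT with named parameters."""
    if not rows:
        raise ValueError("rows must not be empty")
    placeholders = []
    parameters = []
    for i, row in enumerate(rows):
        # One uniquely named parameter per cell, e.g. :r0_id, :r0_amount
        names = [f"r{i}_{col}" for col in columns]
        placeholders.append("(" + ", ".join(f":{n}" for n in names) + ")")
        for name, col in zip(names, columns):
            # The Data API takes parameter values as strings; NULLs are not handled here.
            parameters.append({"name": name, "value": str(row[col])})
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(placeholders)
    )
    return sql, parameters


def insert_batch(client, txs, cluster_id, db, user):
    """Send the whole batch in a single call.

    `client` is expected to be boto3.client('redshift-data').
    The table and column names below are placeholders for illustration.
    """
    sql, parameters = build_multirow_insert("transactions", ["id", "amount"], txs)
    return client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=db,
        DbUser=user,
        Sql=sql,
        Parameters=parameters,
    )
```

In the handler this would replace the per-record loop with something like `insert_batch(client, txs, redshift_cluster_id, redshift_db, redshift_user)`. Very large batches may still need to be split into chunks.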
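【Comments】:
-
You really can't load Redshift with inserts like this; it's too slow unless your volume is very low. You need to write batches to S3 and then use the Redshift bulk load process.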
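A rough sketch of what that suggestion could look like, assuming the bulk load is done with a Redshift COPY from a newline-delimited JSON object in S3. The function names, the table, bucket, key, and IAM role ARN are all placeholders that are not part of the original question, and the role is assumed to have read access to the bucket.

```python
import json


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def build_copy_sql(table, bucket, key, iam_role_arn):
    """Build a COPY statement that loads one S3 object into `table`."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS JSON 'auto'"
    )


def load_batch(s3_client, redshift_client, records, *, table, bucket, key,
               iam_role_arn, cluster_id, db, user):
    """Write the batch to S3, then issue one COPY through the Data API.

    `s3_client` is expected to be boto3.client('s3') and
    `redshift_client` boto3.client('redshift-data').
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=to_json_lines(records).encode("utf-8"),
    )
    # One statement per batch, regardless of how many records it holds
    return redshift_client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=db,
        DbUser=user,
        Sql=build_copy_sql(table, bucket, key, iam_role_arn),
    )
```

With this shape each Lambda invocation runs one COPY for its whole batch, so the number of active statements grows with the number of concurrent invocations rather than with the number of records.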
Tags: python amazon-web-services aws-lambda amazon-redshift amazon-kinesis