Credential problems when using both S3 and Redshift: answers

【Question title】: Credential problems when using both S3 and Redshift
【Posted】: 2020-01-02 20:56:47
【Question】:

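I'm running a Spark SQL program on EMR that pulls data from S3 and Redshift, joins it, and writes the result back to Redshift. I've hit a credential problem: once I query Redshift, I can no longer access S3, and the program fails with the following error: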

pyspark.sql.utils.IllegalArgumentException: u'AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).'

The code that writes to Redshift is:

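# Write the joined DataFrame back to Redshift via the spark-redshift connector.
# rs_jdbc is assumed to look like "jdbc:redshift://<host>".
(df.write
    .format("com.databricks.spark.redshift")
    .option("url", rs_jdbc + ":" + rs_port + "/" + rs_db + "?user=" + rs_username + "&password=" + rs_password)
    .option("dbtable", table)
    # S3 staging directory: Spark itself writes here, so it needs S3 credentials
    .option("tempdir", s3_temp_out)
    # The original called .mode("error") and then save(mode='append');
    # the save() argument overrides it, so the effective mode is append.
    .mode("append")
    .save())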
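Putting the username and password into the query string means a password containing `&` or `=` can corrupt the URL, and the credentials become part of the JDBC URL itself. A minimal sketch that keeps them in separate writer options instead, assuming your spark-redshift version supports the `user` and `password` options (check its README). The helper names and the example host are hypothetical:

```python
def redshift_jdbc_url(jdbc_prefix: str, port: str, db: str) -> str:
    """Join the URL pieces the same way the question does, minus credentials.

    Assumes jdbc_prefix looks like "jdbc:redshift://<host>".
    """
    return f"{jdbc_prefix}:{port}/{db}"


def redshift_write_options(url, table, tempdir, user, password):
    """Collect writer options; credentials travel as options, not in the URL."""
    return {
        "url": url,
        "dbtable": table,
        "tempdir": tempdir,
        "user": user,
        "password": password,
    }


# Usage (requires a Spark session with the spark-redshift package available):
# opts = redshift_write_options(redshift_jdbc_url(rs_jdbc, rs_port, rs_db),
#                               table, s3_temp_out, rs_username, rs_password)
# df.write.format("com.databricks.spark.redshift").options(**opts).mode("append").save()
```

`DataFrameWriter.options(**opts)` applies every key in the dict, so the call chain stays short as more options are added.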

Any help with this would be greatly appreciated.

【Comments on the question】:

Please add the access key and secret key to the Spark session:
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
Thanks, that works when I set "fs.s3.awsAccessKeyId" and "fs.s3.awsSecretAccessKey". Feel free to post it as an answer and I'll accept it.
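The fix from the comments can be wrapped in a small helper. A sketch, assuming the legacy per-scheme property names quoted in the error message (`fs.s3.*`) and in the comment (`fs.s3n.*`); `set_s3_credentials` and `DictConf` are hypothetical names, and `DictConf` only stands in for the Hadoop configuration so the example runs without Spark:

```python
class DictConf:
    """Hypothetical stand-in for Hadoop's Configuration; mimics only set/get."""

    def __init__(self):
        self._props = {}

    def set(self, key, value):
        self._props[key] = value

    def get(self, key):
        return self._props.get(key)


def set_s3_credentials(hadoop_conf, access_key, secret_key, schemes=("s3", "s3n")):
    """Set the legacy per-scheme S3 credential properties on a Hadoop config.

    The error names fs.s3.*, while the comment used fs.s3n.*; setting both
    covers whichever of the two URL schemes the job actually uses.
    """
    for scheme in schemes:
        hadoop_conf.set(f"fs.{scheme}.awsAccessKeyId", access_key)
        hadoop_conf.set(f"fs.{scheme}.awsSecretAccessKey", secret_key)


conf = DictConf()
set_s3_credentials(conf, "AKIA_EXAMPLE", "SECRET_EXAMPLE")  # placeholder values
print(conf.get("fs.s3.awsAccessKeyId"))  # AKIA_EXAMPLE
```

On a real cluster you would pass `spark.sparkContext._jsc.hadoopConfiguration()` instead of `DictConf()`. Note that `_jsc` is a private attribute, and that the s3a connector reads different properties (`fs.s3a.access.key` / `fs.s3a.secret.key`), which this helper does not set.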

Tags: amazon-s3 pyspark amazon-redshift pyspark-sql amazon-emr

【Answer 1】:

I wouldn't recommend using an access key and secret key. It's better to use the ARN of an appropriate IAM role, as described here.

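Have Redshift assume an IAM role (most secure): you can grant Redshift permission to assume an IAM role during COPY or UNLOAD operations, and then configure this library to instruct Redshift to use that role:
1. Create an IAM role granting appropriate S3 permissions to your bucket.
2. Follow the guide Authorizing Amazon Redshift to Access Other AWS Services On Your Behalf to configure this role's trust policy so that Redshift is allowed to assume the role.
3. Follow the steps in the guide Authorizing COPY and UNLOAD Operations Using IAM Roles to associate that IAM role with your Redshift cluster.
4. Set this library's aws_iam_role option to the role's ARN.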
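Steps 2 and 4 can be sketched in Python. A minimal sketch, assuming the standard IAM trust-policy shape for the `redshift.amazonaws.com` service principal and the commercial `aws` partition; `redshift_trust_policy`, `with_iam_role`, the account ID, bucket and role name are all hypothetical:

```python
import json
import re

# Simplified IAM role ARN check: "aws" partition, 12-digit account ID, optional path.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def redshift_trust_policy():
    """Trust policy letting the Redshift service assume the role (step 2)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def with_iam_role(options, role_arn):
    """Return a copy of the writer options with aws_iam_role set (step 4)."""
    if not ROLE_ARN_RE.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {**options, "aws_iam_role": role_arn}


if __name__ == "__main__":
    print(json.dumps(redshift_trust_policy(), indent=2))
    opts = with_iam_role(
        {"dbtable": "my_table", "tempdir": "s3://my-bucket/tmp/"},  # hypothetical
        "arn:aws:iam::123456789012:role/redshift-s3-access",  # hypothetical role
    )
    print(opts["aws_iam_role"])
```

The resulting dict can be passed to `df.write.format("com.databricks.spark.redshift").options(**opts)`. The role only covers Redshift's own COPY/UNLOAD access to S3; Spark still needs its own access to `tempdir` (on EMR, typically through the cluster's instance profile).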

【Comments】: