【Question title】: DynamoDB backup via AWS Data Pipeline and EMR
【Posted】: 2016-09-29 07:02:56
【Question】:

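We are trying to back up a DynamoDB table to S3 using AWS Data Pipeline. We use the default template provided by AWS (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html). However, the job always fails with the error below, and changing the EMR version does not change the error message.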

Does anyone know what could cause this error?

31 May 2016 09:57:10,013 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@523f31f2
31 May 2016 09:57:10,086 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
31 May 2016 09:57:10,093 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,094 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,226 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-2TUYGWQ1PYAHC' is ready.
31 May 2016 09:57:10,320 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-2TUYGWQ1PYAHC,Steps: [{Name: df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-emr-eu-west-1/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDbExport, s3://my-db-backup.dev01.rule//2016-05-30-12-58-18, my-db.dev01.rule, 0.25]}}]}'
31 May 2016 09:58:10,506 [WARN] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18' with jobFlowId 'j-2TUYGWQ1PYAHC' is in status 'WAITING' because of the step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' failures 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' with jobFlowId 'j-2TUYGWQ1PYAHC' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' is in status 'FAILED' with reason 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI 2.4.8
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :Exception in thread "main" java.lang.NoClassDefFoundError: com/amazon/ws/emr/core/InstanceInfo
    at org.apache.hadoop.dynamodb.DynamoDBUtil.getDynamoDBEndpoint(DynamoDBUtil.java:268)
    at org.apache.hadoop.dynamodb.DynamoDBClient.initConfigurations(DynamoDBClient.java:369)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:88)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:83)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.setTableProperties(DynamoDbExport.java:93)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.run(DynamoDbExport.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.main(DynamoDbExport.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.ClassNotFoundException: com.amazon.ws.emr.core.InstanceInfo
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel 2.4.8
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 1.0.3
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://my-db.dev01.rule-logs/df-09387105FF7URCW5QOR/EmrClusterForBackup/@EmrClusterForBackup_2016-05-30T12:58:18/@EmrClusterForBackup_2016-05-30T12:58:18_Attempt=2/j-2TUYGWQ1PYAHC/steps
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-09387105FF7URCW5QOR/TableBackupActivity/@TableBackupActivity_2016-05-30T12:58:18/@TableBackupActivity_2016-05-30T12:58:18_Attempt=4/hadoop.jobs.log
31 May 2016 09:58:10,522 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableBackupActivity_2016-05-30T12:58:18_Attempt=4
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 1:0 to complete
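The key part of the trace is a `NoClassDefFoundError` caused by a `ClassNotFoundException` for `com.amazon.ws.emr.core.InstanceInfo`. The connector code (`DynamoDBUtil.getDynamoDBEndpoint`) references that class, but nothing on the step's runtime classpath provides it. The following is a minimal diagnostic sketch for checking on the cluster whether a given class can be loaded. The class name `ClasspathProbe` is hypothetical and not part of any AWS tooling.

```java
// Hypothetical diagnostic: reports whether each named class can be loaded.
// A NoClassDefFoundError caused by ClassNotFoundException means the class was
// available when the calling code was compiled but is missing at runtime.
public class ClasspathProbe {

    /** Returns true if the class can be found; does not run its static initializers. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace above.
        String[] names = args.length > 0
                ? args
                : new String[] {"com.amazon.ws.emr.core.InstanceInfo"};
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT on classpath"));
        }
    }
}
```

On the master node, one option is to run it with the Hadoop classpath plus the connector jar, e.g. `java -cp "$(hadoop classpath):emr-ddb-2.1.0.jar:." ClasspathProbe`. This assumes the jar has been copied locally from the S3 path shown in the log.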

【Question comments】:

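  • It looks like the EMR job is missing a dependency. Since Data Pipeline is a managed service, there is not much you can do about it yourself. Contact AWS Support.
  • Which EMR version are you using?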

Tags: amazon-web-services amazon-dynamodb emr amazon-data-pipeline


【Solution 1】:

You are probably using EMR 4.x. I suggest trying AMI 3.8.0 instead. Let us know if you still run into problems.
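In a pipeline definition, the EMR version is set on the `EmrCluster` object. Use `amiVersion` for AMI 3.x and earlier, and `releaseLabel` (e.g. `emr-4.7.0`) for EMR 4.x and later. The sketch below only illustrates which field to set; the class and method names are hypothetical. Note that the log in the question reports `AMI/ReleaseLabel 2.4.8` for the failing cluster, so it is worth confirming which value your pipeline actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the version-related fields of an EmrCluster object.
// "amiVersion" is for AMI 3.x and earlier; "releaseLabel" is for EMR 4.x and later.
public class EmrClusterVersion {

    static Map<String, String> clusterFields(String id, String version) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("type", "EmrCluster");
        // Release labels start with "emr-"; plain version numbers are treated as AMI versions.
        fields.put(version.startsWith("emr-") ? "releaseLabel" : "amiVersion", version);
        return fields;
    }

    // Renders a flat JSON object; assumes keys and values contain no quotes or backslashes.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append('"').append(e.getKey()).append("\": \"").append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Prints {"id": "EmrClusterForBackup", "type": "EmrCluster", "amiVersion": "3.8.0"}
        System.out.println(toJson(clusterFields("EmrClusterForBackup", "3.8.0")));
    }
}
```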

【Comments】:

【Solution 2】:

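I have a question: are you running the pipeline from the web console or programmatically? I ask because you should check that every field is filled in correctly. You may have left out the region, in which case the code cannot find a method signature that accepts an empty parameter. The region should be a String (e.g. eu-west-1).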
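The idea behind this answer is that the connector only needs to discover its location from the EMR instance (the step that, per the stack trace, goes through `DynamoDBUtil.getDynamoDBEndpoint` and needs `InstanceInfo`) when no region or endpoint has been supplied. The sketch below is a hypothetical simplification of that fallback order, not the connector's actual code. The configuration key names `dynamodb.endpoint` and `dynamodb.region` are assumptions here.

```java
import java.util.Map;

// Hypothetical simplification of DynamoDB endpoint resolution (NOT the connector's code).
// The key names "dynamodb.endpoint" and "dynamodb.region" are assumptions.
public class EndpointResolution {

    static String resolveEndpoint(Map<String, String> conf) {
        String endpoint = conf.get("dynamodb.endpoint");
        if (endpoint != null && !endpoint.isEmpty()) {
            return endpoint; // an explicitly configured endpoint wins
        }
        String region = conf.get("dynamodb.region");
        if (region != null && !region.isEmpty()) {
            // Derive the endpoint from a region string such as "eu-west-1".
            return "dynamodb." + region + ".amazonaws.com";
        }
        // Neither is set: per this answer's hypothesis, this is where the real code would
        // fall back to instance metadata (the EMR-only class in the stack trace).
        throw new IllegalStateException("neither dynamodb.endpoint nor dynamodb.region is set");
    }

    public static void main(String[] args) {
        // Prints dynamodb.eu-west-1.amazonaws.com
        System.out.println(resolveEndpoint(Map.of("dynamodb.region", "eu-west-1")));
    }
}
```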

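Using https://github.com/awslabs/emr-dynamodb-connector/blob/master/emr-dynamodb-tools/src/main/java/org/apache/hadoop/dynamodb/tools/DynamoDBExport.java, you can trace your code path. Keep in mind that this class may be outdated, so the line numbers may not match the stack trace. Still, it gives you a rough idea of what happens there.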

【Comments】:
