【Posted】: 2012-08-10 17:01:18
【Question】:
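I export a DynamoDB table to S3 as a backup (via EMR). During the export, I store the data as LZO-compressed files. My Hive query is below, but basically I followed "Exporting an Amazon DynamoDB table to an Amazon S3 bucket using data compression" at http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
I now want to do the reverse - load my LZO files back into a Hive table. How do you do this? I was expecting to find some Hive configuration property for input, but there isn't one. I googled and found a few hints, but nothing definitive and nothing that worked.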
The files in S3 are laid out as: s3://[mybucket]/backup/year=2012/month=08/day=01/000000.lzo
Here is the HQL that performs the export:
SET dynamodb.throughput.read.percent=1.0;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzopCodec;
CREATE EXTERNAL TABLE hiveSBackup (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMOTABLENAME}",
"dynamodb.column.mapping" = "id:id,periodStart:periodStart,allotted:allotted,remaining:remaining,created:created,seconds:seconds,served:served,modified:modified");
CREATE EXTERNAL TABLE s3_export (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
INSERT OVERWRITE TABLE s3_export
PARTITION (year="${PARTITIONYEAR}", month="${PARTITIONMONTH}", day="${PARTITIONDAY}")
SELECT * from hiveSBackup;
Any ideas on how to read these files back from S3, decompress them, and load them into a Hive table?
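For reference, a minimal sketch of what the reverse direction might look like. It is untested and rests on several assumptions: that Hadoop selects the codec for text input from the .lzo file extension as long as com.hadoop.compression.lzo.LzopCodec is registered in io.compression.codecs on the cluster (which would explain why there is no input-side SET property), that hiveSBackup is still defined as in the export script above, and that s3_import is a hypothetical table name. The partition values are simply the ones from the example path.

```sql
-- Sketch only: assumes the LZO codec is registered on the cluster and that
-- hiveSBackup (the DynamoDB-backed table) already exists as defined above.
SET dynamodb.throughput.write.percent=1.0;

-- Hypothetical table name; same schema and location as the export table.
CREATE EXTERNAL TABLE s3_import (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';

-- A new external table does not know about existing partition directories,
-- so register the one to restore (default path: .../year=2012/month=08/day=01).
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');

-- List the columns explicitly: SELECT * would also return the partition columns.
INSERT OVERWRITE TABLE hiveSBackup
SELECT id, periodStart, allotted, remaining, created, seconds, served, modified
FROM s3_import
WHERE year='2012' AND month='08' AND day='01';
```

If this assumption about codec detection holds, no explicit decompression step is needed; the partition registration is the part most likely to be missing when a query over the S3 table returns no rows.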
【Comments】:
Tags: amazon-web-services hive elastic-map-reduce emr lzo