Stopping Hive from writing temp files to S3

【Question title】: Stopping hive from writing temp files to s3
【Posted】: 2014-01-06 21:06:49
【Question】:

How can I stop Hive from writing temporary files to S3 when running an INSERT OVERWRITE TABLE query?

I found this property in hive-default.xml:

<property>
    <name>hive.exec.skips3scratch</name>
    <value>true</value>
    <description>Do not write temp files to S3 scratch space. This will
        increase the performance by avoiding multiple writes in S3, but can
        corrupt the table or partition being written to, esp. if the job
        fails.
    </description>
</property>

I set this in hive-site.xml, but Hive still seems to write temporary files to S3.

Is there something I am missing?

【Question discussion】:

I found this: community.cloudera.com/t5/Batch-SQL-Apache-Hive/… That Hive property appears to be specific to Amazon's flavor of Hive.

Tags: hadoop amazon-s3 hive cloudera

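【Solution 1】:

After reading the Cloudera forum page, here is one possible workaround:

For a Hive table whose data is defined in S3, to run 'INSERT OVERWRITE TABLE ...', create a temporary local table using 'like', write the data to local HDFS, and then use distcp to move the data to S3.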
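The steps above could be sketched roughly as the script below. It is only an illustration under some assumptions: the table names, the HDFS staging path, and the S3 bucket are hypothetical placeholders, the target table is assumed to be unpartitioned, and the S3 URI scheme depends on what your Hadoop build supports. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: stage the INSERT OVERWRITE result in HDFS, then copy it to S3.
# Every name below is a hypothetical placeholder, not from the original post.
set -euo pipefail

S3_TABLE="${S3_TABLE:-events_s3}"                 # existing table whose data lives in S3
STAGING_TABLE="${STAGING_TABLE:-events_staging}"  # temporary table stored in HDFS
SOURCE_TABLE="${SOURCE_TABLE:-events_raw}"        # table the query reads from
HDFS_DIR="${HDFS_DIR:-hdfs:///tmp/hive_staging/events_staging}"
S3_DIR="${S3_DIR:-s3a://my-bucket/warehouse/events}"  # scheme may differ on your cluster
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only

# Emit the HiveQL that creates the HDFS-backed staging table and fills it.
staging_hql() {
  cat <<EOF
DROP TABLE IF EXISTS ${STAGING_TABLE};
CREATE TABLE ${STAGING_TABLE} LIKE ${S3_TABLE} LOCATION '${HDFS_DIR}';
INSERT OVERWRITE TABLE ${STAGING_TABLE} SELECT * FROM ${SOURCE_TABLE};
EOF
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: Hive writes its scratch files and the final output to HDFS only.
run hive -e "$(staging_hql)"

# Step 2: copy the finished files from HDFS into the S3 table location.
run hadoop distcp -overwrite "${HDFS_DIR}" "${S3_DIR}"
```

Note that `-overwrite` replaces files with the same name but leaves other files already in the target directory alone. If the S3 location may hold stale files from an earlier load, you would need to clear it first or look at combining `-overwrite` with distcp's `-delete` option.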

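Note: one thing to keep in mind. The EMR version of Hive has been modified to work well with S3. Apache Hive will read data from S3, but it has problems writing to S3 (because it tries to write temporary files to S3 and then has trouble reading them back). The approach above is one way to work around this.

Source: http://community.cloudera.com/t5/Batch-SQL-Apache-Hive/hive-s3-andhive-exec-skips3-scratch/td-p/641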

【Discussion】: