How is sorting (ORDER BY) implemented in Hive?

【Question title】: How is sorting (ORDER BY) implemented in Hive?
【Posted】: 2012-02-28 04:49:19
【Question description】:

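We know that Hive does not sample the data before a sort job starts. It simply relies on MapReduce's built-in sort and performs a merge sort on the reduce side, using only one reducer. Because that single reducer collects all of the mappers' output, what happens if, say, the machine running the reducer has only 100 GB of disk and the data is too large to fit on it?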
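To make the question's premise concrete, here is a minimal pure-Python simulation (not Hive or Hadoop code) of a sort that uses a single reducer: each hypothetical mapper sorts its own split, and one reducer does a k-way merge of all the sorted runs with `heapq.merge`. The function names `map_side_sort`, `single_reducer_merge`, and `order_by` are illustrative only. The point is that every record flows through one reducer; a real Hadoop reducer typically spills fetched map output to its local disk while merging, which is where the disk limit in the question would bite.

```python
import heapq
from typing import Iterable, List


def map_side_sort(split: Iterable[int]) -> List[int]:
    # Hypothetical mapper: sorts its own split before the shuffle,
    # mimicking MapReduce's map-side sort.
    return sorted(split)


def single_reducer_merge(sorted_runs: List[List[int]]) -> List[int]:
    # Hypothetical single reducer: k-way merge of every mapper's sorted run.
    # heapq.merge is lazy here, but the one reducer still sees ALL records.
    return list(heapq.merge(*sorted_runs))


def order_by(splits: List[List[int]]) -> List[int]:
    # Simulated global ORDER BY: sort per mapper, then merge in one reducer.
    runs = [map_side_sort(s) for s in splits]
    return single_reducer_merge(runs)


if __name__ == "__main__":
    splits = [[5, 1, 9], [3, 8], [7, 2, 6, 4]]
    print(order_by(splits))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the merge happens on one node, the total input size, not the per-mapper size, determines how much disk that node needs.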

【Question comments】:

Tags: sorting hadoop sql-order-by mapreduce hive

【Solution 1】:

Parallel sorting for Hive is still under development; see here.

A well-designed data warehouse or database application avoids this kind of global sort. If you really need one, try Pig or TeraSort (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html).
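One way a parallel total sort avoids the single-reducer bottleneck, in the spirit of TeraSort, is to sample the input, choose split points from the sample, and range-partition keys across several reducers so that concatenating their sorted outputs gives a global order. The sketch below is a pure-Python simulation under that assumption; `choose_split_points`, `partition`, and `total_order_sort` are hypothetical names, not Hadoop's API (Hadoop's own building blocks for this are classes such as `TotalOrderPartitioner` and `InputSampler`, which are not used here).

```python
import bisect
import random
from typing import List


def choose_split_points(keys: List[int], num_reducers: int,
                        sample_size: int, seed: int = 0) -> List[int]:
    # Assumption (TeraSort-like): sample the keys up front, sort the sample,
    # and take num_reducers - 1 evenly spaced cut points from it.
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def partition(key: int, split_points: List[int]) -> int:
    # Range partitioner: reducer i receives keys in
    # [split_points[i-1], split_points[i]).
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys: List[int], num_reducers: int,
                     sample_size: int = 100) -> List[int]:
    if not keys:
        return []
    split_points = choose_split_points(keys, num_reducers, sample_size)
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[partition(k, split_points)].append(k)
    # Each simulated reducer sorts only its own key range. The ranges are
    # disjoint and ordered, so concatenating the outputs is globally sorted.
    result: List[int] = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


if __name__ == "__main__":
    data = random.Random(42).sample(range(10_000), 1_000)
    out = total_order_sort(data, num_reducers=4)
    print(out == sorted(data))  # True
```

The sample only affects how evenly work is spread across reducers, not correctness: a poor sample produces skewed buckets, but the concatenated output is still sorted.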

【Discussion】: