Joining tables before or after UNION ALL? Answers

【Question title】: Joining tables before or after union all?
【Posted】: 2017-06-18 10:34:44
【Question】:

I have 3 tables. Two of them have the same structure.

table1/table2
col1 string
col2 string
col3 integer

table3
51 columns. strings, ints, doubles, dates

I'm curious which of these would be faster.

with s1 as(
  Select *
  from table1
  union all
  Select *
  from table2
)
select *
from s1
inner join table3 t3
on s1.col1 = t3.col4

or

with s1 as(
  Select *
  from table1 t1
  inner join table3 t3
  on t1.col1 = t3.col4
),s2 as(
  Select *
  from table2 t2
  inner join table3 t3
  on t2.col1 = t3.col4
)
Select *
from s1
union all
Select *
from s2

The tables have no partitions or indexes. I'd like to know how this behaves in both
Hive and Oracle.
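Before comparing speed, it may be worth confirming that the two forms really return the same rows. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in (not Hive or Oracle) and a hypothetical cut-down `table3` with only `col4` and `col5`; the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical miniature schema: table3 is reduced to two columns,
# and all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 INTEGER);
    CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 INTEGER);
    CREATE TABLE table3 (col4 TEXT, col5 REAL);
    INSERT INTO table1 VALUES ('a', 'x', 1), ('b', 'y', 2);
    INSERT INTO table2 VALUES ('a', 'z', 3), ('c', 'w', 4);
    INSERT INTO table3 VALUES ('a', 1.5), ('a', 2.5), ('b', 3.5);
""")

UNION_THEN_JOIN = """
    WITH s1 AS (
        SELECT * FROM table1
        UNION ALL
        SELECT * FROM table2
    )
    SELECT * FROM s1
    INNER JOIN table3 t3 ON s1.col1 = t3.col4
"""

JOIN_THEN_UNION = """
    WITH s1 AS (
        SELECT * FROM table1 t1
        INNER JOIN table3 t3 ON t1.col1 = t3.col4
    ), s2 AS (
        SELECT * FROM table2 t2
        INNER JOIN table3 t3 ON t2.col1 = t3.col4
    )
    SELECT * FROM s1
    UNION ALL
    SELECT * FROM s2
"""

# Sort both results so the comparison ignores row order.
rows_a = sorted(conn.execute(UNION_THEN_JOIN).fetchall())
rows_b = sorted(conn.execute(JOIN_THEN_UNION).fetchall())
print(len(rows_a), rows_a == rows_b)
```

This only checks that the results match on toy data; it says nothing about which form is faster on a real cluster.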

Edit 02.02.2017: I tried to check it in Hive. Both queries were started at almost the same time.

union before join
Time taken: 539.593 seconds

join before union (jbu)
Time taken: 603.071 seconds

Unfortunately, I decided to check the result again a few hours later:

jbu
Time taken: 308.205 seconds

The results vary depending on how busy the cluster is ((
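Since a single run on a shared cluster is this noisy, one option is to repeat both variants several times, alternate between them, and compare medians. Below is a rough sketch of such a harness; `interleaved_medians` is a hypothetical helper, and the two lambdas are stand-in workloads rather than real query submissions.

```python
import statistics
import time

def interleaved_medians(runners, rounds=5):
    """Time each zero-argument callable `rounds` times, alternating between
    them so that drifting background load hits every candidate roughly alike."""
    samples = {name: [] for name in runners}
    for _ in range(rounds):
        for name, run in runners.items():
            start = time.perf_counter()
            run()
            samples[name].append(time.perf_counter() - start)
    return {name: statistics.median(times) for name, times in samples.items()}

# Stand-in workloads (assumption): in practice each callable would submit
# one of the two query forms and wait for it to finish.
medians = interleaved_medians({
    "union before join": lambda: sum(range(10_000)),
    "join before union": lambda: sum(range(10_000)),
})
for name, seconds in medians.items():
    print(f"{name}: {seconds:.6f} s")
```

Running the variants one after another, rather than at the same time, also keeps them from competing with each other for cluster resources.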

【Comments】:

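I think it would be better to ask this on dba.stackexchange.
The simple answer is to test it, and make sure you clear the buffer cache between tests. When you say "not indexed", do you mean there isn't even an index supporting a primary key? (Assuming you have a PK, that is.)
@Brite I can't test it in Oracle, because the data is in HDFS. There are no PKs in Hive (let's assume table1/table2 have no PK in Oracle either). I'll test it in Hive tomorrow, but that means a whole day before I find out =P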

Tags: sql oracle join hive union

【Solution 1】:

The first one is definitely faster.

with s1 as(
  Select *
  from table1
  union all
  Select *
  from table2
)
select *
from s1
inner join table3 t3
on s1.col1 = t3.col4

【Comments】:

Would you care to explain why? As it stands, this looks like a scribble rather than an answer.

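【Solution 2】:

The only thing I can see is that the second query scans table3 twice, once for each join. But without any execution plan information, this is just a guess. As others have said, why not test it and let us know, or perhaps share the execution plans!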
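To replace the guess with evidence, the plans for both forms can be captured and compared side by side. Hive has an `EXPLAIN` statement and Oracle has `EXPLAIN PLAN FOR` together with `DBMS_XPLAN.DISPLAY`. As a self-contained illustration of the workflow only, here is a sketch that uses SQLite's `EXPLAIN QUERY PLAN` on empty stand-in tables, again assuming a hypothetical two-column `table3`. SQLite's optimizer is not Hive's or Oracle's, so its plans do not predict theirs.

```python
import sqlite3

# Empty stand-in tables; table3 is a hypothetical two-column reduction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 INTEGER);
    CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 INTEGER);
    CREATE TABLE table3 (col4 TEXT, col5 REAL);
""")

QUERIES = {
    "union before join": """
        WITH s1 AS (
            SELECT * FROM table1
            UNION ALL
            SELECT * FROM table2
        )
        SELECT * FROM s1
        INNER JOIN table3 t3 ON s1.col1 = t3.col4
    """,
    "join before union": """
        WITH s1 AS (
            SELECT * FROM table1 t1
            INNER JOIN table3 t3 ON t1.col1 = t3.col4
        ), s2 AS (
            SELECT * FROM table2 t2
            INNER JOIN table3 t3 ON t2.col1 = t3.col4
        )
        SELECT * FROM s1
        UNION ALL
        SELECT * FROM s2
    """,
}

def plan_details(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plans = {name: plan_details(sql) for name, sql in QUERIES.items()}
for name, steps in plans.items():
    print(name)
    for step in steps:
        print("   ", step)
```

The exact wording of the plan steps depends on the SQLite version, so the output is best read by eye rather than parsed.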

【Comments】: