原文:presto-join-enumeration

Query improvements

Incorporating join enumeration into Presto means that your queries can automatically run faster without manual adjustments. Such manual adjustments are often not possible at all because they are not expressible in SQL language (e.g: join distribution type selection).

Prerequisite :data source needs to have statistics collected (currently the Hive connector is the only connector that provides statistics). presto cbo statistics

Case 1: Broadcast vs distributed

Query: SELECT * FROM large_table l, small_table s WHERE l.key = s.key

The example below :

  • shows how our algorithm selects join operator type (distributed vs broadcast) based on table sizes and cost model.

Presto join enumeration笔记

Case 2: Join sides reordering

Query: SELECT * FROM small_table s, large_table l WHERE s.key = l.key

This example displays :

  • how the algorithm flips join sides so that the table with smaller memory footprint is kept in distributed join operator.

Presto join enumeration笔记

Case 3: Join tree reordering

Query: SELECT * FROM table_a a, table_b b, table_c c WHERE a.key=b.key AND a.key=c.key

Presto join enumeration笔记

  •  In this case join enumeration algorithm will reorder the entire join tree to avoid large and costly intermediate result.
  •  It also shows that our algorithm can intelligently derive new predicates (“b.key=c.key“) so that new joins can be explored.

Join enumeration

Join enumeration is the process of enumerating and evaluating different join orders with the goal of finding an optimal execution plan. To do this efficiently, Presto join enumerator utilizes dynamic programming and divide-and-conquer technique. Such algorithms split larger problems into smaller ones. For instance, they assume that the best way to join tables (a,b,c) with table d involves finding the best join order for tables (a,b,c) first. This assumption allows to reuse join enumeration result for (a,b,c) in all join enumerations that have (a,b,c) as a subcomponent, e.g:(a,b,c,d) or (a,b,c,e).

Presto join enumeration笔记

Top-down approach

Dynamic programming is usually implemented using a top-down (recursive) or bottom-up (iterative) approach. Presto join enumerator uses the former technique:

  1. In the top-down approach relation sets (i.e. tables) are partitioned into two partitions.
  2. For each partition the best join ordering is found recursively.
  3. Then the partitioning with the lowest total cost is selected as a global result.
  4. Memoization is used in order to reduce the search time.

Search space

Most optimizers (Presto included) skip cross-joins during join enumeration. It’s a good heuristic that drastically reduces the search space. Cross-joins are usually not part of an optimal join ordering and enumerating them greatly slows down query optimization process.

Some optimizers only consider left-deep join trees. This reduces search space, but could lead to suboptimal plans. We have found that the cost of exploring bushy-tree joins for a relatively small number of relations does not bring a significant overhead. Therefore, in order to to find the best plan Presto join enumerator explores both left-deep and bushy tree joins.

Presto join enumeration笔记

补充(TODO 找时间学习)

Commonly used classes of join trees:

  • left-deep tree
  • right-deep tree
  • zigzag tree
  • bushy tree 

Presto algorithm design

Presto join enumeration works in the following stages:

1)

  • First, join nodes that can be reordered are collected into a special multi-join node. 
  • Multi-join node contains aggregated information about reorderable joins. This includes source relations and predicates from all joins. It’s worth noting that for result correctness, not all joins are freely reorderable so are not collected into the multi-join node. We won’t go into those details today.

Presto join enumeration笔记

2a) Then, the best join ordering is found recursively. The procedure is as follows:

  •  For a given relation set RS find all partitionings into two partitions. We only consider partitionings where there is a join predicate between the relations (i.e. no cross joins)..
  • For each partition the best join ordering is then found recursively.

Presto join enumeration笔记

2b) For each partitioning construct a join J between the two partitions. Consider different physical properties of a join (e.g. distributed vs. broadcast). Choose the properties with the lowest cost.

Presto join enumeration笔记

2c) Choose a partitioning with the lowest total cost. Corresponding join J is then memorized along with a total cost and a set of relations that are joined together, e.g: (a,b,c). Every time join enumeration will try to resolve best join ordering for (a,b,c) it can reuse an already computed result.

Presto join enumeration笔记

An interesting feature of our algorithm is that we derive equality predicates for joins using a so called equality inference algorithm. The input of the algorithm are predicates of all reorderable joins. The algorithm can deduce new predicates for relations that didn’t have predicates between them initially. More on that later.

相关文章: