Calcite Heuristic Optimizer (HepPlanner): How It Works and How to Implement Custom Optimization Rules

HepPlanner

HepPlanner is Calcite's rule-based optimizer (the RBO model): it rewrites the relational algebra purely by applying rules.

Intuition 🤔

The AST is converted into a RelNode, which is really a tree structure, and the execution plan is then generated from that RelNode tree. Intuitively, speeding up a query means optimizing how this tree is executed, and the most direct idea is to rewrite the tree according to rules. Each rule has to state which shape of subtree it targets. For example, a subtree made of two consecutive Filters can be optimized by merging them into a single Filter.

Workflow

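At a high level, HepPlanner works in three steps: setRoot converts the RelNode tree into a directed acyclic graph of HepRelVertex nodes, findBestExp walks that graph and applies the rules registered in the HepProgram until a fixed point (or the match limit) is reached, and the final plan is then rebuilt from the graph. A minimal driver sketch, assuming a recent Calcite version where CoreRules.FILTER_MERGE is the pre-built FilterMergeRule instance (relNode stands for the plan to optimize):

HepProgram program = new HepProgramBuilder()
        .addRuleInstance(CoreRules.FILTER_MERGE)   // the rules to apply
        .build();
HepPlanner hepPlanner = new HepPlanner(program);
hepPlanner.setRoot(relNode);                       // builds the HepRelVertex graph
RelNode optimized = hepPlanner.findBestExp();      // applies the rules, returns the rewritten plan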

Rules

Rule definition

Take the FilterMergeRule rule as an example; it merges multiple consecutive Filters into one.


Defining the match pattern

As the code below shows, the match pattern is a filterClass whose single input is another filterClass, i.e. two consecutive filters stacked on top of each other.

Implementing the transformation

The transformation pushes the bottom Filter's input onto the RelBuilder stack and then builds a single Filter whose condition combines (ANDs) the bottom and top Filter conditions.

@Override public void onMatch(RelOptRuleCall call) {
    final Filter topFilter = call.rel(0);
    final Filter bottomFilter = call.rel(1);

    // rewrite the plan: one filter over the bottom filter's input, carrying both conditions
    final RelBuilder relBuilder = call.builder();
    relBuilder.push(bottomFilter.getInput())
        .filter(bottomFilter.getCondition(), topFilter.getCondition());

    call.transformTo(relBuilder.build());
  }

  /** Rule configuration. */
  @Value.Immutable
  public interface Config extends RelRule.Config {
    Config DEFAULT = ImmutableFilterMergeRule.Config.of()
        .withOperandFor(Filter.class);

    @Override default FilterMergeRule toRule() {
      return new FilterMergeRule(this);
    }

    /** Defines an operand tree for the given classes. */
    default Config withOperandFor(Class<? extends Filter> filterClass) {
      // match pattern: a filterClass whose single input is another filterClass
      return withOperandSupplier(b0 ->
          b0.operand(filterClass).oneInput(b1 ->
              b1.operand(filterClass).anyInputs()))
          .as(Config.class);
    }
  }
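
The Config thus carries both the operand pattern (withOperandFor) and the factory method (toRule()). To use the rule you instantiate it from its Config; a small sketch (in recent Calcite versions the pre-built CoreRules.FILTER_MERGE constant is exactly this instance):

// build the rule instance from its configuration and hand it to a planner
FilterMergeRule rule = FilterMergeRule.Config.DEFAULT.toRule();
HepProgram program = new HepProgramBuilder().addRuleInstance(rule).build();

A custom rule follows the same pattern: extend RelRule, describe the operand tree in a Config, and implement onMatch.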

Core code analysis

Building the directed acyclic graph

setRoot converts the RelNode tree into a graph: each RelNode is wrapped in a HepRelVertex, and every rel's inputs are replaced by the corresponding child vertices.

 private HepRelVertex addRelToGraph(
      RelNode rel) {
    // Check if a transformation already produced a reference
    // to an existing vertex.
    if (graph.vertexSet().contains(rel)) {
      return (HepRelVertex) rel;
    }

    // Recursively add children, replacing this rel's inputs
    // with corresponding child vertices.
    final List<RelNode> inputs = rel.getInputs();
    final List<RelNode> newInputs = new ArrayList<>();
    for (RelNode input1 : inputs) {
      HepRelVertex childVertex = addRelToGraph(input1);
      newInputs.add(childVertex);
    }

    // remainder omitted
  }

findBestExp

findBestExp executes the HepProgram's instructions and, for rule instructions, ends up in applyRules, which drives the match loop shown below.

 private void applyRules(HepProgram.State programState,
      Collection<RelOptRule> rules, boolean forceConversions) {
    final HepInstruction.EndGroup.State group = programState.group;
    if (group != null) {
      checkArgument(group.collecting);
      Set<RelOptRule> ruleSet = requireNonNull(group.ruleSet, "group.ruleSet");
      ruleSet.addAll(rules);
      return;
    }

    LOGGER.trace("Applying rule set {}", rules);

    final boolean fullRestartAfterTransformation =
        programState.matchOrder != HepMatchOrder.ARBITRARY
            && programState.matchOrder != HepMatchOrder.DEPTH_FIRST;

    int nMatches = 0;

    boolean fixedPoint;
    do {
      // iterate over every vertex in the graph
      Iterator<HepRelVertex> iter =
          getGraphIterator(programState, requireNonNull(root, "root"));
      fixedPoint = true;
      while (iter.hasNext()) {
        HepRelVertex vertex = iter.next();
        for (RelOptRule rule : rules) {
          // try each rule on the current vertex
          HepRelVertex newVertex =
              applyRule(rule, vertex, forceConversions);
          if (newVertex == null || newVertex == vertex) {
            continue;
          }
          ++nMatches;
          if (nMatches >= programState.matchLimit) {
            return;
          }
          if (fullRestartAfterTransformation) {
            iter = getGraphIterator(programState, requireNonNull(root, "root"));
          } else {
            // To the extent possible, pick up where we left
            // off; have to create a new iterator because old
            // one was invalidated by transformation.
            iter = getGraphIterator(programState, newVertex);
            if (programState.matchOrder == HepMatchOrder.DEPTH_FIRST) {
              nMatches =
                  depthFirstApply(programState, iter, rules, forceConversions, nMatches);
              if (nMatches >= programState.matchLimit) {
                return;
              }
            }
            // Remember to go around again since we're
            // skipping some stuff.
            fixedPoint = false;
          }
          break;
        }
      }
    } while (!fixedPoint);
  }
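
The matchOrder and matchLimit consulted above come from the HepProgram. A small sketch of how they are configured (the limit value is arbitrary, for illustration only):

HepProgram program = new HepProgramBuilder()
        .addMatchOrder(HepMatchOrder.BOTTOM_UP)    // traversal order used by getGraphIterator
        .addMatchLimit(100)                        // stop after at most 100 rule matches
        .addRuleInstance(CoreRules.FILTER_MERGE)
        .build();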

Applying a rule

applyRule first handles two special cases (ConverterRule and CommonRelSubExprRule), then matches the rule's operand tree against the vertex, lets the rule check its own side-conditions via matches(), and finally fires the rule and applies the transformation results.

 private @Nullable HepRelVertex applyRule(
      RelOptRule rule,
      HepRelVertex vertex,
      boolean forceConversions) {
    if (!graph.vertexSet().contains(vertex)) {
      return null;
    }
    RelTrait parentTrait = null;
    List<RelNode> parents = null;
    if (rule instanceof ConverterRule) {
      // Guaranteed converter rules require special casing to make sure
      // they only fire where actually needed, otherwise they tend to
      // fire to infinity and beyond.
      ConverterRule converterRule = (ConverterRule) rule;
      if (converterRule.isGuaranteed() || !forceConversions) {
        if (!doesConverterApply(converterRule, vertex)) {
          return null;
        }
        parentTrait = converterRule.getOutTrait();
      }
    } else if (rule instanceof CommonRelSubExprRule) {
      // Only fire CommonRelSubExprRules if the vertex is a common
      // subexpression.
      List<HepRelVertex> parentVertices = getVertexParents(vertex);
      if (parentVertices.size() < 2) {
        return null;
      }
      parents = new ArrayList<>();
      for (HepRelVertex pVertex : parentVertices) {
        parents.add(pVertex.getCurrentRel());
      }
    }

    final List<RelNode> bindings = new ArrayList<>();
    final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
    boolean match =
        matchOperands(
            rule.getOperand(),
            vertex.getCurrentRel(),
            bindings,
            nodeChildren);

    if (!match) {
      return null;
    }

    HepRuleCall call =
        new HepRuleCall(
            this,
            rule.getOperand(),
            bindings.toArray(new RelNode[0]),
            nodeChildren,
            parents);

    // Allow the rule to apply its own side-conditions.
    if (!rule.matches(call)) {
      return null;
    }
    // Everything above only checks whether the rule can match, based on the
    // operand tree defined by the rule.

    // Fire the rule to perform the actual transformation.
    fireRule(call);

    if (!call.getResults().isEmpty()) {
      return applyTransformationResults(
          vertex,
          call,
          parentTrait);
    }

    return null;
  }
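
The "side-conditions" mentioned above are the rule's own matches() override. A hypothetical sketch of such a condition (not part of FilterMergeRule itself):

@Override public boolean matches(RelOptRuleCall call) {
    final Filter topFilter = call.rel(0);
    // for example, only fire when the top filter's condition is deterministic
    return RexUtil.isDeterministic(topFilter.getCondition());
}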

Examples

The previous sections covered the generic machinery that applies rule transformations. The examples below perform the same rewrites directly with RelBuilder (starting with FilterMergeRule), which makes the overall flow easier to follow.

Example 1 (FilterMergeRule)

Rewriting the plan directly in code

RelBuilder builder = RelBuilder.create(config().build());
RelNode root = builder.scan("test_user")
        .filter(builder.equals(builder.field(0), builder.literal(10)))
        .filter(builder.equals(builder.field(1), builder.literal("ef")))
        .build();

System.out.println("Before optimization\n" + RelOptUtil.toString(root));

// Rewrite the logical plan by hand, mimicking what FilterMergeRule does:
// push the bottom filter's input, then apply both conditions as one filter.
Filter topFilter = (Filter) root;
Filter bottomFilter = (Filter) root.getInput(0);

RelNode build = builder.push(bottomFilter.getInput())
        .filter(bottomFilter.getCondition(), topFilter.getCondition())
        .build();
System.out.println("After optimization\n" + RelOptUtil.toString(build));
Output
Before optimization
LogicalFilter(condition=[=($1, 'ef')])
  LogicalFilter(condition=[=($0, 10)])
    LogicalTableScan(table=[[test_user]])

After optimization
LogicalFilter(condition=[AND(=($0, 10), =($1, 'ef'))])
  LogicalTableScan(table=[[test_user]])

Example 2 (FILTER_INTO_JOIN)

This is the classic predicate pushdown: the WHERE condition sitting above the join is pushed down to the level where the data is scanned, so less data flows into the join. Calcite ships this rewrite as the FILTER_INTO_JOIN rule (CoreRules.FILTER_INTO_JOIN in recent versions), which can be registered with a HepPlanner in the same way as FILTER_MERGE above.

Rewriting the plan directly in code

RelBuilder builder = RelBuilder.create(config().build());
RelNode root = builder.scan("test_user")
        .scan("test_address")
        .join(JoinRelType.INNER, builder.field("id"))
        .filter(builder.equals(builder.field("id"), builder.literal(10)))
        .build();

System.out.println("Before optimization\n" + RelOptUtil.toString(root));

LogicalFilter filter = (LogicalFilter) root;
LogicalJoin join = (LogicalJoin) root.getInput(0);

// Apply the rule's rewrite by hand: push the filter below the join,
// onto the join's left input.
RelNode build = builder.push(join.getLeft())
        .filter(filter.getCondition())
        .push(join.getRight())
        .join(join.getJoinType(), join.getCondition())
        .build();
System.out.println("After optimization\n" + RelOptUtil.toString(build));
Output
Before optimization
LogicalFilter(condition=[=($0, 10)])
  LogicalJoin(condition=[$0], joinType=[inner])
    LogicalTableScan(table=[[test_user]])
    LogicalTableScan(table=[[test_address]])

After optimization
LogicalJoin(condition=[$0], joinType=[inner])
  LogicalFilter(condition=[=($0, 10)])
    LogicalTableScan(table=[[test_user]])
  LogicalTableScan(table=[[test_address]])

Summary

The HepPlanner model optimizes the relational algebra purely on the basis of rules: it does not consider how the actual data volumes affect the plan, and it is sensitive to the order in which rules are applied, so the final plan may not be the optimal one. A follow-up article will introduce the cost-based (CBO) optimizer, VolcanoPlanner.