Skip to main content

Query Planner Internals

This page is the internal companion to the user-facing Query Planning and Explaining Queries And Commands docs. It describes how CamusDB turns SQL text into a physical plan and then executes that plan against the KV layer.

Mental Model

CamusDB's planner sits between the SQL layer and the ordered transactional KV store provided by Kahuna. Its job is to turn a declarative SQL query into a concrete execution plan:

  • Which scan to use.
  • Which predicates can be absorbed into scan bounds.
  • Whether a sort can be skipped.
  • How joins should probe the right side.
  • Where aggregation, HAVING, projection, DISTINCT, and limit stages belong.

CamusDB is still primarily heuristic-driven. A small cost model exists, but it currently annotates plans and influences only a narrow scan-choice decision.

Pipeline

Every SELECT follows this high-level pipeline:

  1. Parse SQL text into NodeAst.
  2. Build a typed logical query model.
  3. Bind names, sources, and aliases against the catalog.
  4. Produce a physical plan tree.
  5. Execute the plan through either the single-table executor or the join path.

The important data structures are:

StageMain typePurpose
ParseNodeAstRaw syntax tree.
Logical modelSelectQueryStructured representation of source, projections, filters, grouping, ordering, limit, offset, and distinct.
Bound modelBoundSelectQueryLogical query plus resolved tables, aliases, and name rules.
Physical planQueryPlan / PhysicalPlanNodeChosen operator tree plus flattened step list for the legacy single-table path.
ResultsQueryResultRowOutput rows emitted to the caller.

Logical And Physical Plans

The logical query model captures what the query asks for. The physical plan captures how CamusDB will obtain it.

Examples:

  • Logical: "rows from robots where year = 2024."
  • Physical: "range scan year_idx from 2024 to the next key boundary."

CamusDB keeps expressions such as predicates and projection expressions as AST subtrees deep into execution. The structured models mainly describe query shape and source relationships.

Two Execution Paths

CamusDB currently has two execution paths:

PathUsed whenMain executor
Single-table linear pathExactly one sourceQueryExecutor
Tree-based multi-source pathJoins or derived tablesQueryJoinExecutor

Both paths are intended to agree on user-visible semantics. The multi-source path uses a tree recursively and then reuses a shared post-scan pipeline for aggregation, sorting, projection, DISTINCT, and limit stages.

Parse And Build

The parser accepts:

  • SELECT [DISTINCT]
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY
  • LIMIT / OFFSET
  • JOIN and comma joins
  • Derived tables
  • Scalar / IN / NOT IN / EXISTS subqueries
  • IN (...) and NOT IN (...) value-list membership
  • EXPLAIN (LOGICAL|PHYSICAL|ANALYZE)

SelectQueryCreator turns the parsed tree into a SelectQuery record with:

  • Source
  • Projections
  • Where
  • GroupBy
  • Having
  • OrderBy
  • Limit
  • Offset
  • IsDistinct

Binding

QueryBinder resolves source names, opens table descriptors, detects alias collisions, and builds the row-name resolver that determines whether a column reference is valid, ambiguous, or must be qualified.

Binding also validates:

  • Projection references.
  • GROUP BY / projection consistency.
  • ORDER BY references.
  • Join ON predicates.

EXISTS support is also prepared here through the subquery registry used by execution.

Single-Table Planning

The single-table planner runs in broad phases:

1. Scan selection

It analyzes the predicate and tries to choose:

  • Full table scan.
  • Forced index scan.
  • Unique index point lookup.
  • Index range scan.
  • Index IN-list scan when repeated probes beat a wider scan.

PredicateAnalyzer splits predicates into:

  • Indexable comparisons.
  • Column-to-column comparisons.
  • Residual conjuncts.

IndexScanSelector then scores usable indexes using heuristic rules such as:

  • Unique full equality beats everything else.
  • Non-unique equality and equality-prefix matches are strong candidates.
  • Equality prefix plus next-column range can drive composite range scans.
  • Matching ORDER BY prefixes can win even without a filtering predicate.
  • Indexed IN (...) lists can compete with range scans and full scans.

2. Filter absorption

Once scan bounds are chosen, comparisons already implied by the scan are removed from the runtime filter. The remaining predicate becomes the residual execution filter.

3. Sort elision

If the chosen scan already produces rows in the requested order, the planner sets OutputOrdering and skips the explicit SortNode.

3b. DISTINCT strategy

For SELECT DISTINCT, the planner decides between:

  • DistinctNode in streaming mode when the projected columns are simple identifiers, all NOT NULL, and covered by scan ordering.
  • DistinctNode in hash mode when streaming is unsafe or impossible.

Streaming distinct can also satisfy matching ORDER BY without a separate SortNode.

4. Limit pushdown

If the query shape is simple enough, the planner pushes row-count limits into the scan so the KV read can stop early.

5. Operator chain construction

The planner builds a leaf-to-root operator chain. Typical patterns include:

Query shapeOperator chain
Plain selectScan -> [Filter] -> [Sort] -> [Limit] -> [Aggregate] -> [Having] -> [Project]
Grouped queryScan -> [Filter] -> Aggregate -> [Having] -> [Sort] -> [Project] -> [Limit]
Distinct queryScan -> [Filter] -> [Aggregate] -> [Having] -> [Project] -> Distinct -> [Sort] -> [Limit]

Join Planning

JoinQueryPlanner handles multi-source queries.

Its work includes:

  • Optional join-order reordering.
  • Predicate pushdown for single-source predicates.
  • Separation of post-join predicates from scan-local predicates.
  • Building either NestedLoopJoinNode or IndexNestedLoopJoinNode.
  • Building SemiJoinNode variants for eligible indexed IN / NOT IN subqueries.
  • Representing derived tables as DerivedTableScanNode.

If the right-side join key has an index, CamusDB can use indexed join probing instead of scanning the full right side repeatedly.

Physical Plan Nodes

Common node types include:

NodeMeaning
TableScanNodeFull table or forced index scan.
IndexLookupNodeUnique-index point lookup.
IndexRangeScanNodeBounded or unbounded index range.
FilterNodeResidual predicate.
AggregateNodeAggregate/grouping stage.
HavingFilterNodePost-aggregate filter.
SortNodeIn-memory sort.
ProjectNodeProjection and alias shaping.
DistinctNodeDuplicate elimination.
LimitNodeLimit/offset stage.
SemiJoinNodeIndexed IN / NOT IN subquery rewrite.
NestedLoopJoinNodeJoin without indexed right-side probe.
IndexNestedLoopJoinNodeJoin with indexed right-side probe.
DerivedTableScanNodeScan of a derived table source.

QueryPlan also carries a flattened Steps view for the linear executor. The tree and linear list reference the same node instances.

Execution

Single-table path

QueryExecutor walks the flattened step list and chains IAsyncEnumerable<QueryResultRow> operators.

Scan operators read from KvTableStore, decode rows with RowEncoder, apply inline filters, and honor scan row limits.

Other stages include:

  • QuerySorter
  • QueryAggregator
  • QueryFilterer for HAVING
  • QueryProjector
  • QueryDistincter
  • QueryLimiter

Join path

QueryJoinExecutor walks the plan tree recursively:

  • Table scans read a source with any pushed-down filter.
  • Derived tables materialize the inner query once.
  • Nested loop joins merge left and right rows and evaluate ON.
  • Indexed nested loops probe the right-side index per outer row.

The merged result then passes through the shared post-scan pipeline.

EXPLAIN Internals

PlanRenderer is the shared canonical renderer for planner diagnostics.

It supplies:

  • Stable node names.
  • Node detail strings.
  • Depth-first node walking used by SQL EXPLAIN.
  • Optional distributed-planning metadata such as OutputOrdering and decomposability.

ExplainExecutor uses the planned tree to return one row per plan node.

EXPLAIN (ANALYZE) additionally enables runtime stats collection and executes the query to populate counters such as:

  • actual_rows
  • rows_read
  • kv_lookups
  • kv_scan_entries
  • actual_time_ms on the root node

EXPLAIN (ANALYZE) is currently limited to non-join queries.

Statistics And Costing

StatisticsManager keeps lightweight advisory table statistics in Kahuna:

  • Row count per table.
  • Per-index entry counts.
  • Running min/max bounds for indexed columns.

CostEstimator annotates plan nodes with estimated cardinality and cost. Those annotations are exposed through EXPLAIN, and the planner can use them for a small set of decisions, especially choosing a full scan over an unselective index range and comparing IN (...) probe plans against wider scans.

This is still intentionally narrow. CamusDB does not yet use a broad cost-based search across all join orders and operator alternatives.

Optimizations Present Today

The planner already includes several concrete passes:

  • Projection pushdown with RequiredColumns.
  • Sort elision through OutputOrdering.
  • Limit pushdown into scans.
  • Filter absorption from scan bounds.
  • Join predicate pushdown.
  • Heuristic join reordering.
  • Small cost-based veto for low-selectivity index range scans.

Cost Model Status

The cost model is intentionally narrow today.

It does two things:

  1. Annotates plan nodes with estimated cardinality and cost.
  2. Replaces some low-value index range scans with full table scans.

What it does not yet do:

  • Full cost-based join ordering.
  • Broad index-choice enumeration.
  • Mature distributed execution costing.
  • Reliable join estimates across the whole tree.

So the correct mental model is still: heuristics choose most of the plan, and the cost model refines one part of scan selection.

Distributed-Ready Metadata

Even though execution is not yet a full distributed query engine, physical plan nodes carry metadata for future distributed planning:

  • OutputOrdering
  • EstimatedCardinality
  • Cost
  • PartitionLocality
  • CanDecomposeToLocalPlusMerge

This is why the planner can already talk about sort elision, local-vs-merge capability, and plan annotations without requiring a complete distributed query executor yet.

Current Gaps

Important current limitations from the source design:

  • EXPLAIN (ANALYZE) for joins is missing.
  • EXPLAIN (LOGICAL) is mostly cosmetic today.
  • Join costs are still rough.
  • Descending-order exploitation is limited.
  • The planner is not yet a broad cost-based optimizer.

These are the main boundaries for future planner work.

See Query Planning for user-facing capabilities, EXPLAIN for plan inspection, and Architecture for the broader system layout.