[Optimizer] PostgreSQL Optimizer의 Stuctures

PostgreSQL의 planner가 동작하는 3단계

1. Preprocessing

Target list, limit clauses의 단순화 ex) '2+2'를 4로 변경
Normalizing boolean expression 'NOT (NOT a)' 를 'a' 로 변경
Flattening AND/OR

2. Cheapest access path 구하기

Access path와 cost를 저장하기 위해 RelOptInfo 구조체를 생성
- RelOptInfo 구조체는 make_one_rel() 함수에서 생성
  - baserestrictinfo : WHERE clauses 저장
  - indexlist : related indexes 저장
- RelOptInfo는 PlannerInfo 구조체의 simple_rel_array에 저장
All possible access path의 Cost를 estimation하고 RelOptInfo에 access path를 저장
- Path를 생성하고, seq scan 비용을 계산하여 Path에 기록한 뒤 RelOptInfo의 pathlist에 저장
- Related index가 있으면, index access path를 생성하고 index scan path를 저장
- bitmap scan이 가능하면 bitmap scan path를 생성하고 bitmap scan path를 저장
RelOptInfo의 pathlist에서 cheapest access path를 획득
LIMIT, ORDER BY, ARREGISFDD cost를 estimation

3. Plan tree 생성

PlannerInfo

typedef struct PlannerInfo
{
	NodeTag		type;
	Query	   	*parse;			/* the Query being planned */
	PlannerGlobal 	*glob;			/* global info for current planner run */
	Index		query_level;		/* 1 at the outermost Query */
	struct PlannerInfo *parent_root;	/* NULL at outermost Query */

	/*
	 * plan_params contains the expressions that this query level needs to
	 * make available to a lower query level that is currently being planned.
	 * outer_params contains the paramIds of PARAM_EXEC Params that outer
	 * query levels will make available to this query level.
	 */
	List		*plan_params;	/* list of PlannerParamItems, see below */
	Bitmapset  	*outer_params;

	/*
	 * simple_rel_array holds pointers to "base rels" and "other rels" (see
	 * comments for RelOptInfo for more info).  It is indexed by rangetable
	 * index (so entry 0 is always wasted).  Entries can be NULL when an RTE
	 * does not correspond to a base relation, such as a join RTE or an
	 * unreferenced view RTE; or if the RelOptInfo hasn't been made yet.
	 */
	struct RelOptInfo **simple_rel_array;	/* All 1-rel RelOptInfos */
	int		simple_rel_array_size;	/* allocated size of array */

	/*
	 * simple_rte_array is the same length as simple_rel_array and holds
	 * pointers to the associated rangetable entries.  This lets us avoid
	 * rt_fetch(), which can be a bit slow once large inheritance sets have
	 * been expanded.
	 */
	RangeTblEntry **simple_rte_array;	/* rangetable as an array */

	/*
	 * all_baserels is a Relids set of all base relids (but not "other"
	 * relids) in the query; that is, the Relids identifier of the final join
	 * we need to form.  This is computed in make_one_rel, just before we
	 * start making Paths.
	 */
	Relids		all_baserels;

	/*
	 * nullable_baserels is a Relids set of base relids that are nullable by
	 * some outer join in the jointree; these are rels that are potentially
	 * nullable below the WHERE clause, SELECT targetlist, etc.  This is
	 * computed in deconstruct_jointree.
	 */
	Relids		nullable_baserels;

	/*
	 * join_rel_list is a list of all join-relation RelOptInfos we have
	 * considered in this planning run.  For small problems we just scan the
	 * list to do lookups, but when there are many join relations we build a
	 * hash table for faster lookups.  The hash table is present and valid
	 * when join_rel_hash is not NULL.  Note that we still maintain the list
	 * even when using the hash table for lookups; this simplifies life for
	 * GEQO.
	 */
	List		*join_rel_list;	/* list of join-relation RelOptInfos */
	struct HTAB 	*join_rel_hash; /* optional hashtable for join relations */

	/*
	 * When doing a dynamic-programming-style join search, join_rel_level[k]
	 * is a list of all join-relation RelOptInfos of level k, and
	 * join_cur_level is the current level.  New join-relation RelOptInfos are
	 * automatically added to the join_rel_level[join_cur_level] list.
	 * join_rel_level is NULL if not in use.
	 */
	List	**join_rel_level;	/* lists of join-relation RelOptInfos */
	int	join_cur_level; 	/* index of list being extended */
	List	*init_plans;		/* init SubPlans for query */
	List	*cte_plan_ids;		/* per-CTE-item list of subplan IDs */
	List	*multiexpr_params;	/* List of Lists of Params for MULTIEXPR subquery outputs */
	List	*eq_classes;		/* list of active EquivalenceClasses */
	List	*canon_pathkeys; 	/* list of "canonical" PathKeys */
	List	*left_join_clauses;	/* list of RestrictInfos for
					 * mergejoinable outer join clauses w/nonnullable var on left */
	List	*right_join_clauses;	/* list of RestrictInfos for
					 * mergejoinable outer join clauses w/nonnullable var on right */
	List	*full_join_clauses;	/* list of RestrictInfos for mergejoinable full join clauses */
	List	*join_info_list; 	/* list of SpecialJoinInfos */
	List	*append_rel_list;	/* list of AppendRelInfos */
	List	*rowMarks;		/* list of PlanRowMarks */
	List	*placeholder_list;	/* list of PlaceHolderInfos */
	List	*fkey_list;		/* list of ForeignKeyOptInfos */
	List	*query_pathkeys; 	/* desired pathkeys for query_planner() */
	List	*group_pathkeys; 	/* groupClause pathkeys, if any */
	List	*window_pathkeys;	/* pathkeys of bottom window, if any */
	List	*distinct_pathkeys;	/* distinctClause pathkeys, if any */
	List	*sort_pathkeys;		/* sortClause pathkeys, if any */
	List	*initial_rels;		/* RelOptInfos we are now trying to join */

	/* Use fetch_upper_rel() to get any particular upper rel */
	List	*upper_rels[UPPERREL_FINAL + 1]; /* upper-rel RelOptInfos */

	/* Result tlists chosen by grouping_planner for upper-stage processing */
	struct PathTarget *upper_targets[UPPERREL_FINAL + 1];

	/*
	 * grouping_planner passes back its final processed targetlist here, for
	 * use in relabeling the topmost tlist of the finished Plan.
	 */
	List    *processed_tlist;

	/* Fields filled during create_plan() for use in setrefs.c */
	AttrNumber *grouping_map;	/* for GroupingFunc fixup */
	List	   *minmax_aggs;	/* List of MinMaxAggInfos */
	MemoryContext planner_cxt;	/* context holding PlannerInfo */
	double	   total_table_pages;	/* # of pages in all tables of query */
	double	   tuple_fraction; 	/* tuple_fraction passed to query_planner */
	double	   limit_tuples;	/* limit_tuples passed to query_planner */
	bool	   hasInheritedTarget;	/* true if parse->resultRelation is an inheritance child rel */
	bool	   hasJoinRTEs;		/* true if any RTEs are RTE_JOIN kind */
	bool	   hasLateralRTEs; 	/* true if any RTEs are marked LATERAL */
	bool	   hasDeletedRTEs; 	/* true if any RTE was deleted from jointree */
	bool	   hasHavingQual;	/* true if havingQual was non-null */
	bool	   hasPseudoConstantQuals; /* true if any RestrictInfo has pseudoconstant = true */
	bool	   hasRecursion;	/* true if planning a recursive WITH item */

	/* These fields are used only when hasRecursion is true: */
	int	   wt_param_id;	        /* PARAM_EXEC ID for the work table */
	struct Path *non_recursive_path;/* a path for non-recursive term */

	/* These fields are workspace for createplan.c */
	Relids	   curOuterRels;	/* outer rels above current node */
	List	   *curOuterParams; 	/* not-yet-assigned NestLoopParams */

	/* optional private data for join_search_hook, e.g., GEQO */
	void	   *join_search_private;
} PlannerInfo;

Path

Path 타입은 추가 정보가 필요하지 않은 단순한 플랜 (ex> sequential scan path)에 이용되는 타입이다.

pathtype
- NodeTag of the Plan node we could build from this Path
- tag identifying scan/join method
param_info
- "param_info", if not NULL, links to a ParamPathInfo that identifies outer relation(s) that provide parameter values to each scan of this path.
- That means this path can only be joined to those rels by means of nestloop joins with this path on the inside. Also note that a parameterized path is responsible for testing all "movable" joinclauses involving this rel and the specified outer rel(s).
pathkeys
- List of PathKey nodes
- Path의 Output rows의 정렬 순서를 나타냄

typedef struct Path
{
  NodeTag   type;
  NodeTag   pathtype;   /* tag identifying scan/join method */
  RelOptInfo *parent;     /* the relation this path can build */
  ParamPathInfo *param_info;  /* parameterization info, or NULL if none */

/* estimated size/costs for path (see costsize.c for more info) */
  double    rows;     /* estimated number of result tuples */
  Cost    startup_cost; /* cost expended before fetching any tuples */
  Cost    total_cost;   /* total cost (assuming all tuples fetched) */

  List     *pathkeys;   /* sort ordering of path's output */
  /* pathkeys is a List of PathKey nodes; see above */
} Path;

저작자표시 비영리 변경금지 (새창열림)

'Database > PostgreSQL' 카테고리의 다른 글

PostgreSQL 9.3.4 설치 및 초기화 (0)	2023.01.09
[Optimizer] Correlation of PostgreSQL (0)	2021.03.08
PostgreSQL에서 TPC-H 사용하기 (0)	2018.04.10

PostgreSQL의 planner가 동작하는 3단계

1. Preprocessing

2. Cheapest access path 구하기

3. Plan tree 생성

PlannerInfo

Path

'Database > PostgreSQL' 카테고리의 다른 글

티스토리툴바