Query Optimization in DBMS
The query optimizer (also known as the optimizer) is database software that identifies the most efficient way (like by reducing time) for a SQL statement to access data
Introduction to Query Optimization in DBMS
The process of selecting an efficient execution plan for processing a query is known as query optimization.
Following query parsing which is a process by which this decision making is done that for a given query, calculating how many different ways there are in which the query can run, then the parsed query is delivered to the query optimizer, which generates various execution plans to analyze the parsed query and select the plan with the lowest estimated cost. The catalog manager assists the optimizer in selecting the optimum plan to perform the query by generating the cost of each plan.
Query optimization is used to access and modify the database in the most efficient way possible. It is the art of obtaining necessary information in a predictable, reliable, and timely manner. Query optimization is formally described as the process of transforming a query into an equivalent form that may be evaluated more efficiently. The goal of query optimization is to find an execution plan that reduces the time required to process a query. We must complete two major tasks to attain this optimization target.
The first is to determine the optimal plan to access the database, and the second is to reduce the time required to execute the query plan.
Purpose of the Query Optimizer in DBMS
The optimizer tries to come up with the best execution plan possible for a SQL statement.
Among all the candidate plans reviewed, the optimizer chooses the plan with the lowest cost. The optimizer computes costs based on available facts. The cost computation takes into account query execution factors such as I/O, CPU, and communication for a certain query in a given context.
For example, there is a query that requests information about students who are in leadership roles, such as being a class representative. If the optimizer statistics show that 50% of students are in positions of leadership, the optimizer may decide that a full table search is the most efficient. However, if data show that just a small number of students are in positions of leadership, reading an index followed by table access by row id may be more efficient than a full table scan.
Because the database has so many internal statistics and tools at its disposal, the optimizer is frequently in a better position than the user to decide the best way to execute a statement. As a result, the optimizer is used by all SQL statements.
The optimizer is made up of three parts: the transformer, the estimator, and the plan generator. The figure below depicts those components.
- Query Transformer The query transformer determines whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement at a lower cost for some statements.
When a plausible alternative exists, the database compares the costs of each alternative and chooses the one with the lowest cost. The query transformer shown in the query below can be taken as an example of how query optimization is done by transforming an OR-based input query into a UNION ALL-based output query.
The given query is transformed using query transformer
The estimator is the optimizer component that calculates the total cost of a given execution plan.
To determine the cost, the estimator employs three different methods:
- Selectivity: The query picks a percentage of the rows in the row set, with 0 indicating no rows and 1 indicating all rows. Selectivity is determined by a query predicate, such as WHERE the last name LIKE X%, or by a mix of predicates. As the selectivity value approaches zero, a predicate gets more selective, and as the value nears one, it becomes less selective (or more unselective).
For example, The row set can be a base table, a view, or the result of a join. The selectivity is tied to a query predicate, such as last_name = 'Prakash', or a combination of predicates, such as last_name = 'Prakash' AND job_id = 'SDE'.
- Cardinality: The cardinality of an execution plan is the number of rows returned by each action. This input is shared by all cost functions and is essential for determining the best strategy. Cardinality in DBMS can be calculated using DBMS STATS table statistics or after taking into account the impact of predicates (filter, join, and so on), DISTINCT or GROUP BY operations, and so on. In an execution plan, the Rows column displays the estimated cardinality.
For example, if the optimizer estimates that a full table scan will yield 100 rows, then the cardinality estimate for this operation is 100. The cardinality estimate appears in the execution plan's Rows column.
- Cost: This metric represents the number of units of labor or resources used. The query optimizer uses disc I/O, CPU utilization, and memory usage as units of effort. For example, if the plan for query A has a lower cost than the plan for query B, then the following outcomes are possible: A executes faster than B, A executes slower than B or A executes in the same amount of time as B.
- Plan Generator
The plan generator investigates multiple plans for a query block by experimenting with various access paths, join methods, and join orders.
Because of the different combinations that the database can utilize to generate the same outcome, many plans are available. The plan with the lowest cost is chosen by the optimizer.
Automatic Tuning Optimizer
Depending on how it is invoked, the optimizer performs different actions.
The database offers the following optimization types:
- Normal optimization The optimizer parses the SQL and produces an execution plan. For most SQL statements, the usual mode gives a reasonable plan. The optimizer when operating under normal mode it has stringent time limits, usually a fraction of a second, during which it must identify an optimal plan.
- SQL Tuning Advisor optimization The optimizer is known as Automatic Tuning Optimizer when SQL Tuning Advisor invokes it by taking one or more SQL statements as an input. In this situation, the optimizer conducts further analysis to improve the plan generated in regular mode. The optimizer produces a set of activities, along with their reasoning and predicted reward, to produce a considerably better plan.
Methods of Query Optimization in DBMS
There are two methods of query optimization. They are as follows.
Cost-Based Query Optimization in DBMS
Query optimization is the process of selecting the most efficient way to execute a SQL statement. Because SQL is a nonprocedural language, the optimizer can merge, restructure, and process data in any sequence.
The Optimizer allocates a cost in numerical form for each step of a feasible plan for a given query and environment, and then discovers these values together to get a cost estimate for the plan or possible strategy. The Optimizer aims to find the plan with the lowest cost estimate after evaluating the costs of all feasible plans. As a result, the Optimizer is sometimes known as the Cost-Based Optimizer.
- Execution Plans:
An execution plan specifies the best way to execute a SQL statement.
The plan describes the steps taken by Oracle Database to execute a SQL statement. Each step physically retrieves or prepares rows of data from the database for the statement's user.
An execution plan shows the total cost of the plan, which is stated on line 0, as well as the cost of each individual operation. A cost is an internal unit that appears solely in the execution plan to allow for plan comparisons. As a result, the cost value cannot be fine-tuned or adjusted.
- Query Blocks The optimizer receives a parsed representation of a SQL statement as input. Each SELECT block in the original SQL statement is internally represented by a query block. A query block can be a statement at the top level, a subquery, or an unmerged view. Let’s take an example where the SQL statement that follows is made up of two query sections. The inner query block is the subquery in parentheses. The remainder of the outer query block of the SQL statement obtains the names of employees in the departments whose IDs were supplied by the subquery. The query form specifies how query blocks are connected.
- Query Sub Plans
The optimizer creates a query sub-plan for each query block.
From the bottom up, the database optimizes query blocks separately. As a result, the database optimizes the innermost query block first, generating a sub-plan for it, before generating the outer query block, which represents the full query.
The number of query block plans is proportional to the number of items in the FROM clause. As the number of objects rises, this number climbs exponentially. The possibilities for a join of five tables, for example, are far higher than those for a connection of two tables.
- Analogy for the Optimizer
An online trip counselor is one analogy for the optimizer.
A biker wishes to find the most efficient bicycle path from point A to point B. A query is analogous to the phrase "I need the quickest route from point A to point B" or "I need the quickest route from point A to point B via point C". To choose the most efficient route, the trip advisor employs an internal algorithm that takes into account factors such as speed and difficulty. The biker can sway the trip advisor's judgment by saying things like "I want to arrive as quickly as possible" or "I want the simplest route possible.”
In this example, an execution plan is a possible path generated by the travel advisor. Internally, the advisor may divide the overall route into multiple subroutes (sub plans) and compute the efficiency of each subroute separately. For example, the trip advisor may estimate one subroute to take 15 minutes and be of medium difficulty, another subroute to take 22 minutes and be of low difficulty, and so on.
Based on the user-specified goals and accessible facts about roads and traffic conditions, the advisor selects the most efficient (lowest cost) overall route. The better the guidance, the more accurate the statistics. For example, if the advisor is not kept up to date on traffic delays, road closures, and poor road conditions, the proposed route may prove inefficient (high cost).
Adaptive Query Optimization in DBMS
Adaptive query optimization allows the optimizer to make run-time changes to execution plans and uncover new information that can lead to improved statistics.
When existing facts are insufficient to produce an ideal strategy, adaptive optimization comes in handy. The image below depicts the feature set for adaptive query optimization.
Adaptive Query Plans The optimizer can defer the final plan decision for a statement with an adaptive plan until execution time.
Purpose of Adaptive Query Plans: The optimizer's ability to alter a plan based on information gained during execution can significantly increase query performance
Because the optimizer occasionally chooses an inferior default plan due to a cardinality misestimate, adaptive plans are important. The capacity to modify the plan during execution based on actual execution statistics leads to a more optimal end plan. The optimizer uses the final plan for further executions after selecting it, ensuring that the poor plan is not reused.
- How Adaptive Query Plans Work
An adaptive plan is made up of several predefined sub plans and an optimizer statistics collector.
A sub-plan is a section of a plan that the optimizer can use as an alternative during execution. A nested loops join, for example, might be converted to a hash join during execution. An optimizer statistics collector is a row source that is added at crucial points in a plan to collect run-time statistics. These statistics assist the optimizer in making a final choice amongst numerous sub plans.
During statement execution, the statistics collector collects execution information and buffers some rows received by the sub-plan. The optimizer selects a sub-plan based on the information collected by the collector. At this point, the collector stops collecting statistics and buffering rows, and permits rows to pass through instead. On subsequent executions of the child cursor, the optimizer continues to use the same plan unless the plan ages out of the cache, or a different optimizer feature (for example, adaptive cursor sharing or statistics feedback) invalidates the plan.
- Adaptive Query Plans: Parallel Distribution Methods
Parallel execution typically necessitates data redistribution in order to conduct operations like as parallel sorts, aggregations, and joins.
Oracle Database supports a wide range of data dissemination mechanisms. The approach is selected by the database based on the number of rows to be distributed and the number of concurrent server processes involved in the operation. Consider the following potential scenarios:
- Few rows are distributed by many concurrent server processes. The database has the option of using the broadcast distribution method. Each row in the result set is received by each simultaneous server process in this situation.
- Few parallel server processes disseminate a large number of rows. If a data skew is found during the data redistribution, the statement's performance may suffer. To ensure that each parallel server process receives an equal number of rows, the database is more likely to use a hash distribution.
All these methods need not be always true. It also depends on the table size, column size, type of selection, projection, join sort, constraints, indexes, statistics etc. The above optimization describes the best way of optimizing the queries.
- Query Optimization in DBMS is the process of selecting the most efficient way to execute a SQL statement. Because SQL is a nonprocedural language, the optimizer can merge, restructure, and process data in any sequence.
- Query Optimization in DBMS has a significant application in database designing.
- There are two types of query optimization in DBMS: Cost-Based Optimization and Adaptive Query Optimization.
- We also covered the purpose of Query Optimization in DBMS and techniques in database designing as it reduces the system resources required to fulfill a query, and ultimately provides the user with the correct result set faster and cost-efficiently.