
Cost-Based Optimization

From the document Oracle Essentials Oracle Database 11g (Pages 130-133)

To improve the optimization of SQL statements, Oracle introduced the cost-based optimizer in Oracle7. As the name implies, the cost-based optimizer does more than simply look at a set of optimization rules; instead, it selects the execution path that requires the fewest logical I/O operations. This approach avoids the error discussed in the previous section. After all, the cost-based optimizer would know which table was bigger and would select the right table to begin the query, regardless of the syntax of the SQL statement.
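The join-order comparison can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's costing logic: the row counts (10 and 10,000), the one-logical-I/O-per-row read, and the function name are assumptions chosen to match the totals shown in Figure 4-4.

```python
def nested_loop_logical_ios(driving_rows: int, ios_per_join: int = 1) -> int:
    """Logical I/Os for a simplified join: read each driving-table row,
    then probe the other table once per driving row."""
    read_driving = driving_rows                 # assumed: one logical I/O per row
    probe_other = driving_rows * ios_per_join   # one probe per driving row
    return read_driving + probe_other


# Assumed table sizes, chosen to reproduce the totals in Figure 4-4.
SMALLTAB_ROWS = 10
LARGETAB_ROWS = 10_000

plans = {
    "start with SMALLTAB": nested_loop_logical_ios(SMALLTAB_ROWS),
    "start with LARGETAB": nested_loop_logical_ios(LARGETAB_ROWS),
}

# A cost-based choice simply keeps the plan with the lowest estimated cost.
best = min(plans, key=plans.get)

for name, cost in plans.items():
    print(f"{name}: {cost:,} logical I/Os")
print("chosen plan:", best)
```

Because the comparison uses the table sizes rather than the order in which the tables are written, the same plan wins however the SQL statement is phrased.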

Oracle8 and later versions use the cost-based optimizer by default to identify the optimal execution plan, and since Oracle Database 10g the cost-based optimizer is the only supported optimizer. To properly evaluate the cost of any particular execution plan, the cost-based optimizer uses statistics about the composition of the relevant data structures. Since the Oracle Database 10g release, these statistics are gathered automatically by default into the Automatic Workload Repository (AWR).

Among the statistics gathered in the AWR are database segment access and usage statistics, time model statistics, system and session statistics, SQL statements that produce the greatest loads, and Active Session History (ASH) statistics.

How statistics are used

The cost-based optimizer finds the optimal execution plan by assigning an optimization score to each of the potential execution plans, using its own internal rules and logic along with statistics that reflect the state of the data structures in the database.

These statistics relate to the tables, columns, and indexes involved in the execution plan. The statistics for each type of data structure are listed in Table 4-1.

Figure 4-4. The effect of optimization choices (joining LARGETAB and SMALLTAB at 1 logical I/O per join: one join order totals 20 logical I/Os, the other totals 20,000 logical I/Os)


Oracle Database 10g and more current database releases also collect overall system statistics, including I/O and CPU performance and utilization. These statistics are stored in the data dictionary, described in this chapter’s final section, “Data Dictionary Tables.”

You can see that these statistics can be used individually and in combination to determine the overall cost of the I/O required by an execution plan. The statistics reflect both the size of a table and the amount of unused space within the blocks; this space can, in turn, affect how many I/O operations are needed to retrieve rows. The index statistics reflect not only the depth and breadth of the index tree, but also the uniqueness of the values in the tree, which can affect the ease with which values can be selected using the index.
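To see how such statistics might combine into an I/O estimate, here is a deliberately simplified sketch. It is not Oracle's actual cost formula: it just adds the index depth, the average leaf blocks per key, and the average data blocks per key for a single-key lookup, and compares the sum with reading every block of the table. All function names and sample numbers are hypothetical.

```python
def index_lookup_io(index_depth: int,
                    avg_leaf_blocks_per_key: int,
                    avg_data_blocks_per_key: int) -> int:
    """Rough I/O estimate for fetching all rows for one key through an index:
    walk down the tree, read the leaf blocks, then visit the table blocks."""
    return index_depth + avg_leaf_blocks_per_key + avg_data_blocks_per_key


def full_scan_io(table_blocks: int) -> int:
    """Rough I/O estimate for a full scan: one read per table block
    (multiblock reads are ignored in this sketch)."""
    return table_blocks


def cheaper_access_path(table_blocks: int, index_depth: int,
                        avg_leaf_blocks_per_key: int,
                        avg_data_blocks_per_key: int) -> tuple[str, int]:
    """Return the cheaper of the two access paths and its estimated I/O."""
    via_index = index_lookup_io(index_depth, avg_leaf_blocks_per_key,
                                avg_data_blocks_per_key)
    via_scan = full_scan_io(table_blocks)
    return ("index", via_index) if via_index < via_scan else ("full scan", via_scan)


# Hypothetical statistics: a large table where each key touches few blocks...
print(cheaper_access_path(table_blocks=1_000, index_depth=2,
                          avg_leaf_blocks_per_key=1, avg_data_blocks_per_key=3))
# ...and a small table where each key's rows are scattered over many blocks.
print(cheaper_access_path(table_blocks=40, index_depth=2,
                          avg_leaf_blocks_per_key=1, avg_data_blocks_per_key=50))
```

In the second case the index loses even though it exists, because the table statistics (few blocks) and the index statistics (many data blocks per key) are weighed together.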

The accuracy of the cost-based optimizer depends on the accuracy of the statistics it uses, so updating statistics has always been a must.

Formerly, you would have used the SQL statement ANALYZE to compute or estimate these statistics. When managing an older release, many database administrators also used a built-in PL/SQL package, DBMS_STATS, which contains a number of procedures that helped automate the process of collecting statistics.

Stale statistics can lead to database performance problems, which is why Oracle has automated database statistics gathering. This statistics gathering can be quite granular. For example, as of Oracle Database 10g, you can enable automatic statistics collection for a table, triggered when the table is either stale (which means that more than 10 percent of the rows in the table have changed) or empty.
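The staleness rule described above amounts to a simple threshold test. The sketch below is illustrative only: the function and parameter names are hypothetical, and it reads "empty" as "no rows were recorded when statistics were last gathered", which is an assumption rather than Oracle's definition.

```python
def needs_stats_refresh(rows_at_last_gather: int,
                        rows_changed_since: int,
                        stale_percent: int = 10) -> bool:
    """Return True when statistics should be regathered under the
    'stale or empty' rule (simplified, assumed interpretation)."""
    if rows_at_last_gather == 0:
        # Assumed reading of "empty": nothing was recorded last time.
        return True
    # Integer arithmetic avoids floating-point edge cases at the threshold:
    # stale when changed rows exceed stale_percent of the recorded rows.
    return rows_changed_since * 100 > stale_percent * rows_at_last_gather


print(needs_stats_refresh(1_000, 50))    # 5 percent changed
print(needs_stats_refresh(1_000, 150))   # 15 percent changed
print(needs_stats_refresh(0, 0))         # empty at last gather
```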

Table 4-1. Database statistics

Data structure    Type of statistics
Table             Number of rows
                  Number of blocks
                  Number of unused blocks
                  Average available free space per block
                  Number of chained rows
                  Average row length
Column            Number of distinct values per column
                  Second-lowest column value
                  Second-highest column value
                  Column density factor
Index             Depth of index B*-tree structure
                  Number of leaf blocks
                  Number of distinct values
                  Average number of leaf blocks per key
                  Average number of data blocks per key
                  Clustering factor

The use of statistics makes it possible for the cost-based optimizer to make a much better-informed choice of the optimal execution plan. For instance, the optimizer could be trying to decide between two indexes to use in an execution plan that involves a selection based on a value in either index. The rule-based optimizer might very well rate both indexes equally and resort to the order in which they appear in the WHERE clause to choose an execution plan. The cost-based optimizer, however, knows that one index contains 1,000 entries while the other contains 10,000 entries. It even knows that the index that contains 1,000 entries has only 20 unique values, while the index that contains 10,000 entries has 5,000 unique values.

The selectivity offered by the larger index is much greater, so that index will be assigned a better optimization score and used for the query.
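The arithmetic behind this comparison can be made explicit. The sketch below assumes values are spread evenly across the entries; the class and index names are hypothetical and the calculation is a simplification, not Oracle's scoring logic.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    name: str
    entries: int
    distinct_values: int

    @property
    def expected_rows_per_value(self) -> float:
        """Average number of entries matched by one value,
        assuming a uniform distribution of values."""
        return self.entries / self.distinct_values


# The two indexes from the example in the text (names are hypothetical).
small_idx = IndexStats("small_idx", entries=1_000, distinct_values=20)
large_idx = IndexStats("large_idx", entries=10_000, distinct_values=5_000)

# Fewer expected rows per value means a more selective index.
chosen = min((small_idx, large_idx), key=lambda s: s.expected_rows_per_value)

for idx in (small_idx, large_idx):
    print(f"{idx.name}: about {idx.expected_rows_per_value:g} rows per value")
print("more selective index:", chosen.name)
```

Under this model the smaller index returns about 50 rows per value and the larger one about 2, which is why the larger index earns the better score.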

In Oracle9i, you have the option of allowing the cost-based optimizer to use CPU speed as one of the factors in determining the optimal execution plan. An initialization parameter turns this feature on and off. As of Oracle Database 10g, the default cost basis is calculated on the CPU cost plus the I/O cost for a plan.

Even with all the information available to it, the cost-based optimizer did have some noticeable initial flaws. Aside from the fact that it (like all software) occasionally had bugs, the cost-based optimizer used statistics that didn’t provide a complete picture of the data structures. In the previous example, the only thing the statistics tell the optimizer about the indexes is the number of distinct values in each index. They don’t reveal anything about the distribution of those values. For instance, the larger index can contain 5,000 unique values, but these values can each represent two rows in the associated table, or one index value can represent 5,001 rows while the rest of the index values represent a single row. The selectivity of the index can vary wildly, depending on the value used in the selection criteria of the SQL statement. Fortunately, Oracle 7.3 introduced support for collecting histogram statistics for indexes to address this exact problem. You could create histograms using syntax within the ANALYZE INDEX command when you gathered statistics yourself in Oracle versions prior to Oracle Database 10g. This syntax is described in your Oracle SQL reference documentation.
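The two distributions described above can be compared directly. In this sketch a "histogram" is modeled as a plain per-value row count, which is a simplification of what Oracle stores; the variable and function names are hypothetical.

```python
def uniform_estimate(total_rows: int, distinct_values: int) -> float:
    """Rows expected for any value when only the distinct-value count is known."""
    return total_rows / distinct_values


# Even distribution: 5,000 values, two rows each.
even = {value: 2 for value in range(5_000)}

# Skewed distribution: value 0 has 5,001 rows, the other 4,999 values one row each.
skewed = {0: 5_001, **{value: 1 for value in range(1, 5_000)}}

for label, counts in (("even", even), ("skewed", skewed)):
    total_rows = sum(counts.values())
    estimate = uniform_estimate(total_rows, len(counts))
    print(f"{label}: {total_rows:,} rows, {len(counts):,} distinct values, "
          f"uniform estimate {estimate:g} rows per value")

# With per-value counts, the estimate depends on the value being selected.
print("skewed, value 0:", skewed[0], "rows")
print("skewed, value 1:", skewed[1], "rows")
```

Both distributions have the same row count and the same number of distinct values, so summary statistics alone cannot tell them apart; only the per-value counts reveal that one value accounts for more than half the rows.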
