govind

Sunday, November 8, 2009

Dynamic Sampling in Oracle

I started my career with oracle 7.0 and i remember, in oracle7.x/8.x version, optimizer use Rule based optimizer if one or all of the tables does not have any statistics in the join query. In oracle9i R2, dynamic sampling query is introduced and it has ability to sample the table and collect the better statistics dynamically during the hard parse. This sampling takes place only at hard parse time and is used to dynamically generate better statistics for the optimizer to use, hence the name dynamic sampling.

Oracle optimizer use the default statistic values when table does not have any statistics and dynamic sampling query feature is disabled. The default statistics values are well documented in Performance Tuning Guide.

How do we set Dynamic sampling parameter? There are couple of ways, we can set the parameter.

1. We can set this in session level or instance level.
2. Dynamic sampling query hint in query level. (Hint)

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> alter session set OPTIMIZER_DYNAMIC_SAMPLING = 2;

Session altered.

SQL> alter system set OPTIMIZER_DYNAMIC_SAMPLING = 2;

System altered.

SQL> select /*+ dynamic_sampling(emp 0) */ * from scott.emp;

When does dynamic sampling is useful?
1. It is useful when table data is drastically changing and statistic becomes stale very often.
2. Global temporary table is good place to use dynamic sampling feature.

Dynamic sampling Levels? Optimizer Dynamic Sampling has 11 levels between 0 to 10. What are these levels? The answer is pretty straight forward and it is clearly documented in Oracle Performance Tuning Guide. In Oracle9i R2, the default setting for dynamic sampling level is 1. At this setting, the optimizer will tend to use dynamic sampling only in some cases when a table is unanalyzed. This includes the case of global temporary tables without statistics but doesn’t ensure that dynamic sampling will be used against all unanalyzed tables. In Oracle Database 10g Release 1 and above, the default setting is 2. This setting ensures that the optimizer will have a good estimate whenever it optimizes a query that accesses a table with no statistics.

The source of this Article is Asktom. Please read this link and this link has very pretty good information about Dynamic sampling.

Wednesday, November 4, 2009

Restoring old statistics

I see some DBA's, they won't backup the statistics before collect the new statistics. If we don't have backup, and optimizer behaves differently with new statistics, then we are in trouble. If we have old statistic backup, then we can resolve the performance issue by restoring the old statistics. Oracle highly recommends to take the backup of current statistics before we collect new statistics.

Oracle10g has a new feature which will save the statistics before overwriting the new statistics.

There are couple of ways, we can backup the statistics.

1. We can export the statistics in a table and import back when ever we need. I discussed this approach in different thread. This approach is mostly useful when we transfer the statistics from one server to another server. Please refer this thread Post

2. Since Oracle10g, whenever statistics are modified, old version of statistics are saved automatically before overwriting the new statistics. However automatic saving of old statistics will not take place if you use ANALYZE in 10g for statistics collection.

I am going to discuss how the statistics are saved and how do we restore the old statistics in Approach 2.

Where does oracle store the statistics? Oracle DBA_OPTSTAT_OPERATIONS table contains a log history of statistics collected on the database at various level such as table, schema or database. We can restore older statistics anytime by making use of START_TIME and END_TIME values from DBA_OPTSTAT_OPERATIONS table.

How does Oracle maintain the Statistics History? GET_STATS_HISTORY_AVAILABILITY is a Function in DBMS_STATS package that returns oldest timestamp where statistics history is available. Users cannot restore statistics to a time stamp older than the oldest time stamp. Here is the query to find the statistics history starting timestamp.
SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
06-OCT-09 09.52.45.351000000 PM -05:00

SQL>

Oracle maintain the history statistics based on the retention value. Oracle Purge the statistics automatically if it exceeds the statistics retention setting. By defualt, oracle purge the statistics if it is older then 31 days. We can also alter this retention period by using DBMS_STATS.ALTER_STATS_HISTORY_RETENTION procedure. PURGE_STATS is procedure in DBMS_STATS package that enables you to manually purge old statistics beyond a time stamp.

SQL> execute dbms_stats.alter_stats_history_retention(20);

PL/SQL procedure successfully completed.

As per the above code, oracle keep 20 days statistics history.

SQL> execute dbms_stats.purge_stats('06-OCT-09 09.52.45.351000000 PM -05:00');

PL/SQL procedure successfully completed.

SQL>

As per the above code, oracle purge the statistics prior to 06-OCT-09.

How do we recover the statistics? Oracle RESTORE_TABLE_STATS procedure restores statistics of a table as of a specified timestamp. It also restores the statistics of associated indexes and columns. Let me take EMP table in scott Schema and restore previous statistics.

Here are the steps to restore the previous timestamp statistics:

Step1: Let us check the past statistics and what date the statistics were collected.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select STATS_UPDATE_TIME from dba_tab_stats_history where table_name='EMP';

STATS_UPDATE_TIME
---------------------------------------------------------------------------
30-OCT-09 06.25.54.809000 PM -04:00
07-NOV-09 07.23.23.504000 AM -05:00
07-NOV-09 07.27.31.709000 AM -05:00
07-NOV-09 07.29.54.324000 AM -05:00

SQL> select last_analyzed from dba_tables where
2 table_name='EMP' and owner='SCOTT';

LAST_ANAL
---------
11-NOV-09

SQL> select sysdate from dual;

SYSDATE
---------
11-NOV-09

SQL>

Step2 : Let us restore the statistics which we collected on 30-OCT-2009.

SQL> execute dbms_stats.restore_table_stats('SCOTT','EMP','30-OCT-09 06.25.54.809000 PM -04:00');

PL/SQL procedure successfully completed.

SQL> select last_analyzed from dba_tables where
2 table_name='EMP' and owner='SCOTT';

LAST_ANAL
---------
30-OCT-09

SQL>

At what circumstances, do we restore the Statistics? We have performance issue in the database and DB was fine yesterday and since yesterday, there is no major change on database. Oracle10g automatic statistic gathering is turned on and last statistics gathering was close to the time that users started complanining about performance. At this circumstances, may be new statistics could be a culprit. We can restore the old statistics and see if we have any performance issue.

Restriction on restoring the statistics:

1. It does not have ability to restore user-defined statistics.

2. Old statistics will not be saved if we use ANALYZE command to gather statistics. Hence, we can recover the statistics.

Here is the input(Help) to write this topic.

Monday, November 2, 2009

Automatic Statistics Gathering in Oracle10g

Oracle10g has one of the automated feature which is called Automatic Statistics gathering. Oracle gathers statistics on all database objects automatically and maintains those statistics in a regularly scheduled maintenance job. When the DB is created, job is automatically created and started to collect statistics on database object with missing or stale statistics. The job GATHER_STATS_JOB calls the procedure dbms_stats.gather_database_stats_job_proc to gather statistics during the maintenance window.

What does GATHER_DATABASE_STATS_JOB_PROC do? Unfortunately, the package body is wrapped, so without access to the original source code, the implementation details are obscured. This procedure collects statistics on database objects when the object has no previously gathered statistics or the existing statistics are stale because the underlying object has been modified significantly (more than 10% of the rows).

This is an internal procedure, but its operates in a very similar fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The primary difference is that the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure prioritizes the database objects that require statistics, so that those objects which most need updated statistics are processed first. This ensures that the most-needed statistics are gathered before the maintenance window closes.

How do we enable this feature? Automatic statistics gathering is enabled by default when a database is created, or when a database is upgraded from an earlier database release. The following condition should be met to enable this feature.

1. GATHER_STATS_JOB should be enabled.
2. statistics_level should be TYPICAL or ALL.

We can enable the job as below when the job is already disabled for any reason.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> connect sys/password@orcl as sysdba
Connected.
SQL> BEGIN
2 DBMS_SCHEDULER.ENABLE('GATHER_STATS_JOB');
3 END;
4 /

PL/SQL procedure successfully completed.

SQL>

What time does automatic statistic collection run? By default this job runs within a maintenance window between 10 P.M to 6 A.M week nights and all day on weekends. But in oracle11g, i believe, the maintenance window is not same as Oracle10g.

Do we need to collect statistics in other then the scheduled window? In few Scenario's, it would be required to gather statistics manually.

1. In some cases, we might need to collect the statistics in specific objects other than this maintenance window. If the table data are changed rapidly and optimizer behaves differently due to data changes, then we need to gather statistics manually. We can not wait till the maintenance window open.

2. If the table is bigger and it is failing in the maintenance window for some reason, then you can lock that table during the maintenance window and unlock the table and gather the statistics manually and lock back again.

3. If you have specific table and you want to gather statistics with different option, then you can lock those tables during the maintenance window and gather statistics manually during day time.

Should we change the parameter values in Automatic statistic collection? Oracle recommends not to change the default values for automatic statistics collection. In case, if you need to gather statistics something differently, then lock the table during the maintenance window and unlock the table and gather statistics manually with specfic parameter and lock the table back again.

We have get_param function to find out the parameter values. Also set_param procedure to change the default values for dbms_stats pacakge parameters.

Friday, October 30, 2009

Compute Oracle Index Statistics

Many times, we forget to gather index statistics after creating the index. Optimizer will not use the index effectively if index does not have statistics.

Prior to Oracle9i, we have to issue two commands for creating index and gathering index statistics.

SQL> create index idx on test(object_name);

Index created.

SQL> exec dbms_stats.gather_index_stats(null, 'IDX');

PL/SQL procedure successfully completed.

SQL>

Starting in Oracle 9i, we have a compute statistics clause. We can both creating index and gathering index statistics in one command.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production

SQL> create table test as select * from user_objects;

Table created.

SQL> create index idx on test(object_name);

Index created.

SQL> select table_name, num_rows, last_analyzed
2 from user_tables
3 where table_name ='TEST'
4 /

TABLE_NAME NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
TEST

SQL> drop index idx;

Index dropped.

SQL> create index idx on test(object_name) compute statistics;

Index created.

SQL> select table_name, num_rows, last_analyzed
2 from user_tables
3 where table_name ='TEST'
4 /

TABLE_NAME NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
TEST 184 31-OCT-09

SQL>

In Oracle10g, we do not need to use compute statistics clause at all. Oracle gather statistics while creating index automatically.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create index idx on test(object_name);

Index created.

SQL> select num_rows, last_analyzed from user_ind_statistics where index_name ='IDX';

NUM_ROWS LAST_ANAL
---------- ---------
50484 30-OCT-09

SQL>

Statistics Lock in Oracle10g

Oracle10g has one of the useful feature that we can lock the table statistics. When statistics on a table are locked, all the statistics depending on the table, including table statistics, column statistics, histograms and statistics on all dependent indexes, are considered to be locked. No one can gather statistics on a table when it is locked. But ofcourse, we can overwrite the statistics with force option. How does it useful for DBA's on day to day activities? At what circumstances, this feature is useful?

We can use this feature in the following circumstances...

1. There are tables where you want to setup gathering statistics manually. You can stop gathering statistics during the regular schedule by locking the statistics.

2. Some cases, Queries works fine with old statistics. You can avoid gathering statistics at this situation.

3. Some time, tables are bigger and automatic gathering statistics might fail silently. In this scenario, we might need to lock the table and collect the statistics seperately. Refer these links Post1, Post2, Post3

4. Sometime, gathering statistics, creating histograms takes very long time on bigger table and we can avoid such a bigger table while collecting statistics for all the tables in schema or DB level.

5. For some reason, if we want to use any specific parameter to gather statistics on particular table, then we can use this option to lock the statistics and gather statistics in different time.

How do we lock the statistics?

Connected to:Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> execute dbms_stats.lock_table_stats('SCOTT','EMP');

PL/SQL procedure successfully completed.

SQL> execute dbms_stats.lock_schema_stats('SCOTT');

PL/SQL procedure successfully completed.

SQL>

Index Creation on Locked tables : In oracle10g, when you create index, the statistics also will be generated automatically. (Please refer my another Post for gathering statistics while creating index) . When the table is locked, statistics will not be generated while creating the index. We need to use FORCE option to gather the statistics while creating index for locked objects.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create table test as select * from dba_objects;

Table created.

SQL> exec dbms_stats.lock_table_stats(null, 'TEST');

PL/SQL procedure successfully completed.

SQL> create index idx on test(object_name);

Index created.

SQL> select num_rows, last_analyzed from user_ind_statistics where index_name ='IDX';

NUM_ROWS LAST_ANAL
---------- ---------

SQL> drop index idx;

Index dropped.

SQL> create index idx on test(object_name) compute statistics;
create index idx on test(object_name) compute statistics
*
ERROR at line 1:
ORA-38029: object statistics are locked

SQL> create index idx on test(object_name);

Index created.

SQL> exec dbms_stats.gather_index_stats(null, 'IDX');
BEGIN dbms_stats.gather_index_stats(null, 'IDX'); END;

*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "SYS.DBMS_STATS", line 10640
ORA-06512: at "SYS.DBMS_STATS", line 10664
ORA-06512: at line 1

SQL> exec dbms_stats.gather_index_stats(null, 'IDX',force=>true);

PL/SQL procedure successfully completed.

SQL> select num_rows, last_analyzed from user_ind_statistics where index_name ='IDX';

NUM_ROWS LAST_ANAL
---------- ---------
50484 30-OCT-09

SQL> alter index idx rebuild compute statistics;
alter index idx rebuild compute statistics
*
ERROR at line 1:
ORA-38029: object statistics are locked

SQL> alter index idx rebuild;

Index altered.

SQL> exec dbms_stats.gather_index_stats(null, 'IDX',force=>true);

PL/SQL procedure successfully completed.

SQL>

Monday, October 26, 2009

PLSQL_OPTIMIZE_LEVEL

I came to Austin for interview and i am on my way to Newjersy. I am hanging in the Austin Airport and i have another three hours for my flight departure. I thought, i write about one of the new Oracle10g optimization parameter PLSQL_OPTIMIZE_LEVEL.

This parameter determine the optimization level to compile the PLSQL code. The higher setting, oracle use more efforts to compile the code. This parameter will eliminate the dead code and moving the code out of the loop which does the same thing for each iteration. This has three valid values, which are 0,1 and 2. But default value for this parameter is 2.

Let us discuss about each value for this parameter. Please note, Oracle has not provided any detail level example for each value of this parameter. So i can't demonstrate exactly what oracle does for each level. Indeed, we can see the performance improvement in each level.

PLSQL_OPTIMIZE_LEVEL = 0 The value 0 works some what as pre 10g release. Oracle documentation says it works better than 9i. Let me write a procedure and run in oracle10g with value 0.

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> alter session set plsql_optimize_level =0;

Session altered.

SQL> set serveroutput on
SQL> create or replace procedure test as
2 a integer;
3 b integer;
4 c integer;
5 d integer;
6 v_time integer;
7 begin
8 v_time := Dbms_Utility.GET_CPU_TIME();
9 for j in 1..10000000 loop
10 a:= 100;
11 b:= null;
12 c:= nvl(b,1)+a;
13 end loop;
14 Dbms_Output.Put_Line(Dbms_Utility.GET_CPU_TIME()-v_time);
15 end;
16 /

Procedure created.

SQL> execute test;
770

PL/SQL procedure successfully completed.

The above procedure runs in 770 mseconds in oracle10g with plsql_optimize_level=0.

PLSQL_OPTIMIZE_LEVEL = 1 It eliminates unnecessary computation and exceptions. Since oracle has not given any example for the each value, i guess, it removes the statement b:=NULL in the TEST procedure. It does not make any sense to assign NULL value for each iteration of the loop in the TEST Procedure.

SQL> alter procedure test compile plsql_optimize_level = 1;

Procedure altered.

SQL> execute test;
502

PL/SQL procedure successfully completed.

The above procedure executes in 502 seconds and it is better then the plsql_optimize_level=0.

PLSQL_OPTIMIZE_LEVEL = 2 It moves the unnecessary dead code relatively far from its original location. I guess, it moves the assignment statement out of loop. Since it assigns the same value for each iteration which is not meaningful. Be aware that, some time, oracle takes long time to compile the procedure when we have value 2. Since oracle has to rewrite the code during the compilation stage and not during the execution stage.

SQL> alter procedure test compile plsql_optimize_level = 2;

Procedure altered.

SQL> execute test;
301

PL/SQL procedure successfully completed.

SQL>

The above procedure runs in 301 seconds and performance is far better then the value 1.

Monday, October 19, 2009

How to use histogram in Oracle

I would like to write about Oracle Histogram today. Histogram is very nice feature to help cost based optimizer to make right decision.

What is Histogram? Histograms are feature in CBO and it helps to optimizer to determine how data are skewed(distributed) with in the column. Histogram is good to create for the column which are included in the WHERE clause where the column is highly skewed. Histogram helps to optimizer to decide whether to use an index or full-table scan or help the optimizer determine the fastest table join order.

What are the advantage of Histogram? Histograms are useful in two places.

1. Histograms are useful for Oracle optimizer to choose the right access method in a table.

2. It is also useful for optimizer to decide the correct table join order. When we join multiple tables, histogram helps to minimize the intermediate result set. Since the smaller size of the intermediate result set will improve the performance.

Type of Histograms: Oracle uses two types of histograms for column statistics: height-balanced histograms and frequency histograms.

1. Height - balanced Histograms : The column values are divided into bands so that each band contains approximately the same number of rows. For instances, we have 10 distinct values in the column and only five buckets. It will create height based(Height balanced) histograms and it will evenly spread values through the buckets. A height-based histogram is when there are more distinct values than the number of buckets and the histogram statistics shows a range of rows across the buckets

2. Frequency Histograms : Each value of the column corresponds to a single bucket of the histogram. This is also called value based histogram. Each bucket contains the number of occurrences of that single value. Frequency histograms are automatically created instead of height-balanced histograms when the number of distinct values is less than or equal to the number of histogram buckets specified.

Method_opt Parameter: This is the parameter which tells about creating histogram while collecting the statistics. The default is FOR ALL COLUMNS SIZE AUTO in Oracle10g. But in oracle9i, the default is FOR ALL COLUMN SIZE 1 which will turn off the histogram collection.

FOR ALL [INDEXED HIDDEN] COLUMNS [size_clause]
FOR COLUMNS [size clause] columnattribute [size_clause] [,columnattribute [size_clause]...]

size_clause is defined as size_clause := SIZE {integer REPEAT AUTO SKEWONLY}

integer : Number of histogram buckets. Must be in the range [1,254]

REPEAT : Collects histograms only on the columns that already have histograms.

AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns. We have a table called sys.col_usage$ that stores information about column usage. dbms_stats use this information to determine whether histogram is required for the columns.

SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.

Let me demonstrate how optimizer works with and without histogram as below two scenario. We take the emp table for this demonstration. The table has around 3.6 million records. The table emp_status column is highly skewed. It has two distinct values(Y,N). We have bitmap index on emp_status column.

Scenario 1 Let us generate the statistics without any histogram and see what kind of execution path optimizer is using. Without the histogram, oracle assume that, the data is evenly distributed and optimizer think that, we will have around 1.8 million record for emp_status Y and around another 1.8 million records for emp_status N.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select count(*),emp_status from scott.emp
2 group by emp_status;

COUNT(*) E
---------- -
1 N
3670016 Y

SQL> execute DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT', TABNAME => 'EMP',ESTIMATE_PERCENT =>
10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1',CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> select ename from scott.emp where emp_status='Y';

3670016 rows selected.

--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 1832K 15M 5374 (5) 00:01:05
* 1 TABLE ACCESS FULL EMP 1832K 15M 5374 (5) 00:01:05
--------------------------------------------------------------------------

SQL> select ename from scott.emp where emp_status='N';

Conclusion: Optimizer is using full table scan for the query which returns 3670016 records as well as it using full table scan for query which returns just only one record. This is obvisouly incorrect. This problem will be resolved by collecting histogram. Let us see in the next scenario.

Scenario 2 : Let us generate the statistics with histogram and see what kind of execution path optimizer is using. FOR COLUMN SIZE 2 EMP_STATUS will create two bucket for column emp_status. If we are not sure the distinct number of values in the column, then we can use AUTO option to collect histogram. With this histogram, oracle optimizer knows that, the column emp_status is highly skewed and it has two bucket and one bucket has around 3.6 million records with emp_status Y and another bucket has only one record with emp_status N. Now depends upon the query, optimizer decides whether to use index or Full table scan.

SQL> execute DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SCOTT', TABNAME => 'EMP',ESTIMATE_PERCENT =>
10, METHOD_OPT => 'FOR COLUMNS SIZE 2 EMP_STATUS',CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> select ename from scott.emp where emp_status='Y';

3670016 rows selected.

--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 3681K 31M 5375 (5) 00:01:05
* 1 TABLE ACCESS FULL EMP 3681K 31M 5375 (5) 00:01:05
--------------------------------------------------------------------------

SQL> select ename from scott.emp where emp_status='N';

--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 1 9 1 (0) 00:00:01
1 TABLE ACCESS BY INDEX ROWID EMP 1 9 1 (0) 00:00:01
2 BITMAP CONVERSION TO ROWIDS
* 3 BITMAP INDEX SINGLE VALUE IDX_EMP
--------------------------------------------------------------------------

Conclusion: Optimizer is using full table scan for the query which returns 3670016 records. At the same time, optimizer is using index scan when for other query which returns one record. This scenario, the optimizer choose the right execution plan based on the query WHERE clause.

Data dictionary objects for Histogram:
user_histograms
user_part_histograms
user_subpart_histograms
user_tab_histograms
user_tab_col_statistics