Wednesday, November 18, 2009

Effective table selectivity index selectivity and index column sequence

/*********
-- file://A:/SQL/CBO/index_selectivity.sql

| Modification History:
| Date Who What
| 18-Nov-2009: Charlie(Yi): Create the file,
|

Effective table selectivity selectivity and index column sequence

Goal
----
Research the order of the index columns and the effective index selectivity.

Solution
--------
Create a table with number column and date column,
Create 2 indexes on the table, one with ID in index first position, one with date_col on first position.
Benchmark the query with equal scan and range scan predicate on each index.

Outcome
-------
The logical reads on index access is: BLevel + leaf_blocks * effective index selectivity.

* LIO is 5 when access index with column ID on first position
select 2 + Ceil(1520 * 0.2 * (40 * 100)/500000) from dual;

* LIO is 15 when access index with date_col on first position,
select 2 + Ceil(1520 * (40 * 100)/500000) from dual;


"Because as soon as we have a range scan on a column used somewhere in the
middle of an index definition or fail to supply a test on such a column, the predicates on later
columns don’t restrict the selection of index leaf blocks that we have to examine."

From page74 of book Cost Based Oracle Fundamentals

For 2nd query, the effective index selectivity has to be calculated from the predicate on just the
date_col column. Because the test on date_col is range-based, the predicates on ID do not
restrict the number of index leaf blocks we have to walk.

Note
----
In plan output, Buffers = Logical Reads.

*******/

Setup
-----

create table date_index_sel
(
id number,
date_col date,
n2 number
)
nologging;

insert --+ append
into date_index_sel(id,date_col)
select mod(rownum,5), sysdate + rownum/100
from dual
connect by level <= 500000;

commit;

create index date_index_sel_i1 on date_index_sel(id,date_col) nologging;
create index date_index_sel_i2 on date_index_sel(date_col,id) nologging;

exec dbms_stats.gather_table_stats(user,'date_index_sel');

select INDEX_NAME, BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, CLUSTERING_FACTOR, NUM_ROWS
from user_indexes
where index_name like 'DATE_INDEX_SEL%';

INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS

DATE_INDEX_SEL_I1 2 1520 500000 5475 500000
DATE_INDEX_SEL_I2 2 1520 500000 1095 500000


set serveroutput off
set linesize 370
set pagesize 0

REM TEST1, date_col is NOT the first index column, range scan on date_col,

select --+ index(a,date_index_sel_i1) gather_plan_statistics
count(*) from date_index_sel a
where id=3 
and date_col between sysdate + 100 and sysdate + 140;

--SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'basic iostats memstats last partition'));
SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'basic cost allstats last partition'));

Plan hash value: 341633530

-------------------------------------------------------------------------------
| Id  | Operation          | Name              |Cost (%CPU)| A-Rows | Buffers |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    3 (100)|      1 |       5 |
|   1 |  SORT AGGREGATE    |                   |           |      1 |       5 |
|*  2 |   FILTER           |                   |           |    800 |       5 |
|*  3 |    INDEX RANGE SCAN| DATE_INDEX_SEL_I1 |    3   (0)|    800 |       5 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE+100<=SYSDATE+140)
   3 - access("ID"=3 AND "DATE_COL">=SYSDATE+100 AND "DATE_COL"<=SYSDATE+140)

REM TEST2, date_col is the first index column,

select --+ index(a,date_index_sel_i2) gather_plan_statistics
count(*) from date_index_sel a
where id=3 
and date_col between sysdate + 100 and sysdate + 140
;

SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'basic cost iostats memstats last partition'));

Plan hash value: 1386725119

-------------------------------------------------------------------------------
| Id  | Operation          | Name              | Cost (%CPU)|A-Rows | Buffers |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    15 (100)|     1 |      15 |
|   1 |  SORT AGGREGATE    |                   |            |     1 |      15 |
|*  2 |   FILTER           |                   |            |   800 |      15 |
|*  3 |    INDEX RANGE SCAN| DATE_INDEX_SEL_I2 |    15   (0)|   800 |      15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE+100<=SYSDATE+140)
   3 - access("DATE_COL">=SYSDATE+100 AND "ID"=3 AND "DATE_COL"<=SYSDATE+140)
       filter("ID"=3)

***.

No comments: