Wednesday, January 07, 2009

cross rows comparing

Goal
----
Try many approaches to do cross rows comparing.

Solution
--------
1. Hash Join
eagle’s home http://www.dbafan.com/blog/?p=176

2. Lead()/Lag() Analytic function

3. SQL Model

Outcome
-------
Hash Join is a good option, Lead() Analytic function is better, less db time, 8 times less Latches.
I could not make SQL Model work efficiently.
For large data set, SQL Model asks much more PGA and temporary tablespace to do sort.
Reduce the columns from Measure will help a little.

Who can help to tune SQL Model?

Setup
-----

drop table scott.t_range ;
create table scott.t_range nologging
as
select level id, level * 10 low_value, level * 10 + Trunc(dbms_random.value(1,11)) high_value
from dual
connect by level <= 300000
order by dbms_random.value;

exec dbms_stats.gather_table_stats('SCOTT','T_RANGE');

Notes: do not do it in production database.

SQL code
--------
1. Hash Join

drop table scott.t_range_tmp;
create table scott.t_range_tmp nologging as
select rownum line_no,low_value,high_value
from
(
select low_value,high_value
from scott.t_range
order by low_value asc
);

select a.low_value,a.high_value,b.low_value,b.high_value
from scott.t_range_tmp a,scott.t_range_tmp b
where a.line_no=b.line_no - 1 and a.high_value>=b.low_value;

2. Lead()/Lag() Analytic function

select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value;

3. SQL Model, slow, need tuning

with a as (SELECT rownum line_no,low_value,high_value FROM scott.t_range order by low_value asc)
select line_no,low_value,next_low_value
FROM a
MODEL
--RETURN UPDATED ROWS
MAIN simple_model
--PARTITION BY (country)
DIMENSION BY (line_no)
MEASURES (low_value,high_value, 0 AS next_low_value)
RULES
(next_low_value[ANY] = low_value[cv(line_no) + 1])
)
where next_low_value <= high_value;

Benchmark
---------
* RunStats

SQL> set arraysize 1000

exec runStats_pkg.rs_start;

drop table scott.t_range_tmp;
create table scott.t_range_tmp nologging as
select rownum line_no,low_value,high_value
from
(
select low_value,high_value
from scott.t_range
order by low_value asc
);

select count(*) from (
select a.low_value,a.high_value,b.low_value,b.high_value
from scott.t_range_tmp a,scott.t_range_tmp b
where a.line_no=b.line_no - 1 and a.high_value>=b.low_value
);

exec runStats_pkg.rs_middle;

select count(*) from
(
select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value
);

exec runStats_pkg.rs_stop(20);

Run1 ran in 270 hsecs
Run2 ran in 201 hsecs
run 1 ran in 134.33% of the run2 time

Name Run1 Run2 Diff
LATCH.JS queue state obj latch 90 60 -30
STAT...redo ordering marks 33 1 -32
STAT...CPU used when call star 209 177 -32
STAT...workarea memory allocat -32 0 32
STAT...CPU used by this sessio 209 177 -32
STAT...consistent changes 82 47 -35
LATCH.redo allocation 39 0 -39
LATCH.begin backup scn array 41 0 -41
LATCH.child cursor hash table 45 0 -45
LATCH.redo writing 46 0 -46
STAT...data blocks consistent 47 0 -47
STAT...rollbacks only - consis 47 0 -47
STAT...CR blocks created 47 0 -47
STAT...deferred (CURRENT) bloc 55 8 -47
STAT...DB time 225 177 -48
STAT...cluster key scans 51 0 -51
STAT...commit cleanouts succes 67 0 -67
STAT...Elapsed Time 272 203 -69
STAT...session cursor cache hi 79 4 -75
LATCH.SQL memory manager worka 81 6 -75
STAT...commit cleanouts 75 0 -75
STAT...index fetch by key 76 0 -76
LATCH.messages 95 12 -83
STAT...cluster key scan block 99 0 -99
STAT...calls to kcmgas 111 1 -110
STAT...opened cursors cumulati 121 4 -117
LATCH.checkpoint queue latch 118 0 -118
STAT...parse count (total) 123 4 -119
STAT...physical write IO reque 126 0 -126
STAT...execute count 131 5 -126
LATCH.dml lock allocation 130 0 -130
LATCH.cache buffers lru chain 140 0 -140
STAT...enqueue requests 157 1 -156
STAT...enqueue releases 158 0 -158
STAT...consistent gets - exami 245 12 -233
STAT...physical read IO reques 238 0 -238
LATCH.undo global data 274 14 -260
LATCH.library cache lock 263 0 -263
LATCH.multiblock read objects 282 0 -282
LATCH.enqueues 411 128 -283
STAT...buffer is not pinned co 305 0 -305
STAT...calls to get snapshot s 318 9 -309
STAT...bytes sent via SQL*Net 1,098 662 -436
LATCH.enqueue hash chains 609 130 -479
STAT...redo entries 568 19 -549
LATCH.shared pool 696 38 -658
STAT...db block changes 817 68 -749
LATCH.library cache pin 817 50 -767
STAT...db block gets from cach 944 43 -901
LATCH.simulator hash latch 978 1 -977
STAT...bytes received via SQL* 2,070 1,058 -1,012
LATCH.library cache 1,155 62 -1,093
STAT...physical reads direct t 1,275 0 -1,275
STAT...physical writes direct 1,275 0 -1,275
STAT...physical reads direct 1,275 0 -1,275
STAT...recursive calls 1,509 1 -1,508
LATCH.row cache objects 1,535 18 -1,517
STAT...physical reads cache pr 1,640 0 -1,640
STAT...db block gets direct 1,793 0 -1,793
STAT...physical reads cache 1,793 0 -1,793
STAT...free buffer requested 1,894 1 -1,893
LATCH.session allocation 2,030 0 -2,030
STAT...db block gets 2,737 43 -2,694
STAT...physical reads 3,068 0 -3,068
STAT...physical writes 3,068 0 -3,068
STAT...physical writes direct 3,068 0 -3,068
STAT...physical writes non che 3,068 0 -3,068
STAT...table scan blocks gotte 5,605 1,795 -3,810
STAT...no work - consistent re 5,663 1,795 -3,868
STAT...consistent gets 6,147 1,824 -4,323
STAT...consistent gets from ca 6,147 1,824 -4,323
LATCH.object queue header oper 5,777 2 -5,775
STAT...session logical reads 8,884 1,867 -7,017
LATCH.cache buffers chains 22,633 3,872 -18,761
STAT...sorts (rows) 322,072 302,880 -19,192
STAT...undo change vector size 28,368 3,372 -24,996
STAT...session pga memory -65,536 0 65,536
STAT...redo size 100,132 4,880 -95,252
STAT...table scan rows gotten 930,309 300,000 -630,309

Run1 latches total versus Run2 -- difference and pct
Run1 Run2 Diff Pct
38,553 4,446 -34,107 867.14%


exec runStats_pkg.rs_start;
select count(*) from
(
with a as (SELECT rownum line_no,low_value,high_value FROM scott.t_range order by low_value asc)
select line_no,low_value,next_low_value
FROM a
MODEL
--RETURN UPDATED ROWS
MAIN simple_model
--PARTITION BY (country)
DIMENSION BY (line_no)
MEASURES (low_value,high_value, 0 AS next_low_value)
RULES
(next_low_value[ANY] = low_value[cv(line_no) + 1])
)
where next_low_value <= high_value
;

exec runStats_pkg.rs_middle;
select count(*) from
(
select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value
);

exec runStats_pkg.rs_stop(20);

Reference
---------
Oracle® Database Data Warehousing Guide
11g Release 1 (11.1)
Part Number B28313-02

21 SQL for Analysis and Reporting
22 SQL for Modeling

3 comments:

casoul said...

great!

Samuel Ma said...

nice post!
where can i find Resource of the Package runStats_pkg?

Unknown said...

http://asktom.oracle.com/tkyte/runstats.html

You may download the source code here.