Friday, October 29, 2010

effective index selectivity 2

File name...: A:\SQL\CBO\index_selectivity2.sql
Usage.......: @file_name
Description.: Why we need to avoid implicit data conversions on index columns in predicates.
Notes.......: index on VarChar2 columns
Parameters..:
Package.....: ._pkg

Modification History:

Date Who What
29-Oct-2010: Charlie (Yi): Created the file.

Goal
----
Show how implicit data conversions impact Effective Index Selectivity.
Show that relying on implicit conversions is a bad development practice.

Solution
--------
Create an index on VARCHAR2 column(s), then put a numeric value in a predicate that uses this index.
This causes an implicit data type conversion, e.g. TO_NUMBER(index_column) = n, which prevents that column from being used as an access predicate.

Spec
----
cost = blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)

Logical reads (LIO) of index access = blevel + leaf_blocks * effective index selectivity

Buffers is the LIO count in the dbms_xplan.display_cursor output.

See Page 62(91) of Book: [Cost-Based Oracle Fundamentals]

Result
------
cost = 1 , LIO = 2 , No data conversion
cost = 26, LIO = 28, data conversion, from char to number
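
As a rough sanity check of the formula (a sketch, assuming the gathered stats report approximately blevel = 1 and leaf_blocks = 125 for T_U1): with the implicit TO_NUMBER conversion only the leading column c1 remains usable as an access predicate, so the effective index selectivity is about 1/5 (c1 has 5 distinct values), giving cost ~= 1 + ceil(125 * 0.2) = 26. With matching string types both columns are access predicates of a unique scan, and the cost is 1. The actual stats can be checked with:

select blevel, leaf_blocks, clustering_factor
  from user_indexes
 where index_name = 'T_U1';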

Data flow
---------

Test case
---------

Setup
-----
See below SQL code.

Reference
---
QA..: You can send feedback or questions about this script to charlie.zhu1 gmail.com
blog: http://mujiang.blogspot.com/


ALTER SESSION SET STATISTICS_LEVEL=TYPICAL;

create table t
(
c1 VarChar2(5),
c2 VarChar2(7)
)
nologging;

create table t1
(
c1 varchar2(5),
n1 number(5)
)
nologging;

insert into t1(c1,n1) values('501',501);

insert /*+ append */ into t(c1,c2)
select mod(rownum,5), rownum
from dual
connect by level <=50000;

commit;

create unique index t_u1 on t(c1,c2) nologging;

exec dbms_stats.gather_table_stats(user,'t');
exec dbms_stats.gather_table_stats(user,'t1');

set serveroutput off
set linesize 200
ALTER SESSION SET STATISTICS_LEVEL=ALL;

REM -- use a number value against the VarChar2 index columns,

select * from t
where c1='1' and c2=(select n1 from t1);

SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, '+cost iostats memstats last partition'));

----------------------------------------------------------
| Id  | Operation          | Name | Cost (%CPU)| Buffers |
----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    29 (100)|      35 |
|*  1 |  INDEX RANGE SCAN  | T_U1 |    26   (0)|      35 |
|   2 |   TABLE ACCESS FULL| T1   |     3   (0)|       7 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C1"='1')
       filter(TO_NUMBER("C2")=)

REM -- use a string value against the VarChar2 index columns,

select * from t
where c1='1' and c2=(select c1 from t1);

SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'cost iostats memstats last partition'));

----------------------------------------------------------
| Id  | Operation          | Name | Cost (%CPU)| Buffers |
----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     4 (100)|       9 |
|*  1 |  INDEX UNIQUE SCAN | T_U1 |     1   (0)|       9 |
|   2 |   TABLE ACCESS FULL| T1   |     3   (0)|       7 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C1"='1' AND "C2"=)

Thursday, September 30, 2010

Throttle big result set

Goal

When processing a big query result set from the database, the application server is often limited by its memory footprint, so we need to throttle the output chunk by chunk.

Solution
Q: How do you eat an elephant? A: One piece at a time.
The staging table will be placed in a tablespace with EXTENT MANAGEMENT LOCAL UNIFORM SIZE 4M; you may choose another uniform size based on your result-size limit.
We then read the data one extent at a time; after finishing an extent, we mark it as processed.
If the client application crashes or fails, we can restart at the failed extent (a consumer-loop sketch appears at the end of this post).

We can also let a chunk span two or more extents to make the approach more flexible.

Reference
On the use of DBMS_ROWID.rowid_create
[http://bit.ly/bphHJb]

Setup
drop table rowid_range_job purge;

create table rowid_range_job
tablespace data_auto nologging
as
select e.extent_id, e.block_id, e.block_id+blocks-1 block_id_end,
 cast(dbms_rowid.rowid_create( 1, o.data_object_id, e.file_id, e.block_id, 0 ) as urowid) min_rowid,
 cast(dbms_rowid.rowid_create( 1, o.data_object_id, e.file_id, e.block_id+e.blocks-1, 10000 ) as urowid) max_rowid,
 cast(o.object_name as varchar2(30)) table_name,
 cast(null as varchar2(30)) partition_name,
 cast('isbn_extract' as varchar2(30)) job_name,
 cast(null as date)         process_date,
 cast(null as number(1))    is_processed
from dba_extents e, user_objects o
 where o.object_name = 'T'
   and e.segment_name = 'T'
   and e.owner = user
   and e.segment_type = 'TABLE'
 order by e.extent_id
;

drop table t purge;
create table t
(
n1 number(10),
d1 date,
c1 varchar2(2000)
);

insert --+ append
 into t(n1,d1,c1)
select rownum, sysdate, rpad('a',1200)
from dual
connect by level <= 10000;
commit;

Generate and check rowid range split,

with data as
(
select e.extent_id, e.block_id, e.block_id+blocks-1,
       dbms_rowid.rowid_create( 1, o.data_object_id, e.file_id, e.block_id, 0 ) min_rowid,
       dbms_rowid.rowid_create( 1, o.data_object_id, e.file_id, e.block_id+e.blocks-1, 10000 ) max_rowid
  from dba_extents e,
       user_objects o
 where o.object_name = 'T'
   and e.segment_name = o.object_name
   and e.owner = user
   and e.segment_type = 'TABLE'
)
select extent_id, count(*) cnt
  from data, T t
 where t.rowid between data.min_rowid and data.max_rowid
 group by rollup (extent_id)
;

delete rowid_range_job where job_name = 'test_output_job';

INSERT INTO rowid_range_job
   SELECT   e.extent_id, e.block_id, e.block_id + blocks - 1 block_id_end,
            CAST
               (DBMS_ROWID.rowid_create (1,
                                         o.data_object_id,
                                         e.file_id,
                                         e.block_id,
                                         0
                                        ) AS UROWID
               ) min_rowid,
            CAST
               (DBMS_ROWID.rowid_create (1,
                                         o.data_object_id,
                                         e.file_id,
                                         e.block_id + e.blocks - 1,
                                         32000
                                        ) AS UROWID
               ) max_rowid,
            CAST (o.object_name AS VARCHAR2 (30)) table_name,
            CAST (NULL AS VARCHAR2 (30)) partition_name,
            CAST ('test_output_job' AS VARCHAR2 (30)) job_name,
            CAST (NULL AS DATE) process_date, 0 is_processed
       FROM dba_extents e, user_objects o
      WHERE o.object_name LIKE 'T'
        AND e.segment_name = o.object_name
        AND e.owner = USER
        AND e.segment_type = 'TABLE'
   ORDER BY o.object_name, e.extent_id;

commit;

Query rowid extent range split metadata

select EXTENT_ID,
BLOCK_ID,
BLOCK_ID_END,
MIN_ROWID,
MAX_ROWID
--,TABLE_NAME
--,PARTITION_NAME
--,JOB_NAME
--,PROCESS_DATE
--,IS_PROCESSED 
from invdb.rowid_range_job where job_name = 'test_output_job';

with data as
(
  select extent_id, block_id, block_id_end, min_rowid, max_rowid, table_name, job_name, process_date, is_processed
  from rowid_range_job
  where job_name = 'test_output_job'
    and table_name = 'T'
)
select extent_id, count(*) cnt
  from data, T t
 where t.rowid between data.min_rowid and data.max_rowid
 group by rollup (extent_id)
/
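
A minimal consumer-loop sketch (assuming the rowid_range_job and T tables created above; the processing step is only a placeholder):

declare
  cursor job_cur is
    select extent_id, min_rowid, max_rowid
      from rowid_range_job
     where job_name = 'test_output_job'
       and table_name = 'T'
       and (is_processed is null or is_processed = 0)
     order by extent_id;
begin
  for j in job_cur loop
    -- process one extent's worth of rows; replace with the real output logic
    for r in ( select n1, d1, c1
                 from t
                where rowid between j.min_rowid and j.max_rowid )
    loop
      null;  -- placeholder: write r to the current output chunk
    end loop;

    -- mark the extent as done, so a crashed client can restart from here
    update rowid_range_job
       set is_processed = 1, process_date = sysdate
     where job_name = 'test_output_job'
       and table_name = 'T'
       and extent_id = j.extent_id;
    commit;
  end loop;
end;
/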

Monday, August 16, 2010

IN_EXISTS Correlated Subquery

Goal

To show how IN and EXISTS correlated subqueries work, and how they are handled by the Oracle 11g CBO SQL engine. The difference between NOT IN and NOT EXISTS is also covered.

Setup

drop table t1 cascade constraints purge;
drop table t2 purge;

create table t1(x number);
create table t2(y number);

insert into t1(x) values(1);
insert into t1(x) values(2);
insert into t1(x) values(Null);

insert into t2(y) values(1);
insert into t2(y) values(Null);
commit;

create index t1_x on t1(x);
create index t2_y on t2(y);


IN

When t2 is small, there is an index on t1.x, and the subquery ( select y from T2 ) returns few rows, the optimizer will probably full scan t2:
Select * from T1 where x in
 ( select y from T2 );
=
select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

  • Small t2,
set autot trace exp

Select /*+ cardinality(t1,50000) */ * from T1 where x in
 ( select /*+ cardinality(t2, 200) */ y from T2 );
------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      |   224 |
|   1 |  NESTED LOOPS     |      |   224 |
|   2 |   SORT UNIQUE     |      |   200 |
|   3 |    INDEX FULL SCAN| T2_Y |   200 |
|*  4 |   INDEX RANGE SCAN| T1_X |     1 |
------------------------------------------
  • Small t1, big t2
Select /*+ cardinality(t1,500) */ * from T1 where x in
 ( select /*+ cardinality(t2, 20000) */ y from T2 );
------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 |
|   1 |  NESTED LOOPS SEMI|      |   500 |
|   2 |   INDEX FULL SCAN | T1_X |   500 |
|*  3 |   INDEX RANGE SCAN| T2_Y | 20000 |
------------------------------------------

Exists

When t1 is small and there is an index on t2(y), the optimizer will full scan t1:
select * from t1 
 where exists ( select null from t2 where y = x );
=
   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x ) )
      then 
         OUTPUT THE RECORD
      end if
   end loop
  • Small t2, big t1.
select /*+ cardinality(t1,50000) */ * from t1 
 where exists ( select /*+ cardinality(t2, 700) */ null from t2 where y = x );
------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      |   782 |
|   1 |  NESTED LOOPS     |      |   782 |
|   2 |   SORT UNIQUE     |      |   700 |
|   3 |    INDEX FULL SCAN| T2_Y |   700 |
|*  4 |   INDEX RANGE SCAN| T1_X |     1 |
------------------------------------------
  • Small t1, big t2.
select /*+ cardinality(t1,500) */ * from t1 
 where exists ( select /*+ cardinality(t2, 70000) */ null from t2 where y = x );
------------------------------------------
| Id  | Operation         | Name | Rows  |
------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 |
|   1 |  NESTED LOOPS SEMI|      |   500 |
|   2 |   INDEX FULL SCAN | T1_X |   500 |
|*  3 |   INDEX RANGE SCAN| T2_Y | 70000 |
------------------------------------------

If both the subquery and the outer table are huge, either form might work as well as the other; it depends on the indexes and other factors.


Not in and Not exists are different.

select * from t1 outer
where outer.x not in (select y from t2);
is NOT the same as
select * from t1 outer
where not exists (select null from t2 where t2.y = outer.x);
UNLESS the expression "y" cannot be NULL. That said:
select * from t1 outer
where outer.x not in (select y from t2 where y is not null);
is the same as
select * from t1 outer
where not exists (select null from t2 where t2.y = outer.x);

If t2.y is null-able,

select * from t1 outer
where outer.x not in (select y from t2 where y is null);
== equals ==

select * from t1 outer
where outer.x not in (2,3,Null);

Both return no rows.

select * from t1 outer
where outer.x not in (2,3);

Returns rows (here, x = 1).
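
With the three-row t1 and two-row t2 created in the Setup above, the difference is easy to see (a sketch; expected results shown as comments):

select * from t1 outer
 where outer.x not in (select y from t2);
-- no rows selected: the subquery returns a NULL, so NOT IN can never evaluate to true

select * from t1 outer
 where not exists (select null from t2 where t2.y = outer.x);
-- returns x = 2 and the row where x is NULL:
-- no t2.y equals 2, and t2.y = NULL never evaluates to true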

IN & EXISTS, AskTom, http://bit.ly/aLvOeS

Set Stats

declare
 m_distcnt  number;
 m_density  number;
 m_nullcnt  number;
 m_avgclen  number;

begin

 m_distcnt := 200100;
 m_density := 0.002;
 m_nullcnt := 0;
 m_avgclen := 5;

 dbms_stats.set_column_stats(
  ownname  => user,
  tabname  => 't1',
  colname  => 'x',
  distcnt  => m_distcnt,
  density  => m_density,
  nullcnt  => m_nullcnt,
  avgclen  => m_avgclen
 );

 dbms_stats.set_column_stats(
  ownname  => user,
  tabname  => 't2',
  colname  => 'y',
  distcnt  => m_distcnt,
  density  => m_density,
  nullcnt  => m_nullcnt,
  avgclen  => m_avgclen
 );

  dbms_stats.set_table_stats( user, 't1', numrows => 2000000, numblks => 1000000);
  dbms_stats.set_table_stats( user, 't2', numrows => 2000000, numblks => 1000000);

end;
/

select  table_name,
 num_rows,
 blocks
from
 user_tables
where
 table_name in ('T1','T2')
;

select
 num_distinct,
 low_value,
 high_value,
 density,
 num_nulls,
 num_buckets,
 histogram
from
 user_tab_columns
where
 table_name in ('T1','T2')
and column_name in ('X','Y')
;

Tuesday, May 18, 2010

Collection_Array

PL/SQL has three collection types; Tom often demos with Nested Tables and Associative Arrays.
The Associative Array is the most flexible for element indexing; a Nested Table is good enough for bulk fetching.

But Bryn Llewellyn, the Oracle PL/SQL Product Manager, likes to use a VArray in his demos:
"the collection is best declared as a varray with a maximum size equal to Batchsize."
See http://www.oracle.com/technology/tech/pl_sql/pdf/doing_sql_from_plsql.pdf
Probably he knows that a VARRAY is implemented with the most efficient internal storage and delivers the best performance when used correctly; it meets most of the requirements for batched bulk fetching in a loop.
To find out the details, you may run a simple RunStats benchmark.

  • Associative Array type(or index-by table)
TYPE population IS TABLE OF NUMBER INDEX BY VARCHAR2(64);

  • Nested Tables
TYPE nested_type IS TABLE OF VARCHAR2(30);

    • invoke EXTEND method to add elements later

  • Collection of an ADT/UDT (abstract data type / user-defined data type):
CREATE OR REPLACE TYPE INVDB.NUMBER_TAB_TYPE is table of number;
/

select ... from TABLE(ADT_Table_Instance);

Comment: bulk fetching into an ADT collection is not efficient; see the workaround in the paper doing_sql_from_plsql.pdf.

  • Variable-size array (varray)
-- Code_30 Many_Row_Select.sql
Batchsize constant pls_integer := 1000;

type Result_t is record(PK t.PK%type, v1 t.v1%type);
type Results_t is varray(1000) of Result_t;
Results Results_t;
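
A minimal, self-contained sketch of that pattern (assuming a table t with columns pk and v1, as in the paper's example):

declare
  Batchsize constant pls_integer := 1000;
  type Result_t  is record (pk t.pk%type, v1 t.v1%type);
  type Results_t is varray(1000) of Result_t;
  Results Results_t;
  cursor cur is select pk, v1 from t;
begin
  open cur;
  loop
    fetch cur bulk collect into Results limit Batchsize;
    -- process the current batch
    for i in 1 .. Results.count loop
      null;  -- placeholder: use Results(i).pk and Results(i).v1
    end loop;
    exit when Results.count < Batchsize;
  end loop;
  close cur;
end;
/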

Concept

  • Associative Array: sparse array.
  • Nested Table or ADT/UDT : dense array
REM Associative Array
declare
   type date_aat is table of date index by binary_integer;
   l_data date_aat;
begin
   l_data(-200) := sysdate;
   l_data(+200) := sysdate+1;
end;
/

Element access syntax: collection_instance(element_subscript), e.g. l_data(-200).
Record field access: record_name.field_name.

Monday, May 03, 2010

Benchmark with RunStats

There are many approaches and tools for benchmarking an Oracle application.
E.g.
* mystat.sql and mystat2.sql
* SQL session trace and tkprof
* SET AUTOT[RACE] {OFF | ON | TRACE[ONLY]} [EXP[LAIN]] [STAT[ISTICS]]
* ASH/AWR
* SQL hint /*+ gather_plan_statistics */ and dbms_xplan.display_cursor(NULL,NULL, 'iostats memstats last partition');
* ALTER SESSION SET STATISTICS_LEVEL=ALL;
* Real-Time SQL Monitoring

My favorite one is Tom's RunStats. Here is the one I enhanced from Tom's original version.

/*
Goal
----
Persist the benchmark stats so that a developer can query the report later.

Solution
--------
Add IP and hostname columns to the utility.run_stats_save table; get the client machine's IP with the SYS_CONTEXT function.


Reference
---------
file://a:/Tuning/Trace/RunStatsSave.sql

http://tkyte.blogspot.com/2009/10/httpasktomoraclecomtkyte.html

How to build a simple test harness (RUNSTATS) (HOWTO) to test two different approaches from a performance perspective.

Runstats.sql
This is the test harness I use to try out different ideas. It shows two vital sets of statistics for me:
The elapsed time difference between two approaches. It very simply shows me which approach is faster by the wall clock.
How many resources each approach takes. This can be more meaningful than even the wall clock timings. For example, if one approach is faster than the other but it takes thousands of latches (locks), I might avoid it simply because it will not scale as well.
The way this test harness works is by saving the system statistics and latch information into a temporary table. We then run a test and take another snapshot. We run the second test and take yet another snapshot. Now we can show the amount of resources used by approach 1 and approach 2.
Requirements

In order to run this test harness you must at a minimum have:

Access to V$STATNAME, V$MYSTAT, v$TIMER and V$LATCH
You must be granted select DIRECTLY on SYS.V_$STATNAME, SYS.V_$MYSTAT, SYS.V_$TIMER and SYS.V_$LATCH. It will not work to have select on these via a ROLE.
The ability to create a table -- run_stats -- to hold the before, during and after information.
The ability to create a package -- rs_pkg -- the statistics collection/reporting piece
You should note also that the LATCH information is collected on a SYSTEM WIDE basis. If you run this on a multi-user system, the latch information may be technically "incorrect" as you will count the latching information for other sessions - not just your session. This test harness works best in a simple, controlled test environment.

*/

CREATE USER utility
  IDENTIFIED BY ?
  DEFAULT TABLESPACE USERS quota unlimited on users
ACCOUNT UNLOCK;

grant unlimited tablespace to utility;

create role schema_admin;

grant create session, create table, create view, 
 create procedure, create trigger, create any directory,
 CREATE SEQUENCE, CREATE TYPE, CREATE SYNONYM,
 create materialized view, create dimension,
 SELECT_CATALOG_ROLE,
 create database link,create public database link, drop public database link,
 create job
to schema_admin;

grant schema_admin to utility;
-- grant create procedure, create table to utility;

grant select on SYS.V_$STATNAME to utility;
grant select on SYS.V_$MYSTAT to utility;
grant select on SYS.V_$TIMER to utility;
grant select on SYS.V_$LATCH to utility;

drop table utility.run_stats;
drop table utility.run_stats_save;

create global temporary table utility.run_stats
( runid varchar2(15),
  name varchar2(80),
  value int )
on commit preserve rows;

-- Store the stats for later reporting
create table utility.run_stats_save
( 
runid varchar2(15),
name varchar2(80),
value int,
IP varchar2(30),
hostname varchar2(30)
) tablespace users;

create or replace view utility.stats
as select 'STAT...' || a.name name, b.value
      from v$statname a, v$mystat b
     where a.statistic# = b.statistic#
    union all
    select 'LATCH.' || name,  gets
      from v$latch
 union all
 select 'STAT...Elapsed Time', hsecs from v$timer;
/*
SYS_CONTEXT

The SYS_CONTEXT function is able to return the following host and IP address information for the current session:

TERMINAL - An operating system identifier for the current session. This is often the client machine name. 
HOST - The host name of the client machine. 
IP_ADDRESS - The IP address of the client machine. 
SERVER_HOST - The host name of the server running the database instance. 

SELECT SYS_CONTEXT('USERENV','HOST') FROM dual;
----------
GATES2\SKY

SELECT terminal, machine FROM v$session 
where sid = (select sid from v$mystat where rownum <= 1);
*/

CREATE OR REPLACE package utility.runstats_pkg
as
  TYPE print_tab     IS TABLE OF varchar2(200);
  --l_print dbms_sql.VARCHAR2_TABLE; 
    procedure rs_start;
    procedure rs_middle;
    procedure rs_stop( p_difference_threshold in number default 0 );
    procedure rs_report( p_difference_threshold in number default 0, p_host in varchar2 default Null );
  function rs_report( p_difference_threshold in number default 0, p_host in varchar2 default Null )
  return print_tab PIPELINED DETERMINISTIC;
end;
/

CREATE OR REPLACE package body utility.runstats_pkg
as

g_start number;
g_run1  number;
g_run2  number;
g_host varchar2(30);
g_ip varchar2(30);

procedure rs_start
is
begin

  g_host := substr(SYS_CONTEXT('USERENV','TERMINAL'),1,30);
  g_ip := substr(SYS_CONTEXT('USERENV','IP_ADDRESS'),1,30);
  delete from run_stats_save where hostname = g_host;

    execute immediate 'truncate table run_stats';
    delete from run_stats;

    insert into run_stats
    select 'before', stats.* from stats;

    g_start := dbms_utility.get_time;
end;

procedure rs_middle
is
begin
    g_run1 := (dbms_utility.get_time-g_start);

    insert into run_stats
    select 'after 1', stats.* from stats;
    g_start := dbms_utility.get_time;

end;

procedure rs_stop(p_difference_threshold in number default 0)
is
begin
    g_run2 := (dbms_utility.get_time-g_start);

    dbms_output.put_line
    ( 'Run1 ran in ' || g_run1 || ' hsecs' );
    dbms_output.put_line
    ( 'Run2 ran in ' || g_run2 || ' hsecs' );
    dbms_output.put_line
    ( 'run 1 ran in ' || round(g_run1/g_run2*100,2) ||
      '% of the time' );
    dbms_output.put_line( chr(9) );

    insert into run_stats
    select 'after 2', stats.* from stats;

    insert into run_stats_save (RUNID,NAME,VALUE,IP,HOSTNAME)
    select RUNID,NAME,VALUE, g_ip, g_host from run_stats;

    commit;

    dbms_output.put_line
    ( rpad( 'Name', 40 ) || lpad( 'Run1', 14 ) ||
      lpad( 'Run2', 14 ) || lpad( 'Diff', 14 ) );

    for x in
    ( select rpad( a.name, 40 ) ||
             to_char( b.value-a.value, '99999,999,999' ) ||
             to_char( c.value-b.value, '99999,999,999' ) ||
             to_char( ( (c.value-b.value)-(b.value-a.value)), '99999,999,999' ) data
        from run_stats a, run_stats b, run_stats c
       where a.name = b.name
         and b.name = c.name
         and a.runid = 'before'
         and b.runid = 'after 1'
         and c.runid = 'after 2'
         -- and (c.value-a.value) > 0
         and abs( (c.value-b.value) - (b.value-a.value) )
               > p_difference_threshold
       order by abs( (c.value-b.value)-(b.value-a.value)), data
    ) loop
        dbms_output.put_line( x.data );
    end loop;

    dbms_output.put_line( chr(9) );
    dbms_output.put_line
    ( 'Run1 latches total versus run2 -- difference and pct' );
    dbms_output.put_line
    ('.'|| lpad( 'Run1', 14 ) || lpad( 'Run2', 14 ) ||
      lpad( 'Diff', 14 ) || lpad( 'Pct', 11 ) );

    for x in
    ( select '.'||
             to_char( run1, '99999,999,999' ) ||
             to_char( run2, '99999,999,999' ) ||
             to_char( diff, '99999,999,999' ) ||
             to_char( round( run1/run2*100,2 ), '99,999.99' ) || '%' data
        from ( select sum(b.value-a.value) run1, sum(c.value-b.value) run2,
                      sum( (c.value-b.value)-(b.value-a.value)) diff
                 from run_stats a, run_stats b, run_stats c
                where a.name = b.name
                  and b.name = c.name
                  and a.runid = 'before'
                  and b.runid = 'after 1'
                  and c.runid = 'after 2'
                  and a.name like 'LATCH%'
                )
    ) loop
        dbms_output.put_line( x.data );
    end loop;
end;

procedure rs_report( p_difference_threshold in number default 0, p_host in varchar2 default Null )
is
begin
  g_host := substr(SYS_CONTEXT('USERENV','TERMINAL'),1,30);
  g_ip := substr(SYS_CONTEXT('USERENV','IP_ADDRESS'),1,30);

    g_run2 := (dbms_utility.get_time-g_start);

    dbms_output.put_line
    ( 'Run1 ran in ' || g_run1 || ' hsecs' );
    dbms_output.put_line
    ( 'Run2 ran in ' || g_run2 || ' hsecs' );
    dbms_output.put_line
    ( 'run 1 ran in ' || round(g_run1/g_run2*100,2) ||
      '% of the time' );
    dbms_output.put_line( chr(9) );

    dbms_output.put_line
    ( rpad( 'Name', 40 ) || lpad( 'Run1', 14 ) ||
      lpad( 'Run2', 14 ) || lpad( 'Diff', 14 ) );

    for x in
    ( select rpad( a.name, 40 ) ||
             to_char( b.value-a.value, '99999,999,999' ) ||
             to_char( c.value-b.value, '99999,999,999' ) ||
             to_char( ( (c.value-b.value)-(b.value-a.value)), '99999,999,999' ) data
        from run_stats_save a, run_stats_save b, run_stats_save c
       where a.name = b.name
         and b.name = c.name
         and a.runid = 'before'
         and b.runid = 'after 1'
         and c.runid = 'after 2'
         -- and (c.value-a.value) > 0
         and abs( (c.value-b.value) - (b.value-a.value) )
               > p_difference_threshold
         and a.hostname = g_host
         and a.hostname = b.hostname
         and a.hostname = c.hostname
       order by abs( (c.value-b.value)-(b.value-a.value))
    ) loop
        dbms_output.put_line( x.data );
    end loop;

    dbms_output.put_line( chr(9) );
    dbms_output.put_line
    ( 'Run1 latches total versus run2 -- difference and pct' );
    dbms_output.put_line
    ( lpad( 'Run1', 14 ) || lpad( 'Run2', 14 ) ||
      lpad( 'Diff', 14 ) || lpad( 'Pct', 11 ) );

    for x in
    ( select to_char( run1, '99999,999,999' ) ||
             to_char( run2, '99999,999,999' ) ||
             to_char( diff, '99999,999,999' ) ||
             to_char( round( run1/run2*100,2 ), '99,999.99' ) || '%' data
        from ( select sum(b.value-a.value) run1, sum(c.value-b.value) run2,
                      sum( (c.value-b.value)-(b.value-a.value)) diff
                 from run_stats_save a, run_stats_save b, run_stats_save c
                where a.name = b.name
                  and b.name = c.name
                  and a.runid = 'before'
                  and b.runid = 'after 1'
                  and c.runid = 'after 2'
                  and a.name like 'LATCH%'
         and a.hostname = g_host
         and a.hostname = b.hostname
         and a.hostname = c.hostname
               )
    ) loop
        dbms_output.put_line( x.data );
    end loop;
end;

-- select * from TABLE(runstats_pkg.rs_report);

function rs_report( p_difference_threshold in number default 0, p_host in varchar2 default Null )
return print_tab PIPELINED DETERMINISTIC
as
begin
  g_host := substr(SYS_CONTEXT('USERENV','TERMINAL'),1,30);
  g_ip := substr(SYS_CONTEXT('USERENV','IP_ADDRESS'),1,30);

    g_run2 := (dbms_utility.get_time-g_start);

    PIPE ROW
    ( 'Run1 ran in ' || g_run1 || ' hsecs' );
    PIPE ROW
    ( 'Run2 ran in ' || g_run2 || ' hsecs' );
    PIPE ROW
    ( 'run 1 ran in ' || round(g_run1/g_run2*100,2) ||
      '% of the time' );
    PIPE ROW( chr(9) );

    PIPE ROW
    ( rpad( 'Name', 40 ) || lpad( 'Run1', 14 ) ||
      lpad( 'Run2', 14 ) || lpad( 'Diff', 14 ) );

    for x in
    ( select rpad( a.name, 40 ) ||
             to_char( b.value-a.value, '99999,999,999' ) ||
             to_char( c.value-b.value, '99999,999,999' ) ||
             to_char( ( (c.value-b.value)-(b.value-a.value)), '99999,999,999' ) data
        from run_stats_save a, run_stats_save b, run_stats_save c
       where a.name = b.name
         and b.name = c.name
         and a.runid = 'before'
         and b.runid = 'after 1'
         and c.runid = 'after 2'
         -- and (c.value-a.value) > 0
         and abs( (c.value-b.value) - (b.value-a.value) )
               > p_difference_threshold
         and a.hostname = g_host
         and a.hostname = b.hostname
         and a.hostname = c.hostname
       order by abs( (c.value-b.value)-(b.value-a.value)), abs(c.value-b.value)
    ) loop
        PIPE ROW( x.data );
    end loop;

    PIPE ROW( chr(9) );
    PIPE ROW
    ( 'Run1 latches total versus run2 -- difference and pct' );
    PIPE ROW
    ( lpad( 'Run1', 14 ) || lpad( 'Run2', 14 ) ||
      lpad( 'Diff', 14 ) || lpad( 'Pct', 11 ) );

    for x in
    ( select to_char( run1, '99999,999,999' ) ||
             to_char( run2, '99999,999,999' ) ||
             to_char( diff, '99999,999,999' ) ||
             to_char( round( run1/run2*100,2 ), '99,999.99' ) || '%' data
        from ( select sum(b.value-a.value) run1, sum(c.value-b.value) run2,
                      sum( (c.value-b.value)-(b.value-a.value)) diff
                 from run_stats_save a, run_stats_save b, run_stats_save c
                where a.name = b.name
                  and b.name = c.name
                  and a.runid = 'before'
                  and b.runid = 'after 1'
                  and c.runid = 'after 2'
                  and a.name like 'LATCH%'
         and a.hostname = g_host
         and a.hostname = b.hostname
         and a.hostname = c.hostname
               )
    ) loop
        PIPE ROW( x.data );
    end loop;
end;

end;
/

grant execute on utility.runstats_pkg to public;
create or replace public synonym runstats_pkg for utility.runstats_pkg; 

/*
--Usage: to benchmark two approaches
-- you may just leave the approach-2 code part empty, to get the resources that approach 1 alone takes.

set serveroutput on

execute runStats_pkg.rs_start;
 
execute runStats_pkg.rs_middle;
 
execute runStats_pkg.rs_stop;

begin
runStats_pkg.rs_start;
 for c in ()
 loop
   Null;
 end loop;
runStats_pkg.rs_middle;
 for c in ()
 loop
   Null;
 end loop;
runStats_pkg.rs_stop;

end;

--To get the report after benchmark:
select * from TABLE(runstats_pkg.rs_report);
OR
exec runStats_pkg.rs_report(10);

*/

Monday, April 26, 2010

Find and delete duplicate rows by Analytic Function

The intuitive way would be to create a temp table with MIN(ROWID) and COUNT(*) > 1, then join it back to the target table to do the delete; a sketch of that idea follows below.
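
A minimal sketch of that intuitive approach, collapsed into a single statement (no temp table; keep MIN(ROWID) per duplicate group and delete the rest):

DELETE FROM scott.emp e
 WHERE e.ROWID NOT IN
       ( SELECT MIN(ROWID)
           FROM scott.emp
          GROUP BY deptno, job );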

You can get duplicate rows by Analytic SQL:
SELECT rid, deptno, job, rn
  FROM
  (SELECT /*x parallel(a) */
        ROWID rid, deptno, job,
        ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
   FROM scott.emp a
  )
WHERE rn <> 1;

Get the duplicate groups and their row counts with COUNT(*) > 1:

SELECT /*x parallel(a,8) */
 MAX(ROWID) rid, deptno, job, COUNT(*)
FROM scott.emp a
GROUP BY deptno, job
HAVING COUNT(*) > 1;

To delete them:
DELETE FROM scott.emp
WHERE ROWID IN
 (
  SELECT rid
    FROM (SELECT /*x parallel(a) */
                 ROWID rid, deptno, job,
                 ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
            FROM scott.emp a)
  WHERE rn <> 1
);

Monday, April 19, 2010

Reclaim deleted LOB data storage

I was helping a client purge obsolete data and reclaim some space in an OLAP database.
We got stuck on a BLOB column segment. It took me a couple of hours to find the solution.

Here is the solution demo.

Create a table with BLOB column,
drop table t2 purge;

CREATE TABLE t2
(
 n1 NUMBER(10),
 d1 date,
 b1 BLOB,
 CONSTRAINT t2_PK PRIMARY KEY (n1)
  USING INDEX TABLESPACE index_auto
) 
TABLESPACE data_auto;
Display the large objects (LOBs) contained in the tables:
select b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
from user_lobs b;

SELECT SEGMENT_NAME, segment_type, TABLESPACE_NAME, BYTES, BLOCKS, EXTENTS
FROM user_SEGMENTS
WHERE segment_type like '%LOB%'
ORDER BY SEGMENT_NAME;

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

Test Shrink a BASICFILE LOB segment only

truncate table t2;

declare
    l_blob blob;
    l_size number := 32700;
begin
  for i in 1 .. 15
  loop
    insert into t2(n1,b1) values (i, empty_blob() ) returning b1 into l_blob;
    dbms_lob.writeappend( l_blob, l_size, utl_raw.cast_to_raw(rpad('*',l_size,'*')));
  end loop;
  commit;
end;
/

delete t2;
commit;

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

     BYTES     BLOCKS    EXTENTS
---------- ---------- ----------
    720896         88         11

ALTER TABLE t2 MODIFY LOB (b1) (SHRINK SPACE);

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

     BYTES     BLOCKS    EXTENTS
---------- ---------- ----------
     65536          8          1

Note

The shrink command will generate roughly the same amount of redo/archive log as the size of the LOB storage space.

Reference.


Shrink a table and all of its dependent segments (including BASICFILE LOB segments):
ALTER TABLE t2 ENABLE ROW MOVEMENT;
ALTER TABLE t2 SHRINK SPACE CASCADE;

Shrink a BASICFILE LOB segment only:
ALTER TABLE t2 MODIFY LOB (b1) (SHRINK SPACE);

Monday, April 12, 2010

Pl/SQL Development Workflow

Here are my notes taken from the book <<Oracle PL/SQL Best Practices>>.

Four steps for preparing an application, especially for a PL/SQL transactional database API.

Validate program requirements
  1. ask lots of questions
  2. what users ask for is not always the easiest way to solve a problem
  3. consider other approaches, include business processes and programming algorithms

Implement header of the program

  1. choose a good name for the program that accurately represents its purpose
  2. inputs and outputs
  3. overload sub-procedure ?

Define the test cases

  1. Verify it works
  2. how will I know when I am done with this program

Build test code

Testing for correctness:
  • Have you tested with good data and all the different possibilities of bad data?
  • Does the code do the right thing, ... and nothing more.

Tuesday, April 06, 2010

Why Transactional Database API approach?

We like the (Transactional) Database API approach because it:

DRY, do not repeat yourself

*. Makes software components modular; I am totally into modular programming.
*. Software modules must carry out a very specific task (and be very efficient at carrying it out).
*. The same APIs are available to all applications in any language that accesses the database. No duplication of effort.

Orthogonality

*. Each software module should be loosely coupled (to limit dependencies).
  -- SQL scattered willy-nilly around Java/C#/JavaScript code is tightly coupled.
  -- A schema change, such as adding a table or column, should only require a change in the database, in one place.
  -- If B calls A and A's internals change, there is no need to touch B.
*. Makes testing simpler; it is easy to set up functional and load tests.
*. Easy to deploy.
E.g. a PL/SQL-only installation, with no need to touch the Java/C#/JavaScript mid tier and UI.
*. Defines a clear interface contract.

*. It removes the need for triggers as all inserts, updates and deletes are wrapped in APIs. Instead of writing triggers you simply add the code into the API. I loathe triggers.

*. Clearly separate all database access code (APIs Are King)

*. To understand the consequences of database refactorings, it is important to be able to see how the database is used by the application. If SQL is scattered willy-nilly around the code base, this is very hard to do. As a result, it is important to have a clear database access layer to show where the database is being used and how. To do this we suggest Database API approach.
*. The underlying structure of the database is hidden from the users, so I can make structural changes without client applications being changed. The API implementation can be altered and tuned without affecting the client application.

Control and Responsibility, DevOPS

*. It prevents people who do not understand SQL from writing stupid queries.
  -- All SQL would be written by database developers or DBAs, reducing the likelihood of dodgy queries.
*. SDLC: 80% is maintenance, Dealing with Change.
  -- Changing the database schema.
  -- Migrating the data in the database.
  -- Online data fix,
  -- Changing the database access code / data process logic.
*. Troubleshooting and firefighting
  -- Database developer and DBA can easily get and fix the SQL. Do not bother Java programmers.
*. Tuning SQL. Do not bother C# programmers.

*. Having a clear database layer (APIs Are King) has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier to developers who often are not particularly skilled with SQL. For the database expert it provides a clear section of the code that he can look at to see how the database is being used. This helps in preparing indexes, database optimization, and also looking at the SQL to see how it could be reformulated to perform better. This allows the database expert to get a better understanding of how the database is used.

Profession = High Efficiency + High Quality

*. (ORM) Anything that generates SQL on-the-fly worries me, not just Java. I want to be able to cut and paste the SQL, not try and capture or trace it during a run.


*. More database features and functions
  -- Partition
*. Less code, fewer bugs, easier to maintain.

*. Eliminates SQL parsing in the host language. Parsing consumes client CPU, server CPU and latches; PL/SQL keeps the SQL cursors cached and open.
*. Eliminates data round trips, data type conversions and context switches.
*. Tightly couples the data model and the data-processing design. The database world favors Big Design Up Front.

*. Maximizes data sharing and reuse.

Suggestion

*. Business logic, validation and lots of IF statements can be placed and refined in Java with advanced language features such as OO.

This list goes on and on.

Our concept is "build the data API in the database, and you call the data API".
The data API encapsulates a transaction in a bit of code. Here we agree: no SQL in the client application, just calls to stored procedures, well tuned and developed explicitly to do that one thing.

The database API has had different UI technologies layered on top of it over time.

It is all about APIs. The applications don't do table-level (select/insert/update/delete) stuff; the apps don't even really know about tables.

On top of database schema we have an API that does the data stuff.
(generally, functions or Ref cursor to retrieve data, procedures to change data)

In the application we call this API but have our own application logic as well
(only application logic is in the application, data logic is - well, right where it belongs - next to the data, waiting for the 'next great programming paradigm to come along')

The fact that our UI is in Java isn't really relevant. You could pretty much see how you would use this package from C#, Java/JSP, Swing, VB, Pro*C, Forms, Powerbuilder, Perl, PHP, Python, a mobile phone, <whatever the heck you want>.
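
As a minimal sketch of what such a transactional API might look like (a hypothetical order-submission package; all names here are illustrative only, not the actual system):

create or replace package order_api as
  -- procedures change data: one call per business transaction
  procedure submit_order( p_customer_id in number,
                          p_order_id    out number );

  -- functions returning ref cursors retrieve data for whatever UI calls us
  function get_order( p_order_id in number ) return sys_refcursor;
end order_api;
/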

Reference


http://stackoverflow.com/questions/1588149/orm-for-oracle-pl-sql?lq=1

ORM is flawed
Performance Anti-Patterns in Database-Driven Applications ,
http://www.infoq.com/articles/Anti-Patterns-Alois-Reitbauer
  • Misuse of O/R Mappers
  • Load More Data Than Needed
  • Inadequate Usage of Resources
  • One Bunch of Everything

Monday, March 29, 2010

Setup StatsPack to monitor standby database performance

AWR/ADDM are not supported (and don't work) against a read-only standby database, so they are not useful for diagnostics in an Active Data Guard environment.

A modified version of StatsPack available on MetaLink (note 454848.1) can be deployed on the primary production database. This modified version uses a STDBYPERF schema which writes locally but reads remotely from PERFSTAT@standbydb.

Initial Setup (Oracle 11.1.0.7)

export ORACLE_SID=standbydb
cd $ORACLE_HOME/rdbms/admin
sql
CREATE SMALLFILE TABLESPACE PERFSTAT DATAFILE '+DATA1/perfstat01.dbf' SIZE 4G
 LOGGING 
 EXTENT MANAGEMENT LOCAL 
 SEGMENT SPACE MANAGEMENT AUTO;
@spdrop
@spcreate
exit;

sql
@sbcreate

--wait a second or two, the account has to propagate to the standby

@sbaddins


Configure Snapshots

Remember that PERFSTAT is not automatically purged/managed by the database, so this must be done by the DBA or scheduled.

-- workaround to avoid serious performance problem due to Bug 8323663
. setdb ordprod2
sql
execute dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KCCTS', estimate_percent => 20 );
execute dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KCFIO', estimate_percent => 20 );
execute dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KCBFWAIT', estimate_percent => 20 );
execute dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KCCFN', estimate_percent => 20 );
execute dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KCCFE', estimate_percent => 20 );


grant create job to stdbyperf;
grant manage scheduler to stdbyperf;

conn stdbyperf/***

-- if this takes more than ~40 seconds there is a performance issue that will need to be traced
exec statspack_ordprod1.snap;

-- automate stats collection every 30 min during Mon-Fri 7am-5pm

exec dbms_scheduler.disable('ordrpt_statspack_snap_daily');

begin
   dbms_scheduler.create_job
   (
      job_name => 'ordrpt_statspack_snap_daily',
      job_type => 'PLSQL_BLOCK',
      job_action => 'begin statspack_ordprod1.snap; end;',
      repeat_interval => 'FREQ=MINUTELY; INTERVAL=30; BYHOUR=7,8,9,10,11,12,13,14,15,16,17; BYDAY=MON,TUE,WED,THU,FRI',
      enabled => true,
      comments => 'Take statspack snapshot on remote standby db ordrpt daily Mon to Fri 7am to 5pm'
   );
end;
/

exec dbms_scheduler.enable('ordrpt_statspack_snap_daily');


Create Reports

Similar to AWR reports, run reports from caldbrac02

SQL> conn stdbyperf/***

SQL> @?/rdbms/admin/sbreport

-- follows prompts to enter dbid, instanceid, snapshot intervals, output is text file

Author: Blair Boadway  |  Production DBA

Tuesday, March 16, 2010

Query paging row count

Do we need to return the accurate query row count?


We always evaluate the query performance and the cost of getting the row count, and then, based on feature priority and data-accuracy requirements, decide which option to take.


Also assess which of the options below the end users value more:
    • The quick response time to return the first page
or
    • to get the exact row count.


Here are some options I know so far:


  • Materialize the row count in MV(Summary table), schedule a regular job to refresh the summary data.
  • Estimate the row count, by CBO parse
Here is the idea: check the Rows column of the plan output.

The CBO (Cost-Based Optimizer) estimates it when generating the SQL execution plan.
The CBO gets it from the object (table/index) stats, e.g. dba_tables.num_rows,
without touching the table.

scott@txdev_FLURRY> set autot trace exp
scott@txdev_FLURRY>
scott@txdev_FLURRY> select * from emp;
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    14 |   518 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| EMP  |    14 |   518 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

  • Estimate the row count with Oracle's SAMPLE clause (see the sketch after this list)
  • Hard limit in the transactional database API:
Show the user that there could be more rows than the hard-coded limit, always enable the [Next Page] button, and dynamically increase the count as the user pages forward.

SELECT count(*) from scott.emp where rownum <= 100;
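
A minimal sketch of the SAMPLE-clause estimate (big_table is a placeholder name for a large table; read roughly 1% of the rows and scale the count up -- the estimate varies from run to run and is only worthwhile on big tables):

SELECT COUNT(*) * 100 AS estimated_row_count
  FROM big_table SAMPLE (1);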

Wednesday, March 10, 2010

SOA - Transactional Database API - Database access guideline

"Applications come, applications go, they crash, they change, they are fleeting."
The data, however -- that will be here for a long time.

Programming languages come, programming languages go; DBMSs come, DBMSs go.

In fact, I've seen successful Cobol, Fortran, VB, Perl, PHP, Java, Python, whatever language; FoxPro, Sybase, DB2, Oracle, MySQL, Cassandra, HBase, whatever database.

The data, well: our POs (orders) and client data have been in a database for over 12 years now -- the applications that access it (I should say "have accessed it in the past, for they no longer exist") come and go, and there will be many, many more in the future.


The solution is there: SOA (Modular programming)

SOA is a design principle, whereas web services are an implementation technology. You can build a service-oriented application with Python, .Net, C, Java, PL/SQL, or T-SQL.

The basic building block... is the service. A service is a self-contained software module that performs a predetermined task: e.g. a business transaction that submits an order, audits the changes, updates inventory, updates finance data, and queues the changes for an asynchronous batch that notifies the buyers; that comes down to the Transactional Database API.

The main theme behind SOA is to find the appropriate modularity ... where the modules, such as the UI, the business rules and the DAL, are not too tightly coupled.

A typical bad example is the Buyer Order Tracker UI: the reporting UI rules are tightly coupled with the data access, which makes it very hard to refactor the data access part. Buyer Order Tracker has made database performance plunge many times.

If we implement SOA, with the database handling the data access and Java handling the UI, it will be a piece of cake to fix the Buyer Order Tracker database performance issue.

When we want to migrate from Oracle to Sybase, we just write a new DB API on Sybase and have the UI application call the new API.
When we want to migrate from .Net to Java, we just write a new UI module that calls the same DB API to access the data.

Note: many statements are referenced from AskTom.