Thursday, January 22, 2009

Output from 10053 trace and dbms_xplan.display_cursor

Oracle 10.1.0.4 on RedHat Linux 3.0

*************************************
PARAMETERS WITH ALTERED VALUES
******************************
parallel_threads_per_cpu = 4
db_file_multiblock_read_count = 32
optimizer_mode = first_rows_10
cursor_sharing = similar
optimizer_index_caching = 25
query_rewrite_enabled = false
query_rewrite_integrity = stale_tolerated
optimizer_dynamic_sampling = 0
*************************************
PARAMETERS WITH DEFAULT VALUES
******************************
optimizer_features_enable = 10.1.0.4

***************************************
BASE STATISTICAL INFORMATION
***********************
Table stats Table: PO_ITEM Alias: PO_ITEM
TOTAL :: CDN: 81294 NBLKS: 3257 AVG_ROW_LEN: 132
Index stats
Index: PO_ITEM_POID COL#: 2
TOTAL :: LVLS: 2 #LB: 365 #DK: 80100 LB/K: 1 DB/K: 1 CLUF: 3191
Index: PO_ITEM_STATUS_RANDOM_UPDDT COL#: 4 6
TOTAL :: LVLS: 2 #LB: 507 #DK: 80100 LB/K: 1 DB/K: 1 CLUF: 45647
Index: PO_ITEM_STATUS_UPDDT COL#: 3 6
TOTAL :: LVLS: 2 #LB: 507 #DK: 80100 LB/K: 1 DB/K: 1 CLUF: 80100
***************************************
SINGLE TABLE ACCESS PATH
COLUMN: POID(NUMBER) Col#: 2 Table: PO_ITEM Alias: PO_ITEM
Size: 5 NDV: 80100 Nulls: 0 Density: 1.2484e-05 Min: 40 Max: 88988
COLUMN: STATUS_RAN(CHARACTER) Col#: 4 Table: PO_ITEM Alias: PO_ITEM
Size: 3 NDV: 20 Nulls: 0 Density: 6.1505e-06
Histogram: Freq #Bkts: 20 UncompBkts: 3985 EndPtVals: 20
TABLE: PO_ITEM Alias: PO_ITEM
Original Card: 81294 Rounded: 1 Computed: 0.00 Non Adjusted: 0.00
Access Path: table-scan Resc: 393 Resp: 393
Access Path: index (equal)
Index: PO_ITEM_POID
rsc_cpu: 24314 rsc_io: 4
ix_sel: 1.2670e-05 ix_sel_with_filters: 1.2670e-05
Access Path: index (scan)
Index: PO_ITEM_STATUS_RANDOM_UPDDT
rsc_cpu: 23778 rsc_io: 4
ix_sel: 6.2422e-06 ix_sel_with_filters: 6.2422e-06
Access Path: index (equal)
Index: PO_ITEM_POID
rsc_cpu: 17832 rsc_io: 3
ix_sel: 1.2670e-05 ix_sel_with_filters: 1.2670e-05
Access Path: index (index-only)
Index: PO_ITEM_STATUS_RANDOM_UPDDT
rsc_cpu: 17632 rsc_io: 3
ix_sel: 6.2422e-06 ix_sel_with_filters: 6.2422e-06
SORT resource Sort statistics
Sort width: 148 Area size: 524288 Max Area size: 26214400
Degree: 1
Blocks to Sort: 1 Row size: 21 Total Rows: 1
Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
Total IO sort cost: 0 Total CPU sort cost: 15225730
Total Temp space used: 0
BEST_CST: 4.00 PATH: 4 Degree: 1
***************************************
GENERAL PLANS
***********************
Join order[1]: PO_ITEM[PO_ITEM]#0
ORDER BY sort
SORT resource Sort statistics
Sort width: 148 Area size: 524288 Max Area size: 26214400
Degree: 1
Blocks to Sort: 1 Row size: 156 Total Rows: 1
Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
Total IO sort cost: 0 Total CPU sort cost: 15225730
Total Temp space used: 0
Best so far: TABLE#: 0 CST: 5 CDN: 1 BYTES: 132
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:80000
SORT resource Sort statistics
Sort width: 148 Area size: 524288 Max Area size: 26214400
Degree: 1
Blocks to Sort: 1 Row size: 156 Total Rows: 1
Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
Total IO sort cost: 0 Total CPU sort cost: 15225730
Total Temp space used: 0
Final - All Rows Plan:
JOIN ORDER: 1
CST: 5 CDN: 1 RSC: 5 RSP: 5 BYTES: 132
IO-RSC: 4 IO-RSP: 4 CPU-RSC: 15249507 CPU-RSP: 15249507


SQL> SELECT /*+ gather_plan_statistics */ * FROM po_item WHERE poid = :a AND STATUS_random = :b ORDER BY poitemid;

SQL> SELECT * FROM table (Cast(dbms_xplan.display_cursor(NULL,NULL, 'runstats_last') as SYS.DBMS_XPLAN_TYPE_TABLE));

PLAN_TABLE_OUTPUT
-----------------

SQL_ID cp44zcmqma34f, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ * FROM po_item WHERE poid = :a AND STATUS_random = :b ORDER BY poitemid

Plan hash value: 3970202189

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers | Reads | Writes | A-Time |
------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT ORDER BY | | 1 | 0 | 2377 | 0 | 0 |00:00:00.01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| PO_ITEM | 1 | 0 | 2377 | 0 | 0 |00:00:00.01 |
|* 3 | INDEX RANGE SCAN | PO_ITEM_STATUS_RANDOM_UPDDT | 1 | 4153 | 29 | 0 | 0 |00:00:00.02 |
------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
2 - SEL$1 / PO_ITEM@SEL$1
3 - SEL$1 / PO_ITEM@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("POID"=:A)
3 - access("STATUS_RANDOM"=:B)


27 rows selected.

SQL> set autot on
SQL>
SELECT /* */ * FROM po_item WHERE poid = :a AND STATUS_random = :b ORDER BY poitemid;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3115544074

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 132 | 5 (20)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 132 | 5 (20)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| PO_ITEM | 1 | 132 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PO_ITEM_POID | 1 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS_RANDOM"=:B)
3 - access("POID"=TO_NUMBER(:A))

Fix SQL index access path issue with 10053 trace and hack the stats

Goal
----
Find out why the SQL optimizer selects a bad index access path when there are 2 predicates
and each predicate column has an index created on it.

# SQL
SELECT /* */ * FROM po_item WHERE poid = :a AND STATUS = :b ORDER BY poitemid;

INDEX PO_ITEM_POID ON PO_ITEM(POID)
INDEX PO_ITEM_STATUS_UPDDT ON PO_ITEM(STATUS, STATUSUPDDT)

It should access the data via the index on the POID column, but it suddenly switched to using the index on the STATUS column.

Solution
--------
Understand the CBO; experiment, observe, and interpret,
using a session-level 10053 trace and dbms_xplan.display_cursor().

Result
------
When the database collects histogram statistics on a column, it calculates a new density for the column,
which can be extremely small; the CBO uses density as selectivity.
After we hack the column density, the database clears the histogram statistics,
and the CBO falls back to 1/NumberOfDistinct, ignoring the density.
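
A quick way to observe this (a sketch against the PO_ITEM demo table built below; in 10g, user_tab_col_statistics also shows whether a histogram is present):

select column_name, num_distinct, 1/num_distinct calc_sel, density, histogram
from user_tab_col_statistics
where table_name = 'PO_ITEM'
and column_name in ('STATUS','STATUS_RANDOM');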

When we check the 10053 trace output,
both I/O costs are the same, but the index selectivity (ix_sel) and the table selectivity (ix_sel_with_filters)
on index PO_ITEM_STATUS_RANDOM_UPDDT are smaller than those on index PO_ITEM_POID.

Access Path: index (equal)
Index: PO_ITEM_POID
rsc_cpu: 17832 rsc_io: 3
ix_sel: 1.2670e-05 ix_sel_with_filters: 1.2670e-05
Access Path: index (index-only)
Index: PO_ITEM_STATUS_RANDOM_UPDDT
rsc_cpu: 17632 rsc_io: 3
ix_sel: 6.2422e-06 ix_sel_with_filters: 6.2422e-06

*) Fix: hack the density to make it accurate.
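
An alternative sketch (not the method used in this post): instead of setting the density by hand, drop the histogram, so the CBO falls back to 1/NDV on its own.

exec dbms_stats.gather_table_stats(user, 'PO_ITEM', method_opt => 'for columns status size 1');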


(The 10053 trace output and dbms_xplan output are shown at the top of this post.)


Experiment
----------

InBox
-----

Next Action
-----------

Complete
--------
create test table po_item
- even distribution: Mod(), values from '01' to '20'
- random distribution: dbms_random.value(), '01' to '20'
- normal distribution: dbms_random.normal(), '01' to '99'
create index
collect table stats
collect column STATUS histogram stats
-- This step skipped: run SQL UPDATE to generate required data distribution
run SQL
- enable 10053 trace
- dbms_xplan.display_cursor()
- set autotrace on


*) Purchase_Order Item table

drop TABLE PO_ITEM;
CREATE TABLE PO_ITEM
(
POITEMID NUMBER(8),
POID NUMBER(8),
STATUS CHAR(2 BYTE),
STATUS_random CHAR(2 BYTE),
STATUS_normal CHAR(2 BYTE),
STATUSUPDDT DATE,
ITEMTITLE VARCHAR2(50 BYTE),
ITEMAUTHOR VARCHAR2(50 BYTE),
ITEMPRICE NUMBER(8,2)
)
nologging;

execute dbms_random.seed(0)

insert /*+ append */ into po_item
with g as (
select -- materialize
level
from dual
connect by level <= 2000
)
select
/*+ leading(v1,v2,v3) use_nl(v2) use_nl(v3) */
rownum
,Round(rownum/0.9,0)
,To_Char(Round(Mod(rownum,100)))
,To_Char(Round(dbms_random.value(1,20)))
,LTrim(To_Char(ABS(Mod(10 * dbms_random.normal,100)),'99'))
,To_Date('1999','YYYY') + rownum/86400
,rpad(rownum,50,'x')
,rpad(rownum,50,'*')
,Round(dbms_random.value(1,100),2)
from
g v1
,g v2
,g v3
where
rownum <= 80100;

commit;
exec dbms_stats.gather_table_stats(user,'PO_ITEM');

-- Index on POID column

CREATE INDEX PO_ITEM_POID ON PO_ITEM
(POID)
NoLogging;

-- Index on column STATUS and STATUS_UPD_DATE

CREATE INDEX PO_ITEM_STATUS_UPDDT ON PO_ITEM
(STATUS, STATUSUPDDT)
NoLogging;

-- Index on column STATUS_random and STATUS_UPD_DATE

CREATE INDEX PO_ITEM_STATUS_RANDOM_UPDDT ON PO_ITEM
(STATUS_random, STATUSUPDDT)
NoLogging;

exec dbms_stats.gather_table_stats(user,'PO_ITEM');

*) init parameter
alter system set optimizer_index_cost_adj=25;
alter system set optimizer_index_cost_adj=100;
alter system set optimizer_index_caching=0;
alter system set optimizer_index_caching=25;
alter system set optimizer_mode = FIRST_ROWS_10;
alter system flush shared_pool;
alter system flush BUFFER_CACHE;

set autot off
set serveroutput off
variable a NUMBER;
variable b VARCHAR2(2);

exec :a := 72036876;
exec :a := 70100;
exec :b := '98';
exec :b := '17';

*) enable 10053 trace

ALTER SESSION SET TRACEFILE_IDENTIFIER="i";
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

SELECT /* */ * FROM po_item WHERE poid = :a AND STATUS = :b ORDER BY poitemid;
SELECT /* */ * FROM po_item WHERE poid = :a AND STATUS_random = :b ORDER BY poitemid;
SELECT * FROM table (DBMS_XPLAN.DISPLAY_CURSOR);

alter session set events '10053 trace name context off';
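
The trace file is written to user_dump_dest, with the TRACEFILE_IDENTIFIER set above embedded in the file name; a quick way to locate the directory:

select value from v$parameter where name = 'user_dump_dest';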

*) display SQL execution plan

-- use the gather_plan_statistics hint to enable generation of execution statistics.
set linesize 300
SELECT /*+ gather_plan_statistics */ * FROM po_item WHERE poid = :a AND STATUS = :b ORDER BY poitemid;
SELECT * FROM table (Cast(dbms_xplan.display_cursor(NULL,NULL, 'runstats_last') as SYS.DBMS_XPLAN_TYPE_TABLE));

SELECT /*+ gather_plan_statistics */ * FROM po_item WHERE poid = :a AND STATUS_random = :b ORDER BY poitemid;
SELECT * FROM table (Cast(dbms_xplan.display_cursor(NULL,NULL, 'runstats_last') as SYS.DBMS_XPLAN_TYPE_TABLE));

set autot trace exp
SELECT /*+ gather_plan_statistics */ * FROM po_item WHERE poid = :a AND STATUS = :b ORDER BY poitemid;


-- gather histogram stats on column STATUS
begin
dbms_stats.gather_table_stats(
user,
'po_item',
cascade => true,
estimate_percent => 0.1,
method_opt => 'for columns size 254 status,status_random,status_normal size 100'
);
end;
/
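
A quick sanity check that the histograms landed (a sketch; a frequency histogram shows one endpoint per popular value):

select column_name, count(*) endpoints
from user_tab_histograms
where table_name = 'PO_ITEM'
and column_name in ('STATUS','STATUS_RANDOM','STATUS_NORMAL')
group by column_name;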

-- hack column stats
begin
DBMS_STATS.SET_COLUMN_STATS (
ownname => USER,
tabname => 'PO_ITEM',
colname => 'STATUS',
-- distcnt => 20,
density => 0.05,
force => True
);
end;
/


-- Compare index access on right index
set autot trace

SELECT /*+ index(po_item,PO_ITEM_POID) */ * FROM PO_ITEM
WHERE poid = :a AND STATUS = :b ORDER BY poitemid;


Reference
---------
*) original table and SQL:
abedba.ABEPOITEMS
SELECT /* */ * FROM ABEPOITEMS WHERE ABEPOID = :a AND STATUS = :b ORDER BY ABEPOITEMID;

*) Get stats

-- Display table stats
set autot off
set SQLPATH=A:\maint\monitor;A:\adm\maint

@table scott po_item

-- check CBO parameter
select name,value,isdefault from v$sql_optimizer_env
where sql_id = 'cp44zcmqma34f';

-- Query to check the value of a given hidden parameter ("parameter_name")
col indx format 9999
col inst_id heading "INST" format 9999
col ksppinm heading "NAME" format a30
col ksppdesc heading "DESC" format a40
col ksppstvl heading "CURR VAL" format a15
col ksppstdvl heading "DEFAULT VAL" format a15

set echo on
set autot off
select v.indx,v.inst_id,ksppinm p_name,ksppstvl curr_val,ksppstdvl default_val,ksppdesc p_desc
from x$ksppi i ,x$ksppcv v
where i.indx=v.indx and ksppinm like '%sort_elimination_cost_ratio%';

-- display real value for low_value and high_value on view user_tab_col_statistics
DECLARE
l_low_value user_tab_col_statistics.low_value%TYPE;
l_high_value user_tab_col_statistics.high_value%TYPE;
l_val1 po_item.status%TYPE;
BEGIN
SELECT low_value, high_value
INTO l_low_value, l_high_value
FROM user_tab_col_statistics
WHERE table_name = 'PO_ITEM'
AND column_name = 'STATUS';

dbms_stats.convert_raw_value(l_low_value, l_val1);
dbms_output.put_line('low_value: ' || l_val1);
dbms_stats.convert_raw_value(l_high_value, l_val1);
dbms_output.put_line('high_value: ' || l_val1);

dbms_output.put_line('high_value: ' || l_low_value||', '||l_high_value);

END;
/

exec DBMS_STATS.UNLOCK_TABLE_STATS(user,'ABEPOITEMS');

-- Verify data distribution
select status,count(*)
from abedba.po_item
group by status;

STATUS COUNT(*)
01 2795
04 619
05 252
10 1334827
11 7298222
12 232920
14 313380
15 1912919
16 8764
17 233443
18 116935
19 549803
24 748465
25 356031
35 12905639
45 7594712
50 235
51 75338
52 334234
53 22
54 22325
55 5110
56 12311
60 3
66 245
90 215
97 54017
98 83
--

*) hack object stats

-- hack index stats. --old 3931, new 40100
begin
DBMS_STATS.SET_INDEX_STATS (
ownname => USER,
indname => 'PO_ITEM_POID',
clstfct => 50);
end;
/

begin
DBMS_STATS.SET_INDEX_STATS (
ownname => 'SCOTT',
indname => 'PO_ITEM_STATUS_UPDDT',
clstfct => 1000);
end;
/

-- hack column stats
begin
DBMS_STATS.SET_COLUMN_STATS (
ownname => USER,
tabname => 'PO_ITEM',
colname => 'STATUS',
density => 0.05);
end;
/

*) Export and Import the table stats

begin
dbms_stats.create_stat_table(ownname => user,
stattab => 'MYSTATS',
tblspace => 'USERS');
end;
/

begin
DBMS_STATS.EXPORT_TABLE_STATS (
ownname => USER,
tabname => 'PO_ITEM',
stattab => 'MYSTATS');
end;
/

exec dbms_stats.delete_table_stats(ownname => user,tabname => 'PO_ITEM')

begin
DBMS_STATS.IMPORT_TABLE_STATS (
ownname => USER,
tabname => 'PO_ITEM',
stattab => 'MYSTATS',
force => true);
end;
/

*) others

-- Prefer the sort-avoiding index over the access-predicate index
alter session set "_sort_elimination_cost_ratio" = 0;

SELECT /*x index(po_item,PO_ITEM_STATUS_UPDDT) */ * FROM PO_ITEM
WHERE poid = :a AND STATUS = :b ORDER BY poitemid;

alter session set "_sort_elimination_cost_ratio" = 1;

SELECT /*x index(po_item,PO_ITEM_STATUS_UPDDT) */ * FROM PO_ITEM
WHERE poid = :a AND STATUS = :b ORDER BY poitemid;



QingDao, China

Wednesday, January 14, 2009

Overview: asynchronous processing of change data

Goal
----
Split function :F1 from function :F2 and make the process asynchronous,
to gain scalability (performance and reliability) and availability.

:F1 = check deleted book
:F2 = upload inventory data

*) Business logic background for this example

In our 4-node RAC database, :F2 (upload inventory data) runs on node 4, and
:F1 (check deleted book) runs frequently on node 1, about 50 times/second;
:F1 hits the data randomly, so a 30-row page will probably touch 30 data blocks.
This caused a lot of Global Cache interconnect traffic from node 4 to node 1.
Based on our AWR report, it is:

Network bandwidth:
= Network requests * db_block_size
= 218104 * 4096
= 893,353,984 (about 900 MB of data transferred in 15 minutes)

After implementing the asynchronous process, the GC transfer volume for this function was reduced 200-fold,

and the :F1 check-deleted-book performance SLA improved 100-fold.


Figure-1: Topology process diagram

*) Keywords: loose coupling, function split, partitioning data by function and by range/key list.

Solution
--------
Queue the change data (the deleted book listing_id) into a staging table,
then batch-process and merge it into the target table (book_delete_check) every N minutes,
to reduce the network traffic.

*) Understanding: we will lose unprocessed data if we use the SQL start date as the end-tag timestamp
*) Experiment and Observation:
- replacing a synonym does not require a DDL lock, so it is not safe
- in a MERGE statement, the DELETE clause shares the UPDATE predicate and will not delete newly inserted rows (see the sketch after this list)
- querying a gv$ view may fail in a RAC database when there are not enough parallel processes on each instance
*) Interpretation: use the predicate stage_table.update_date < v$transaction.start_date
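
The MERGE observation can be sketched like this (a minimal illustration against the tables from the Setup section below; the real statement lives in book_chg_sync_pkg): the DELETE clause is evaluated only against rows that were matched and just updated, so rows added by the NOT MATCHED branch survive.

merge into book_del_check d
using book_delete_stage_v s
on (d.listingsid = s.listingsid)
when matched then
update set d.upd_dd = s.upd_day
-- DELETE shares the UPDATE predicate: it only ever removes matched,
-- just-updated rows, never rows inserted by the NOT MATCHED branch
delete where (s.delete_flag = 1)
when not matched then
insert (d.listingsid, d.upd_dd)
values (s.listingsid, s.upd_day);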

Approach
--------

*) mark rowids into a temporary table, then process and delete by rowid from the temporary table
*) update_date < Least(SysDate, v$transaction.start_date)
*) SCN_TO_TIMESTAMP(ORA_ROWSCN) < sysdate
- http://www.ningoo.net/html/2009/about_ora_rowscn.html
*) Exchange one data partition with a swap table
*) RETURNING rows BULK COLLECT INTO a UDT (user-defined type, a schema object collection)
*) Many partitions on Mod(to_number(to_char(update_date,'MI')),10)
- partition the table to spread the DDL/DML lock contention
- so N partitions on a certain interval, based on your transaction volume and DML lock frequency
- exchange each partition with its swap table (see the sketch after this list)
- process the swap table
- truncate the swap table
- the DDL lock will not happen very often
- if a partition is locked, insert a row into the stage_proc_lock table
- if a partition is locked 4 times in a row
# process that partition with one of the single-table methods above
# delete the locked-partition log row
*) Partition view over heap tables, for Oracle Standard Edition
*) Oracle Streams
*) Oracle AQ
*) Other messaging tools
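
One iteration of the exchange approach looks roughly like this (a sketch; the full DDL with the ORA-00054 handling is in the Reference section, and the real logic is in book_chg_sync_pkg.merge_delta_multi_part):

-- 1. swap out one unlocked partition (near-instant DDL)
alter table book_delete_stage_part
exchange partition data_part_5
with table book_delete_stage_swap_5
without validation;

-- 2. process the swapped-out rows here, e.g. merge them into book_del_check

-- 3. empty the swap table for the next round
truncate table book_delete_stage_swap_5;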

Setup
-----

-- target table
drop table book_del_check;

create table book_del_check
(
listingsid number(20),
upd_dd number(2)
, constraint book_del_check_pk primary key (listingsid )
)
organization index
including upd_dd
overflow
;

-- staging table
-- Option n+2, for small TX volume
drop table book_delete_stage;

create table book_delete_stage(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
ROWDEPENDENCIES
nologging;

create or replace view book_delete_stage_v
as
select listingsid, delete_flag, To_Number(To_Char(update_date,'DD')) upd_day, upd_time
from
(SELECT listingsid, delete_flag,update_date,
SCN_TO_TIMESTAMP(ORA_ROWSCN) upd_time,
Row_Number() OVER (PARTITION BY listingsid order by update_date desc) rn
FROM book_delete_stage b
)
where rn = 1
;


drop TYPE book_chg_tab;
drop TYPE book_chg_rec;

create or replace TYPE book_chg_rec AS OBJECT
(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3)
);
/

create or replace TYPE book_chg_tab IS TABLE OF book_chg_rec;
/

drop table gt_book_delete_stage_i;
drop table gt_book_delete_stage_d;
drop table gt_book_delete_stage_rid;

create global temporary table gt_book_delete_stage_i
(
listingsid number(20),
qty number(5)
)
ON COMMIT DELETE ROWS
;
create global temporary table gt_book_delete_stage_d
(
listingsid number(20),
qty number(5)
)
ON COMMIT DELETE ROWS
;

create global temporary table gt_book_delete_stage_rid
(
rid rowid,
listingsid number(20)
)
ON COMMIT DELETE ROWS
;

drop table stage_proc_lock;
create table stage_proc_lock
(
stage_table varchar2(30),
part_key number(2)
)
tablespace index_auto;

create index stage_proc_lock_i1 on stage_proc_lock(stage_table,part_key)
nologging tablespace index_auto;

DROP TABLE book_delete_stage_part;

CREATE TABLE book_delete_stage_part (
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3) default SysTimestamp,
part_mi number(2) default mod(to_number(to_char(sysdate,'MI')),10)
)
PARTITION BY LIST (part_mi) (
PARTITION data_part_0 VALUES (0),
PARTITION data_part_1 VALUES (1),
PARTITION data_part_2 VALUES (2),
PARTITION data_part_3 VALUES (3),
PARTITION data_part_4 VALUES (4),
PARTITION data_part_5 VALUES (5),
PARTITION data_part_6 VALUES (6),
PARTITION data_part_7 VALUES (7),
PARTITION data_part_8 VALUES (8),
PARTITION data_part_9 VALUES (9),
PARTITION dummy_null VALUES (NULL),
PARTITION dummy_other VALUES (DEFAULT)
)
tablespace data_auto;

DROP TABLE book_delete_stage_swap_0;
DROP TABLE book_delete_stage_swap_1;
DROP TABLE book_delete_stage_swap_2;
DROP TABLE book_delete_stage_swap_3;
DROP TABLE book_delete_stage_swap_4;
DROP TABLE book_delete_stage_swap_5;
DROP TABLE book_delete_stage_swap_6;
DROP TABLE book_delete_stage_swap_7;
DROP TABLE book_delete_stage_swap_8;
DROP TABLE book_delete_stage_swap_9;

CREATE TABLE book_delete_stage_swap_0 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_1 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_2 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_3 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_4 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_5 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_6 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_7 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_8 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_9 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;


-- Package
@BOOK_CHG_SYNC_PKG.SQL

alter package book_chg_sync_pkg compile;
-- to avoid ORA-4061
alter package list_trigger_pkg compile;

-- to avoid ORA-4061
-- Synonym
create or replace synonym BOOK_CHG_SYNC_api for BOOK_CHG_SYNC_PKG;

grant execute on book_chg_sync_api to app_role;

-- Schedule job
exec dbms_scheduler.stop_job(job_name=> 'merge_book_del_change_delta', force => True);
exec dbms_scheduler.DROP_job ( job_name => 'merge_book_del_change_delta' );

begin
dbms_scheduler.create_job
(
job_name => 'merge_book_del_change_delta',
job_type => 'PLSQL_BLOCK',
job_action => 'begin book_chg_sync_pkg.upd_delete_track; end;',
repeat_interval => 'FREQ=MINUTELY; INTERVAL=3',
enabled => true,
comments => 'To clear all the pending delta data, merge into book_del_check table'
);
end;
/
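
Check that the job was created and is running (a sketch):

select job_name, state, last_start_date, next_run_date
from user_scheduler_jobs
where job_name = 'MERGE_BOOK_DEL_CHANGE_DELTA';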

Idea
----
-- Option n, online alter synonym pointer, no DDL lock, inconsistent data, unsafe
drop table book_delete_stage;

drop table book_delete_stage_t1;

create table book_delete_stage_t1(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
tablespace data_auto;

create or replace synonym book_delete_stage for book_delete_stage_t1;

drop table book_delete_stage_t2;

create table book_delete_stage_t2(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
tablespace data_auto;

-- no DDL lock after DML done.
insert into book_delete_stage
values(1,1,1,sysdate);
commit;

create or replace synonym book_delete_stage for book_delete_stage_t2;

select * from book_delete_stage_t1;
select * from book_delete_stage_t2;
select * from book_delete_stage;

-- Option n+1, online exchange table partition
-- for large TX volume
-- Goal: Partition table to spread the contention

*) Constraint: must be run serially

*) Process

swap out the partitions (keyed by Mod(minute,10)) that are not locked by DML
insert into temporary table
process
commit
truncate the swap table

*) Test Case

Insert 5 rows to each partition
DML lock on 2 partitions
exec book_chg_sync_pkg.merge_delta_multi_part;
- Swaps out the other 8 partitions

execute book_chg_sync_pkg.merge_delta_rowscn;



Test case
---------
*) Function

- Delete some listings
- Wait 5 minutes, run the book_delete_check query; these listings should disappear.

- Restore some listings
- Wait 5 minutes, run the book_delete_check query; these listings should reappear.

*) Network performance - Test by Production DBA

To check
Global Cache and Enqueue Services - Workload Characteristics
-Avg global enqueue get time (ms): 0.2
-Avg global cache cr block receive time (ms): 3.7
-Avg global cache current block receive time (ms): 2.5
-Avg global cache cr block build time (ms): 0.0
-Avg global cache cr block send time (ms): 0.1
-Avg global cache current block pin time (ms): 0.0
-Avg global cache current block send time (ms):0.1
Global Cache and Enqueue Services - Messaging Statistics
-Avg message sent queue time (ms): 50.8
-Avg message sent queue time on ksxp (ms): 1.0
-Avg message received queue time (ms): 0.0

Test code
---------

-- Option n+1, online exchange table partition
-- for large TX volume
-- Goal: Partition table to spread the contention

-- Test DELETE in MERGE. (D.upd_dd <> S.upd_day or S.delete_flag = 0).
insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,5,0,trunc(sysdate,'DD') + 0, 4);

insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,5,0,trunc(sysdate,'DD') + 4, 4);
commit;
insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,0,1,trunc(sysdate,'DD') + 4, 4);
commit;


select * from book_delete_stage_part;
select * from book_delete_stage_swap_7;
select * from book_del_check where listingsid = 7;

SQL> Exec book_chg_sync_pkg.merge_delta_multi_part;

select * from book_delete_stage_part;
select * from book_delete_stage_swap_7;
select * from book_del_check where listingsid = 7;

--Generate testing data
insert into book_delete_stage_part(listingsid,delete_flag,update_date, part_mi)
select trunc(dbms_random.value(1,5000)), mod(level,2), trunc(sysdate,'DD') + mod(level,60)/1440,
mod(level,10)
from dual
connect by level <= 7000;

SQL> Exec book_chg_sync_pkg.merge_delta_multi_part;


Reference
---------

-- Exchange partition

--DDL With the WAIT Option, number of seconds, 11g new
alter session set DDL_LOCK_TIMEOUT=120;

-- ORA-00054: resource busy and acquire with NOWAIT specified
ALTER TABLE book_delete_stage_part
EXCHANGE PARTITION data_part_5
WITH TABLE book_delete_stage_swap_5
INCLUDING INDEXES
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES
;

-- DDL with the WAIT option (timeout in seconds), new in 11g
alter session set DDL_LOCK_TIMEOUT=0;

-- check TX view, 1000 LIO
SELECT to_char(t.start_date,'MI'), Min(t.start_date), count(*)
FROM gv$transaction t
,gv$locked_object lo
-- FROM v$transaction t
-- ,v$locked_object lo
,sys.obj$ do
WHERE do.NAME = 'BOOK_DELETE_STAGE'
AND do.obj# = lo.object_id
AND lo.xidusn = t.xidusn
group by to_char(t.start_date,'MI');

-- Get partition number, skip locked partition
-- check TX view, 1000 LIO
select Trim(to_char(part_num)) part_num bulk collect into lt_part_num from
(
select (level - 1) part_num from dual connect by level <= 10
minus
SELECT Mod(to_number(to_char(t.start_date,'MI')),10) mi --, count(*)
FROM gv$transaction t
,gv$locked_object lo
-- FROM v$transaction t
-- ,v$locked_object lo
,sys.obj$ do
WHERE do.NAME = 'BOOK_DELETE_STAGE_PART'
AND do.obj# = lo.object_id
AND lo.xidusn = t.xidusn
group by Mod(to_number(to_char(t.start_date,'MI')),10)
);
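
Standalone, this fragment needs a host collection; something like the block below (a sketch: the package's real declarations may differ, and sys.obj$ / gv$ access needs the right privileges):

declare
  type t_part_tab is table of varchar2(2);
  lt_part_num t_part_tab;
begin
  select Trim(to_char(part_num)) bulk collect into lt_part_num
  from (
    select (level - 1) part_num from dual connect by level <= 10
    minus
    select Mod(to_number(to_char(t.start_date,'MI')),10)
    from gv$transaction t, gv$locked_object lo, sys.obj$ do
    where do.name = 'BOOK_DELETE_STAGE_PART'
    and do.obj# = lo.object_id
    and lo.xidusn = t.xidusn
    group by Mod(to_number(to_char(t.start_date,'MI')),10)
  );
  dbms_output.put_line(lt_part_num.count || ' unlocked partitions');
end;
/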

*) AWR report RAC section analysis

A 15-minute interval in the morning.

The :F1 book_delete_check query used 90% of the network resource, making all other database requests that compete for network resources (global cache transfers) worse and worse.
Begin Snap: 13-Nov-08 10:45:20
End Snap: 13-Nov-08 11:00:10

Our practical target is to reduce the network usage 10 to 30 times by applying the BASE solution.
The DBA can explain why in the story-gathering meeting.

Network requests:
= (CR Blocks Received + Current Blocks Received) + messages sent + messages received
= (96884 + 12168) * 2
= 218104

Note: CR means consistent-read blocks; Current means current-mode data blocks.

Network bandwidth:
= Network requests * db_block_size
= 218104 * 4096
= 893,353,984 (about 900 MB transferred in 15 minutes)

Avg book_delete_check query wait time on network delay:
= ( CR Blocks Received * Avg global cache cr block receive time ms
+ Current Blocks Received * Avg global cache current block receive time ms
+ Global Cache Grant messages * Avg message sent queue time ms
) / Executions / 1000 ms
= (96884 * 60.6 + 12168 * 72.8 + (96884 + 12168) * 201) /45211/1000
= 0.634280436 Seconds

Monday, January 12, 2009

How to generate testing data

Goal
----
Generate testing data

Interpret
---------
Many times you need to build a test case to prove your ideas, or to prototype, benchmark, ask questions, etc.

Tom: You'd be better off supplying very, very simple create table and insert statements.
The SMALLEST create table possible (no tablespaces, no schema names, just like I do in my examples for you)

Solution
--------

*) 500 distinct values, to get a certain data distribution

Mod(rownum,500)
dbms_random.value(1,500)
dbms_random.normal()
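
Side by side, the three generators look like this (a sketch; run a few rows to eyeball the shapes):

select Mod(rownum,500) n_even,
       Round(dbms_random.value(1,500)) n_uniform,
       Round(dbms_random.normal * 100) n_normal
from dual
connect by level <= 10;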

*) string

rpad('x',500,'x')
lpad(RowNum,200,'*')
dbms_random.string('l',500)
# 'u', 'U' - returning string in uppercase alpha characters
# 'l', 'L' - returning string in lowercase alpha characters
# 'p': any printable char. this one is slow

*) source

# dual connect by level <= 3000
# all_objects
# WITH subquery factor

*) Random order by

ORDER BY dbms_random.random;

*) Example

SQL>
--Resets the seed
exec dbms_random.seed(1);

drop table t1;
create table t1
nologging -- adjust as necessary
as
with g as (
select -- materialize
level
from dual
connect by level <= 2000
)
select
/*x leading(v1,v2,v3) use_merge(v2) use_merge(v3) */
/*+ leading(v1,v2,v3) use_nl(v2) use_nl(v3) */
rownum id,
Round(dbms_random.normal * 1000,3) n1,
Mod(rownum,500) n5,
Mod(rownum,400) n4,
Round(dbms_random.value(1,700)) n7b
,rpad(rownum,500,'x') s5
-- ,dbms_random.string('l',20) s5b
,Trunc(sysdate,'YYYY') + Mod(rownum,1440)/1440 update_date
from
g v1
,g v2
,g v3
where
rownum <= 111222
-- order by dbms_random.random
;

create index t1_i1 on t1(n5) nologging ;

exec dbms_stats.gather_table_stats(user,'T1');

# MuJiang

DROP TABLE abelisting.book_delete_stage_part;

CREATE TABLE abelisting.book_delete_stage_part (
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3) default SysTimestamp,
part_mi number(2) default mod(to_number(to_char(sysdate,'MI')),10)
)
PARTITION BY LIST (part_mi) (
PARTITION data_part_0 VALUES (0),
PARTITION data_part_1 VALUES (1),
PARTITION data_part_2 VALUES (2),
PARTITION data_part_3 VALUES (3),
PARTITION data_part_4 VALUES (4),
PARTITION data_part_5 VALUES (5),
PARTITION data_part_6 VALUES (6),
PARTITION data_part_7 VALUES (7),
PARTITION data_part_8 VALUES (8),
PARTITION data_part_9 VALUES (9),
PARTITION dummy_null VALUES (NULL),
PARTITION dummy_other VALUES (DEFAULT)
)
tablespace data_auto;

-- 5000 distinct values, 2000 duplicate values
insert into abelisting.book_delete_stage_part(listingsid,delete_flag,update_date, part_mi)
select trunc(dbms_random.value(1,5000)), mod(level,2), trunc(sysdate,'DD') + mod(level,60)/1440,
mod(level,10)
from dual
connect by level <= 7000;


Reference
---------

# Christian Antognini
drop table t;
CREATE TABLE t
AS
SELECT rownum AS id,
round(5678+dbms_random.normal*1234) AS n1,
mod(255+trunc(dbms_random.normal*1000),255) AS n2,
dbms_random.string('p',255) AS pad
FROM dual
CONNECT BY level <= 10000
ORDER BY dbms_random.value;

# Jonathan Lewis

drop table t1;
create table t1
nologging -- adjust as necessary
as
with generator as (
select --+ materialize
rownum id,
substr(dbms_random.string('U',4),1,4) sortcode
from all_objects
where rownum <= 5000
)
select
/*+ ordered use_nl(v2) */
substr(v2.sortcode,1,4) || substr(v1.sortcode,1,2) sortcode,
substr(v1.sortcode,2,2) v2,
substr(v2.sortcode,2,3) v3
from
generator v1,
generator v2
where
rownum <= 1048576
;


# Tanel Põder
Generating lots of rows using connect by - safely!

select rownum r
from
(select rownum r from dual connect by rownum <= 1000) a,
(select rownum r from dual connect by rownum <= 1000) b,
(select rownum r from dual connect by rownum <= 1000) c
where rownum <= 100000000;

# AskTom

ops$tkyte@ORA9IR2>

CREATE TABLE t1
(
dt date,
x int,
y varchar2(25)
)
PARTITION BY RANGE (dt)
subpartition by hash(x) subpartitions 8
(
PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
PARTITION junk VALUES LESS THAN (MAXVALUE)
);

insert /*+ APPEND */ into t1
select to_date('12-mar-2003','dd-mon-yyyy')+mod(rownum,4), rownum, rpad('*',25,'*')
from all_objects;

drop table t;
create table t as select * from all_objects;
create index t_idx on t(object_name);

Thursday, January 08, 2009

Why asynchronous processing?

This article is mainly for traditional internal business data owners, product managers, and programmers.

  • Then why do we bother to consider asynchronous processing?

# Transactional growth: data storage and workload become bottlenecks in a single system.
# Moving the application to larger computers has 2 main limitations:
- outgrowing the capacity of the largest system available
- expense
# Trying distribution transparency (2PC) to ensure consistency and accuracy:
- it was better to fail the complete system than to break this transparency.
# Of the three properties of shared-data systems (data Consistency, system Availability, and tolerance to network Partition), only two can be achieved at any given time.
- Network partition is a given.
- Consistency and availability cannot be achieved at the same time.
^ making consistency a priority means that under certain conditions the system will not be available.
^ relaxing consistency allows the system to remain highly available under partitioned-network conditions.

  • Solution

By focusing on the trade-offs between high availability and data consistency, replicating data at large scale through a messaging system, and batch-processing the staged data through a transactional database API, we can deliver durable, available, performant, and cost-effective large-scale distributed database systems and infrastructure services.

If BASE allows for availability in a partitioned database, then opportunities to relax consistency have to be identified. This is often difficult, because the tendency of both business stakeholders and developers is to assert that consistency is paramount to the success of the application. Temporal inconsistency cannot be hidden from the end user, so both engineering and product owners must be involved in picking the opportunities for relaxing consistency.

Practical example:

Book qty has 2 input sources, so in fact it is not accurate:
- update sequence
- priority
- conflict resolution
Consistency can be relaxed, or we can simply not care about it;
take advantage of the user-perceived consistency window between write and read.
# Finance batch processes
# Orders and credit-card batch updates
# Returned-items batch processes

ACID: transfer money from one account to another, keeping the total amount the same.
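
In SQL that is one atomic transaction (a generic sketch; the account table and its columns are hypothetical):

update account set balance = balance - 100 where acct_id = 1;
update account set balance = balance + 100 where acct_id = 2;
commit;  -- both updates commit together, or both roll back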

The required ratio of ACID to BASE is about 1:100; for example, 100 searches generate 1 order.

Example:
a: Flight ticket booking system
b: 1% performance decrease caused 5% orders lost
c: publish a new book and make it online for search
d: user data, e.g. sale_amt, purchase_amt

Reference:

Good things come to those who can wait.
* Scalability Best Practices: Lessons from eBay (Partition and Asynchronously)
* BASE: AN ACID ALTERNATIVE
* Eventually Consistent
* Latency is Everywhere and it Costs You Sales - How to Crush it


My cubicle in the office.

Wednesday, January 07, 2009

Cross-row comparison

Goal
----
Try several approaches to compare values across rows.

Solution
--------
1. Hash Join
eagle’s home http://www.dbafan.com/blog/?p=176

2. Lead()/Lag() Analytic function

3. SQL Model

Outcome
-------
Hash join is a good option; the Lead() analytic function is better, with less DB time and 8 times fewer latches.
I could not make SQL Model work efficiently.
For a large data set, SQL Model needs much more PGA and temporary tablespace for sorting.
Removing columns from MEASURES helps a little.

Who can help to tune SQL Model?

Setup
-----

drop table scott.t_range ;
create table scott.t_range nologging
as
select level id, level * 10 low_value, level * 10 + Trunc(dbms_random.value(1,11)) high_value
from dual
connect by level <= 300000
order by dbms_random.value;

exec dbms_stats.gather_table_stats('SCOTT','T_RANGE');

Note: do not do this in a production database.

SQL code
--------
1. Hash Join

drop table scott.t_range_tmp;
create table scott.t_range_tmp nologging as
select rownum line_no,low_value,high_value
from
(
select low_value,high_value
from scott.t_range
order by low_value asc
);

select a.low_value,a.high_value,b.low_value,b.high_value
from scott.t_range_tmp a,scott.t_range_tmp b
where a.line_no=b.line_no - 1 and a.high_value>=b.low_value;

2. Lead()/Lag() Analytic function

select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value;

3. SQL Model: slow, needs tuning

select line_no, low_value, next_low_value
from
(
with a as (SELECT rownum line_no,low_value,high_value FROM scott.t_range order by low_value asc)
select line_no,low_value,high_value,next_low_value
FROM a
MODEL
--RETURN UPDATED ROWS
MAIN simple_model
--PARTITION BY (country)
DIMENSION BY (line_no)
MEASURES (low_value,high_value, 0 AS next_low_value)
RULES
(next_low_value[ANY] = low_value[cv(line_no) + 1])
)
where next_low_value <= high_value;

Benchmark
---------
* RunStats

SQL> set arraysize 1000

exec runStats_pkg.rs_start;

drop table scott.t_range_tmp;
create table scott.t_range_tmp nologging as
select rownum line_no,low_value,high_value
from
(
select low_value,high_value
from scott.t_range
order by low_value asc
);

select count(*) from (
select a.low_value,a.high_value,b.low_value,b.high_value
from scott.t_range_tmp a,scott.t_range_tmp b
where a.line_no=b.line_no - 1 and a.high_value>=b.low_value
);

exec runStats_pkg.rs_middle;

select count(*) from
(
select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value
);

exec runStats_pkg.rs_stop(20);

Run1 ran in 270 hsecs
Run2 ran in 201 hsecs
run 1 ran in 134.33% of the run2 time

Name Run1 Run2 Diff
LATCH.JS queue state obj latch 90 60 -30
STAT...redo ordering marks 33 1 -32
STAT...CPU used when call star 209 177 -32
STAT...workarea memory allocat -32 0 32
STAT...CPU used by this sessio 209 177 -32
STAT...consistent changes 82 47 -35
LATCH.redo allocation 39 0 -39
LATCH.begin backup scn array 41 0 -41
LATCH.child cursor hash table 45 0 -45
LATCH.redo writing 46 0 -46
STAT...data blocks consistent 47 0 -47
STAT...rollbacks only - consis 47 0 -47
STAT...CR blocks created 47 0 -47
STAT...deferred (CURRENT) bloc 55 8 -47
STAT...DB time 225 177 -48
STAT...cluster key scans 51 0 -51
STAT...commit cleanouts succes 67 0 -67
STAT...Elapsed Time 272 203 -69
STAT...session cursor cache hi 79 4 -75
LATCH.SQL memory manager worka 81 6 -75
STAT...commit cleanouts 75 0 -75
STAT...index fetch by key 76 0 -76
LATCH.messages 95 12 -83
STAT...cluster key scan block 99 0 -99
STAT...calls to kcmgas 111 1 -110
STAT...opened cursors cumulati 121 4 -117
LATCH.checkpoint queue latch 118 0 -118
STAT...parse count (total) 123 4 -119
STAT...physical write IO reque 126 0 -126
STAT...execute count 131 5 -126
LATCH.dml lock allocation 130 0 -130
LATCH.cache buffers lru chain 140 0 -140
STAT...enqueue requests 157 1 -156
STAT...enqueue releases 158 0 -158
STAT...consistent gets - exami 245 12 -233
STAT...physical read IO reques 238 0 -238
LATCH.undo global data 274 14 -260
LATCH.library cache lock 263 0 -263
LATCH.multiblock read objects 282 0 -282
LATCH.enqueues 411 128 -283
STAT...buffer is not pinned co 305 0 -305
STAT...calls to get snapshot s 318 9 -309
STAT...bytes sent via SQL*Net 1,098 662 -436
LATCH.enqueue hash chains 609 130 -479
STAT...redo entries 568 19 -549
LATCH.shared pool 696 38 -658
STAT...db block changes 817 68 -749
LATCH.library cache pin 817 50 -767
STAT...db block gets from cach 944 43 -901
LATCH.simulator hash latch 978 1 -977
STAT...bytes received via SQL* 2,070 1,058 -1,012
LATCH.library cache 1,155 62 -1,093
STAT...physical reads direct t 1,275 0 -1,275
STAT...physical writes direct 1,275 0 -1,275
STAT...physical reads direct 1,275 0 -1,275
STAT...recursive calls 1,509 1 -1,508
LATCH.row cache objects 1,535 18 -1,517
STAT...physical reads cache pr 1,640 0 -1,640
STAT...db block gets direct 1,793 0 -1,793
STAT...physical reads cache 1,793 0 -1,793
STAT...free buffer requested 1,894 1 -1,893
LATCH.session allocation 2,030 0 -2,030
STAT...db block gets 2,737 43 -2,694
STAT...physical reads 3,068 0 -3,068
STAT...physical writes 3,068 0 -3,068
STAT...physical writes direct 3,068 0 -3,068
STAT...physical writes non che 3,068 0 -3,068
STAT...table scan blocks gotte 5,605 1,795 -3,810
STAT...no work - consistent re 5,663 1,795 -3,868
STAT...consistent gets 6,147 1,824 -4,323
STAT...consistent gets from ca 6,147 1,824 -4,323
LATCH.object queue header oper 5,777 2 -5,775
STAT...session logical reads 8,884 1,867 -7,017
LATCH.cache buffers chains 22,633 3,872 -18,761
STAT...sorts (rows) 322,072 302,880 -19,192
STAT...undo change vector size 28,368 3,372 -24,996
STAT...session pga memory -65,536 0 65,536
STAT...redo size 100,132 4,880 -95,252
STAT...table scan rows gotten 930,309 300,000 -630,309

Run1 latches total versus Run2 -- difference and pct
Run1 Run2 Diff Pct
38,553 4,446 -34,107 867.14%


exec runStats_pkg.rs_start;
select count(*) from
(
with a as (SELECT rownum line_no,low_value,high_value FROM scott.t_range order by low_value asc)
select line_no,low_value,high_value,next_low_value
FROM a
MODEL
--RETURN UPDATED ROWS
MAIN simple_model
--PARTITION BY (country)
DIMENSION BY (line_no)
MEASURES (low_value,high_value, 0 AS next_low_value)
RULES
(next_low_value[ANY] = low_value[cv(line_no) + 1])
)
where next_low_value <= high_value
;

exec runStats_pkg.rs_middle;
select count(*) from
(
select * from
(
select id,low_value,high_value,
Lead(low_value,1) OVER (ORDER BY low_value) next_low_value
from scott.t_range
)
where next_low_value <= high_value
);

exec runStats_pkg.rs_stop(20);

Reference
---------
Oracle® Database Data Warehousing Guide
11g Release 1 (11.1)
Part Number B28313-02

21 SQL for Analysis and Reporting
22 SQL for Modeling

Tuesday, January 06, 2009

3 PL/SQL approaches to fetch data into an ADT collection

Goal
----
There are 3 approaches to fetch data into a schema object collection type.
Benchmark to find the best one.

Solution
--------
Set up a test case and benchmark with RunStats and SQL trace.

Result
------
A PL/SQL native collection (associative array or nested table) is the fastest way to get data from SQL.
But sometimes you need to query against a collection/array, and then you have to declare it as a schema object type.
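
For example, a schema-level collection type can be queried with TABLE() from SQL, which a PL/SQL-only type cannot (a sketch using the types created in Setup below):

declare
  lt_chg abelisting.book_chg_tab := abelisting.book_chg_tab(
            abelisting.book_chg_rec(1, 2, 0, sysdate));
  l_cnt pls_integer;
begin
  -- works because book_chg_tab is a schema object type; a type declared
  -- inside a package or block could not be queried like this in 10g
  select count(*) into l_cnt from table(lt_chg);
  dbms_output.put_line(l_cnt || ' row(s)');
end;
/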

To fetch data into a schema object type collection, here are the 3 approaches with their pros and cons:

1) Convert multiple columns to the object type, and bulk collect into the collection
Pros: the code is simple.
Cons: 50% more CPU time compared to approach 2; about the same number of latches.

2) Use MULTISET, which constructs the required collection of ADTs directly in SQL, as a select-list element in a single row
Pros: faster, 50% better than approach 1 in my test case.
Cons: the code is a little complex, and it does not support the SQL RETURNING clause in 10.1.

3) Bulk collect into a PL/SQL native collection, then copy the data into the object type collection
Pros: fastest, 67% better than approach 1 in my test case, with 50% fewer latches.
Cons: uses 100% more PGA memory; the code is a little complex and not easy to maintain.

Note: ADT = Abstract Data Type, also called a user-defined datatype.

Setup
-----

drop TYPE abelisting.book_chg_tab;
drop TYPE abelisting.book_chg_rec;

create or replace TYPE abelisting.book_chg_rec AS OBJECT
(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date
);
/

create or replace TYPE abelisting.book_chg_tab IS TABLE OF abelisting.book_chg_rec;
/

DROP TABLE abelisting.book_delete_stage_part;

CREATE TABLE abelisting.book_delete_stage_part (
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3) default SysTimestamp,
part_mi number(2) default mod(to_number(to_char(sysdate,'MI')),10)
)
PARTITION BY LIST (part_mi) (
PARTITION data_part_0 VALUES (0),
PARTITION data_part_1 VALUES (1),
PARTITION data_part_2 VALUES (2),
PARTITION data_part_3 VALUES (3),
PARTITION data_part_4 VALUES (4),
PARTITION data_part_5 VALUES (5),
PARTITION data_part_6 VALUES (6),
PARTITION data_part_7 VALUES (7),
PARTITION data_part_8 VALUES (8),
PARTITION data_part_9 VALUES (9),
PARTITION dummy_null VALUES (NULL),
PARTITION dummy_other VALUES (DEFAULT)
)
tablespace data_auto
nologging;

insert /*+ append */ into abelisting.book_delete_stage_part(listingsid,delete_flag,update_date, part_mi)
select trunc(dbms_random.value(1,20000)), mod(level,2), trunc(sysdate,'DD') + mod(level,60)/1440,
mod(level,10)
from dual
connect by level <= 30000;
commit;

exec dbms_stats.gather_table_stats(user,'BOOK_DELETE_STAGE_PART');

Code
----

*) RunStats

# multiset vs. bulk collect into object_array

declare
lt_chg abelisting.book_chg_tab;
-- type my_book_chg_tab is table of abelisting.book_delete_stage%ROWTYPE;
-- lt_chg my_book_chg_tab;
type tab_chg_array is table of abelisting.book_delete_stage_part%rowtype;
lt_chg_array tab_chg_array;
begin
runStats_pkg.rs_start;
select cast(multiset(
select book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE) from abelisting.book_delete_stage_part
) as book_chg_tab)
into lt_chg
from dual;
end;
/

declare
lt_chg abelisting.book_chg_tab;
type item_chg_array is table of abelisting.book_delete_stage_part%rowtype;
lt_chg_array item_chg_array;
begin
runStats_pkg.rs_middle;
select book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE)
bulk collect into lt_chg
from abelisting.book_delete_stage_part
;
runStats_pkg.rs_stop(5);
end;
/

Run1 ran in 24 hsecs
Run2 ran in 32 hsecs
run 1 ran in 75% of the time

Name Run1 Run2 Diff
STAT...recursive calls 11 4 -7
LATCH.active service list 0 7 7
LATCH.cache buffers chains 909 902 -7
LATCH.session idle bit 8 0 -8
STAT...Elapsed Time 26 35 9
STAT...recursive cpu usage 23 34 11
STAT...db block gets 63 51 -12
STAT...db block gets from cach 63 51 -12
STAT...session logical reads 390 377 -13
STAT...workarea memory allocat -34 -9 25
STAT...CPU used when call star 28 0 -28
STAT...CPU used by this sessio 28 0 -28
STAT...DB time 28 0 -28
LATCH.library cache lock 41 12 -29
LATCH.JS queue state obj latch 0 30 30
LATCH.library cache pin 64 24 -40
LATCH.row cache objects 69 27 -42
LATCH.shared pool 58 15 -43
LATCH.enqueues 8 68 60
LATCH.enqueue hash chains 6 69 63
LATCH.library cache 130 36 -94
STAT...undo change vector size 2,808 2,908 100
STAT...redo size 3,956 4,064 108
LATCH.simulator hash latch 2 131 129
STAT...bytes sent via SQL*Net 218 0 -218
STAT...bytes received via SQL* 760 0 -760
STAT...buffer is pinned count 29,796 0 -29,796
STAT...session uga memory max 196,500 0 -196,500
STAT...session pga memory max 6,815,744 393,216 -6,422,528
STAT...session pga memory 0 7,208,960 7,208,960

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
1,337 1,356 19 98.60%

# multiset vs. plsql_collection copy to object array

declare
lt_chg abelisting.book_chg_tab;
type tab_chg_array is table of abelisting.book_delete_stage_part%rowtype;
lt_chg_array tab_chg_array;
begin
runStats_pkg.rs_start;
select cast(multiset(
select book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE) from abelisting.book_delete_stage_part
) as book_chg_tab)
into lt_chg
from dual;
end;
/

declare
lt_chg abelisting.book_chg_tab;
type item_chg_array is table of abelisting.book_delete_stage_part%rowtype;
lt_chg_array item_chg_array;
begin
runStats_pkg.rs_middle;

select LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE, Null
bulk collect into lt_chg_array
from abelisting.book_delete_stage_part;

lt_chg := abelisting.book_chg_tab();
lt_chg.extend(lt_chg_array.count());
For i in lt_chg_array.First() .. lt_chg_array.Last()
Loop
lt_chg(i) := book_chg_rec(lt_chg_array(i).LISTINGSID,
lt_chg_array(i).QTY ,
lt_chg_array(i).DELETE_FLAG,
lt_chg_array(i).UPDATE_DATE
);
End Loop;
runStats_pkg.rs_stop(5);
end;
/

Run1 ran in 27 hsecs
Run2 ran in 17 hsecs
run 1 ran in 158.82% of the time

Name Run1 Run2 Diff
STAT...opened cursors cumulati 8 2 -6
LATCH.active service list 0 7 7
LATCH.child cursor hash table 7 0 -7
STAT...recursive calls 12 4 -8
LATCH.session idle bit 8 0 -8
STAT...Elapsed Time 29 20 -9
STAT...db block gets 63 53 -10
STAT...db block gets from cach 63 53 -10
STAT...recursive cpu usage 23 12 -11
STAT...workarea memory allocat 12 -7 -19
STAT...calls to get snapshot s 62 38 -24
STAT...table fetch by rowid 24 0 -24
STAT...index fetch by key 24 0 -24
LATCH.checkpoint queue latch 8 32 24
STAT...rows fetched via callba 24 0 -24
STAT...CPU used when call star 28 0 -28
STAT...DB time 28 0 -28
STAT...CPU used by this sessio 28 0 -28
LATCH.JS queue state obj latch 0 30 30
STAT...free buffer inspected 0 38 38
STAT...hot buffers moved to he 0 48 48
STAT...buffer is not pinned co 48 0 -48
LATCH.enqueues 10 70 60
LATCH.enqueue hash chains 8 71 63
LATCH.SQL memory manager worka 4 71 67
STAT...consistent gets 397 327 -70
LATCH.cache buffers chains 973 903 -70
STAT...consistent gets from ca 397 327 -70
STAT...consistent gets - exami 72 1 -71
STAT...session logical reads 460 380 -80
STAT...undo change vector size 2,804 2,904 100
LATCH.simulator hash latch 130 3 -127
STAT...redo size 3,952 4,120 168
LATCH.row cache objects 193 18 -175
STAT...bytes sent via SQL*Net 218 0 -218
LATCH.library cache lock 234 12 -222
LATCH.library cache pin 256 25 -231
LATCH.shared pool 258 13 -245
LATCH.library cache 559 39 -520
STAT...bytes received via SQL* 1,067 0 -1,067
STAT...buffer is pinned count 29,796 0 -29,796
STAT...session uga memory 65,464 0 -65,464
STAT...session uga memory max 193,032 68,932 -124,100
STAT...session pga memory max 6,815,744 9,895,936 3,080,192
STAT...session pga memory 65,536 16,646,144 16,580,608

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
2,691 1,330 -1,361 202.33%

*) SQL Trace

ALTER SESSION SET TRACEFILE_IDENTIFIER="yi";
EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => True);

declare
lt_chg abelisting.book_chg_tab;
begin
select cast(multiset(
select book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE) from abelisting.book_delete_stage_part
) as book_chg_tab)
into lt_chg
from dual;
end;
/

declare
lt_chg abelisting.book_chg_tab;
begin
select book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE)
bulk collect into lt_chg
from abelisting.book_delete_stage_part
;
end;
/

EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE();


tkprof report
-------------
SELECT book_chg_rec(LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE)
FROM
ABELISTING.BOOK_DELETE_STAGE_PART

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.34 0.33 0 316 0 30000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.34 0.33 0 316 0 30000

SELECT CAST(MULTISET( SELECT book_chg_rec(LISTINGSID, QTY, DELETE_FLAG,
UPDATE_DATE)
FROM
ABELISTING.BOOK_DELETE_STAGE_PART ) AS BOOK_CHG_TAB) FROM DUAL

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.21 0.22 0 316 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.21 0.22 0 316 0 1

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 35 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 FAST DUAL (cr=0 pr=0 pw=0 time=9 us)

SELECT LISTINGSID, QTY, DELETE_FLAG, UPDATE_DATE, NULL
FROM
ABELISTING.BOOK_DELETE_STAGE_PART

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.11 0.10 0 316 0 30000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.11 0.10 0 316 0 30000

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 35 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
30000 PARTITION LIST ALL PARTITION: 1 12 (cr=316 pr=0 pw=0 time=90091 us)
30000 TABLE ACCESS FULL BOOK_DELETE_STAGE_PART PARTITION: 1 12 (cr=316 pr=0 pw=0 time=30379 us)