Wednesday, January 14, 2009

Overview: asynchronous change-data processing

Goal
----
Split function :F1 from function :F2 and make the process asynchronous,
to gain scalability (performance and reliability) and availability.

:F1 = check deleted book
:F2 = upload inventory data

*) Business logic background for this example

In our 4-node RAC database, :F2 = upload inventory data runs on node 4, while
:F1 = check deleted book runs on node 1 frequently, about 50 times/second.
:F1 hits the data randomly; a page of 30 rows will probably touch 30 data blocks.
This caused a lot of Global Cache interconnect traffic from node 4 to node 1.
Based on our AWR report:

Network bandwidth:
= Network requests * db_block_size
= 218104 * 4096
= 893,353,984 (about 900 MB of data transferred in 15 minutes)

After implementing the asynchronous process, the GC transfer volume for this
function dropped about 200 times, and the :F1 check-deleted-book SLA
performance improved about 100 times.


Figure-1: Topology process diagram

*) Keywords: loose coupling, function split, partition data by function and by range/list key.

Solution
--------
Queue the change data (the deleted book's listing_id) into a staging table,
then batch-process and merge it into the target table (book_del_check) every
N minutes, to reduce the network traffic.

*) Understanding: we will lose unprocessed rows if we use the SQL statement's start date as the end-of-batch timestamp, because an older, still-open transaction can commit rows stamped before that cut-off after the batch has already run
*) Experiment and Observation:
- replacing a synonym does not require a DDL lock, but it is not safe (readers can see inconsistent data during the switch)
- in a MERGE statement, the DELETE clause shares the UPDATE's match predicate and will not delete rows inserted by the same MERGE
- querying a gv$ view may fail in a RAC database when there are not enough parallel processes on each instance
*) Interpretation: use the predicate stage_table.update_date < v$transaction.start_date
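
The MERGE observation above can be sketched as follows. This is a minimal, hypothetical example patterned on the book_del_check and book_delete_stage_v objects defined in the Setup section; the real package body is not shown in this post:

```sql
-- Hypothetical sketch: the DELETE clause shares the UPDATE's match set,
-- so it only removes rows that were matched (and updated); rows inserted
-- by the same MERGE are never deleted by the DELETE clause.
MERGE INTO book_del_check d
USING (SELECT listingsid, delete_flag, upd_day FROM book_delete_stage_v) s
ON (d.listingsid = s.listingsid)
WHEN MATCHED THEN
  UPDATE SET d.upd_dd = s.upd_day
  DELETE WHERE (s.delete_flag = 0)          -- evaluated after the UPDATE
WHEN NOT MATCHED THEN
  INSERT (d.listingsid, d.upd_dd)
  VALUES (s.listingsid, s.upd_day)
  WHERE (s.delete_flag = 1);
```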

Approach
--------

*) mark rowids into a temporary table, then process and delete by rowid from the temporary table
*) update_date < Least(SysDate,v$transaction.start_date)
*) SCN_TO_TIMESTAMP(ORA_ROWSCN) < sysdate
- http://www.ningoo.net/html/2009/about_ora_rowscn.html
*) Exchange one data partition with a swap table
*) RETURNING rows BULK COLLECT INTO a UDT (user-defined type schema object collection)
*) Many partitions on Mod(to_number(to_char(update_date,'MI')),10)
- Partition the table to spread the DDL/DML lock contention
- so N partitions on a certain interval, based on your transaction volume and DML lock frequency
- exchange each partition with its swap table
- process the swap table
- truncate the swap table
- the DDL lock conflict will not happen very often
- if a partition is locked, insert a row into the stage_proc_lock table
- if a partition is locked 4 times in a row
# process that partition with one of the single-table methods above
# delete the locked-partition log row
*) Partition view on heap tables for Oracle Standard Edition
*) Oracle Streams
*) Oracle AQ
*) Other messaging tools
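
The multi-partition approach above can be sketched in PL/SQL. This is an assumed outline of what a procedure like book_chg_sync_pkg.merge_delta_multi_part might do (the real package body is not published in this post); the stage_proc_lock retry bookkeeping is omitted for brevity:

```sql
declare
  lt_part_num dbms_sql.varchar2_table;  -- suffixes of unlocked partitions
begin
  -- collect the Mod(minute,10) partition keys NOT currently locked by DML
  -- (same idea as the gv$transaction query in the Reference section)
  select to_char(part_num) bulk collect into lt_part_num
  from (
    select level - 1 part_num from dual connect by level <= 10
    minus
    select mod(to_number(to_char(t.start_date,'MI')),10)
    from   gv$transaction t, gv$locked_object lo, sys.obj$ do
    where  do.name = 'BOOK_DELETE_STAGE_PART'
    and    do.obj# = lo.object_id
    and    lo.xidusn = t.xidusn
  );

  for i in 1 .. lt_part_num.count loop
    -- swap the quiet partition out to its dedicated swap table;
    -- the DDL is near-instant because no rows move
    execute immediate
      'alter table book_delete_stage_part exchange partition data_part_'
      || lt_part_num(i)
      || ' with table book_delete_stage_swap_' || lt_part_num(i)
      || ' without validation';

    -- process the swap table: merge its delta into book_del_check
    execute immediate
      'merge into book_del_check d'
      || ' using book_delete_stage_swap_' || lt_part_num(i) || ' s'
      || ' on (d.listingsid = s.listingsid)'
      || ' when matched then update set d.upd_dd ='
      || '   to_number(to_char(s.update_date,''DD''))'
      || ' when not matched then insert (d.listingsid, d.upd_dd)'
      || '   values (s.listingsid, to_number(to_char(s.update_date,''DD'')))';
    commit;

    -- empty the swap table for the next cycle
    execute immediate
      'truncate table book_delete_stage_swap_' || lt_part_num(i);
  end loop;
end;
/
```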

Setup
-----

-- target table
drop table book_del_check;

create table book_del_check
(
listingsid number(20),
upd_dd number(2)
, constraint book_del_check_pk primary key (listingsid )
)
organization index
including upd_dd
overflow
;

-- staging table
-- Option n+2, for small TX volume
drop table book_delete_stage;

create table book_delete_stage(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
ROWDEPENDENCIES
nologging;

create or replace view book_delete_stage_v
as
select listingsid, delete_flag, To_Number(To_Char(update_date,'DD')) upd_day, upd_time
from
(SELECT listingsid, delete_flag,update_date,
SCN_TO_TIMESTAMP(ORA_ROWSCN) upd_time,
Row_Number() OVER (PARTITION BY listingsid order by update_date desc) rn
FROM book_delete_stage b
)
where rn = 1
;


drop TYPE book_chg_tab;
drop TYPE book_chg_rec;

create or replace TYPE book_chg_rec AS OBJECT
(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3)
);
/

create or replace TYPE book_chg_tab IS TABLE OF book_chg_rec;
/

drop table gt_book_delete_stage_i;
drop table gt_book_delete_stage_d;
drop table gt_book_delete_stage_rid;

create global temporary table gt_book_delete_stage_i
(
listingsid number(20),
qty number(5)
)
ON COMMIT DELETE ROWS
;
create global temporary table gt_book_delete_stage_d
(
listingsid number(20),
qty number(5)
)
ON COMMIT DELETE ROWS
;

create global temporary table gt_book_delete_stage_rid
(
rid rowid,
listingsid number(20)
)
ON COMMIT DELETE ROWS
;

drop table stage_proc_lock;
create table stage_proc_lock
(
stage_table varchar2(30),
part_key number(2)
)
tablespace index_auto;

create index stage_proc_lock_i1 on stage_proc_lock(stage_table,part_key)
nologging tablespace index_auto;

DROP TABLE book_delete_stage_part;

CREATE TABLE book_delete_stage_part (
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date timestamp(3) default SysTimestamp,
part_mi number(2) default mod(to_number(to_char(sysdate,'MI')),10)
)
PARTITION BY LIST (part_mi) (
PARTITION data_part_0 VALUES (0),
PARTITION data_part_1 VALUES (1),
PARTITION data_part_2 VALUES (2),
PARTITION data_part_3 VALUES (3),
PARTITION data_part_4 VALUES (4),
PARTITION data_part_5 VALUES (5),
PARTITION data_part_6 VALUES (6),
PARTITION data_part_7 VALUES (7),
PARTITION data_part_8 VALUES (8),
PARTITION data_part_9 VALUES (9),
PARTITION dummy_null VALUES (NULL),
PARTITION dummy_other VALUES (DEFAULT)
)
tablespace data_auto;

DROP TABLE book_delete_stage_swap_0;
DROP TABLE book_delete_stage_swap_1;
DROP TABLE book_delete_stage_swap_2;
DROP TABLE book_delete_stage_swap_3;
DROP TABLE book_delete_stage_swap_4;
DROP TABLE book_delete_stage_swap_5;
DROP TABLE book_delete_stage_swap_6;
DROP TABLE book_delete_stage_swap_7;
DROP TABLE book_delete_stage_swap_8;
DROP TABLE book_delete_stage_swap_9;

CREATE TABLE book_delete_stage_swap_0 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_1 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_2 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_3 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_4 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_5 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_6 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_7 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_8 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;
CREATE TABLE book_delete_stage_swap_9 (listingsid number(20),qty number(5),delete_flag number(1),update_date timestamp(3),part_mi number(2)) tablespace data_auto;


-- Package
@BOOK_CHG_SYNC_PKG.SQL

-- recompile dependent packages to avoid ORA-04061
alter package book_chg_sync_pkg compile;
alter package list_trigger_pkg compile;

-- Synonym
create or replace synonym BOOK_CHG_SYNC_api for BOOK_CHG_SYNC_PKG;

grant execute on book_chg_sync_api to app_role;

-- Schedule job
exec dbms_scheduler.stop_job(job_name=> 'merge_book_del_change_delta', force => True);
exec dbms_scheduler.DROP_job ( job_name => 'merge_book_del_change_delta' );

begin
dbms_scheduler.create_job
(
job_name => 'merge_book_del_change_delta',
job_type => 'PLSQL_BLOCK',
job_action => 'begin book_chg_sync_pkg.upd_delete_track; end;',
repeat_interval => 'FREQ=MINUTELY; INTERVAL=3',
enabled => true,
comments => 'To clear all the pending delta data, merge into book_del_check table'
);
end;
/
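
To confirm the job was created and is running on schedule, a quick check against the standard user_scheduler_jobs data dictionary view:

```sql
-- scheduler catalog check (standard Oracle data dictionary view)
select job_name, enabled, state, next_run_date
from   user_scheduler_jobs
where  job_name = 'MERGE_BOOK_DEL_CHANGE_DELTA';
```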

Idea
----
-- Option n, online alter synonym pointer, no DDL lock, inconsistent data, unsafe
drop table book_delete_stage;

drop table book_delete_stage_t1;

create table book_delete_stage_t1(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
tablespace data_auto;

create or replace synonym book_delete_stage for book_delete_stage_t1;

drop table book_delete_stage_t2;

create table book_delete_stage_t2(
listingsid number(20),
qty number(5),
delete_flag number(1),
update_date date default sysdate
)
tablespace data_auto;

-- no DDL lock is held once the DML has committed.
insert into book_delete_stage
values(1,1,1,sysdate);
commit;

create or replace synonym book_delete_stage for book_delete_stage_t2;

select * from book_delete_stage_t1;
select * from book_delete_stage_t2;
select * from book_delete_stage;

-- Option n+1, online exchange table partition
-- for large TX volume
-- Goal: Partition table to spread the contention

*) Constraint: must be run serially

*) Process

- swap out the partitions whose Mod(minute,10) key is not locked by DML
- insert into the temporary table
- process
- commit
- truncate the swap table

*) Test Case

- Insert 5 rows into each partition
- Hold a DML lock on 2 partitions
- exec book_chg_sync_pkg.merge_delta_multi_part;
  (swaps out the other 8 partitions)
- execute book_chg_sync_pkg.merge_delta_rowscn;



Test case
---------
*) Function

- Delete some listings
- Wait 5 minutes, run the book_delete_check query; these listings should disappear.

- Restore some listings
- Wait 5 minutes, run the book_delete_check query; these listings should reappear.

*) Network performance - Test by Production DBA

To check
Global Cache and Enqueue Services - Workload Characteristics
-Avg global enqueue get time (ms): 0.2
-Avg global cache cr block receive time (ms): 3.7
-Avg global cache current block receive time (ms): 2.5
-Avg global cache cr block build time (ms): 0.0
-Avg global cache cr block send time (ms): 0.1
-Avg global cache current block pin time (ms): 0.0
-Avg global cache current block send time (ms): 0.1
Global Cache and Enqueue Services - Messaging Statistics
-Avg message sent queue time (ms): 50.8
-Avg message sent queue time on ksxp (ms): 1.0
-Avg message received queue time (ms): 0.0

Test code
---------

-- Option n+1, online exchange table partition
-- for large TX volume
-- Goal: Partition table to spread the contention

-- Test DELETE in MERGE. (D.upd_dd <> S.upd_day or S.delete_flag = 0).
insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,5,0,trunc(sysdate,'DD') + 0, 4);

insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,5,0,trunc(sysdate,'DD') + 4, 4);
commit;
insert into book_delete_stage_part(listingsid,qty,delete_flag,update_date, part_mi)
values(7,0,1,trunc(sysdate,'DD') + 4, 4);
commit;


select * from book_delete_stage_part;
select * from book_delete_stage_swap_7;
select * from book_del_check where listingsid = 7;

SQL> Exec book_chg_sync_pkg.merge_delta_multi_part;

select * from book_delete_stage_part;
select * from book_delete_stage_swap_7;
select * from book_del_check where listingsid = 7;

--Generate testing data
insert into book_delete_stage_part(listingsid,delete_flag,update_date, part_mi)
select trunc(dbms_random.value(1,5000)), mod(level,2), trunc(sysdate,'DD') + mod(level,60)/1440,
mod(level,10)
from dual
connect by level <= 7000;

SQL> Exec book_chg_sync_pkg.merge_delta_multi_part;


Reference
---------

-- Exchange partition

--DDL With the WAIT Option, number of seconds, 11g new
alter session set DDL_LOCK_TIMEOUT=120;

-- ORA-00054: resource busy and acquire with NOWAIT specified
ALTER TABLE book_delete_stage_part
EXCHANGE PARTITION data_part_5
WITH TABLE book_delete_stage_swap_5
INCLUDING INDEXES
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES
;

--DDL With the WAIT Option, number of seconds, 11g new
alter session set DDL_LOCK_TIMEOUT=0;

-- check TX view, 1000 LIO
SELECT to_char(t.start_date,'MI'), Min(t.start_date), count(*)
FROM gv$transaction t
,gv$locked_object lo
-- FROM v$transaction t
-- ,v$locked_object lo
,sys.obj$ do
WHERE do.NAME = 'BOOK_DELETE_STAGE'
AND do.obj# = lo.object_id
AND lo.xidusn = t.xidusn
group by to_char(t.start_date,'MI');

-- Get partition number, skip locked partition
-- check TX view, 1000 LIO
select Trim(to_char(part_num)) part_num bulk collect into lt_part_num from
(
select (level - 1) part_num from dual connect by level <= 10
minus
SELECT Mod(to_number(to_char(t.start_date,'MI')),10) mi --, count(*)
FROM gv$transaction t
,gv$locked_object lo
-- FROM v$transaction t
-- ,v$locked_object lo
,sys.obj$ do
WHERE do.NAME = 'BOOK_DELETE_STAGE_PART'
AND do.obj# = lo.object_id
AND lo.xidusn = t.xidusn
group by Mod(to_number(to_char(t.start_date,'MI')),10)
);

*) AWR report RAC section analysis

A 15-minute interval in the morning.

The :F1 book_delete_check query used 90% of the network resource, making every other database request that competes for network resources (Global Cache transfer) worse and worse.
Begin Snap: 13-Nov-08 10:45:20
End Snap: 13-Nov-08 11:00:10

Our practical target is to reduce the network usage 10 to 30 times by applying the BASE solution.
The DBA can explain why in the story-gathering meeting.

Network requests:
= (CR Blocks Received + Current Blocks Received) + messages sent + messages received
= (96884 + 12168) * 2
= 218104

Note: CR = consistent-read block images; Current = current-mode data blocks shipped over the interconnect

Network bandwidth:
= Network requests * db_block_size
= 218104 * 4096
= 893,353,984 (about 900 MB transferred in 15 minutes)

Avg book_delete_check query wait time on network delay:
= ( CR Blocks Received * Avg global cache cr block receive time ms
+ Current Blocks Received * Avg global cache current block receive time ms
+ Global Cache Grant messages * Avg message sent queue time ms
) / Executions / 1000 ms
= (96884 * 60.6 + 12168 * 72.8 + (96884 + 12168) * 201) /45211/1000
= 0.634280436 Seconds
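
The calculations in this section can be reproduced straight from SQL*Plus, plugging in the AWR numbers quoted above:

```sql
-- sanity-check the AWR arithmetic from this section
select (96884 + 12168) * 2                                   network_requests,
       (96884 + 12168) * 2 * 4096                            network_bytes,
       round((96884*60.6 + 12168*72.8 + (96884+12168)*201)
             / 45211 / 1000, 9)                              avg_wait_sec
from dual;
```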
