Friday, December 08, 2006

Database developers and Database API

I came up with this idea while we were interviewing a new 6-month Oracle DBA contractor to back-fill for Antoinette while she is on maternity leave.



The idea is that we could take this opportunity to bring a new database developer onto our team instead.

We would pay the contractor six months of salary but get only about three months of actual work, since they need three months just to get up to speed on our environment (on top of the higher cost of hiring a contractor in the first place).

It would be better to give this opportunity to a new permanent employee as a database developer, who can learn our business and then contribute to the company over the long term.



Why do we need a Database Developer and a Database API?



Today some GUI logic lives in the middle tier, but most of the data logic is still in PL/SQL (think of eBay here).



Performance
Performance is directly connected to stability and functionality: when a database node is brought to its knees and hangs, you can do nothing, and it can trigger locks and expose latent bugs.



Dodgy SQL examples.



1) Filtering data on the Java side, treating the database as a black-box data store.
We should always filter data on the database side; I knew that 10 years ago as a database developer.

SELECT I.*
FROM ABELISTING.XBNIDENTS I
WHERE I.PROVIDER <> 44



What we did before:

SELECT I.* FROM ABELISTING.XBNIDENTS I;

All rows were returned from the database, and the PROVIDER <> 44 filter was then applied on the Java side.



2) Cursor fetches
What we are doing:

• 1. Get the max and min xbnidentsid:

SELECT MIN(XBNIDENTSID) AS MIN, MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS

• 2. Split the range into many 100-row chunks by calculating each xbnidentsid sub-range.

• 3. Run the SELECT below many times; each run gets 100 rows:

SELECT I.*, R.RECOMMENDATIONSET
FROM ABELISTING.XBNIDENTS I
LEFT OUTER JOIN LIBRARYTHING.ISBNRECOMMENDATIONS R
ON I.XBNIDENTIFIER = R.ISBN
WHERE I.PROVIDER <> 44
AND I.XBNIDENTTYPECODE = 1
AND I.XBNIDENTSID < #xbnidentsidMax#
AND I.XBNIDENTSID >= #xbnidentsidMin#



Each execution is a nested-loop join, and overall it costs about 5 times more resources and latches.

Why not run one simple SQL SELECT, hash-join the two tables, and use bulk cursor prefetch?

Row prefetching has since been implemented by the X-Team.
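Here is a minimal sketch of the single-statement alternative (the USE_HASH hint and the 100-row fetch size are illustrative assumptions, not what was actually deployed):

SELECT /*+ USE_HASH(R) */ I.*, R.RECOMMENDATIONSET
FROM ABELISTING.XBNIDENTS I
LEFT OUTER JOIN LIBRARYTHING.ISBNRECOMMENDATIONS R
ON I.XBNIDENTIFIER = R.ISBN
WHERE I.PROVIDER <> 44
AND I.XBNIDENTTYPECODE = 1;

Then let the client fetch in batches of 100 rows (array/bulk prefetch) instead of re-running the statement once per key range.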



2.1) Turning a simple UPDATE into two steps.
1) Select the PK_ID values by some predicates and cache them on the Java server side, possibly running the SQL many times and returning 100 rows each time.

2) Process the rows one by one, by PK_ID.



Why not run a single UPDATE statement here?
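As a hypothetical sketch (the predicate and the SET clause are made up, since the real per-row processing is not shown here), the two steps collapse into one statement and one round trip:

UPDATE abelisting.xbnidents I
SET I.xbnidenttypecode = 2
WHERE I.provider <> 44
AND I.xbnidenttypecode = 1;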



3) Treating the database as a black box (or as a human brain that will work things out for us). In reality we control how the database works, and to do that we need:
• knowledge of what the database is capable of doing

• knowledge of the intricacies of SQL

• a solid understanding of the goal – what the question actually is



Here is an example:



This SELECT takes 60 seconds.



SELECT MIN(XBNIDENTSID) AS MIN, MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS;

---------------------------------------------------------------------------------------
| ID | Operation              | NAME         | ROWS  | Bytes | COST (%CPU)| TIME     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |              |     1 |     6 |  4564   (1)| 00:00:51 |
|  1 |  SORT AGGREGATE        |              |     1 |     6 |            |          |
|  2 |   INDEX FAST FULL SCAN | XBNIDENTS_PK | 6888K |   39M |  4564   (1)| 00:00:51 |
---------------------------------------------------------------------------------------



By simply breaking it into two SELECTs (each of which can use an INDEX FULL SCAN (MIN/MAX)), it takes just 0.01 seconds:



SELECT MIN(XBNIDENTSID) AS MIN
FROM ABELISTING.XBNIDENTS
UNION ALL
SELECT MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS;

----------------------------------------------------------------------------------------------
| ID | Operation                    | NAME         | ROWS  | Bytes | COST (%CPU)| TIME     |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |              |     2 |    12 |     6  (50)| 00:00:01 |
|  1 |  UNION-ALL                   |              |       |       |            |          |
|  2 |   SORT AGGREGATE             |              |     1 |     6 |            |          |
|  3 |    INDEX FULL SCAN (MIN/MAX) | XBNIDENTS_PK | 6888K |   39M |     3   (0)| 00:00:01 |
|  4 |   SORT AGGREGATE             |              |     1 |     6 |            |          |
|  5 |    INDEX FULL SCAN (MIN/MAX) | XBNIDENTS_PK | 6888K |   39M |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------



Stability
Relational databases and stored procedures have been around for more than 20 years and are proven stable. Building our core data-processing logic inside the database will keep it stable.



More components mean more complexity, more code and more bugs, and more maintenance work; the growth is not linear, it is exponential.



It's all about database APIs. The applications don't do table-level (select/insert/update/delete) stuff; the apps don't even really *know* about tables.

On top of the database schema we have an API that does the data stuff.
In the application we call this API but keep our own application logic as well
(only application logic is in the application; data logic is -- well, right where it belongs -- next to the data, waiting for the 'next great programming paradigm to come along').



• make software components modular
• software modules must carry out a very specific task (and be very efficient at carrying it out)
• each software module should be loosely coupled (to limit dependencies)
• It removes the need for triggers, as all inserts, updates and deletes are wrapped in APIs. Instead of writing triggers you simply add the code into the API. I loathe triggers.
• It prevents people who don't understand SQL from writing stupid queries. All SQL would be written by PL/SQL developers or DBAs, reducing the likelihood of dodgy queries.
• The underlying structure of the database is hidden from the users, so I can make structural changes without client applications being changed. The API implementation can be altered and tuned without affecting the client application.
• The same APIs are available to all applications that access the database. No duplication of effort.

Flexibility and elegant programming
• PL/SQL is the most efficient language for data manipulation

– PL/SQL datatypes are SQL datatypes (collections/arrays, nested object tables, …) – no conversions

– Tight coupling between the language and the database (cursor FOR loops, for example)

– We don't need to open/close queries or manage lots of other plumbing ourselves

– We are somewhat protected from changes in the database

– You get sub-optimizations like cursor caching done for you
(native prefetch of 100 rows with FOR lc_cursor IN (SELECT * FROM table_name) LOOP)



• Don't forget the dependency mechanism you get! You can drop or add columns behind a SELECT * FROM table_name, and deploy at any time after QA: no build, no server restart, and table structure (physical database model) changes stay transparent to the callers. A small sketch of this follows.
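A minimal sketch of that dependency mechanism, using throwaway names (t_demo and show_t_demo are illustrative, not objects from our schema):

create table t_demo ( a number, b varchar2(80) );

create or replace procedure show_t_demo
as
begin
  for r in ( select * from t_demo )   -- the dependency on T_DEMO is tracked for us
  loop
    dbms_output.put_line( r.a || ' ' || r.b );
  end loop;
end;
/

alter table t_demo add ( c date );    -- structural change to the table

select object_name, status from user_objects where object_name = 'SHOW_T_DEMO';
-- the procedure is now INVALID, and Oracle recompiles it automatically the next
-- time it is called; no application build or restart is needed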

• In PL/SQL it is very hard to not use bind variables correctly (see the sketch after this list).

• In Java it is trivially easy to do it wrong – and it usually is.

• In PL/SQL – parse once, execute many is inherent in the environment.

• In Java, it must be coded – and it hardly ever is.

• In PL/SQL – “select *” can be safe (might not be the best approach, but it is “safe”).

• In Java – “select *” is deadly – but frequently used.

• PL/SQL can survive database schema changes such as a varchar2(80) -> varchar2(255) change.

• Java cannot; you will have to inspect all the code and change variable sizes.

• PL/SQL has that nice built-in dependency mechanism to avoid the “I didn’t know you used that” syndrome.

• Java does not.

• … and the list goes on …
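As promised above, a minimal sketch of why binds are automatic in PL/SQL (EMP and ename are the same demo table used further down; the procedure name is made up):

create or replace procedure get_emps( p_ename in varchar2 )
as
begin
  for x in ( select * from emp where ename like p_ename )  -- p_ename becomes a bind
  loop                                                     -- variable automatically
    null;  -- process x here
  end loop;
end;
/

In Java the same statement is only bound if the developer remembers to use a PreparedStatement with a ? placeholder instead of concatenating the value into the SQL string.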

• Embed the SQL complexity in stored procedures and give the Java developer a simple interface.

For example, book delete check:

Packaged stored procedure:

PROCEDURE list_df(p_cursor IN OUT listidcurtyp, p_string VARCHAR2)
AS
  i PLS_INTEGER;
BEGIN
  OPEN p_cursor FOR
  WITH sq
  AS
  (
    SELECT SUBSTR(x, INSTR(x,',',1,LEVEL)+1, INSTR(x,',',1,LEVEL+1) - INSTR(x,',',1,LEVEL) -1) token
    FROM (SELECT ','||l_str||',' x, l_str FROM (SELECT p_string l_str FROM dual))
    CONNECT BY LEVEL <= LENGTH(l_str)-LENGTH(REPLACE(l_str,',',''))+1
  )
  SELECT A.listingsid listingsid
  FROM abelisting.listings A, sq
  WHERE A.listingsid = TO_NUMBER(sq.token)
  AND A.rowdf = 'T';
END;



Java side:

SELECT COLUMN_VALUE listingsid
FROM TABLE(abelisting.Sp_Sql_Query.list_df_tabf(:l_string));
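The Java side calls a table-function wrapper, list_df_tabf, which is not shown above. A hypothetical sketch of such a wrapper (the collection type abelisting.num_tab and the function body are my assumptions, not the real Sp_Sql_Query code):

CREATE OR REPLACE TYPE abelisting.num_tab AS TABLE OF NUMBER;
/

FUNCTION list_df_tabf( p_string VARCHAR2 ) RETURN abelisting.num_tab PIPELINED
AS
  l_cur listidcurtyp;
  l_id  abelisting.listings.listingsid%TYPE;
BEGIN
  list_df( l_cur, p_string );        -- reuse the ref-cursor procedure above
  LOOP
    FETCH l_cur INTO l_id;
    EXIT WHEN l_cur%NOTFOUND;
    PIPE ROW ( l_id );               -- stream each listingsid back to SQL
  END LOOP;
  CLOSE l_cur;
  RETURN;
END;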



Less code: simple is beautiful!
Beat me if you can write less Java code than the PL/SQL stored procedures below to reach the same goal; this is just comparing simplicity, without even mentioning performance yet:

1) For Loop
CREATE OR REPLACE
PROCEDURE process_data( p_inputs IN VARCHAR2 )
AS
BEGIN
  FOR x IN ( SELECT * FROM EMP WHERE ename LIKE p_inputs )
  LOOP
    Process( X );
  END LOOP;
END;



2) Enforce read/write consistency: UPDATE and DELETE in one transaction
PROCEDURE clear_pending_list_qty_forall
AS
  TYPE listingsidList IS TABLE OF abelisting.list_qty_offset.listingsid%TYPE;
  lt_listingsid listingsidList;
  TYPE pendingList IS TABLE OF abelisting.list_qty_offset.pending%TYPE;
  lt_pending pendingList;
  l_cnt NUMBER;
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE('ListLoad', 'Clear pending offset');

  -- Clear all the pending ordered-book quantities into listings.
  -- Using a RETURNING collection to hold the processed rows guarantees a consistent
  -- image of listings.qty: we never delete extra rows between the update and the delete.

  -- Option 2 is DELETE with a RETURNING list, then a bulk FORALL UPDATE; 2 times faster.
  DELETE FROM abelisting.list_qty_offset
  RETURNING listingsid, pending BULK COLLECT INTO lt_listingsid, lt_pending;

  DEBUG.f('delete %s rows on abelisting.list_qty_offset', SQL%ROWCOUNT);

  FORALL i IN 1 .. lt_listingsid.COUNT
    UPDATE ABELISTING.listings SET quantity = quantity + lt_pending(i)
    WHERE LISTINGSID = lt_listingsid(i);

  l_cnt := SQL%ROWCOUNT;
  DEBUG.f('updated qty on listing %s rows', l_cnt);
  COMMIT;

  /*
  --Err logging
  FOR i IN 1 .. lt_listingsid.COUNT
  LOOP
    DBMS_OUTPUT.PUT_LINE( 'updated listingsid = ' || lt_listingsid(i)||', qty = '||lt_pending(i) );
    DEBUG.f( 'updated listingsid = ' || lt_listingsid(i)||', qty = '||lt_pending(i) );
  END LOOP;
  */

EXCEPTION
  WHEN OTHERS THEN
    DEBUG.f( SUBSTR(TO_CHAR(SQLCODE)||': '||SQLERRM,1,100) );
    RAISE;
END;



3) INSERT … SELECT from an external table (flat file): data loading from a flat file
PROCEDURE isbn_recomd_full_load(p_add_rows IN OUT NUMBER)
IS
  l_cnt NUMBER;
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE('LB', 'isbn_recomd_full_load');

  DEBUG.f('truncate table');
  EXECUTE IMMEDIATE 'alter session set optimizer_mode=all_rows';
  EXECUTE IMMEDIATE 'alter session set skip_unusable_indexes=true';
  EXECUTE IMMEDIATE 'TRUNCATE TABLE librarything.isbnRecommendations';

  DEBUG.f('unusable index');
  --execute immediate 'alter index librarything.isbnRecommendations_isbn unusable';

  DEBUG.f('INSERT +APPEND direct load');
  INSERT /*+ APPEND */ INTO librarything.isbnRecommendations
    (isbn, recommendationSet)
  SELECT isbn, recommendationSet
  FROM librarything.isbnRecommendations_ext; -- the external table

  l_cnt := SQL%RowCount;
  DEBUG.f('INSERTed %s rows', l_cnt);
  COMMIT;
  p_add_rows := l_cnt;

  DEBUG.f('rebuild index on isbn column');
  ---execute immediate 'alter index librarything.isbnRecommendations_isbn rebuild online';
  DEBUG.f('rebuild index done');

  DBMS_APPLICATION_INFO.SET_MODULE('LB', 'isbn_recomd_full_load done');

EXCEPTION
  WHEN OTHERS THEN
    DEBUG.f( SUBSTR(TO_CHAR(SQLCODE)||': '||SQLERRM,1,100) );
    RAISE;
END;



My Utopian Development Environment
There is no SELECT/INSERT/UPDATE/DELETE/MERGE in the Java code. The database access layer only calls a packaged database stored procedure for each specific goal.
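A minimal sketch of what the client-facing package specification might look like (the grouping into one Sp_Sql_Query package is my assumption; the individual procedures are the ones shown elsewhere in this post):

CREATE OR REPLACE PACKAGE abelisting.Sp_Sql_Query
AS
  TYPE listidcurtyp IS REF CURSOR;

  -- book delete check (ref-cursor flavour; a table-function wrapper can sit beside it)
  PROCEDURE list_df( p_cursor IN OUT listidcurtyp, p_string VARCHAR2 );

  -- clear the pending ordered-book quantities into listings
  PROCEDURE clear_pending_list_qty_forall;

  -- full reload of ISBN recommendations from the external table
  PROCEDURE isbn_recomd_full_load( p_add_rows IN OUT NUMBER );
END Sp_Sql_Query;
/

The Java access layer sees only these entry points; it never references the underlying tables.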



Communication
Efficiency is achieved through specialization: we split the roles into Java developer, database developer and backend developer. Java handles file processing and UI logic; database processing logic sits inside the database, wrapped in stored procedures.

I know some backend developers can improve on the PL/SQL performance by another 8 times by calling Pro*C or OCI C code (using SQL with array binds of 100 for key lookup and insert), when raw speed matters.

A general knowledge of what the other parties are doing is also necessary and important.



Data modeling
Designing a sound data model is one of the most critical stages in the development of a computerized information system.



We're pursuing a data model that is complete, free of redundancy, enforces the business rules, keeps data reusable, and stays flexible, stable and elegant; it also has to serve communication and integration, and balance the conflicting objectives.



Please check the attached email to see how the problem could be easily solved by a well-fitted data model design, and how complex it becomes when you tackle the issue from the Java world.

For a database developer, the data model is part of their life.

Friday, November 10, 2006

Cascade DELETE

# cascade delete -- that is a database feature.
# validate foreign keys -- all of that should be (has to be) RI in the database as well.
# cascade update -- well, won't even go there
(it means you have an ill-designed model in the first place, since you
are updating a primary key -- meaning it is not really a primary key at all).
# for simple ins/upd/del/lck -- it is just as easy to code it into the Data API.

I don't like it (cascade delete) personally -- I'd rather get an error when I delete a parent
that has child records. It doesn't "seem" right to have that stuff just disappear.

It is useful in limited cases:
if and only if EVERY time you delete from the parent you want to delete from the children,
ON DELETE CASCADE is OK.
But if there are times when that is not the expected behavior -- such as customer and orders -- don't do that;
use a stored procedure (Database API). A sketch follows.
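A minimal sketch of that stored-procedure approach (the customer/orders tables and the procedure name are hypothetical, just to illustrate the idea):

PROCEDURE delete_customer( p_customer_id IN NUMBER )
AS
  l_open_orders NUMBER;
BEGIN
  SELECT COUNT(*) INTO l_open_orders
  FROM orders
  WHERE customer_id = p_customer_id;

  IF l_open_orders > 0 THEN
    -- refuse to silently cascade; force the caller to deal with the orders first
    RAISE_APPLICATION_ERROR( -20001,
      'customer has ' || l_open_orders || ' orders, delete refused' );
  END IF;

  DELETE FROM customers WHERE customer_id = p_customer_id;
END;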

All access would be provided by PL/SQL APIs. I like this because:
it removes the need for triggers, as all inserts, updates and deletes are wrapped in APIs. Instead of writing triggers you simply add the code into the API. I loathe triggers.

And lastly, make sure to index those foreign keys, or either approach will be slow, slow, slow.


You can only specify ON DELETE CASCADE when creating the constraint
(so to add it to an existing constraint, you drop and recreate the constraint).

Cascade DELETE example:

drop table t1;
create table t1 ( x int primary key );
create table t2 ( y int primary key, x references t1 on delete cascade );
create table t3 ( y references t2 on delete cascade, x references t1 );

insert into t1 values ( 1 );
insert into t2 values ( 1, 1 );
insert into t3 values ( 1, 1 );

delete from t1;

select * from t1;
select * from t2;
select * from t3;
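And, as noted above, the foreign key columns should be indexed so the cascading deletes don't full-scan (and lock) the child tables. For this example that would be (the index names are mine):

create index t2_x_idx on t2 ( x );
create index t3_y_idx on t3 ( y );
create index t3_x_idx on t3 ( x );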

The formal way to add a cascade-delete constraint:

ALTER table c
add constraint c_fk_parent_table
foreign key(x) references p(x)
on delete cascade;

Wednesday, November 01, 2006

General Knowledge

Ah, I just love trying new things and treat every new feature like a treasure. I remember in the fall of 2001 I rolled out Oracle 9i on HP-UX for the world's second-largest long-distance phone company.
My boss (the chief architect of Borland JBuilder) always likes to ask: when are we switching to MySQL? Or MS-SQL 2005 would do too (he spends all day promoting the superiority of SQL Server 2005). And a couple of days ago he brought up EnterpriseDB (based on Postgres).
What is someone like me, who only knows Oracle, supposed to do?!! (I suggested to my boss that we could try DB2 instead; at least I hold that certification.)
Fortunately, recent statistics show Oracle's market share is over 50%.

Oracle Data Guard and 11g database

I read one of Fenng's posts on an Oracle Data Guard practice and got confused by the control file part.
It seems I am luckier than you: I have never had to think about copying the standby database's control file.
Our configuration is a 4-node RAC on 10.1.0.4, with 1 real-time standby and 1 read-only standby for reporting, refreshed every night. It has been running for over a year with no problems at all. Unfortunately, logical standby does not support wide tables (100+ columns); once we upgrade to 10.2.0.3 we will add a logical standby.

In my experience Oracle is stable overall, but there are always a few inexplicable bugs. Our motto is: "work around it, don't fight it."

I have read some blogs discussing the next version of the Oracle database.
Oracle db 11g partitioning
(a. time-dimension partitioning, and
b. partitioning a child table based on the master table's partition columns)
and data compression sound fascinating. Hee hee.

Friday, October 27, 2006

Data API -- Data access and process interface

Some other reasons to implement a Database API:
I) Bind variables, naturally and implicitly.
II) Set-based processing, with a 100-row array prefetch by default in cursor FOR loops.
III) debug.f: to instrument the code (a small sketch follows).
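A minimal sketch of the debug.f instrumentation (the procedure touch_listing is made up; DEBUG.f is the same in-house instrumentation routine used in the procedures above):

PROCEDURE touch_listing( p_listingsid IN NUMBER )
AS
BEGIN
  DEBUG.f( 'touch_listing called with listingsid = %s', p_listingsid );

  UPDATE abelisting.listings
  SET quantity = quantity
  WHERE listingsid = p_listingsid;

  DEBUG.f( 'updated %s rows', SQL%ROWCOUNT );
END;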

Compulsory PL/SQL APIs
Client application developers would have no direct access to tables, not even for queries. All access would be provided by PL/SQL APIs (including database views).

I like this because:

  • It removes the need for triggers, as all inserts, updates and deletes are wrapped in APIs. Instead of writing triggers you simply add the code into the API. I loathe triggers.
  • It prevents people who don't understand SQL from writing stupid queries.
    All SQL would be written by PL/SQL developers or DBAs, reducing the likelihood of dodgy queries.
  • The underlying structure of the database is hidden from the users, so I can make structural changes without client applications being changed. The API implementation can be altered and tuned without affecting the client application.
  • The same APIs are available to all applications that access the database. No duplication of effort.
Anything that generates SQL on-the-fly worries me, not just Java. I want to be able to cut and paste the SQL, not try and capture or trace it during a run.

Our concept is "we build the data API in the database, and you call the data API".
The database API has been layered over by different UI technologies over time.

It's all about APIs. The applications don't do table-level (select/insert/update/delete) stuff; the apps don't even really *know* about tables.

On top of the database schema we have an API that does the data stuff.
In the application we call this API but keep our own application logic as well
(only application logic is in the application; data logic is -- well, right where it belongs -- next to the data, waiting for the 'next great programming paradigm to come along').



  1. make software components modular
  2. software modules must carry out a very specific task
    (and be very efficient at carrying it out)
  3. each software module should be loosely coupled (to limit dependencies)

木匠's (the Carpenter's) shop is open for business!

Oracle technical section