Friday, December 08, 2006

Database developers and the database API

I came up with this idea while we were interviewing a six-month Oracle DBA contractor to backfill for Antoinette during her maternity leave.



The idea is that we could take this opportunity to bring a new database developer onto the team instead.

We would pay the contractor six months' salary but get only about three months of actual work, since it takes three months to get up to speed on our environment (on top of the higher cost of hiring a contractor in the first place).

It is better to give this opportunity to a new permanent employee, a database developer who can learn our business and then contribute to the company over the long term.



Why do we need a database developer and a database API?



Today some GUI logic is in the middle tier, but most of the data logic is still in PL/SQL (let's take eBay as the example here).



Performance
Performance is directly connected to stability and functionality: when a database node is run into the ground and hangs, you can do nothing, and the overload can trigger latent locks and bugs.



Dodgy SQL examples.



1) Filtering data on the Java side, treating the database as a black data store.
We should always filter data on the database side; I knew that ten years ago as a database developer. The right way:

SELECT I.*
FROM ABELISTING.XBNIDENTS I
WHERE I.PROVIDER <> 44



What we did before:

SELECT I.* FROM ABELISTING.XBNIDENTS I;

Fetch every row from the database, then throw away the PROVIDER = 44 rows on the Java side.



2) Cursor fetches
What we are doing:

1. Get the MIN and MAX XBNIDENTSID:

SELECT MIN(XBNIDENTSID) AS MIN, MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS

2. Split that range into many 100-row chunks by calculating XBNIDENTSID boundaries.

3. Run the SELECT below many times, each run fetching 100 rows:

SELECT I.*, R.RECOMMENDATIONSET
FROM ABELISTING.XBNIDENTS I
LEFT OUTER JOIN LIBRARYTHING.ISBNRECOMMENDATIONS R
ON I.XBNIDENTIFIER = R.ISBN
WHERE I.PROVIDER <> 44
AND I.XBNIDENTTYPECODE = 1
AND I.XBNIDENTSID < #xbnidentsidMax#
AND I.XBNIDENTSID >= #xbnidentsidMin#



Each chunked run is a nested loop join, costing about five times the resources and latches.

Why not run one simple SQL SELECT that hash joins the two tables and uses bulk cursor prefetch? (The row-prefetching approach has since been handed to the X-Team to implement.)
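As a rough sketch of that alternative (the processing body is a placeholder; the optimizer would normally choose the hash join on its own), the whole chunked loop collapses into one query, and a PL/SQL cursor FOR loop gives the 100-row prefetch for free:

BEGIN
  FOR rec IN ( SELECT I.XBNIDENTSID, R.RECOMMENDATIONSET
               FROM ABELISTING.XBNIDENTS I
               LEFT OUTER JOIN LIBRARYTHING.ISBNRECOMMENDATIONS R
                 ON I.XBNIDENTIFIER = R.ISBN
               WHERE I.PROVIDER <> 44
                 AND I.XBNIDENTTYPECODE = 1 )
  LOOP
    NULL;  -- per-row processing goes here; 10g array-fetches 100 rows per round trip
  END LOOP;
END;
/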



2.1) Changing a simple UPDATE into two steps.
1) Select the PK_IDs matching some predicates and cache them on the Java server side, possibly running the SQL many times, returning 100 rows each time.

2) Process the rows one by one, by PK_ID.

Why not run a single UPDATE statement here?
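A hedged sketch of the set-based alternative; the table, flag column and predicate are invented here, since the original statement isn't shown:

-- One statement does the filtering and the change: no PK_ID caching in Java,
-- no row-by-row round trips, and the read and the write are atomic.
UPDATE abelisting.some_table t          -- hypothetical table
SET    t.processed_flag = 'Y'           -- hypothetical column
WHERE  t.processed_flag = 'N'           -- the "some predicates" from step 1
AND    t.created_date   < SYSDATE - 1;
COMMIT;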



3) Treating the database as a black box, or as a human brain. In reality we control how the database works, and that requires:
• knowledge of what the database is capable of doing

• knowledge of all the intricacies of SQL

• a solid understanding of the goal: what the question actually is



Here is an example:



This SELECT takes 60 seconds.



SELECT MIN(XBNIDENTSID) AS MIN, MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS;

------------------------------------------------------------------------------------
| ID | Operation            | NAME         | ROWS  | Bytes | COST (%CPU)| TIME     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |              |     1 |     6 |  4564   (1)| 00:00:51 |
|  1 | SORT AGGREGATE       |              |     1 |     6 |            |          |
|  2 | INDEX FAST FULL SCAN | XBNIDENTS_PK | 6888K |   39M |  4564   (1)| 00:00:51 |
------------------------------------------------------------------------------------



By simply breaking it into two SELECTs, it takes just 0.01 seconds:



SELECT MIN(XBNIDENTSID) AS MIN
FROM ABELISTING.XBNIDENTS
UNION ALL
SELECT MAX(XBNIDENTSID) AS MAX
FROM ABELISTING.XBNIDENTS;

-----------------------------------------------------------------------------------------
| ID | Operation                 | NAME         | ROWS  | Bytes | COST (%CPU)| TIME     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |              |     2 |    12 |     6  (50)| 00:00:01 |
|  1 | UNION-ALL                 |              |       |       |            |          |
|  2 | SORT AGGREGATE            |              |     1 |     6 |            |          |
|  3 | INDEX FULL SCAN (MIN/MAX) | XBNIDENTS_PK | 6888K |   39M |     3   (0)| 00:00:01 |
|  4 | SORT AGGREGATE            |              |     1 |     6 |            |          |
|  5 | INDEX FULL SCAN (MIN/MAX) | XBNIDENTS_PK | 6888K |   39M |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------



Stability
Relational databases and stored procedures have been around for more than 20 years and are proven stable. Building our core data-processing logic inside the database will be stable too.



More components mean more complexity, more coding and bugs, and more maintenance work; and the growth is not linear, it's exponential.



It's all about database APIs. The applications don't do table-level (select/insert/update/delete) work; the apps don't even really *know* about tables.

On top of the database schema we have an API that does the data work.
In the application we call this API but keep our own application logic as well
(only application logic lives in the application; data logic sits, well, right where it belongs, next to the data, waiting for the next great programming paradigm to come along).



• Make software components modular.
• Each software module must carry out a very specific task (and be very efficient at carrying it out).
• Each software module should be loosely coupled, to limit dependencies.
• It removes the need for triggers, because all inserts, updates and deletes are wrapped in APIs. Instead of writing a trigger you simply add the code to the API. I loathe triggers.
• It prevents people who don't understand SQL from writing stupid queries. All SQL would be written by PL/SQL developers or DBAs, reducing the likelihood of dodgy queries.
• The underlying structure of the database is hidden from the users, so I can make structural changes without client applications being changed. The API implementation can be altered and tuned without affecting the client application.
• The same APIs are available to all applications that access the database. No duplication of effort.
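As a minimal sketch of such an API (the package name and procedures are invented for illustration; abelisting.listings and its columns are the ones used later in this post):

CREATE OR REPLACE PACKAGE listings_api AS
  -- hypothetical table API: callers never touch abelisting.listings directly
  PROCEDURE add_listing    (p_listingsid IN NUMBER, p_quantity IN NUMBER);
  PROCEDURE set_quantity   (p_listingsid IN NUMBER, p_quantity IN NUMBER);
  PROCEDURE remove_listing (p_listingsid IN NUMBER);
END listings_api;
/
CREATE OR REPLACE PACKAGE BODY listings_api AS
  PROCEDURE add_listing (p_listingsid IN NUMBER, p_quantity IN NUMBER) IS
  BEGIN
    -- logic that would otherwise live in a trigger goes right here
    INSERT INTO abelisting.listings (listingsid, quantity)
    VALUES (p_listingsid, p_quantity);
  END add_listing;

  PROCEDURE set_quantity (p_listingsid IN NUMBER, p_quantity IN NUMBER) IS
  BEGIN
    UPDATE abelisting.listings SET quantity = p_quantity
    WHERE listingsid = p_listingsid;
  END set_quantity;

  PROCEDURE remove_listing (p_listingsid IN NUMBER) IS
  BEGIN
    DELETE FROM abelisting.listings WHERE listingsid = p_listingsid;
  END remove_listing;
END listings_api;
/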
Flexibility and elegant programming
• PL/SQL is the most efficient language for data manipulation

– PL/SQL datatypes are SQL datatypes (collections, arrays, nested object tables, …) – no conversions

– Tight coupling between the language and the database (cursor FOR loops, for example)

– We don’t need to open and close queries or manage lots of plumbing ourselves

– We are somewhat protected from changes in the database

– Sub-optimizations like cursor caching are done for you
(native prefetch of 100 rows, with FOR lc_cursor IN (select * from table_name) LOOP)
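A tiny sketch of that coupling, using the EMP demo table that appears again later in this post:

DECLARE
  l_ename emp.ename%TYPE := 'S%';   -- variable inherits the SQL column's datatype
BEGIN
  -- rec is implicitly a %ROWTYPE record: no open/close, no datatype conversions,
  -- and the runtime array-fetches 100 rows per round trip behind the scenes
  FOR rec IN ( SELECT * FROM emp WHERE ename LIKE l_ename )
  LOOP
    NULL;  -- per-row work here
  END LOOP;
END;
/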



• Don’t forget the dependency mechanism you get!
Drop or add columns and SELECT * FROM table_name keeps working. Deploy at any time after QA, with no build and no server restart; changes to the table structure (the physical database model) are transparent.
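You can see this dependency mechanism in the data dictionary; a quick sketch (assuming the abelisting.listings table used elsewhere in this post):

-- Which stored code depends on the LISTINGS table?
SELECT name, type
FROM   all_dependencies
WHERE  referenced_owner = 'ABELISTING'
AND    referenced_name  = 'LISTINGS';
-- After an ALTER TABLE, dependent PL/SQL is invalidated and recompiled
-- automatically on its next call; Java has nothing comparable.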

• In PL/SQL it is very hard not to use bind variables correctly (see the sketch after this list).

• In Java it is trivially easy to do it wrong – and it usually is.

• In PL/SQL, parse once / execute many is inherent in the environment.

• In Java, it must be coded – and it hardly ever is.

• In PL/SQL, “select *” can be safe (it might not be the best approach, but it is “safe”).

• In Java, “select *” is deadly – but frequently used.

• PL/SQL can survive database schema changes such as a varchar2(80) -> varchar2(255) change.

• Java cannot; you will have to inspect all the code and change variable sizes.

• PL/SQL has that nice built-in dependency mechanism to avoid the “I didn’t know you used that” syndrome.

• Java does not.

• … and the long list goes on from here …
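To make the bind-variable point concrete, a minimal sketch (find_emp is a made-up name; EMP is the demo table used below). In static PL/SQL SQL, every reference to a PL/SQL variable becomes a bind variable without the programmer doing anything:

CREATE OR REPLACE PROCEDURE find_emp( p_ename IN VARCHAR2 )
AS
BEGIN
  -- p_ename is bound automatically: one shared cursor, parse once / execute many
  FOR rec IN ( SELECT * FROM emp WHERE ename = p_ename )
  LOOP
    NULL;  -- per-row work here
  END LOOP;
END;
/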

• Embed the SQL complexity in stored procedures and give the Java developer a simple interface.

For example, the book delete check:

Packaged stored procedure:

PROCEDURE list_df(p_cursor IN OUT listidcurtyp, p_string VARCHAR2)
AS
BEGIN
  OPEN p_cursor FOR
  WITH sq AS
  (
    -- split the comma-separated input string into one token per row
    SELECT SUBSTR(x, INSTR(x,',',1,LEVEL)+1,
                  INSTR(x,',',1,LEVEL+1) - INSTR(x,',',1,LEVEL) - 1) token
    FROM (SELECT ','||l_str||',' x, l_str FROM (SELECT p_string l_str FROM dual))
    CONNECT BY LEVEL <= LENGTH(l_str) - LENGTH(REPLACE(l_str,',','')) + 1
  )
  SELECT a.listingsid listingsid
  FROM abelisting.listings a, sq
  WHERE a.listingsid = TO_NUMBER(sq.token)
  AND a.rowdf = 'T';
END list_df;



Java side:

SELECT COLUMN_VALUE listingsid
FROM TABLE(abelisting.Sp_Sql_Query.list_df_tabf(:l_string));



Less code, simple is beautiful!
Beat me if you can write less Java code than the PL/SQL stored procedures below to reach the same goal; this is just comparing simplicity, never mind the performance yet:

1) FOR loop

CREATE OR REPLACE PROCEDURE process_data( p_inputs IN VARCHAR2 )
AS
BEGIN
  FOR x IN ( SELECT * FROM emp WHERE ename LIKE p_inputs )
  LOOP
    process( x );   -- per-row processing routine
  END LOOP;
END;



2) Enforce read/write consistency: update and delete in one transaction

PROCEDURE clear_pending_list_qty_forall
AS
  TYPE listingsidList IS TABLE OF abelisting.list_qty_offset.listingsid%TYPE;
  lt_listingsid listingsidList;
  TYPE pendingList IS TABLE OF abelisting.list_qty_offset.pending%TYPE;
  lt_pending pendingList;
  l_cnt NUMBER;
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE('ListLoad', 'Clear pending offset');

  -- Clear all the pending ordered-book quantities into listings.
  -- The RETURNING collection stores the rows being processed, so we always get
  -- a consistent image of listings.quantity; that is, we never delete extra
  -- rows that appear between the update and the delete.

  -- Option 2 is DELETE with a RETURNING list, then a bulk FORALL UPDATE; 2 times faster.
  DELETE FROM abelisting.list_qty_offset
  RETURNING listingsid, pending BULK COLLECT INTO lt_listingsid, lt_pending;

  DEBUG.f('delete %s rows on abelisting.list_qty_offset', SQL%ROWCOUNT);

  -- 1 .. COUNT is safe even when the DELETE returned no rows
  FORALL i IN 1 .. lt_listingsid.COUNT
    UPDATE abelisting.listings SET quantity = quantity + lt_pending(i)
    WHERE listingsid = lt_listingsid(i);
  l_cnt := SQL%ROWCOUNT;
  DEBUG.f('updated qty on listing %s rows', l_cnt);
  COMMIT;

  /*
  -- Err logging
  FOR i IN 1 .. lt_listingsid.COUNT
  LOOP
    DBMS_OUTPUT.PUT_LINE( 'updated listingsid = ' || lt_listingsid(i)||', qty = '||lt_pending(i) );
    DEBUG.f( 'updated listingsid = ' || lt_listingsid(i)||', qty = '||lt_pending(i) );
  END LOOP;
  */

EXCEPTION
  WHEN OTHERS THEN
    DEBUG.f( SUBSTR(TO_CHAR(SQLCODE)||': '||SQLERRM,1,100) );
    RAISE;
END;



3) INSERT … SELECT from an external table (flat file): data loading from a flat file

PROCEDURE isbn_recomd_full_load(p_add_rows IN OUT NUMBER)
IS
  l_cnt NUMBER;
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE('LB', 'isbn_recomd_full_load');
  DEBUG.f('truncate table');
  EXECUTE IMMEDIATE 'alter session set optimizer_mode=all_rows';
  EXECUTE IMMEDIATE 'alter session set skip_unusable_indexes=true';
  EXECUTE IMMEDIATE 'TRUNCATE TABLE librarything.isbnRecommendations';
  DEBUG.f('unusable index');
  --EXECUTE IMMEDIATE 'alter index librarything.isbnRecommendations_isbn unusable';
  DEBUG.f('INSERT +APPEND direct load');
  -- direct-path insert; redo is governed by the table's NOLOGGING attribute
  -- (a NOLOGGING keyword here would only be parsed as a table alias)
  INSERT /*+ APPEND */ INTO librarything.isbnRecommendations
  (isbn, recommendationSet)
  SELECT isbn, recommendationSet
  FROM librarything.isbnRecommendations_ext; -- the external table

  l_cnt := SQL%ROWCOUNT;
  DEBUG.f('INSERTed %s rows', l_cnt);
  COMMIT;
  p_add_rows := l_cnt;

  DEBUG.f('rebuild index on isbn column');
  --EXECUTE IMMEDIATE 'alter index librarything.isbnRecommendations_isbn rebuild online';
  DEBUG.f('rebuild index done');

  DBMS_APPLICATION_INFO.SET_MODULE('LB', 'isbn_recomd_full_load done');

EXCEPTION
  WHEN OTHERS THEN
    DEBUG.f( SUBSTR(TO_CHAR(SQLCODE)||': '||SQLERRM,1,100) );
    RAISE;
END;



My Utopian Development Environment
There is no SELECT/INSERT/UPDATE/DELETE/MERGE in the Java code; the database access layer only calls packaged stored procedures, each written for a specific goal.
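So the only kind of statement the Java access layer ever submits is a call like this (assuming, for illustration, that the list_df procedure from the book delete check lives in the Sp_Sql_Query package shown earlier):

-- e.g. from JDBC: prepareCall("{ call abelisting.Sp_Sql_Query.list_df(?, ?) }")
BEGIN
  abelisting.Sp_Sql_Query.list_df(:p_cursor, :p_string);
END;
/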



Communication
Efficiency comes from specialization: we split into Java developers, database developers and backend developers. Java handles file processing and UI logic; data-processing logic lives inside the database, wrapped in stored procedures.

I know some backend developers can improve on PL/SQL performance by another 8 times by dropping down to Pro*C or OCI C code (using SQL with array binds of 100 for key lookup and insert), when raw speed is everything.

A general knowledge of what the other parties are doing is also necessary and important.



Data modeling
Designing a sound data model is one of the most critical stages in the development of a computerized information system.

We are pursuing a data model that is complete and free of redundancy, enforces the business rules, keeps data reusable, and stays flexible, stable and elegant; a model that also serves communication and integration, and balances the conflicting objectives.



Please check the attached email to see how easily the problem can be solved by a well-fitted data model design, and how complex it becomes when you tackle the same issue from the Java world.

For database developers, the data model is part of their life.