Monday, February 02, 2009

External table benchmark fixed-length vs. delimited format

Goal
----
Compare the External Table read performance on Fix length format vs Tab(any char) Delimited format.

Result
------
Fixed-length format flat file use up about 10% less CPU and 7 times less Latches resource than delimited format.
But Delimited format took 50% less Elapsed Time in my test case.
And one interesting point is that DB time for Fixed-length format is 0.

Tested on Oracle 11.1.0.7


..............................Fixed-Length...Delimited........Diff
STAT...CPU used by this sessio 179 202 23 ***
STAT...Elapsed Time 179 91 -88 ***
STAT...DB time 0 421 421 xxx
LATCH.process queue reference 1 58,878 58,877 ***
STAT...physical read bytes 212,992 0 -212,992
STAT...session uga memory max 604,728 115,320 -489,408
STAT...session pga memory max 720,896 131,072 -589,824

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
7,629 64,444 56,815 11.84%


# Fixed-length records are processed faster than records terminated by a string.
# Fixed-length fields are processed faster than delimited fields.

Advice
------
When runing single batch job, Delimited format is 2 times faster, a little more CPU used, less PGA memory.
When runing multi concurrent batch jobs, Fixed-length format is better, and good for scale out,
but Fixed-length format require more PGA memory and more physical reads, since the file size is bigger to hold the space.

Solution
--------
Set up a typical emp test table, generate 700,000 rows, and run the benchmark.

setup
-----

drop TABLE EMP;
CREATE TABLE EMP
(
EMPNO NUMBER(4) NOT NULL,
ENAME VARCHAR2(10 BYTE),
JOB VARCHAR2(9 BYTE),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2),
DML_TIME TIMESTAMP(3) DEFAULT systimestamp
)
;

alter table emp add CONSTRAINT EMP_PK
PRIMARY KEY
(EMPNO) using INDEX NOLOGGING
;

SET DEFINE OFF;

Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7369, 'SMITH', 'CLERK', 7902, TO_DATE('12/17/1980 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 30847.56, 20, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO, DML_TIME)
Values (7499, 'ALLEN', 'SALESMAN', 7698, TO_DATE('02/20/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1600, 301, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO, DML_TIME)
Values (7521, 'WARD', 'SALESMAN', 7698, TO_DATE('02/22/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1250, 501, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7566, 'JONES', 'MANAGER', 7839, TO_DATE('04/02/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 2975, 20, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO, DML_TIME)
Values (7654, 'MARTIN', 'SALESMAN', 7698, TO_DATE('09/28/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1250, 1401, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7698, 'BLAKE', 'MANAGER', 7839, TO_DATE('05/01/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 2850, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7782, 'CLARK', 'MANAGER', 7839, TO_DATE('06/09/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 2450, 10, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7788, 'SCOTT', 'ANALYST', 7566, TO_DATE('12/09/1982 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 3000, 20, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7839, 'KING', 'PRESIDENT', TO_DATE('11/17/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 5000, 10, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO, DML_TIME)
Values (7844, 'TURNER', 'SALESMAN', 7698, TO_DATE('09/08/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1500, 1, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7876, 'ADAMS', 'CLERK', 7788, TO_DATE('01/12/1983 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1100, 20, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7900, 'JAMES', 'CLERK', 7698, TO_DATE('12/03/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 950, 30, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7902, 'FORD', 'ANALYST', 7566, TO_DATE('12/03/1981 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 3000, 20, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
Insert into SCOTT.EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO, DML_TIME)
Values (7934, 'MILLER', 'CLERK', 7782, TO_DATE('01/23/1982 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1300, 10, TO_TIMESTAMP('2/2/2009 2:56:51.766 PM','fmMMfm/fmDDfm/YYYY fmHH12fm:MI:SS.FF AM'));
COMMIT;

CREATE OR REPLACE DIRECTORY data_dir AS '/u01/exp';

set linesize 1000
set pagesize 0
set serveroutput off
set trimspool on
set trimout on
set head off
set echo off
set term off
set verify off
set feedback off

-- Fixed length format external table

spool emp.txt
with a as (select rownum from dual connect by level <= 50000)
select
rpad(EMPNO, 10, ' ')||rpad(ENAME, 20, ' ')||rpad(job, 20, ' ')||trim(to_char(sal,'00000.00'))||' '
||rpad(to_char(HIREDATE,'MM/DD/YYYY HH24:MI:SS'), 21, ' ')
||rpad(to_char(systimestamp,'MM/DD/YYYY HH24:MI:SS.FF3'), 25, ' ')
from scott.emp, a;
spool off

drop table emp_ext_fix;
create table emp_ext_fix
(
EMPNO NUMBER(4) ,
ENAME VARCHAR2(10 BYTE),
JOB VARCHAR2(9 BYTE),
SAL Number(7,2),
HIREDATE date,
DML_Time TIMESTAMP
)
organization external
(type oracle_loader
default directory data_dir
access parameters
(
records delimited by newline
DATE_CACHE 3000
badfile data_dir:'emp.bad'
logfile data_dir:'emp.log'
fields LRtrim
missing field values are null
REJECT ROWS WITH ALL NULL FIELDS
(
empno position(1:10) INTEGER EXTERNAL,
ename position(*:+20) CHAR ,
job position(*:+20) char ,
sal position(*:+9) char ,
HIREDATE position(*:+21) DATE "MM/DD/YYYY HH24:MI:SS",
DML_TIME position(*:+25) CHAR(25) date_format TIMESTAMP MASK "MM/DD/YYYY HH24:MI:SS.FF3"
)
)
LOCATION ('emp.txt')
)
NoPARALLEL
REJECT LIMIT UNLIMITED;

select * from EMP_ext_fix where rownum < 5;


--Tab delimited or csv file

spool emp.csv
with a as (select rownum from dual connect by level <= 50000)
select EMPNO||chr(9)||ENAME||chr(9)||job||chr(9)||sal||chr(9)
||to_char(HIREDATE,'MM/DD/YYYY HH24:MI:SS')||chr(9)
||to_char(systimestamp,'MM/DD/YYYY HH24:MI:SS.FF3')
from scott.emp,a;
spool off

drop TABLE SCOTT.EMP_ext_csv;
CREATE TABLE SCOTT.EMP_ext_csv
(
EMPNO NUMBER(4) ,
ENAME VARCHAR2(10 BYTE),
JOB VARCHAR2(9 BYTE),
SAL Number(7,2),
HIREDATE date,
DML_Time TIMESTAMP
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
records delimited by newline
badfile data_dir:'emp.bad'
logfile data_dir:'emp.log'
fields terminated by 0x'09'
missing field values are null
(
EMPNO, ENAME, JOB, SAL, HIREDATE DATE "MM/DD/YYYY HH24:MI:SS",
dml_time char(25) date_format TIMESTAMP MASK "MM/DD/YYYY HH24:MI:SS.FF3"
)
)
LOCATION ('emp.csv')
)
PARALLEL
REJECT LIMIT UNLIMITED;
select /* */ * from emp_ext_csv a where rownum < 5;

alter system flush buffer_cache;

Benchmark
---------

exec runstats_pkg.rs_start;
select count(HIREDATE) from emp_ext_csv;
exec runstats_pkg.rs_middle;
select count(HIREDATE) from emp_ext_fix;
exec runstats_pkg.rs_stop;

begin
runstats_pkg.rs_start;
for c in (select ename,sal,hiredate from emp_ext_csv)
loop
Null;
end loop;
runstats_pkg.rs_middle;
for c in (select ename,sal,hiredate from emp_ext_fix)
loop
Null;
end loop;
runstats_pkg.rs_stop;
end;
/

begin
runstats_pkg.rs_start;
for c in (select * from emp_ext_fix)
loop
Null;
end loop;
runstats_pkg.rs_middle;
for c in (select * from emp_ext_csv)
loop
Null;
end loop;
runstats_pkg.rs_stop;
end;
/

*) Output for last test case:

Run1 ran in 179 hsecs
Run2 ran in 91 hsecs
run 1 ran in 196.7% of the time

Name Run1 Run2 Diff
STAT...opened cursors current 1 0 -1
STAT...user I/O wait time 3 2 -1
STAT...db block changes 36 37 1
STAT...shared hash latch upgra 4 3 -1
STAT...calls to kcmgas 0 1 1
STAT...redo ordering marks 0 1 1
STAT...redo subscn max counts 0 1 1
STAT...Batched IO single block 1 0 -1
STAT...Batched IO block miss c 1 0 -1
STAT...Batched IO double miss 1 0 -1
STAT...active txn count during 0 1 1
STAT...cleanout - number of kt 0 1 1
STAT...queries parallelized 0 1 1
STAT...DFO trees parallelized 0 1 1
STAT...Parallel operations not 0 1 1
LATCH.process allocation 0 1 1
LATCH.ksuosstats global area 0 1 1
LATCH.OS process allocation 1 0 -1
LATCH.KMG MMAN ready and start 1 0 -1
LATCH.mostly latch-free SCN 0 1 1
LATCH.lgwr LWN SCN 0 1 1
LATCH.Consistent RBA 0 1 1
LATCH.active checkpoint queue 1 0 -1
LATCH.archive process latch 0 1 1
LATCH.redo allocation 3 4 1
LATCH.list of block allocation 0 1 1
LATCH.Change Notification Hash 1 0 -1
LATCH.space background task la 1 0 -1
LATCH.session timer 1 0 -1
LATCH.In memory undo latch 3 1 -2
LATCH.job_queue_processes para 2 0 -2
LATCH.ASM db client latch 0 2 2
STAT...db block gets 30 27 -3
STAT...db block gets from cach 30 27 -3
STAT...shared hash latch upgra 0 3 3
LATCH.undo global data 1 4 3
LATCH.channel operations paren 7 11 4
LATCH.redo writing 1 5 4
STAT...table scans (short tabl 1 6 5
STAT...lob reads 0 5 5
STAT...undo change vector size 2,472 2,480 8
STAT...session cursor cache hi 1 10 9
STAT...free buffer requested 17 7 -10
LATCH.messages 16 6 -10
LATCH.cache buffers lru chain 12 1 -11
STAT...physical read total IO 13 0 -13
STAT...physical reads 13 0 -13
STAT...physical reads cache 13 0 -13
STAT...physical read IO reques 13 0 -13
LATCH.enqueues 82 68 -14
LATCH.object queue header heap 16 1 -15
LATCH.resmgr group change latc 0 16 16
LATCH.resmgr:actses change gro 1 17 16
LATCH.compile environment latc 0 16 16
LATCH.PL/SQL warning settings 0 16 16
LATCH.Real-time plan statistic 0 17 17
LATCH.kokc descriptor allocati 2 20 18
LATCH.parallel query stats 1 20 19
STAT...CPU used by this sessio 179 202 23 ***
STAT...recursive cpu usage 171 199 28
LATCH.object queue header oper 34 6 -28
STAT...enqueue releases 5 34 29
LATCH.session state list latch 3 32 29
LATCH.simulator hash latch 14 45 31
STAT...enqueue conversions 0 32 32
LATCH.parameter table manageme 0 32 32
LATCH.resmgr:free threads list 0 32 32
LATCH.dummy allocation 1 33 32
LATCH.resmgr:active threads 0 32 32
LATCH.error message lists 0 33 33
STAT...session cursor cache co 7 43 36
LATCH.JS queue state obj latch 36 0 -36
STAT...enqueue requests 5 50 45
STAT...parse count (total) 7 53 46
STAT...execute count 7 53 46
LATCH.enqueue hash chains 86 133 47
LATCH.checkpoint queue latch 17 65 48
STAT...workarea memory allocat -26 37 63
STAT...user calls 0 64 64
LATCH.query server freelists 1 68 67
STAT...redo size 2,976 3,044 68
STAT...opened cursors cumulati 7 78 71
LATCH.active service list 8 82 74
LATCH.SQL memory manager worka 5 82 77
STAT...Elapsed Time 179 91 -88 ***
STAT...index scans kdiixs1 18 108 90
STAT...table scan blocks gotte 18 114 96
LATCH.client/application info 1 97 96
LATCH.process queue 1 99 98
LATCH.parallel query alloc buf 1 161 160
LATCH.session idle bit 3 167 164
LATCH.session allocation 3 189 186
STAT...recursive calls 7,043 7,300 257
STAT...no work - consistent re 77 474 397
STAT...consistent gets from ca 75 490 415
STAT...DB time 0 421 421 xxx
LATCH.shared pool 14 517 503
STAT...table fetch by rowid 1,695 186 -1,509
STAT...calls to get snapshot s 1,697 129 -1,568
LATCH.MinActiveScn Latch 1,689 106 -1,583
STAT...rows fetched via callba 1,676 78 -1,598
STAT...index fetch by key 1,676 68 -1,608
LATCH.row cache objects 104 1,881 1,777
STAT...table scan rows gotten 360 2,280 1,920
STAT...buffer is not pinned co 3,415 533 -2,882
LATCH.cache buffers chains 5,332 1,320 -4,012
STAT...PX local messages recv 0 4,262 4,262
STAT...PX local messages sent 0 4,262 4,262
STAT...consistent gets 5,112 706 -4,406
STAT...consistent gets from ca 5,112 706 -4,406
STAT...session logical reads 5,142 733 -4,409
STAT...consistent gets - exami 5,028 210 -4,818
LATCH.process queue reference 1 58,878 58,877 ***
STAT...cell physical IO interc 212,992 0 -212,992
STAT...physical IO disk bytes 212,992 0 -212,992
STAT...physical read total byt 212,992 0 -212,992
STAT...physical read bytes 212,992 0 -212,992
STAT...session uga memory max 604,728 115,320 -489,408
STAT...session uga memory 604,728 84,664 -520,064
STAT...session pga memory 720,896 131,072 -589,824
STAT...session pga memory max 720,896 131,072 -589,824

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
7,629 64,444 56,815 11.84%

PL/SQL procedure successfully completed.


Reference
---------
Oracle® Database Utilities 11g Release 1 (11.1)
12 External Tables Concepts: Performance Hints When Using External Tables


10 mile point

No comments: