Developer 木匠

Tuesday, November 03, 2015

2 level left outer join

Here is Oracle outer join behavior observation.
We get same results between Oracle notation (+), and ANSI join syntax (LEFT OUTER JOIN)

Oracle 12.1.0.2.0

drop table t1 purge;
drop table t2 purge;
drop table t3 purge;

create table t1 (id number);
create table t2 (id number);
create table t3 (id number);

insert into t1(id) select rownum from dual connect by level <= 3;
insert into t2(id) select rownum from dual connect by level <= 2;
insert into t3(id) select rownum from dual connect by level <= 1;
commit;

select t1.id t1_id, t2.id t2_id, t3.id t3_id
from t3,t2,t1
where t1.id = t2.id(+)
and t2.id = t3.id(+)
order by t1.id;

T1_ID   T2_ID    T3_ID
------- ------- --------
      1       1        1
      2       2
      3

drop table t1 purge;
drop table t2 purge;
drop table t3 purge;

create table t1 (id number);
create table t2 (id number, t1_id number);
create table t3 (id number, t2_id number);

insert into t1(id) select rownum from dual connect by level <= 3;
insert into t2(id, t1_id) select rownum, rownum from dual connect by level <= 2;
insert into t3(id, t2_id) select rownum, rownum from dual connect by level <= 1;
commit;

select t1.id, t2.id t2_id, t2.t1_id, t3.id t3_id, t3.t2_id
from t3,t2,t1
where t1.id = t2.t1_id(+)
and t2.id = t3.t2_id(+)
order by t1.id;

ID T2_ID T1_ID T3_ID T2_ID
---- ------ ------ ------ ------
   1      1      1      1      1
   2      2      2
   3

Your homework is to write ANSI join syntax (LEFT OUTER JOIN) SQL, to get same result.   ^_^

Thursday, March 12, 2015

reference partition

/****
Purpose:

Demo Oracle reference partition, 3 levels parent child relationships, partition key is not primary key or foreign key.

@C:\z00\dev\project\migration\reference\ref_partition.sql

Modification History:
Date Who What
12-Mar-2015: Charlie(Yi): Create the file,

****/

drop table order_item_color cascade constraint purge;
drop table order_line_items cascade constraint purge;
drop table orders cascade constraint purge;

create table orders
(
order_no number primary key,
order_date date,
data varchar2(30)
)
enable row movement
partition by range (order_date)
(
partition part_2014 values less than (to_date('01-01-2015','dd-mm-yyyy')),
partition part_2015 values less than (to_date('01-01-2016','dd-mm-yyyy'))
);

insert into orders values ( 1, to_date( '01-jun-2014', 'dd-mon-yyyy' ), 'xxx' );
insert into orders values ( 2, to_date( '01-jun-2015', 'dd-mon-yyyy' ), 'xxx' );

drop table order_line_items cascade constraint purge;
create table order_line_items
(
order_no number
constraint order_line_items_nn1 not null,
line_no number,
data varchar2(30),
constraint c1_pk primary key(line_no),
constraint c1_fk_p foreign key(order_no) references orders
)
enable row movement
partition by reference(c1_fk_p)
;

insert into order_line_items values ( 1, 1, 'zzz' );
insert into order_line_items values ( 1, 2, 'zzz' );

insert into order_line_items values ( 2, 3, 'zzz' );
insert into order_line_items values ( 2, 4, 'zzz' );

select * from order_line_items partition(PART_2014);
/*
ORDER_NO LINE_NO DATA
---------- ---------- -----------------
1 1 zzz
1 2 zzz
*/

column table_name format a18
column partition_name format a15

select table_name, partition_name
from user_tab_partitions
where table_name in ( 'ORDERS', 'ORDER_LINE_ITEMS', 'ORDER_ITEM_COLOR' )
order by table_name, partition_name;

drop table order_item_color cascade constraint purge;
create table order_item_color
(
line_no number
constraint order_line_size_nn1 not null,
color_no number,
data varchar2(30),
constraint order_item_color_pk primary key(color_no),
constraint order_item_color_fk foreign key(line_no) references order_line_items
)
enable row movement
partition by reference(order_item_color_fk)
;

insert into order_item_color values ( 1, 1, 'y' );
insert into order_item_color values ( 2, 2, 'y' );
insert into order_item_color values ( 3, 3, 'yy' );
insert into order_item_color values ( 4, 4, 'yy' );
commit;

--alter table orders drop partition part_2014 update global indexes;

alter table orders add partition
part_2016 values less than
(to_date( '01-01-2017', 'dd-mm-yyyy' ));

select table_name, partition_name
from user_tab_partitions
where table_name in ( 'ORDERS', 'ORDER_LINE_ITEMS', 'ORDER_ITEM_COLOR' )
order by table_name, partition_name;

/*
TABLE_NAME PARTITION_NAME
------------------ ---------------
ORDERS PART_2014
ORDERS PART_2015
ORDERS PART_2016
ORDER_ITEM_COLOR PART_2014
ORDER_ITEM_COLOR PART_2015
ORDER_ITEM_COLOR SYS_P877
ORDER_LINE_ITEMS PART_2014
ORDER_LINE_ITEMS PART_2015
ORDER_LINE_ITEMS PART_2016
*/

select * from order_item_color partition(PART_2014);
/*
LINE_NO COLOR_NO DATA
---------- ---------- ------
1 1 y
2 2 y
*/

select * from order_item_color partition(PART_2015);
/*
LINE_NO COLOR_NO DATA
---------- ---------- ------
3 3 yy
4 4 yy
*/

Tuesday, March 10, 2015

Reading notes - Developing Successful Oracle Applications

Here we go,

Knowing what’s out there in Oracle.
Solving problem simply. (Don’t reinvent the wheel)
Layered programming. (Database API)

“A layer of abstraction and indirection is another example of defensive programming to allow for portability and code scalability. it is to layer our access to the database when necessary.”

“No matter what database we are using, we paid a lot for these technologies,

isn't it in our best interest to exploit it to the fullest extent possible, squeezing every last bit of functionality we can out of that product ? ”

Friday, August 29, 2014

How to estimate Cassandra database processing Capacity

Estimate Cassandra database processing Capacity.

- Use cases
- Select group_id,ts_upd from my_table where pk_col = 'xxxxxxxxxxxxx';
- Select group_id,ts_upd from my_table where index1_col = 1234;
- Select group_id,ts_upd from my_table where index2_col = 1234;
- Select group_id,ts_upd from my_table where index3_col = 1234;
All queries return 1 or 0 rows.
- 80% time only return group_id,
- update ?
- Create a data model
- pk_col VARCHAR(20),
- index1_col VARCHAR(30),
- index2_col VARCHAR(30),
- index3_col VARCHAR(30),
- group_id NUMBER(10),
- ts_upd : TIMESTAMP : 8 bytes,
- record size: 128 bytes,
- Replication, High Available
- Data distribution and replication
- Strategy 1: one data center, 3 nodes, replication_factor = 3. Write Consistency Levels = 2
- Strategy 2: two data centers, 3 nodes on each data center,
- Murmur3Partitioner
- Round((read 2 copy, or write 3 data copy) / 3 node) = 1. The redundant work is distributed to 3 nodes.
- Estimate Casandra processing power with current price-performance sweet spot hardware.
- variable read/write % criteria, 100:0, 90:10, 0:100
- Transaction volume
- Response time
- Memory: 64GB : insure data is always in cache.
- CPU: 8-core CPU processors
- SSD: can provide P99.999 under 5 *milliseconds* regardless RAM usage.
- SATA spinning disks: Hard drives will give wide ranges of latency, easily up to 5 *seconds* in the P99.999% range
- Basic operation time,
- average read latency < 0.16 ms, or 6250 reads/sec
- average write latency < 0.025 ms, or 40,000 writes/sec
- max latency < 5ms, 99.999%
- hypothesis / presumption
- 1/4 queries on each index.
- turn off key cache and row cache
- Distributed index and MV data model, more code to maintain,
- Sizing overhead
- Column size = 15 + size of name(10) + size of value : use short column name,
- row overhead = 23
- primary key index size = 32 + average_key_size
- index options
"Cassandra internal: http://www.wentnet.com/blog/?p=77"
- Primary Key
- Logical reads = 1,
- Secondary index
- (index column, primary key column), size of value: 50.
- Logical reads = O(n) + 1 = 3 + 1 = 4; n is number of nodes
- Logical writes = 1 + 1 = 2;
- 100% read : 6250 / (1 + 3 * 4) / 4 = 120 queries / second
- 100% write : 40000 / (1 + 3 * 2) = 5714 rows / second
- 90% read, 10% write:
- 120 * 90% = 108 queries / second
- 5714 * 10% = 571 rows / second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
- Distributed index.
- (index column, primary key column), size of value: 50.
- Logical reads = 1 + 1 = 2;
- Logical writes = 1 + 3 = 4;
- 100% read : 6250 / (1 + 3 * 2) / 4 = 223 queries/second
- 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
- 90% read, 10% write:
- 223 * 90% = 201 queries/second
- 3077 * 10% = 308 rows/second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
- Materialized View.
- ((index column, primary key column, group_id, ts_upd), size of value: 68.
- Logical reads = 1
- Logical writes = 1 + 3 = 4;
- Row size = (140 + 32) * 4 = 688
- 100% read : 6250 / (1 + 3 * 1) / 4 = 391 queries/second
- 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
- 90% read, 10% write:
- 391 * 90% = 352 queries/second
- 3077 * 10% = 308 rows/second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 68 + 23 + (32+30))*3) / 3 = 17.7GB

- Oracle database processing power
- Max to 2000 queries per second, update 20 million rows a day.
- query latency:
- 99% < 0.01 second
- 99.99% < 0.2 second

- --
- Reference
- http://planetcassandra.org/nosql-performance-benchmarks/#EndPoint
- http://www.datastax.com/dev/blog/datastax-sandbox-2-0-now-available
- http://www.stackdriver.com/cassandra-aws-gce-rackspace/
"http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningHardware_c.html"

Thanks,
Charlie 木匠 | Database Architect Developer

Tuesday, August 26, 2014

Estimate Casandra processing power with current price-performance sweet spot hardware

Hi Cassandra database experts,

Can you help us estimate the Cassandra processing power based on below hardware and cluster configuration ?

They are current price-performance sweet spot hardware.

Data Model: Only one table

create table_1

(

key_1 varchar(30),

key_2 varchar(30),

key_3 varchar(30),

col_1 varchar(30),

col_2 varchar(30),

col_3 varchar(500),

primary key (key_1)

);

Besides primary key index, there are 3 indexes, are respective on column key_2, key_3 and Key_4.

There are 60 million rows.

Average row length 500 bytes.

Memory: 16GB to 64GB

CPU: 8-core CPU processors

Disk:

- SSD (solid state drives) : Size ?

- SATA spinning disks : Size ?

Data Model 1: One base table with 3 indexes
Data Model 2: One base table and 3 Materialized View tables.

Data distribution and replication
- Strategy 1: one data center, 3 nodes, replication_factor = 3. Write Consistency Levels = 2
- Strategy 2: two data centers, 3 nodes on each data center,

The final matrix will be looked like this:

Read/Write operation pattern	Max Throughput	Response time
		99% Reads	99.99% Reads	99% Writes	99.99% Write
100% read	? reads/second	< ? seconds	< ? seconds
99% read, 1% write	? reads/second, ? writes/second	< ? seconds	< ? seconds	< ? seconds	< ? seconds
90% read, 10% write	? reads/second ? writes/second	< ? seconds	< ? seconds	< ? seconds	< ? seconds
50% read, 50% write	? reads/second ? writes/second	< ? seconds	< ? seconds	< ? seconds	Less than ? seconds

Disk storage size : ? GB.

Please help to fill the green text ? with estimated numbers.

If you could tell us how did you calcuate these number, it will be much better.

Thanks,
Charlie 木匠 | Database Architect Developer

Thursday, May 01, 2014

JavaScript router

Question:

I couldn't understand the function respond in below code snippet .

How can it know req.params.name from '/hello/:name' ?

var restify = require('restify');
function respond(req, res, next) {
    res.send('hello ' + req.params.name);
}
var server = restify.createServer();
server.get('/hello/:name', respond);
server.head('/hello/:name', respond);
server.listen(8080, function() {
    console.log('%s listening at %s', server.name, server.url);
});

Link: http://www.joyent.com/developers/getting-started-node-js

Answer:

It is handled automatically by a router. Everything after a ":" is consider a param, and will be put into req.params.XXX by router. Different framework has different router. But it is always follow the similar rule: If you define a route like in this code '/hello/:abc', then a real route '/hello/:charlie" will cause params.abc -> charlie

I guess, in your case, you are using Node express as router. So you can check out Express's router function, how it is parse the URL. I am using Metoer.js's iron router, all of them should be doing the same thing.

- Kevin Zhang

Wednesday, April 23, 2014

6 months Oracle DBA contract job

Greetings,

Work for Canada BC province government. It's easy, simple, and pays well.
90% it will be extended to 2 to 3 years long term job.

Please leave comments if you are interested.

Thanks,
Charlie 木匠 | Database Architect Developer