Friday, August 29, 2014

How to estimate Cassandra database processing Capacity

This post estimates the processing capacity of a small Cassandra cluster for a simple key/value lookup workload.

- Use cases
  - Select group_id,ts_upd from my_table where pk_col = 'xxxxxxxxxxxxx';
  - Select group_id,ts_upd from my_table where index1_col = 1234;
  - Select group_id,ts_upd from my_table where index2_col = 1234;
  - Select group_id,ts_upd from my_table where index3_col = 1234;
All queries return 0 or 1 row.
  - 80% of the time only group_id is returned,
  - update ?
- Create a data model
  - pk_col VARCHAR(20),
  - index1_col VARCHAR(30),
  - index2_col VARCHAR(30),
  - index3_col VARCHAR(30),
  - group_id NUMBER(10),
  - ts_upd : TIMESTAMP : 8 bytes,
  - record size: 128 bytes (the declared column sizes sum to 20 + 30*3 + 10 + 8 = 128),
- Replication and high availability
  - Data distribution and replication
    - Strategy 1: one data center, 3 nodes, replication_factor = 3. Write Consistency Levels = 2
    - Strategy 2: two data centers, 3 nodes in each data center.
  - Murmur3Partitioner
  - Round((2 copies read, or 3 copies written) / 3 nodes) = 1, i.e. the redundant work is spread across the 3 nodes (see the sketch below).
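
To make that per-node overhead claim concrete, here is a tiny sketch (mine, not from the original notes), using the node count and consistency levels assumed in Strategy 1:

# Per-node replication overhead for Strategy 1 (3 nodes, RF = 3):
# a read touches 2 replicas (CL = 2) and a write eventually touches all 3.
def per_node_copies(copies_touched, num_nodes):
    return round(copies_touched / num_nodes)

print(per_node_copies(2, 3))  # reads:  1
print(per_node_copies(3, 3))  # writes: 1
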
- Estimate Cassandra processing power with current price-performance sweet-spot hardware.
  - variable read/write % criteria, 100:0, 90:10, 0:100
    - Transaction volume
    - Response time
  - Memory: 64GB, to ensure the data is always in cache.
  - CPU: 8-core processors
  - SSD: can keep P99.999 latency under 5 *milliseconds* regardless of RAM usage.
    - SATA spinning disks: hard drives give a wide range of latencies, easily up to 5 *seconds* at P99.999.
- Basic operation times (see the quick conversion below):
  - average read latency < 0.16 ms,  or 6250 reads/sec
  - average write latency < 0.025 ms, or 40,000 writes/sec
  - max latency < 5ms, 99.999%
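
These rates are just the reciprocals of the assumed average latencies; a quick check of the arithmetic:

read_latency_s  = 0.00016   # 0.16 ms average read (assumption above)
write_latency_s = 0.000025  # 0.025 ms average write (assumption above)

print(round(1 / read_latency_s))   # 6250 reads/sec
print(round(1 / write_latency_s))  # 40000 writes/sec
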
- Assumptions
  - 1/4 of the queries use each access path (the primary key and each of the 3 indexes).
  - turn off key cache and row cache
  - The distributed-index and materialized-view data models require more application code to maintain.
  - Sizing overhead (see the sketch after this list)
    - Column size = 15 + size of name (10) + size of value; use short column names,
    - Row overhead = 23
    - Primary key index size = 32 + average_key_size
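
A minimal sketch of these sizing rules (the constants are from the list above; how I map them onto the base row is my own interpretation, so treat the result as an illustration rather than an exact on-disk size):

COLUMN_OVERHEAD = 15
ROW_OVERHEAD = 23

def column_size(name_size, value_size):
    return COLUMN_OVERHEAD + name_size + value_size

def pk_index_size(avg_key_size):
    return 32 + avg_key_size

# Base row as used in the storage formulas below:
# 3 columns of (overhead + 10-byte name), 128 bytes of record data,
# the row overhead, and the primary key index entry for a 20-byte key.
base_row = column_size(10, 0) * 3 + 128 + ROW_OVERHEAD + pk_index_size(20)
print(base_row)  # 278 bytes
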
- Index options
  (Cassandra internals: http://www.wentnet.com/blog/?p=77; the throughput and storage arithmetic below is reproduced in the sketch after this list)
  - Primary Key
    - Logical reads = 1,
  - Secondary index
    - (index column, primary key column), size of value: 50.
    - Logical reads = n + 1 = 3 + 1 = 4, where n is the number of nodes
    - Logical writes = 1 + 1 = 2;
    - 100% read : 6250 / (1 + 3 * 4) / 4 = 120 queries / second
    - 100% write : 40000 / (1 + 3 * 2) = 5714 rows / second
    - 90% read, 10% write:
      - 120 * 90% = 108 queries / second
      - 5714 * 10% = 571 rows / second
    - Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
  - Distributed index.
    - (index column, primary key column), size of value: 50.
    - Logical reads = 1 + 1 = 2;
    - Logical writes = 1 + 3 = 4;
    - 100% read : 6250 / (1 + 3 * 2) / 4 = 223 queries/second
    - 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
    - 90% read, 10% write:
      - 223 * 90% = 201 queries/second
      - 3077 * 10% = 308 rows/second
    - Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
  - Materialized View.
    - (index column, primary key column, group_id, ts_upd), size of value: 68.
    - Logical reads = 1
    - Logical writes = 1 + 3 = 4;
    - Row size = (140 + 32) * 4 = 688
    - 100% read : 6250 / (1 + 3 * 1) / 4 = 391 queries/second
    - 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
    - 90% read, 10% write:
      - 391 * 90% = 352 queries/second
      - 3077 * 10% = 308 rows/second
    - Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 68 + 23 + (32+30))*3) / 3 = 17.7GB
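
The arithmetic behind the three options above can be reproduced with a short sketch (my own; it simply re-applies the post's formulas and assumptions, and is not a benchmark):

ROWS = 60_000_000
READS_PER_SEC = 6250    # from the 0.16 ms average read latency
WRITES_PER_SEC = 40000  # from the 0.025 ms average write latency

def read_qps(reads_per_index_query):
    # 1 primary-key lookup plus 3 index lookups, averaged over the 4 use cases
    return READS_PER_SEC / (1 + 3 * reads_per_index_query) / 4

def write_rps(writes_per_index):
    # 1 base-row write plus maintenance of the 3 indexes
    return WRITES_PER_SEC / (1 + 3 * writes_per_index)

def storage_gb(index_value_size):
    base_row  = (15 + 10) * 3 + 128 + 23 + (32 + 20)               # 278 bytes
    index_row = (15 + 10) * 2 + index_value_size + 23 + (32 + 30)
    return ROWS * (base_row + 3 * index_row) / 3 / 1e9             # RF 3 spread over 3 nodes

for name, r, w, v in [('secondary index', 4, 2, 50),
                      ('distributed index', 2, 4, 50),
                      ('materialized view', 1, 4, 68)]:
    print(name, round(read_qps(r)), round(write_rps(w)), round(storage_gb(v), 1))
# -> secondary index 120 5714 16.7
# -> distributed index 223 3077 16.7
# -> materialized view 391 3077 17.7
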

- Oracle database processing power
  - Up to 2,000 queries per second; updates 20 million rows a day.
  - query latency:
    - 99% < 0.01 second
    - 99.99% < 0.2 second

---
- Reference
  - http://planetcassandra.org/nosql-performance-benchmarks/#EndPoint
  - http://www.datastax.com/dev/blog/datastax-sandbox-2-0-now-available
  - http://www.stackdriver.com/cassandra-aws-gce-rackspace/
  "http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningHardware_c.html"


Thanks,
Charlie
木匠 | Database Architect Developer

Tuesday, August 26, 2014

Estimate Cassandra processing power with current price-performance sweet-spot hardware

Hi Cassandra database experts,

Can you help us estimate Cassandra's processing power based on the hardware and cluster configuration below?
This hardware is at the current price-performance sweet spot.

Data Model:  Only one table

create table table_1
(
 key_1 varchar(30),
 key_2 varchar(30),
 key_3 varchar(30),
 key_4 varchar(30),
 col_1 varchar(30),
 col_2 varchar(30),
 col_3 varchar(500),
 primary key (key_1)
);

Besides the primary key index, there are 3 secondary indexes, on columns key_2, key_3, and key_4 respectively.

There are 60 million rows.
Average row length 500 bytes.
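
As a rough back-of-the-envelope check (my own arithmetic; it ignores compression, index overhead, and compaction headroom):

rows = 60_000_000
avg_row_bytes = 500
replication_factor = 3   # per Strategy 1 below

raw_gb = rows * avg_row_bytes / 1e9
print(raw_gb)                        # 30.0 GB of base data
print(raw_gb * replication_factor)   # 90.0 GB across the cluster with RF = 3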

Memory: 16GB to 64GB
CPU: 8-core CPU processors
Disk:
- SSD (solid state drives): size ?
- SATA spinning disks: size ?

  • Data Model 1: One base table with 3 indexes
  • Data Model 2: One base table and 3 Materialized View tables.
  • Data distribution and replication
    • Strategy 1: one data center, 3 nodes, replication_factor = 3. Write Consistency Levels = 2
    • Strategy 2: two data centers, 3 nodes in each data center.

The final matrix will look like this:

Read/Write operation pattern | Max Throughput                  | 99% Reads   | 99.99% Reads | 99% Writes  | 99.99% Writes
100% read                    | ? reads/second                  | < ? seconds | < ? seconds  | -           | -
99% read, 1% write           | ? reads/second, ? writes/second | < ? seconds | < ? seconds  | < ? seconds | < ? seconds
90% read, 10% write          | ? reads/second, ? writes/second | < ? seconds | < ? seconds  | < ? seconds | < ? seconds
50% read, 50% write          | ? reads/second, ? writes/second | < ? seconds | < ? seconds  | < ? seconds | < ? seconds

(The four right-hand columns are response-time limits at the given percentile.)

Disk storage size: ? GB.

Please help fill in the "?" placeholders with estimated numbers.
If you could also tell us how you calculated these numbers, even better.


Thanks,
Charlie 木匠 | Database Architect Developer

Thursday, May 01, 2014

JavaScript router

Question:

I couldn't understand the function respond in the code snippet below.
How does it know req.params.name from '/hello/:name'?


var restify = require('restify');
function respond(req, res, next) {
    res.send('hello ' + req.params.name);
}
var server = restify.createServer();
server.get('/hello/:name', respond);
server.head('/hello/:name', respond);
server.listen(8080, function() {
    console.log('%s listening at %s', server.name, server.url);
});

Link: http://www.joyent.com/developers/getting-started-node-js

Answer:


It is handled automatically by a router. Everything after a ':' is considered a param and is put into req.params.XXX by the router. Different frameworks have different routers, but they all follow a similar rule: if you define a route like '/hello/:abc', then a real request to '/hello/charlie' will set req.params.abc to 'charlie'.

I guess that in your case you are using restify's router (from the snippet above), so you can check out how it parses the URL. I am using Meteor.js's Iron Router; all of these routers do essentially the same thing.

- Kevin Zhang
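
To make the rule above concrete, here is a minimal, framework-agnostic sketch of ':param' matching (written in Python purely for illustration; it is not how restify or Express actually implements routing):

def match_route(pattern, path):
    """Return a params dict if path matches pattern, else None."""
    params = {}
    pattern_parts = pattern.strip('/').split('/')
    path_parts = path.strip('/').split('/')
    if len(pattern_parts) != len(path_parts):
        return None
    for pat, real in zip(pattern_parts, path_parts):
        if pat.startswith(':'):
            params[pat[1:]] = real   # ':name' -> params['name']
        elif pat != real:
            return None
    return params

print(match_route('/hello/:name', '/hello/charlie'))  # {'name': 'charlie'}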

Wednesday, April 23, 2014

6 months Oracle DBA contract job

Greetings,

Work for the British Columbia provincial government in Canada. It's easy, simple, and pays well.
There is a 90% chance it will be extended into a 2-to-3-year long-term job.

Please leave comments if you are interested.

Thanks,
Charlie 木匠 | Database Architect Developer

Tuesday, March 18, 2014

RMAN cold backup gotcha

This is something new I learned today from a DBA at Boeing.

Empty pot is easy to fill. "Stay hungry".

These days many sites only do online hot backups.
But a cold backup still has its place, e.g. backing up a dev database for cloning.

Here is the script many people use.
The problem is that when there are long-running active transactions, the database instance will wait and not close.

.
RMAN>
shutdown immediate;
startup mount;
.

Here is the workaround: force-disconnect all active sessions, start the database up in a consistent state, and then close it again.

RMAN>
ALTER SYSTEM CHECKPOINT; -- I think this is optional.  :) 
shutdown abort;
startup restrict;
shutdown immediate;
startup mount;
.

Note: this method is not recommended for a production database.

Enjoy,
Charlie

Wednesday, March 05, 2014

Simulate tail -f logger_logs table - instrument PL/SQL

debugf used to be the instrumentation tool in my kit.

On November 7, 2013, Tom Kyte said: "debug.f is really really really old :) . I recommend Logger to do PL/SQL instrumentation."

To install and use logger 2.0, please see https://github.com/tmuth/Logger---A-PL-SQL-Logging-Utility

There are 2 good debugf features that Logger 2.0 doesn't have.

1. tail -f debug_trace_file.
2. debug.f("%s %s", l_a, l_b); -- similar to C printf format string.

Today I will show you how to simulate "tail -f" on the logger_logs table.

Call example

SQL> 

set linesize 120

select * from table(u$log.log_tail);

--Test, open another session.

SQL>

begin
 for i in 1..15
 loop
   logger.log(i||' : '||rpad('.',i,'.'));
   dbms_lock.sleep(1);
 end loop;
end;
/

To exit, press Ctrl + C.

Setup and installation.

--DDL

connect sys

grant execute on dbms_lock to utility;

create or replace view utility.logger_tail_v
as
select 
  TIME_STAMP||'.'||
  SCOPE||'.'||
  MODULE||'.'||
  ACTION||'.'||
  USER_NAME||'.'||
  TEXT||':'||
  CLIENT_IDENTIFIER||
--  CALL_STACK||
  UNIT_NAME||
  LINE_NO||
  SCN||
  EXTRA log_text, id
from logger_logs;

@u$log.sql

alter package utility.u$log compile;
grant execute on utility.u$log to public;
create or replace public synonym u$log for utility.u$log;


-- Create Package u$log.  file u$log.sql


CREATE OR REPLACE PACKAGE u$log AS

/***
tail -f logger_logs;
***/

  cursor gc_log_src is
  select log_text,id  from logger_tail_v;

  TYPE log_src_tab IS TABLE OF gc_log_src%ROWTYPE;
  FUNCTION log_tail return log_src_tab PIPELINED DETERMINISTIC;

END;
/


CREATE OR REPLACE PACKAGE BODY u$log
/***
Goal: 1. tail -f logger_logs table,

Who| Date         | What
Charlie(Yi) | 12-June-2008 | Create the package.

alter package utility.u$log compile;
grant execute on utility.u$log to public;
create or replace public synonym u$log for utility.u$log;

***/
/*
| Copyright Information Here
|
| File name:
|
| Overview:
|
| Author(s):
|
| Modification History:
|  Date        Who          What
| 23-Dec-2013: Charlie(Yi): Instrument with Logger, to replace debugf,
| 08-Nov-2010: Charlie(Yi): Create the function.


It meets user requirements
It is maintainable
It runs fast enough to minimize user frustration

!!! Pl/SQL development workflow, Four steps of preparing an application !!!

 - Validate program requirements
  # ask lots of questions
  # what users ask for is not always the easiest way to solve a problem
  # consider other approaches, include business processes and programming algorithms
 - Implement header of the program
  # good name for the program, accurately represent the purpose of the program
  # inputs and outputs
  # overload sub-procedure ?
 - Define the test cases
  # Verify it works
  # how will I know when I am done with this program
 - Build test code

Reference:
  http://jira/jira/
|
|
*/
IS
 gs_module varchar2(30);

 PROCEDURE initialize_pkg
 IS
 BEGIN
   gs_module := 'package_name';
   --gc_scope_prefix constant VARCHAR2(31) := lower($$PLSQL_UNIT) || '.';
   gs_module := lower($$PLSQL_UNIT);
   DBMS_APPLICATION_INFO.SET_MODULE( gs_module,'init' );
 END initialize_pkg;

 PROCEDURE commit_tx
 IS
 BEGIN
    logger.log('soft commit');
    --rollback;
    commit;
 END commit_tx;

 FUNCTION log_tail
   return log_src_tab PIPELINED DETERMINISTIC
 /*
 | Copyright Information Here
 |
 | File name:
 |
 | Overview: simulate tail -f on logger_log table.
 |

Call example:

select * from table(u$log.log_tail);

select text,id from table(u$log.log_tail);

 |
 | Author(s): Charlie Yi Zhu 木匠.
 |
 | Modification History:
 Date         Who          What
 04-Mar-2014: Charlie(Yi): Create the table function.

 */
 IS
  -- PRAGMA AUTONOMOUS_TRANSACTION;

  l_scope logger_logs.scope%type;
  l_params logger.tab_param;
  ls_action varchar2(30);
  l_cnt simple_integer := 0;
  l_row_cnt pls_integer;
  l_piece simple_integer := 0;
  l_chunk simple_integer := 10000;
  l_max_id number; l_int_id number;

  PROCEDURE initialize
  IS
  BEGIN
    ls_action := 'log_tail';
    l_scope := gs_module ||'.'|| ls_action;
    DBMS_APPLICATION_INFO.SET_module(gs_module, ls_action);
    DBMS_APPLICATION_INFO.set_client_info( gs_module||'.'||ls_action );
    l_row_cnt := 0;
    logger.log('START', l_scope);
  END initialize;

  PROCEDURE cleanup
  IS
  BEGIN
     DBMS_APPLICATION_INFO.SET_MODULE(gs_module,Null);
     DBMS_APPLICATION_INFO.set_client_info( ' ' );
     logger.log('END', l_scope);
  END cleanup;

 BEGIN
  initialize;
  /*
  Main body of program
  */

  logger.append_param(l_params, 'l_cnt', l_cnt);
  logger.log('xxxxx', l_scope, null, l_params);

  select Nvl(Max(id),0) into l_max_id from logger_logs;
  logger.log('Start id: '||l_max_id, l_scope, null, l_params);

  Loop
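   -- Poll forever: pipe any logger_logs rows newer than the last id seen,
   -- then sleep one second. The consumer stops us by closing the cursor,
   -- which raises NO_DATA_NEEDED (handled below).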
   FOR lc_stage IN (
    select --+ index_asc(a logger_logs_pk)
     log_text, id
    from logger_tail_v a
    where id > l_max_id
    order by id ASC
   )
   LOOP
    l_int_id := lc_stage.id;
    pipe ROW(lc_stage);
   END LOOP;
   --logger.log('Last max id: '||l_int_id, l_scope, null, l_params);
   dbms_lock.sleep(1);
   l_max_id := Nvl(l_int_id, l_max_id);  -- keep the previous high-water mark if no new rows arrived
  End Loop;

  logger.log(l_row_cnt||' rows returned', l_scope, null, l_params);

   cleanup;
   RETURN;

 EXCEPTION
  when NO_DATA_NEEDED then
      cleanup;
      return;
  WHEN OTHERS
  THEN
     /* Don't forget to clean up here, too! */
     cleanup;

     /* Standard error logging mechanism */
     logger.log_error('Unhandled Exception', l_scope, null, l_params);
     --RollBack;
     raise;
 END log_tail;

BEGIN
  initialize_pkg;
END;
/

Monday, February 10, 2014

Design to plan database capacity

Purpose

Database resources are limited: CPU, memory, and I/O. We have to do capacity planning well ahead. That includes the transaction arrival rate and each transaction's SQL resource usage: LIO / CPU / memory / redo / undo, locks, latches, etc. We have to design and build a synthetic transaction workload benchmark to get these metrics.
Also "We have really got to think about the data and how it will be used, how it will be accessed so that we can utilize the optimal structure for that.
Make sure that the database organizes the data in a way that’s efficient for us to retrieve that data."
As we discussed, here are the database criteria that need to be filled in before starting a new project involving database processing.
(Actually, we should consider this from day one, when designing the system for the new project.)

  • I. Function priority, asynchronous batch process OK?
  • II. work load ( Planning the performance )
    • II.i. Data Usage
      • a1. UI query and OLTP transaction response time
      • a2. Batch throughput
  • III. Growth and Archive - Data lifecycle management
  • Target
    • Capacity plan
    • Data life cycle management
    • SLA
---

I. Function priority, asynchronous batch process OK?

 #) Downtime allowed
   -) HA implementation (physical standby, Streams replication, RMAN backup and recovery, or logical data pump expdp/impdp)
 #) Process delay/latency allowed, or data consistency delay/latency allowed
   -) to select an ACID or BASE solution

II. work load ( Planning the performance )

  #) OLTP: average and peak executions per hour per day and average/peak response time requirement
  #) Batch: throughput requirement, see System Architecture Model
See Define+database+application+SLA

II.i. Data Usage

a1. UI query and OLTP transaction response time


 SELECT customer_name
 FROM customer
 WHERE customer_id = 12345678;

Type | Arrival rate | Time       | Peak season | SLA 90%       | SLA 99.9%
Avg  | 3 / minute   |            |             | < 2 seconds   | < 4 seconds
Peak | 20 / minute  | 11am ~ 2pm | Xmas        | < 3.5 seconds | < 5 seconds

a2. Batch throughput

INSERT INTO order_stats(time_id, product_id, order_cnt)
SELECT to_date('2011/01/01','yyyy/mm/dd'), product_id, count(*)
FROM orders
WHERE order_date BETWEEN to_date('2011/01/01','yyyy/mm/dd')
                     AND to_date('2011/02/01','yyyy/mm/dd')
GROUP BY product_id;

Type | Arrival rate  | Rows per Exec (Volume) | Time                         | Peak season | Throughput SLA
Avg  | 2 times / day | 30                     |                              |             | 50,000 rows / day
Peak | 2 times / day | 200                    | 12:00 ~ 14:00, 20:00 ~ 22:00 | January     | 2,000,000 rows / day

III. Growth and Archive - Data lifecycle management

 #) Growth rate, E.g. 30MB/Day
 #) Data retention and archive plan, E.g. 1 year working data, 3 years reporting data in DW, 10 years archive data in Tape/DVD.

Target

Based on the requirements, the architects are able to design a solution.
We will select one data-processing pattern (a good practice) or design a new one to meet the SLA and business objectives.

Capacity plan

We will plan our hardware resource to meet the above SLA.

Data life cycle management

It is not necessary to put everything into our expensive OLTP system; data placement is tightly bound to the SLA.

Usage Profile    | Data Retention     | Priority     | SLA                                                 | Storage/Disk
Write/Read hot   | Last 2 months      | priority one | 99.999% HA, 7 x 24                                  | Most Expensive
Read hot         | Last 2 ~ 6 months  | priority 2   | 99.99% HA, 7 x 24                                   | Expensive, copy to DW
Read only cooler | Last 6 ~ 36 months | priority 3   | nearline storage 99% HA, 5 x 8                      | Cheap
Read only cold   | Older than 3 years | priority 4   | move to DVD or tape, only mount online when needed. | Cheapest
Obsoleted        | Older than 9 years | priority 5   | Keep aggregate data and Drop it                     | ...
There are many requirements to effectively manage the data lifecycle. Effective and efficient management of the lifecycle will reduce both the hardware and management costs associated with the database facility, and improve overall database performance. To begin the process, the lifecycle for each event/fact-table data stream must be understood. This includes defining who will ultimately use the data, and how they will use it. Based upon the usage profile, the characteristics of the life of the data can be quantified.

Reference:


Figure 5-2: Data Usage Over Time
Figure 5-3: Data Lifecycle