Monday, February 10, 2014

Design to plan database capacity

Purpose

The database resources is limited on CPU, Memory and IO. We have to do capacity planning well ahead. That includes transaction arrival rate, Transaction's SQL resource usage: LIO / CPU / Memory / Redo / Undo, Lock, Latch,...etc. We have to design and build a synthetic transaction workload benchmark to get these matrices.
Also "We have really got to think about the data and how it will be used, how it will be accessed so that we can utilize the optimal structure for that.
Make sure that the database organizes the data in a way that’s efficient for us to retrieve that data."
As we talked, here is the database criteria need to be filled before build a new project involving database process, to get started.
(Actually, we'd better consider it from the day one design the system for the new project)

  • I. Function priority, asynchronous batch process OK?
  • II. work load ( Planning the performance )
    • II.i. Data Usage
      • a1. UI query and OLTP transaction response time
      • a2. Batch throughput
  • III. Growth and Archive - Data lifecycle management
  • Target
    • Capacity plan
    • Data life cycle management
    • SLA
---

I. Function priority, asynchronous batch process OK?

 #) Downtime allowed
   - -) HA implementation (physical standby or Stream replication or RMAN backup and recovery or logical data dump expdp/impdp)
 #) Process delay/latency allowed or data consistency delay/latency allowed
   - -) to select an ACID or BASE solution

II. work load ( Planning the performance )

  #) OLTP: average and peak executions per hour per day and average/peak response time requirement
  #) Batch: throughput requirement, see System Architecture Model
See Define+database+application+SLA

II.i. Data Usage

a1. UI query and OLTP transaction response time


 SELECT customer_name
 FROM customer
 WHERE customer_id = 12345678;

Type
Arrival rate
Time
Peak season
SLA 90%
SLA 99.9%
Avg
3 / minute


< 2 seconds
< 4 seconds
Peak
20 / minute
11am ~ 2pm
Xmas
< 3.5 seconds
< 5 seconds

a2. Batch throughput

INSERT INTO order_stats(time_id, product_id, order_cnt)
SELECT to_date('2011/01/01','yyyy/mm/dd'), product_id, count(*)
FROM order
WHERE order_date BETWEEN to_date('2011/01/01','yyyy/mm/dd')
                     AND to_date('2011/02/01','yyyy/mm/dd')
GROUP BY product_id;
Type
Arrival rate
Rows per Exec(Volume)
Time
Peak season
Throughput SLA
Avg
2 times / day
30


50,000 rows / day
Peak
2 times / day
200
12:00 ~ 14:00, 20:00 ~ 22:00
January
2,000,000 rows / day

III. Growth and Archive - Data lifecycle management

 #) Growth rate, E.g. 30MB/Day
 #) Data retention and archive plan, E.g. 1 year working data, 3 years reporting data in DW, 10 years archive data in Tape/DVD.

Target

Based on the requirements, the architects are able to design a solution.
We will select one data process pattern(good practice) or design a new one to meet the SLA and business objective.

Capacity plan

We will plan our hardware resource to meet the above SLA.

Data life cycle management

It is not necessary to put everything into our expensive OLTP system, it tightly bind to SLA.
Usage Profile
Data Retention
Priority
SLA
Storage/Disk
Write/Read hot
Last 2 months
priority one
99.999% HA, 7 x 24
Most Expensive
Read hot
Last 2 ~ 6 months
priority 2
99.99% HA, 7 x 24
Expensive, copy to DW
Read only cooler
Last 6 ~ 36 months
priority 3
nearline storage 99% HA, 5 x 8
Cheap
Read only cold
Older than 3 years
priority 4
move to DVD or tape, only mount online when needed.
Cheapest
Obsoleted
Older than 9 years
priority 5
Keep aggregate data and Drop it
...
There are many requirements to effectively manage the data lifecycle. Effective and efficient management of the lifecycle will reduce both the hardware and management costs associated with the database facility, and improve overall database performance. To begin the process, the lifecycle for each events/fact table data stream must be understood. This includes defining who will ultimately use the data, and how they will use it. Based upon the usage profile, the characteristics of the life of the data can be quantified.

Reference:


Data Usage Over Time
Description of Figure 5-2 follows

Data Lifecycle
Description of Figure 5-3 follows