Purpose
Database resources are limited in CPU, memory, and I/O, so we have to do capacity planning well ahead. That includes the transaction arrival rate and each transaction's SQL resource usage: LIO / CPU / memory / redo / undo, locks, latches, etc. We have to design and build a synthetic transaction workload benchmark to obtain these metrics.
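Once a representative workload is running, each transaction's resource profile can be sampled from the library cache; a minimal sketch, assuming Oracle and SELECT access to V$SQL (the top-10 cutoff is arbitrary):

```sql
-- Top SQL by logical I/O (buffer gets), with CPU and elapsed time.
SELECT *
FROM (
    SELECT sql_id,
           executions,
           buffer_gets  AS lio,
           cpu_time,          -- microseconds
           elapsed_time       -- microseconds
    FROM   v$sql
    WHERE  executions > 0
    ORDER  BY buffer_gets DESC
)
WHERE ROWNUM <= 10;
```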
Also: "We have really got to think about the data and how it will be used, how it will be accessed so that we can utilize the optimal structure for that. Make sure that the database organizes the data in a way that's efficient for us to retrieve that data."
As we discussed, here are the database criteria that need to be filled in before starting any new project that involves database processing.
(Ideally, we should consider these from day one of designing the system for the new project.)
- I. Function priority, asynchronous batch process OK?
- II. Workload (planning the performance)
  - II.i. Data Usage
    - a1. UI query and OLTP transaction response time
    - a2. Batch throughput
- III. Growth and Archive - Data lifecycle management
- Target
- Capacity plan
- Data lifecycle management
- SLA
---
I. Function priority, asynchronous batch process OK?
#) Downtime allowed
-) HA implementation options: physical standby, Streams replication, RMAN backup and recovery, or logical data dump (expdp/impdp)
#) Process delay/latency allowed, or data consistency delay/latency allowed
-) determines whether to select an ACID or a BASE solution
II. Workload (planning the performance)
#) OLTP: average and peak executions per hour per day, and average/peak response time requirements
#) Batch: throughput requirement; see System Architecture Model
See Define+database+application+SLA
II.i. Data Usage
a1. UI query and OLTP transaction response time
SELECT customer_name
FROM customer
WHERE customer_id = 12345678
;
| Type | Arrival rate | Time | Peak season | SLA 90% | SLA 99.9% |
|---|---|---|---|---|---|
| Avg | 3 / minute | | | < 2 seconds | < 4 seconds |
| Peak | 20 / minute | 11am ~ 2pm | Xmas | < 3.5 seconds | < 5 seconds |
a2. Batch throughput
INSERT INTO order_stats (time_id, product_id, order_cnt)
SELECT to_date('2011/01/01', 'yyyy/mm/dd'), product_id, count(*)
FROM orders  -- ORDER is a reserved word; the table is assumed to be named orders
WHERE order_date >= to_date('2011/01/01', 'yyyy/mm/dd')
  AND order_date <  to_date('2011/02/01', 'yyyy/mm/dd')  -- half-open range avoids double-counting the boundary day
GROUP BY product_id;
| Type | Arrival rate | Rows per exec (volume) | Time | Peak season | Throughput SLA |
|---|---|---|---|---|---|
| Avg | 2 times / day | 30 | | | 50,000 rows / day |
| Peak | 2 times / day | 200 | 12:00 ~ 14:00, 20:00 ~ 22:00 | January | 2,000,000 rows / day |
III. Growth and Archive - Data lifecycle management
#) Growth rate, e.g. 30 MB/day
#) Data retention and archive plan, e.g. 1 year of working data, 3 years of reporting data in the DW, 10 years of archive data on tape/DVD.
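To ground the growth-rate estimate in measurements rather than guesses, segment sizes can be snapshotted periodically and diffed; a minimal sketch, assuming Oracle, access to DBA_SEGMENTS, and a hypothetical schema owner APP:

```sql
-- Current size of the application schema in MB; run daily and
-- diff consecutive results to derive MB/day growth.
SELECT owner,
       ROUND(SUM(bytes) / 1024 / 1024) AS size_mb
FROM   dba_segments
WHERE  owner = 'APP'
GROUP  BY owner;
```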
---
Target
Based on these requirements, the architects are able to design a solution.
We will select an existing data-processing pattern (a good practice) or design a new one to meet the SLA and the business objective.
Capacity plan
We will plan our hardware resources to meet the SLAs above.
Data life cycle management
It is not necessary to keep everything in our expensive OLTP system; data placement is tightly bound to the SLA.
| Usage profile | Data retention | Priority | SLA | Storage/disk |
|---|---|---|---|---|
| Write/read hot | Last 2 months | Priority 1 | 99.999% HA, 7 x 24 | Most expensive |
| Read hot | Last 2 ~ 6 months | Priority 2 | 99.99% HA, 7 x 24 | Expensive, copy to DW |
| Read-only cooler | Last 6 ~ 36 months | Priority 3 | Nearline storage, 99% HA, 5 x 8 | Cheap |
| Read-only cold | Older than 3 years | Priority 4 | Move to DVD or tape; mount online only when needed | Cheapest |
| Obsoleted | Older than 9 years | Priority 5 | Keep aggregate data and drop the rest | ... |
There are many requirements for managing the data lifecycle effectively. Effective and efficient lifecycle management reduces both the hardware and administration costs of the database facility and improves overall database performance. To begin, the lifecycle of each event/fact-table data stream must be understood: who will ultimately use the data, and how. Based on that usage profile, the characteristics of the data's life can be quantified.
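The storage tiers above map naturally onto table partitioning; a minimal sketch, assuming Oracle range partitioning and hypothetical table/tablespace names (order_fact, hot_ts, cheap_ts):

```sql
-- Partition the fact table by date range so each slice of data can
-- sit on storage that matches its lifecycle tier.
CREATE TABLE order_fact (
    order_id   NUMBER,
    order_date DATE
)
PARTITION BY RANGE (order_date) (
    PARTITION p_2010    VALUES LESS THAN (TO_DATE('2011/01/01', 'yyyy/mm/dd'))
        TABLESPACE cheap_ts,   -- older, read-mostly data on cheap storage
    PARTITION p_2011_01 VALUES LESS THAN (TO_DATE('2011/02/01', 'yyyy/mm/dd'))
        TABLESPACE hot_ts      -- current month on the fastest storage
);

-- As a partition cools, move it to a cheaper tablespace:
ALTER TABLE order_fact MOVE PARTITION p_2010 TABLESPACE cheap_ts;

-- Once it is past retention (and archived to tape/DVD), drop it:
ALTER TABLE order_fact DROP PARTITION p_2010;
```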