Purpose
Database resources are limited in CPU, memory, and I/O, so we have to do capacity planning well ahead. That includes the transaction arrival rate and each transaction's SQL resource usage: LIO, CPU, memory, redo, undo, locks, latches, etc. We have to design and build a synthetic transaction workload benchmark to get these metrics.
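As a minimal sketch of how those metrics can be collected on Oracle: after a synthetic run, the per-execution figures for a benchmark statement can be read back from V$SQL (the sql_id below is a hypothetical placeholder; look it up from V$SQL.sql_text first).

-- Per-execution LIO / CPU / elapsed figures for one benchmark SQL,
-- read from the shared pool after the synthetic workload has run.
-- cpu_time and elapsed_time are in microseconds.
SELECT sql_id,
       executions,
       buffer_gets  / executions AS lio_per_exec,
       cpu_time     / executions AS cpu_us_per_exec,
       elapsed_time / executions AS elapsed_us_per_exec
FROM   v$sql
WHERE  sql_id = 'abcd1234efgh5'  -- placeholder sql_id
  AND  executions > 0;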
Also "We have really got to think about the data and how it will be used, how it will be accessed so that we can utilize the optimal structure for that.
Make sure that the database organizes the data in a way that’s efficient for us to retrieve that data."
Make sure that the database organizes the data in a way that’s efficient for us to retrieve that data."
As we discussed, here are the database criteria that need to be filled in before starting a new project that involves database processing.
(Actually, we'd better consider these from day one of system design for the new project.)
- I. Function priority: is asynchronous batch processing OK?
- II. Workload (planning the performance)
  - II.i. Data Usage
    - a1. UI query and OLTP transaction response time
    - a2. Batch throughput
- III. Growth and Archive - Data lifecycle management
- Target
- Capacity plan
- Data life cycle management
- SLA
---
I. Function priority: is asynchronous batch processing OK?
#) Downtime allowed
 -) HA implementation (physical standby, Streams replication, RMAN backup and recovery, or logical data dump with expdp/impdp)
#) Process delay/latency allowed, or data-consistency delay/latency allowed
 -) to decide between an ACID and a BASE solution
II. Workload (planning the performance)
#) OLTP: average and peak executions per hour/day, and average/peak response-time requirements
#) Batch: throughput requirement; see System Architecture Model
See Define+database+application+SLA
II.i. Data Usage
a1. UI query and OLTP transaction response time
SELECT customer_name
FROM customer
WHERE customer_id = 12345678
;
| Type | Arrival rate | Time | Peak season | SLA 90% | SLA 99.9% |
|---|---|---|---|---|---|
| Avg | 3 / minute | | | < 2 seconds | < 4 seconds |
| Peak | 20 / minute | 11am ~ 2pm | Xmas | < 3.5 seconds | < 5 seconds |
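To verify these targets against measurements, a percentile query over a response-time log works. This is a hedged sketch assuming a hypothetical resp_log(txn_type, elapsed_sec) table populated by the benchmark harness:

-- 90th and 99.9th percentile response times per transaction type;
-- compare the results against the SLA columns above.
SELECT txn_type,
       PERCENTILE_CONT(0.90)  WITHIN GROUP (ORDER BY elapsed_sec) AS sla_90_sec,
       PERCENTILE_CONT(0.999) WITHIN GROUP (ORDER BY elapsed_sec) AS sla_999_sec
FROM   resp_log
GROUP  BY txn_type;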
a2. Batch throughput
-- Monthly rollup for January 2011. Note: ORDER is a reserved word in
-- Oracle SQL, so the source table is assumed to be named orders; the date
-- range is half-open so rows from exactly 2011/02/01 are not double-counted.
INSERT INTO order_stats (time_id, product_id, order_cnt)
SELECT TO_DATE('2011/01/01', 'yyyy/mm/dd'), product_id, COUNT(*)
FROM orders
WHERE order_date >= TO_DATE('2011/01/01', 'yyyy/mm/dd')
  AND order_date <  TO_DATE('2011/02/01', 'yyyy/mm/dd')
GROUP BY product_id;
| Type | Arrival rate | Rows per exec (volume) | Time | Peak season | Throughput SLA |
|---|---|---|---|---|---|
| Avg | 2 times / day | 30 | | | 50,000 rows / day |
| Peak | 2 times / day | 200 | 12:00 ~ 14:00, 20:00 ~ 22:00 | January | 2,000,000 rows / day |
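Once the batch runs, the achieved throughput can be read back from the target table itself; a simple sketch over the order_stats table loaded above:

-- How many source order rows were aggregated into each time bucket;
-- compare this against the throughput SLA in the table above.
SELECT time_id,
       SUM(order_cnt) AS source_rows_processed
FROM   order_stats
GROUP  BY time_id
ORDER  BY time_id;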
III. Growth and Archive - Data lifecycle management
#) Growth rate, e.g. 30 MB/day
#) Data retention and archive plan, e.g. 1 year of working data, 3 years of reporting data in the DW, 10 years of archive data on tape/DVD.
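A rough way to feed the growth-rate number, assuming Oracle and a hypothetical APP_OWNER schema: snapshot the segment footprint periodically and difference the results.

-- Current footprint per schema; sample this daily and diff to get MB/day.
-- APP_OWNER is a placeholder schema name.
SELECT owner,
       ROUND(SUM(bytes) / 1024 / 1024) AS size_mb
FROM   dba_segments
WHERE  owner = 'APP_OWNER'
GROUP  BY owner;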
---
Target
Based on the requirements, the architects can design a solution.
We will select an existing data-processing pattern (a good practice) or design a new one to meet the SLA and the business objective.
Capacity plan
We will plan our hardware resources to meet the above SLA.
Data life cycle management
It is not necessary to put everything into our expensive OLTP system; where data lives is tightly bound to the SLA.
| Usage Profile | Data Retention | Priority | SLA | Storage/Disk |
|---|---|---|---|---|
| Write/read hot | Last 2 months | Priority 1 | 99.999% HA, 7 x 24 | Most expensive |
| Read hot | Last 2 ~ 6 months | Priority 2 | 99.99% HA, 7 x 24 | Expensive; copy to DW |
| Read-only, cooler | Last 6 ~ 36 months | Priority 3 | Nearline storage, 99% HA, 5 x 8 | Cheap |
| Read-only, cold | Older than 3 years | Priority 4 | Move to DVD or tape; mount online only when needed | Cheapest |
| Obsoleted | Older than 9 years | Priority 5 | Keep aggregate data and drop the rest | ... |
There are many requirements for effectively managing the data lifecycle. Effective and efficient lifecycle management reduces both the hardware and management costs associated with the database facility and improves overall database performance. To begin the process, the lifecycle of each event/fact-table data stream must be understood. That includes defining who will ultimately use the data and how they will use it. Based upon the usage profile, the characteristics of the life of the data can be quantified.
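A common way to implement the tiers above on Oracle is range partitioning by date, so whole partitions can be moved to cheaper storage or dropped as they age. A minimal sketch; the table, partition, and tablespace names are placeholders:

-- Orders partitioned by month; an aged partition can then be relocated
-- or dropped as a unit, matching the retention tiers in the table above.
CREATE TABLE orders_part (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION p2011_01 VALUES LESS THAN (TO_DATE('2011/02/01', 'yyyy/mm/dd')),
  PARTITION p2011_02 VALUES LESS THAN (TO_DATE('2011/03/01', 'yyyy/mm/dd'))
);

-- Aging out: move a cooling partition to a cheap tablespace, or drop it
-- once its aggregate has been kept (the Priority 5 row above).
ALTER TABLE orders_part MOVE PARTITION p2011_01 TABLESPACE nearline_ts;
-- ALTER TABLE orders_part DROP PARTITION p2011_01;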
Reference:
Data Usage Over Time
Data Lifecycle