Every time when there is a problem, and people come to me to ask solutions, or to give solutions, I must ask, as always, could someone describe the business goal?
Plus current data flow and data process logic would be better.
To help me understanding the problem we're trying to solve, to figure out what the problem really is.
A solid understanding of the goal of ABC.
“And most importantly of all - a solid understanding of the goal, of what the question is. Tuning a query or process is really hard (impossible I would say) - unless we understand the question in the first place. I cannot tell you how many times I've not been able to tune a query until I had the question in hand. Certainly we can derive a question from a query - however, many times that derived question is much more confining than the real question being asked. For example, many people use outer joins in all queries - they are "afraid" of losing a row (perhaps they got "burned" in some past experience and now use outer joins everywhere). If the objects are related in a one to one mandatory fashion - we don't need an outer join at all. The question derived from the query is much more confining than reality.” - AskTom
To help us better understanding an application business goal and application logic, could you please help us to find out below details?
Some follow up questions will be asked, or one or more teleconference meetings will be helpful.
And then some design options can be given from there.
/******* Question Start *******/
- What is the goal of ABC system? Please describe the problem that ABC is try to solve. The big picture.
- For example: Goal: Replicate data from MySQL to Oracle.
- What is the goal for each module?
- For example: Goal of module A: Capture incremental changes from MySQL.
Application logic and data flow for each transaction module.
- What are the input, process, output for each step?
- Which step could go wrong?
- What would be the data look like after fail?
- How often it goes wrong? e.g. 3 times a day.
- What are the limitations? e.g. cannot process cash orders.
- How to handle failure/exception? e.g. rollback.
- Single process single connection, or multiple concurrent processes and multi-connections ?
- publisher => subscription
- master/coordinator => slave workers
- What is performance SLA?
- Response time. E.g. 90% < 2 seconds, 99.9% < 5 seconds per transaction.
- Transaction volume. E.g. process 5000 orders/hours.
- What is availability SLA? E.g. Max to 8 hours downtime a year.
Capture all SQL called in applicate by SQL Session Trace tool.
For all transactions that access table user_abc.table_a,
including every SQL SELECT/INSERT/UPDATE/DELETE.
(Another email will be sent to show you how to capture all SQL)
For example, the shopping cart checkout transaction would be:
Business goal: Ensure the online shopping process clear, fast, simple ...
The goal for each module:
Application logic and data flow.
/* create_order */
SELECT product_id, qty, ... FROM shopping_cart_item WHERE shopping_cart_id = 123
INSERT INTO order ...VALUES (...);
INSERT INTO order_item ...VALUES (...);
INSERT INTO order_audit ...VALUES (...);
DELETE shopping_cart ... WHERE shopping_cart_id = 123;
DELETE shopping_cart_item ... WHERE shopping_cart_id = 123;
UPDATE inventory SET qty = qty - 1 WHERE item_id = 7788123;
/* create_finance_journal */
INSERT into po_journal...VALUES (...);
INSERT into sub_ledger ...VALUES (...);
/* notify_buyer, call 3rd party API */
/******* End *******/
I really appreciate your effort.
Design is to finding the problem, not solution.
We’ll be doing this for performance, scalability, security and so on.
We'll meet heavy resistance, big time - I am sure.
So we always be able to talk "numerously" - in numbers.
"if we do this ETL process slow by slow (row by row) and take just 1ms per row (pretty fast don't you think?) and you have to process 5,000,000 rows - it will take almost 1.5 hours. When we get to 50 million rows - it'll be over a half a day. When we get to 500 million rows - it will take a WEEK to process. if I am allowed to do it in the database in a single DDL statement to transform the data, I can be done with 500 million rows in 2 minutes."
See my blog …, for the proof of why row by row process is slow by slow process, and why set based process is much more better than row by row process.
- "More orders equals more revenue". From a software architect in Google. http://www.eaipatterns.com/ramblings/18_starbucks.html
Charlie | Database Architect Developer