Thursday, September 13, 2012

Polyglot persistence update 2012-Sept

These Polyglot Persistence updates will be posted on an irregular schedule.


[MongoDB] 2.2

* Pros *
 - Concurrency improvements: from a daemon-wide global lock to database-level locking.
    -- It will eventually support collection-level locking in a future release.
 - TTL Collections
 - Aggregation Framework
 - Tag Aware Sharding

* Cons *

Critics' arguments center on a couple of core themes:

- Product Maturity: low QA quality, sub-systems repeatedly breaking, driver inconsistencies, scattered and incomplete documentation, complex node management.

- Design Decisions: a single write lock; memory-mapped files, which mean the server has little control over its own performance; replication without proxying, which causes excessive connection counts; sharding that is possible but very complex to implement because of several special nodes, and that is unreliable; lots of pitfalls and gotchas.

The key thing to understand about MongoDB is that it's not a magic bullet. It has significant tradeoffs like everything else. At the end of the day, MongoDB kind of lives in its own little niche. It makes a lot of unique trade-offs that must be understood to use it effectively.

Many databases (NoSQL and SQL) are very specific in what they do. MongoDB is less specific but is serviceable for handling many different cases.

However, once you get to a certain scale, MongoDB will underperform the specialized solution. In fact, I'm seeing this at my day job where we are actively moving several sub-systems off MongoDB and onto better-suited products.

* Story *

Instagram was in the process of moving to MongoDB, but they eventually gave up and stuck with PostgreSQL.
- 2012-May-16

[PostgreSQL] 9.2

1.1 Index-only scans
1.2 Replication improvements
1.3 JSON datatype
1.4 Range Types

Many clients are excited about the [JSON datatype]; this could be another reason to move away from MongoDB. (See the PostgreSQL wiki page "What's new in PostgreSQL 9.2".)


[HBase]

* Cons *
Rows in HBase tables are sorted by row key. The sort is byte-ordered. All table accesses are via the table row key -- its primary key.

- Byte-ordered: many sites have complained about the inflexibility of this sort order.

When designing the key, we need to fully understand how the byte array is stored, and place rows that are frequently read together next to each other (locality).
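To make the locality point concrete, here is a small Python sketch. It is not HBase client code: it assumes a hypothetical composite key of the form `user_id#sequence`, and it uses `bisect` over a sorted list of ASCII byte strings to stand in for a range scan over byte-ordered row keys.

```python
import bisect

# Hypothetical composite row keys: b"<user_id>#<sequence>" (an assumption
# for illustration). Putting the user id first keeps each user's rows
# adjacent once the keys are sorted byte by byte.
rows = sorted([
    b"alice#0001", b"bob#0001", b"alice#0002",
    b"carol#0001", b"bob#0002", b"alice#0003",
])


def prefix_scan(sorted_keys, prefix: bytes):
    """Return every key that starts with `prefix` as one contiguous slice."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # b"\xff" sorts after any ASCII byte, so prefix + b"\xff" is an upper
    # bound for all ASCII keys sharing the prefix.
    stop = bisect.bisect_left(sorted_keys, prefix + b"\xff")
    return sorted_keys[start:stop]


alice_rows = prefix_scan(rows, b"alice#")
print(alice_rows)
```

Because the rows for one user sit next to each other in sort order, reading them is a single contiguous range rather than scattered lookups.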

Example: lexicographical int sort results:
 1,10,100,11,12,13,14,15,16,17,18,19,2,20,21, ..., 9,90,91,92,93,94,95,96,97,98,99.

To preserve the natural integer order, the row keys must be left-padded with zeros (for example 001, 002, ..., 100).
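A minimal Python sketch of the effect, comparing keys as raw bytes the way a byte-ordered sort does. The fixed width of 3 digits and the helper name `padded_key` are assumptions for illustration; nothing here uses the HBase client API.

```python
def padded_key(n: int, width: int = 3) -> bytes:
    """Left-pad a non-negative integer with zeros (hypothetical helper)."""
    return str(n).zfill(width).encode("ascii")


numbers = range(1, 101)

# Unpadded keys: byte order does not match numeric order.
unpadded = sorted(str(n).encode("ascii") for n in numbers)
print([k.decode() for k in unpadded[:5]])   # ['1', '10', '100', '11', '12']

# Zero-padded keys: byte order and numeric order agree.
padded = sorted(padded_key(n) for n in numbers)
print([k.decode() for k in padded[:5]])     # ['001', '002', '003', '004', '005']
```

The width has to be chosen up front to fit the largest key you expect, since a key longer than the width would sort out of place again. Negative numbers are not handled by this sketch.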


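Data is stored sorted by the lexicographic (byte) order of the row key. When designing the key, take full advantage of this sorted storage and place rows that are often read together next to each other (locality). Note: sorting integers lexicographically gives 1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,...,9,90,91,92,93,94,95,96,97,98,99. To keep the natural integer order, row keys must be left-padded with 0.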

Charlie 木匠 | Database Architect Developer