- Use cases
- Select group_id,ts_upd from my_table where pk_col = 'xxxxxxxxxxxxx';
- Select group_id,ts_upd from my_table where index1_col = 1234;
- Select group_id,ts_upd from my_table where index2_col = 1234;
- Select group_id,ts_upd from my_table where index3_col = 1234;
All queries return 1 or 0 rows.
- 80% of the time, only group_id is returned.
- update ?
- Create a data model
- pk_col VARCHAR(20),
- index1_col VARCHAR(30),
- index2_col VARCHAR(30),
- index3_col VARCHAR(30),
- group_id NUMBER(10),
- ts_upd TIMESTAMP (8 bytes),
- record size: 128 bytes,
- Replication, high availability
- Data distribution and replication
- Strategy 1: one data center, 3 nodes, replication_factor = 3, write consistency level = 2.
- Strategy 2: two data centers, 3 nodes in each data center.
- Murmur3Partitioner
- Round((2 copies read, or 3 copies written) / 3 nodes) = 1. The redundant work is spread across the 3 nodes.
- Estimate Cassandra processing power on hardware at the current price-performance sweet spot.
- variable read/write % criteria, 100:0, 90:10, 0:100
- Transaction volume
- Response time
- Memory: 64GB, to ensure the data is always in cache.
- CPU: 8-core CPU processors
- SSD: can provide P99.999 latency under 5 *milliseconds* regardless of RAM usage.
- SATA spinning disks: hard drives give a wide range of latencies, easily up to 5 *seconds* in the P99.999 range.
- Basic operation time,
- average read latency < 0.16 ms, or 6250 reads/sec
- average write latency < 0.025 ms, or 40,000 writes/sec
- max latency < 5ms, 99.999%
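The reads/sec and writes/sec figures above follow from the average latencies if operations are assumed to run one after another on a single thread. A minimal sketch of that conversion (the function name `ops_per_second` is only illustrative, and real nodes serve requests concurrently, so this is a conservative reading):

```python
def ops_per_second(latency_ms: float) -> float:
    """Operations per second implied by an average latency, assuming serial execution."""
    return 1000.0 / latency_ms

# Latencies taken from the "Basic operation time" list above.
reads_per_sec = ops_per_second(0.16)
writes_per_sec = ops_per_second(0.025)

print(round(reads_per_sec), "reads/sec")
print(round(writes_per_sec), "writes/sec")
```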
- Hypotheses / assumptions
- 1/4 of the queries on each index.
- turn off key cache and row cache
- Distributed index and MV data models mean more code to maintain.
- Sizing overhead
- Column size = 15 + size of name (10) + size of value; use short column names.
- row overhead = 23
- primary key index size = 32 + average_key_size
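The overhead rules above can be written as small helpers, which makes it easy to see how much a shorter column name saves. This is only a sketch of the arithmetic in the list; the constant and function names are made up for illustration, and the byte counts are the ones quoted here, not values read from Cassandra itself.

```python
COLUMN_OVERHEAD = 15    # bytes per column, from the sizing list
ROW_OVERHEAD = 23       # bytes per row, from the sizing list
PK_INDEX_OVERHEAD = 32  # bytes per primary key index entry, from the sizing list

def column_size(name_len: int, value_len: int) -> int:
    """Column size = 15 + size of name + size of value."""
    return COLUMN_OVERHEAD + name_len + value_len

def primary_key_index_size(avg_key_size: int) -> int:
    """Primary key index size = 32 + average key size."""
    return PK_INDEX_OVERHEAD + avg_key_size

# index1_col: 10-character name, VARCHAR(30) value
print(column_size(10, 30))
# Same value stored under a 2-character column name
print(column_size(2, 30))
# Index entry for the VARCHAR(20) primary key
print(primary_key_index_size(20))
```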
- index options
- Cassandra internals: http://www.wentnet.com/blog/?p=77
- Primary Key
- Logical reads = 1,
- Secondary index
- (index column, primary key column), size of value: 50.
- Logical reads = O(n) + 1 = 3 + 1 = 4; n is number of nodes
- Logical writes = 1 + 1 = 2;
- 100% read : 6250 / (1 + 3 * 4) / 4 = 120 queries / second
- 100% write : 40000 / (1 + 3 * 2) = 5714 rows / second
- 90% read, 10% write:
- 120 * 90% = 108 queries / second
- 5714 * 10% = 571 rows / second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
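The storage formula for the secondary index option can be checked with a short script. This sketch reproduces the expression term by term; the grouping into a "base row" and an "index row" and the name `storage_bytes` are my reading of the formula, and the final division by 3 is kept exactly as written in the source.

```python
ROWS = 60_000_000  # the "60M" in the storage formula

def storage_bytes(index_value_size: int, rows: int = ROWS) -> float:
    """Evaluate the storage formula for a given index value size in bytes."""
    # Base table: 3 columns of overhead + 128-byte record + row overhead + PK index entry
    base_row = (15 + 10) * 3 + 128 + 23 + (32 + 20)
    # One index entry: 2 columns of overhead + value + row overhead + index key entry
    index_row = (15 + 10) * 2 + index_value_size + 23 + (32 + 30)
    # Three indexes per row; the trailing "/ 3" is taken as-is from the source
    return rows * (base_row + 3 * index_row) / 3

print(f"{storage_bytes(50) / 1e9:.1f} GB")
```

With an index value size of 50 bytes this evaluates to the 16.7GB quoted above.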
- Distributed index.
- (index column, primary key column), size of value: 50.
- Logical reads = 1 + 1 = 2;
- Logical writes = 1 + 3 = 4;
- 100% read : 6250 / (1 + 3 * 2) / 4 = 223 queries/second
- 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
- 90% read, 10% write:
- 223 * 90% = 201 queries/second
- 3077 * 10% = 308 rows/second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 50 + 23 + (32+30))*3) / 3 = 16.7GB
- Materialized View.
- (index column, primary key column, group_id, ts_upd), size of value: 68.
- Logical reads = 1
- Logical writes = 1 + 3 = 4;
- Row size = (140 + 32) * 4 = 688
- 100% read : 6250 / (1 + 3 * 1) / 4 = 391 queries/second
- 100% write : 40000 / (1 + 3 * 4) = 3077 rows/second
- 90% read, 10% write:
- 391 * 90% = 352 queries/second
- 3077 * 10% = 308 rows/second
- Storage Size: 60M * ((15+10)*3 + 128 + 23 + (32+20) + ((15+10)*2 + 68 + 23 + (32+30))*3) / 3 = 17.7GB
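The throughput estimates for the three index options all use the same two formulas, so they can be compared side by side. The sketch below assumes the baseline rates of 6250 reads/sec and 40,000 writes/sec and a replication factor of 3, as in the text. The source does not say what the trailing `/ 4` in the read formulas stands for (possibly the four query types), so `READ_DIVISOR` is a label of convenience rather than a confirmed meaning.

```python
READS_PER_SEC = 6250     # baseline from average read latency
WRITES_PER_SEC = 40000   # baseline from average write latency
RF = 3                   # replication factor
READ_DIVISOR = 4         # the "/ 4" in the read formulas; meaning not stated in the source

# option name -> (logical reads, logical writes), as listed for each option
OPTIONS = {
    "secondary index": (4, 2),
    "distributed index": (2, 4),
    "materialized view": (1, 4),
}

def read_qps(logical_reads: int) -> float:
    """100% read workload: queries per second."""
    return READS_PER_SEC / (1 + RF * logical_reads) / READ_DIVISOR

def write_rps(logical_writes: int) -> float:
    """100% write workload: rows per second."""
    return WRITES_PER_SEC / (1 + RF * logical_writes)

for name, (reads, writes) in OPTIONS.items():
    print(f"{name}: {read_qps(reads):.0f} queries/s, {write_rps(writes):.0f} rows/s")
```

Read this way, the materialized view gives the highest read rate at the cost of the same write rate as the distributed index, while the secondary index is the cheapest to write and the slowest to read.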
- Oracle database processing power
- Up to 2000 queries per second; updates 20 million rows a day.
- query latency:
- 99% < 0.01 second
- 99.99% < 0.2 second
- Reference
- http://planetcassandra.org/nosql-performance-benchmarks/#EndPoint
- http://www.datastax.com/dev/blog/datastax-sandbox-2-0-now-available
- http://www.stackdriver.com/cassandra-aws-gce-rackspace/
- http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningHardware_c.html
Thanks,
Charlie 木匠 | Database Architect Developer