
Tuesday, October 14, 2014

Comparing complexity of a BTree index to an SSTable index (Cassandra)

When we compare a BTree index to an SSTable index, we should consider the write complexity:
  • When writing randomly to a copy-on-write BTree, we incur random reads (to copy the leaf node and its path). So while the writes may be sequential on disk, for datasets larger than RAM these random reads will quickly become the bottleneck. For an SSTable-like index, no such read occurs on write; there are only the sequential writes.
  • We should also consider that in the worst case, every update to a BTree could incur log_b N IOs - that is, we could end up writing 3 or 4 blocks for every key. If the key size is much less than the block size, this is extremely expensive. For an SSTable-like index, each write IO carries as many fresh keys as it can hold, so the IO cost per key is more like 1/B.
In practice, this makes SSTable-like indexes thousands of times faster than BTrees for random writes.
We should also reconsider the read costs. It is true that a BTree costs O(log_b N) random IOs for a random point read, but an SSTable-like index actually costs O(#sstables . log_b N). Without a decent merge (compaction) scheme, #sstables is proportional to N. There are various tricks to get around this, such as the Bloom filters used in Cassandra.
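To put rough numbers on this, here is a back-of-the-envelope sketch. The branching factor, keys-per-block, and sstable count below are assumptions for illustration, not measurements of any real system:

```python
import math

def btree_write_ios(n_keys, branching=100):
    # Worst-case IOs for one random write to a copy-on-write BTree:
    # the whole root-to-leaf path may be rewritten.
    return math.ceil(math.log(n_keys, branching))

def sstable_write_ios(keys_per_block=100):
    # Amortized IOs per key for an SSTable-style log: each sequential
    # block carries as many fresh keys as it fits, so cost is ~1/B.
    return 1 / keys_per_block

def sstable_read_ios(n_sstables, n_keys, branching=100):
    # Point-read cost without Bloom filters: every sstable's index
    # may need to be probed.
    return n_sstables * math.ceil(math.log(n_keys, branching))

n = 10**9  # one billion keys (assumed)
print(btree_write_ios(n))       # 5 blocks per key in the worst case
print(sstable_write_ios())      # 0.01 IOs per key, amortized
print(sstable_read_ios(10, n))  # reads pay for the extra sstables
```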

Tuesday, August 12, 2014

SSTable Format in Cassandra

Cassandra creates a new SSTable when the data of a column family in a Memtable is flushed to disk. SSTable stands for Sorted Strings Table, a concept borrowed from Google BigTable; it stores a set of immutable row fragments in sorted order based on row keys. The SSTable files of a column family are stored in its respective column family directory.

The data in an SSTable is organized in six types of component files. The format of an SSTable component file name is
<keyspace>-<column family>-[tmp marker]-<version>-<generation>-<component>.db

The <keyspace> and <column family> fields give the Keyspace and column family of the SSTable, <version> is an alphabetic string representing the SSTable storage format version, <generation> is an index number incremented every time a new SSTable is created for a column family, and <component> is the type of information stored in the file. The optional "tmp" marker in the file name indicates that the file is still being created. The six SSTable components are Data, Index, Filter, Summary, CompressionInfo and Statistics.
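As an illustration, the naming scheme can be unpacked mechanically. This little sketch (the helper function is mine, not part of Cassandra, and it assumes no '-' inside the keyspace or column family names) parses a component file name:

```python
def parse_sstable_name(filename):
    # Split "<keyspace>-<cf>-[tmp-]<version>-<generation>-<component>.db"
    # into its fields.
    name, _, _ext = filename.rpartition(".")
    parts = name.split("-")
    tmp = "tmp" in parts
    if tmp:
        parts.remove("tmp")
    keyspace, cf, version, generation, component = parts
    return {
        "keyspace": keyspace,
        "column_family": cf,
        "temporary": tmp,
        "version": version,
        "generation": int(generation),
        "component": component,
    }

info = parse_sstable_name("usertable-data-ic-1-Data.db")
print(info["version"], info["generation"], info["component"])  # ic 1 Data
```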

For example, I created a column family named data in the Keyspace usertable using cassandra-cli and inserted 1000 rows {user0, user1, ..., user999} with Cassandra version 1.2.5.

create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use usertable;
create column family data with comparator=UTF8Type;

The SSTables under the cassandra/data/usertable/data directory:

usertable-data-ic-1-CompressionInfo.db
usertable-data-ic-1-Data.db
usertable-data-ic-1-Filter.db
usertable-data-ic-1-Index.db
usertable-data-ic-1-Statistics.db
usertable-data-ic-1-Summary.db
usertable-data-ic-1-TOC.txt

In the above SSTable listing, the storage format version is ic and the generation number is 1. usertable-data-ic-1-TOC.txt contains the list of components for the SSTable.

The Data file stores the base data of the SSTable: the set of rows and their columns. For each row, it stores the row key, data size, column names bloom filter, columns index, row-level tombstone information, column count, and the list of columns, stored in sorted order by column name. The Filter file stores the row keys bloom filter.

The Index file contains the SSTable index, which maps row keys to their respective offsets in the Data file. Row keys are stored in sorted order, and each row key is associated with an index entry that includes the position in the Data file where its data is stored. Newer SSTable versions ("ia" and above) promote additional row-level information from the Data file into the index entry to improve performance for wide rows: a row's columns index and its tombstone information are also included in its index entry. SSTable version "ic" also stores the column names bloom filter in the index entry.



The Summary file contains the index summary and index boundaries of the SSTable index. The index summary is calculated from the SSTable index: it samples row keys that are index_interval entries apart (the default index_interval is 128) along with their respective positions in the Index file. The index boundaries are the start and end row keys in the SSTable index.
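As a toy model of the summary construction (the fixed-size index entries below are an assumption for illustration; this is not Cassandra's actual layout):

```python
def build_index_summary(sorted_keys, index_positions, index_interval=128):
    # Sample every index_interval-th row key together with its
    # position in the Index file, plus the index boundaries.
    summary = [(sorted_keys[i], index_positions[i])
               for i in range(0, len(sorted_keys), index_interval)]
    boundaries = (sorted_keys[0], sorted_keys[-1])
    return summary, boundaries

keys = [f"user{i:04d}" for i in range(1000)]
positions = [i * 64 for i in range(1000)]  # assumed fixed-size index entries
summary, bounds = build_index_summary(keys, positions)
print(len(summary))  # 8 sampled entries for 1000 keys at interval 128
print(bounds)        # ('user0000', 'user0999')
```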



The CompressionInfo file stores compression metadata, including the uncompressed data length, chunk size, and a list of the chunk offsets. The Statistics file contains metadata for the SSTable, including histograms for estimated row size and estimated column count, the partitioner used for distributing keys, the ratio of compressed to uncompressed data, and the list of generation numbers of the SSTables from which this SSTable was compacted. If an SSTable was created by a Memtable flush, the list of ancestor generation numbers is empty.

All SSTable storage format versions and their respective Cassandra versions are described in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/Descriptor.java and the different components are described in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/Component.java

View the cache-hit ratio in Cassandra

As of version 1.2, the key cache and row cache are global to make better use of the off-heap functionality. Unfortunately, this means those stats are no longer available on the column family level.

These stats are available from the command line via nodetool info (the last two lines of its output).

e.g:


 [ec2-user@ip-172-31-45-42 ~]$ ./apache-cassandra-2.0.4/bin/nodetool info  
 Token      : (invoke with -T/--tokens to see all 256 tokens)  
 ID        : 0f694f54-4ec8-4730-a6d9-02c34bb1b847  
 Gossip active  : true  
 Thrift active  : true  
 Native Transport active: true  
 Load       : 3.48 MB  
 Generation No  : 1407836588  
 Uptime (seconds) : 2094  
 Heap Memory (MB) : 542.97 / 1014.00  
 Data Center   : datacenter1  
 Rack       : rack1  
 Exceptions    : 0  
 Key Cache    : size 1608 (bytes), capacity 52428800 (bytes), 83 hits, 101 requests, 0.822 recent hit rate, 14400 save period in seconds  
 Row Cache    : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds  
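The recent hit rate above is just hits divided by requests; a quick sanity check against the Key Cache line:

```python
def hit_rate(hits, requests):
    # Cache hit ratio as reported by nodetool info; NaN when the
    # cache has never been queried (as for the row cache above).
    return hits / requests if requests else float("nan")

print(round(hit_rate(83, 101), 3))  # 0.822, matching the Key Cache line
print(hit_rate(0, 0))               # nan, matching the Row Cache line
```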

Friday, July 18, 2014

Deploying Cassandra Across Multiple Data Centers with Replication

Cassandra provides highly scalable key/value storage that can be used for many applications. When Cassandra is used in production, one might consider deploying it across multiple data centers for various reasons. For example, your architecture may be such that you update data in one data center, all the other data centers should hold a replica of the same data, and you are OK with eventual consistency.

In this post I discuss how we can deploy Cassandra across two data centers (one node each), making sure every data center contains a full copy of the complete data set.

Scenario:
My setup contains two nodes, each with a valid public IP address (for the sake of explanation, node1: 129.97.74.12 and node2: 193.166.167.5). Now follow the next steps:

Steps
Note that all these steps, except the final step (creating the data model), must be followed on EACH AND EVERY node of the cluster. These steps were tested on Cassandra version 2.0.4.

Step 1: Configure cassandra.yaml
Open up $CASSANDRA_HOME/conf/cassandra.yaml in your favorite text editor.
I assume you have already downloaded and configured Cassandra on each of the boxes in your data centers. Since most of the steps here should be done on each node in every data center, I encourage you to first look at my previous post about deploying a single node Cassandra here.

1.1) Find the following lines and change the seeds to the public IP addresses of the other nodes (at least one node per data center):
e.g: 

      # Ex: "<ip1>,<ip2>,<ip3>"  
      - seeds: "129.97.74.12,193.166.167.5"  


1.2) Change the listen_address and rpc_address to the public IP address of the node that you are setting up, e.g.:
 # Setting this to 0.0.0.0 is always wrong.  
 listen_address: 193.166.167.5  
 rpc_address: 193.166.167.5  
 # port for Thrift to listen for clients on  
 rpc_port: 9160  


1.3) Change the endpoint_snitch line to PropertyFileSnitch. This tells Cassandra to check cassandra-topology.properties for the topology of the nodes.
# of the snitch, which will be assumed to be on your classpath.
endpoint_snitch: PropertyFileSnitch

Step 2: Configure Snitch File
Open up $CASSANDRA_HOME/conf/cassandra-topology.properties. Modify it to add all the nodes' IP addresses like:

 # Cassandra Node IP=Data Center:Rack  
 129.97.74.12=DC1:RAC1  
 193.166.167.5=DC2:RAC2 

#default for unknown nodes
default=DC1:RAC1
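For illustration, the snitch file's ip=DC:Rack mapping can be read with a few lines of code (my own sketch, not Cassandra's parser):

```python
def parse_topology(text):
    # Parse cassandra-topology.properties lines of the form
    # "ip=DC:RAC", ignoring blank lines and # comments.
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        node, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        topology[node.strip()] = (dc.strip(), rack.strip())
    return topology

props = """
# Cassandra Node IP=Data Center:Rack
129.97.74.12=DC1:RAC1
193.166.167.5=DC2:RAC2
default=DC1:RAC1
"""
print(parse_topology(props)["193.166.167.5"])  # ('DC2', 'RAC2')
```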
Step 3: Start Your Cluster
Go to $CASSANDRA_HOME on each node and type ./bin/cassandra -f to bring up the node. Once you have done this on all the nodes, type ./bin/nodetool -h localhost ring to make sure all the nodes are up and running.

Step 4: Create the Data Model with Replication
We are almost there. Now you need to tell Cassandra to use this configuration for our data model. The easiest way to do this is through cassandra-cli. Go to $CASSANDRA_HOME/bin and type cassandra-cli -p 9160 -h ip_address.

Now you need to create the keyspace with the proper replication. Assuming your keyspace name is usertable, type the following.

 create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {DC1:1,DC2:1};  

Now create your table (column family) named data in Cassandra and insert a test value:

 use usertable;  
 create column family data with column_type = 'Standard' and comparator = 'UTF8Type';  
 ASSUME data validator AS UTF8Type;  
 ASSUME data keys AS UTF8Type;  
 ASSUME data comparator AS UTF8Type;  
 set data[2][1]='test';  


Since you defined a replication policy of one copy per data center when creating usertable, you should now be able to see the inserted item ('test') on all your nodes. So go to each node and type in the cli:
use usertable; 
list data;

Your data should be listed like this:

 [default@hsntest] list data;  
 Using default limit of 100  
 Using default cell limit of 100  
 -------------------  
 RowKey: 32  
 => (name=1, value=74657374, timestamp=1405689037312000)  
 1 Row Returned.  
 Elapsed time: 84 msec(s).  
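Note that the cli prints the row key and column value as hex-encoded bytes; decoding them recovers what we inserted:

```python
# The RowKey and value in the cli listing are hex-encoded bytes.
row_key = bytes.fromhex("32")      # the key used in `set data[2][1]`
value = bytes.fromhex("74657374")  # the inserted column value
print(row_key.decode(), value.decode())  # 2 test
```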

All done! You now have a two-node cluster with the data replicated in each data center. Enjoy :)

Friday, January 24, 2014

Manual Installation and Configuration of a Single Node Cassandra on Amazon Linux AMI EC2

Before you start, be careful to launch an Amazon Linux AMI instance; I have personally experienced that this does not work on other images such as Red Hat, for unknown reasons: port 9160, the Thrift port for connecting to Cassandra, remained closed on Red Hat after launching it.

Requirements:

  • JDK (currently 1.7)
  • Cassandra tar file (currently version 2.0)
Steps:
  • Install Java


  1. Download the jdk rpm
  2. Install the jdk
    • move to the folder containing the jdk
    • rpm -Uvh jdk-7u51-linux-i586.rpm
    • Set JAVA_HOME: export JAVA_HOME=/usr/java/jdk1.7.0_51  

  • Install and Configure Cassandra
  1. Download Cassandra from the Apache Cassandra site
  2. Create the necessary cassandra folders
      • sudo mkdir /var/lib/cassandra
        sudo mkdir /var/log/cassandra
        sudo mkdir /var/lib/cassandra/data
        sudo mkdir /var/lib/cassandra/saved_caches
        sudo mkdir /var/lib/cassandra/commitlog
      • set the permissions
        sudo chown -R ec2-user:ec2-user /var/log/cassandra/
        sudo chown -R ec2-user:ec2-user /var/lib/cassandra/
    • cd apache-cassandra-2.0.4/conf
    • vi cassandra.yaml or nano cassandra.yaml
    • change listen_address and rpc_address to your private address
    • change the seeds and broadcast_address to your public IP address (don't forget to remove the # from the beginning of the broadcast_address line)
    • change the snitch to EC2Snitch
    • exit from the file
    • run cassandra: bin/cassandra -f