Tuesday, August 12, 2014

SSTable Format in Cassandra

Cassandra creates a new SSTable when the data of a column family in Memtable is flushed to disk. SSTable stands for Sorted Strings Table a concept borrowed from Google BigTable which stores a set of immutable row fragments in sorted order based on row keys. SSTable files of a column family are stored in its respective column family directory.

The data in a SSTable is organized in six types of component files.  The format of a SSTable component file is
<keyspace>-<column family>-[tmp marker]-<version>-<generation>-<component>.db

<keyspace> and <column family> fields represent the Keyspace and column family of the SSTable, <version> is an alphabetic string which represents SSTable storage format version,<generation> is an index number which is incremented every time a new SSTable is created for a column family and <component> represents the type of information stored in the file. The optional"tmp" marker in the file name indicates that the file is still being created. The six SSTable components are Data, Index, Filter, Summary, CompressionInfo and Statistics.

For example, I created a column family data in Keyspace usertable using cassandra-cli and inserted 1000 rows {user0, user1,...user999} with Cassandra version 1.2.5.

create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use usertable;
create column family data with comparator=UTF8Type;

The SSTables under cassandra/data/usertable/data directory: 

usertable-data-ic-1-CompressionInfo.db
usertable-data-ic-1-Data.db
usertable-data-ic-1-Filter.db
usertable-data-ic-1-Index.db
usertable-data-ic-1-Statistics.db
usertable-data-ic-1-Summary.db
usertable-data-ic-1-TOC.txt

In the above SSTable listing, the SSTable storage format version is ic, generation number is 1. The usertable-data-ic-1-TOC.txt contains the list of components for the SSTable.

Data file stores the base data of SSTable which contains the set of rows and their columns. For each row, it stores the row key, data size, column names bloom filter, columns index, row level tombstone information, column count, and the list of columns.  The columns are stored in sorted order by their names. Filter file stores the row keys bloom filter.

Index file contains the SSTable Index which maps row keys to their respective offsets in the Data file. Row keys are stored in sorted order and each row key is associated with an index entry which includes the position in the Data file where its data is stored. New versions of SSTable (version "ia" and above), promoted additional row level information from Data file to the index entry to improve performance for wide rows. A row's columns index, and its tombstone information are also included in its index entry. SSTable version "ic" also stores column names bloom filter in the index entry.



Summary file contains the index summary and index boundaries of the SSTable index. The index summary is calculated from SSTable index. It samples row indexes that are index_interval (Default index_interval is 128) apart with their respective positions in the index file. Index boundaries include the start and end row keys in the SSTable index.



CompressionInfo file stores compression metadata information that includes uncompressed data length, chuck size, and a list of the chunk offsets. Statistics file contains metadata for a SSTable. The metadata includes histograms for estimated row size and estimated column count. It also includes the partitioner used for distributing the key, the ratio of compressed data to uncompressed data and the list of SSTable generation numbers from which this SSTable is compacted. If a SSTable is created from Memtable flush then  the list of ancestor generation numbers will be empty.

All SSTable storage format versions and their respective Cassandra versions are described inhttps://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/Descriptor.java and the different components are described inhttps://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/Component.java

No comments:

Post a Comment