Apache Cassandra – Part 05 (Data Types)

In the last post we talked about the Cassandra Query Language, CQL. There we met a few data types while working with column families, but we didn’t go into them in depth. So let’s learn about Cassandra data types in this post.

Cassandra data types fall into five main categories:

  • Native Type
  • Collection Type
  • User Defined Type
  • Tuple Type
  • Custom Type

Let’s have a close look at each one of them.

NATIVE TYPE

Native data types are the simple data types built into Cassandra. We already used a few of them in the last post, such as text and int. Here is a list of native data types in Cassandra.

[Image: table of the native data types in Cassandra]

Blob type

The Cassandra blob data type stores arbitrary bytes. In CQL a blob value is written as a hexadecimal constant of the form 0[xX](hex)+, where hex is a hexadecimal character such as [0-9a-fA-F]. For example, 0xcafe. The maximum theoretical size for a blob is 2 GB. The practical limit on blob size, however, is less than 1 MB, and ideally even smaller. A blob is suitable for storing a small image or a short string.

CREATE TABLE bios (
    user_name varchar PRIMARY KEY,
    bio blob
);

INSERT INTO bios (user_name, bio) VALUES ('fred', bigintAsBlob(3));

SELECT * FROM bios;

user_name | bio
----------+--------------------
fred      | 0x0000000000000003
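As a minimal sketch of the other ways to write and read blobs, the statements below reuse the bios table from above. The user names 'wilma' and 'barney' are made up for illustration, and the conversion functions follow the typeAsBlob / blobAsType naming that bigintAsBlob belongs to.

```cql
-- Store a blob directly as a hexadecimal constant.
INSERT INTO bios (user_name, bio) VALUES ('wilma', 0xcafe);

-- Store a short string by converting it to a blob first.
INSERT INTO bios (user_name, bio) VALUES ('barney', textAsBlob('hello'));

-- Convert blobs back to typed values when reading.
SELECT user_name, blobAsBigint(bio) FROM bios WHERE user_name = 'fred';
SELECT user_name, blobAsText(bio) FROM bios WHERE user_name = 'barney';
```

The conversion back only makes sense when you know what was stored in the blob, so each SELECT is restricted to the row that was written with the matching function.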

Counter type

A counter column value is a 64-bit signed integer. You cannot set the value of a counter directly; a counter supports only two operations: increment and decrement. Do not assign this type to a column that serves as the primary key or partition key. Also, do not use the counter type in a table that contains anything other than counter types and the primary key. To generate sequential numbers for surrogate keys, use the timeuuid type instead of the counter type. You cannot create an index on a counter column or set data in a counter column to expire using the Time-To-Live (TTL) property.

CREATE TABLE counterks.page_view_counts (
    counter_value counter,
    url_name varchar,
    page_name varchar,
    PRIMARY KEY (url_name, page_name)
);

UPDATE counterks.page_view_counts
SET counter_value = counter_value + 1
WHERE url_name = 'www.datastax.com' AND page_name = 'home';

SELECT * FROM counterks.page_view_counts;

url_name         | page_name | counter_value
-----------------+-----------+---------------
www.datastax.com | home      | 1

UPDATE counterks.page_view_counts
SET counter_value = counter_value + 2
WHERE url_name = 'www.datastax.com' AND page_name = 'home';

SELECT * FROM counterks.page_view_counts;

url_name         | page_name | counter_value
-----------------+-----------+---------------
www.datastax.com | home      | 3
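The restrictions described above can be sketched against the same page_view_counts table. The decrement is a normal statement; the two statements that Cassandra is expected to reject are left commented out so the block can be pasted into cqlsh as is.

```cql
-- Decrementing works the same way as incrementing.
UPDATE counterks.page_view_counts
SET counter_value = counter_value - 1
WHERE url_name = 'www.datastax.com' AND page_name = 'home';

-- Expected to be rejected: a counter cannot be set with INSERT.
-- INSERT INTO counterks.page_view_counts (url_name, page_name, counter_value)
-- VALUES ('www.datastax.com', 'home', 5);

-- Expected to be rejected: counters do not support TTL.
-- UPDATE counterks.page_view_counts USING TTL 60
-- SET counter_value = counter_value + 1
-- WHERE url_name = 'www.datastax.com' AND page_name = 'home';
```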

UUID and timeuuid types

The UUID (universally unique identifier) type is used to avoid collisions in column names. Alternatively, you can use the timeuuid type.

A value of the timeuuid type is a Version 1 UUID. A Version 1 UUID includes the time of its generation, and timeuuid values are sorted by timestamp, making them ideal for applications that require conflict-free timestamps. For example, you can use this type to identify a column (such as a blog entry) by its timestamp and allow multiple clients to write to the same partition key simultaneously. Because every value is unique, collisions that would overwrite data unintentionally cannot occur.

A valid timeuuid literal uses the standard UUID format: 32 hexadecimal digits in five hyphen-separated groups of 8, 4, 4, 4, and 12 digits.
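Here is a minimal sketch of the blog entry idea, assuming a hypothetical blog_entries table (the table and column names are made up for illustration). The now() function generates a new timeuuid on the coordinator node, so two writers never produce the same entry_id.

```cql
-- Hypothetical table: entries of one blog share a partition
-- and are ordered by the time embedded in entry_id, newest first.
CREATE TABLE blog_entries (
    blog_id int,
    entry_id timeuuid,
    title text,
    PRIMARY KEY (blog_id, entry_id)
) WITH CLUSTERING ORDER BY (entry_id DESC);

-- now() returns a fresh Version 1 UUID for each statement.
INSERT INTO blog_entries (blog_id, entry_id, title) VALUES (1, now(), 'First post');
INSERT INTO blog_entries (blog_id, entry_id, title) VALUES (1, now(), 'Second post');

SELECT entry_id, title FROM blog_entries WHERE blog_id = 1;
```

On Cassandra 2.2 and later, toTimestamp(entry_id) extracts the embedded time as a timestamp; older releases use dateOf(entry_id) for the same purpose.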

Timestamp type

Values for the timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT. A timestamp type can be entered as an integer for CQL input, or as a string literal in any of the following ISO 8601 formats:

yyyy-mm-dd HH:mm
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd HH:mmZ
yyyy-mm-dd HH:mm:ssZ
yyyy-mm-dd'T'HH:mm
yyyy-mm-dd'T'HH:mmZ
yyyy-mm-dd'T'HH:mm:ss
yyyy-mm-dd'T'HH:mm:ssZ
yyyy-mm-dd
yyyy-mm-ddZ
where Z is the RFC-822 4-digit time zone, expressing the time zone’s difference from UTC. For example, for the date and time of Feb 3, 2011, at 04:05:00 AM, GMT:

2011-02-03 04:05+0000
2011-02-03 04:05:00+0000
2011-02-03T04:05+0000
2011-02-03T04:05:00+0000
If no time zone is specified, the time zone of the Cassandra coordinator node handling the write request is used. For accuracy, DataStax recommends specifying the time zone rather than relying on the time zone configured on the Cassandra nodes.
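The two input forms can be sketched like this, assuming a hypothetical events table (the table and column names are made up for illustration). Both rows store the same instant: 1296705900000 is the number of milliseconds from the epoch to 2011-02-03 04:05:00 GMT.

```cql
-- Hypothetical table for illustration.
CREATE TABLE events (
    id int PRIMARY KEY,
    created_at timestamp
);

-- String literal with an explicit time zone offset.
INSERT INTO events (id, created_at) VALUES (1, '2011-02-03 04:05:00+0000');

-- The same instant written as milliseconds since the epoch.
INSERT INTO events (id, created_at) VALUES (2, 1296705900000);

SELECT * FROM events;
```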

COLLECTION TYPE

A collection column is declared using the collection type, followed by another type, such as int or text, in angle brackets. For example, you can create a table having a list of textual elements, a list of integers, or a list of some other element types.

MAP

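  • First, let’s add a new column with the map data type to the table. A map takes exactly two type parameters: one for the keys and one for the values.

map<key-data-type,value-data-type>

  • Let’s insert data into the map.

{'key-1':'value-1','key-2':'value-2',…}

  • Let’s retrieve the data with an ordinary SELECT on the map column.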
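The map steps above can be sketched as follows. The users table, its columns and the phone numbers are hypothetical and exist only for this illustration.

```cql
-- Hypothetical table for illustration.
CREATE TABLE users (
    user_id int PRIMARY KEY,
    name text
);

-- Add a map column: text keys mapped to text values.
ALTER TABLE users ADD phones map<text, text>;

-- Insert a map literal.
INSERT INTO users (user_id, name, phones)
VALUES (1, 'fred', {'home': '555-0100', 'work': '555-0101'});

-- Retrieve the map.
SELECT user_id, phones FROM users;

-- Add or overwrite a single entry by its key.
UPDATE users SET phones['mobile'] = '555-0102' WHERE user_id = 1;
```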

SET

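  • Let’s add a new column with the set data type.

set<native-data-type>

  • Insert data

{'value-1','value-2',…}

  • Retrieve data with an ordinary SELECT on the set column.

  • Update data by adding elements to the existing set

column-name = column-name + {'value-3','value-4',…}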
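A minimal sketch of the set steps, assuming a hypothetical articles table (the names and values are made up for illustration):

```cql
-- Hypothetical table for illustration.
CREATE TABLE articles (
    article_id int PRIMARY KEY,
    title text
);

-- Add a set column holding text elements.
ALTER TABLE articles ADD tags set<text>;

-- Insert a set literal.
INSERT INTO articles (article_id, title, tags)
VALUES (1, 'Data types', {'cassandra', 'cql'});

-- Retrieve the set.
SELECT article_id, tags FROM articles;

-- Add elements to the existing set.
UPDATE articles SET tags = tags + {'nosql', 'database'} WHERE article_id = 1;

-- Remove an element from the set.
UPDATE articles SET tags = tags - {'database'} WHERE article_id = 1;
```

A set never holds duplicates, so adding an element that is already present leaves the set unchanged.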

LIST

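  • Add a new column with the list data type to the table

list<native-data-type>

  • Insert data

['value-1','value-2',…]

  • Retrieve data with an ordinary SELECT on the list column.

  • Update data by appending elements to the existing list

column-name = column-name + ['value-3','value-4',…]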
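The list steps can be sketched in the same way, assuming a hypothetical playlists table (the names and values are made up for illustration):

```cql
-- Hypothetical table for illustration.
CREATE TABLE playlists (
    playlist_id int PRIMARY KEY,
    name text
);

-- Add a list column holding text elements.
ALTER TABLE playlists ADD songs list<text>;

-- Insert a list literal; element order is preserved.
INSERT INTO playlists (playlist_id, name, songs)
VALUES (1, 'Favourites', ['song-a', 'song-b']);

-- Retrieve the list.
SELECT playlist_id, songs FROM playlists;

-- Append to the end of the list.
UPDATE playlists SET songs = songs + ['song-c'] WHERE playlist_id = 1;

-- Prepend to the front of the list.
UPDATE playlists SET songs = ['song-0'] + songs WHERE playlist_id = 1;
```

Unlike a set, a list keeps its elements in insertion order and allows duplicates.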

USER DEFINED TYPE

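  • Let’s create a new data type

CREATE TYPE type-name (
field-name-1 data-type-1,
field-name-2 data-type-2,
…
);

  • We can check the created data type in cqlsh with either of these commands

DESCRIBE TYPE type-name

DESCRIBE TYPES

  • Now we can create a new table that has a column of the user defined data type.

  • Let’s insert data. The value is written as field names and values inside braces, and the field names are not quoted.

{field-one:'value-one',field-two:'value-two',…}

  • Retrieve data with an ordinary SELECT.

  • We can also retrieve a single field of the type with the column-name.field-name syntax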
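Put together, the user defined type steps might look like the sketch below. It assumes a keyspace has already been selected with USE, and the address type, the customers table and all values are hypothetical. The column is declared frozen, which Cassandra 2.1 requires for user defined types and later versions still accept.

```cql
-- Hypothetical type for illustration.
CREATE TYPE address (
    street text,
    city text,
    zip_code int
);

-- cqlsh commands to inspect the type.
DESCRIBE TYPE address;
DESCRIBE TYPES;

-- Hypothetical table with a column of the new type.
CREATE TABLE customers (
    customer_id int PRIMARY KEY,
    name text,
    home_address frozen<address>
);

-- Field names in the literal are not quoted.
INSERT INTO customers (customer_id, name, home_address)
VALUES (1, 'fred', {street: '1 Main St', city: 'Colombo', zip_code: 100});

-- Retrieve the whole value.
SELECT * FROM customers;

-- Retrieve a single field of the type.
SELECT home_address.city FROM customers WHERE customer_id = 1;
```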

TUPLE TYPE

Cassandra 2.1 introduces the tuple type that holds fixed-length sets of typed positional fields. You can use a tuple as an alternative to a user-defined type when you don’t need to add new fields. A tuple can accommodate many fields (32768), more than you can prudently use. Typically, you create a tuple having only a few fields.

In the table creation statement, use angle brackets and a comma delimiter to declare the tuple component types. Surround tuple values in parentheses to insert the values into a table, as shown in this example.

CREATE TABLE collect_things (
    k int PRIMARY KEY,
    v frozen<tuple<int, text, float>>
);

INSERT INTO collect_things (k, v) VALUES (0, (3, 'bar', 2.1));

SELECT * FROM collect_things;

k | v
--+-----------------
0 | (3, 'bar', 2.1)
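Because a tuple is a single frozen value, changing one field means writing the whole tuple again. A minimal sketch against the collect_things table above, with made-up replacement values:

```cql
-- Replace the entire tuple; individual fields cannot be updated on their own.
UPDATE collect_things SET v = (4, 'baz', 3.5) WHERE k = 0;

SELECT * FROM collect_things WHERE k = 0;
```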

CUSTOM TYPE

I’m not going to talk about this data type in detail here. The official Cassandra documentation has the following note about it.

Custom types exist mostly for backward compatibility purposes and their usage is discouraged. Their usage is complex, not user friendly and the other provided types, particularly user-defined types, should almost always be enough.

I hope you now have a clear idea about Cassandra data types. I hope to see you soon with another interesting topic. Thank you!
