Posts

Showing posts from October, 2012

Patch pig_cassandra to set a TTL on Cassandra data

Apache Pig provides a platform for analyzing very large data sets, and it makes it easy to analyze data stored in Cassandra. Pig compiles its instructions into sequences of MapReduce programs that run on a Hadoop cluster. The Cassandra source provides a simple script for running Pig against Cassandra data, along with a CassandraStorage class that loads data from and stores data to Cassandra. This class has no built-in support for storing data with a TTL (time to live). In many cases you need to write a few columns or rows with a TTL so that they are deleted from the database automatically later. To cover that case, I patched the CassandraStorage class and added this functionality. Here is the patch:

Index: src/main/java/ru/atc/smev/cassandra/storage/CassandraStorage.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/main/java/ru/atc/smev/cassandra/storage/CassandraStorage.ja
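To make the TTL behavior concrete, here is a minimal sketch of what "store with a TTL, delete automatically later" means. It is not the patch and does not use any Cassandra or Pig classes: `TtlSketch`, `TtlColumn`, `put`, and `get` are hypothetical names invented for illustration, and a TTL of 0 meaning "never expires" is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class TtlSketch {
    /** Hypothetical stand-in for a stored column; not a Cassandra class. */
    public static final class TtlColumn {
        final String value;
        final long writtenAtMillis;
        final int ttlSeconds; // assumption: 0 means "never expires"

        TtlColumn(String value, long writtenAtMillis, int ttlSeconds) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
            this.ttlSeconds = ttlSeconds;
        }

        /** A column is expired once ttlSeconds have passed since the write. */
        boolean isExpired(long nowMillis) {
            return ttlSeconds > 0
                    && nowMillis >= writtenAtMillis + ttlSeconds * 1000L;
        }
    }

    private final Map<String, TtlColumn> columns = new HashMap<>();

    /** Store a column together with its write time and TTL. */
    public void put(String name, String value, long nowMillis, int ttlSeconds) {
        columns.put(name, new TtlColumn(value, nowMillis, ttlSeconds));
    }

    /** Expired columns read as missing, just like deleted ones. */
    public String get(String name, long nowMillis) {
        TtlColumn column = columns.get(name);
        if (column == null || column.isExpired(nowMillis)) {
            return null;
        }
        return column.value;
    }

    public static void main(String[] args) {
        TtlSketch row = new TtlSketch();
        row.put("session", "abc", 0L, 60);                // written at t=0, TTL 60 s
        System.out.println(row.get("session", 59_000L));  // prints "abc"
        System.out.println(row.get("session", 60_000L));  // prints "null"
    }
}
```

The point of the sketch is that the writer only has to attach a TTL at write time; no separate cleanup job is needed to make the data disappear.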

Another Cassandra data manipulation API - PlayOrm

Recently I found an interesting project on GitHub named PlayOrm, whose features really impressed me, so I decided to play with it. Let's first check out its feature list:

- Just added: an entity can hold a Cursor instead of a List; the Cursor is read lazily to prevent running out of memory on VERY wide rows
- PlayOrm queries use far fewer resources from the Cassandra cluster than CQL queries
- Scalable JQL (SJQL) is supported, a modified JQL that scales (SQL doesn't scale well)
- Partitioning, so you can query a one-trillion-row table in just milliseconds with SJQL (Scalable Java Query Language)
- Typical query support for <=, <, >, >= and =, with no limitations
- Typical query support for AND and OR as well as parentheses
- Inner join support (you must keep your very large tables partitioned to get very fast access times)
- Left outer join support
- Queries return a database cursor
- OneToMany, ManyToMany, OneToOne, and ManyToOne, but the ToMany relationships are done in NoSQL fashion, not like an RDBMS
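The first feature, a lazily read cursor in place of a fully loaded list, can be illustrated with a small sketch. This is not PlayOrm's API: `LazyCursorSketch`, `PageSource`, and `BatchCursor` are hypothetical names, and fetching by offset and limit is an assumption made to keep the example self-contained. The idea is only that a batch is fetched when the caller actually iterates into it, so a very wide row never has to sit in memory all at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyCursorSketch {
    /** Hypothetical data source: returns up to `limit` items starting at `offset`. */
    public interface PageSource<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Iterator that holds only one batch in memory at a time. */
    public static final class BatchCursor<T> implements Iterator<T> {
        private final PageSource<T> source;
        private final int batchSize;
        private List<T> batch = new ArrayList<>();
        private int posInBatch = 0;
        private int nextOffset = 0;
        private boolean exhausted = false;

        public BatchCursor(PageSource<T> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            if (posInBatch < batch.size()) {
                return true;
            }
            if (exhausted) {
                return false;
            }
            // Current batch is used up: fetch the next one on demand.
            batch = source.fetch(nextOffset, batchSize);
            nextOffset += batch.size();
            posInBatch = 0;
            if (batch.size() < batchSize) {
                exhausted = true; // a short batch means the source has no more data
            }
            return !batch.isEmpty();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return batch.get(posInBatch++);
        }
    }

    public static void main(String[] args) {
        List<Integer> wideRow = Arrays.asList(1, 2, 3, 4, 5);
        PageSource<Integer> source = (offset, limit) -> wideRow.subList(
                Math.min(offset, wideRow.size()),
                Math.min(offset + limit, wideRow.size()));

        BatchCursor<Integer> cursor = new BatchCursor<Integer>(source, 2);
        int sum = 0;
        while (cursor.hasNext()) {
            sum += cursor.next();
        }
        System.out.println(sum); // prints 15
    }
}
```

With a batch size of 2, the five values above are read in three fetches, and no fetch happens until the first call to `hasNext()`.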