Local installation of standalone HBase and Apache Storm simple cluster
We mainly use Apache Storm for stream processing and Apache HBase as a NoSQL wide-column database.
Even though Apache Cassandra is a great NoSQL database, we mostly prefer HBase because of the Cloudera distribution and because it favors consistency more than Cassandra does (see the CAP theorem).
HBase runs on top of HDFS, but it can easily be installed as a standalone node for testing purposes. You just need to download the latest version, extract the archive, start the standalone node, and then open an HBase shell and play.
$> tar zxvf hbase-1.1.2-bin.tar.gz
$> cd hbase-1.1.2/bin/
$> ./start-hbase.sh
$> ./hbase shell
hbase(main):001:0> create 'DummyTable', 'cf'
hbase(main):001:0> scan 'DummyTable'
When you start HBase in standalone mode, it automatically starts a local Zookeeper node too (listening on the default port 2181).
$> netstat -anp|grep 2181
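A plain grep for 2181 also matches client connections and unrelated ports such as 52181. As a minimal sketch, the helper below (the name `listening_on` is my own, and it assumes the Linux `netstat -an` column layout) succeeds only when a socket is actually in LISTEN state on the given port:

```shell
# Succeeds (exit status 0) when the netstat output read from stdin contains
# a socket in LISTEN state whose local address ends in ":<port>".
# Assumes the Linux `netstat -an` layout: field 4 = local address, field 6 = state.
# $1 = port to look for (hypothetical helper, not part of HBase or Zookeeper)
listening_on() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```

It can be used as: netstat -anp 2>/dev/null | listening_on 2181 && echo "Zookeeper is listening"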
Zookeeper is used by HBase and Storm as a distributed coordination mechanism. Now that you already have a local Zookeeper node running, you are ready to configure and run a local Storm cluster.
- Download latest Storm
- Extract
- Configure “STORM_HOME/conf/storm.yaml” (check below)
- Start local cluster:
$> cd STORM_HOME/bin
$> ./storm nimbus
$> ./storm supervisor
$> ./storm ui
- Logs are located in the “STORM_HOME/logs/” directory
- Check local Storm UI at: localhost:8080
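As far as I can tell, each of the three storm commands above stays in the foreground, so you need either three terminals or background jobs. The following is only a sketch of the second option. The function name, the output directory parameter and the "storm-<daemon>.out" file names are my own choices, not something Storm provides:

```shell
# Start nimbus, supervisor and ui as background jobs.
# $1 = Storm installation directory (STORM_HOME)
# $2 = directory for console output (hypothetical parameter, defaults to /tmp)
start_storm_daemons() {
  storm_home="$1"
  out_dir="${2:-/tmp}"
  for daemon in nimbus supervisor ui; do
    # nohup keeps the daemon running after the terminal is closed;
    # console output goes to one file per daemon.
    nohup "$storm_home/bin/storm" "$daemon" > "$out_dir/storm-$daemon.out" 2>&1 &
  done
}
```

Call it as: start_storm_daemons "$STORM_HOME". The daemons still write their regular logs to the “STORM_HOME/logs/” directory.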
Contents of the new “storm.yaml” configuration file:
storm.zookeeper.servers:
    - "localhost"
nimbus.host: "localhost"
supervisor.slots.ports:
    - 6701
    - 6702
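Each entry under “supervisor.slots.ports” gives the supervisor one Worker slot, so the number of ports listed is the number of Workers this machine can run. If you want to experiment with more slots, a small generator like the one below can help. It is only a sketch: the function name and its two parameters are my own, and it prints exactly the three settings shown above and nothing else.

```shell
# Print a minimal storm.yaml for a single-machine cluster.
# $1 = number of Worker slots (hypothetical parameter)
# $2 = first slot port (hypothetical parameter, defaults to 6701)
storm_yaml() {
  slots="$1"
  first_port="${2:-6701}"
  printf 'storm.zookeeper.servers:\n'
  printf '    - "localhost"\n'
  printf 'nimbus.host: "localhost"\n'
  printf 'supervisor.slots.ports:\n'
  i=0
  while [ "$i" -lt "$slots" ]; do
    # one consecutive port per Worker slot
    printf '    - %d\n' "$((first_port + i))"
    i=$((i + 1))
  done
}
```

Running storm_yaml 2 > "$STORM_HOME/conf/storm.yaml" reproduces the file above with ports 6701 and 6702.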
You can also set the “worker.childopts” parameter to pass JVM options to each Worker (processing node). Here is a simple example for my local JVMs, where I set the min/max heap size and the garbage collection strategy, and enable JMX and GC logs.
worker.childopts: "-server -Xms512m -Xmx2560m -XX:PermSize=128m -XX:MaxPermSize=512m -XX:+UseParallelOldGC -XX:ParallelGCThreads=3 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=1%ID% -XX:+PrintFlagsFinal -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true"
The “worker.childopts” parameter is applied to every Worker JVM. The “%ID%” variable is replaced with the port (6701 or 6702) assigned to each Worker. As you can see, I have used it to give each Worker its own JMX port (16701 or 16702) and its own GC log file.
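To see what a Worker ends up with, you can imitate the substitution by hand. This is not Storm's own code, just a sketch with a helper name of my choosing that replaces every “%ID%” with a given port:

```shell
# Print an options string with every %ID% replaced by the Worker port.
# $1 = value of worker.childopts (or any part of it)
# $2 = Worker port, e.g. 6701
# (hypothetical helper that only mimics the substitution described above)
expand_worker_opts() {
  printf '%s\n' "$1" | sed "s/%ID%/$2/g"
}
```

For example, expand_worker_opts '-Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote.port=1%ID%' 6702 prints “-Xloggc:/tmp/gc-storm-worker-6702.log -Dcom.sun.management.jmxremote.port=16702”.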
We run Storm on JDK 7, but JDK 8 seems to be compatible too. The latest Storm release has switched from Logback to Log4j2 (check full release notes here and here).
Using the above instructions, you will be able to run an HBase and Storm mini cluster on your laptop without any problem.
Reference: Local installation of standalone HBase and Apache Storm simple cluster from our JCG partner Adrianos Dadis at the Java, Integration and the virtues of source blog.