Apache Karaf meets Apache HBase
Introduction
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google’s Bigtable. If you are a regular reader, you most probably already know what Apache Karaf is, but for those who don’t: Apache Karaf is an OSGi runtime that runs on top of any OSGi framework and provides a set of services, a powerful provisioning concept, an extensible shell and more.
Since Apache HBase is not OSGi-ready (yet), people developing OSGi applications often have a hard time understanding how to use HBase inside OSGi.
This post explains how you can build an OSGi application that uses HBase. Please note that this post is not about running parts of HBase inside OSGi; it focuses on how to use the client API inside OSGi. As always, I’ll be focusing on Karaf-based containers, like Apache ServiceMix, Fuse ESB, etc., but most of what this post describes applies to all OSGi runtimes.
HBase and OSGi
Let’s have a closer look at HBase and explain some things about its relation with OSGi.
Bad news
- HBase provides no OSGi metadata, which means that you either need to wrap HBase yourself or find a third-party bundle for it.
- HBase comes as a single jar.
- HBase uses the Hadoop configuration mechanism.
The first point is pretty straightforward. The second might not seem like bad news at first glance, but if you give it some thought you will realize that when everything lives inside a single jar, things are not quite modular. For example, the client API sits in the same jar as the Avro and Thrift interfaces, and even if you don’t need those, they will still be there. So the jar contains code that may be totally useless for your use case.
Please note that the single-jar statement does not refer to dependencies like Hadoop or ZooKeeper.
The fact that HBase depends on the Hadoop configuration-loading mechanism is also bad news, because some versions of Hadoop are a bit fragile when running inside OSGi.
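To see the single-jar problem for yourself, you can list the packages a jar contains; a jar wrapped into a bundle carries all of them. The sketch below is a minimal, standard-library-only utility; the class name JarPackages and its listPackages method are hypothetical, not part of HBase or Karaf.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical utility: lists the Java packages contained in a jar. */
final class JarPackages {

    private JarPackages() {}

    static SortedSet<String> listPackages(String jarPath) throws IOException {
        SortedSet<String> packages = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Only class files define packages; skip resources and directory entries.
                if (!name.endsWith(".class")) {
                    continue;
                }
                int lastSlash = name.lastIndexOf('/');
                if (lastSlash > 0) {
                    // "org/apache/hadoop/hbase/client/HTable.class" -> "org.apache.hadoop.hbase.client"
                    packages.add(name.substring(0, lastSlash).replace('/', '.'));
                }
            }
        }
        return packages;
    }
}
```

Pointed at the HBase jar, you would expect org.apache.hadoop.hbase.client to show up next to the Avro and Thrift packages; the exact package names depend on the HBase version.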
Good news
- There are no class-loading monsters inside HBase, so you won’t really get bitten when trying to use the client API inside OSGi.
The challenges
So there are two challenges. The first is to find or create an HBase bundle whose requirements make sense for your use case. The second is to load the HBase client configuration inside OSGi.
Finding a bundle for HBase
As far as I know, the Apache ServiceMix Bundles project provides bundles for HBase. However, the bundles currently provided require more packages than they actually need (see the second point under bad news). A bundle with more sensible requirements is a work in progress and will hopefully be released pretty soon.
In this post I am going to make use of the Pax URL Wrap protocol. The wrap protocol creates OSGi metadata on the fly for any jar. Moreover, all package imports are marked as optional, so you won’t have to deal with unnecessary requirements. This can get you started, but it’s not recommended for a production environment: you can use it in a proof of concept, but when it’s time to move to production, it is a better idea to use a proper bundle.
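If you are unsure whether a jar already carries OSGi metadata (and therefore whether you need wrap: at all), its manifest will tell you. The sketch below treats the presence of a Bundle-SymbolicName header, which OSGi R4 bundles must declare, as the signal; the class OsgiMetadataCheck is hypothetical and uses only java.util.jar.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: reports whether a jar already declares OSGi bundle headers. */
final class OsgiMetadataCheck {

    private OsgiMetadataCheck() {}

    static boolean isBundle(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // getManifest() returns null when the jar has no META-INF/MANIFEST.MF.
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return false;
            }
            Attributes main = manifest.getMainAttributes();
            // No Bundle-SymbolicName: treat it as a plain jar that needs wrapping.
            return main.getValue("Bundle-SymbolicName") != null;
        }
    }
}
```

For the plain HBase jar, which the post notes ships without OSGi metadata, this should return false, which is why the feature descriptor uses the wrap: prefix for it.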
Creating a Karaf feature descriptor for HBase
After experimenting a bit, I found that I could use HBase inside Karaf by installing the bundles listed in the feature descriptor below:
<feature name="hbase" version="0.90.5" resolver="(obr)" start-level="50">
    <feature>war</feature>
    <bundle dependency="true">mvn:org.apache.servicemix.specs/org.apache.servicemix.specs.jaxws-api-2.2/1.9.0</bundle>
    <bundle dependency="true">mvn:org.apache.servicemix.specs/org.apache.servicemix.specs.saaj-api-1.3/1.9.0</bundle>
    <bundle dependency="true">mvn:org.apache.geronimo.specs/geronimo-jta_1.1_spec/1.1.1</bundle>
    <bundle dependency="true">mvn:javax.mail/mail/1.4.5</bundle>
    <bundle dependency="true">mvn:commons-codec/commons-codec/1.6</bundle>
    <bundle dependency="true">mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.commons-beanutils/1.8.3_1</bundle>
    <bundle dependency="true">mvn:commons-collections/commons-collections/3.2.1</bundle>
    <bundle dependency="true">mvn:commons-digester/commons-digester/2.1</bundle>
    <bundle dependency="true">mvn:commons-jxpath/commons-jxpath/1.3</bundle>
    <bundle dependency="true">mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.jdom/1.1_4</bundle>
    <bundle dependency="true">mvn:commons-lang/commons-lang/2.6</bundle>
    <bundle dependency="true">mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.ant/1.7.0_6</bundle>
    <bundle dependency="true">mvn:commons-configuration/commons-configuration/1.6</bundle>
    <bundle dependency="true">mvn:commons-daemon/commons-daemon/1.0.5</bundle>
    <bundle dependency="true">mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.commons-httpclient/3.1_7</bundle>
    <bundle dependency="true">mvn:org.apache.commons/commons-math/2.2</bundle>
    <bundle dependency="true">mvn:commons-net/commons-net/3.1</bundle>
    <bundle dependency="true">mvn:org.codehaus.jackson/jackson-core-asl/1.9.7</bundle>
    <bundle dependency="true">mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.7</bundle>
    <bundle>mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.jetty/6.1.26_4</bundle>
    <bundle dependency="true">mvn:org.apache.zookeeper/zookeeper/3.3.5</bundle>
    <bundle>mvn:org.apache.servicemix.bundles/org.apache.servicemix.bundles.hadoop-core/1.0.0_2</bundle>
    <bundle>wrap:mvn:org.apache.hbase/hbase/0.90.5</bundle>
</feature>
In fact, this feature descriptor is almost identical to the one provided by the latest release of Apache Camel. One difference is the version of Apache Hadoop: in this example I preferred a slightly older version, which seems to behave a bit better inside OSGi.
Creating HBase client configuration inside OSGi
The things described in this section may vary depending on the version of the Hadoop jar you are using. I’ll try to provide a general solution that covers all cases.
Usually, when configuring the HBase client, you just need to keep an hbase-site.xml on your classpath. Inside OSGi this is not always enough: some versions of Hadoop will manage to pick up this file, others will not. In many cases HBase will complain that there is a mismatch between the current version and the one found inside hbase-default.xml.
A workaround is to set the hbase.defaults.for.version property in hbase-site.xml so that it matches your HBase version (the ${hbase.version} placeholder below must end up as your actual version, e.g. 0.90.5):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.defaults.for.version</name>
        <value>${hbase.version}</value>
    </property>
</configuration>
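To make the version check concrete, the sketch below reads a Hadoop-style configuration file like the one above into a map and compares hbase.defaults.for.version with the version you expect. It is a simplified stand-in written with the JDK’s DOM parser, not HBase’s actual implementation; SiteXmlCheck, readProperties and checkDefaultsVersion are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical, simplified reader for Hadoop-style <configuration> files. */
final class SiteXmlCheck {

    private SiteXmlCheck() {}

    /** Collects <property><name>..</name><value>..</value></property> pairs in file order. */
    static Map<String, String> readProperties(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = textOf(property, "name");
            if (name != null) {
                props.put(name, textOf(property, "value"));
            }
        }
        return props;
    }

    private static String textOf(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }

    /** Fails when the declared defaults version differs from the version actually running. */
    static void checkDefaultsVersion(Map<String, String> props, String runningVersion) {
        String declared = props.get("hbase.defaults.for.version");
        if (!runningVersion.equals(declared)) {
            throw new IllegalStateException("hbase.defaults.for.version is " + declared
                    + " but the running HBase version is " + runningVersion);
        }
    }
}
```

If you expect your build to fill in the version, running a check like this against the hbase-site.xml you ship is a cheap way to catch a placeholder that was left untouched.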
An approach that will save you in most cases is to set the HBase bundle’s class loader as the thread context class loader (TCCL) before creating the configuration object:
Thread.currentThread().setContextClassLoader(HBaseConfiguration.class.getClassLoader());
The reason I propose this is that HBase uses the thread context class loader to load its resources (hbase-default.xml and hbase-site.xml). Setting the TCCL to the HBase bundle’s class loader lets you load the defaults and override them later.
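The effect of the TCCL on resource lookup can be shown with plain JDK class loaders. This sketch only mimics the behavior described above and does not involve HBase: findViaTccl resolves a name through the current thread’s context class loader, so whether hbase-default.xml is found depends entirely on which loader is installed. TcclResourceDemo and its methods are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo: resource lookup through the thread context class loader (TCCL). */
final class TcclResourceDemo {

    private TcclResourceDemo() {}

    /** Resolves a resource name through whatever loader is currently installed as the TCCL. */
    static URL findViaTccl(String resource) {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        return tccl == null ? null : tccl.getResource(resource);
    }

    /** Builds a loader that sees a fresh temporary directory containing the named resource. */
    static ClassLoader loaderWithResource(String resource) throws IOException {
        Path dir = Files.createTempDirectory("tccl-demo");
        Files.write(dir.resolve(resource), "<configuration/>".getBytes(StandardCharsets.UTF_8));
        // A null parent means only the bootstrap loader is consulted before this directory.
        return new URLClassLoader(new URL[] {dir.toUri().toURL()}, null);
    }
}
```

Installing a loader that cannot see the file makes the lookup return null, which mirrors the OSGi situation where the TCCL is not a class loader that can see HBase’s resources.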
The snippet below shows how you can set the TCCL temporarily so that the defaults are loaded directly from the HBase bundle, restoring the original class loader afterwards.
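import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

ClassLoader ocl = Thread.currentThread().getContextClassLoader();
Configuration conf;
try {
    Thread.currentThread().setContextClassLoader(HBaseConfiguration.class.getClassLoader());
    conf = HBaseConfiguration.create();
} finally {
    // Restore the original TCCL, even if creating the configuration fails.
    Thread.currentThread().setContextClassLoader(ocl);
}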
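If you need this swap-and-restore pattern in more than one place, a small helper keeps the try/finally in one spot. This is a sketch; the class ContextClassLoaders and its callWith method are hypothetical names, not an existing Karaf or HBase API, and because it uses java.util.function.Supplier it needs Java 8 or later.

```java
import java.util.function.Supplier;

/** Hypothetical helper that runs code with a temporary thread context class loader (TCCL). */
final class ContextClassLoaders {

    private ContextClassLoaders() {}

    /**
     * Installs {@code loader} as the TCCL, runs {@code action}, and always restores
     * the previous TCCL, even when the action throws.
     */
    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

With it, the configuration can be created in one line: Configuration conf = ContextClassLoaders.callWith(HBaseConfiguration.class.getClassLoader(), HBaseConfiguration::create);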
Note that with this approach you don’t need to include hbase-site.xml in your bundle, because the configuration is loaded through the HBase bundle’s class loader rather than your own. Instead, you set the client configuration programmatically (for example, conf.set("hbase.zookeeper.quorum", "localhost")).
Also note that in some cases HBase internal classes will recreate the configuration, which can cause issues if HBase can’t find the right class loader.
Thoughts
HBase is no different from almost any library that doesn’t provide out-of-the-box support for OSGi: if you understand the basics of class loading, you can get it to work. And understanding class loaders will pay off sooner or later, whether you are using OSGi or not.
Over the next couple of weeks, I intend to take HBase for a ride on the back of the camel, using the brand-new camel-hbase component inside OSGi, so stay tuned.
Edit: This post has been edited; the original contained a snippet that I found is best avoided (sharing the HBase configuration as an OSGi service).