Beginner’s Guide To Hazelcast Part 1

Daryl Mathison, October 10th, 2014 (Last Updated: October 9th, 2014)


Introduction

I am going to be doing a series on Hazelcast. I learned about this product from Twitter: they decided to follow me, and after some research into what they do, I decided to follow them back. I tweeted that Hazelcast would be a great backbone for a distributed password cracker. This got some interest, and I decided to go make one. A vice president of Hazelcast started corresponding with me, and we decided that while a cracker was a good project, the community (and I) would benefit from a series of posts for beginners. I have been getting a lot of good information from the book preview The Book of Hazelcast, found on www.hazelcast.com.

What is Hazelcast?

Hazelcast is a distributed, in-memory database. There are projects all over the world using Hazelcast. The code is open source under the Apache License 2.0.

Features

There are a lot of features already built into Hazelcast. Here are some of them:

Auto discovery of nodes on a network
High availability
In-memory backups
The ability to cache data
Distributed thread pools
- Distributed Executor Service
The ability to partition data across nodes
The ability to persist data asynchronously or synchronously
Transactions
SSL support
Structures to store data:
- IList
- IMap
- MultiMap
- ISet
Structures for communication among different processes
- IQueue
- ITopic
Atomic Operations
- IAtomicLong
Id Generation
- IdGenerator
Locking
- ISemaphore
- ICondition
- ILock
- ICountDownLatch
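To give a feel for a few of the structures listed above, here is a minimal sketch that starts a single member and touches an IAtomicLong, an ILock and an IQueue. It assumes the Hazelcast 3.x jars are on the classpath (in Hazelcast 4.x the atomic and lock structures moved to the CP subsystem, so these exact calls would differ), and the structure names are made up for the example.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;
import com.hazelcast.core.ILock;
import com.hazelcast.core.IQueue;

public class FeatureTour {

    // Starts one member, uses three distributed structures, and returns the counter value.
    public static long run() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            // IAtomicLong: a counter whose value is shared by every member of the cluster.
            IAtomicLong counter = hz.getAtomicLong("hypothetical-counter");

            // ILock: a java.util.concurrent.locks.Lock that is held cluster-wide.
            ILock lock = hz.getLock("hypothetical-lock");
            lock.lock();
            try {
                counter.incrementAndGet();
            } finally {
                lock.unlock();
            }
            counter.addAndGet(2);

            // IQueue: a BlockingQueue that any member can offer to or poll from.
            IQueue<String> queue = hz.getQueue("hypothetical-jobs");
            queue.offer("job-1");
            System.out.println("polled " + queue.poll());

            return counter.get();
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run());
    }
}
```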

Working with Hazelcast

Playing around with Hazelcast and reading about it has taught me to make these assumptions:

The data will be stored as an array of bytes. (This one is not really an assumption; I got it directly from the book.)
The data will go over the network.
The data is remote.
If the data is not in memory, it doesn’t exist.

Let me explain these assumptions:

The data will be stored as an array of bytes

I got this information from The Book of Hazelcast, so it is not really an assumption. It is important because not only is the data stored that way, so is the key. This makes life very interesting if one uses something other than a primitive or a String as a key: Hazelcast hashes and compares keys by their serialized bytes rather than by calling the key class’s methods, so the developer of hashCode() and equals() must think of the key as an array of bytes instead of as a class.
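The sketch below uses only the JDK (no Hazelcast) to show how byte-level key comparison can bite. CustomerKey is a hypothetical key class whose equals() and hashCode() ignore the note field, yet Java serialization still writes that field, so two keys that are "equal" as objects turn into different byte arrays. Hazelcast’s serializer is not identical to a bare ObjectOutputStream, so treat this as an illustration of the principle rather than of Hazelcast’s exact byte format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

public class KeyBytesDemo {

    // Hypothetical key: equality looks only at id, but serialization also writes note.
    static class CustomerKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String note;

        CustomerKey(String id, String note) {
            this.id = id;
            this.note = note;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CustomerKey && ((CustomerKey) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    // Turns an object into bytes with standard Java serialization.
    static byte[] toBytes(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        CustomerKey a = new CustomerKey("42", "first visit");
        CustomerKey b = new CustomerKey("42", "second visit");
        System.out.println("equals():   " + a.equals(b));
        System.out.println("same bytes: " + Arrays.equals(toBytes(a), toBytes(b)));
    }
}
```

Run as is, it prints true for equals() and false for the byte comparison, which is exactly the situation in which a Hazelcast map would keep the two keys as separate entries.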

The data will go over the network

This is a distributed database, so parts of the data will be stored on other nodes. Backups and caching also move data between nodes. There are techniques and settings to reduce the amount of data transferred over the network, but if one wants high availability, backups must be made.
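As a minimal sketch of the settings involved, the configuration below (Hazelcast 3.x Config API, built in code instead of XML) gives map "a" one synchronous backup and no asynchronous ones, and leaves readBackupData off so reads always go to the primary copy. The counts are arbitrary choices for illustration, not recommendations.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.Map;

public class BackupConfigSketch {

    // Map "a" keeps one synchronous backup copy on another member.
    public static Config buildConfig() {
        MapConfig mapConfig = new MapConfig("a");
        mapConfig.setBackupCount(1);        // put() waits until this backup is written
        mapConfig.setAsyncBackupCount(0);   // no extra fire-and-forget backups
        mapConfig.setReadBackupData(false); // reads go to the primary copy, possibly over the network

        Config config = new Config();
        config.addMapConfig(mapConfig);
        return config;
    }

    public static void main(String[] args) {
        Config config = buildConfig();
        HazelcastInstance first = Hazelcast.newHazelcastInstance(config);
        HazelcastInstance second = Hazelcast.newHazelcastInstance(config);

        Map<Long, String> map = first.getMap("a");
        map.put(1L, "stuff 1");
        // With both members joined, the primary copy lives on one and the backup on the other.
        System.out.println("read via second member: " + second.getMap("a").get(1L));

        Hazelcast.shutdownAll();
    }
}
```

Switching setReadBackupData to true is one of the network-saving settings: a member that holds a backup can answer reads locally, at the cost of possibly returning a slightly stale value.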

The data is remote

This is a distributed database, so parts of the data will be stored on other nodes. I include this assumption not to resign myself to the fact that the data is remote, but to motivate designs that make sure operations are performed where most of the data is located. With careful design, remote access can be kept to a minimum.
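One way to perform an operation where the data lives is to send the work to the member that owns the key. The sketch below assumes Hazelcast 3.x and uses IExecutorService.submitToKeyOwner; ValueLength is a hypothetical task that reads one entry on the owning member and ships back only its length instead of the whole value. The task is Serializable because it travels over the network, and it implements HazelcastInstanceAware so that the member running it can hand it a HazelcastInstance.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.IMap;
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

public class RunNearData {

    // Hypothetical task: runs on the key's owner and returns only the value's length.
    static class ValueLength implements Callable<Integer>, Serializable, HazelcastInstanceAware {
        private static final long serialVersionUID = 1L;
        private final Long key;
        private transient HazelcastInstance hz; // injected on the member that executes the task

        ValueLength(Long key) {
            this.key = key;
        }

        @Override
        public void setHazelcastInstance(HazelcastInstance hz) {
            this.hz = hz;
        }

        @Override
        public Integer call() {
            IMap<Long, String> map = hz.getMap("a");
            String value = map.get(key); // normally a local read: this member owns the key
            return value == null ? 0 : value.length();
        }
    }

    public static int run() {
        HazelcastInstance first = Hazelcast.newHazelcastInstance(); // a second member, so the key may live elsewhere
        HazelcastInstance second = Hazelcast.newHazelcastInstance();
        try {
            IMap<Long, String> map = second.getMap("a");
            map.put(7L, "stuff 7");
            IExecutorService executor = second.getExecutorService("hypothetical-executor");
            // Hazelcast routes the task to whichever member owns key 7L.
            return executor.submitToKeyOwner(new ValueLength(7L), 7L).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            Hazelcast.shutdownAll();
        }
    }

    public static void main(String[] args) {
        System.out.println("length computed on the owner: " + run());
    }
}
```

For short strings the saving is trivial, but moving a small task to large data, rather than the other way round, is what keeps network traffic down as values grow.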

If the data is not in memory, it doesn’t exist

Do not forget that this is an in-memory database. If data doesn’t get loaded into memory, the database will not know that it is stored somewhere else. Hazelcast doesn’t persist data in order to bring it up later; it persists data because the data is important. Unlike a conventional database such as MySQL, it will not bring data back from disk on its own once it is out of memory.
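Persistence in Hazelcast is opt-in: a map only reaches external storage through a MapStore that the developer plugs in. Below is a minimal write-through sketch, assuming Hazelcast 3.x; FakeDbStore and FAKE_DB are hypothetical stand-ins for a real database, and MapStoreAdapter is used so that only store() and load() need overriding. load() is the hook Hazelcast calls on an in-memory miss, so whatever it does not return simply does not exist as far as the map is concerned.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.core.MapStoreAdapter;
import java.util.concurrent.ConcurrentHashMap;

public class PersistSketch {

    // Hypothetical stand-in for a database table; a real store would use JDBC, JPA, etc.
    static final ConcurrentHashMap<Long, String> FAKE_DB = new ConcurrentHashMap<>();

    // Only store() and load() are overridden; MapStoreAdapter supplies no-op versions of the rest.
    static class FakeDbStore extends MapStoreAdapter<Long, String> {
        @Override
        public void store(Long key, String value) {
            FAKE_DB.put(key, value);
        }

        @Override
        public String load(Long key) {
            return FAKE_DB.get(key); // null means "no such entry"
        }
    }

    public static String run() {
        Config config = new Config();
        config.getMapConfig("persisted").setMapStoreConfig(new MapStoreConfig()
                .setEnabled(true)
                .setImplementation(new FakeDbStore())
                .setWriteDelaySeconds(0)); // 0 = write-through; a positive value = write-behind

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        try {
            IMap<Long, String> map = hz.getMap("persisted");
            map.set(1L, "stuff 1"); // with write-through, store() runs before set() returns
            return FAKE_DB.get(1L);
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("persisted value: " + run());
    }
}
```

set() is used instead of put() because put() must return the previous value, which can make Hazelcast call load() first.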

Data Storage

Java developers will be happy to know that all of Hazelcast’s data storage containers except MultiMap extend interfaces from the Java Collections Framework. For example, an IList follows the same method contracts as java.util.List. Here is a list of the different data storage types:

IList – This keeps objects in the order they were added.
IQueue – This follows BlockingQueue and can be used as an alternative to a JMS message queue. It can be persisted via a QueueStore.
IMap – This extends ConcurrentMap. It can also be persisted by a MapStore, and it has a number of other features that I will talk about in another post.
ISet – This keeps a set of unique elements where order is not guaranteed.
MultiMap – This does not follow a typical map contract, as there can be multiple values per key.
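Because these containers follow the java.util contracts, each one can be held through its standard interface. This minimal sketch, assuming Hazelcast 3.x, starts one member and uses every structure in the list above; only MultiMap has to be referenced by its Hazelcast type. The structure names are arbitrary.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MultiMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentMap;

public class StructuresTour {

    // Returns how many values the MultiMap holds for "color".
    public static int run() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            List<String> list = hz.getList("demo-list");                // IList
            BlockingQueue<String> queue = hz.getQueue("demo-queue");    // IQueue
            ConcurrentMap<String, Integer> map = hz.getMap("demo-map"); // IMap
            Set<String> set = hz.getSet("demo-set");                    // ISet
            MultiMap<String, String> multi = hz.getMultiMap("demo-multimap"); // not a java.util.Map

            list.add("b");
            list.add("a");                     // insertion order is kept: [b, a]
            queue.offer("job");
            map.putIfAbsent("count", 1);
            set.add("x");
            set.add("x");                      // duplicate is ignored
            multi.put("color", "red");
            multi.put("color", "blue");        // second value under the same key

            System.out.println(list + " " + queue.peek() + " " + map.get("count")
                    + " " + set.size() + " " + multi.get("color"));
            return multi.valueCount("color");
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("values for color: " + run());
    }
}
```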

Example

Setup

For all the features that Hazelcast contains, the initial setup steps are really easy.

Download the Hazelcast zip file from www.hazelcast.org and extract its contents.
Add the jar files found in the lib directory to one’s classpath.
Create a file named hazelcast.xml and put the following into it:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config
           http://www.hazelcast.com/schema/config/hazelcast-config-3.0.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <network>
        <join><multicast enabled="true"/></join>
    </network>

    <map name="a"></map>
</hazelcast>

Hazelcast looks in a few places for a configuration file:

The path defined by the system property hazelcast.config (a value starting with classpath: is looked up on the classpath)
hazelcast.xml in the working directory
hazelcast.xml on the classpath
If all else fails, hazelcast-default.xml, which is inside hazelcast.jar, is loaded.
If one does not want to deal with a configuration file at all, the configuration can be done programmatically.

The configuration example above enables multicast so that nodes can discover and join one another. It also defines the IMap “a”.

A Warning About Configuration

Hazelcast does not copy configurations to each node, so if one wants to share a data structure, it must be defined identically on every node.
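Since each member must carry identical definitions, one approach is to build the configuration once in code and start every member from that same object. The sketch below, assuming the Hazelcast 3.x Config API, mirrors the hazelcast.xml shown earlier (multicast joining plus a map named "a"). When members run in separate JVMs, the same effect comes from giving each of them the same hazelcast.xml or the same config-building code.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ProgrammaticConfig {

    // Mirrors the XML: multicast joining plus a map named "a".
    public static Config sharedConfig() {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(true);
        config.addMapConfig(new MapConfig("a"));
        return config;
    }

    public static void main(String[] args) {
        Config config = sharedConfig();
        // Both members start from the same Config object, so "a" is defined identically on each.
        HazelcastInstance first = Hazelcast.newHazelcastInstance(config);
        HazelcastInstance second = Hazelcast.newHazelcastInstance(config);
        System.out.println("members seen by first: " + first.getCluster().getMembers().size());
        System.out.println("members seen by second: " + second.getCluster().getMembers().size());
        Hazelcast.shutdownAll();
    }
}
```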

Code

This code brings up two nodes, places values in instance’s IMap using an IdGenerator to generate the keys, and then reads the data back through instance2.

package hazelcastsimpleapp;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IdGenerator;
import java.util.Map;

/**
 *
 * @author Daryl
 */
public class HazelcastSimpleApp {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // Start two members in this JVM; they join each other using the configured multicast settings.
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        HazelcastInstance instance2 = Hazelcast.newHazelcastInstance();

        // Write ten entries through the first member, using a cluster-wide IdGenerator for unique keys.
        Map<Long, String> map = instance.getMap("a");
        IdGenerator gen = instance.getIdGenerator("gen");
        for (int i = 0; i < 10; i++) {
            map.put(gen.newId(), "stuff " + i);
        }

        // Read the same map back through the second member.
        Map<Long, String> map2 = instance2.getMap("a");
        for (Map.Entry<Long, String> entry : map2.entrySet()) {
            System.out.printf("entry: %d; %s%n", entry.getKey(), entry.getValue());
        }

        // Shut down both members so the JVM can exit.
        Hazelcast.shutdownAll();
    }

}

Amazingly simple, isn’t it? Notice that I didn’t even use the IMap interface when I retrieved an instance of the map; I just used the java.util.Map interface. This doesn’t expose Hazelcast’s distributed features, but for this example it works fine.
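Holding the map as an IMap instead of a java.util.Map exposes the distributed extras. This minimal sketch, assuming Hazelcast 3.x, shows three of them: a time-to-live on put, a cluster-wide per-key lock, and localKeySet(), which lists the keys whose primary copy is stored on the current member.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.Set;
import java.util.concurrent.TimeUnit;

public class IMapFeatures {

    public static String run() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            IMap<Long, String> map = hz.getMap("a");

            // Time-to-live: this put asks for the entry to expire after 60 seconds.
            map.put(1L, "stuff 1", 60, TimeUnit.SECONDS);

            // Per-key lock held across the whole cluster, not just inside this JVM.
            map.lock(1L);
            try {
                map.put(1L, map.get(1L) + " (edited)");
            } finally {
                map.unlock(1L);
            }

            // Keys whose primary copy is stored on this member.
            Set<Long> ownedHere = map.localKeySet();
            System.out.println("keys owned here: " + ownedHere);

            return map.get(1L);
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```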

One can observe the assumptions at work here. The first assumption is that information is stored as an array of bytes: notice that the keys (Long values from the IdGenerator) and the values (Strings) are serializable. This matters because serialization is how they are turned into bytes for storage. The second and third assumptions hold true because the data is accessed through the instance2 node. The fourth assumption holds true because every value that was put into the “a” map was displayed when read. The complete example can be checked out with Subversion from http://darylmathisonblog.googlecode.com/svn/trunk/HazelcastSimpleApp. The project was made using NetBeans 8.0.

Conclusion

This post gave a quick overview of Hazelcast’s numerous features, with a simple example showing IMap and IdGenerator. It also discussed a list of assumptions that apply when developing in a distributed, in-memory database environment.