ElasticSearch Tutorial for Beginners
1. Introduction
In this example, we shall demonstrate how to use Elasticsearch, a distributed free-text search and analytics engine based on Apache Lucene, through a simple Maven-based Java client.
We will be using the latest version of Elasticsearch at the time of writing, ES v6.1.2. For this example we use the following technologies:
- Maven 3
- Java 8
- Elasticsearch 6.1.2
Elasticsearch is well known for exposing its functionality through RESTful APIs. This means that we interact with the database over HTTP, using methods such as GET, POST, PUT and DELETE. It is a highly scalable, distributed database built on top of Apache Lucene. Some more features of Elasticsearch are:
- With the total dependency size of only around 300 KB, Elasticsearch is very lightweight
- Elasticsearch is focused on query performance: the operations performed against the database are highly optimised and scalable
- It is a highly fault-tolerant system. If a single Elasticsearch node in a cluster dies, the master node quickly detects the failure and routes incoming requests to the remaining nodes
- Elasticsearch’s speciality lies in indexable text data which can be searched on the basis of tokens and filters
Although Elasticsearch is a great candidate for distributed free-text search and analytics, it might not be the best-suited database for some other operations, such as:
- Counting operations like total and average
- Executing transactional queries with rollbacks
- Enforcing uniqueness of records across a combination of fields
In short, Elasticsearch is a use-case-specific database, but an excellent one within its own domain.
2. Prerequisites
Java must be installed on your computer before you proceed, because Maven is a Java tool. You can download Java here.
Once you have Java installed on your system, install Maven. You can download Maven from here.
Finally, you need to install Elasticsearch. You can download it from here and follow the steps for your OS. Note that we will be using v6.1.2 for this lesson; other versions might not work exactly the same way. You can verify that ES is running by opening this URL in your browser:
http://localhost:9200
You should get a response like:
{
  "name": "wKUxRAO",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "gvBXz7xsS5W4zlZuiADelw",
  "version": {
    "number": "6.1.2",
    "build_hash": "5b1fea5",
    "build_date": "2018-01-10T02:35:59.208Z",
    "build_snapshot": false,
    "lucene_version": "7.1.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
Note that elasticsearch (the cluster_name value above) is the default cluster name in Elasticsearch.
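If you prefer to run the same check from Java instead of the browser, here is a minimal sketch that uses only the JDK's HttpURLConnection, so it also works on Java 8. The class EsPing and its methods isUp and fetch are hypothetical names chosen for illustration, and the 2-second timeout is an arbitrary choice; the check simply issues a GET on the root URL and treats HTTP 200 as "the node is up".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether an Elasticsearch node answers on its HTTP port.
final class EsPing {

    /** Returns true if a GET on the URL answers with HTTP 200 within the timeout. */
    static boolean isUp(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Malformed URL, connection refused, timeout...: treat as "not up"
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Reads the whole response body, e.g. the JSON banner shown above. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://localhost:9200";
        if (isUp(url, 2000)) {
            System.out.println(fetch(url));
        } else {
            System.out.println("No Elasticsearch node answering at " + url);
        }
    }
}
```

Running it with a local node started should print the same JSON you saw in the browser.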
3. Project Setup
We will be using one of the many Maven archetypes to create a sample project for our example. To create the project, execute the following command in a directory that you will use as your workspace:
mvn archetype:generate -DgroupId=com.javacodegeeks.example -DartifactId=jcg-elasticsearch-example -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
If you are running Maven for the first time, the generate command may take a while to complete, because Maven has to download all the plugins and artifacts that the generation task requires.
You will now have a new directory with the same name as the artifactId inside the chosen directory. Feel free to open the project in your favourite IDE.
4. Maven Dependencies
To start with, we need to add the appropriate Maven dependencies to our project. We will add the following to our pom.xml file:
pom.xml
<properties>
    <elasticsearch.version>6.1.2</elasticsearch.version>
    <jackson.version>2.9.4</jackson.version>
    <java.version>1.8</java.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>${jackson.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>${jackson.version}</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Find the latest Elasticsearch dependency here.
Note that we use Jackson only as a general-purpose JSON library, to convert between our Java objects and the maps and JSON strings exchanged with Elasticsearch.
5. Making Database Queries
Now, we’re ready to start building our project and add more components to it.
5.1 Making a Model
We will start by adding a very simple model to our project, a Person. Its definition is very standard:
Person.java
public class Person {

    private String personId;
    private String name;

    // standard getters and setters

    @Override
    public String toString() {
        return String.format("Person{personId='%s', name='%s'}", personId, name);
    }
}
We omitted the standard getters and setters for brevity, but they are required, because Jackson uses them when serialising and deserialising the object.
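For completeness, here is one way to write the Person class with its accessors spelled out. It is a plain-Java sketch, not necessarily the author's exact source: Jackson needs the public no-argument constructor and the getter/setter pairs to map the personId and name fields. In the project it would be declared as public class Person in its own Person.java file.

```java
// Complete Person model with the accessors Jackson relies on.
// (Declare it as `public class Person` in its own Person.java file.)
class Person {

    private String personId;
    private String name;

    // Jackson instantiates the class through this no-argument constructor
    public Person() {
    }

    public String getPersonId() {
        return personId;
    }

    public void setPersonId(String personId) {
        this.personId = personId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return String.format("Person{personId='%s', name='%s'}", personId, name);
    }
}
```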
5.2 Defining Connection parameters
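We will use the default connection parameters to connect to Elasticsearch. By default, a node serves its REST (HTTP) API on port 9200; port 9300 is its transport port, used for communication between nodes. Our configuration lists two HTTP ports, 9200 and 9201.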
Connection Parameters
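// Imports needed by these fields (placed at the top of the source file)
import java.io.IOException;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import com.fasterxml.jackson.databind.ObjectMapper;

// The config parameters for the connection
private static final String HOST = "localhost";
private static final int PORT_ONE = 9200;  // default HTTP port of a node
private static final int PORT_TWO = 9201;  // HTTP port a second local node would typically take
private static final String SCHEME = "http";

private static RestHighLevelClient restHighLevelClient;
private static final ObjectMapper objectMapper = new ObjectMapper();

// Where the Person documents are stored
private static final String INDEX = "persondata";
private static final String TYPE = "person";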
Apart from the connection parameters, we also defined the index and type that identify where our Person data is stored.
Both ports above are HTTP ports of the REST API. 9200 is the default HTTP port of a node, the one we opened in the browser earlier. 9201 is the port that a second node started on the same machine would typically bind to, because each node takes the next free port from its HTTP port range. When given several hosts, the REST client spreads requests across them and fails over to another host if one is unreachable.
5.3 Making a connection
We will write a method that establishes the connection to the Elasticsearch database. We pass both hosts to the client builder so that requests can be spread across them; with a single local node, the host on port 9200 alone is enough, and the client keeps using it when 9201 is unreachable. Here is the code to make a connection:
Singleton method for getting Connection Object
/**
 * Implemented Singleton pattern here
 * so that there is just one connection at a time.
 * @return RestHighLevelClient
 */
private static synchronized RestHighLevelClient makeConnection() {

    if (restHighLevelClient == null) {
        restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost(HOST, PORT_ONE, SCHEME),
                        new HttpHost(HOST, PORT_TWO, SCHEME)));
    }

    return restHighLevelClient;
}
Note that we have implemented the Singleton design pattern here, so that multiple clients aren’t created for ES, which saves memory and connections.
RestHighLevelClient is thread-safe, so a single instance can be shared across the application. A good time to initialise it is at application startup, or lazily when the first request is made, as makeConnection() does. Once initialised, the client can be used to call any of the supported APIs.
5.4 Closing a connection
In older versions of Elasticsearch, we used TransportClient and closed it once we were done with our queries. RestHighLevelClient likewise needs to be closed once the database interaction is complete, so that it releases its underlying resources. Here is how this can be done:
Close Connection
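private static synchronized void closeConnection() throws IOException {
    // Nothing to do if no connection was made (or it is already closed)
    if (restHighLevelClient == null) {
        return;
    }
    restHighLevelClient.close();
    // Reset so that the next makeConnection() builds a fresh client
    restHighLevelClient = null;
}
We assign null to restHighLevelClient after closing it so that the Singleton stays consistent: the next call to makeConnection() creates a fresh client.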
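The open/close pair above can be sketched generically, without the Elasticsearch classes, to show the lifecycle it implements. LazyResource is a hypothetical name; its Supplier stands in for the new RestHighLevelClient(...) call, and the resource only has to be AutoCloseable (RestHighLevelClient implements Closeable, which extends AutoCloseable).

```java
import java.util.function.Supplier;

// Hypothetical generic holder mirroring makeConnection()/closeConnection():
// one shared instance, created lazily, closed explicitly, re-creatable afterwards.
final class LazyResource<T extends AutoCloseable> {

    private final Supplier<T> factory;
    private T instance;  // guarded by "this"

    LazyResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Like makeConnection(): create on first use, then reuse the same instance. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Like closeConnection(): close once and reset, so a later get() starts fresh. */
    synchronized void close() throws Exception {
        if (instance != null) {
            try {
                instance.close();
            } finally {
                instance = null;  // reset even if close() throws
            }
        }
    }

    synchronized boolean isOpen() {
        return instance != null;
    }
}
```

Because both methods are synchronized on the same holder, two threads can never create two clients, and calling close() twice is harmless. For a short-lived program like this one, a try-with-resources block around a single client is a simpler alternative, since the client is Closeable.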
5.5 Inserting Data
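We can insert data into the database by putting the document’s keys and values into a HashMap and passing it as the source of an IndexRequest. A Map is only one of the accepted formats: IndexRequest.source() also accepts, for example, a JSON string together with its XContentType, which is the approach the update example later in this lesson uses. Let’s see the code snippet for the Map version: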
POST Query
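// Imports for this method (placed at the top of the source file)
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;

private static Person insertPerson(Person person) {
    // Generate the document ID ourselves instead of letting Elasticsearch do it
    person.setPersonId(UUID.randomUUID().toString());

    // The document source: field name -> value
    Map<String, Object> dataMap = new HashMap<String, Object>();
    dataMap.put("personId", person.getPersonId());
    dataMap.put("name", person.getName());

    IndexRequest indexRequest = new IndexRequest(INDEX, TYPE, person.getPersonId())
            .source(dataMap);
    try {
        restHighLevelClient.index(indexRequest);
    } catch (ElasticsearchException e) {
        // Error reported by the Elasticsearch server
        System.err.println(e.getDetailedMessage());
    } catch (java.io.IOException ex) {
        // Communication problem with the server
        System.err.println(ex.getLocalizedMessage());
    }
    return person;
}
Above, we used Java’s UUID class to generate the document’s unique identifier ourselves, instead of letting Elasticsearch generate one. This way, we control how the identifiers of our objects are made.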
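Generating the ID on the client also lets you choose the strategy. The sketch below contrasts the random UUID used above with a name-based UUID from the same java.util.UUID class: the name-based variant always maps the same input to the same ID, so indexing the same record again overwrites the existing document instead of creating a duplicate. The class DocumentIds and the idea of hashing a natural key (here a hypothetical email address) are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Two hypothetical ways to produce a document ID on the client side.
final class DocumentIds {

    /** What insertPerson() does: a fresh random (version 4) UUID on every call. */
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    /**
     * Deterministic alternative: a name-based (version 3, MD5) UUID derived from a
     * natural key, so the same key always yields the same document ID.
     * Which key to use is application-specific.
     */
    static String idForKey(String naturalKey) {
        return UUID.nameUUIDFromBytes(naturalKey.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        System.out.println(randomId());                       // differs on each run
        System.out.println(idForKey("shubham@example.com"));  // identical on each run
    }
}
```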
5.6 Making a GET request
Once we are done with inserting data into the Database, we can confirm the operation by making a GET request to the Elasticsearch Database server. Let’s see the code snippet on how this can be done:
GET Query
// Imports for this method (placed at the top of the source file)
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;

private static Person getPersonById(String id) {
    GetRequest getPersonRequest = new GetRequest(INDEX, TYPE, id);
    GetResponse getResponse = null;
    try {
        getResponse = restHighLevelClient.get(getPersonRequest);
    } catch (java.io.IOException e) {
        System.err.println(e.getLocalizedMessage());
    }
    // A missing document is reported through isExists(), not as an exception
    return (getResponse != null && getResponse.isExists())
            ? objectMapper.convertValue(getResponse.getSourceAsMap(), Person.class)
            : null;
}
In this query, we only provide the information that identifies the document, i.e. its index, type and unique identifier. What we get back is the document’s source as a Map of values, as this expression shows:
Getting Map
getResponse.getSourceAsMap()
Jackson’s ObjectMapper then converts this Map into a POJO that is easy to use in the rest of our program. This way, we don’t have to read each key from the Map by hand, which would be tedious when we can simply work with a POJO.
5.7 Updating Data
We can make an update request to Elasticsearch easily by first identifying the resource with its index, type and unique identifier. Then we pass a partial document containing the fields to change; here, it is the whole Person serialised to JSON. Here is an example code snippet:
PUT Query
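// Imports for this method (placed at the top of the source file)
import com.fasterxml.jackson.core.JsonProcessingException;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.common.xcontent.XContentType;

private static Person updatePersonById(String id, Person person) {
    UpdateRequest updateRequest = new UpdateRequest(INDEX, TYPE, id)
            .fetchSource(true);    // Fetch Object after its update
    try {
        String personJson = objectMapper.writeValueAsString(person);
        updateRequest.doc(personJson, XContentType.JSON);
        UpdateResponse updateResponse = restHighLevelClient.update(updateRequest);
        return objectMapper.convertValue(updateResponse.getGetResult().sourceAsMap(), Person.class);
    } catch (JsonProcessingException e) {
        // The Person could not be serialised to JSON
        System.err.println(e.getMessage());
    } catch (java.io.IOException e) {
        // Communication problem with the server
        System.err.println(e.getLocalizedMessage());
    }
    System.out.println("Unable to update person");
    return null;
}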
Notice what we did above in the following statement:
PUT Query
updateRequest.doc(personJson, XContentType.JSON);
Here, we didn’t pass any specific property of the object that needs to be updated. Instead, we passed the complete object as JSON, which Elasticsearch merges into the stored document: every field present in that JSON overwrites the stored value, while fields that exist only in the stored document are kept.
We also handle possible errors in the catch blocks. In a real-world application, you will want to handle these errors gracefully and log them properly.
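To make the merge behaviour concrete, the following plain-Java sketch models how a partial document is combined with the stored source: keys in the partial document overwrite stored values, inner objects are merged key by key, and keys missing from the partial document are kept. It is a simplified model written for illustration (PartialDocMerge and merge are hypothetical names), not Elasticsearch’s own code, and it ignores details such as detecting no-op updates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a partial-document update (the "doc" of an UpdateRequest).
final class PartialDocMerge {

    /** Returns a new top-level map: stored, with partial merged into it. Inputs are not modified. */
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> e : partial.entrySet()) {
            Object current = result.get(e.getKey());
            Object update = e.getValue();
            if (current instanceof Map && update instanceof Map) {
                // Inner objects are merged recursively rather than replaced wholesale
                @SuppressWarnings("unchecked")
                Map<String, Object> merged =
                        merge((Map<String, Object>) current, (Map<String, Object>) update);
                result.put(e.getKey(), merged);
            } else {
                // Plain values (and arrays) are simply replaced
                result.put(e.getKey(), update);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("personId", "p-1");
        stored.put("name", "Shubham");
        stored.put("city", "Delhi");  // hypothetical field that Person does not have

        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("personId", "p-1");
        partial.put("name", "Shubham Aggarwal");

        System.out.println(merge(stored, partial));
        // prints {personId=p-1, name=Shubham Aggarwal, city=Delhi}
    }
}
```

In the tutorial’s terms: sending the whole Person updates personId and name, but a field that only the stored document has (city in this sketch) survives the update.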
5.8 Deleting Data
Finally, we can delete data by simply identifying the resource with its Index, Type and unique identifier. Let’s see the code snippet on how this can be done:
DELETE Query
// Import for this method (placed at the top of the source file)
import org.elasticsearch.action.delete.DeleteRequest;

private static void deletePersonById(String id) {
    DeleteRequest deleteRequest = new DeleteRequest(INDEX, TYPE, id);
    try {
        restHighLevelClient.delete(deleteRequest);
    } catch (java.io.IOException e) {
        System.err.println(e.getLocalizedMessage());
    }
}
Again, the DELETE request only needs the index, type and unique identifier of the document.
5.9 Running the application
Let’s try our application by performing all the operations we mentioned above. As this is a plain Java application, we will call each of these methods and print the operation results:
main() method
public static void main(String[] args) throws IOException {

    makeConnection();

    System.out.println("Inserting a new Person with name Shubham...");
    Person person = new Person();
    person.setName("Shubham");
    person = insertPerson(person);
    System.out.println("Person inserted --> " + person);

    System.out.println("Changing name to `Shubham Aggarwal`...");
    person.setName("Shubham Aggarwal");
    // The returned Person is ignored, so the next line prints the local object
    updatePersonById(person.getPersonId(), person);
    System.out.println("Person updated --> " + person);

    System.out.println("Getting Shubham...");
    Person personFromDB = getPersonById(person.getPersonId());
    System.out.println("Person from DB --> " + personFromDB);

    System.out.println("Deleting Shubham...");
    // Same ID as personFromDB, but safe even if the GET returned null
    deletePersonById(person.getPersonId());
    System.out.println("Person Deleted");

    closeConnection();
}
Once we run this application with the code, we will get the following output:
Program Output
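Inserting a new Person with name Shubham...
Person inserted --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham'}
Changing name to `Shubham Aggarwal`...
Unable to update person
Person updated --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham Aggarwal'}
Getting Shubham...
Person from DB --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham Aggarwal'}
Deleting Shubham...
Person Deleted
Of course, the generated IDs will differ on each run. Note the "Unable to update person" line: updatePersonById() prints it only after catching an exception, yet the subsequent GET returns the new name, so the stored document itself was updated. The "Person updated" line appears regardless, because main() prints its local Person object rather than the method’s return value. Finally, note that we closed the connection once we were done with the queries; this releases the resources held by the client, such as its HTTP connections.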
6. Conclusion
In this lesson, we studied how to use Elasticsearch from a plain Java client built on the Elasticsearch REST client. Whether the REST client suits a real-world application is worth exploring with a larger, more scalable example; it is a choice to make when you start architecting an application.
Explore much more about Elasticsearch in our Elasticsearch course.
7. Download the Complete Source Code
This was a tutorial on the Elasticsearch REST client and queries with Java, where we interacted with the Elasticsearch database via RESTful operations.
You can download the full source code of this example here: Elasticsearch Example
Code is not working, I’m getting the error Exception in thread “main” java.lang.Error: Unresolved compilation problem:
The method makeConnection() from the type Application refers to the missing type RestHighLevelClient
at com.javacodegeeks.example.Application.main(Application.java:117)
I’m using Elasticsearch version 6.5.1 in my pom.xml.
Please can you help with version 6.5.1, which is different from version 6.1.2?
Hi MOHD AFEEF, is there a particular reason you are using version 6.5.1? The thing is, Elasticsearch version updates are usually major, and even minor version updates can change the methods used to access the ES API. This lesson specifically covers version 6.1.2.
I have installed the new version, i.e. 6.5.1. With the imports from your version,
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
Eclipse is throwing the error
The import org.elasticsearch.ElasticsearchException cannot be resolved
I have tested and found
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
we cannot import both packages in 6.5.1
https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/6.5.1/jar
When Maven compiles with this pom, Application.java throws an error on the RestHighLevelClient import.
HI Shubham,
I made the code run, but I get an error:
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property ‘org.apache.logging.log4j.simplelog.StatusLogger.level’ to TRACE to show Log4j2 internal initialization logging.
Concerning index creation, is there a way to provide mappings and settings?
Thanks in advance,
Hello, I’m getting a warning on this line:
getResponse = restHighLevelClient.get(getPersonRequest);
//The method get(GetRequest, Header…) from the type RestHighLevelClient is deprecated
Please can you give a solution?
Thank you for pointing out objectMapper.convertValue; that is a much easier way to map the object than doing it by hand.
Hi,
I want to use Spring Data version 3.0.5 with Elasticsearch version 6.x. Can I use it?
Thank you.
Hello Shubham,
Very good article on performing CRUD operations on Elasticsearch.
Here I am trying to get a Person by name. Could you provide the API to get the list of persons who have the same name?
The following query works to fetch the data from ES:
GET /persondata/person/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "xyz" } }
      ]
    }
  }
}
How can I add conditions in GetRequest?
For example, if I want to search a person by name,
If Elasticsearch is hosted somewhere else and I need to access it, what should I do? Where should I set up the cluster name? I am not using Spring, just plain JSP and Servlet + Maven, and I want to use Elasticsearch.
What configuration needs to be set?
Please help with this.
and thank you so much for this example. :)
It is hard to find a lot of good information on how to code with elasticsearch. This is great tutorial, thank you very much for your help on the topic.
One minor comment : Line 14 for the method ‘updatePersonById’ should say ‘System.out.println(“Updated person successfully”);’