SOLR Cloud 7.4 cluster configuration with an external Zookeeper ensemble, using the SolrJ API to access the data
SOLR is one of the most popular and highly scalable search engines, built on distributed indexing technology. Solr indexes can be built on top of data from almost any kind of source: CSV files, XML files, data extracted from an RDBMS, or a standard file system.
For any web application built on an RDBMS backend, a search over a table with millions of rows, or a query that joins multiple tables, can take a long time to return. Such slow backend calls make the website feel extremely sluggish, and SOLR indexing can be a useful solution in these cases. SOLR stores data as inverted-index documents containing multiple fields, each with a name and a value. A single instance of SOLR is usually sufficient for small to medium sized databases. For large databases, where queries run over billions of rows, a distributed indexing solution is needed in which the indexes are spread across multiple shards and clusters. SOLR cloud is designed for this purpose. But managing SOLR cloud's nodes, shards, and replicas is a huge task that cannot be done manually. Pairing SOLR cloud with an external Zookeeper ensemble helps manage it by routing queries to the right Solr instance, along with other benefits such as load balancing and fault tolerance.
However, setting up a SOLR cloud cluster with an external Zookeeper ensemble is quite complex and might appear to be a daunting task for developers. In this article, we are going to walk through the SOLR cloud setup and its integration with a Zookeeper ensemble in simple steps, along with the necessary code snippets and screenshots. We are going to create multiple shards of SOLR and operate them through Zookeeper. Later, the setup is tested through a Spring Boot microservice using SolrJ APIs. SolrJ is an API that lets Java applications communicate with SOLR and execute queries. I have used Java 8 as the JDK and Eclipse as the IDE in the example shown below.
1. Zookeeper setup
Here are the step-by-step instructions to set up the Zookeeper ensemble:
- Download the latest Zookeeper from the URL https://zookeeper.apache.org/releases.html
- Copy the downloaded folder to the dev location for Solr and Zookeeper. In my case I have uploaded it to my dev server at the path
/opt/user_projects/poc/solrpoc
- Once Zookeeper is downloaded, navigate to the conf folder under the installation path. In this article we are creating 3 instances of Zookeeper on the same server; in the real world, these 3 instances would run on 3 different servers.
- Navigate to
/opt/user_projects/poc/solrpoc/zookeeper-3.4.12/conf/
, and add 3 conf files (zoo.conf, zoo2.conf, zoo3.conf).
- In each conf file, update the dataDir location as
dataDir=/opt/user_projects/poc/tmp/1
(use a different sequence number in each of the three conf files).
- In each conf file, enter the server and port information of the 3 Zookeeper instances like the following:
server.1=YourServerName:2888:3888
server.2=YourServerName:2889:3889
server.3=YourServerName:2890:3890
- Create 3 folders at the locations referenced by the dataDir property in the conf files above (
/opt/user_projects/poc/tmp/1, /opt/user_projects/poc/tmp/2, /opt/user_projects/poc/tmp/3
).
- In each of those folders, create a new file named 'myid' and enter the sequence number (1, 2, or 3) matching the folder name.
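The dataDir and myid preparation above can be scripted. A minimal sketch; BASE should point at the parent of the dataDir locations from the conf files (/opt/user_projects/poc/tmp in this article), and defaults to a scratch directory here so it is safe to dry-run:

```shell
#!/bin/sh
# Create the three dataDir folders and their myid files.
BASE=${BASE:-$(mktemp -d)}
for i in 1 2 3; do
  mkdir -p "$BASE/$i"           # matches dataDir=/opt/user_projects/poc/tmp/<i>
  echo "$i" > "$BASE/$i/myid"   # myid holds this instance's sequence number
done
echo "myid files created under $BASE"
```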
- With that the Zookeeper configuration is done.
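Assembled from the settings above, the first instance's conf file might look like the following sketch. The tickTime/initLimit/syncLimit values are the usual Zookeeper defaults, and the clientPort values (8997, 8998, 8999 for the three instances) are an assumption inferred from the -z list used in the Solr start script later in this article:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/user_projects/poc/tmp/1
clientPort=8997
server.1=YourServerName:2888:3888
server.2=YourServerName:2889:3889
server.3=YourServerName:2890:3890
```

zoo2.conf and zoo3.conf would differ only in dataDir (/opt/user_projects/poc/tmp/2 and /3) and clientPort (8998 and 8999).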
2. SOLR Cloud setup
Now let’s start the Solr cloud configuration.
- Download the latest Solr from the URL http://lucene.apache.org/solr/downloads.html
- Navigate to the server directory under the Solr installation folder and create 4 Solr folders in it. In my case, it is
/opt/user_projects/poc/solrpoc/solr-7.4.0/server: solr, solr2, solr3, solr4
as shown in the image below.
- Each of the Solr folders created above should contain a solr.xml file, in which the port has to be assigned as shown below.
${jetty.port:8993}
- Each of these folders should also contain a configsets directory, which should include data_driven_schema_configs if you want to index data from a database.
- After modifying the ports, the Solr setup is pretty much ready.
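For reference, the port is set in each solr.xml through the hostPort entry of the solrcloud section. A minimal sketch based on the default Solr 7.x solr.xml, shown here with the first instance's port:

```xml
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <!-- each of the four folders gets its own default port: 8993..8996 -->
    <int name="hostPort">${jetty.port:8993}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```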
3. Start Zookeeper and Solr
- Make sure you have set JAVA_HOME before starting Zookeeper.
- Prepare start and stop scripts for Zookeeper and place them at
/opt/user_projects/poc/solrpoc/zookeeper-3.4.12/startZookeeper.sh
and stopZookeeper.sh
as shown below.
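The Zookeeper start script is not reproduced in the original screenshots; here is a minimal sketch, assuming the zkServer.sh launcher shipped with Zookeeper 3.4.x and the three conf files created earlier (the guard makes the script report a missing installation instead of failing):

```shell
#!/bin/sh
# startZookeeper.sh -- starts the three Zookeeper instances configured above.
ZK_HOME=${ZK_HOME:-/opt/user_projects/poc/solrpoc/zookeeper-3.4.12}
echo "Starting all Zookeeper instances"
for conf in zoo.conf zoo2.conf zoo3.conf; do
  if [ -x "$ZK_HOME/bin/zkServer.sh" ]; then
    # zkServer.sh accepts an explicit config file as its second argument
    "$ZK_HOME/bin/zkServer.sh" start "$ZK_HOME/conf/$conf"
  else
    echo "skipping $conf: zkServer.sh not found under $ZK_HOME"
  fi
done
echo "Started all Zookeeper instances"
```

A stopZookeeper.sh would be the same loop with `stop` in place of `start`.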
- Prepare start and stop scripts for Solr and place them at
/opt/user_projects/poc/solrpoc/solr-7.4.0/startSolr.sh
as shown below. When you execute this script, Solr starts running on the ports specified.

SOLR start script
#!/bin/sh
echo "-----------------------------------"
echo "Starting all Solr Instances"
source /opt/sun_jdk/jdkversion/jdkversion.conf
bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr -p 8993 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt
bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr2 -p 8994 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt
bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr3 -p 8995 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt
bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr4 -p 8996 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt
echo ""
echo "Started all Solr Instances"
echo "---------------------------------"
4. Setting up Collections
- Once SOLR is running, make sure you have copied the database-related jar to dist and declared that dependency in
solrconfig.xml
. In this case we are using ojdbc14.jar
at /opt/user_projects/poc/solrpoc/solr-7.4.0/dist
- To create the first collection, open a terminal, navigate to the Solr bin location, and execute the command below.
./solr create -c UserSearchCloud -d data_driven_schema_configs -n UserSearchCloud -s 2 -p 8993 -rf 2
-s: number of shards; -rf: replication factor; 8993: the port of any one of the four Solr nodes we set up earlier. UserSearchCloud
is the collection name and also the name of the configset that will be created from data_driven_schema_configs. The folder structure for the configs will look like below.
- After executing the create command above, you can go to the Solr Admin UI and see the collection as shown below.
- Once the collection is created, we can run the DataImport as shown below. Click on Execute.
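The DataImport handler reads its SQL from a data-config file referenced in solrconfig.xml. A minimal sketch, assuming an Oracle USERS table whose columns match the fields used later in the SolrJ service; the connection details, table name, and column list are illustrative, not part of the original setup:

```xml
<dataConfig>
  <!-- JDBC connection; ojdbc14.jar must be on the classpath (copied to dist above) -->
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@dbHost:1521:orcl"
              user="dbUser" password="dbPassword"/>
  <document>
    <entity name="user"
            query="SELECT ID, FIRST_NAME, LAST_NAME, WORK_EMAIL, USER_NAME FROM USERS">
      <field column="ID" name="id"/>
      <field column="FIRST_NAME" name="FIRST_NAME"/>
      <field column="LAST_NAME" name="LAST_NAME"/>
      <field column="WORK_EMAIL" name="WORK_EMAIL"/>
      <field column="USER_NAME" name="USER_NAME"/>
    </entity>
  </document>
</dataConfig>
```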
With this the SOLR cloud setup is complete.
5. SPRING BOOT client with SOLRJ
We will now discuss how to test the SOLR cluster and query the data using SolrJ APIs in a Spring Boot based microservice. My GitHub link at the bottom gives the entire project code.
- Create a new spring boot project with the following structure.
- Configure the gradle dependencies to include SOLRJ library. You can look for the file in the full project link provided at the bottom.
- Create a Java class called SolrUtil, in which the Zookeeper connection is made, as given below.
SOLRJ util to connect to Zookeeper
@Service
public class SolrUtil {

    CloudSolrClient solrClient;

    @SuppressWarnings("deprecation")
    public CloudSolrClient createConnection() {
        // Replace SERVERNAME with the server on which Zookeeper is running
        String zkHostString = "SERVERNAME:8997,SERVERNAME:8998,SERVERNAME:8999"; // DEV
        if (solrClient == null) {
            solrClient = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
        }
        return solrClient;
    }

    public SolrDocumentList getSolrResponse(SolrQuery solrQuery, String collection,
            CloudSolrClient solrClient) {
        QueryResponse response = null;
        SolrDocumentList list = null;
        try {
            QueryRequest req = new QueryRequest(solrQuery);
            solrClient.setDefaultCollection(collection);
            response = req.process(solrClient);
            list = response.getResults();
        } catch (Exception e) {
            e.printStackTrace(); // handle errors in this block
        }
        return list;
    }
}
- Now create a SolrSearchService which can query, update, or delete SOLR documents, as shown below.
SOLRJ Service to CRUD Solr documents
@Service
public class SolrSearchService {

    @Autowired
    SolrUtil solrUtil;

    private static final String collection = "UserSearchCloud";

    public ResponseVO search(SearchRequestVO requestVO) {
        CloudSolrClient solrClient = solrUtil.createConnection();
        String query = requestVO.getQuery();
        SolrQuery solrQuery = new SolrQuery();
        solrQuery.setQuery(query);
        solrQuery.setRows(50);
        solrQuery.set("collection", collection);
        solrQuery.set("wt", "json");
        SolrDocumentList documentList = solrUtil.getSolrResponse(solrQuery, collection, solrClient);
        ResponseVO responseVO = new ResponseVO();
        if (documentList != null && documentList.size() > 0) {
            responseVO.setDocumentList(documentList);
            responseVO.setMessage("Success");
        } else {
            responseVO.setMessage("Failure");
            responseVO.setErrorMessage("Records Not Found");
        }
        return responseVO;
    }

    public ResponseVO update(UpdateRequestVO requestVO) {
        CloudSolrClient solrClient = solrUtil.createConnection();
        UpdateResponse response = new UpdateResponse();
        SolrDocument sdoc1 = null;
        String id = requestVO.getId();
        solrClient.setDefaultCollection(collection);
        SolrInputDocument sdoc = new SolrInputDocument();
        try {
            sdoc1 = solrClient.getById(id);
        } catch (SolrServerException e1) {
            e1.printStackTrace();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        if (sdoc1 != null) {
            // Keep the existing value for any field the request does not supply
            sdoc.setField("FIRST_NAME", requestVO.getFirstName() != null ? requestVO.getFirstName() : sdoc1.get("FIRST_NAME"));
            sdoc.setField("WORK_EMAIL", requestVO.getWorkEmail() != null ? requestVO.getWorkEmail() : sdoc1.get("WORK_EMAIL"));
            sdoc.setField("LAST_NAME", requestVO.getLastName() != null ? requestVO.getLastName() : sdoc1.get("LAST_NAME"));
            sdoc.setField("ADDRESS1", requestVO.getAddress1() != null ? requestVO.getAddress1() : sdoc1.get("ADDRESS1"));
            sdoc.setField("ADDRESS2", requestVO.getAddress2() != null ? requestVO.getAddress2() : sdoc1.get("ADDRESS2"));
            sdoc.setField("PHONE1", requestVO.getPhone1() != null ? requestVO.getPhone1() : sdoc1.get("PHONE1"));
            sdoc.setField("JOB_TITLE", requestVO.getJobTitle() != null ? requestVO.getJobTitle() : sdoc1.get("JOB_TITLE"));
            sdoc.setField("COMPANY_NAME", requestVO.getCompanyName() != null ? requestVO.getCompanyName() : sdoc1.get("COMPANY_NAME"));
            sdoc.setField("CITY", requestVO.getCity() != null ? requestVO.getCity() : sdoc1.get("CITY"));
            sdoc.setField("PHONE2", requestVO.getPhone2() != null ? requestVO.getPhone2() : sdoc1.get("PHONE2"));
            sdoc.setField("USER_NAME", requestVO.getUserName() != null ? requestVO.getUserName() : sdoc1.get("USER_NAME"));
            sdoc.setField("id", sdoc1.get("id"));
            sdoc.setField("_version_", "0");
            try {
                solrClient.add(sdoc);
                response = solrClient.commit();
            } catch (SolrServerException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        ResponseVO responseVO = new ResponseVO();
        if (response != null && response.getResponse() != null) {
            responseVO.setMessage("Document Updated");
        } else {
            responseVO.setErrorMessage("Document Not Found");
        }
        return responseVO;
    }

    public ResponseVO delete(DeleteRequestVO requestVO) {
        CloudSolrClient solrClient = solrUtil.createConnection();
        UpdateResponse response = new UpdateResponse();
        try {
            solrClient.setDefaultCollection(collection);
            response = solrClient.deleteById(requestVO.getId());
        } catch (SolrServerException e1) {
            e1.printStackTrace();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        ResponseVO responseVO = new ResponseVO();
        if (response != null) {
            responseVO.setMessage("Document Deleted");
        }
        return responseVO;
    }
}
- Finally, you can test the service from any rest client after starting the Spring boot service.
This completes the entire end-to-end test.
6. Download the Source Code
This was an example of configuring SOLR cloud with a Zookeeper ensemble and accessing it from a Spring Boot based SolrJ project.
You can download the full source code of this project here: SOLRJ-ZOOKEEPER-INTEGRATION