Converting XML to JSON & Raw Use in MongoDB & Spring Batch
Overview
Why convert XML to JSON for raw use in MongoDB?
MongoDB stores records as JSON-like documents (BSON internally), much as a relational database stores records in tables and rows, so we naturally need to convert our XML to JSON before storing it.
Some applications need to store the JSON raw (unmodified) because they cannot know in advance how the data will be structured.
There are hundreds of XML-based standards. An application that processes XML files following different standards cannot rely on a single, fixed structure.
Why use Spring Batch?
Spring Batch provides reusable functions that are essential for processing large volumes of records, along with other features that enable high-volume, high-performance batch jobs. The Spring website documents Spring Batch well.
For another tutorial on Spring Batch, see my previous post on Processing CSVs with Spring Batch.
0 – Converting XML to JSON For Use In MongoDB With Spring Batch Example Application
The example application converts an XML document that is a “policy” for configuring a music playlist. This policy is intended to resemble real cyber security configuration documents. It is a short document but illustrates how you will search complex XML documents.
The approach we take in this tutorial is meant for handling XML files of varying style. We want to be able to handle the unexpected. This is why we are keeping the data “raw.”
1 – Project Structure
It is a typical Maven structure. We have one package for this example application. The XML file is in src/main/resources.
2 – Project Dependencies
Besides our typical Spring Boot dependencies, we include dependencies for an embedded MongoDB database and for processing JSON.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.michaelcgood</groupId>
    <artifactId>michaelcgood-spring-batch-mongodb</artifactId>
    <version>0.0.1</version>
    <packaging>jar</packaging>

    <name>michaelcgood-spring-batch-mongodb</name>
    <description>Michael C Good - XML to JSON + MongoDB + Spring Batch Example</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.7.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>de.flapdoodle.embed</groupId>
            <artifactId>de.flapdoodle.embed.mongo</artifactId>
            <version>1.50.5</version>
        </dependency>
        <dependency>
            <groupId>cz.jirutka.spring</groupId>
            <artifactId>embedmongo-spring</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20170516</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-mongodb</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
3 – XML Document
This is the example policy document created for this tutorial. Its structure is based on real cyber security policy documents.
- Note that the root element of the document is the Policy tag.
- Important information lies within the Group tag.
- Look at the attribute values on the tags, such as the id on Policy or the date on status.
There’s a lot of information condensed in this small document to consider. For instance, there is also the XML namespace (xmlns). We won’t touch on this in the rest of the tutorial, but depending on your goals it could be something to add logic for.
<?xml version="1.0"?>
<Policy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" style="STY_1.1" id="NRD-1">
  <status date="2017-10-18">draft</status>
  <title xmlns:xhtml="http://www.w3.org/1999/xhtml">Guide to the Configuration of Music Playlist</title>
  <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >This guide presents a catalog of relevant
    configuration settings for a playlist that I listen to while I work on software development.
    <html:br xmlns:html="http://www.w3.org/1999/xhtml"/>
    <html:br xmlns:html="http://www.w3.org/1999/xhtml"/>
    Providing myself with such guidance reminds me how to efficiently
    configure my playlist. Lorem ipsum <html:i xmlns:html="http://www.w3.org/1999/xhtml">Lorem ipsum,</html:i>
    and Lorem ipsum. Some example
    <html:i xmlns:html="http://www.w3.org/1999/xhtml">Lorem ipsum</html:i>, which are Lorem ipsum.
  </description>
  <Group id="remediation_functions">
    <title xmlns:xhtml="http://www.w3.org/1999/xhtml" >Remediation functions used by the SCAP Security Guide Project</title>
    <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >XCCDF form of the various remediation functions as used by
      remediation scripts from the SCAP Security Guide Project</description>
    <Value id="is_the_music_good" prohibitChanges="true" >
      <title xmlns:xhtml="http://www.w3.org/1999/xhtml" >Remediation function to fix bad playlist</title>
      <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >Function to fix bad playlist.
        Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum
      </description>
      <value>
        function fix_bad_playlist {
        # Load function arguments into local variables
        Lorem ipsum Lorem ipsum Lorem ipsum
        # Check sanity of the input
        if [ $# Lorem ipsum ]
        then
        echo "Usage: Lorem ipsum"
        echo "Aborting."
        exit 1
        fi
        }
      </value>
    </Value>
  </Group>
</Policy>
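To make the notes above concrete, here is a minimal sketch that reads the attributes of a trimmed-down policy with the JDK's built-in DOM parser. It is not part of the example application: the class name PolicyAttributes, the helper parseRoot, and the shortened SAMPLE string are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative only: PolicyAttributes and SAMPLE are hypothetical names.
class PolicyAttributes {

    // Hand-shortened copy of the policy, not the full FakePolicy.xml.
    public static final String SAMPLE =
          "<Policy style=\"STY_1.1\" id=\"NRD-1\">"
        + "<status date=\"2017-10-18\">draft</status>"
        + "<Group id=\"remediation_functions\"/>"
        + "</Policy>";

    // Parses an XML string and returns its root element.
    public static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element policy = parseRoot(SAMPLE);
        Element status = (Element) policy.getElementsByTagName("status").item(0);

        System.out.println(policy.getTagName());         // name of the root element
        System.out.println(policy.getAttribute("id"));   // attribute on Policy
        System.out.println(status.getAttribute("date")); // attribute on status
        System.out.println(status.getTextContent());     // text inside status
    }
}
```

The point of the sketch is only that id, style, and date are attributes rather than child elements, which matters later when we address them in queries.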
4 – MongoDB Configuration
Below we configure an embedded MongoDB database. The @Configuration class is picked up by the component scan that the convenience annotation @SpringBootApplication includes, and it exposes mongoTemplate as a bean.
package com.michaelcgood;

import java.io.IOException;

import cz.jirutka.spring.embedmongo.EmbeddedMongoFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.*;

import com.mongodb.MongoClient;

@Configuration
public class MongoConfig {

    private static final String MONGO_DB_URL = "localhost";
    private static final String MONGO_DB_NAME = "embeded_db";

    @Bean
    public MongoTemplate mongoTemplate() throws IOException {
        // start the embedded MongoDB instance bound to localhost
        EmbeddedMongoFactoryBean mongo = new EmbeddedMongoFactoryBean();
        mongo.setBindIp(MONGO_DB_URL);
        MongoClient mongoClient = mongo.getObject();
        MongoTemplate mongoTemplate = new MongoTemplate(mongoClient, MONGO_DB_NAME);
        return mongoTemplate;
    }
}
5 – Processing XML to JSON
step1() of our Spring Batch Job calls three methods to help process the XML to JSON. We will review each individually.
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .tasklet(new Tasklet() {
                @Override
                public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
                    // get path of file in src/main/resources
                    Path xmlDocPath = Paths.get(getFilePath());
                    // process the file to json
                    String json = processXML2JSON(xmlDocPath);
                    // insert json into mongodb
                    insertToMongo(json);
                    return RepeatStatus.FINISHED;
                }
            }).build();
}
5.1 – getFilePath()
This method simply builds the file path as a String. step1() wraps that String in a Path and passes it as a parameter to the method processXML2JSON.
Note:
- ClassLoader is helping us locate the XML file in our resources folder.
// no parameter method for creating the path to our xml file
private String getFilePath() {
    String fileName = "FakePolicy.xml";
    ClassLoader classLoader = getClass().getClassLoader();
    File file = new File(classLoader.getResource(fileName).getFile());
    String xmlFilePath = file.getAbsolutePath();
    return xmlFilePath;
}
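One caveat with URL.getFile() is that it returns the URL-encoded form of the path, so a directory name containing a space comes back with %20 in it. The following is a hedged sketch of an alternative that converts the resource URL through a URI instead. The class ResourcePaths and its two helpers are hypothetical names, not part of the example application, and the approach assumes the resource is a plain file on disk (it is not meant for resources packed inside a jar).

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: ResourcePaths is a hypothetical helper class.
class ResourcePaths {

    // Converts a file: URL to a Path; going through a URI decodes escapes such as %20.
    public static Path toPath(URL url) {
        try {
            return Paths.get(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    // Looks up a classpath resource by name and fails with a clear message if it is missing.
    public static Path resourcePath(String fileName) {
        URL url = ResourcePaths.class.getClassLoader().getResource(fileName);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + fileName);
        }
        return toPath(url);
    }
}
```

With a helper like this, step1() could call ResourcePaths.resourcePath("FakePolicy.xml") and receive a Path directly, and a missing file would produce a descriptive error rather than a NullPointerException.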
5.2 – processXML2JSON(xmlDocPath)
The Path built from the string returned by getFilePath is passed into this method as a parameter. A JSONObject is created from a String holding the contents of the XML file.
// takes a parameter of xml path and returns json as a string
private String processXML2JSON(Path xmlDocPath) throws JSONException {
    String XML_STRING = null;
    try {
        // read the whole XML file into a single String
        XML_STRING = Files.lines(xmlDocPath).collect(Collectors.joining("\n"));
    } catch (IOException e) {
        e.printStackTrace();
    }
    // convert the XML text to a JSON object, then pretty-print it
    JSONObject xmlJSONObj = XML.toJSONObject(XML_STRING);
    String jsonPrettyPrintString = xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR);
    System.out.println("PRINTING STRING :::::::::::::::::::::" + jsonPrettyPrintString);
    return jsonPrettyPrintString;
}
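In the method above, a failed read only prints the stack trace, so XML_STRING is still null when it reaches XML.toJSONObject, and the stream returned by Files.lines is never closed. As a hedged alternative for the file-reading part only, the sketch below reads the file in one call and fails fast. XmlFileReader and readXml are hypothetical names, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: XmlFileReader is a hypothetical helper class.
class XmlFileReader {

    // Reads the whole file as UTF-8 text (assumed encoding).
    // Throws instead of returning null when the file cannot be read.
    public static String readXml(Path xmlDocPath) {
        try {
            return new String(Files.readAllBytes(xmlDocPath), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + xmlDocPath, e);
        }
    }
}
```

The returned String could then be handed to XML.toJSONObject exactly as before. Files.readAllBytes is used rather than a newer convenience method so the sketch stays compatible with the Java 1.8 setting in the POM.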
5.3 – insertToMongo(json)
We parse the JSON string into a MongoDB Document. We then insert this document, with the help of the @Autowired mongoTemplate, into a collection named “foo”.
// inserts to our mongodb
private void insertToMongo(String jsonString) {
    Document doc = Document.parse(jsonString);
    mongoTemplate.insert(doc, "foo");
}
6 – Querying MongoDB
step2() of our Spring Batch Job contains our MongoDB queries.
- mongoTemplate.collectionExists returns a Boolean value based on the existence of the collection.
- mongoTemplate.getCollection("foo").find() returns all the documents within the collection.
- alldocs.toArray() returns an array of DBObjects.
- Then we call three methods that we will review individually below.
public Step step2() {
    return stepBuilderFactory.get("step2")
            .tasklet(new Tasklet() {
                @Override
                public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
                    // all printing out to console removed for post's brevity
                    // checks if our collection exists
                    Boolean doesexist = mongoTemplate.collectionExists("foo");
                    // show all DBObjects in foo collection
                    DBCursor alldocs = mongoTemplate.getCollection("foo").find();
                    List<DBObject> dbarray = alldocs.toArray();
                    // execute the three methods we defined for querying the foo collection
                    String result = doCollect();
                    String resultTwo = doCollectTwo();
                    String resultThree = doCollectThree();
                    return RepeatStatus.FINISHED;
                }
            }).build();
}
6.1 – First query
The goal of this query is to find a document where style="STY_1.1". To accomplish this, we need to remember where style resides in the document. It is a child of Policy; therefore, we address it in the criteria as Policy.style.
The other goal of this query is to return only the id field of the Policy. It is also just a child of Policy.
The result is returned by calling mongoTemplate.findOne(query, String.class, "foo"). The output is a String, so the second parameter is String.class. The third parameter is our collection name.
public String doCollect() {
    Query query = new Query();
    // match on Policy.style and return only Policy.id
    query.addCriteria(Criteria.where("Policy.style").is("STY_1.1"))
            .fields().include("Policy.id");
    String result = mongoTemplate.findOne(query, String.class, "foo");
    return result;
}
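The dot notation used in the criteria can be pictured as walking a nested structure one key at a time. The sketch below imitates that walk over plain Java maps. It is a simplified illustration and not how MongoDB evaluates queries internally (arrays, for one, are ignored); the class DotPath and the hand-built samplePolicy document, which holds only a few fields, are assumptions for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: DotPath is a hypothetical class, not a MongoDB or Spring API.
class DotPath {

    // Walks a nested map one key at a time; returns null if any segment is missing.
    public static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend any further
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Hand-built stand-in for the stored document, limited to a few fields.
    public static Map<String, Object> samplePolicy() {
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("date", "2017-10-18");

        Map<String, Object> policy = new LinkedHashMap<>();
        policy.put("id", "NRD-1");
        policy.put("style", "STY_1.1");
        policy.put("status", status);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Policy", policy);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = samplePolicy();
        System.out.println(resolve(doc, "Policy.style"));
        System.out.println(resolve(doc, "Policy.status.date"));
        System.out.println(resolve(doc, "Policy.missing"));
    }
}
```

Reading a path such as Policy.status.date this way makes it easier to work out the right criteria string from the printed JSON.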
6.2 – Second query
The difference between the second query and the first query is the set of fields returned. In the second query, we return Value, which is a child of Group, which in turn is a child of Policy.
public String doCollectTwo() {
    Query query = new Query();
    // same criteria as the first query, but return Policy.Group.Value
    query.addCriteria(Criteria.where("Policy.style").is("STY_1.1"))
            .fields().include("Policy.Group.Value");
    String result = mongoTemplate.findOne(query, String.class, "foo");
    return result;
}
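Where the criteria decide which document matches, fields().include(...) decides which parts of it come back. As a rough mental model, the sketch below copies only the listed dot paths out of a nested map. It is a simplification written for this post, not MongoDB's projection logic: the class Projection and its include method are hypothetical, arrays are not handled, and MongoDB additionally returns _id by default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: Projection is a hypothetical class, not a MongoDB or Spring API.
class Projection {

    // Copies only the listed dot paths from doc into a new nested map.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> include(Map<String, Object> doc, String... paths) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String path : paths) {
            String[] keys = path.split("\\.");
            Map<String, Object> source = doc;
            Map<String, Object> target = result;
            for (int i = 0; i < keys.length; i++) {
                Object value = source.get(keys[i]);
                if (value == null) {
                    break; // path is not present in this document
                }
                if (i == keys.length - 1) {
                    target.put(keys[i], value); // last segment: copy the value as-is
                } else if (value instanceof Map) {
                    // descend on both sides, creating the parent in the result if needed
                    source = (Map<String, Object>) value;
                    target = (Map<String, Object>) target.computeIfAbsent(
                            keys[i], k -> new LinkedHashMap<String, Object>());
                } else {
                    break; // cannot descend into a non-document value
                }
            }
        }
        return result;
    }
}
```

Applied to a document shaped like ours, include(doc, "Policy.id") would keep the Policy wrapper and only its id, which mirrors the shape of the first query's result in the demo below.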
6.3 – Third query
The criteria for the third query are different. We only want to return a document with the id “NRD-1” and a status date of “2017-10-18”. We only want to return two fields: title and description, which are both children of Value.
Refer to the XML document or the printed JSON in the demo below for further clarification on the queries.
public String doCollectThree() {
    Query query = new Query();
    // both conditions must match; return only the title and description of Value
    query.addCriteria(Criteria.where("Policy.id").is("NRD-1")
                    .and("Policy.status.date").is("2017-10-18"))
            .fields()
            .include("Policy.Group.Value.title")
            .include("Policy.Group.Value.description");
    String result = mongoTemplate.findOne(query, String.class, "foo");
    return result;
}
7 – Spring Batch Job
The Job begins with step1 and calls step2 next.
@Bean
public Job xmlToJsonToMongo() {
    return jobBuilderFactory.get("XML_Processor")
            .start(step1())
            .next(step2())
            .build();
}
8 – @SpringBootApplication
This is a standard class with static void main and @SpringBootApplication.
package com.michaelcgood;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;

@SpringBootApplication
@EnableAutoConfiguration(exclude={DataSourceAutoConfiguration.class})
public class SpringBatchMongodb {

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchMongodb.class, args);
    }
}
9 – Demo
9.1 – step1
The JSON is printed as a String. I have cut off the output below after description because it is long.
Executing step: [step1] PRINTING STRING :::::::::::::::::::::{"Policy": { "Group": { "Value": { "prohibitChanges": true, "description": {
9.2 – step2
I have cut the results to format the output for the blog post.
Executing step: [step2]
Checking if the collection exists
Status of collection returns :::::::::::::::::::::true
Show all objects
list of db objects returns:::::::::::::::::::::[{ "_id" : { "$oid" : "59e7c0324ad9510acf5773c0"} , [..]
Just return the id of Policy
RESULT:::::::::::::::::::::{ "_id" : { "$oid" : "59e7c0324ad9510acf5773c0"} , "Policy" : { "id" : "NRD-1"}}
To see the other results printed to the console, fork or download the code from GitHub and run the application.
10 – Conclusion
We have reviewed how to convert XML to JSON, store the JSON in MongoDB, and query the database for specific results.
Further reading:
The source code is on Github
Published on Java Code Geeks with permission by Michael Good, partner at our JCG program. See the original article here: Converting XML to JSON + Raw Use in MongoDB + Spring Batch. Opinions expressed by Java Code Geeks contributors are their own.