Escaping XML with Groovy 2.1

When posting source code to my blog, I often need to convert less-than signs (<) and greater-than signs (>) to their respective entity references so that the browser does not mistake them for HTML tags when it renders the output. I have often done this with quick search-and-replace commands such as %s/</\&lt;/g and %s/>/\&gt;/g in vim, or with Perl. However, Groovy 2.1 introduced a method that does this, groovy.xml.XmlUtil.escapeXml(String), and in this post I demonstrate a Groovy script that makes use of it.
 
 
 

escapeXml.groovy

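#!/usr/bin/env groovy
/*
 * escapeXml.groovy
 *
 * Writes a copy of the given file to <inputFileName>.escaped with the
 * characters " ' & < > replaced by their XML entity references.
 *
 * Requires Groovy 2.1 or later.
 */
if (args.length < 1)
{
   println "USAGE: groovy escapeXml.groovy <xmlFileToBeProcessed>"
   System.exit(-1)
}
def inputFileName = args[0]
println "Processing ${inputFileName}..."
def inputFile = new File(inputFileName)
String outputFileName = inputFileName + ".escaped"
def outputFile = new File(outputFileName)
// createNewFile() returns false if the file already exists, so an
// existing .escaped file is never overwritten.
if (outputFile.createNewFile())
{
   // Read the whole input file, escape it, and write the result.
   outputFile.text = groovy.xml.XmlUtil.escapeXml(inputFile.text)
}
else
{
   println "Unable to create file ${outputFileName} (it may already exist)"
}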

The XmlUtil.escapeXml method is intended to, as its Groovydoc states, “escape the following characters " ' & < > with their XML entities.” Running source code through it converts these symbols to XML entity references that the browser renders properly. This is particularly helpful with Java code that uses generics, where angle brackets appear frequently.

The Groovydoc states that the following transformations from symbols to corresponding entity references are supported:

Symbol   Entity Reference
"        &quot;
'        &apos;
&        &amp;
<        &lt;
>        &gt;

One of the advantages of this approach is that I can escape all five of these special symbols in an entire String or file with a single command rather than one symbol at a time.
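
As a brief illustration of this single-call escaping, the following sketch runs one made-up line of Java code through XmlUtil.escapeXml. The sample string and the variable names are hypothetical and were chosen only so that all five symbols appear.

```groovy
import groovy.xml.XmlUtil

// Hypothetical sample: one line of Java source containing all five symbols.
def javaLine = 'Map<String, List<Integer>> m = load("it\'s here") && ready;'

// A single call replaces every ", ', &, < and > in the string.
def escaped = XmlUtil.escapeXml(javaLine)

println escaped
```

The printed line contains only entity references in place of the five symbols, so it can be pasted into an HTML page without the generics being read as tags.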

The Groovydoc for this XmlUtil.escapeXml method also states things that this method does not do:

  • “Does not escape control characters” [use XmlUtil.escapeControlCharacters(String) for this]
  • “Does not support DTDs or external entities”
  • “Does not treat surrogate pairs specially”
  • “Does not perform Unicode validation on its input”
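
The first limitation in the list above can be seen with a short sketch. It assumes a made-up input string containing a tab character. escapeXml should leave the tab as it is, while XmlUtil.escapeControlCharacters(String), the method named in the Groovydoc note, is the one that deals with control characters. Chaining the two calls in this order is my own illustration rather than a documented recipe.

```groovy
import groovy.xml.XmlUtil

// Hypothetical input: a tab (a control character) plus two XML special symbols.
def raw = "a\t<b>"

// escapeXml handles only the five XML symbols; the tab passes through untouched.
def xmlEscaped = XmlUtil.escapeXml(raw)
assert xmlEscaped.contains('\t')

// escapeControlCharacters is applied second so that the ampersands it
// introduces are not escaped again by escapeXml.
def fullyEscaped = XmlUtil.escapeControlCharacters(xmlEscaped)

println xmlEscaped
println fullyEscaped
```

Applying the calls in the opposite order would cause escapeXml to escape the ampersand of each numeric character reference, which is why the control characters are handled last here.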

My example above showed a Groovy script file that makes use of XmlUtil.escapeXml(String), but the method can also be run inline on the command line. This is done in DOS, for example, as shown here:

type escapeXml.groovy | groovy -e "println groovy.xml.XmlUtil.escapeXml(System.in.text)"

The command just shown takes the provided file (escapeXml.groovy itself in this case) and prints its contents with the special symbols replaced by entity references. It could be handled the same way in Linux/Unix with “cat” rather than “type.” This is shown in the next screen snapshot.

[Screen snapshot: inlineEscapeXmlGroovy]

This blog post has shown how XmlUtil.escapeXml(String) can be used within a script or on the command line to convert certain commonly problematic XML characters to their entity references. Although not shown here, one could embed such code within a Java application as well.
 

Reference: Escaping XML with Groovy 2.1 from our JCG partner Dustin Marx at the Inspired by Actual Events blog.