XML Manipulation With XML Copy Editor
The XML document format, first drafted in 1996, is still widely used to facilitate communication between disparate systems (though in some applications it is gradually being replaced by JSON). As a Java developer, I generally interface with data in an XML document via a DOM parser, but there are occasions where being able to manipulate an XML document directly is advantageous or even necessary.
A simple text editor may suffice in the rare cases where the document in question is particularly small, but these documents can often be quite large. Thus, it is beneficial to be able to manipulate the document in a program specifically designed for that type of job.
In my case, I don’t need to do this type of thing often enough to justify paying for such a piece of software, but I have found that XML Copy Editor has fulfilled my needs for free.
A project I am working on communicates with another piece of vendor software through SOAP calls. Sometimes the vendor software does not behave as expected and data gets corrupted, which must then be remedied. With no way to do this through the client’s UI, the solution is to directly manipulate the XML data and return it to the vendor system via a SOAP call. This process often involves several pieces of data, delineated by locality (in this case, the state). These files can be up to 10 MB in size, with entries for all 50 states. In my experience, XML Copy Editor has never struggled with large files, which is of great benefit.
In the following paragraphs, I will describe XML Copy Editor and also provide a quick tutorial on XML, XSLT, and XPath. Take the following as an example; it is a simplified representation of the type of file I usually work with:
<Header1>
  <Header2>
    <Detail>
      <State>
        <Name>Missouri</Name>
      </State>
      <SubDetail>
        <Premium>500</Premium>
      </SubDetail>
    </Detail>
    <Detail>
      <State>
        <Name>Kansas</Name>
      </State>
      <SubDetail>
        <Premium>600</Premium>
      </SubDetail>
    </Detail>
  </Header2>
</Header1>
Usually the first action I take upon opening the file in XML Copy Editor is to execute the “XML → Pretty Print” functionality (F11), since the file is generally meant for communication between systems, and not human readability. “Pretty Print” formats the document with appropriate line breaks and indentations, making it much easier to read and interpret.
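For anyone who wants the same effect from code rather than from the menu, here is a minimal sketch using the JDK's built-in javax.xml.transform API. It is not how XML Copy Editor implements “Pretty Print”; the class and method names (PrettyPrintSketch, prettyPrint) are my own placeholders, and the indent-amount property is an implementation-specific hint that the JDK's bundled transformer understands.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named for illustration only.
class PrettyPrintSketch {

    /** Re-serializes an XML string with line breaks and indentation. */
    static String prettyPrint(String xml) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, "yes");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Assumption: implementation-specific property of the JDK's default transformer.
            identity.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not pretty-print the document", e);
        }
    }

    public static void main(String[] args) {
        String flat = "<Header1><Header2><Detail><State><Name>Missouri</Name></State>"
                + "<SubDetail><Premium>500</Premium></SubDetail></Detail></Header2></Header1>";
        System.out.println(prettyPrint(flat));
    }
}
```

The exact indentation can vary between Java versions, so treat the output as a reading aid rather than a canonical form.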
Now, to find the data that needs to be altered, we employ XPath. What we are looking for in this case is the “Premium” element for Missouri. From the same menu where we accessed “Pretty Print”, we can also access “XML → Evaluate XPath…” (F9), which presents us with a modal dialog containing a single text entry field. The XPath statement we enter is “//Header1/Header2/Detail/SubDetail/Premium”. Note that XPath separates steps with forward slashes, not backslashes. Let’s parse this XPath out, shall we?
This statement is fairly straightforward; simply stated, it says to return every node with the ancestry, “Header1 → Header2 → Detail → SubDetail → Premium.” However, this will return the “Premium” nodes for both Missouri and Kansas, which is not what we want. Keep in mind that every XPath statement has the potential to return multiple results; it’s up to the developer to make the statement specific enough to return the desired result. In other words, every XPath statement makes the request, “Give me all elements that meet these criteria.” So, we refine this statement in the following manner:
“//Header1/Header2/Detail/SubDetail/Premium[../../State/Name='Missouri']”.
The addition of the “[../../State/Name='Missouri']” at the end of this statement can be a bit confusing, but it is a very valuable part of XPath. The part in square brackets is a qualifier (a predicate) that says to return only the Premium element for which the “Name” child of the “State” node in the shared ancestry is equal to “Missouri”. If the “State” node were a child of “Premium”, then the statement would look like this:
“//Header1/Header2/Detail/SubDetail/Premium[State/Name='Missouri']”
However, since the “State” node is higher up the ancestry and is a sibling of one of the elements on our XPath (the “SubDetail” node), the qualifier has to “step back” in order to identify where in the XML the “State” node resides. Every “../” in the qualifier is one step upwards: the first takes us from “Premium” to “SubDetail”, and the second takes us to “Detail”, which is the parent of “State”. There is no “@” in the qualifier because “@” selects attributes, and “State” is an element. Think of “../” as similar to “cd ..” in the Windows command line.
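To check an expression like this outside the editor, a rough sketch with the JDK's javax.xml.xpath package might look as follows. The class and method names (XPathSketch, select) are placeholders of mine, and the sample document is a whitespace-free copy of the example file. The expressions use forward slashes and test the “Name” child element of “State”.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class XPathSketch {

    static final String SAMPLE = "<Header1><Header2>"
            + "<Detail><State><Name>Missouri</Name></State>"
            + "<SubDetail><Premium>500</Premium></SubDetail></Detail>"
            + "<Detail><State><Name>Kansas</Name></State>"
            + "<SubDetail><Premium>600</Premium></SubDetail></Detail>"
            + "</Header2></Header1>";

    /** Returns the text content of every node the expression selects. */
    static List<String> select(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String base = "//Header1/Header2/Detail/SubDetail/Premium";
        // Unqualified path: matches the Premium of both states.
        System.out.println(select(SAMPLE, base)); // prints [500, 600]
        // Predicate steps up two levels to Detail, then down to State/Name.
        System.out.println(select(SAMPLE, base + "[../../State/Name='Missouri']")); // prints [500]
    }
}
```

Running both expressions side by side makes it easy to see how the qualifier narrows two matches down to one.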
Now that we have found the data we want to change, we must alter it. XPath is not capable of this on its own, but using XPath along with XSLT can accomplish our goals. XSLT “transforms” an XML document by applying rules called “templates,” which are defined in a separate XSLT file. An XSLT file is itself an XML document. We can create an XSLT file in XML Copy Editor as follows:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Keep Missouri's Premium element, but replace its content -->
  <xsl:template match="//Header1/Header2/Detail/SubDetail/Premium[../../State/Name='Missouri']">
    <xsl:copy>100</xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Then, we invoke the “XML → XSL Transform…” (F8) action from the menu. This presents us with an open file dialog, from which we select the XSL file we just defined. This will produce a new XML document with the following contents:
<Header1>
  <Header2>
    <Detail>
      <State>
        <Name>Missouri</Name>
      </State>
      <SubDetail>
        <Premium>100</Premium>
      </SubDetail>
    </Detail>
    <Detail>
      <State>
        <Name>Kansas</Name>
      </State>
      <SubDetail>
        <Premium>600</Premium>
      </SubDetail>
    </Detail>
  </Header2>
</Header1>
So, how does this work? Well, there are two “templates” defined in the XSL file. Each template applies its rule to the nodes its “match” attribute selects. In this case, the first template matches every node and attribute and copies it to the output unchanged. The second template is more specific, so it takes precedence for the node at the indicated XPath and writes the new value in place of the old one.
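The same two-template approach can be tried outside the editor with the JDK's javax.xml.transform API. What follows is only a sketch under my own assumptions: the class and method names (XsltSketch, transform) are placeholders, the sample document is a whitespace-free copy of the example file, and the embedded stylesheet uses a forward-slash XPath with an element qualifier and wraps the new value in xsl:copy so that the “Premium” element itself is kept.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named for illustration only.
class XsltSketch {

    static final String SAMPLE = "<Header1><Header2>"
            + "<Detail><State><Name>Missouri</Name></State>"
            + "<SubDetail><Premium>500</Premium></SubDetail></Detail>"
            + "<Detail><State><Name>Kansas</Name></State>"
            + "<SubDetail><Premium>600</Premium></SubDetail></Detail>"
            + "</Header2></Header1>";

    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            // Template 1: copy every node and attribute unchanged.
            + "<xsl:template match=\"node()|@*\">"
            + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
            + "</xsl:template>"
            // Template 2: more specific match, so it wins for Missouri's Premium.
            + "<xsl:template match=\"//Header1/Header2/Detail/SubDetail/Premium"
            + "[../../State/Name='Missouri']\">"
            + "<xsl:copy>100</xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given document and returns the result. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, STYLESHEET));
    }
}
```

If the second template emitted only the literal 100 without xsl:copy, the whole “Premium” element would be replaced by bare text, which is why the sketch wraps the value.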
And there we have it. I hope that you’ve found this helpful! Leave a comment if you have any questions.
Reference: XML Manipulation With XML Copy Editor from our JCG partner Robert Rice at the Keyhole Software blog.