Xmltool
XML manipulation library in Java built on a Fluent API
Mycila XML Tool
XMLTool is a very simple Java library for performing all sorts of common operations on an XML document. As a Java developer, I often end up writing the same XML processing and transformation code over and over, so I decided to put it all in one easy-to-use class that relies on the Fluent Interface pattern to simplify XML manipulation.
XMLTag tag = XMLDoc.newDocument(false)
    .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
    .addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
    .addRoot("html")
    .addTag("wicket:border")
    .gotoRoot().addTag("head")
    .addNamespace("other", "http://other-ns.com")
    .gotoRoot().addTag("other:foo");
System.out.println(tag.toString());
Features
With XML Tool you will be able to quickly:
- Create new XML documents from external sources, or new documents from scratch
- Manage namespaces
- Manipulate nodes (add, remove, rename)
- Manipulate data (add or remove text and CDATA)
- Navigate the document with shortcuts and XPath (note: XPath supports namespaces)
- Transform an XMLDoc instance to a String or a Document
- Validate your document against schemas
- Execute callbacks on a hierarchy
- Remove all namespaces (namespace ignoring)
- ... and a lot of other features !
Project status
- Issues: https://github.com/mathieucarbou/xmltool/issues
- OSGi compliant
Maven Repository
Releases
Available in Maven Central Repository: http://repo1.maven.org/maven2/com/mycila/mycila-xmltool/
Snapshots
Available in OSS Repository: https://oss.sonatype.org/content/repositories/snapshots/com/mycila/mycila-xmltool/
Maven dependency
<dependency>
    <groupId>com.mycila</groupId>
    <artifactId>mycila-xmltool</artifactId>
    <version>X.Y.ga</version>
</dependency>
Maven sites
- 4.0.ga: http://oss.carbou.me/xmltool/reports/4.0.ga/index.html
Documentation
Performance consideration
XML Tool uses the Java DOM API, and creating a Document has a cost. To improve performance, XML Tool therefore keeps two object pools of DocumentBuilder instances:
- one pool for namespace-aware document builders
- another one ignoring namespaces
You can configure the pools with XMLDocumentBuilderFactory.setPoolConfig(config).
By default, each of the two pools has the following configuration:
- min idle = 0
- max idle = CPU core number
- max total = CPU core number * 4
- max wait time = -1
If your application is heavily threaded and many threads use XMLTag concurrently, you might want to increase max total to match your peak thread count, and max idle to match your average thread count, to avoid thread contention.
If your application does not use many threads, even if it creates documents often, you can probably lower those numbers.
The goal is to keep enough DocumentBuilder instances available in the pool to "feed" your application on demand, without threads waiting for a builder to become available.
Using an object pool is certainly more complex, but it prevents threading issues (a DocumentBuilder is not guaranteed to be thread-safe, so one instance must not be used by several threads at once) and maximizes performance through object reuse.
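To make the pooling idea concrete, here is a minimal sketch in plain JDK Java. It is not XMLTool's internal code: the classes BuilderPool and PoolDemo are hypothetical, and the semaphore-plus-queue design is just one simple way to get the "max idle", "max total" and "wait indefinitely" behaviour described above, with one pool per namespace mode.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical, simplified pool for illustration only: NOT XMLTool's implementation.
class BuilderPool {
    private final DocumentBuilderFactory factory;
    private final BlockingQueue<DocumentBuilder> idle; // at most maxIdle idle builders
    private final Semaphore permits;                   // at most maxTotal builders in use

    BuilderPool(boolean namespaceAware, int maxIdle, int maxTotal) {
        factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        idle = new ArrayBlockingQueue<>(maxIdle);
        permits = new Semaphore(maxTotal);
    }

    // Blocks as long as needed for a free slot, similar to "max wait time = -1".
    DocumentBuilder borrow() throws InterruptedException {
        permits.acquire();
        DocumentBuilder builder = idle.poll();
        if (builder != null) {
            return builder; // reuse an idle builder instead of creating one
        }
        try {
            // DocumentBuilderFactory is not guaranteed to be thread-safe either.
            synchronized (factory) {
                return factory.newDocumentBuilder();
            }
        } catch (ParserConfigurationException e) {
            permits.release();
            throw new IllegalStateException(e);
        }
    }

    void release(DocumentBuilder builder) {
        builder.reset();     // restore the builder's original configuration
        idle.offer(builder); // dropped if maxIdle builders are already idle
        permits.release();
    }
}

class PoolDemo {
    static final int CORES = Runtime.getRuntime().availableProcessors();
    // One pool per mode, sized like the defaults listed above.
    static final BuilderPool NS_AWARE = new BuilderPool(true, CORES, CORES * 4);
    static final BuilderPool NS_IGNORING = new BuilderPool(false, CORES, CORES * 4);

    // Parses xml with a pooled builder and returns the root element's name.
    static String rootName(String xml, boolean ignoreNamespaces) {
        BuilderPool pool = ignoreNamespaces ? NS_IGNORING : NS_AWARE;
        DocumentBuilder builder;
        try {
            builder = pool.borrow();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        try {
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.release(builder); // always hand the builder back
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<html/>", true)); // html
    }
}
```

A production pool typically also handles validation, eviction and timeouts; the sketch only shows why borrowing and returning builders avoids both re-creating them and sharing one builder between threads.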
Creating XML documents
Creating a new XML document
The newDocument method creates a new XML document. You can then optionally choose a default namespace, and then choose the name of the root element.
System.out.println(XMLDoc.newDocument(true).addRoot("html").toString());
gives:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
Loading an existing XML document
The from methods can load an XML document from any of the following types:
- org.w3c.dom.Node
- InputSource
- Reader
- InputStream
- File
- URL
- String
- javax.xml.transform.Source
Example:
URL yahooGeoCode = new URL("http://local.yahooapis.com/MapsService/V1/geocode?appid=YD-9G7bey8_JXxQP6rxl.fBFGgCdNjoDMACQA--&state=QC&country=CA&zip=H1W3B8");
System.out.println(XMLDoc.from(yahooGeoCode, true).toString());
System.out.println(XMLDoc.from(yahooGeoCode, true).getText("Result/City"));
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ResultSet xmlns="urn:yahoo:maps" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:yahoo:maps http://api.local.yahoo.com/MapsService/V1/GeocodeResponse.xsd">
    <Result precision="zip">
        <Latitude>45.543289</Latitude>
        <Longitude>-73.543098</Longitude>
        <Address/>
        <City>Montreal</City>
        <State>QC</State>
        <Zip>H1W 3B8</Zip>
        <Country>CA</Country>
    </Result>
</ResultSet>
<!-- ws04.search.re2.yahoo.com uncompressed Tue Dec 9 13:39:12 PST 2008 -->
Montreal
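For comparison, here is roughly what loading a document and reading a value with namespaces ignored involves in plain JDK code. This is a sketch, not how XMLTool is implemented: LoadDemo is a hypothetical class, the document is a trimmed copy of the output above held in a String (one of the supported source types) so no network call is needed, and "ignoring namespaces" is approximated here with a non-namespace-aware parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LoadDemo {
    // Trimmed copy of the geocoding response shown above.
    static final String XML =
        "<ResultSet xmlns=\"urn:yahoo:maps\">"
        + "<Result precision=\"zip\"><City>Montreal</City><State>QC</State></Result>"
        + "</ResultSet>";

    // Parses xml without namespace awareness and evaluates xpath from the root element.
    static String getText(String xml, String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // plain names, no prefixes needed in XPath
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Relative to the root element, like the getText("Result/City") call above.
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getText(XML, "Result/City")); // Montreal
    }
}
```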
Ignoring namespaces
All creational methods (XMLDoc.newDocument and XMLDoc.from) require a boolean parameter, ignoreNamespaces. If it is set to true, all namespaces in the document are ignored. This is really useful if you use XPath a lot, since you can avoid prefixing every element name in your XPath expressions.
Example:
System.out.println(XMLDoc.newDocument(true)
    .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
    .addRoot("html"));
System.out.println(XMLDoc.newDocument(false)
    .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
    .addRoot("html"));
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>
Navigating in a document with namespaces using XPath is quite a pain:
doc.gotoTag("ns0:body").addTag("child")
    .gotoParent().addCDATA("with special characters")
    .gotoTag("ns0:body").addCDATA("<\"!@#$%'^&*()>")
whereas if you load the same document with ignoreNamespaces set to true, you can use plain element names in your XPath expressions:
doc.gotoTag("body").addTag("child")
    .gotoParent().addCDATA("with special characters")
    .gotoTag("body").addCDATA("<\"!@#$%'^&*()>")
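The prefix is needed because, in XPath 1.0, an unprefixed name only matches elements that are in no namespace. The plain-JDK sketch below (hypothetical class NamespaceXPathDemo, not part of XMLTool) parses an XHTML document with namespace awareness on, binds the prefix ns0 to the XHTML namespace by hand, and counts the matches for both forms of the expression. This manual NamespaceContext is the kind of boilerplate that XMLTool's prefix handling is meant to hide.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceXPathDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    static final String DOC = "<html xmlns=\"" + XHTML + "\"><head/><body/></html>";

    // Counts nodes matching path in a namespace-aware DOM, with "ns0" bound to XHTML.
    static int count(String path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DOC)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "ns0".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return XHTML.equals(uri) ? "ns0" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML.equals(uri)
                            ? Collections.singletonList("ns0").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                    path, doc.getDocumentElement(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("body"));     // 0: "body" means "body in no namespace"
        System.out.println(count("ns0:body")); // 1
    }
}
```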
Using namespaces
When you create or load a document without ignoring namespaces, you can add a default namespace for your document and add other namespaces afterwards. Namespace management is quite a challenge, especially when using XPath. An XMLTag instance gives you access to the following methods to manage namespaces in the document:
Adding and retrieving namespaces and prefixes
addDefaultNamespace
When you create an empty document, you can define a default namespace to use for the document. For example:
XMLTag doc = XMLDoc.newDocument(false)
    .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
    .addRoot("html");
will produce:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>
addNamespace
Once you have an XMLTag instance, you can add any namespace you want. For example:
XMLTag doc = XMLDoc.newDocument(false)
    .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
    .addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
    .addRoot("html")
    .addTag("wicket:border")
    .gotoRoot().addTag("head")
    .addNamespace("other", "http://other-ns.com")
    .gotoRoot().addTag("other:foo");
will produce:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/">
    <wicket:border xmlns:wicket="http://wicket.sourceforge.net/wicket-1.0"/>
    <head/>
    <other:foo xmlns:other="http://other-ns.com"/>
</html>
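For comparison, the same tree can be built with the plain DOM API, where every element has to be created with its namespace URI through createElementNS. In this sketch (hypothetical class NamespaceBuildDemo, not XMLTool code), serialization uses the JDK's identity Transformer, which, as far as we know, writes the xmlns declarations each element needs; the output is not indented, and the namespaceOf helper re-parses it to check the namespace of a given element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceBuildDemo {
    static final String XHTML2 = "http://www.w3.org/2002/06/xhtml2/";
    static final String WICKET = "http://wicket.sourceforge.net/wicket-1.0";
    static final String OTHER = "http://other-ns.com";

    // Builds the same tree as the addNamespace example with plain DOM calls.
    static String build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element html = doc.createElementNS(XHTML2, "html"); // default namespace
            doc.appendChild(html);
            html.appendChild(doc.createElementNS(WICKET, "wicket:border"));
            html.appendChild(doc.createElementNS(XHTML2, "head"));
            html.appendChild(doc.createElementNS(OTHER, "other:foo"));
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-parses xml and returns the namespace URI of the first element named qname.
    static String namespaceOf(String xml, String qname) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(qname).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```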
Namespace prefix generation
When you load an existing XML document, or define a default namespace in a new document, prefixes and namespaces are automatically discovered across the whole document. XML documents often have a default namespace; XHTML documents, like the one below, are a typical case. In that situation, XMLDoc generates a prefix that you can use for XPath navigation and registers the namespace as the default one.
For example, the following document has a default namespace, and the prefix generated to access it is ns0.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head>
        <title/>
    </head>
    <body/>
</html>
XMLTag doc = XMLDoc.from(...);
assertEquals(doc.getPrefix("http://www.w3.org/1999/xhtml"), "ns0");
assertEquals(doc.getContext().getNamespaceURI("ns0"), "http://www.w3.org/1999/xhtm
