Xmlresolver
The xmlresolver project provides an advanced implementation of the SAX EntityResolver (and extended EntityResolver2), the Transformer URIResolver, the DOM LSResourceResolver, the StAX XMLResolver, and a new NamespaceResolver. It uses the OASIS XML Catalogs V1.1 Standard to provide a mapping from external identifiers and URIs to local resources.
XMLResolver: An enhanced XML resolver with XML Catalog support
The xmlresolver project provides an advanced implementation of the SAX
EntityResolver, the Transformer URIResolver, and a new
NamespaceResolver. The implementation uses the OASIS XML Catalogs V1.1
Standard to provide a mapping from public identifiers to local
resources.
The xmlresolver can be found on Maven Central and has the coordinates:
<groupId>org.xmlresolver</groupId>
<artifactId>xmlresolver</artifactId>
In addition to enhanced support for RDDL-based namespace resolution, the implementation supports automatic local caching of resources. This provides the advantages of the catalog specification without requiring users to manage the mapping by hand.
Applications can use the resolver directly or they can instantiate one of a set of convenience classes to access parsers that automatically implement these resolvers.
The goal of this project is to produce a clean, reasonably simple API and a robust, thread-safe implementation.
See also: https://xmlresolver.org/
For guidelines about how to migrate your application from the Apache Commons resolver to this resolver, see documentation and examples in https://github.com/xmlresolver/resolver-migration
Version 6.x
Version 6.x is a significant refactoring and is not (wholly) backwards compatible with version 5.x. (The underlying functionality is the same, but the API is slightly different.) The version 5.x sources are now in the legacy_v5 branch. Important bug fixes will be applied to the 5.x release for some time, but new development is focused on the 6.x release.
Three main considerations drove the refactoring:
- Correcting design errors. For example, using a javax.xml.transform.Source to return a non-XML resource.
- Simplification of the design (removing the caching feature, for example).
- Bringing the Java and C# implementations into better alignment.
What’s changed? (tl;dr)
Where previously you would have instantiated an
org.xmlresolver.Resolver and used it as the entity resolver for SAX
(and other) APIs, you should now instantiate an
org.xmlresolver.XMLResolver. This new object has methods for
performing catalog lookup and resource resolution. It also has methods that
return resolver APIs. See Using an XML Resolver.
Behind the scenes, the API has been reworked so that most operations
consist of constructing a request for some resource, asking the XMLResolver to either
(just) look it up in the catalog or resolve it, and returning a response.
A note about version numbers
The XML Resolver API is often integrated into other projects and products. On the one hand, this means that it’s valuable to publish new releases early so that integrators can test them. On the other hand, integrators quite reasonably want to make production releases with only the most stable versions.
In an effort to make this easier, starting with version 6.x, the XML Resolver releases will use an even/odd version numbering strategy to identify development and stable branches.
If the second number in the version is even, that’s a work-in-progress, stabilization release. Please test it, and report bugs. If the second number is odd, that’s a stable release. (Test that and report bugs too, obviously!)
In other words, 6.0.x are stabilization releases. When the API is deemed stable, there will be a 6.1.0 release. If more features are developed or significant changes are undertaken, those will be published in a series of 6.2.x releases before stabilizing in a 6.3.0 release. Etc.
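For integrators who want to apply this rule automatically (for example, in a build check), the even/odd convention can be sketched as follows. The class and method names are hypothetical and are not part of the XML Resolver API; the sketch only assumes a dotted major.minor.patch version string.

```java
// Illustrative helper only; not part of the XML Resolver API.
final class ReleaseKind {
    /**
     * Returns true when the second component of a dotted version string is odd,
     * which the numbering scheme described above treats as a stable release.
     */
    static boolean isStable(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + version);
        }
        int minor = Integer.parseInt(parts[1]);
        return minor % 2 == 1;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"6.0.21", "6.1.0", "6.2.3", "6.3.0"}) {
            System.out.println(v + " -> " + (isStable(v) ? "stable" : "stabilization"));
        }
    }
}
```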
ChangeLog
6.0.21
- Log “file not found” errors at the debug level, not the warn or error level. A catalog path that refers to non-existent catalogs (for example, ./catalog.xml) is fairly common and the warnings are usually spurious.
6.0.20
- Added checks to avoid an NPE when trying to set the system identifier to the resolved URI.
- Substantially rewrote, and simplified, the resolver logger. All of the complexity associated with categories has been removed. Most messages previously associated with a category are simply debug level messages.
- The DEFAULT_LOGGER_LOG_LEVEL is now the LOGGER_LOG_LEVEL.
- Cleaned up the error handling in the catalog loader to avoid spurious error messages about attempting to parse ZIP files (when ultimately retrieving a catalog from the ZIP file).
6.0.19
- Fixed #253. RDDL parsing always uses the resolved resource, not the original URI. RDDL parsing is also disabled now, by default. If you’re using RDDL, make sure you enable it in your configuration.
- The caching feature is not present in version 6.x of the resolver, but there were several dangling references to it. Those have also been removed.
6.0.18
- Fixed #250. Xerces is no longer an implementation dependency.
6.0.17
- Fixed #248. In a last ditch attempt to resolve a relative URI, resolve it against the base URI of the request, not the current working directory.
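
The difference between the two bases can be seen with plain java.net.URI. This is only a sketch of the idea using the JDK, not the resolver's own code; the base URI and file names are made up for illustration.

```java
import java.net.URI;
import java.nio.file.Paths;

// Illustrative only; the resolver's internal logic is not shown here.
final class RelativeUriDemo {
    /** Resolves a relative reference against the base URI carried by the request. */
    static URI againstRequestBase(String relative, URI requestBase) {
        return requestBase.resolve(relative);
    }

    /** Resolves the same reference against the current working directory instead. */
    static URI againstWorkingDirectory(String relative) {
        return Paths.get("").toAbsolutePath().toUri().resolve(relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("https://example.com/schemas/main.xsd"); // hypothetical base URI
        // Stays next to the document that referenced it.
        System.out.println(againstRequestBase("types/common.xsd", base));
        // Depends on where the JVM happened to be started.
        System.out.println(againstWorkingDirectory("types/common.xsd"));
    }
}
```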
6.0.16
- This is an interface change. It adds a
copy()method to theResolverConfigurationinterface and removes the then redundant copy constructor forXMLResolverConfiguration. There are also a few JavaDoc fixes.
6.0.15
- Fixed #242. Rewrite entries are now concatenated with, not resolved against, the rewrite prefix.
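
To see why the distinction matters, here is a small sketch contrasting the two strategies using only the JDK. The method names and URIs are hypothetical; the sketch assumes a rewrite entry consisting of a start string to match and a prefix to substitute, and it is not the resolver's implementation.

```java
import java.net.URI;

// Illustrative only; names and URIs are invented for the example.
final class RewriteDemo {
    /** Concatenation: strip the matched start string and append the remainder to the prefix. */
    static String concatenate(String uri, String startString, String rewritePrefix) {
        return rewritePrefix + remainder(uri, startString);
    }

    /** URI resolution of the remainder against the prefix, shown for contrast. */
    static String resolve(String uri, String startString, String rewritePrefix) {
        return URI.create(rewritePrefix).resolve(remainder(uri, startString)).toString();
    }

    private static String remainder(String uri, String startString) {
        if (!uri.startsWith(startString)) {
            throw new IllegalArgumentException(uri + " does not start with " + startString);
        }
        return uri.substring(startString.length());
    }

    public static void main(String[] args) {
        String uri = "http://example.com/dtds/book.dtd";
        String start = "http://example.com/dtds/";
        String prefix = "https://mirror.example.org/local/v1-"; // note: no trailing slash
        System.out.println(concatenate(uri, start, prefix)); // keeps the "v1-" part of the prefix
        System.out.println(resolve(uri, start, prefix));     // drops everything after the last "/"
    }
}
```

The two approaches agree when the prefix ends in a slash and differ when it does not, which is the case the sketch exercises.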
6.0.14
- Fixed #229. The interaction between the “always resolve” feature and namespace URI lookup had the unfortunate consequence of returning the (almost always incorrect) document at the namespace URI as the resource. I’ve adjusted the API so that this is no longer the case. It’s now possible to configure “always resolve” on a per-request basis. This changes an interface so it’s not binary backwards compatible with previous 6.x releases.
- Fixed #230. When “always resolve” returned a resource, that URI wasn’t being logged which led to a confusing log trace.
6.0.13 / 5.3.0
- Changed the API so that an attempt to read a scheme that’s forbidden (by ResolverFeature.ACCESS_EXTERNAL_ENTITY or ResolverFeature.ACCESS_EXTERNAL_DOCUMENT) raises an IllegalArgumentException instead of returning null.
- Generally, the XML Resolver tries to avoid throwing exceptions, but in this case failing to do so opens a security vulnerability. Returning null often signals the underlying parser to simply load the resource with the original URI. This circumvents the attempt to limit access.
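
The reasoning can be illustrated with a small stand-alone SAX EntityResolver. This is a sketch written against the JDK only; the class name and the allow-list are hypothetical, and it does not reproduce how the XML Resolver evaluates the ResolverFeature settings.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Illustrative only; not the XML Resolver's implementation.
final class SchemeGuardResolver implements EntityResolver {
    private final Set<String> allowedSchemes; // hypothetical allow-list, e.g. "file"

    SchemeGuardResolver(Set<String> allowedSchemes) {
        this.allowedSchemes = allowedSchemes;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String scheme = URI.create(systemId).getScheme();
        if (scheme != null && !allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
            // Returning null here would let the parser fetch systemId on its own,
            // so the request is rejected outright instead.
            throw new IllegalArgumentException(
                    "Access to scheme '" + scheme + "' is forbidden: " + systemId);
        }
        // Allowed schemes (and relative references, which have no scheme) pass through.
        return new InputSource(systemId);
    }
}
```

A caller that previously checked for null in this situation would need to catch IllegalArgumentException instead.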
6.0.12
- Reworked the API to use interfaces for ResourceRequest and ResourceResponse. This makes writing a schema handler easier. This is a backwards incompatible change if you were directly accessing those objects. You have to access the ResourceRequestImpl and ResourceResponseImpl instead. On the plus side, the setters are now public on those methods.
- Added code that attempts to detect a Windows path (C:\path) passed as a catalog or property file name and avoid accidentally constructing a URI with the scheme “C”.
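
The Windows path problem is easy to reproduce with java.net.URI. The sketch below shows the accidental “C” scheme and one possible way to detect a drive-letter path first. The class, the regular expression, and the conversion are assumptions made for illustration, not the check the resolver actually performs.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Illustrative only; the resolver's own detection logic may differ.
final class WindowsPathDemo {
    // A drive letter, a colon, then a backslash or slash, e.g. C:\path or C:/path.
    private static final Pattern DRIVE_PATH = Pattern.compile("^[A-Za-z]:[\\\\/].*");

    static boolean looksLikeWindowsPath(String name) {
        return DRIVE_PATH.matcher(name).matches();
    }

    /** Turns a drive-letter path into a file: URI string; other names are returned unchanged. */
    static String toUriString(String name) {
        if (looksLikeWindowsPath(name)) {
            // Simplified: characters such as spaces are not percent-encoded here.
            return "file:///" + name.replace('\\', '/');
        }
        return name;
    }

    public static void main(String[] args) {
        // Parsed naively, the drive letter is taken to be the URI scheme.
        System.out.println(URI.create("C:/catalogs/catalog.xml").getScheme());
        System.out.println(toUriString("C:\\catalogs\\catalog.xml"));
    }
}
```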
6.0.11
This version introduces a new API for registering a scheme resolver. This will allow a resolver to be configured, for example, to handle custom URI schemes as are sometimes found in products.
API Changes
Several classes and interfaces are marked as deprecated. They were removed for several early 6.0.x releases but have been restored for binary backwards compatibility.
- CatalogResolver, Resolver, ResourceResolver, StAXResolver, and XercesResolver are replaced by methods on XMLResolver.
- Resource, ResolvedResource and ResolvedResourceImpl are replaced, effectively, by ResourceRequest and ResourceResponse.
- All the classes related to caching.
The two main classes for users are XMLResolverConfiguration (largely
unchanged) and XMLResolver.
The new XMLResolver object has methods for querying the catalog and
resolving resources. It also has methods that return resolvers for
different APIs.
- getURIResolver() returns a javax.xml.transform.URIResolver
- getLSResourceResolver() returns an org.w3c.dom.ls.LSResourceResolver
- getEntityResolver() returns an org.xml.sax.EntityResolver
- getEntityResolver2() returns an org.xml.sax.ext.EntityResolver2
- getXMLResolver() returns a javax.xml.stream.XMLResolver
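
Each of these return types plugs into a different JDK API. The sketch below shows where objects of those types are installed, using only JDK classes. The resolvers in main are stand-in lambdas that always report “no match”. In an application they would presumably be the objects returned by the corresponding getters listed above. The class and method names are hypothetical. Note that javax.xml.stream.XMLResolver shares its simple name with org.xmlresolver.XMLResolver, so the sketch spells it out in full.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.EntityResolver;
import org.xml.sax.XMLReader;

// Illustrative wiring only; the resolver arguments are stand-ins.
final class ResolverWiring {
    /** SAX: an org.xml.sax.EntityResolver is set on the XMLReader. */
    static XMLReader saxReader(EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    /** XSLT: a javax.xml.transform.URIResolver is set on the TransformerFactory. */
    static TransformerFactory transformerFactory(URIResolver resolver) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver);
        return factory;
    }

    /** Schema validation: an LSResourceResolver is set on the SchemaFactory. */
    static SchemaFactory schemaFactory(LSResourceResolver resolver) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(resolver);
        return factory;
    }

    /** StAX: a javax.xml.stream.XMLResolver is set on the XMLInputFactory. */
    static XMLInputFactory staxFactory(javax.xml.stream.XMLResolver resolver) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setXMLResolver(resolver);
        return factory;
    }

    public static void main(String[] args) {
        // Stand-ins that never find a match.
        saxReader((publicId, systemId) -> null);
        transformerFactory((href, base) -> null);
        schemaFactory((type, namespaceUri, publicId, systemId, baseUri) -> null);
        staxFactory((publicId, systemId, baseUri, namespace) -> null);
        System.out.println("All four resolver types installed");
    }
}
```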
A note about ALWAYS_RESOLVE
The standard contract for the Java resolver APIs is that they return
null if the resolver doesn’t find a match. But on the modern web, lots
of URIs redirect (from http: to https: especially), and some
parsers don’t follow redirects. That causes the parse to fail in ways
that may not be easy for the user to fix.
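
As a reminder of what that contract looks like in practice, here is a minimal SAX EntityResolver backed by a map. It is a JDK-only sketch with hypothetical names, standing in for a catalog lookup. It is not how the XML Resolver is implemented.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Illustrative only; the map is a stand-in for a real catalog.
final class MapBackedResolver implements EntityResolver {
    private final Map<String, String> systemIdToLocalUri;

    MapBackedResolver(Map<String, String> systemIdToLocalUri) {
        this.systemIdToLocalUri = systemIdToLocalUri;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = systemIdToLocalUri.get(systemId);
        // null means "no match": the parser is then free to open systemId itself,
        // which is where an unfollowed redirect can make the parse fail.
        return local == null ? null : new InputSource(local);
    }
}
```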
By default, the XML Resolver will always resolve resources, follow redirects, and return a stream. This deprives the parser of the option to try something else, but means that redirects don’t cause the parse to fail.
If your implementation want
