Monday, 6 December 2010

Validating XML against multiple schemas within the same namespace

I was trying to validate XML against an XSD schema that in turn includes other schemas. I kept getting this strange exception: "src-resolve: Cannot resolve the name someName to a(n) 'type definition' component", where someName is a component declared in an included schema. I tried loading the schemas with the newSchema(Source[]) method, but that did not help either.
Schema schema = schemaFactory.newSchema(sources);
Here sources is an array of StreamSource objects pointing to the main schema as well as the included schemas.
After googling I found this is actually a known bug in Xerces.
https://issues.apache.org/jira/browse/XERCESJ-1130
One way to bypass this problem is to write a custom resource resolver (an LSResourceResolver). However, after a lot of trial and error I found a simpler way. The trick is to create a URL object and set the system ID of the source to this URL.
// Locate the main schema
File file = new File("schema1.xsd");
// Get the URL for this file
URL url = file.toURI().toURL();
// Create the StreamSource for the schema
StreamSource source = new StreamSource(url.openStream());
// Most important: set the system ID to this URL. Included schemas are resolved relative to this location.
source.setSystemId(url.toString());
Now create the schema, and validation should work fine:

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(source);
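
To try the whole thing end to end, here is a minimal sketch under some assumptions of mine: the class name IncludedSchemaDemo, the file names main.xsd and types.xsd, the namespace urn:demo, and the Code type and item element are all made up for illustration and are not from the real project. It writes two small schemas to a temporary directory, loads the main one with its system ID set, and validates two sample documents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IncludedSchemaDemo {

    // Writes two hypothetical schemas sharing the namespace urn:demo;
    // main.xsd pulls in types.xsd through xs:include. Returns the path of main.xsd.
    static Path writeDemoSchemas(Path dir) throws IOException {
        String types =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:demo'>"
          + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
          + "<xs:maxLength value='3'/></xs:restriction></xs:simpleType></xs:schema>";
        String main =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:demo'"
          + " xmlns:d='urn:demo'><xs:include schemaLocation='types.xsd'/>"
          + "<xs:element name='item' type='d:Code'/></xs:schema>";
        Files.write(dir.resolve("types.xsd"), types.getBytes(StandardCharsets.UTF_8));
        return Files.write(dir.resolve("main.xsd"), main.getBytes(StandardCharsets.UTF_8));
    }

    // Loads the main schema; the system ID gives the parser a base location
    // from which the relative schemaLocation of the include can be resolved.
    static Schema loadSchema(Path mainXsd) throws IOException, SAXException {
        URL url = mainXsd.toUri().toURL();
        try (InputStream in = url.openStream()) {
            StreamSource source = new StreamSource(in);
            source.setSystemId(url.toString());
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(source);
        }
    }

    // Returns true if the XML string validates against the schema, false on a validation error.
    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xsd-demo");
        Schema schema = loadSchema(writeDemoSchemas(dir));
        // "abc" fits the three-character limit of the included Code type, "abcd" does not
        System.out.println(isValid(schema, "<item xmlns='urn:demo'>abc</item>"));
        System.out.println(isValid(schema, "<item xmlns='urn:demo'>abcd</item>"));
    }
}
```

The try-with-resources block also closes the schema stream once the Schema object has been built, which the short snippet above leaves out.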

Wednesday, 1 December 2010

Removing XML namespaces

I was working with two versions of a BPM product: the older one without support for namespaces and the newer one with it. The messages from external systems carried namespaces, so they had to be stripped whenever the old system was called. Initially we thought of using XSLT to achieve this, but then regular expressions came to our rescue. The cute little function below removes namespaces from an XML string.


public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
    return xmlString
            .replaceAll("(<\\?[^<]*\\?>)?", "") /* remove preamble */
            .replaceAll("xmlns.*?(\"|\').*?(\"|\')", "") /* remove xmlns declaration */
            .replaceAll("(<)(\\w+:)(.*?>)", "$1$3") /* remove opening tag prefix */
            .replaceAll("(</)(\\w+:)(.*?>)", "$1$3"); /* remove closing tag prefix */
}
Love the power of regular expressions :)
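
Here is a small usage sketch of the function above, wrapped in a hypothetical class NamespaceStripDemo with a made-up sample message (the ns prefix, urn:demo and the order/id elements are only for illustration). It shows what the four passes do to a simple prefixed document.

```java
class NamespaceStripDemo {

    // Same four regex passes as the function above.
    static String removeXmlStringNamespaceAndPreamble(String xmlString) {
        return xmlString
                .replaceAll("(<\\?[^<]*\\?>)?", "")           // remove preamble
                .replaceAll("xmlns.*?(\"|\').*?(\"|\')", "")  // remove xmlns declarations
                .replaceAll("(<)(\\w+:)(.*?>)", "$1$3")       // remove opening tag prefixes
                .replaceAll("(</)(\\w+:)(.*?>)", "$1$3");     // remove closing tag prefixes
    }

    public static void main(String[] args) {
        String in = "<?xml version=\"1.0\"?>"
                + "<ns:order xmlns:ns=\"urn:demo\"><ns:id>42</ns:id></ns:order>";
        System.out.println(removeXmlStringNamespaceAndPreamble(in));
        // prints: <order ><id>42</id></order>
    }
}
```

Note the stray space left in the opening order tag where the xmlns declaration used to be; it is harmless to an XML parser. The patterns only touch tag names, so prefixed attributes are left alone, and any text content that happens to contain "xmlns" followed by quotes would also be altered. For messages like that a namespace-aware approach is probably safer.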