Monday 6 December 2010

Validating XML against multiple schemas within the same namespace

I was trying to validate XML against an XSD schema which in turn includes other schemas. I kept getting this strange exception: "src-resolve: Cannot resolve the name someName to a(n) 'type definition' component", where someName is a component declared in the included schema. I tried loading the schema with the SchemaFactory.newSchema(Source[]) method, but that did not help either:
Schema schema = schemaFactory.newSchema(sources);
where sources is an array of StreamSource objects pointing to the main schema as well as the included schemas.
After some googling I found that this is actually a known bug in Xerces:
https://issues.apache.org/jira/browse/XERCESJ-1130
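One way to work around this problem is to write a custom LSResourceResolver and register it with SchemaFactory.setResourceResolver(). However, after a lot of trial and error I found a simpler way: create a URL object for the main schema file and set the system ID of the source to this URL, so that the included schemas are resolved relative to it.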
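// Imports used by the snippets in this post
import java.io.File;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Open the main schema file
File file = new File("schema1.xsd");
// Get the URL for this file
URL url = file.toURI().toURL();
// Create the StreamSource for the schema
StreamSource source = new StreamSource(url.openStream());
// Most important: set the system ID to this URL. Included schemas are resolved relative to this location
source.setSystemId(url.toString());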
Now create the schema, and validation should work fine:

SchemaFactory schemaFactory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(source);
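To try the system ID trick end to end, here is a small self-contained sketch. The file names main.xsd and types.xsd, the urn:demo namespace and the OrderType/order components are made up for illustration. The program writes both schemas (same target namespace, main.xsd includes types.xsd) to a temporary directory, loads main.xsd with the system ID set as described above, and validates one good and one bad document.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class IncludeDemo {

    // Included schema: declares the type used by the main schema (hypothetical names)
    static final String TYPES_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " targetNamespace=\"urn:demo\" elementFormDefault=\"qualified\">"
            + "<xs:complexType name=\"OrderType\"><xs:sequence>"
            + "<xs:element name=\"id\" type=\"xs:int\"/>"
            + "</xs:sequence></xs:complexType>"
            + "</xs:schema>";

    // Main schema: same target namespace, pulls in types.xsd via a relative schemaLocation
    static final String MAIN_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " xmlns:d=\"urn:demo\" targetNamespace=\"urn:demo\" elementFormDefault=\"qualified\">"
            + "<xs:include schemaLocation=\"types.xsd\"/>"
            + "<xs:element name=\"order\" type=\"d:OrderType\"/>"
            + "</xs:schema>";

    // Loads the main schema; the system ID is the base URI used to resolve types.xsd
    static Schema loadSchema(File mainXsd) throws IOException, SAXException {
        URL url = mainXsd.toURI().toURL();
        try (InputStream in = url.openStream()) {
            StreamSource source = new StreamSource(in);
            source.setSystemId(url.toString());
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(source);
        }
    }

    // Writes both schema files to a temp directory and loads the main one
    static Schema setUp() throws IOException, SAXException {
        Path dir = Files.createTempDirectory("xsd-include");
        Files.write(dir.resolve("types.xsd"), TYPES_XSD.getBytes(StandardCharsets.UTF_8));
        Path main = dir.resolve("main.xsd");
        Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));
        return loadSchema(main.toFile());
    }

    // Returns true if the XML string validates against the schema
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = setUp();
        System.out.println(isValid(schema, "<order xmlns=\"urn:demo\"><id>42</id></order>"));
        System.out.println(isValid(schema, "<order xmlns=\"urn:demo\"><id>abc</id></order>"));
    }
}
```

If the system ID line is left out, newSchema has no base location from which to resolve types.xsd, which is where the include resolution goes wrong.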

Wednesday 1 December 2010

Removing XML namespaces

I was working on two versions of a BPM product: the older one without support for namespaces and the newer one with namespace support. The messages from external systems carried namespaces, so when the old system was called the namespaces had to be stripped. Initially we thought of using XSLT to achieve this, but then regular expressions came to our rescue. The cute little function below removes the namespaces (and the preamble) from an XML string:


public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
    return xmlString
            .replaceAll("<\\?[^<]*\\?>", "") /* remove preamble (and any other processing instructions) */
            .replaceAll("\\s+xmlns(:[\\w.-]+)?\\s*=\\s*(\"[^\"]*\"|'[^']*')", "") /* remove xmlns declarations */
            .replaceAll("(<)(\\w+:)(.*?>)", "$1$3") /* remove opening tag prefix */
            .replaceAll("(</)(\\w+:)(.*?>)", "$1$3"); /* remove closing tag prefix */
}
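Here is a small self-contained run of the function on a made-up message that has a prefixed namespace, a default namespace and an XML declaration. The xmlns step here only removes attributes named xmlns or xmlns:prefix with a quoted value; that tighter pattern is my assumption, chosen so the word xmlns inside text content is left alone.

```java
public class NamespaceStripDemo {

    // Same four replaceAll steps as the function above; the xmlns pattern is the
    // tighter variant (assumption): only xmlns / xmlns:prefix attributes with a quoted value
    public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
        return xmlString
                .replaceAll("<\\?[^<]*\\?>", "")                                       // preamble
                .replaceAll("\\s+xmlns(:[\\w.-]+)?\\s*=\\s*(\"[^\"]*\"|'[^']*')", "") // xmlns declarations
                .replaceAll("(<)(\\w+:)(.*?>)", "$1$3")                                // opening tag prefix
                .replaceAll("(</)(\\w+:)(.*?>)", "$1$3");                              // closing tag prefix
    }

    public static void main(String[] args) {
        String in = "<?xml version=\"1.0\"?>"
                + "<ns:root xmlns:ns=\"urn:x\" xmlns=\"urn:y\">"
                + "<ns:child>v</ns:child><plain/></ns:root>";
        // prints <root><child>v</child><plain/></root>
        System.out.println(removeXmlStringNamespaceAndPreamble(in));
    }
}
```

Note that prefixed attributes such as xsi:type keep their prefix, so the result can still contain a prefix whose declaration has been removed.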
Love the power of regular expressions :)

Tuesday 30 November 2010

XML pretty print without parsing

I was working on some XML generators, and one of the methods receives the XML as a string which is then written to a file. The XML string, however, is not formatted, so the output looks clumsy. There are several pretty-print methods available, but all of them parse the string into an XML document for formatting. I did not want to use a heavyweight process like parsing just for formatting, so this piece of code uses a regular expression to format the string as XML.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines and the whitespace between tags (Strings are immutable, so reassign) */
    xmlString = xmlString.replaceAll("\\r?\\n", "").replaceAll(">\\s+<", "><").trim();
    StringBuilder xmlFinal = new StringBuilder();
    /* Group the xml tags: 1 = opening tag, 2 = text, 3 = closing tag, 4 = self-closing tag.
       (?<!/) keeps <mytag/> out of group 1; [^>]* lets one-letter tag names match */
    Pattern p = Pattern
            .compile("(<[^/][^>]*(?<!/)>)?([^<]*)(</[^>]+>)?(<[^/][^>]*/>)?");
    Matcher m = p.matcher(xmlString);
    int tabCnt = 0;
    while (m.find()) {
        /* The all-optional pattern also matches "" at the end of the input: skip it */
        if (m.group().isEmpty()) {
            continue;
        }
        /* Groups that did not take part in the match return null, so replace them with "" */
        String str1 = m.group(1) == null ? "" : m.group(1);
        String str2 = m.group(2) == null ? "" : m.group(2);
        String str3 = m.group(3) == null ? "" : m.group(3);
        String str4 = m.group(4) == null ? "" : m.group(4);

        /* A closing tag on its own line is outdented one level */
        if (str1.isEmpty() && !str3.isEmpty() && tabCnt > 0) {
            --tabCnt;
        }
        String line = str1 + str2 + str3;
        if (!line.isEmpty()) {
            printTabs(tabCnt, xmlFinal);
            xmlFinal.append(line).append("\n");
        }
        /* An opening tag without its closing tag indents the lines that follow */
        if (!str1.isEmpty() && str3.isEmpty()) {
            ++tabCnt;
        }
        /* Handle <mytag/> kind of tags */
        if (!str4.isEmpty()) {
            printTabs(tabCnt, xmlFinal);
            xmlFinal.append(str4).append("\n");
        }
    }
    return xmlFinal.toString();
}

private static void printTabs(int cnt, StringBuilder buf) {
    for (int i = 0; i < cnt; i++) {
        buf.append("\t");
    }
}
However, stuff like CDATA sections, comments or an XML declaration might not work here.
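To see how the regular expression splits the input, the sketch below collects the four groups for each match of a short sample string. It uses an adjusted version of the post's pattern (my assumption, not the original): [^>]* instead of [^>]+, so one-letter tag names such as <a> match, and a (?<!/) lookbehind so that a self-closing tag lands in group 4 instead of being taken as an opening tag by group 1. The class name TagGroupsDemo and the helper groups() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroupsDemo {

    // 1 = opening tag, 2 = text, 3 = closing tag, 4 = self-closing tag (adjusted pattern)
    static final Pattern TAGS =
            Pattern.compile("(<[^/][^>]*(?<!/)>)?([^<]*)(</[^>]+>)?(<[^/][^>]*/>)?");

    // Returns {opening, text, closing, selfClosing} for every non-empty match
    static List<String[]> groups(String xml) {
        List<String[]> result = new ArrayList<>();
        Matcher m = TAGS.matcher(xml);
        while (m.find()) {
            if (m.group().isEmpty()) {
                continue; // the all-optional pattern also matches "" at the end of the input
            }
            String[] g = new String[4];
            for (int i = 0; i < 4; i++) {
                // a group that did not take part in the match is null
                g[i] = m.group(i + 1) == null ? "" : m.group(i + 1);
            }
            result.add(g);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] g : groups("<a><b>x</b><c/></a>")) {
            System.out.println(String.join(" | ", g));
        }
    }
}
```

For the sample there are three matches: <a> on its own (an opening tag, so the following lines are indented), then <b>, x, </b> and <c/> together, and finally </a> on its own (a closing tag, so it is outdented).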