Wednesday 1 December 2010

Removing XML namespaces

I was working with two versions of a BPM product: the older one had no support for namespaces, while the newer one did. Messages from external systems arrived with namespaces, so they had to be stripped before the old system was called. Initially we thought of using XSLT to achieve this, but regular expressions came to our rescue. The cute little function below removes namespaces from an XML string.


public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
    return xmlString
            .replaceAll("(<\\?[^<]*\\?>)?", "")            /* remove preamble */
            .replaceAll("xmlns.*?(\"|\').*?(\"|\')", "")   /* remove xmlns declaration */
            .replaceAll("(<)(\\w+:)(.*?>)", "$1$3")        /* remove opening tag prefix */
            .replaceAll("(</)(\\w+:)(.*?>)", "$1$3");      /* remove closing tag prefix */
}
Love the power of regular expressions :)
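To see what the four replacements do in practice, here is a minimal, self-contained sketch that runs the same function on a made-up message. The class name NamespaceStripDemo, the ns prefix, and the example.com URI are assumptions for illustration only; they are not from the original project.

```java
public class NamespaceStripDemo {

    // Same chain of replacements as the function in the post.
    public static String removeXmlStringNamespaceAndPreamble(String xmlString) {
        return xmlString
                .replaceAll("(<\\?[^<]*\\?>)?", "")            // remove preamble
                .replaceAll("xmlns.*?(\"|\').*?(\"|\')", "")   // remove xmlns declaration
                .replaceAll("(<)(\\w+:)(.*?>)", "$1$3")        // remove opening tag prefix
                .replaceAll("(</)(\\w+:)(.*?>)", "$1$3");      // remove closing tag prefix
    }

    public static void main(String[] args) {
        // Hypothetical single-line input with a preamble, one namespace and one prefix.
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<ns:order xmlns:ns=\"http://example.com/orders\">"
                + "<ns:id>42</ns:id>"
                + "</ns:order>";

        System.out.println(removeXmlStringNamespaceAndPreamble(input));
        // prints: <order ><id>42</id></order>
    }
}
```

Note the space left behind in `<order >`: the xmlns declaration is removed, but the whitespace in front of it is not. Prefixes on attributes (for example xsi:type) are also left untouched, since only the tag name prefix is matched.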

3 comments:

  1. Excellent! Thank you very much for this!

  2. INCREDIBLE!!!!!!!!

  3. I recently encountered a similar situation, and the only comment I would add is to be extra careful if the input xmlString is large. In our case the input String is around 200k, and we are seeing frequent OOM errors in production with Java's replaceAll implementation. I documented our use case here:

    http://app-inf.blogspot.com/2013/04/pitfalls-of-handling-large-string.html
