XML Library

When I needed to do a lot of XML parsing and manipulation, I found I was commonly doing a bunch of relatively basic things and needing to do a lot of set-up to use the XercesC library to do them. What I really wanted was a set of wrappers around the XercesC functionality that'd make it easy to do the common stuff, while still giving me the ability to get down to the raw APIs if I really needed to. So, these routines were born. Note that this library doesn't eliminate the need to understand at least the basics of the XercesC API and classes.

The XmlDOMUtil namespace came first, with routines to parse XML documents into a DOM tree and do common operations on that tree. Later I added routines to create nodes, build trees and serialize them back into text form. There's not much to say here: it's a fairly straightforward wrapper around the equivalent XercesC functions that lets you use standard C++ strings, character pointers and the like instead of XercesC Unicode strings.

The XmlSAXUtil namespace came later when, for a personal project, I needed something more memory-efficient than building a whole DOM tree. XmlSAXElementHandler objects encapsulate the handler code for a particular XML element during parsing. XmlSAXHandler is a XercesC handler class that you can configure with a map of element names and pointers to element handler objects. When the parser hits an element you've configured, it calls your element handler with the information about that occurrence of the element (technically, it calls your handler during endElement() processing). There's also an XmlSAXUserData class. You can attach an object derived from that class to the XmlSAXHandler object, and a pointer to it will be passed to your XmlSAXElementHandler routines. XmlSAXHandler also calls the Reset() and Done() methods on the user data object at the start and end of processing of an XML document, so you can do your own setup and finalization conveniently.
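To make the dispatch idea concrete, here's a minimal, self-contained sketch of the pattern. It deliberately doesn't use XercesC, and none of the names in it (UserData, ElementHandler, Dispatcher, Process, AddHandler, SetUserData, Totals, ItemHandler, RunDemo) are the library's real classes or signatures; they're hypothetical stand-ins, and only Reset() and Done() are named after the methods described above. It assumes one plausible way to do it: keep a stack of open elements so the attributes seen in startElement() are still around when endElement() calls the handler.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the library's real classes.
using Attributes = std::map<std::string, std::string>;

struct UserData {                       // plays the role of XmlSAXUserData
    virtual ~UserData() = default;
    virtual void Reset() {}             // called at the start of a document
    virtual void Done() {}              // called at the end of a document
};

struct ElementHandler {                 // plays the role of XmlSAXElementHandler
    virtual ~ElementHandler() = default;
    virtual void Process(const std::string& name, const Attributes& attrs,
                         const std::string& content, UserData* data) = 0;
};

class Dispatcher {                      // plays the role of XmlSAXHandler
public:
    void AddHandler(const std::string& name, ElementHandler* h) { handlers_[name] = h; }
    void SetUserData(UserData* d) { data_ = d; }

    void startDocument() { open_.clear(); if (data_) data_->Reset(); }
    void endDocument()   { if (data_) data_->Done(); }

    // Attributes only arrive here, so save them until the element closes.
    void startElement(const std::string& name, const Attributes& attrs) {
        open_.push_back(Frame{name, attrs, std::string()});
    }
    // Text may arrive in several chunks; append to the innermost open element.
    void characters(const std::string& text) {
        if (!open_.empty()) open_.back().content += text;
    }
    // Pop the saved frame and, if a handler is registered, hand it everything.
    void endElement(const std::string& name) {
        if (open_.empty()) return;
        Frame f = std::move(open_.back());
        open_.pop_back();
        auto it = handlers_.find(name);
        if (it != handlers_.end()) it->second->Process(f.name, f.attrs, f.content, data_);
    }

private:
    struct Frame { std::string name; Attributes attrs; std::string content; };
    std::map<std::string, ElementHandler*> handlers_;
    std::vector<Frame> open_;
    UserData* data_ = nullptr;
};

// Example user data: counts handled elements and notes when parsing finished.
struct Totals : UserData {
    int count = 0;
    bool finished = false;
    void Reset() override { count = 0; finished = false; }
    void Done() override { finished = true; }
};

// Example element handler: records "id=content" for each element it is given.
struct ItemHandler : ElementHandler {
    std::vector<std::string> seen;
    void Process(const std::string&, const Attributes& attrs,
                 const std::string& content, UserData* data) override {
        auto it = attrs.find("id");
        seen.push_back((it != attrs.end() ? it->second : std::string("?")) + "=" + content);
        if (auto* t = dynamic_cast<Totals*>(data)) ++t->count;
    }
};

// Replays the callbacks a SAX parser would make for
//   <order><item id="a">apple</item><item id="b">pear</item></order>
std::vector<std::string> RunDemo(Totals& totals) {
    ItemHandler items;
    Dispatcher d;
    d.AddHandler("item", &items);
    d.SetUserData(&totals);
    d.startDocument();
    d.startElement("order", Attributes());
    d.startElement("item", Attributes{{"id", "a"}});
    d.characters("app");
    d.characters("le");
    d.endElement("item");
    d.startElement("item", Attributes{{"id", "b"}});
    d.characters("pear");
    d.endElement("item");
    d.endElement("order");
    d.endDocument();
    return items.seen;
}
```

In this sketch no handler is registered for "order", so it's pushed and popped without any callback. Only the two "item" elements reach ItemHandler, each with its attributes and its complete text.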

The XmlUtil namespace got created to hold some common stuff needed by both the DOM and SAX routines, like the base Exception class and a class to help convert between XercesC Unicode strings and standard C++ strings.

One asymmetric bit: the only way to create XML documents is using the DOM functions. The SAX functions are for parsing only. If you think about it, it pretty much has to be that way. SAX is based on doing call-backs during parsing, and you can't sensibly do that except while parsing a document. So be prepared to work with the DOM functions even if you're using SAX for parsing.

Future plans:

  • Fix const correctness. It's a bit lax at the moment.
  • Add the ability to monitor startElement() processing to XmlSAXElementHandler. Most stuff only needs endElement() handling with all the information passed in, and we don't have the contents until then, but for some context-sensitive processing it can be useful to set up on the way in (e.g. invoking different handling depending on what the parent element is).
  • Add helper routines in both SAX and DOM classes to help handle elements and content of types other than strings. Users shouldn't have to keep rewriting conversions to/from integers, doubles, booleans and the like.
  • Make saving of attributes more efficient. There's no way to avoid a stack: attributes are only available in startElement() and the content isn't available until endElement(), and since we want both when we get to endElement(), we've got to push the attributes onto a stack on the way in and pop them back off as we leave the element. The code's solid and free from bugs or memory leaks, but there are ways to do it that involve less copying.
  • Add better support for namespaces (both prefixes and URIs) to the DOM routines, and improve support in the SAX routines.
  • Shift things around to better support schemas. Allow for specifying the schema at parse time, and for validating parsers where the schema is expected to be given in the document.
  • Add support in XmlDOMUtil for basic multi-level searches without needing full XPath and its overhead. Nothing overly complicated, just a simple interface for things like "Find all ADDRESS elements that're immediate children of CUSTOMER elements, ignore ones that're located anywhere else in the tree."
  • Add support for XPath when searching for elements in XmlDOMUtil.
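
As a rough idea of what the simple multi-level search in the plans above might look like, here's a small sketch of the "ADDRESS elements that are immediate children of CUSTOMER elements" case. It runs on a toy tree rather than on XercesC DOM nodes, and Node, Add, FindChildrenOf and BuildSampleTree are all hypothetical names made up for the example, not anything that exists in XmlDOMUtil.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy element tree for illustration only; a real version would walk
// XercesC DOM nodes instead.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string n) : name(std::move(n)) {}

    // Appends a child element and returns a pointer to it.
    Node* Add(const std::string& childName) {
        children.push_back(std::unique_ptr<Node>(new Node(childName)));
        return children.back().get();
    }
};

// Collects every element named `child` whose immediate parent is named
// `parent`, searching the whole subtree under `node` in document order.
void FindChildrenOf(const Node& node, const std::string& parent,
                    const std::string& child, std::vector<const Node*>& out) {
    for (const auto& c : node.children) {
        if (node.name == parent && c->name == child) out.push_back(c.get());
        FindChildrenOf(*c, parent, child, out);
    }
}

// Builds a small tree with ADDRESS elements in several different places.
std::unique_ptr<Node> BuildSampleTree() {
    std::unique_ptr<Node> root(new Node("CUSTOMERS"));
    Node* c1 = root->Add("CUSTOMER");
    c1->Add("ADDRESS");                      // match
    c1->Add("ORDER")->Add("ADDRESS");        // no match: parent is ORDER
    Node* c2 = root->Add("CUSTOMER");
    c2->Add("ADDRESS");                      // match
    root->Add("SUPPLIER")->Add("ADDRESS");   // no match: parent is SUPPLIER
    return root;
}
```

Calling FindChildrenOf(*root, "CUSTOMER", "ADDRESS", hits) on the sample tree collects just the two ADDRESS elements sitting directly under a CUSTOMER. The check is only on the immediate parent's name, which is what keeps it much simpler than XPath.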

Licensing:

This code is licensed under the terms of the GPL v3, a copy of which you can find attached to this page.