Forgiving XML Parsers?

No, it’s not a new feature in Whidbey, but should it be considered? The [aus-dotnet] mailing list has been having a few problems getting the list made accessible as an RSS feed. Threads like this don’t usually catch my eye, but Stewart Johnson posted this link to an excellent blog entry on the history of XML with respect to how it copes with errors.

To be honest, I always thought that forgiving HTML parsers were a bit of a mistake that we are still paying for, but perhaps I was too hasty in coming to that conclusion. I strongly recommend that you read the entry linked to above; it is truly a fascinating read.

One of the things that I pondered was what an error-correcting parser would look like. I drew up a bit of a diagram showing what a forgiving XML stack would look like. Error correction would have to involve a set of rules on how to handle certain types of errors, and the behaviour would need to be configurable by the application developer.

It would also need to be layered, so the first and simplest check, “is the document well-formed?”, would be done first. If that check failed, the error correction rules would come into play and attempt to fix things up before the document was passed up the stack, for example to a validating parser. Rules would hook in at that layer too.
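To make the layering a bit more concrete, here is a rough sketch in Python of the bottom layer only: a strict well-formedness check, followed by developer-supplied repair rules that run only when the application opts in. It is an illustration rather than a real design. The names `parse_forgiving`, `escape_bare_ampersands` and the `forgiving` flag are all made up for this example, and the single rule shown (escaping stray ampersands) is just one plausible kind of fix.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair rule: escape '&' characters that do not begin an
# entity or character reference (a common mistake in hand-rolled feeds).
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    return _BARE_AMP.sub("&amp;", text)


def parse_forgiving(text, rules=(), forgiving=False):
    """Return (root_element, names_of_rules_applied).

    Layer 1: a strict well-formedness check using the standard parser.
    Layer 2: only if the caller set forgiving=True, apply each repair
    rule in order and retry the strict parse after every change.
    """
    try:
        return ET.fromstring(text), []
    except ET.ParseError as err:
        if not forgiving:
            raise  # strict mode: behave like an ordinary XML parser
        last_error = err

    applied = []
    for rule in rules:
        fixed = rule(text)
        if fixed == text:
            continue  # this rule had nothing to repair
        text = fixed
        applied.append(rule.__name__)
        try:
            return ET.fromstring(text), applied
        except ET.ParseError as err:
            last_error = err  # keep going; a later rule may finish the job

    # No combination of rules produced a well-formed document.
    raise last_error


if __name__ == "__main__":
    broken = "<item><title>Tom & Jerry</title></item>"
    root, applied = parse_forgiving(
        broken, rules=(escape_bare_ampersands,), forgiving=True
    )
    print(root.find("title").text, applied)
```

Returning the list of applied rules alongside the tree lets the next layer up (a validating parser, say) know that the document was repaired and decide for itself whether to trust it. With the flag left at its default, the function is no more forgiving than the standard parser.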

The thing is, most developers are under amazing time pressure to get the job done, so I doubt whether the average in-house corporate developer would set the forgiving bit. Certainly in the first release of any application I produced, I would be fairly strict in what I accepted, if only because it’s easier on me.

One thought on “Forgiving XML Parsers?”

  1. Tim Walters

    The issue with the Aus-dotnet mailing list RSS is not that it’s invalid XML, because it’s entirely well-formed.

    The issue is that it’s not compliant with the RSS schema definition; it’s a data issue, not a structure issue. RSS is strongly defined if you know where to look, but many just smack together something that looks kinda like RSS and leave it to the RSS aggregators to fix any inconsistencies. There have been many discussions on these issues with RSS.
