Parsing JSON with HTML-style comments using Python

I recently needed to add version control information to a file that stores its data in JSON format. As it turns out, I’m not the only one who has run into the fact that JSON has no comment syntax. I wanted an HTML-style comment at the end of the file. With pyparsing, the solution turned out to be really simple. Below is an example so you can judge for yourself.

    import pyparsing
    import simplejson


    class JsonAndHtmlCommentDecoder(simplejson.JSONDecoder):
        def raw_decode(self, s, idx=0):
            # Decode the JSON value that starts at idx. Any other decoding
            # error propagates to the caller instead of being printed and
            # swallowed, which would leave obj and end undefined below.
            try:
                obj, end = self.scan_once(s, idx)
            except StopIteration:
                raise ValueError("No JSON object could be decoded")

            # The calling method will raise an error when the value of end is
            # less than the length of the input string. When the remainder is
            # a single HTML comment, report the whole string as consumed.
            try:
                pyparsing.htmlComment.parseString(s[end:], parseAll=True)
                end = len(s)
            except pyparsing.ParseException:
                # Not a comment: leave end as it is so the caller complains.
                pass

            return obj, end


    if __name__ == '__main__':
        # input string was copied from:
        # http://json.org/example.html
        json_input = '''{"menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {"value": "New", "onclick": "CreateNewDoc()"},
          {"value": "Open", "onclick": "OpenDoc()"},
          {"value": "Close", "onclick": "CloseDoc()"}
        ]
      }
    }}
    <!-- Source control information -->
    '''
        print simplejson.loads(json_input, cls=JsonAndHtmlCommentDecoder)
        # The statement below will fail with a simplejson.decoder.JSONDecodeError
        print simplejson.loads(json_input)

When executed, it will output this:

    {u'menu': {u'popup': {u'menuitem': [{u'onclick': u'CreateNewDoc()', u'value': u'New'}, {u'onclick': u'OpenDoc()', u'value': u'Open'}, {u'onclick': u'CloseDoc()', u'value': u'Close'}]}, u'id': u'file', u'value': u'File'}}
    Traceback (most recent call last):
      File "...projecten\dx\aa.py", line 45, in ?
        print simplejson.loads(json_input)
      File "...\python2.4\lib\simplejson\__init__.py", line 413, in loads
        return _default_decoder.decode(s)
      File "...\python2.4\lib\simplejson\decoder.py", line 405, in decode
        raise JSONDecodeError("Extra data", s, end, len(s))
    simplejson.decoder.JSONDecodeError: Extra data: line 12 column 1 - line 13 column 1 (char 242 - 256)

The first line is the output of a successful attempt to decode the JSON. The stack trace is what you get when the default decoder attempts to decode the same string and fails on the trailing comment. The next paragraph describes how the custom decoder works.

The design of simplejson.loads allows for an alternate JSONDecoder, which you pass in through the cls argument. In the example above the custom decoder is named JsonAndHtmlCommentDecoder. Its extra logic kicks in once scan_once has finished without raising an exception, and it then parses whatever is left of the input string. There is no problem when there is no remainder and no problem when the remainder is an HTML-style comment, but there is a problem when the remainder contains anything else. The calling method uses the end index returned by raw_decode to determine whether decoding succeeded: there is no error when it matches the length of the input string, and an "Extra data" error otherwise.
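For readers without simplejson or pyparsing at hand, the same idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: it swaps simplejson for the stdlib json module (whose decode method also compares the end index from raw_decode against the input length), and it swaps pyparsing.htmlComment for a hand-written regular expression. The names CommentTolerantDecoder, TRAILING_COMMENT and SAMPLE are made up for this sketch.

```python
import json
import re

# Assumption: the remainder is acceptable when it is optional whitespace,
# exactly one <!-- ... --> block, and optional whitespace up to the end.
# This regex is a stand-in for pyparsing.htmlComment, not the same parser.
TRAILING_COMMENT = re.compile(r"\s*<!--(?:(?!-->).)*-->\s*\Z", re.DOTALL)


class CommentTolerantDecoder(json.JSONDecoder):
    """Hypothetical stdlib counterpart of JsonAndHtmlCommentDecoder."""

    def raw_decode(self, s, idx=0):
        # Let the stock decoder find the JSON value and where it ends.
        obj, end = super().raw_decode(s, idx)
        # decode() raises "Extra data" unless end reaches len(s), so claim
        # the whole string only when the remainder is a single comment.
        if TRAILING_COMMENT.match(s, end):
            end = len(s)
        return obj, end


SAMPLE = '{"menu": {"id": "file", "value": "File"}}\n<!-- Source control information -->\n'
result = json.loads(SAMPLE, cls=CommentTolerantDecoder)
```

Calling json.loads(SAMPLE) without the cls argument raises json.JSONDecodeError, and so does the custom decoder when anything other than a single comment follows the JSON value.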

This, of course, is just a simple example. You could do more by interpreting the results of parseString, or by using scanString to search for comments anywhere in the input. Check the pyparsing documentation at packages.python.org to see for yourself. While you are there, be sure to take a look at the Variables section. You’ll see that pyparsing offers more ready-made expressions than htmlComment alone.
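As one possible reading of "interpreting the results", the sketch below returns the text of the trailing comment together with the decoded object, so the version control information becomes usable data. It is not the pyparsing approach from this post: it assumes the stdlib json module and a regular expression instead, and the function name split_json_and_comment is invented for the illustration.

```python
import json
import re

# Assumption: at most one trailing <!-- ... --> block, surrounded by
# optional whitespace. Group 1 captures the text inside the comment.
COMMENT = re.compile(r"\s*<!--((?:(?!-->).)*)-->\s*\Z", re.DOTALL)


def split_json_and_comment(text):
    """Return (decoded_object, comment_text), with None when no comment."""
    stripped = text.lstrip()  # the stdlib raw_decode does not skip leading whitespace
    obj, end = json.JSONDecoder().raw_decode(stripped)
    remainder = stripped[end:]
    if not remainder.strip():
        return obj, None
    match = COMMENT.match(remainder)
    if match is None:
        raise ValueError("unexpected data after the JSON value")
    return obj, match.group(1).strip()
```

For the input '{"id": "file"}' followed by the comment from the example above, this returns the dictionary and the string 'Source control information'.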

Comments, improvements or praise? Let me know.
