Monday, February 23, 2009

An argument against XML

My argument against using XML as a data format in certain situations is that it is too verbose. In other situations, however, the verbosity provided by XML is needed. Such as for human consumption. This is why XML exists, it is easy to use and read by both humans and computers.

The verbosity problem with XML stems from the use of tags. Every entity represented in XML needs needs to be enclosed in a tag. The opening tag indicating that a new entity definition has started and the ending tag indicating the end of that definition. For example, consider the following XML.
<person>
<name>adam</name>
</person>
This is a trivial example of a person entity with a single name attribute. Notice the duplication of the text "person" and "name" in the metadata. With XML this is required. However, tags may also have attributes. Our person definition could be expressed as follows.
<person name="adam"/>
Here there is no metadata duplication. But I think the second example negates the readability philosophy behind XML. What exactly is the difference between attributes and child entities in XML? Semantically, there is none. A child entity is still an attribute of the parent entity.

With JSON, there is no duplication of metadata or any confusion of how an entity is defined. This is because the JSON format is focused on lightweight data, not readability. For instance, here is our person in JSON.
{person:{name:"adam"}}
Now, if a person were reading this, the chances of them getting the meaning right are greatly reduced when compared to the XML equivalent. However, it is much less verbose in most cases. And verbosity counts when data is being transferred over a network. Another plus, the XML is not lost. JSON can easily be converted to XML and back. So if JSON-formatted data must be edited by humans as XML, this is not difficult to achieve.

Here is a simple Python demonstration of reducing the size of XML data with JSON.
#Example; XML string and JSON string

xml_string="""
<entry><title>mytitle</title><body>mybody</body></entry>
"""

json_string="""
{entry:{title:"mytitle",body:"mybody"}}
"""
if __name__=="__main__":
print 'XML Length:',len(xml_string)
print 'JSON Length:',len(json_string)
pcent=float(len(json_string))/len(xml_string)*100
print 'XML size as JSON:',pcent,'%'

Finally, since XML is based on tags, there is no opportunity for sets of primitive types. For example, some client says to the server "give me a list of names and nothing else". The client will likely name something along the lines of the following.
<list>
<item name="name1"/>
<item name="name2"/>
<item name="name3"/>
</list>
Here is the JSON alternative.
["name1", "name2", "name3"]