Wednesday, October 14, 2009

Working With Elements

There is virtually no escaping XML markup in one form or another when building modern applications. Whether it is HTML, SOAP, or some other dialect, XML has become an important standard to support, even if the application isn't a web application.

Thankfully, most programming languages have built-in library support for reading and manipulating XML. Some are better than others. For instance, the ElementTree Python package is probably the easiest library for developers to work with. It doesn't add unnecessary complexity on top of a simple standard.

The ElementTree package is now part of the growing set of standard Python modules included in the distribution (as xml.etree.ElementTree, since Python 2.5). Since the module can be used to both read and write XML data, it cuts down on dependencies. Most XML libraries support both reading and writing, but like any other kind of library, a given one may do one well and the other poorly.
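
As a minimal sketch of that read-and-write round trip, assuming Python 3 and a made-up catalog document (the element names and values here are purely illustrative), the same module can parse XML text, modify the tree, and serialize it again:

```python
# Round trip with one module: parse XML text, modify the tree, serialize it.
from xml.etree.ElementTree import SubElement, fromstring, tostring

# Illustrative input only; not taken from any real data set.
source = "<catalog><book id='1'>Dive In</book></catalog>"

# Reading: fromstring() parses text into an Element.
catalog = fromstring(source)
first_title = catalog.find("book").text

# Writing: add a second (made-up) book and serialize the whole tree.
book = SubElement(catalog, "book", id="2")
book.text = "Second Book"

# encoding="unicode" asks for a text string instead of bytes (Python 3).
output = tostring(catalog, encoding="unicode")
print(output)
```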

The following is an example of how easy it is not only to use the ElementTree package to build an XML document, but also to add abstractions around the elements that are created.

#Example: Abstracting elements.

#Do element tree imports.
from xml.etree.ElementTree import Element, tostring

#The base DOM element.
class Dom(object):
    def __init__(self, name, **kw):

        #Create the element and set the attributes.
        self._element=Element(name)
        for i in kw.keys():
            self._element.attrib[i]=kw[i]

    #Set an element attribute.
    def __setitem__(self, name, value):
        self._element.attrib[name]=value

    #Get an attribute. attrib is a dictionary, so index it with brackets.
    def __getitem__(self, name):
        return self._element.attrib[name]

    #Append a sub-element.
    def append(self, value):
        self._element.append(value._element)

#A specialized Dom class for accepting raw text content.
class DomContent(Dom):

    #Constructor.
    def __init__(self, name, content=None, **kw):
        Dom.__init__(self, name, **kw)
        self._element.text=content

#Common HTML elements.
class Head(Dom):
    def __init__(self):
        Dom.__init__(self, "head")

class Title(DomContent):
    def __init__(self, content, **kw):
        DomContent.__init__(self, "title", content, **kw)

class Body(Dom):
    def __init__(self):
        Dom.__init__(self, "body")

class Div(DomContent):
    def __init__(self, content, **kw):
        DomContent.__init__(self, "div", content, **kw)

#The root document.
class Document(Dom):
    def __init__(self, title):
        Dom.__init__(self, "html")

        #Initialize the head, title, and body elements.
        self.head=Head()
        self.title=Title(title)
        self.body=Body()

        #Add the title element to the head element.
        self.head.append(self.title)

        #Add the head and body elements to the document.
        self.append(self.head)
        self.append(self.body)

    #Actual output. On Python 3, encoding="unicode" makes tostring()
    #return text rather than bytes, which __str__ requires.
    def __str__(self):
        return tostring(self._element, encoding="unicode")

#Main.
if __name__=="__main__":

    #Initialize the document with a title.
    my_doc=Document("My Document")

    #Create a div with content and an attribute.
    my_div=Div("My Div", style="float: left;")

    #Add the div to the body.
    my_doc.body.append(my_div)

    #Display.
    print(my_doc)

In this example, we construct a simple HTML page. The Dom class is the top-level abstraction that we create around ElementTree, and it is meant to represent any HTML tag placed in the page. The Dom._element attribute holds the actual ElementTree element. The Dom constructor sets the element's attributes from whatever keyword parameters were passed to it. The Dom.__getitem__() and Dom.__setitem__() methods allow element attributes to be read and set, respectively. The Dom.append() method allows another Dom instance to be attached to the current instance as a sub-element.
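
To make the delegation concrete, here is a rough sketch (assuming Python 3) of the plain ElementTree operations that the Dom methods are meant to wrap. The span element and its text are illustrative only. Note that Element.attrib is an ordinary dictionary, so attributes are read and written with square brackets.

```python
from xml.etree.ElementTree import Element, tostring

# The element a Dom instance would hold in its _element attribute.
div = Element("div")

# What Dom.__setitem__() is meant to do: write into the attrib dictionary.
div.attrib["style"] = "float: left;"

# What Dom.__getitem__() is meant to do: read from the same dictionary.
style = div.attrib["style"]

# What Dom.append() is meant to do: attach another element as a child.
span = Element("span")
span.text = "hello"
div.append(span)

markup = tostring(div, encoding="unicode")
print(markup)
```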

The DomContent class is a simple specialization of the Dom class. It accepts an additional content parameter, which becomes the element's text; otherwise, the class is no different from its base class.

The Head, Title, Body, and Div classes are all specializations of the Dom class for standard HTML tags. The main difference is that Title and Div inherit from DomContent instead of Dom because they support raw text content.
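
The same pattern should extend to other tags. The sketch below assumes Python 3 and uses trimmed-down copies of Dom and DomContent so that it runs on its own. The Ul and Li classes are hypothetical additions for illustration: Ul only holds children, so it derives from Dom, while Li carries text, so it derives from DomContent.

```python
from xml.etree.ElementTree import Element, tostring

# Trimmed-down copies of the two base classes, just enough for this sketch.
class Dom(object):
    def __init__(self, name, **kw):
        self._element = Element(name)
        self._element.attrib.update(kw)

    def append(self, value):
        self._element.append(value._element)

class DomContent(Dom):
    def __init__(self, name, content=None, **kw):
        Dom.__init__(self, name, **kw)
        self._element.text = content

# Hypothetical specializations, not part of the original example.
class Ul(Dom):
    def __init__(self, **kw):
        Dom.__init__(self, "ul", **kw)

class Li(DomContent):
    def __init__(self, content, **kw):
        DomContent.__init__(self, "li", content, **kw)

# Build a two-item list.
items = Ul(id="menu")
for label in ("Home", "About"):
    items.append(Li(label))

markup = tostring(items._element, encoding="unicode")
print(markup)
```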

The Document class is a helper abstraction. It assembles the Dom elements common to every HTML page we might want to build. It is the Document class that makes the main program trivial to read and understand.
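
For comparison, the skeleton that Document assembles can be sketched directly with ElementTree (assuming Python 3). The sketch also shows one detail worth knowing when the output is meant to be HTML: the default XML serialization collapses an empty body into a self-closing tag, while passing method="html" to tostring() writes an explicit closing tag. Whether that matters depends on how the markup is consumed, so treat it as an option rather than a required change.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# The same html/head/title/body skeleton that Document builds.
html = Element("html")
head = SubElement(html, "head")
title = SubElement(head, "title")
title.text = "My Document"
body = SubElement(html, "body")

# Default XML serialization: the empty body becomes a self-closing tag.
as_xml = tostring(html, encoding="unicode")

# HTML serialization: the empty body keeps its closing tag.
as_html = tostring(html, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```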