Occasionally I will write scripts which generate HTML documents. I understand the view that HTML is “object code”, and that its formatting doesn’t matter, but I’ve never been completely able to adopt that position. As a result, I usually try to emit “pretty-printed” HTML, and have developed a few small classes to assist in the effort.
Code
Here are the utility classes I use; since WordPress mangles code, they are also available as a download:
class HtmlTag(object):
endless = set(['link', 'input'])
def __init__(self, tag, *children, **kw_args):
self.tag = tag
self.children = children
self.attributes = kw_args.get('attributes', {})
def to_html(self, c_indent="", i_indent='\t'):
if (self.attributes):
attrs = ' '.join('%s="%s"'%(k,v) for k,v in self.attributes.items())
rv = ['%s<%s %s>' % (c_indent, self.tag, attrs)]
else:
rv = ['%s<%s>' % (c_indent, self.tag)]
for c in self.children: rv.extend(c.to_html(c_indent+i_indent, i_indent))
if (not self.tag.lower() in self.endless): rv.append('%s</%s>' % (c_indent, self.tag))
return rv
class HtmlLineTag(HtmlTag):
def to_html(self, c_indent="", i_indent='\t'):
return [c_indent + ''.join(HtmlTag.to_html(self, '', ''))]
class HtmlText(object):
def __init__(self, text):
self.text = text
def to_html(self, c_indent="", i_indent='\t'):
lines = self.text.split('\n')
return [c_indent+l+' ' for l in lines[:-1]] + [c_indent+lines[-1]]
A few qualifiers:
- These classes are designed to emit HTML 4.01, not XHTML
- These classes don’t support DOCTYPE declarations, those must be added in another way
- HtmlTag.endless is *not* an exhaustive list of endless HTML elements (i.e. those without end tags); add your own as needed
- There might be a better, built-in way to do this in Python, but this was fast enough to write that I’m too lazy to look; if you know of one, feel free to tell me
Examples
Defining and printing a simple <P> tag:
>>> t = HtmlTag('P', HtmlText('This is a test'))
>>> print '\n'.join(t.to_html())
<P>
This is a test
</P>
Some simple tags you don’t want to place on their own line:
>>> a = HtmlLineTag('A', HtmlText('click here'), attributes={'href':"http://www.example.com"})
>>> t = HtmlTag('P', a)
>>> print '\n'.join(t.to_html())
<P>
<A href="http://www.example.com">click here</A>
</P>
You’ll usually want to apply some initial intenting, to fit into an existing HTML BODY tag:
>>> print '\n'.join(t.to_html('\t\t'))
<P>
<A href="http://www.example.com">click here</A>
</P>
Subclasses are easy to write:
class Cell(HtmlLineTag):
def __init__(self, text):
self.tag = 'TD'
self.children = [HtmlText(text)]
self.attributes = {}
class Row(HtmlTag):
def __init__(self, items):
self.tag = 'TR'
self.children = map(Cell, items)
self.attributes = {}
They can result in simpler, more legible generation code:
>>> rows = ['Top', 'Center', 'Bottom']
>>> cols = ['Left', 'Middle', 'Right']
>>> t = HtmlTag('TABLE', *[Row(['%s %s'%(c,r) for c in cols]) for r in rows])
>>> print '\n'.join(t.to_html('\t\t'))
<TABLE>
<TR>
<TD>Left Top</TD>
<TD>Middle Top</TD>
<TD>Right Top</TD>
</TR>
<TR>
<TD>Left Center</TD>
<TD>Middle Center</TD>
<TD>Right Center</TD>
</TR>
<TR>
<TD>Left Bottom</TD>
<TD>Middle Bottom</TD>
<TD>Right Bottom</TD>
</TR>
</TABLE>
None of this is rocket science, but it makes me happy.