Python Constants

A recent project required me to handle a large-ish number of symbolic constants, and I found it useful to develop a little utility class to help me manage them. Weighing in at only 8 lines, it made my life a lot easier.

Requirements

I’ve been working on a Python disassembler (i.e. a disassembler for x86 machine code, written in Python, not a disassembler for Python bytecode) and need to handle a large number of register names. These were my requirements:

  • I wanted to refer to registers in my Python code with symbolic names (e.g. “reg = AL“, or “reg = registers.AL“)
  • I did not want to clutter my program’s namespace with several dozen register names
  • I did not want to use strings (e.g. “reg = 'AL'“)
  • I did not want to use numeric constants (e.g. “reg = 0x00“)
  • Since x86 addressing is rife with statements like “Select the AL, AX, or EAX register, based on the operand size”, I wanted to be able to construct register references algorithmically
  • I wanted to be able to easily get the name of a register, as text suitable for output, from its symbolic representation

Code

To meet these requirements, I decided to write a small utility class:

class Enum(object):
    def __init__(self, lut):
        self.lut    = lut
        self.rlut   = dict((v, k) for (k, v) in lut.items())
    def __getattr__(self, name):
        return self.lut.get(name)
    def __getitem__(self, value):
        return self.rlut.get(value)

Examples

The class is initialized with a dictionary, like this:

reg_set = Enum({'AL':0x00,'CL':0x01,'DL':0x02,'BL':0x03,
                'AH':0x04,'CH':0x05,'DH':0x06,'BH':0x07,
                'AX':0x10,'CX':0x11,'DX':0x12,'BX':0x13,
                'SP':0x14,'BP':0x15,'SI':0x16,'DI':0x17,
                'EAX':0x20,'ECX':0x21,'EDX':0x22,'EBX':0x23,
                'ESP':0x24,'EBP':0x25,'ESI':0x26,'EDI':0x27,
                'CS':0x38,'SS':0x39,'DS':0x3A,'ES':0x3B,
                'FS':0x3C,'GS':0x3D,})

The class is used with statements like these:

>>> reg = reg_set.AL
>>> reg
0
>>> reg_set[reg_set.AL]
'AL'
>>> reg_set[0x10 + 0x03]
'BX'

Capabilities

This class lets me define register names in a way which meets all my requirements:

  • I use symbolic names for all register references (e.g. “reg_set.AL“)
  • All register names are defined within the namespace of a single object
  • Obviously, I don’t have to use strings or integer constants to refer to registers
  • Since I defined the register literals carefully, s.t. “families” of registers all have the same low nibble, I can create a register reference by knowing only the family of the register (e.g. “the registers with a low nibble of 2”) and the size of the register (e.g. “16 bits”).
  • I can turn a symbolic register reference into a register name with a single lookup

Operation

The operation of the Enum class is pretty straight-forward. It is initialized with a dictionary which maps symbolic names, represented as strings, to unique literals. (The uniqueness is important, as the dictionary must be reversible.) The Enum stores the dictionary in its “lut” member, and also creates a reverse dictionary, which it stores as “rlut“. Two special methods handle all other operations.

The “__getattr__” method is called whenever a reference to a member of an instance of the Enum class cannot be resolved by Python’s normal resolution process. It simply looks up the member name (as a text string) in the “lut” dictionary. If the symbol was defined there, it returns the appropriate literal. This method handles expressions of the form “reg_set.AL“, since no “AL” member is defined in the class or instance.

The “__getitem__” method is called to resolve expressions involving the “[]” syntax. The expression within the brackets is passed to __getitem__, which looks it up in the “rlut” dictionary. This method handles expressions of the form “reg_set[0]“, or “reg_set[reg_set.DX]“. (In the latter case, of course, the “__getattr__” method is called first, to resolve the “DX” member of reg_set.)

Overall, I was pretty happy with the solution, which was the shortest and simplest I could think of.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in Python. Bookmark the permalink.

Comments are closed.