A recent project required me to handle a large-ish number of symbolic constants, and I found it useful to develop a little utility class to help me manage them. Weighing in at only 8 lines, it made my life a lot easier.
Requirements
I’ve been working on a Python disassembler (i.e. a disassembler for x86 machine code, written in Python, not a disassembler for Python bytecode) and need to handle a large number of register names. These were my requirements:
- I wanted to refer to registers in my Python code with symbolic names (e.g. “
reg = AL
“, or “reg = registers.AL
“) - I did not want to clutter my program’s namespace with several dozen register names
- I did not want to use strings (e.g. “
reg = 'AL'
“) - I did not want to use numeric constants (e.g. “
reg = 0x00
“) - Since x86 addressing is rife with statements like “Select the AL, AX, or EAX register, based on the operand size”, I wanted to be able to construct register references algorithmically
- I wanted to be able to easily get the name of a register, as text suitable for output, from its symbolic representation
Code
To meet these requirements, I decided to write a small utility class:
class Enum(object):
def __init__(self, lut):
self.lut = lut
self.rlut = dict((v, k) for (k, v) in lut.items())
def __getattr__(self, name):
return self.lut.get(name)
def __getitem__(self, value):
return self.rlut.get(value)
Examples
The class is initialized with a dictionary, like this:
reg_set = Enum({'AL':0x00,'CL':0x01,'DL':0x02,'BL':0x03,
'AH':0x04,'CH':0x05,'DH':0x06,'BH':0x07,
'AX':0x10,'CX':0x11,'DX':0x12,'BX':0x13,
'SP':0x14,'BP':0x15,'SI':0x16,'DI':0x17,
'EAX':0x20,'ECX':0x21,'EDX':0x22,'EBX':0x23,
'ESP':0x24,'EBP':0x25,'ESI':0x26,'EDI':0x27,
'CS':0x38,'SS':0x39,'DS':0x3A,'ES':0x3B,
'FS':0x3C,'GS':0x3D,})
The class is used with statements like these:
>>> reg = reg_set.AL
>>> reg
0
>>> reg_set[reg_set.AL]
'AL'
>>> reg_set[0x10 + 0x03]
'BX'
Capabilities
This class lets me define register names in a way which meets all my requirements:
- I use symbolic names for all register references (e.g. “
reg_set.AL
“) - All register names are defined within the namespace of a single object
- Obviously, I don’t have to use strings or integer constants to refer to registers
- Since I defined the register literals carefully, s.t. “families” of registers all have the same low nibble, I can create a register reference by knowing only the family of the register (e.g. “the registers with a low nibble of 2”) and the size of the register (e.g. “16 bits”).
- I can turn a symbolic register reference into a register name with a single lookup
Operation
The operation of the Enum class is pretty straight-forward. It is initialized with a dictionary which maps symbolic names, represented as strings, to unique literals. (The uniqueness is important, as the dictionary must be reversible.) The Enum stores the dictionary in its “lut
” member, and also creates a reverse dictionary, which it stores as “rlut
“. Two special methods handle all other operations.
The “__getattr__
” method is called whenever a reference to a member of an instance of the Enum class cannot be resolved by Python’s normal resolution process. It simply looks up the member name (as a text string) in the “lut
” dictionary. If the symbol was defined there, it returns the appropriate literal. This method handles expressions of the form “reg_set.AL
“, since no “AL” member is defined in the class or instance.
The “__getitem__
” method is called to resolve expressions involving the “[]
” syntax. The expression within the brackets is passed to __getitem__
, which looks it up in the “rlut
” dictionary. This method handles expressions of the form “reg_set[0]
“, or “reg_set[reg_set.DX]
“. (In the latter case, of course, the “__getattr__
” method is called first, to resolve the “DX” member of reg_set.)
Overall, I was pretty happy with the solution, which was the shortest and simplest I could think of.