Editorial note: This is the second of a two-part tutorial on reverse engineering executables. Today, we’ll walk through the process of analyzing a series of assembly instructions. These instructions implement the codewheel verification check from the 1989 game “Neuromancer”. We covered the process of finding these instructions last week.
Let’s dive right in, and begin to examine the code, instruction by instruction.
Stack setup
A brief note on the format of the following instructions: The first column of numbers (e.g. 13DB:5992) contains segment:offset addresses describing the instructions’ locations in memory, as determined in last week’s exercise. The second column of numbers (e.g. 55, or 8BEC) contains the machine language bytes which comprise the instructions. The remainder of a row (e.g. PUSH BP) contains the assembly language interpretation of those bytes (as 8086 real-mode machine code).
With that said, here are the first three instructions:
13DB:5992 55 PUSH BP
13DB:5993 8BEC MOV BP, SP
13DB:5995 83EC0C SUB SP, +0C
These instructions initialize the stack frame: they save the Base Pointer from the previous frame on the stack, set the Base Pointer for the current frame to the current Stack Pointer, and allocate 12 bytes of memory on the stack, addressable as [BP-0C] through [BP-01].
File loading
13DB:5998 B87C6A MOV AX, 6A7C
13DB:599B 1E PUSH DS
13DB:599C 50 PUSH AX
13DB:599D B84A54 MOV AX, 544A
13DB:59A0 50 PUSH AX
13DB:59A1 E8ACCB CALL 2550
13DB:59A4 83C406 ADD SP, +06
Call the function at CS:2550 with a near pointer to DS:544A, and a far pointer to DS:6A7C. (I list the near pointer first, since it’s passed at a lower memory address on the stack.) After the function call, the arguments are deleted from the stack.
If we use DOS DEBUG, we can find out a little more about what this function call does. If we set a breakpoint just before the function call (with the debugger command “g 13db:59a1”) we will see that, before the call, DS:544A points to the NULL-terminated string “paxcodes.txh”, and DS:6A7C points to garbage. (Use the debugger’s “d” command – e.g. “d ds:544a” – to examine the contents of memory.) If we run to the instruction following the call (with the debugger command “g 13db:59a4”) we will find that DS:6A7C now points to a series of NULL-terminated strings including words such as “Chatsubo”, “Cyberspace”, “Gemeinschaft”, and so on; the keywords found on the code wheel.
It seems reasonable to conclude that CS:2550 loads the contents of a file into memory; the filename is given by its first argument, the destination buffer by its second. (The file itself is presumably stored somewhere within the NEURO1.DAT or NEURO2.DAT files that ship with the game.)
Mystery function 1
13DB:59A7 B80100 MOV AX, 0001
13DB:59AA 50 PUSH AX
13DB:59AB E8D3F7 CALL 5181
13DB:59AE 83C402 ADD SP, +02
Call the function at CS:5181 with an argument of 1, then delete the argument from the stack. Its purpose is unclear.
Mystery function 2
13DB:59B1 B80800 MOV AX, 0008
13DB:59B4 50 PUSH AX
13DB:59B5 50 PUSH AX
13DB:59B6 E8F6F1 CALL 4BAF
13DB:59B9 83C404 ADD SP, +04
Call the function at CS:4BAF with arguments of 8 and 8, then delete the arguments from the stack. The purpose of this function is not immediately obvious.
Display text
13DB:59BC B80200 MOV AX, 0002
13DB:59BF 50 PUSH AX
13DB:59C0 B85854 MOV AX, 5458
13DB:59C3 50 PUSH AX
13DB:59C4 E813F2 CALL 4BDA
13DB:59C7 83C404 ADD SP, +04
Call the function at CS:4BDA with arguments of a near pointer to DS:5458 and the integer 2, then delete the arguments from the stack. The meaning of the “2” is obscure, but DS:5458 points (at the time of function invocation, as determined with DEBUG) to the NULL-terminated string “PAX – Public Access System”. Since this text appears at the top of the prompt screen, it seems likely that CS:4BDA prints the text given by its first argument to the screen.
Problem setup
13DB:59CA E801A1 CALL FACE
13DB:59CD 250F00 AND AX, 000F
13DB:59D0 8946FC MOV [BP-04], AX
13DB:59D3 E8F8A0 CALL FACE
13DB:59D6 250F00 AND AX, 000F
13DB:59D9 8946FA MOV [BP-06], AX
13DB:59DC E8EFA0 CALL FACE
13DB:59DF 250F00 AND AX, 000F
13DB:59E2 8946F8 MOV [BP-08], AX
This code makes 3 calls to the function at CS:FACE, and stores the low 4 bits of the results to WORDs of memory at [BP-04], [BP-06], and [BP-08]. Since CS:FACE is called 3 times, it’s probably expected to return different values each time (otherwise, it would be more efficient to simply copy the results of one call) despite the fact that it’s called with no arguments. This implies that CS:FACE is a pseudo-random number generator. The pRNG hypothesis is bolstered by the facts that the results of the calls to CS:FACE are clipped to one of the 16 values between 0 and 15, and that there are 16 labels on the inner and outer codewheel rings, and 16 codewheel windows. From context, it seems likely that this code is (pseudo-)randomly generating the PAX verification query to be posed.
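As a rough illustration of what this block appears to do, here is a minimal Python sketch. The function name generate_query is hypothetical, and Python’s random module merely stands in for whatever generator lives at CS:FACE, whose actual algorithm we have not examined.

```python
import random

def generate_query(rng=random):
    """Pick three 4-bit indices, mirroring the three CALL FACE / AND AX, 000F pairs.

    rng is a stand-in for the routine at CS:FACE (an assumption; the real
    generator's algorithm is unknown).
    """
    outer = rng.getrandbits(16) & 0x000F   # stored at [BP-04]
    inner = rng.getrandbits(16) & 0x000F   # stored at [BP-06]
    window = rng.getrandbits(16) & 0x000F  # stored at [BP-08]
    return outer, inner, window
```

Masking with 0x000F keeps only the low 4 bits, so each value lands in the range 0 to 15 regardless of what the generator returns.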
Mystery function 2 (redux)
13DB:59E5 B81800 MOV AX, 0018
13DB:59E8 50 PUSH AX
13DB:59E9 B86000 MOV AX, 0060
13DB:59EC 50 PUSH AX
13DB:59ED E8BFF1 CALL 4BAF
13DB:59F0 83C404 ADD SP, +04
Another call to CS:4BAF, this time with arguments of 0x60 and 0x18.
Mystery function 3
13DB:59F3 B80200 MOV AX, 0002
13DB:59F6 50 PUSH AX
13DB:59F7 FF76FC PUSH [BP-04]
13DB:59FA B87C6A MOV AX, 6A7C
13DB:59FD 50 PUSH AX
13DB:59FE E8BC21 CALL 7BBD
13DB:5A01 83C404 ADD SP, +04
Call the function at CS:7BBD with arguments of a near pointer to DS:6A7C, the contents of [BP-04] (the first previously randomly-generated number between 0 and 15) and the integer 2. The meaning of the “2” is obscure, but DS:6A7C points to a series of NULL-terminated strings including words such as “Chatsubo”, “Cyberspace”, “Gemeinschaft”, and so on: the keywords found on the code wheel. More particularly, the 1st 16 strings are from the outside ring of the codewheel, the next 16 from the inside ring, and the last 16 from the windows of the codewheel.
After the call to CS:7BBD, remove the 1st two arguments from the stack, but leave the last – the integer “2” – in place for the next call.
Display text (redux)
13DB:5A04 50 PUSH AX
13DB:5A05 E8D2F1 CALL 4BDA
13DB:5A08 83C404 ADD SP, +04
Call the “print” function at CS:4BDA with arguments of the CS:7BBD return value and the integer “2”, then delete the arguments from the stack.
Display inner ring name
13DB:5A0B B82000 MOV AX, 0020
13DB:5A0E 50 PUSH AX
13DB:5A0F B86000 MOV AX, 0060
13DB:5A12 50 PUSH AX
13DB:5A13 E899F1 CALL 4BAF
13DB:5A16 83C404 ADD SP, +04
13DB:5A19 B80200 MOV AX, 0002
13DB:5A1C 50 PUSH AX
13DB:5A1D 8B46FA MOV AX, [BP-06]
13DB:5A20 051000 ADD AX, 0010
13DB:5A23 50 PUSH AX
13DB:5A24 B87C6A MOV AX, 6A7C
13DB:5A27 50 PUSH AX
13DB:5A28 E89221 CALL 7BBD
13DB:5A2B 83C404 ADD SP, +04
13DB:5A2E 50 PUSH AX
13DB:5A2F E8A8F1 CALL 4BDA
13DB:5A32 83C404 ADD SP, +04
This block of code repeats the 3 function calls we just saw, with subtle, but significant differences that allow us to draw some conclusions about what is going on.
First of all, the block begins with another call to CS:4BAF, this time with arguments of 0x60 and 0x20. We’ve now seen 3 calls to CS:4BAF, with arguments of (0x08, 0x08), (0x60, 0x18), and (0x60, 0x20), each of which was quickly followed by a call to the “print” function at CS:4BDA. This suggests that CS:4BAF is doing some preparatory work for the “print” function; the most obvious guess is that it positions the cursor.
An examination of the prompt screen suggests that the game is using an 8×8 pixel monospaced font. If we assume that CS:4BAF is a cursor positioning function, and that its arguments represent X and Y offsets in pixels, we’d expect the 2nd and 3rd lines to be indented 11 characters relative to the 1st line, the 2nd line to be 2 lines below the 1st, and the 3rd 1 line below the 2nd. This is, in fact, exactly what we see.
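The arithmetic behind that prediction can be checked with a few lines of Python. This is only a sketch under the assumptions stated above (an 8×8 pixel font, and arguments given as X, Y pixel offsets); the names FONT_SIZE and to_char_cell are made up for illustration.

```python
FONT_SIZE = 8  # assumed glyph width and height, in pixels

def to_char_cell(x_px, y_px):
    """Convert a pixel offset to a (column, row) character cell."""
    return x_px // FONT_SIZE, y_px // FONT_SIZE

# The three argument pairs passed to CS:4BAF so far.
calls = [(0x08, 0x08), (0x60, 0x18), (0x60, 0x20)]
cells = [to_char_cell(x, y) for x, y in calls]

indent = cells[1][0] - cells[0][0]          # columns between line 1 and line 2
gap_1_to_2 = cells[1][1] - cells[0][1]      # rows between line 1 and line 2
gap_2_to_3 = cells[2][1] - cells[1][1]      # rows between line 2 and line 3
```

Under these assumptions the three calls map to cells (1, 1), (12, 3), and (12, 4), which gives the 11-character indent and the 2-line and 1-line gaps described above.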
Secondly, another call is made to CS:7BBD, but the 2nd argument is now the 2nd randomly generated number plus 16. Finally, the CS:7BBD return value is printed to the screen.
It seems likely that CS:7BBD returns the Nth NULL-terminated string from an array of character data. It also seems likely that [BP-04] is the outer ring position, [BP-06] the inner ring position, and [BP-08] the codewheel window.
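If that guess about CS:7BBD is right, its behaviour would resemble the following Python sketch. The name nth_string is hypothetical, the sample buffer contains only the three keywords quoted earlier rather than the real file contents, and the mysterious third argument (“2”) is ignored because its meaning is unknown.

```python
def nth_string(buffer: bytes, n: int) -> str:
    """Return the n-th (0-based) NULL-terminated string packed in buffer."""
    start = 0
    for _ in range(n):
        # Skip past one string and its terminating NULL byte.
        start = buffer.index(b"\x00", start) + 1
    end = buffer.index(b"\x00", start)
    return buffer[start:end].decode("ascii")

# Illustrative data only: the first three outer-ring keywords.
sample = b"Chatsubo\x00Cyberspace\x00Gemeinschaft\x00"
```

With the full 48-string buffer, the outer ring name would be nth_string(buf, outer), the inner ring name nth_string(buf, inner + 16), and the window name nth_string(buf, window + 32), matching the offsets of 0x10 and 0x20 added in the disassembly.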
Display window name
13DB:5A35 B82800 MOV AX, 0028
13DB:5A38 50 PUSH AX
13DB:5A39 B86000 MOV AX, 0060
13DB:5A3C 50 PUSH AX
13DB:5A3D E86FF1 CALL 4BAF
13DB:5A40 83C404 ADD SP, +04
13DB:5A43 B80200 MOV AX, 0002
13DB:5A46 50 PUSH AX
13DB:5A47 8B46F8 MOV AX, [BP-08]
13DB:5A4A 052000 ADD AX, 0020
13DB:5A4D 50 PUSH AX
13DB:5A4E B87C6A MOV AX, 6A7C
13DB:5A51 50 PUSH AX
13DB:5A52 E86821 CALL 7BBD
13DB:5A55 83C404 ADD SP, +04
13DB:5A58 50 PUSH AX
13DB:5A59 E87EF1 CALL 4BDA
13DB:5A5C 83C404 ADD SP, +04
The third almost-repetition of the code block. Based upon the conclusions we’ve already drawn, we can assume that these instructions print the name of the codewheel window at character position (0x0C, 0x05).
Display prompt
13DB:5A5F B83800 MOV AX, 0038
13DB:5A62 50 PUSH AX
13DB:5A63 B80800 MOV AX, 0008
13DB:5A66 50 PUSH AX
13DB:5A67 E845F1 CALL 4BAF
13DB:5A6A 83C404 ADD SP, +04
13DB:5A6D B80200 MOV AX, 0002
13DB:5A70 50 PUSH AX
13DB:5A71 B87454 MOV AX, 5474
13DB:5A74 50 PUSH AX
13DB:5A75 E862F1 CALL 4BDA
13DB:5A78 83C404 ADD SP, +04
This code positions the cursor at character position (0x01, 0x07), and prints the string at DS:5474 – “Enter verification code:”.
Position cursor
13DB:5A7B B83800 MOV AX, 0038
13DB:5A7E 50 PUSH AX
13DB:5A7F B8D000 MOV AX, 00D0
13DB:5A82 50 PUSH AX
13DB:5A83 E829F1 CALL 4BAF
13DB:5A86 83C404 ADD SP, +04
This code positions the cursor at character position (0x1A, 0x07) – just to the right of the string just printed.
Mystery function 4
13DB:5A89 2BC0 SUB AX, AX
13DB:5A8B 50 PUSH AX
13DB:5A8C B80600 MOV AX, 0006
13DB:5A8F 50 PUSH AX
13DB:5A90 E84405 CALL 5FD7
13DB:5A93 83C404 ADD SP, +04
Call the function at CS:5FD7 with arguments of 6 and 0, then delete the arguments from the stack. The purpose of this function isn’t immediately obvious, but it’s perhaps suggestive that the longest number on the codewheel is only 6 digits long.
Check for input
13DB:5A96 3DFFFF CMP AX, FFFF
13DB:5A99 7508 JNZ 5AA3
13DB:5A9B 83FAFF CMP DX, -01
13DB:5A9E 7503 JNZ 5AA3
13DB:5AA0 E904FF JMP 59A7
Test the results of the call to CS:5FD7: If AX equals 0xFFFF and DX equals -1, sign-extended (which comes to the same test for each register, really) then execution jumps back to CS:59A7, which is just after the loading of “paxcodes.txh”, and just before the problem is generated or any text is written to the screen.
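To see why the two comparisons amount to the same test, here is a small Python sketch of the sign extension the CPU applies to the 8-bit immediate in CMP DX, -01 (encoded as 83FAFF). The function names are hypothetical.

```python
def sign_extend_imm8(value: int) -> int:
    """Widen an 8-bit immediate to 16 bits, copying bit 7 into the high byte."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value

def should_retry(ax: int, dx: int) -> bool:
    """True when both registers hold 0xFFFF, i.e. the jump back to 59A7 is taken."""
    return ax == 0xFFFF and dx == sign_extend_imm8(0xFF)
```

The first comparison carries a full 16-bit immediate (3DFFFF) while the second uses the shorter sign-extended form, but both compare a register against 0xFFFF.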
In practice, the jump back to CS:59A7 seems to occur when the user hits “Return” at the PAX verification prompt without entering any data. This lets us conclude two things:
- CS:5FD7 is probably responsible for gathering keyboard input from the user at the prompt.
- CS:5181 is probably some sort of screen-clearing function (it’s the only still-unexplained function we’ve seen, and it seems likely that some function is clearing the screen).
Display acknowledgement
13DB:5AA3 B85800 MOV AX, 0058
13DB:5AA6 50 PUSH AX
13DB:5AA7 50 PUSH AX
13DB:5AA8 E804F1 CALL 4BAF
13DB:5AAB 83C404 ADD SP, +04
13DB:5AAE B80200 MOV AX, 0002
13DB:5AB1 50 PUSH AX
13DB:5AB2 B88E54 MOV AX, 548E
13DB:5AB5 50 PUSH AX
13DB:5AB6 E821F1 CALL 4BDA
13DB:5AB9 83C404 ADD SP, +04
This code positions the cursor at character position (0x0B, 0x0B), and prints the string at DS:548E – “Verifying access…”.
Calculate LUT column
13DB:5ABC 8B5EF8 MOV BX, [BP-08]
13DB:5ABF 8A875620 MOV AL, [BX+2056]
13DB:5AC3 2AE4 SUB AH, AH
13DB:5AC5 0346FC ADD AX, [BP-04]
13DB:5AC8 2B46FA SUB AX, [BP-06]
13DB:5ACB 250F00 AND AX, 000F
13DB:5ACE 8946FE MOV [BP-02], AX
Look up the window index in the BYTE array at DS:2056, then add the outer ring index to it, and subtract the inner ring index from it. Store the low 4 bits of the result in the WORD at [BP-02]. To understand the significance of these calculations, we must consider the physical construction of the codewheel.
The codewheel consists of two paper disks, one slightly larger than the other, which are mounted on a common axis, and which may turn independently. The larger disk is divided into 16 radial slices; each slice contains a column of 8 codes and a label (“Chatsubo”, “Cyberspace”, “Gemeinschaft”, etc.) above the codes, on the disk’s perimeter.
Each code on the larger disk may be identified by a (rank, name) pair, where “name” is taken from the set of labels on the larger disk, and “rank” is a number from 0 to 7, where 0 represents the topmost/outermost code in a column. For instance, (0, “Cyberspace”) is 021655, (2, “Chatsubo”) is 44312, and (7, “Cyberspace”) is 045.
Each slice of the larger disk may also be assigned a number, from 0 to 15. In particular, we can assign each slice an index based upon the number of slices clockwise it falls from the “Chatsubo” slice. This allows us to identify each code on the larger disk with a pair of numbers. Restating the previous examples in these terms, (0, 1) is 021655, (2, 0) is 44312, and (7, 1) is 045. This sort of addressing lets us represent the set of codes on the larger disk as an 8 row, 16 column table.
The codewheel’s smaller disk is also divided into 16 radial slices, each of which is labelled with a name (“Ratz”, “Holografix”, “Larry Moe”, etc.) on the disk’s perimeter. The smaller disk is mounted in front of the larger disk, such that only the labels on the larger disk’s perimeter are visible at all times. The smaller disk also has 16 windows cut into it, which are aligned so that some of the codes on the larger disk are visible; which particular codes are visible depends upon the rotation of the smaller disk with respect to the larger disk. Each window is assigned a label (“Zion Cluster”, “Chiba City”, “Asano Computing”, etc.).
Each slice of the smaller disk may be assigned a number, from 0 to 15. In particular, we can assign each slice an index based upon the number of slices clockwise it is from the “Ratz” slice. We may also assign indices to the windows by listing the windows in each slice from perimeter to center, and by ordering the slices by increasing clockwise distance from the “Ratz” slice. This convention would assign an index of 0 to “Zion Cluster”, 1 to “Chiba City”, 2 to “Asano Computing”, and so on, through 15 for “Fuji Electric”.
Each window may be characterized by a (rank, slice) pair, describing that window’s location on the smaller disk. The “slice” value is equal to the index of the slice in which the window is found, while “rank” is equal to the rank of the codes over which the window falls. For instance, the “0” or “Zion Cluster” window has a characteristic of (2, 0), while the “2” or “Asano Computing” window has a characteristic of (0, 1).
Finally, the relative rotation of the larger and smaller disks may be summarized by the index of the outer disk slice aligned with the inner disk’s “0” or “Ratz” slice. If the “Ratz” and “Chatsubo” slices are aligned, the disk’s rotation is 0. If the “Ratz” and “Cyberspace” slices are aligned, the disk’s rotation is 1.
With all that said, we can make some sense of the preceding code. First of all, a disk rotation is represented in the verification query by one of the 16 inner and outer slice pairs that are aligned in that particular rotation. For instance, a rotation of 5 might be represented by an (outer, inner) pair of (7, 2), or (2, 13) – in terms of slice labels, these pairs would be (“Donut World”, “Larry Moe”) and (“Gemeinschaft”, “Cowboy”). To convert such a pair to a rotation, just subtract the “inner” index from the “outer” index, modulo 16. (The modulo computation addresses the fact that indices -11, 5, and 21 all refer to the same slice on a 16-segment disk.)
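The pair-to-rotation conversion is a one-liner; this Python sketch (with a hypothetical function name) reproduces the two examples just given.

```python
def rotation(outer: int, inner: int) -> int:
    """Rotation of the disks implied by an aligned (outer, inner) slice pair."""
    return (outer - inner) % 16

r1 = rotation(7, 2)    # ("Donut World", "Larry Moe")
r2 = rotation(2, 13)   # ("Gemeinschaft", "Cowboy")
```

Both pairs yield a rotation of 5. Note that Python’s % operator returns a non-negative result for a negative left operand, which is the behaviour wanted here; the game gets the same effect by masking with 0x000F.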
The rotation tells us which slice of the larger disk is aligned with the “0” slice of the smaller disk. If we add the slice characteristic of a particular window to this rotation (modulo 16), we will compute the index of the slice of the larger disk which lies behind that window. This computed index, combined with the window’s rank, yields the (rank, slice) pair of a code, which can ultimately be checked against user input.
Now we can understand the ASM fragment we are currently examining. The BYTE array at DS:2056 contains these 16 values:
00 00 01 01 02 03 03 04 05 05 06 06 07 08 09 09
These values are the “slice” halves of the window characteristics; they describe, for each window, that window’s clockwise rotation from the “Ratz” slice of the smaller disk.
The code adds the outer ring index to, and subtracts the inner ring index from, a slice offset taken from this array; this effectively adds the disk’s rotation to the slice offset, and computes the index of the slice of the larger disk which lies behind the verification query’s window. By storing only the low 4 bits of this index, the code performs a modulo 16 operation on the computed index, ensuring it falls between 0 and 15.
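In Python, the same computation might look like the sketch below. The table values are the 16 bytes listed above; the names WINDOW_SLICE and lut_column are invented for this example.

```python
# The 16 bytes found at DS:2056: the "slice" half of each window's characteristic.
WINDOW_SLICE = [0x00, 0x00, 0x01, 0x01, 0x02, 0x03, 0x03, 0x04,
                0x05, 0x05, 0x06, 0x06, 0x07, 0x08, 0x09, 0x09]

def lut_column(outer: int, inner: int, window: int) -> int:
    """Index of the larger-disk slice lying behind the given window."""
    # MOV AL,[BX+2056] / SUB AH,AH / ADD AX,[BP-04] / SUB AX,[BP-06] / AND AX,000F
    return (WINDOW_SLICE[window] + outer - inner) & 0x000F
```

For example, with the disks at rotation 5 (outer 7, inner 2), window 0 (“Zion Cluster”) sits over slice 5 of the larger disk, while window 2 (“Asano Computing”) sits over slice 6.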
That’s probably too much explanation for 21 bytes of code, which is why I prefer programming to writing.
Calculate LUT row and index
13DB:5AD1 8A876620 MOV AL, [BX+2066]
13DB:5AD5 2AE4 SUB AH, AH
13DB:5AD7 B104 MOV CL, 04
13DB:5AD9 D3E0 SHL AX, CL
13DB:5ADB 0146FE ADD [BP-02], AX
Look up the window index in the BYTE array at DS:2066, multiply the value there by 16, and add the result to the slice index computed by the previous piece of code.
The BYTE array at DS:2066 contains these 16 values:
02 05 00 07 03 01 06 04 00 07 02 04 06 01 03 05
These values are the “rank” halves of the window characteristics; they describe, for each window, the rank of the codes exposed on the larger disk by that window. When a rank is multiplied by 16 and added to a slice index, the result is an index into a 16 column table stored in row-major form.
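Putting the two lookup tables together, the full index computation can be sketched in Python as follows. The table contents are the bytes listed at DS:2056 and DS:2066; the names WINDOW_SLICE, WINDOW_RANK, and lut_index are made up for illustration.

```python
# Bytes at DS:2056 ("slice" halves) and DS:2066 ("rank" halves).
WINDOW_SLICE = [0x00, 0x00, 0x01, 0x01, 0x02, 0x03, 0x03, 0x04,
                0x05, 0x05, 0x06, 0x06, 0x07, 0x08, 0x09, 0x09]
WINDOW_RANK = [0x02, 0x05, 0x00, 0x07, 0x03, 0x01, 0x06, 0x04,
               0x00, 0x07, 0x02, 0x04, 0x06, 0x01, 0x03, 0x05]

def lut_index(outer: int, inner: int, window: int) -> int:
    """Row-major index into the 8-row, 16-column code table."""
    column = (WINDOW_SLICE[window] + outer - inner) & 0x000F
    row = WINDOW_RANK[window]
    return (row << 4) + column  # SHL AX, CL with CL = 4 multiplies by 16
```

With the disks at rotation 0, window 2 (“Asano Computing”, characteristic (0, 1)) gives index 1, the table position of the (0, “Cyberspace”) code, and window 0 (“Zion Cluster”, characteristic (2, 0)) gives index 32, the start of the rank-2 row.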
Calculate LUT value
13DB:5ADE 8B5EFE MOV BX, [BP-02]
13DB:5AE1 D1E3 SHL BX, 1
13DB:5AE3 8B877620 MOV AX, [BX+2076]
13DB:5AE7 8946FE MOV [BP-02], AX
Use the index we just computed to retrieve a WORD from the WORD array at DS:2076, and store it in the WORD at [BP-02]. The WORD array at DS:2076 begins with these values:
9BD1 23AD 97B7 ...
These don’t have any obvious relationship to the 1st three codewheel codes (115721, 021655, 113667), but let’s see what the rest of the code does …
Input processing – loop setup
13DB:5AEA 2BC0 SUB AX, AX
13DB:5AEC 8946FC MOV [BP-04], AX
13DB:5AEF 8946FA MOV [BP-06], AX
13DB:5AF2 EB03 JMP 5AF7
Initialize a loop: Set the WORDs at [BP-04] and [BP-06] to zero, then skip the “increment” step of the loop (at CS:5AF4) by jumping to the instruction at CS:5AF7.
Input processing – loop increment
13DB:5AF4 FF46FC INC WORD PTR [BP-04]
The “increment” step of a loop: add 1 to the WORD at [BP-04], which is, presumably, the loop counter.
Input processing – loop test
13DB:5AF7 8B5EFC MOV BX, [BP-04]
13DB:5AFA 80BF84693C CMP BYTE PTR [BX+6984], 3C
13DB:5AFF 7412 JZ 5B13
Check if the ith element (where i is the loop counter, the WORD at [BP-04]) of the BYTE array at DS:6984 is equal to 0x3C, the ASCII character ‘<’. If it is, exit the loop by jumping to instruction CS:5B13.
Input processing – process character
13DB:5B01 B103 MOV CL, 03
13DB:5B03 D366FA SHL WORD PTR [BP-06], CL
13DB:5B06 8A878469 MOV AL, [BX+6984]
13DB:5B0A 98 CBW
13DB:5B0B 2D3000 SUB AX, 0030
13DB:5B0E 0146FA ADD [BP-06], AX
Shift the WORD at [BP-06] left by 3 bits, and then add the difference between the ith element (where i is the loop counter, the WORD at [BP-04]) of the BYTE array at DS:6984 and 0x30, the ASCII character ‘0’. This has the effect of building up, in the WORD at [BP-06], a packed encoding of the series of ASCII digits stored at DS:6984, using 3 bits per digit, assuming that no digit is greater than 7, of course. (This resembles binary-coded decimal, but true BCD uses 4 bits per digit; the result here is the same value you would get by reading the digits as an octal number.)
Experiment reveals (enter “g 13db:5af7” into DOS DEBUG, then access the PAX terminal, enter some text, and check memory with “d 6984”) that whatever the user enters at the PAX verification prompt is stored at DS:6984 (followed by a ‘<’ character), and it can be seen that no code on the PAX codewheel contains an ‘8’ or ‘9’ digit. (Also, no 6-digit PAX code contains a leading digit other than ‘0’ or ‘1’, which makes sense, since a 16-bit WORD has space for only five 3-bit digits, and 1 extra bit.)
When the 1st 3 codewheel codes (115721, 021655, 113667) are encoded with the algorithm implemented by this loop, they match the 1st 3 elements of the WORD array at DS:2076 (9BD1 23AD 97B7).
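That claim is easy to check by reimplementing the loop. The following Python sketch (the name encode_input is hypothetical) mirrors the shift-and-add steps and the ‘<’ terminator, masking to 16 bits to imitate WORD arithmetic.

```python
def encode_input(buffer: bytes) -> int:
    """Pack ASCII digits, terminated by '<', into a 16-bit word at 3 bits per digit."""
    value = 0                        # the WORD at [BP-06]
    i = 0                            # the loop counter at [BP-04]
    while buffer[i] != 0x3C:         # CMP BYTE PTR [BX+6984], 3C
        value = (value << 3) & 0xFFFF                 # SHL WORD PTR [BP-06], CL
        value = (value + buffer[i] - 0x30) & 0xFFFF   # SUB AX, 0030 / ADD [BP-06], AX
        i += 1
    return value

encoded = [encode_input(code) for code in (b"115721<", b"021655<", b"113667<")]
```

The three results are 0x9BD1, 0x23AD, and 0x97B7, matching the table. The CBW instruction is not modelled, since sign extension has no effect on ASCII digit bytes, which are all below 0x80.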
Input processing – loop!
13DB:5B11 EBE1 JMP 5AF4
Loop back to instruction CS:5AF4; process the next character.
Delay
13DB:5B13 B81400 MOV AX, 0014
13DB:5B16 50 PUSH AX
13DB:5B17 E816DF CALL 3A30
13DB:5B1A 83C402 ADD SP, +02
Call the function at CS:3A30 with an argument of 0x14, then delete the argument from the stack. Its purpose is unclear. By setting breakpoints at 13DB:5B17 and 13DB:5B1A, however, and doing a little “wristwatch benchmarking”, it seems that CS:3A30 generates the delay experienced by the user during PAX verification; it may serve no function other than giving the user the idea that the PAX system is taking a while to perform validation.
Let’s take a moment to contemplate code written specifically to make a 1989-era PC respond more slowly, in order to make a simulated 21st century computer seem more realistic.
Ok, that was fun. Let’s move on.
Cursor positioning
13DB:5B1D B85800 MOV AX, 0058
13DB:5B20 50 PUSH AX
13DB:5B21 50 PUSH AX
13DB:5B22 E88AF0 CALL 4BAF
13DB:5B25 83C404 ADD SP, +04
Position the cursor at character position (0x0B, 0x0B).
Validation test
13DB:5B28 8B46FE MOV AX, [BP-02]
13DB:5B2B 3946FA CMP [BP-06], AX
13DB:5B2E 741F JZ 5B4F
Jump to CS:5B4F if and only if the value computed from the verification problem’s inner, outer, and window codewheel indices (stored in the WORD at [BP-02]) matches the value computed from the user’s input (stored in the WORD at [BP-06]).
This code controls whether the function exits successfully, or loops and queries the user again.
Failure message
13DB:5B30 B80200 MOV AX, 0002
13DB:5B33 50 PUSH AX
13DB:5B34 B8A254 MOV AX, 54A2
13DB:5B37 50 PUSH AX
13DB:5B38 E89FF0 CALL 4BDA
13DB:5B3B 83C404 ADD SP, +04
Print the string at DS:54A2 – ” Access denied “.
Mystery function 5
13DB:5B3E B80600 MOV AX, 0006
13DB:5B41 50 PUSH AX
13DB:5B42 9A6E33DB13 CALL 13DB:336E
13DB:5B47 83C402 ADD SP, +02
Call the function at 13DB:336E with an argument of 6, then delete the argument from the stack. Its purpose is unclear.
Retry
13DB:5B4A E95AFE JMP 59A7
Jump back to CS:59A7, which is just after the loading of “paxcodes.txh”, and just before the problem is generated or any text is written to the screen. (This jump is taken iff the earlier comparison of the WORDs at [BP-02] and [BP-06] failed.)
An orphan
13DB:5B4D EB1A JMP 5B69
Unreachable code!
Success message
13DB:5B4F B80200 MOV AX, 0002
13DB:5B52 50 PUSH AX
13DB:5B53 B8B654 MOV AX, 54B6
13DB:5B56 50 PUSH AX
13DB:5B57 E880F0 CALL 4BDA
13DB:5B5A 83C404 ADD SP, +04
Print the string at DS:54B6 – ” Access allowed “.
Mystery function 5 (redux)
13DB:5B5D B80B00 MOV AX, 000B
13DB:5B60 50 PUSH AX
13DB:5B61 9A6E33DB13 CALL 13DB:336E
13DB:5B66 83C402 ADD SP, +02
Call the function at 13DB:336E with an argument of 0xB, then delete the argument from the stack. Its purpose is unclear.
Stack cleanup
13DB:5B69 8BE5 MOV SP, BP
13DB:5B6B 5D POP BP
Restore the caller’s stack frame; set the Stack Pointer equal to this frame’s Base Pointer, then restore the caller’s Base Pointer by popping it off the stack.
Return (at last!)
13DB:5B6C C3 RET
Return to caller.
Conclusions
What can we learn from this exercise? Well, even though we were working with an executable targeted to an older, simpler platform (a real-mode x86 processor running DOS), I think there are a few general lessons:
- Executables can be read just like “source code”. They’re just written in a very unfriendly language. Perhaps only Perl is harder to read.
- The first step in reading an executable is “mapping” it; finding the parts which implement the features you’re interested in.
- Calls to known interrupts and system code can help you identify the parts of a program which are performing certain functions.
- Data references are even more helpful, but require some understanding of how the program behaves. There is a small chicken-and-egg problem with using data references: you can’t find good places to break the program’s execution by searching for data references until you know a little bit about how the program organizes its data, and you can’t investigate run-time data organization until you can set good breakpoints. Resolve this by beginning with breakpoints on interrupts and system calls.
- Better debugging tools (and better debugging support from the CPU) can make the process of understanding an executable much easier, but executables can be read using only the most primitive of tools.
- If you’re stuck, making some guesses is always a good idea.
- When trying to make sense of machine/assembly instructions, an understanding of the problem domain, and an eye for context, can be very helpful.
- The process of understanding an executable is really just a matter of persistence, care, and diligence. They’ve given you the code; you just have to read it. (May not apply to executables for systems with DRM hardware, which you ought to avoid for that reason.)
…and last, but not least:
- Neuromancer’s code wheel values are simply stored in a table, not derived from a formula computed at run-time. Unsurprising, perhaps, but vaguely disappointing to me.