Disassembly DIY

The following sections describe how to use SkoolKit to get started on your own Spectrum game disassembly.

Getting started

The first thing to do is select a Spectrum game to disassemble. For the purpose of this discussion, we’ll use Manic Miner (the original Bug-Byte version). Grab a copy of it, load it in an emulator, and save a Z80 snapshot called mm.z80 in the directory containing SkoolKit.

The next thing to do is create a skool file from this snapshot. Run the following command from the SkoolKit directory:

$ ./sna2skool.py mm.z80 > mm.skool

Now take a look at mm.skool. As you can see, by default, sna2skool.py disassembles everything from 16384 to 65535, treating it all as code. Needless to say, this is not particularly useful - unless you have no idea where the code and data blocks are yet, and want to use this disassembly to find out.

Once you have figured out where the code and data blocks are, it would be handy if you could supply sna2skool.py with this information, so that it can disassemble the blocks accordingly. That is where the control file comes in.

The control file

A control file contains a list of start addresses of code and data blocks. Each address is marked with a ‘control directive’, which is a single letter that indicates what the block contains:

  • b indicates a data block
  • c indicates a code block
  • g indicates a game status buffer entry
  • i indicates a block that should be ignored
  • t indicates a block containing text
  • u indicates an unused block of memory
  • w indicates a block containing words (two-byte values)
  • z indicates an unused block containing all zeroes

(If these letters remind you of the valid characters that may appear in the first column of each line of a skool file, that is no coincidence.)

For example:

c 24576 Do stuff
b 24832 Important data
t 25088 Interesting messages
u 25344 Unused

This control file declares that:

  • Everything before 24576 should be ignored
  • There is a routine at 24576-24831 which should be titled ‘Do stuff’
  • There is data at 24832-25087
  • There is text at 25088-25343
  • Everything from 25344 onwards is unused (but should still be disassembled as data)
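To make the implicit block boundaries concrete, here is a minimal Python sketch that reads the example control file and works out where each block ends. The function parse_ctl and its end parameter are hypothetical names invented for illustration; this is not SkoolKit's own parser, and it handles only decimal addresses.

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def parse_ctl(text, end=65536):
    """Return a list of (directive, first, last, title) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        title = parts[2] if len(parts) > 2 else ''
        entries.append((parts[0], int(parts[1]), title))
    entries.sort(key=lambda entry: entry[1])
    blocks = []
    for i, (directive, start, title) in enumerate(entries):
        # A block runs up to the byte before the next block's start address;
        # the last block is assumed to run to the end of memory.
        next_start = entries[i + 1][1] if i + 1 < len(entries) else end
        blocks.append((directive, start, next_start - 1, title))
    return blocks


CTL = """c 24576 Do stuff
b 24832 Important data
t 25088 Interesting messages
u 25344 Unused"""

for directive, first, last, title in parse_ctl(CTL):
    print(directive, first, last, title)
```

Each printed line shows a directive followed by the first and last address of its block, which should match the ranges listed above.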

Addresses may be written as hexadecimal numbers, too; the equivalent example control file using hexadecimal notation would be:

c $6000 Do stuff
b $6100 Important data
t $6200 Interesting messages
u $6300 Unused
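The equivalence of the two notations can be checked with a short Python sketch. The function parse_address is a hypothetical name used only for this illustration; it assumes that a leading $ marks a hexadecimal address, as in the example above.

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def parse_address(token):
    """Convert a decimal ('24576') or hexadecimal ('$6000') address to an int."""
    if token.startswith('$'):
        return int(token[1:], 16)
    return int(token)


pairs = [('24576', '$6000'), ('24832', '$6100'),
         ('25088', '$6200'), ('25344', '$6300')]
for decimal, hexadecimal in pairs:
    # Both spellings should refer to the same address
    print(decimal, hexadecimal, parse_address(decimal) == parse_address(hexadecimal))
```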

A skeleton disassembly

So if we had a control file for Manic Miner, we could produce a much more useful skool file. As it happens, SkoolKit includes one: manic_miner.ctl. You can use it with sna2skool.py thus:

$ ./sna2skool.py -c src/manic_miner.ctl mm.z80 > mm.skool

This time, mm.skool is split up into meaningful blocks, with code as code, data as data (DEFBs), and text as text (DEFMs). Much nicer.

By default, sna2skool.py produces a disassembly with addresses and instruction operands in decimal notation. If you prefer to work in hexadecimal, however, use the -H option:

$ ./sna2skool.py -H -c src/manic_miner.ctl mm.z80 > mm.skool

The next step is to create an HTML disassembly from this skool file:

$ ./skool2html.py -f mm.skool html

Now open html/mm/index.html in a web browser. There’s not much there, but it’s a base from which you can start adding explanatory comments.

To replace the ‘mm’ in the page titles, we need to give the game a name. This can be done by creating a ref file called mm.ref that contains the following lines:

[Game]
Game=Manic Miner

Then run skool2html.py again to re-generate the HTML. Alternatively, you could create a game logo image (in PNG format) and copy it to html/mm/images/logo.png; the image will be used instead of the game name if it is present.

See Ref files for more information on the sections that may appear in a ref file.

Generating a control file

If you are planning to create a disassembly of some game other than Manic Miner, you will need to create your own control file. To get started, you can use the -g option with sna2skool.py to perform a rudimentary static code analysis of the snapshot file and generate a corresponding control file:

$ ./sna2skool.py -g game.ctl game.z80 > game.skool

This will do a reasonable job of splitting the snapshot into blocks, but you will need to examine the resultant skool file (game.skool in this case) to see which blocks should be marked as text or data instead of code, and then edit the generated control file (game.ctl) accordingly.

By default, sna2skool.py generates a control file and a skool file with addresses and instruction operands in decimal notation. If you prefer to work in hexadecimal, however, use the -h option to produce a hexadecimal control file, and the -H option to produce a hexadecimal skool file:

$ ./sna2skool.py -h -H -g game.ctl game.z80 > game.skool

Blocks whose contents resemble text are given a title like this:

Routine/text? at 26836

Blocks whose contents resemble neither code nor text are given a title like this:

Routine/data? at 26624

Any other blocks are assumed to contain code and are given a title like this:

Routine at 24576

Developing the skool file

When you’re happy that your control file does a decent job of distinguishing the code blocks from the data blocks in your memory snapshot, it’s time to start work on the skool file.

Figuring out what the code blocks do and what the data blocks contain can be a time-consuming job. It’s probably not a good idea to go through each block one by one, in order, and move to the next only when it’s fully documented - unless you’re looking for a nervous breakdown. Instead it’s better to approach the job like this:

  1. Skim the code blocks for any code whose purpose is familiar or obvious, such as drawing something on the screen, or producing a sound effect.
  2. Document that code (and any related data) as far as possible.
  3. Find another code block that calls the code block just documented, and figure out when, why and how it uses it.
  4. Document that code (and any related data) as far as possible.
  5. If there’s anything left to document, return to step 3.
  6. Done!

It also goes without saying that figuring out what a piece of code or data might be used for is easier if you’ve played the game to death already.

Annotating the code and data in a skool file is done by adding comments just as you would in a regular ASM file. For example, you might add a comment to the instruction at 35136 in mm.skool thus:

 35136 DEC (HL)      ; Decrement the number of lives

See the skool file format reference for a full description of the kinds of annotations that are supported in skool files. Note also that SkoolKit supports many skool macros that can be used in comments and will be converted into hyperlinks and images (for example) in the HTML version of the disassembly.

As you become more familiar with the layout of the code and data blocks in the disassembly, you may find that some blocks need to be split up, joined, or otherwise reorganised. You could do this manually in the skool file itself, or you could regenerate the skool file from a new control file. To ensure that you don’t lose all the annotations you’ve already added to the skool file, though, you should use skool2ctl.py to preserve them.

First, create a control file that keeps your annotations intact:

$ ./skool2ctl.py game.skool > game-2.ctl

Now edit game-2.ctl to fit your better understanding of the layout of the code and data blocks. Then generate a new skool file:

$ ./sna2skool.py -c game-2.ctl game.z80 > game-2.skool

This new skool file, game-2.skool, should contain your reorganised code and data blocks, and all the annotations you carefully added to game.skool.

skool2ctl.py preserves annotations by using an extended control file syntax, which is explained in the following section.

Extended control file syntax

Besides the declaration of block types, addresses and titles, the control file syntax also supports the declaration of the following things:

  • Block descriptions
  • Register values
  • Mid-block comments
  • Block end comments
  • Sub-block types and comments
  • DEFB statement lengths in a ‘B’ sub-block

Block descriptions

To provide a description for a code block at 24576 (for example), use the D directive thus:

c 24576 This is the title of the routine at 24576
D 24576 This is the description of the routine at 24576.

Register values

To declare the values of the registers upon entry to the routine at 24576, add one line per register with the R directive thus:

R 24576 A An important value in the accumulator
R 24576 DE Display file address

Mid-block comments

To declare a mid-block comment that will appear above the instruction at 24592, use the D directive thus:

D 24592 The next section of code does something really important.

Block end comments

To declare a comment that will appear at the end of the routine at 24576, use the E directive thus:

E 24576 And so the work of this routine is done.

Sub-block syntax

Sometimes a block marked as one type (code, data, text, or whatever) may contain instructions or statements of another type. For example, a word (w) block may contain the odd non-word here and there. To declare such sub-blocks whose type does not match that of the containing block, use the following syntax:

w 32768 A block containing mostly words
B 32800,3 But here's a sub-block of 3 bytes at 32800
T 32809,8 And an 8-byte text string at 32809
C 32821,10 And 10 bytes of code at 32821 too?

The directives (B, T and C) used here to mark the sub-blocks are the upper case equivalents of the directives used to mark top-level blocks (b, t and c). The comments at the end of these sub-block declarations are taken as instruction-level comments and will appear as such in the resultant skool file.

If an instruction-level comment spans a group of two or more sub-blocks of different types, it must be declared with an M directive:

M 40000,21 This comment covers the following 3 sub-blocks
B 40000,3
W 40003,10
T 40013,8

If the length parameter is omitted from an M directive, the comment is assumed to cover all sub-blocks from the given start address to the end of the top-level block.
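The length given to the M directive in the example (21) is simply the total extent of the sub-blocks it covers. A minimal Python sketch of that arithmetic follows; m_extent is a hypothetical helper name, and the sub-blocks are represented as (directive, address, length) tuples purely for illustration.

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def m_extent(sub_blocks):
    """Return (start, length) spanning all the given sub-blocks."""
    start = min(address for _, address, _ in sub_blocks)
    end = max(address + length for _, address, length in sub_blocks)
    return start, end - start


sub_blocks = [('B', 40000, 3), ('W', 40003, 10), ('T', 40013, 8)]
print(m_extent(sub_blocks))
```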

Three more pieces of sub-block syntax remain. First, the blank sub-block directive:

c 24576 A great routine
  24580,11 A great section of code at 24580

This is equivalent to:

c 24576 A great routine
C 24580,11 A great section of code at 24580

That is, the type of a blank sub-block directive is taken to be the same as that of the parent block.

Next, the address range:

c 24576 A great routine
  24580-24590 A great section of code at 24580

This is equivalent to:

c 24576 A great routine
  24580,11 A great section of code at 24580

That is, you can specify the extent of a sub-block using either an address range, or an address and a length.
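The conversion between the two forms is a matter of simple arithmetic: an inclusive range from A to B covers B - A + 1 bytes. A minimal Python sketch, using a hypothetical helper named parse_extent (not SkoolKit's own code):

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def parse_extent(spec):
    """Convert 'start-end' or 'start,length' to a (start, length) tuple."""
    if '-' in spec:
        start, end = (int(n) for n in spec.split('-'))
        return start, end - start + 1  # the range is inclusive
    start, length = (int(n) for n in spec.split(','))
    return start, length


print(parse_extent('24580-24590'))
print(parse_extent('24580,11'))
```

Both calls should yield the same start address and length.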

Finally, the implicit sub-block extent:

c 24576 A great routine
  24580 A great section of code at 24580
  24588,10 Another great section of code at 24588

This is equivalent to:

c 24576 A great routine
  24580,8 A great section of code at 24580
  24588,10 Another great section of code at 24588

But the declaration of the length (8) of the sub-block at 24580 is redundant, because the sub-block is implicitly terminated by the declaration of the sub-block at 24588 that follows. This is exactly how top-level block declarations work: each top-level block is implicitly terminated by the declaration of the next one.
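The rule for implicit extents can be sketched in Python as follows. The function fill_lengths is a hypothetical helper, and it makes one assumption that goes beyond the text above: a final sub-block with no declared length is taken to run to the end of the parent block.

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def fill_lengths(sub_blocks, block_end):
    """Fill in missing sub-block lengths.

    sub_blocks is a list of (start, length) tuples sorted by start address,
    where length is None if it was not declared. block_end is the address
    just past the end of the parent block.
    """
    filled = []
    for i, (start, length) in enumerate(sub_blocks):
        if length is None:
            # Implicitly terminated by the next sub-block. For the last
            # sub-block, running to the end of the parent block is an
            # assumption made for this sketch.
            if i + 1 < len(sub_blocks):
                next_start = sub_blocks[i + 1][0]
            else:
                next_start = block_end
            length = next_start - start
        filled.append((start, length))
    return filled


print(fill_lengths([(24580, None), (24588, 10)], 24598))
```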

DEFB statement lengths in a ‘B’ sub-block

Normally, a B sub-block declared thus:

B 24580,12 Interesting data

would result in something like this in the corresponding skool file:

24580 DEFB 1,2,3,4,5,6,7,8 ; {Interesting data
24588 DEFB 9,10,11,12      ; }

But what if you wanted to split the data in this sub-block into groups of 3 bytes each? That can be achieved with:

B 24580,12,3 Interesting data

which would give:

24580 DEFB 1,2,3    ; {Interesting data
24583 DEFB 4,5,6
24586 DEFB 7,8,9
24589 DEFB 10,11,12 ; }

That is, in a B directive, the desired DEFB statement lengths may be given as a comma-separated list of numbers following the sub-block length parameter, and the final number in the list is used for all remaining data in the block. So, for example:

B 24580,12,1,2,3 Interesting data

would give:

24580 DEFB 1        ; {Interesting data
24581 DEFB 2,3
24583 DEFB 4,5,6
24586 DEFB 7,8,9
24589 DEFB 10,11,12 ; }
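The way the list of lengths is applied can be sketched in Python. The function split_defb is a hypothetical helper written for this illustration, not SkoolKit's implementation; it only reproduces the rule that the final length in the list is reused for all remaining data.

```python
# Hypothetical helper for illustration only; not part of SkoolKit.
def split_defb(address, data, lengths):
    """Split data into DEFB statements as (address, values) tuples."""
    if not lengths or any(n < 1 for n in lengths):
        raise ValueError('lengths must be a non-empty list of positive numbers')
    statements = []
    offset = 0
    index = 0
    while offset < len(data):
        # Once the list is exhausted, keep using its final entry
        size = lengths[min(index, len(lengths) - 1)]
        chunk = data[offset:offset + size]
        statements.append((address + offset, chunk))
        offset += len(chunk)
        index += 1
    return statements


data = list(range(1, 13))  # the twelve bytes 1..12 used in the examples
for addr, values in split_defb(24580, data, [1, 2, 3]):
    print(addr, 'DEFB', ','.join(str(v) for v in values))
```

With the lengths 1,2,3 this should print statements starting at the same addresses as the skool file excerpt above.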

Adding pokes, bugs and trivia

Adding ‘Pokes’, ‘Bugs’, and ‘Trivia’ pages to a disassembly is done by adding Poke, Bug, and Fact sections to the ref file. For any such sections that are present, skool2html.py will add links to the disassembly index page.

For example, let’s add a poke. Add the following lines to mm.ref:

[Poke:infiniteLives:Infinite lives]
The following POKE gives Miner Willy infinite lives:

POKE 35136,0

Now run skool2html.py again:

$ ./skool2html.py -f mm.skool html

Open html/mm/index.html and you should see a link to the ‘Pokes’ page in the ‘Reference’ section.

The format of a Bug or Fact section is the same, except that the section name prefix is Bug: or Fact: (instead of Poke:) as appropriate.

One Poke, Bug or Fact section should be added for each poke, bug or trivia item to be documented. Entries will appear on the ‘Pokes’, ‘Bugs’ or ‘Trivia’ page in the same order as the sections appear in the ref file.
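As a rough illustration of how such sections are laid out, the following Python sketch splits ref file text into sections and lists the Poke entries in file order. The functions parse_ref and list_entries are hypothetical names, not SkoolKit's own code, and the sketch assumes that the section name has the form Prefix:identifier:Title, as in the example above.

```python
# Hypothetical helpers for illustration only; not part of SkoolKit.
def parse_ref(text):
    """Return a list of (section_name, lines) tuples in file order."""
    sections = []
    name, lines = None, []
    for line in text.splitlines():
        if line.startswith('[') and line.endswith(']'):
            if name is not None:
                sections.append((name, lines))
            name, lines = line[1:-1], []
        elif name is not None:
            lines.append(line)
    if name is not None:
        sections.append((name, lines))
    return sections


def list_entries(sections, prefix):
    """Return (identifier, title) for each section named prefix:identifier:title."""
    entries = []
    for name, _ in sections:
        parts = name.split(':', 2)
        if len(parts) == 3 and parts[0] == prefix:
            entries.append((parts[1], parts[2]))
    return entries


REF = """[Game]
Game=Manic Miner

[Poke:infiniteLives:Infinite lives]
The following POKE gives Miner Willy infinite lives:

POKE 35136,0"""

print(list_entries(parse_ref(REF), 'Poke'))
```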

See Ref files for more information on the format of the Poke, Bug, and Fact (and other) sections that may appear in a ref file.