Reconstucted source code for Duke Nukem II

This is a reconstruction of the source code of the game Duke Nukem II (Apogee Software, 1993), based on disassembly of the original executable. It compiles with era-appropriate toolchains and produces a binary that's 100 % identical to the original one (see Versions of the game).

screenshot of the Borland C++ IDE from 1991 showing some of the source code

This project is meant for documentation and preservation, and as a way to create modified versions targeting DOS (source mods). If you're looking for a way to play the game on modern systems ("source port"), check out RigelEngine.

Background

In order to create RigelEngine, I spent countless hours dissecting the original game's assembly code in IDA Pro. But RigelEngine's code is very different from what the original might have looked like, and many of the low-level systems aren't represented at all since they have been replaced with modern APIs in RigelEngine.

I think it's worthwhile to also preserve the original code as closely as possible, since - to me at least - it's really interesting, educational, and fascinating. With my reverse-engineering work, I had already done the hardest part of recreating the code, it just wasn't in a shape that would be easily accessible/readable to other people: The results were captured in handwritten notes, cryptic pseudocode, comments inside my IDA Pro project, and knowledge in my head.

So this is where this project comes in: Thoroughly commented C & Assembly code, which can actually be compiled into a functional DOS executable using an era-appropriate toolchain.

Acknowledgements

This project owes a lot to Scott Smitelli's excellent Cosmore project. Duke Nukem II is an evolution of the Cosmo game engine, sharing many file formats and pieces of code. Because of this, Cosmore helped me get a headstart on my own recreation project and made certain things significantly easier to figure out.

How close to the original code is this?

TL;DR: The semantics and to some extent also syntax are identical, but the code organization, formatting, and naming of identifiers are completely made up by me.

We know that the semantics of the code are 100% accurate, since the executable built from the source code is byte-for-byte identical to the original binary after LZEXE compression. But how about other aspects of the source? Not everything survives the compilation process, so a decompilation can only approximate the original code to some extent. A lot more can be determined from the assembly code than one might expect at first, though. The game was built using a more simplistic compiler compared to what we have nowadays, and not all optimizations were enabled. Thus, the translation from C code to assembly is very direct, allowing a pretty accurate reconstruction of the constructs used. But of course, there are still limits. In cetain cases, multiple different variations on the same construct produce identical assembly, for example.

Overall, the following aspects are preserved in the compiled machine code:

Control structures like loops, conditions, and switch statements
Order of the branches in if/else statements, ternaries etc.
Order of cases in switch statements
Order of local variable declarations in functions
Size of variables (8-bit, 16-bit, 32-bit)
To some extent, signedness of variables
Order in which functions are defined
Order in which initialized global variables are declared
Division of the code into translation units
Compiler flags, to some extent

The following aspects aren't preserved, and can only be guessed at:

Comments
File names
Formatting, use of optional parantheses etc.
Identifiers like variable, type, and function names
Type aliases
Whether variables are global or function-local static
Usage of enums vs. individual constants (#defines)
Anything involving the preprocessor: macros, conditionally compiled code, header files, etc.
Whether code was written by hand or generated by a tool
Project setup (did it use Makefiles or an IDE project, etc.)
Exact location of global variables (i.e., in which translation unit they are declared)

Structure of the code

The project is structured very similarly to Cosmo (see Cosmore). The bulk of the code is arranged into only two very big translation units, called UNIT1.C and UNIT2.C here (in Cosmore, the files are called game1.c and game2.c). Scott Smitelli's theory is that this was done because the compiler generates slightly more efficient near call instructions for functions in the same translation unit, whereas calling into a separate unit incurs the overhead of a far call.

Also like in Cosmo, a set of low-level assembly routines is included into the C runtime startup code section, by manually assembling and linking a modified version of the startup code (C0.ASM) which has these additional routines included.

Unlike Cosmo, Duke 2 has two additional translation units with sound playback code, at least one of which was almost certainly provided as a compiled library, not as source code. Perhaps both of them were libraries, but it's not possible to tell definitely from the machine code for the other one.

We don't know how exactly the original code was organized at the source level, it might have actually been two very big files, or it might have been split up into multiple files using the Unity build technique.

Since it's impossible to know the original organization, I decided to focus on readability and split the code into multiple files via preprocessor includes. I tried to group related functions together as best as possible, but this wasn't always easy - the original order of functions is often a bit random, with a few completely unrelated functions sometimes appearing in the middle of a larger block of functions that clearly belong together.

To keep the original order intact while still achieving some level of organization, I therefore split up some pieces of code into multiple numbered files like ui1.c, ui2.c etc., so that they can be included in the right places with other unrelated functions appearing in-between.

Useful context & background knowledge

To fully understand the code, it's useful to have some understanding of DOS programming, especially the memory model and the extensions to the C language necessary to support it:

Knowledge of x86 Assembly is also helpful, since there is a bit of inline Assembly sprinkled throughout the code, and some of the low-level graphics routines are completely written in Assembly. This isn't a necessity though, the vast majority of the code is C and I've also tried my best to document the Assembly code thoroughly.

On top of that, the game also directly interfaces with various hardware components, often using Port I/O. This is sometimes done using inline Assembly, but mostly using helper functions.

This was very common in DOS games. DOS itself only offered very limited access to the hardware (primarily filesystem and text mode output). Another option was the BIOS, which provided some additional functionality. But often, the only way to get graphics and sound at high performance was to talk to the hardware directly. As a consequence, the code features a fair bit of hardware interaction, which would normally be implemented in drivers and abstracted behind OS APIs on a modern system. Most prominent is the use of the VGA/EGA interface to display images on screen, followed by the SoundBlaster and AdLib interfaces for sound output. Other hardware components used are generally standard features of any IBM PC (or compatible), like the PC Speaker, keyboard controller, hardware timer, and Joystick port (gameport).

Some relevant resources:

Fabian Sanglard's Game Engine Black Book: Wolfenstein 3D also covers many of these topics in a very easy to understand manner.

Naming conventions

The code makes heavy use of global variables. To help distinguish these from local variables, and generally improve readability, most of them are "namespaced" using short prefixes, like ui, hud, sys (for "system") etc.

Some of the perhaps less obvious prefixes are gm for "game state", and pl for "player state".

Functions are generally not namespaced in this way, with some exceptions:

MM is the memory manager
SB is the SoundBlaster digital audio library
Map is for code that interacts with the world map data during gameplay

Comments

Some comments are prefixed with a tag: [NOTE], [BUG], [HACK], [PERF] for 'performance', or [UNSAFE]. Comments without a tag are explaining the code itself: What it does, how, and why, giving nec

Duke2Reconstructed

Install / Use

README