dosmc: C compiler and assembler to produce tiny DOS .exe and .com executables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dosmc is a C compiler, assembler, linker and librarian for producing tiny DOS .exe and .com executables for the 8086 (16-bit) architecture. It contains and uses the wcc C compiler in OpenWatcom V2 and also NASM, and it has its own C library (libc) and custom optimizing linker for tiny executable output.

Download on Linux and macOS:

$ git clone --depth 1 https://github.com/pts/dosmc
$ cd dosmc
$ ./dosmc --prepare  # Download executables, set up Docker image if needed.

The --prepare command above also compiles the C library (libc) to dosmc.dir/dosmc.lib from its sources in dosmclib/ .

Alternatively, if you don't have Git installed, you can download and extract https://github.com/pts/dosmc/archive/master.zip instead.

Usage:

$ ./dosmc examples/prog.c # Creates examples/prog.exe .

$ ./dosmc -mt examples/prog.c # Creates examples/prog.com .

To try it, run `dosbox examples' (without the quotes), and within the DOSBox window, run prog.exe or prog.com . The expected output is `ZYfghiHello!' (without the quotes).

dosmc is an acronym for Deterministic Optimizing Small Model Compiler, where ``small model'' signifies the 16-bit pointer size and the resulting 64 KiB memory limits (of the executable). The prefix DOS also refers to the target system (MS-DOS and compatible, including DOSBox and FreeDOS).

dosmc is a cross-compiler: you can run it on a modern (32-bit or 64-bit) host system to produce 16-bit DOS executables.

If you want to write tiny DOS .exe and .com executables in assembly instead, see http://github.com/pts/pts-nasm-fullprog

If you want to write tiny Linux i386 executables in C instead, see http://github.com/pts/pts-xtiny

dosmc limitations:

  • Host build system must be Linux i386, Linux amd64 or macOS. On macOS, Docker needs to be installed first. (It's possible to make it work on other Unix systems on which wcc is available.) Porting to Windows (Win32) is underway, and proof-of-concept compilation already works. Porting to FreeBSD should be easy (with Linux compatibility `kldload linux'). Porting to DOS (32-bit, with DOS extenders) may work, but we need Perl first: https://perldoc.perl.org/perldos.html , also Perl 5.8.8 has been ported: https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.2/repos/pkg-html/perl.html . Other host systems are unlikely to work, because OpenWatcom hasn't been ported to them.
  • It depends on Perl (standard packages only).
  • It depends on the wcc C compiler in OpenWatcom V2.
  • Target is DOS 8086 (16-bit) .exe or DOS 8086 (16-bit) .com.
  • Only 2 memory models are supported: tiny for .com executables (maximum size of code + data + stack is ~63 KiB), and small for .exe executables (maximum size of code is ~64 KiB, maximum size of data + stack is ~64 KiB).
  • The supplied C library (libc) is a bit limited: it contains functions for unbuffered file I/O (e.g. open(), read(), write(), lseek(), close()), string manipulation (e.g. strcmp()), character classes (e.g. isspace()) and some control (e.g. exit()). It doesn't contain printf() or malloc(). For most additional functionality, inline assembly with DOS calls (int 21h) should be used.
  • There is no convenient way yet to get the command-line arguments and the environment.
  • There is no stack overflow detector.
  • It can't generate debug info.
  • There is no convenient way to use more than 64 KiB of data, because the C library doesn't have functions which take far pointers.
  • It doesn't support code longer than 64 KiB.
  • It doesn't support 32-bit (i386) code or DOS extenders.
  • It's not possible to run the compilation on DOS yet. To make it happen, the dosmc shell script (and its substantial Perl code for linking) has to be rewritten in C, and the DOS version of wcc.exe from OpenWatcom V2 (uses the DOS extender DOS/4GW) can be used.
  • malloc() or dynamic memory allocation isn't provided; you have to preallocate global arrays to emulate it.
  • Dynamic linking (.dll, .so, shared libraries) is not possible. This is an OpenWatcom limitation for DOS targets.
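Since malloc() isn't provided, one way to emulate it with a preallocated global array is a bump allocator. The following is only a minimal sketch: the names ARENA_SIZE, arena, arena_alloc and arena_reset are hypothetical and not part of the dosmc C library, and the 4096-byte size is an arbitrary assumption that has to fit in the data + stack budget of the memory model.

```c
/* Hypothetical names for illustration; not part of the dosmc C library. */
#define ARENA_SIZE 4096u  /* Assumed size; must be even. */

static char arena[ARENA_SIZE];  /* Preallocated global array. */
static unsigned arena_used;     /* Bytes handed out so far; starts at 0. */

/* Returns a pointer to `size' fresh bytes, or a null pointer if the arena
 * doesn't have enough room left. Sizes are rounded up to an even number so
 * that 16-bit words stay aligned. */
void *arena_alloc(unsigned size) {
  void *result;
  if (size > ARENA_SIZE) return 0;  /* Also keeps the rounding from wrapping. */
  size = (size + 1u) & ~1u;
  if (size > ARENA_SIZE - arena_used) return 0;
  result = arena + arena_used;
  arena_used += size;
  return result;
}

/* Releases everything at once; individual blocks can't be freed. */
void arena_reset(void) {
  arena_used = 0;
}
```

This trades flexibility for size: there is no free list and no per-block header, so the only way to reclaim memory is to reset the whole arena.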

dosmc advantages over wcc and owcc in OpenWatcom:

  • dosmc generates a tiny .exe header, without explicit relocations.
  • dosmc doesn't add several KiB of C library bloat.
  • dosmc doesn't align data to word boundary, thus the executable becomes smaller.
  • dosmc uses the wcc command-line flags to generate small output by default.

It's possible to write inline assembly snippets in your C code using #pragma aux (see dosmc.h for examples) and `__asm { ... }'. However, it's not possible to write entire functions in assembly, because there is no syntax for that in the OpenWatcom C language. Alternatively, you can use entire .asm files as sources (see some in the examples/ directory), in either NASM or WASM syntax.
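As a rough illustration of the #pragma aux style, here is a sketch of a character-output helper built on DOS function 02h (int 21h, character in DL). The names dos_putc and put_string are hypothetical and are not taken from dosmc.h, and the snippet has not been checked against dosmc itself. The #else branch is an assumption added only so that the same file also builds with a hosted C compiler.

```c
#ifdef __WATCOMC__
/* OpenWatcom: the body of dos_putc is the inline assembly given below.
 * The argument arrives in DL; the DOS call may change AX. */
void dos_putc(char c);
#pragma aux dos_putc = \
    "mov ah, 2" \
    "int 21h" \
    parm [dl] modify [ax];
#else
/* Fallback for hosted compilers (assumption, not needed for dosmc). */
#include <stdio.h>
static void dos_putc(char c) { putchar(c); }
#endif

/* Prints a NUL-terminated string one character at a time.
 * Returns the number of characters printed. */
unsigned put_string(const char *s) {
  unsigned count = 0;
  while (*s != '\0') {
    dos_putc(*s++);
    ++count;
  }
  return count;
}
```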

Source file formats:

  • If the extension is .c, then the bundled wcc (OpenWatcom C compiler) is used to create the .obj file (in OMF format).
  • If the extension is .nasm, then the bundled NASM 0.99.06 is used to create the .obj file. NASM is recommended over WASM for writing assembly code, because of its versatility and clean syntax. dosmc also provides some convenience macros (e.g. __LINKER_FLAG) and defaults, see how compact examples/helloc.nasm is. (Also compare examples/helloc2.nasm to examples/helloc2w.wasm for compactness.) It's also possible to write your program in assembly only (no .c code), and use dosmc to compile it to .com or .exe, see examples/com0o1.nasm and examples/helloc.nasm for examples.
  • If the extension is .wasm, then the bundled WASM (OpenWatcom assembler) is used to create the .obj file. Convenience macros are not provided. It's also possible to write your program in assembly only (no .c code), and use dosmc to compile it to .com or .exe, see examples/com0o2.wasm for an example.
  • If the extension is .asm, then dosmc looks at the first directive in the file and autodetects it as .nasm or .wasm.
  • If the extension is .obj, then the file is used as is for linking. The file format is DOS OMF .obj. Typical sources of .obj files: output of wcc (e.g. dosmc -c file.c), output of NASM (e.g. dosmc -c file.nasm), output of WASM (e.g. dosmc -c file.wasm), output of other assemblers (e.g. see examples/helloc2a.asm for MASM, TASM and A86; see examples/helloc2l.asm for LZASM). Most modern assemblers (e.g. YASM and FASM) can't create OMF .obj files, thus are incompatible with dosmc. NBASM uses a different syntax, and we didn't manage to make it produce an .obj file, starting from examples/helloc2a.asm.
  • If the extension is .lib, then the .obj modules stored in the specified static library are used as is for linking. `dosmc -cl' can be used to create a .lib file. .lib files created by other compilers and linkers will probably not work with dosmc. A .lib file is a concatenation of .obj files, with an extra header.

Program entry points for dosmc (choose any):

  • void _start(void) { ... }. Calling exit(0) in the end is optional. Command-line arguments are not parsed or passed. To get the least amount of file size overhead, use _start, use -mt if possible (to generate a .com file), make _start the very first function in the .c file (possibly predeclaring other functions), and have no global variables without initial value (in segment _BSS).
  • int main(void) { ... }. Return exit code (0 means success). Command-line arguments are not parsed or passed.
  • int main(int argc, char **argv) { ... }. Return exit code (0 means success). DOS supports a command line of up to 127 bytes (excluding argv[0], the program name). When parsing this, the dosmc C library splits on spaces and tabs, ignoring quotes and backslashes. This adds 114 bytes of argv parsing code. If you don't need argc or argv, use _start to make the executable smaller.
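The splitting rule described for main(argc, argv) can be sketched as follows. This is not the dosmc C library source, only an illustration of the stated behavior: split_args is a hypothetical helper, it works in place on a writable copy of the command line, and it doesn't model argv[0].

```c
/* Hypothetical helper for illustration. Splits `cmdline' in place on spaces
 * and tabs and stores up to `max_args' pointers in `argv'. Quotes and
 * backslashes are treated as ordinary characters. Returns the number of
 * arguments stored. */
int split_args(char *cmdline, char **argv, int max_args) {
  int argc = 0;
  char *p = cmdline;
  while (*p != '\0') {
    while (*p == ' ' || *p == '\t') ++p;  /* Skip separators. */
    if (*p == '\0' || argc == max_args) break;
    argv[argc++] = p;  /* Start of the next argument. */
    while (*p != '\0' && *p != ' ' && *p != '\t') ++p;
    if (*p != '\0') *p++ = '\0';  /* Terminate the argument in place. */
  }
  return argc;
}
```

With this rule, a quoted argument containing a space is split into two arguments, each keeping its quote character, so programs that need quoting have to reassemble such arguments themselves.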

Global variables without initial value (e.g. `int myvar;') (in segment _BSS) are auto-initialized to 0; the stack isn't initialized.
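A small sketch of what this means in practice, with hypothetical names: a global counter can rely on starting at 0, while anything on the stack has to be set explicitly before it is read.

```c
/* Hypothetical names for illustration. */
int call_count;  /* No initial value: placed in _BSS, so it starts at 0. */

/* Returns 1 on the first call, 2 on the second, and so on. This relies on
 * the automatic zero-initialization of _BSS. */
int next_call_number(void) {
  return ++call_count;
}

/* Sums `size' bytes. The accumulator lives on the stack, which isn't
 * initialized, so it has to be set to 0 explicitly. */
unsigned sum_bytes(const unsigned char *data, unsigned size) {
  unsigned total = 0;
  unsigned i;
  for (i = 0; i < size; ++i) total += data[i];
  return total;
}
```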

What is the minimum executable file size dosmc can produce?

  • For .com output, the theoretical minimum is 1 byte (`ret' instruction), and dosmc produces it for examples/exit0.c and examples/empty_start.c.
  • For .exe output, the theoretical minimum is 28 bytes, because DOSBox refuses to load an .exe (without an error message) if it's shorter than 28 bytes. The .exe header is 28 bytes, but the last 4 bytes are not used if there aren't any relocations. The shortest 8086 code to exit (for .exe files) is 5 bytes, so the minimum with code is 29 bytes (24 + 5), and dosmc produces it for examples/exit0.c, examples/exit42.c and examples/empty_start.c. It's possible to put the 5 bytes of code to the middle of the 28-byte .exe header at the expense of using 317 KiB of conventional memory, but dosmc doesn't waste that much.
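For reference, the 28-byte .exe (MZ) header mentioned above consists of 14 little-endian 16-bit words. The struct below is a sketch for a hosted C compiler with a 16-bit unsigned short. The field names are illustrative rather than taken from dosmc, and the comments on the last two fields restate the claim in the text above.

```c
#include <stddef.h>  /* offsetof */

/* Classic DOS .exe (MZ) header layout. Field names are illustrative. */
struct mz_header {
  unsigned short signature;          /* "MZ", i.e. 0x5a4d. */
  unsigned short last_page_bytes;    /* Bytes used in the last 512-byte page. */
  unsigned short page_count;         /* File size in 512-byte pages, rounded up. */
  unsigned short reloc_count;        /* Number of relocation entries. */
  unsigned short header_paragraphs;  /* Header size in 16-byte paragraphs. */
  unsigned short min_alloc;          /* Minimum extra paragraphs needed. */
  unsigned short max_alloc;          /* Maximum extra paragraphs requested. */
  unsigned short initial_ss;         /* Initial SS, relative to load segment. */
  unsigned short initial_sp;         /* Initial SP. */
  unsigned short checksum;           /* Checksum field. */
  unsigned short initial_ip;         /* Initial IP. */
  unsigned short initial_cs;         /* Initial CS, relative to load segment. */
  unsigned short reloc_table_offset; /* Offset 24: unused without relocations. */
  unsigned short overlay_number;     /* Offset 26: unused without relocations. */
};

/* Compile-time checks: a negative array size makes the build fail. */
typedef char mz_header_is_28_bytes[sizeof(struct mz_header) == 28 ? 1 : -1];
typedef char mz_tail_starts_at_24[
    offsetof(struct mz_header, reloc_table_offset) == 24 ? 1 : -1];
```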

How much overhead does dosmc add?

  • For .com output, the overhead can be as low as 0 bytes, see examples/exit0.c, examples/exit42.c, examples/empty_start.c, examples/hello.c . For examples/hello.c, the output .com file is just 26 bytes, 2 bytes more (because of `push dx' and `pop dx') than hand-optimized assembly.
  • For .exe output, the overhead can be as low as 34 bytes (including the mandatory .exe header of 28 bytes). By some additional code mangling at link time to avoid the `call _start_' and the `ret', the 34 bytes could be decreased to 30 bytes.

The .com, .exe, .lib and .bin output files are deterministic (i.e. you get the same output file if you compile the same input files again), but .obj output isn't, because there is a timestamp in .obj files created by wcc (.c source) and WASM (.wasm and maybe .asm source).

dosmc has an optimizing linker: if it encounters an .obj file which doesn't define any symbols which are currently undefined, then it skips the entire .obj file. If there are unde
