OS/2 Museum

OS/2, vintage PC computing, and random musings

DOS 2.11 From Scratch


Posted on September 21, 2021 by Michal Necasek

Warning: Long post!

After having good luck with rebuilding core PC DOS 1.1 from source code, I thought I’d do the same with the DOS
2.11 source code released by the CHM. What follows is largely a collection of notes that I wrote down while
banging the released source code into shape. That turned out to be a lot harder than building DOS 1.1 for two
reasons.

One is that the released DOS 2.11 source code is a lot more extensive and includes source code for numerous
utilities (CHKDSK, DEBUG, EDLIN, SYS, etc.). The other, bigger reason is that the CHM unfortunately created a
bit of a mess when releasing the code and sorting out the pieces was not trivial.

Microsoft style DOS 2.11 boot


The CHM placed all DOS 2.x related files in just two directories, ‘v20object’ and ‘v20source’. It is now clear that
the files came from at least three distinct sources:

MS-DOS 2.00 OEM distribution disks
MS-DOS 2.11 source code of unknown provenance
Miscellaneous debris such as WordStar 3.20 overlay files

Fortunately for me, Jeff Parsons has done a lot of legwork reconstructing the DOS 2.0 OEM distribution disks.
These disks were clearly an early version of what Microsoft later called OAK (OEM Adaptation Kit). The disks
contain generic DOS 2.0 binaries, with the notable exception of IO.SYS which had to be supplied by the OEM.
There is an example “skeletal” IO.SYS source, together with source for PRINT.COM that OEMs might modify, and
an example OEM source module for FORMAT.COM which OEMs had to write.

There are also development tools (MASM and CREF) on the disks, together with LINK which is part of the DOS
2.0 distribution binaries and was meant to be shipped to end users. And last but not least, there’s fairly extensive
documentation that was meant to aid OEMs in adapting DOS 2.0 to their hardware.

As far as can be ascertained, the CHM released complete and unmodified (except for timestamps perhaps)
contents of the DOS 2.00 OEM distribution disks, with most but not all of the files stored in the v20object
directory. So far so good.

The easy part is the debris, the files WSBAUD.BAS, WSMSGS.OVR, and WSOVLY1.OVR. Those just don’t belong
and are part of WordStar. There’s also an odd collection of .txt files in the v20source directory; those are exact
copies of .DOC files in the v20object directory. The likely intention was to not hurt Windows’ feelings by
presenting it with the difficult task of recognizing that files with the .DOC extension are plain ASCII files. At any
rate, the .txt files are completely redundant.

The hard part is the actual MS-DOS 2.11 source files. First of all, there are several “duplicates”, for example
DOSMAC.ASM and DOSMAC_v211.ASM. CHM never explained what that’s about, but it’s apparent that
DOSMAC.ASM is an older version from the MS-DOS 2.00 OEM distribution disks, while DOSMAC_v211.ASM is
a newer DOS 2.11 file which belongs with the rest of the source code.

Everything is lumped into a single directory, with no hint how to build anything, except for two files named
DOSLINK and COMLINK. Those are clearly LINK input files which show how to link IBMBIO.COM/MSDOS.SYS
and COMMAND.COM. Those are extremely useful because they show exactly what goes into the two largest
DOS 2.11 components and which order the object files are linked in (which naturally matters).

Where Did This Come From?

The ever unreliable Wikipedia claims that “Microsoft made the code to […] a mixture of Altos MS-DOS 2.11 and
TeleVideo PC DOS 2.11 available to the public”, which is at best half right, and probably not even that. The part
about TeleVideo Personal Computer DOS 2.11 is not wrong but it may be misleading (more on that below), and the
Altos bit is a major misunderstanding. The DOS 2.0 OEM disks came with a file called SKELIO.ASM, which is
titled “IO.SYS for the ALTOS ACS-86C.” — in other words, Microsoft provided source code to IO.SYS for Altos
ACS-86C machines as an example of IO.SYS implementation, something that OEMs needed to adapt to their own
hardware. This was DOS 2.0, not 2.11, and the OEM distribution disks were obviously not at all OEM specific.

The bulk of the ‘v20source’ files is indeed the source code to DOS 2.11 which had something to do with TeleVideo.
It is apparent that in late 1983, the IBM PC was important enough that TeleVideo wanted their COMMAND.COM
built with IBMVER set true and MSVER set false—two macros controlling conditional compilation.

The catch is that a COMMAND.COM built with ‘IBMVER’ set to true would by default print an IBM copyright
message starting with “The IBM Personal Computer DOS”. TeleVideo clearly couldn’t use that and the source files
(TDATA.ASM and UINIT.ASM) were modified to say TeleVideo instead of IBM.

Sadly, this was done in the era before the PC/AT, when most PCs had no real-time clock. And so we have most of
the COMMAND.COM source files dated 08/18/1983, but the ones obviously modified for TeleVideo are dated
01/01/1980 because of course the programmer responsible did not bother setting the date when prompted to do
so at system start-up.

Now, it would be tempting to assume that all files dated 01/01/1980 must have been modified for TeleVideo.
Sadly, a single look at the MS-DOS 2.00 OEM distribution disks suffices to convince us that Microsoft was sloppy
and some (only some!) of the source files distributed by Microsoft were likewise dated 01/01/1980.

The upshot is that there’s no easy way to tell which files might have been modified for TeleVideo based on the
timestamp alone. It is obvious some source files were modified but not how many. That’s assuming the files with
1983 timestamps are unmodified, which is of course not a given.

It’s also not clear who modified the files. The only files with obvious TeleVideo modifications are part of
COMMAND.COM, which normally wouldn’t need to be modified by OEMs. Maybe TeleVideo had the
COMMAND.COM source code, but it’s also possible that Microsoft simply modified a couple of strings and built a
custom COMMAND.COM for TeleVideo without ever giving anyone the source code.

The latter is in fact quite likely for one simple reason: The source code released by the CHM contains no TeleVideo
hardware specific code. There’s no IO.SYS source code, no OEM module for FORMAT, no SYS adaptations,
nothing. There’s a lot of source code for utilities that OEMs could not normally modify (CHKDSK, EDLIN,
DEBUG) and none of the source code that OEMs would need to write. It is therefore likely that the code came
from some Microsoft archive and the TeleVideo connection is a bit of a red herring.

In summary, the files published by the CHM are a mixture of MS-DOS 2.00 OEM distribution disks and generic
DOS 2.11 source code with COMMAND.COM modified to say TeleVideo instead of IBM.

Binary Comparisons

Although we know very little about the precise origin and lineage of the released DOS 2.11 source code, we can
guess a thing or two if we compare it against extant OEM DOS 2.x distributions. The obvious target is a disk with
TeleVideo MS-DOS 2.11 that’s floating around, although someone unfortunately managed to lose its IO.SYS and
MSDOS.SYS.

Naturally before comparing anything, the source code needs to be built. Which is not totally trivial, but can be
done. More on that below.

CHM vs. TeleVideo

Let’s see how the rebuilt DOS 2.11 source code provided by the CHM compares with the lone TeleVideo DOS 2.11
disk.

First of all, note that .EXE files tend to exhibit a 4-byte difference in the header at offsets hex 12-13 and 1C-1D. The
word at offset 12h is a checksum, but there’s no agreement on what the word (or dword) at offset 1Ch actually is.
Some claim it is “overlay information”, others indicate that it has varying uses in practice. Microsoft used to say
that it is “offset of symbol table file”. At any rate, this has no bearing on the functionality of DOS 2.0 EXE files.
After some experimentation, I was able to confirm that even the same version of LINK working with the same
input files on the same machine produces different executable files depending on where in memory it is loaded
etc. Most likely the word at offset 1Ch is essentially uninitialized data, but it is also reflected in the checksum at
offset 12h. At any rate, this four-byte difference can be considered a linker artifact and can be safely ignored when
comparing .EXE files.
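
The comparison rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `exe_images_match` and the constant `IGNORED_HEADER_BYTES` are made up here, and the only bytes skipped are the four header bytes the text identifies as linker artifacts.

```python
# Header bytes treated as linker artifacts in the text above: the checksum
# word at offset 12h and the word of uncertain meaning at offset 1Ch.
IGNORED_HEADER_BYTES = frozenset({0x12, 0x13, 0x1C, 0x1D})


def exe_images_match(a: bytes, b: bytes) -> bool:
    """Return True if two EXE images are equal apart from the ignored header bytes."""
    if len(a) != len(b):
        return False
    return all(
        x == y
        for offset, (x, y) in enumerate(zip(a, b))
        if offset not in IGNORED_HEADER_BYTES
    )


# Synthetic stand-ins for two builds of the same program (not real DOS binaries).
first = bytearray(64)
first[0:2] = b"MZ"
second = bytearray(first)
second[0x12:0x14] = b"\x34\x12"  # different checksum word
second[0x1C:0x1E] = b"\xCD\xAB"  # different word at offset 1Ch
print(exe_images_match(bytes(first), bytes(second)))
```

Any difference outside those four bytes, or any difference in file size, still counts as a mismatch.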

Back to the actual comparisons:

CHKDSK.COM, COMMAND.COM, DEBUG.COM, EXE2BIN.EXE, FC.EXE, FIND.EXE, MORE.COM, and
RECOVER.COM all match TeleVideo binaries
EDLIN.ASM has ‘roprot equ true’ even though it’s false in EDLPROC.ASM; setting ‘roprot’ to false in both files
produces EDLIN.COM matching TeleVideo’s
DISKCOPY.COM is different, TeleVideo clearly wrote their own
PRINT.COM and SORT.EXE do not match but are somewhat similar
SYS.COM/FORMAT.COM are very different, TeleVideo clearly wrote their own

Interestingly, PRINT.COM built from the CHM source code is almost identical to the one shipped with Compaq
DOS 2.11. There is a one byte difference — the DOS version check in the CHM source code (DOSVER_HIGH) is
for version 2.11, while Compaq checks for 2.10.

RECOVER.COM on the same Compaq DOS 2.11 disk has the string ‘Vers 1.51’ in it, while the CHM-provided
RECOVER.COM has ‘Vers 1.50’, suggesting that Compaq used slightly newer DOS as a basis. The timestamps also
suggest as much, since the newest source file provided by the CHM is dated 10/20/1983, while Compaq DOS 2.11
has files dated 5/30/1984, about 7 months newer.

EXEFIX

The SORT.EXE utility built from source exhibits additional differences in the EXE header when compared with
the TeleVideo version, even when the actual code/data within the file matches exactly. Apart from the above
mentioned differences at offsets hex 12-13 and 1C-1D which are irrelevant, there are also differences in the
minimum and maximum paragraph allocation fields (hex 0A-0B and 0C-0D). The MS-DOS 3.3 OAK reveals that
after it’s built, SORT.EXE is processed with the EXEFIX utility included in the OAK. This sets the minimum and
maximum paragraph allocation in the EXE header to 1.

Clearly the SORT.EXE utility that at least some OEMs shipped with MS-DOS 2.11 was run through the same or
equivalent EXEFIX utility. This is completely missing from the CHM source code release and since there are no
build recipes, there isn’t even any hint that such a utility existed. However, OEMs must have used it because such
an EXE file header cannot be produced by LINK alone.
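
The effect described above, setting both allocation fields to 1, can be sketched as follows. This is not a reconstruction of EXEFIX itself, whose actual behavior beyond what the text states is unknown; the function name `set_paragraph_allocation` is hypothetical.

```python
import struct

MIN_ALLOC_OFFSET = 0x0A  # minimum paragraph allocation word in the EXE header
MAX_ALLOC_OFFSET = 0x0C  # maximum paragraph allocation word in the EXE header


def set_paragraph_allocation(image: bytes, minimum: int = 1, maximum: int = 1) -> bytes:
    """Return a copy of an EXE image with the min/max allocation words replaced."""
    if len(image) < MAX_ALLOC_OFFSET + 2 or image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    patched = bytearray(image)
    # Both fields are little-endian 16-bit words.
    struct.pack_into("<H", patched, MIN_ALLOC_OFFSET, minimum)
    struct.pack_into("<H", patched, MAX_ALLOC_OFFSET, maximum)
    # The checksum word at offset 12h is deliberately left alone in this sketch.
    return bytes(patched)
```

Applying such a patch to a freshly linked SORT.EXE and then comparing the result against an OEM binary would show whether the allocation fields were the only remaining header difference.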

Building Fun

In the interest of historical accuracy and a perfect match, I always try to find tools old enough that they could
plausibly have been used at the time. While that was not too hard when reconstructing PC DOS 1.1, it turned out to
be quite tricky with DOS 2.11.

Reading the documentation provided with the MS-DOS 2.00 OEM distribution reveals interesting clues.
README.DOC provided on the OEM disks contains the following slightly cryptic note: COMMAND.ASM is
currently too large to assemble on a micro. There is another clue in the DOS 2.11 CHKMES.ASM file: The DOST:
prefix is a DEC TOPS/20 directory prefix. Remove it for assembly in MS-DOS assembly environments using
MASM. Except there is no instance of ‘DOST’ in that file except in the comment. But in two other files,
GENFOR.ASM and PRINT.ASM (older files from the DOS 2.0 OEM disks) there is an odd looking ‘INCLUDE
DOST:DOSSYM.ASM’ directive.

The upshot is that at least in the days of DOS 2.0 (1982 or early 1983), Microsoft built DOS on a DEC TOPS-20
system and not on a PC. We can guess that things changed during DOS 2.1 or 2.11 times, sometime in 1983. For
example EDLIN.ASM contains the following comment dated 7/23/83: Split EDLIN into two seperate [sic]
modules to allow assembly of sources on an IBM PC.

As an aside, building DOS 2.0 on top of PC DOS 1.x would have been exceedingly painful. The source code for just
the DOS kernel alone is about 400 KB in size, well beyond the capacity of the 320K floppies supported by DOS 1.1,
or even the 360K floppies supported by DOS 2.0 for that matter. The time required to assemble all that code on an
8088 CPU was also quite significant. DOS 2.0 at least solved the storage problem thanks to hard disk support.

What is also abundantly clear is that some of the DOS source files push old MASM versions beyond their limits.
Ancient MASM either runs out of memory or hangs/crashes.

The MS-DOS 2.00 OEM disks come with MASM 1.10, which is a useful starting point but nothing more. OEMs may
have used MASM 1.10 to adapt MS-DOS 2.0 for their machines, but they would not build the bulk of DOS 2.0 with
it. They’d build IO.SYS, FORMAT.COM, and perhaps whatever other utilities they wanted to provide, but not
those much larger projects like MSDOS.SYS or COMMAND.COM.

The linker provided on the MS-DOS 2.00 OEM disks (LINK.EXE version 2.00) on the other hand doesn’t cause
any trouble (apart from the annoying EXE header differences). It’s only MASM that is so finicky. Note that LINK
likes to complain that “there was 1 error detected”, namely no STACK segment. This is not a problem for .COM
files and can be safely ignored.

It is worth mentioning that OEMs clearly built DOS 2.11 with LINK version 2.00 or 2.01. Both older (1.xx) and
newer (2.30 and above) versions of Microsoft’s LINK produce more mismatches. It is also notable that LINK 2.00
and 2.01 produce different executables, but the only difference is again in the four EXE header bytes previously
mentioned.

Initially I was able to build most of the DOS 2.11 source code with IBM MASM 2.00 (MASM.EXE dated 7-18-84,
file size 76,544 bytes). This MASM version is slightly too new, though only slightly. While this version in most
instances produces identical code to whatever Microsoft used, it’s just different enough that Microsoft must have
used something a bit older.

For example, when building COMMAND.COM from the source provided by the CHM, the result is the same size
as COMMAND.COM on the TeleVideo DOS 2.11 disk, but not identical. The actual difference is not big but there is
one.

MASM Version Zoo

In the TCODE5.ASM module, on line 512 there is the instruction ‘CMP [SINGLECOM], 0FFF0H’. IBM MASM
2.00 translates that (at offset 28CC in the resulting binary) as 83 3E BC 09 F0 (‘CMP W,[009BC],0F0’), but in the
actual TeleVideo COMMAND.COM file, sadly with a destroyed timestamp, the same instruction was translated as
81 3E BC 09 F0 FF (‘CMP W,[009BC],0FFF0’). That is in fact what, for example, MASM 1.10 produces.

In other words, the newer IBM MASM 2.00 is a little cleverer. It knows that the F0h constant can be sign
extended to produce FFF0h, thus saving one byte in the encoded instruction. This is a useful marker which provides a clue as to
what MASM version Microsoft may have used.
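
The two encodings can be reproduced with a few lines of Python. This is a sketch of the 8086 encoding rule only, not of how any MASM version is implemented; the function name and the `allow_short_form` flag are invented here to stand in for the "older" and "newer" assembler behavior.

```python
import struct


def encode_cmp_word_mem_imm(address: int, immediate: int, allow_short_form: bool) -> bytes:
    """Encode 8086 `CMP WORD PTR [address], immediate` with direct 16-bit addressing."""
    modrm = 0x3E  # mod=00, reg=/7 (CMP), r/m=110 (direct 16-bit address)
    immediate &= 0xFFFF
    low = immediate & 0xFF
    # The short form applies when sign-extending the low byte recreates the full word.
    sign_extended = low | (0xFF00 if low & 0x80 else 0)
    if allow_short_form and sign_extended == immediate:
        # Opcode 83h: 16-bit operand, 8-bit sign-extended immediate.
        return bytes([0x83, modrm]) + struct.pack("<H", address) + bytes([low])
    # Opcode 81h: 16-bit operand, full 16-bit immediate.
    return bytes([0x81, modrm]) + struct.pack("<HH", address, immediate)


print(encode_cmp_word_mem_imm(0x09BC, 0xFFF0, allow_short_form=True).hex(" "))   # 83 3e bc 09 f0
print(encode_cmp_word_mem_imm(0x09BC, 0xFFF0, allow_short_form=False).hex(" "))  # 81 3e bc 09 f0 ff
```

The address 09BCh is taken from the BC 09 bytes quoted above. An immediate such as 1234h cannot be sign extended from a single byte, so both behaviors would emit the longer 81h form for it.
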

IBM MASM 2.00, Microsoft MASM 3.00, 4.00, and later versions all produce the shorter instruction encoding.
MASM versions 1.x produce the longer encoding up to and including MASM 1.25 (1983), but MASM 1.27 (1984)
generates the shorter encoding. (In general, MASM 1.27 seems to differ from 1.25 far more than the version
numbers would suggest.)

MASM 2.04 (1982) runs out of memory when assembling TCODE5.ASM, but otherwise generates the older,
longer encoding.

The real takeaway is that MASM version numbers prior to 3.00 or so are completely meaningless. Microsoft
MASM up to 1.25 and 2.04 exhibits the old behavior when dealing with the FFF0h immediate, while MS MASM
1.27 and IBM MASM 2.0 show the new behavior.

It is probably not a coincidence that MASM 1.27 and IBM MASM 2.0 display 1984 copyright dates, while the other
versions show 1983 or older.

The probable chronological order of MASM versions before 1985 is 1.00, IBM 1.0, 2.04, 1.10, 1.12, 1.25, IBM 2.0,
3.00, 1.27. No, it does not make any sense.

MSDOS Segment Order

I encountered an odd problem with the file MSDOS.ASM. My first attempt was again to build it with IBM MASM
2.00. The assembly succeeded with no errors or warnings, but the resulting MSDOS.SYS was completely
nonfunctional and immediately hung. It turned out that IBM MASM 2.00 generated the segments in the wrong
order. Nasty.

MASM 1.10 (from the DOS 2.00 OEM disks) just hung when assembling the file.

On the other hand, MASM 1.12, 1.25, 1.27, 3.00 and later all assembled MSDOS.ASM without problems and
produced the correct segment order.

In the end I determined that the entire DOS 2.11 source code can be successfully built with MASM 1.25 from 1983,
about the same vintage as the source code. Whether that was actually the version used is anyone’s guess at this
point, but it easily could have been.

Messy Source

Another stumbling block was the file MISC.ASM in the DOS kernel. This file fails to cleanly assemble with any
version of MASM, but the errors it produces vary wildly across MASM versions.

The troublemaker is an instruction on line 432 of MISC.ASM: ‘TEST BYTE PTR [SI+SDEVATT], ISSPEC’. The
ISSPEC symbol is nowhere to be found, which tends to cause an impressive cascade of phase errors in older
MASM versions.

Checking the OAK for DOS 3.21/3.3 reveals the cause of the problem. Newer OAKs have the following in
DEVSYM.INC:

ISSPEC EQU 0010H ;Bit 4 - This device is special

The DEVSYM.ASM from the CHM instead has:

ISIBM EQU 0010H ;Bit 4 - This device is special

The cause of the problem is clear: Somehow the CHM only provided the DEVSYM.ASM from the DOS 2.00 OEM kit,
not the DOS 2.11 version of DEVSYM.ASM matching the rest of the source code. Microsoft must have renamed the
ISIBM equate to ISSPEC between DOS 2.0 and 2.11. Editing DEVSYM.ASM and changing ISIBM to ISSPEC solves
the problem.

Unclean Text

One thing that the CHM did right was providing the DOS source files as a ZIP archive. Naive people might think
that checking the files into a git repo is all it takes, but that would be a terrible mistake, for two reasons. One is
that the timestamps would be lost, and the other is that the source files only look like text files at first glance, but
they really are binary files.

For some reason, many of the source files are padded to a size that is a multiple of 256. The files are padded with
null characters… mostly. Some of the files (e.g. TCODE3.ASM) have an ASCII CR thrown in with the nulls, not
followed by LF as it is elsewhere in the file.

Some files, such as SYSINIT.ASM, have junk null characters at the end, but their size is not a multiple of 256 (or
even a multiple of 8). The null characters in SYSINIT.ASM are in fact followed by a sequence of CR, LF, CR, LF,
ESC. Most likely the file originally had trailing nulls but was later edited and ended up with a newline (and escape)
at the end.

Then there are files that are just plain bizarre. TDATA.ASM contains four instances where CR is not followed by
LF but rather 8Ah, which is LF (0Ah) with the high bit set. MASM appears to strip the high bit and does not mind
at all. I do not know what significance this has, if any, but it certainly does not look like random corruption.
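
The irregularities described in this section are easy to look for mechanically. The following Python sketch counts them in a byte string; the function name and the dictionary keys are hypothetical, and the sample data in the usage note is synthetic rather than taken from the released files.

```python
def describe_text_oddities(data: bytes) -> dict:
    """Count the irregularities described above in a DOS-era source file."""
    stripped = data.rstrip(b"\x00")
    bare_cr = 0
    cr_then_8a = 0
    for i, byte in enumerate(data):
        if byte != 0x0D:
            continue
        following = data[i + 1] if i + 1 < len(data) else None
        if following == 0x8A:
            cr_then_8a += 1  # CR followed by LF with the high bit set
        elif following != 0x0A:
            bare_cr += 1     # CR not followed by LF at all
    return {
        "size_multiple_of_256": len(data) % 256 == 0,
        "trailing_nulls": len(data) - len(stripped),
        # Nulls followed by anything else (a stray CR, a newline) land here.
        "embedded_nulls": stripped.count(b"\x00"),
        "bare_cr": bare_cr,
        "cr_then_8a": cr_then_8a,
    }
```

Reading a file in binary mode and passing its contents to this function avoids any newline translation, which matters precisely because these files only look like text.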

At least some of the source files may have come from a DEC TOPS-20 system or perhaps some other midrange
computer. They may have been copied indirectly, using some kind of remote link. Whatever it was, the source files
do not look like text files created on a PC, where one would expect exact file sizes and likely ESC at the end.
However, some of the source files were almost certainly edited on a machine running DOS.

While MASM is extremely forgiving, many DOS-based text editors are not and may modify the source files in
undesirable ways when editing.

Source Organization

The CHM lumped all the source files into a single directory. It is unclear whether that was how the files were
originally built or not. In later DOS OAKs (3.21, 3.30) there’s a sensible hierarchical structure, but that’s hard to
achieve with an old MASM for one trivial reason: Before MASM 4.00 (1985), there was no way to specify an
include path.

Now, this problem can be easily worked around using the APPEND utility… but that did not exist in the DOS 2.x
days. The APPEND utility only started shipping with DOS 3.3 (1987), although it originally appeared as part of the
IBM PC Network Program (1985).

Maybe all the files were really shoved into one gigantic directory, or maybe I’m missing something.

FORMAT

The source code provided by the CHM does not allow building a functional FORMAT.COM; as mentioned above,
there is no OEM specific code, and FORMAT requires an OEM-provided module that’s typically called
OEMFOR.ASM. Said module has to provide several routines: INIT, DISKFORMAT, BADSECTOR, DONE, and
WRTFAT, plus miscellaneous variables.

I decided to take PC DOS 2.1 FORMAT.COM as a basis and reconstruct OEMFOR.ASM based on that. It turned
out that when IBMVER is defined in the CHM-provided source code and OEMFOR.ASM is reconstructed, the
result is an almost perfect match for PC DOS 2.1. There is one byte difference at offset 16h in the file. For reasons
that are not obvious, the source code defines DOSVER_HIGH as 020Bh (2.11) while PC DOS 2.1 defines it as
0200h (2.0). The upshot is that normally FORMAT.COM would require DOS version 2.11 or higher, but IBM’s
version requires DOS 2.0 or higher. The FORMAT.COM binary shipped with PC DOS 2.1 could have been built
from different/modified source or it could have been patched after building.
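
To see why the two version constants differ in exactly one byte, it helps to split the word into its halves. The sketch below assumes the layout implied by the values quoted above (major version in the high byte, minor version in the low byte); the function names are made up for illustration.

```python
def decode_dos_version(word: int) -> str:
    """Split a DOSVER_HIGH-style word into major (high byte) and minor (low byte)."""
    major, minor = word >> 8, word & 0xFF
    return f"{major}.{minor:02d}"


def differing_bytes(word_a: int, word_b: int) -> int:
    """Count how many bytes differ between two 16-bit words stored little-endian."""
    a, b = word_a.to_bytes(2, "little"), word_b.to_bytes(2, "little")
    return sum(x != y for x, y in zip(a, b))


print(decode_dos_version(0x020B))       # 2.11
print(decode_dos_version(0x0200))       # 2.00
print(differing_bytes(0x020B, 0x0200))  # 1
```

Only the low (minor version) byte changes between the two constants, which is consistent with a single differing byte in the file.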

While reconstructing OEMFOR.ASM, I learned that IBM’s FORMAT.COM uses an unpublished interface to the
IBMBIO.COM module. IBM’s FORMAT looks at the very first word of loaded IBMBIO (at 70:0) and takes that to
be an offset to a hard disk table internal to IBMBIO. There are BPBs of up to two hard disks which FORMAT uses
to obtain hard disk geometry.

I also learned that IBM’s FORMAT.COM is a bit lazy and when it finds any problem (a track that won’t format or
verify without error), it reports the entire track as bad and does not attempt to report individual bad sectors
(which the generic format code can deal with). Back in the day, that was motivation for cleverer third-party
utilities.

The documented FORMAT /B switch is interesting in that it creates a floppy with 8 sectors per track (either
single- or double-sided) which is not bootable but can be made bootable under either DOS 1.x or 2.x using the SYS
command. FORMAT creates a disk with “bogus” IBMBIO.COM (1,920 bytes) and IBMDOS.COM (6,400 bytes) big
enough for PC DOS 1.1. It’s not big enough for both IBMBIO.COM and IBMDOS.COM in PC DOS 2.x, but it’s
more than enough for IBMBIO.COM, and in DOS 2.x IBMBIO.COM can load a non-contiguous IBMDOS.COM.

There’s also an interesting generic /O switch which was not documented by IBM but was documented by
Microsoft: The /O switch causes FORMAT to produce an IBM Personal Computer DOS version 1.X compatible
disk. The /O switch causes FORMAT to reconfigure the directory with an 0E5 hex byte at the start of each entry
so that the disk may be used with 1.X versions of IBM PC DOS, as well as MS-DOS 1.25/2.00 and IBM PC DOS
2.00. This switch should only be given when needed because it takes a fair amount of time for FORMAT to
perform the conversion, and it noticably[sic] decreases 1.25 and 2.00 performance on disks with few directory
entries.

This refers to exactly the one difference between PC DOS 1.1 (aka DOS 1.24) and the released MS-DOS 1.25 source
code: Version 1.25 (and 2.x) stops searching a directory when it encounters an entry starting with zero, while older
versions do not and all unused/deleted entries must start with 0E5h.
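
The performance remark in Microsoft's documentation follows directly from that difference. The sketch below models a directory as a sequence of 32-byte entries and counts how many entries an unsuccessful search has to examine under each rule. It is a simplified model, not DOS code; the function names and the entry count of 112 are illustrative assumptions.

```python
ENTRY_SIZE = 32    # size of one FAT directory entry in bytes
NEVER_USED = 0x00  # first byte of an entry that has never been used
DELETED = 0xE5     # first byte of a deleted entry, or of every entry after FORMAT /O


def entries_examined(directory: bytes, stop_at_zero: bool) -> int:
    """Count the entries an unsuccessful search looks at before giving up."""
    examined = 0
    for offset in range(0, len(directory), ENTRY_SIZE):
        examined += 1
        if stop_at_zero and directory[offset] == NEVER_USED:
            break  # 1.25/2.x behavior: a zero first byte ends the search
    return examined


def empty_directory(entry_count: int, fill: int) -> bytes:
    """Build a directory of unused entries whose first byte is `fill`."""
    return (bytes([fill]) + bytes(ENTRY_SIZE - 1)) * entry_count


normal = empty_directory(112, NEVER_USED)
compatible = empty_directory(112, DELETED)
print(entries_examined(normal, stop_at_zero=True))      # 1
print(entries_examined(compatible, stop_at_zero=True))  # 112
```

On a disk formatted with /O, the early exit never triggers, so a newer DOS scans the whole directory just as an older one would.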

Unsurprisingly, IBM’s FORMAT.COM includes boot sectors for both DOS 2.x and 1.x in order to create disks that
can be made bootable under DOS 1.x.

IBMBIO.COM

As with PC DOS 1.1, I set out to reconstruct IBMBIO.COM source code. Unlike the DOS 1.x case, I was not able to
reproduce an identical IBMBIO.COM file.

The reason is that unlike DOS 1.x, DOS 2.x IBMBIO.COM/IO.SYS includes a relatively large module called
SYSINIT provided by Microsoft. This was normally provided to OEMs in the form of an object file (SYSINIT.OBJ),
as seen on the MS-DOS 2.0 distribution disks.

The SYSINIT module was not hardware specific but it was responsible for initialization that needed to be
performed before the DOS kernel (IBMDOS.COM) could run. SYSINIT was also responsible for loading
IBMDOS.COM and for processing CONFIG.SYS and loading device drivers.

The trouble is that SYSINIT.ASM provided by the CHM in source form is too new for PC DOS 2.1. It notably
includes support for the COUNTRY statement in CONFIG.SYS, which was not part of PC DOS 2.1.

There’s also SYSINIT.OBJ provided on the MS-DOS 2.0 OEM disks, but that is not suitable either because it was
built with IBMVER set to FALSE and MSVER TRUE. One difference is that the MSVER variant of SYSINIT calls a
function called RE_INIT (provided by the OEM) at the end of its initialization phase, while the IBM variant has
no such function at all. Presumably this was something OEMs other than IBM needed.

Combining an OEM BIOS module matching PC DOS 2.1 IBMBIO.COM with the SYSINIT.ASM from DOS 2.11
fortunately produces perfectly satisfactory results, and I was able to get a 100% match on the reconstructed OEM
specific part of IBMBIO.COM.

Puzzling out IBMDOS.COM/MSDOS.SYS


Building a DOS kernel matching some known existing binary turned out to be remarkably difficult. Not least
because somehow the CHM “forgot” one source file, IO.ASM. Fortunately, John Elliott already reconstructed it,
saving me quite a bit of boring work. Thanks!

My first IBMDOS.COM target was PC DOS 2.1 but I gave up after realizing that IBM must have used somewhat
different and almost certainly older source code with numerous minor differences.

Reproducing IBMDOS.COM from Compaq DOS 2.11 seemed more promising. The source code matches what
Compaq shipped fairly closely, but there are major differences in the initialization code. For reasons that are very
unclear, Compaq’s IBMDOS.COM includes quite a bit of hardware specific initialization code that really should
have been in IBMBIO.COM.

Compaq also has additional code in the Ctrl-C logic (CTRLC.ASM) which invokes INT 17H. Again, this is code that
should be in IBMBIO.COM. Compaq was obviously able to modify DOS significantly more than a typical OEM
could, and the modifications suggest that unlike other OEMs, Compaq probably had the full DOS source code.

It is also notable that unlike the majority of MS-DOS 2.11 OEMs, but like IBM, Compaq built the DOS kernel with
the IBM switch set to TRUE and MSVER set to FALSE.

Given the unexpected amount of hardware specific code in Compaq’s IBMDOS.COM and complete lack of
TeleVideo’s IBMDOS.COM, I then decided to reproduce a MSDOS.SYS from one of the other OEM MS-DOS 2.11
releases.

After checking a couple of OEM DOS 2.11 releases (Corona, Eagle, Tandy, Wyse) I realized that many of them have
near-identical MSDOS.SYS, with a file size of 17,176 bytes (note that in some cases, OEMs call the file
IBMDOS.COM; that is not relevant).

There are interesting differences between those releases. For example Eagle and Wyse differ in one single byte at
offset 5D5h. Eagle clearly didn’t want the ‘HEADER’ message to be displayed and set the first byte of the sign-on
message to ‘$’, probably through binary patching.

Tandy and Wyse shipped 100% identical MSDOS.SYS.


Corona’s MSDOS.SYS exhibits two differences: At offset BF2h, Corona has 3Bh instead of FFh. This is the ‘OEM
number’ assigned by Microsoft which most OEMs clearly didn’t bother with. Note that Microsoft documented (in
DOSPATCH.TXT) how to patch the OEM number in an existing MSDOS.SYS. At offset 3639h, there is a difference
in the ‘CANCEL’ character defined in STDSW.ASM (Corona sets it to 18h, the others to 1Bh).
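
Differences like these are found with a plain byte-wise comparison of files of equal size. A minimal Python sketch follows; the function name is hypothetical, and the demonstration uses synthetic buffers that merely mimic the offsets and values quoted above, not the real OEM files.

```python
def differing_offsets(a: bytes, b: bytes) -> list:
    """List (offset, byte_in_a, byte_in_b) for every position where two images differ."""
    if len(a) != len(b):
        raise ValueError("images differ in size; a byte-wise comparison is not meaningful")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Synthetic 17,176-byte buffers standing in for two OEM kernels.
generic = bytearray(17176)
generic[0xBF2] = 0xFF    # default OEM number
generic[0x3639] = 0x1B   # CANCEL character used by most OEMs
corona_like = bytearray(generic)
corona_like[0xBF2] = 0x3B
corona_like[0x3639] = 0x18

for offset, old, new in differing_offsets(bytes(generic), bytes(corona_like)):
    print(f"{offset:04X}h: {old:02X}h -> {new:02X}h")
```

With real files, each reported offset can then be looked up in the linker map or a disassembly to identify which source-level constant it corresponds to.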

While trying to produce a MSDOS.SYS matching the OEM releases, I kept stumbling over the IBM and MSVER
defines. I just could not figure out how they should be set because no combination produced satisfactory results.

Then I finally realized that the DOS 2.11 OEM distribution kits almost certainly shipped MSDOS.SYS in the form
of object files, only DOSMES.ASM was likely provided in source form. The object files would have been built with
IBM set to FALSE and MSVER TRUE. But OEMs could easily build DOSMES.ASM with the defines flipped
around.

So I tried that… and bingo! If all source files are built with IBM FALSE and MSVER TRUE, while only
DOSMES.ASM is built with IBM TRUE and MSVER FALSE, the resulting MSDOS.SYS is nearly identical to the
OEM files. It’s identical with Corona MSDOS.SYS except for the OEM number, and it only differs from Tandy and
Wyse MSDOS.SYS in the previously mentioned ‘CANCEL’ character at offset 3639h (the CHM-provided source
builds it as 18h, the others have 1Bh). I consider that a success.

There is one curious difference between DOS built with IBM set to TRUE vs. FALSE. In the IBM variant, the code
for the EXEC system call (INT 21h/4Bh) is built into COMMAND.COM while the non-IBM variant has it in
MSDOS.SYS. The rationale is unclear, except the “IBMVER” style EXEC matches PC DOS 1.x where EXE file
loading logic resided in COMMAND.COM.

The upshot is that an IBMDOS.COM built with IBMVER set to TRUE had better be matched with a
COMMAND.COM also built with IBMVER set TRUE, or the EXEC functionality will be missing.

There is a similar dependency with IBMBIO.COM/IO.SYS; if the DOS kernel is built with MSVER set to TRUE
and includes EXEC logic, the BIOS SYSINIT module can use it to load COMMAND.COM. But when
IBMDOS.COM is built with IBMVER set TRUE, IBMBIO.COM must include its own minimal EXEC
implementation. A BIOS module built with IBMVER can be used with a DOS kernel built with MSVER, but not
vice versa.
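
The dependencies between the three components can be summarized as a small check. This sketch encodes only the rules stated above; the labels "IBM" and "MS" and the function name are invented for illustration. The combination of a Microsoft-style kernel with an IBM-style COMMAND.COM is not discussed in the text, so the sketch reports nothing for it.

```python
def exec_problems(kernel: str, command: str, bios: str) -> list:
    """Report EXEC-related mismatches for a combination of build styles.

    Each argument is "IBM" or "MS", standing for IBMVER and MSVER builds.
    """
    for style in (kernel, command, bios):
        if style not in ("IBM", "MS"):
            raise ValueError(f"unknown build style: {style!r}")
    problems = []
    if kernel == "IBM":
        # An IBM-style kernel carries no EXEC code of its own.
        if command != "IBM":
            problems.append("COMMAND.COM has no EXEC code, so EXEC will be missing")
        if bios != "IBM":
            problems.append("BIOS has no minimal EXEC and cannot load COMMAND.COM")
    return problems


print(exec_problems("MS", "MS", "IBM"))  # []
print(len(exec_problems("IBM", "MS", "MS")))  # 2
```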

Microsoft vs. IBM


For reasons that may be lost to the mists of time, Microsoft very early on started maintaining those two versions of
DOS, which might be called IBM style and Microsoft style. During building, the desired version was typically
selected by defining either IBMVER or MSVER, as previously mentioned.

In some cases, the IBM version included PC hardware specific logic, such as timer code in PRINT.COM or
interrupt controller tweaks in DEBUG.COM. In some cases, the code was adapted to IBM PC conventions, such as
the use of function keys for line editing in the DOS kernel.

IBM style DOS 2.11 boot

Some of the differences were rather non-obvious, like placing the EXEC functionality into either MSDOS.SYS
(Microsoft style) or COMMAND.COM (IBM style) as detailed above. It is likely that Microsoft considered the MS-
style behavior sensible, but IBM had some reason to insist on the IBM-style variant.

Most OEM releases of MS-DOS 2.11 were built Microsoft style, and that’s also what Microsoft provided on OEM
distribution disks (clearly visible in the case of the MS-DOS 2.00 OEM distribution disks provided by the CHM).
Compaq was a notable exception and built their DOS IBM style. TeleVideo likewise used IBM-style
COMMAND.COM and other utilities (and presumably the DOS kernel, too, even if that has not been preserved).

As noted above, OEMs liked to mix things up. At minimum Corona, Eagle, Tandy, and Wyse all built MSDOS.SYS
(whether they named it MSDOS.SYS or IBMDOS.COM) with the DOSMES module assembled in the IBM style.
Rather strange is the case of DEBUG.COM. At least Corona, Eagle, and Tandy all shipped identical DEBUG.COM
with the SYSVER equate set to TRUE in DEBMES.ASM, even though the rest of the code was built with SYSVER
set FALSE. As a result, the OEM versions of DEBUG.COM included two redundant messages (BADDEV and
BADLSTMES) which could never be shown.

The IBM style version of MORE.COM used 25 lines, while Microsoft style used 24 lines (recall that IBM’s 25-line
screens were unusual, with 24-line terminals being standard at the time). The IBM version of MORE.COM
queried the screen width from the BIOS (INT 10h/0Fh). The versions also differed in control character handling:
IBM style MORE.COM printed them except for BEL, Microsoft style did not print them at all.
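
The MORE.COM differences can be summarized in a rough Python model. This is only a sketch of the behavior described above, not a translation of the actual assembly source: the function names are made up for illustration, and passing TAB, CR, and LF through in both styles is an assumption of this sketch rather than something taken from the source.

```python
BEL = 0x07

# Assumption of this sketch: TAB, LF and CR are always passed through,
# because they control layout rather than being "printed" as glyphs.
LAYOUT_CHARS = {0x09, 0x0A, 0x0D}


def screen_lines(ibm_style: bool) -> int:
    """Page length assumed by MORE.COM: 25 lines IBM style, 24 Microsoft style."""
    return 25 if ibm_style else 24


def more_filter(data: bytes, ibm_style: bool) -> bytes:
    """Return the bytes that would reach the screen in this simplified model."""
    out = bytearray()
    for b in data:
        if b >= 0x20 or b in LAYOUT_CHARS:
            out.append(b)          # ordinary text and layout characters
        elif ibm_style and b != BEL:
            out.append(b)          # IBM style: control characters shown, except BEL
        # Microsoft style: remaining control characters are dropped
    return bytes(out)
```

For example, `more_filter(b"A\x07B\x01C\r\n", ibm_style=True)` keeps the 01h byte but drops BEL, while the Microsoft style call drops both.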

By changing the IBMVER/MSVER constants, it is possible to build binaries that are an extremely close match for
e.g. Tandy 1000 MS-DOS 2.11 (files dated 10/20/1984) from otherwise unmodified source code provided by the
CHM.
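
How close a rebuilt binary is to an OEM original can be checked with a simple byte-for-byte comparison. The following Python sketch is one way to do that; the helper name is hypothetical and the sample bytes in the usage note are invented purely for illustration, not taken from any real DOS binary.

```python
from typing import List


def differing_offsets(built: bytes, reference: bytes) -> List[int]:
    """Return every offset at which the two images differ.

    Offsets past the end of the shorter image are all reported as
    differences, so a size mismatch is never silently ignored.
    """
    common = min(len(built), len(reference))
    diffs = [i for i in range(common) if built[i] != reference[i]]
    diffs.extend(range(common, max(len(built), len(reference))))
    return diffs
```

Calling `differing_offsets(b"\x01\x02\x03", b"\x01\xff\x03\x04")` reports offsets 1 (a changed byte) and 3 (the extra trailing byte). In practice the two arguments would be the contents of the freshly built file and the OEM file, read in binary mode.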

It is apparent that over time, as OEM hardware trended towards a high degree of PC compatibility, IBM-style DOS
became dominant. But in the MS-DOS 1.x and 2.x days, OEMs were much more likely to ship Microsoft-style DOS,
and OEMs like Compaq that desired a high degree of IBM compatibility were the exception.

Code Commentary

Comparing the DOS 1.1 source with DOS 2.11, it is obvious that DOS 2.0 was a major update and that almost the
entire core of the operating system was either heavily modified or written from scratch.

The list of user-visible changes was accordingly quite significant. Hierarchical directory structure, support for
hard disks, handle-based file I/O modeled on UNIX, environment variables, I/O redirection, loadable device
drivers, system configuration via CONFIG.SYS—those were all big changes, largely designed to take DOS further
away from CP/M and much closer to UNIX.

On the source code level, it’s apparent that DOS 2.0 additions were all developed with the MASM assembler in
mind. The code relies on numerous not entirely trivial macros which do not necessarily make the source any
easier to understand (an echo of C++ templates). There is a clear trend away from old upper-case only assembly
code with short identifiers and towards lower case code with mixed-case and sometimes quite long (over 20
characters) identifiers.

It’s probably also fair to say that DOS 2.0 was the last major rewrite of DOS. In many ways, DOS 2.0 is closer to
DOS 6.x than it is to DOS 1.x. There were many changes and improvements since then, but nothing even remotely
approaching the level of fundamental changes that occurred between DOS 1.x and 2.0.

Putting It All Together

As ought to be apparent from the preceding paragraphs, massaging the source code provided by the CHM into a
buildable and functional form is not a trivial task, but it can be done, and the result is here. Here’s what I did with
the DOS 2.11 source files provided by the CHM:

Organized source files into a directory structure that matches DOS 3.21/3.3 and later
Added John Elliott’s reconstructed IO.ASM/IO2.ASM (via pcjs.org), merged into a single and slightly reduced
IO.ASM
Duplicated source for Microsoft style DOS into parallel directories (MSDOS vs. DOS, CMDMS vs. CMD)
Replaced far too broken MASM 1.10 with MASM 1.25
Added EXEFIX.EXE from DOS 3.3 OAK (used for SORT.EXE)
Kept LINK.EXE and EXE2BIN.EXE provided by the CHM
Added batch files to build source files, in either IBM (MK.BAT) or Microsoft style (MKMS.BAT)
Reconstructed OEM portion of IBMBIO.COM and FORMAT.COM to match PC DOS 2.1
Made a handful of trivial changes to the CHM-provided source code, as detailed above

The build environment is not self-hosting due to the dependency on the APPEND utility which was not available
in DOS 2.x days. The source was successfully built on PC DOS 2000, in Windows XP, and in 32-bit Windows 7. It
should build in any DOS 3.21/DOS 3.3 or newer environment with a functioning APPEND utility.

The result is a limited version of DOS 2.11. Included is core DOS, i.e. IBMBIO.COM, IBMDOS.COM,
COMMAND.COM, as well as the following utilities: CHKDSK, DEBUG, DISKCOPY, EDLIN, EXE2BIN, FC, FIND,
FORMAT, MORE, PRINT, RECOVER, SORT, SYS. This makes for a fairly minimal but fully functional DOS 2.11
environment.

The files can be copied over an existing bootable DOS 2.x disk. Care must be taken that the system file names
match what the boot sector expects (IO.SYS plus MSDOS.SYS vs. IBMBIO.COM plus IBMDOS.COM). Note that
the system files can be renamed, but IBMBIO.COM/IO.SYS must be a contiguous file at the start of the disk’s data
area (i.e. occupying the first few clusters right after the root directory).

Perhaps the most significant missing piece is FDISK. No attempt was made to reconstruct FDISK source code
because FDISK was provided entirely by OEMs, with no Microsoft source code (unlike FORMAT and SYS), or at
least not until DOS 3.2 in 1986. More or less any FDISK utility from an existing DOS 2.x release should be usable.
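
The placement requirement for IBMBIO.COM/IO.SYS mentioned above can be made concrete with a little arithmetic. The Python sketch below assumes the standard 360K floppy layout (512-byte sectors, 2 sectors per cluster, 1 reserved sector, two 2-sector FATs, 112 root directory entries); the `Geometry` class and the function names are hypothetical helpers, not part of any DOS tool.

```python
from dataclasses import dataclass
from typing import Sequence

DIR_ENTRY_SIZE = 32  # bytes per FAT directory entry


@dataclass(frozen=True)
class Geometry:
    """Hypothetical container for FAT12 layout parameters (defaults: 360K floppy)."""
    bytes_per_sector: int = 512
    sectors_per_cluster: int = 2
    reserved_sectors: int = 1
    fat_count: int = 2
    sectors_per_fat: int = 2
    root_entries: int = 112


def root_dir_sectors(g: Geometry) -> int:
    """Sectors occupied by the fixed-size root directory, rounded up."""
    total_bytes = g.root_entries * DIR_ENTRY_SIZE
    return (total_bytes + g.bytes_per_sector - 1) // g.bytes_per_sector


def first_data_sector(g: Geometry) -> int:
    """Logical sector where the data area (cluster 2) begins."""
    return g.reserved_sectors + g.fat_count * g.sectors_per_fat + root_dir_sectors(g)


def starts_data_area_contiguously(cluster_chain: Sequence[int]) -> bool:
    """True if the file occupies clusters 2, 3, 4, ... with no gaps.

    FAT cluster numbering starts at 2, so cluster 2 is the first
    cluster right after the root directory.
    """
    expected = list(range(2, 2 + len(cluster_chain)))
    return len(cluster_chain) > 0 and list(cluster_chain) == expected
```

With the default 360K parameters, the data area starts at sector 12 (1 boot sector + 4 FAT sectors + 7 root directory sectors), so the system file has to occupy clusters 2, 3, 4, and so on without gaps.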

In closing, it is excellent that the CHM and Microsoft were able to release the historic DOS 2.11 source code. It is a
shame that (not counting the OEM-provided bits) the code was only 99% complete, making it highly non-trivial to
build functioning binaries.

This entry was posted in Development, DOS, Microsoft, PC history.

26 Responses to DOS 2.11 From Scratch

zeurkous says:
September 21, 2021 at 10:52 pm

That wasn’t a long post; it was of the right length to give some real
insight.

It’s funny how you once complained that there’s so much material on
5.0 out there, in the light of how little actually changed between
versions. Mainly hot air?

Yuhong Bao says:


September 22, 2021 at 10:31 am

At the time MS was already talking about Windows, and the internal DOS data structures had only one difference between
IBMVER and MSVER. Later on IBMCOPYRIGHT was added as a variation of IBMVER.

Michal Necasek says:


September 22, 2021 at 10:46 am

I’d say the “core” DOS (IO.SYS/MSDOS.SYS/COMMAND.COM) did not dramatically change since DOS 2.0. The most
significant change was probably file system redirection support in DOS 3.0/3.1, and there were capacity changes like larger
disk support in DOS 3.31/4.0 or high memory in 5.0. But on the application API level, nothing nearly as dramatic as DOS
2.0.

That said, there were lots of changes on the periphery. High memory support, UMBs, EMM386, DOS Shell, then all the
goodies like disk compression, defragmentation, etc. etc. One could say that the focus shifted from expanding the capabilities
of DOS to fighting its deficiencies, but that didn’t really make the amount of user-visible changes any smaller.

And then there was the whole background story with Microsoft, IBM, and DRI. That had a pretty big influence on DOS, even
though it was indirect.

Yuhong Bao says:


September 22, 2021 at 11:11 am

Sadly one of the differences between IBMVER and MSVER was CurrentPDB, which had to be put after OEM_HANDLER
which was MSVER only.

zeurkous says:
September 22, 2021 at 2:57 pm

@Necasek:
Me’s aware of how much the goodies changed. Mesupposes that much of the
material is about those goodies, then…?

Or mainly the drama you mentioned

Michal Necasek says:


September 22, 2021 at 3:00 pm

Both. But the “drama” as you call it is much harder to figure out, because of course there are various conflicting stories.

vbdasc says:
September 23, 2021 at 12:06 am

“I’d say the “core” DOS (IO.SYS/MSDOS.SYS/COMMAND.COM) did not dramatically change since DOS 2.0.”

Yes, if we don’t count the “European MS-DOS 4.0” a.k.a. Multitasking MS-DOS. I’d say that the differences between it and
the main DOS line were pretty dramatic.

Yuhong Bao says:
September 23, 2021 at 1:22 am

Windows was widely considered vaporware, but there is a reason why MS was talking about Windows in 1983.

Michal Necasek says:


September 23, 2021 at 1:37 am

Oh certainly. Both multitasking DOS 4.0 and OS/2 (once called DOS 5.0) were direct offshoots of DOS, derived from the DOS
2.x/3.x line. But multitasking DOS 4.0 was dead on arrival and OS/2 was already very different from DOS on its initial
release, even if at first glance it may not have looked very different at all.

I was talking about the product that was called DOS and ended somewhere with MS-DOS 6.22/PC DOS 2000 as a standalone
product, not counting the remnants that survived in Windows 9x.

Yuhong Bao says:


September 23, 2021 at 4:35 am

I wonder how many actually used the OEM_HANDLER BTW.

Yuhong Bao says:


September 23, 2021 at 6:28 am

Think about it, I wonder how net2.com/anet2.obj in NetWare worked, especially regarding EXEC which was different
between MSVER and IBMVER

Richard Wells says:


September 23, 2021 at 6:47 am

The DEC Rainbow includes firmware function INT 1Ch that relocates INT 20h thru 27h over to INT A0h to A7h. The
Rainbow MS-DOS versions might be a good place to look for OEM_HANDLER. Another strange thing with the Rainbow is
“All diskette IOCTL-type functions are invoked using INT 65H. This is instead of
the usual MS-DOS IOCTL function 44H with INT 21H. because using function 44H would cause drive motor problems.”

Another offshoot of the DOS lineage is Handheld DOS which mostly follows standard DOS except that the functions are
launched through INT 42h and some additional functions were added to handle the battery backed memory storage.

Minuszerodegrees has a copy of manual and disks for the Tallgrass Hardfile which is a hard disk plus tape drive product that
can be installed in a 5150 with drivers to run with PC DOS 1 or 2 or CP/M-86. Programs are included that replace CHKDSK
and FDISK. Tallgrass was one of the companies whose controllers stopped working after IBM patched DOS 3.1.

Michal Necasek says:


September 23, 2021 at 2:44 pm

The “OEM handler” was subfunctions of INT 21h, not a separate interrupt. So far I haven’t found any OEM documenting its
usage but there must have been someone.

The Tallgrass thing is interesting, the TGTBIO.COM on the minuszerodegrees disk images looks like a normal DOS 2.x+
device driver. I’m guessing that for DOS 1.x, they replaced IBMBIO.COM.

Richard Wells says:


September 23, 2021 at 7:03 pm

Yeah, it does help to read the code closely to see what function call was assigned. The RBIL shows some OSes that did use
OEM_HANDLER.

INT 21 - DOS v2.11-2.13 - SET OEM INT 21 HANDLER
        AH = F8h
        DS:DX -> OEM INT 21 handler for functions F9h to FFh
              FFFFh:FFFFh disables OEM handler
Notes:  this function is known to be supported by Toshiba T1000 ROM MS-DOS
          v2.11, Sanyo MS-DOS v2.11, and TI Professional Computer DOS v2.13
        at least potentially this is still available with (OEM versions??? of)
          MS-DOS 6.0.
        calls to AH=F9h through AH=FFH will return AL=00h if no handler set
        the user handler is called immediately on entry to the main DOS INT 21h
          function dispatcher with interrupts disabled and all registers and
          stack exactly as set by caller; it should exit with IRET
SeeAlso: AH=F9h"OEM"

Michal Necasek says:


September 24, 2021 at 9:06 am

I saw that, but the “SET OEM INT 21 HANDLER” function is really supported by any MS-DOS 2.x built with “IBM EQU
FALSE”. To me it just means that those OEMs documented the function, not necessarily that they used it. Conspicuously
missing from RBIL is any documentation of an actual OEM function handler. Or at least I can’t find any. If some OEM
installed their own handler, it should show up in RBIL somewhere under INT 21h, subfunction F9h to FFh. But of course
whichever OEM asked for it may never have documented it and in any case it would have very likely been something pretty
obscure.

Jonathan Wilson says:


September 24, 2021 at 2:27 pm

I wonder if the mess provided by the CHM (e.g. missing code) is because of what they did or because that’s what they got
from Microsoft…

Michal Necasek says:


September 27, 2021 at 12:14 pm

Wherever the code came from exactly, I can easily imagine that the people involved in the source code release did not know
how to build it and therefore did not realize it was incomplete. Although mixing up the DOS 2.0 and 2.11 bits is probably on
the CHM.

Richard Wells says:


September 28, 2021 at 2:02 am

The acknowledgements section for the CHM release shows where the code came from. It was not a recent clean archive direct
from MS. I think it was intended to be incomplete.

Michal Necasek says:


September 28, 2021 at 6:38 pm

Well… it says “I had the source code for version 2.0 on 5″ floppy disks in my attic for 30 years, but we needed Microsoft’s
permission to release it.” There are contents of the MS-DOS 2.0 distribution disks, but the source code is largely DOS 2.11.
Since 2.0 is not 2.11… I still don’t know where the DOS 2.11 code came from.

Richard Wells says:


September 28, 2021 at 11:24 pm

That would be a question for Len Shustek. I suspect he only kept the MS original 2.0 disks and a working copy of the last
upgraded 2.11. The Nestar documentation indicates a number of alterations that would have to be tested against the latest
versions of DOS, especially making sure a patch doesn’t break the network code.

Yuhong Bao says:


September 29, 2021 at 7:11 am

From https://www.folklore.org/StoryView.py?project=Macintosh&story=A_Rich_Neighbor_Named_Xerox.txt&sortOrder=Sort+by+Date&topic=3rd+party+developers :
“He was trying to get them to forget about the OS business, since the applications business would be much bigger total
dollars.”
I wonder what if they did.

Yuhong Bao says:


October 4, 2021 at 10:27 am

I wonder why NEC did not change the switch characters in Japanese DOS. The fact that backslash became a yen would itself
be a good reason to do it.

MiaM says:
November 19, 2021 at 3:20 pm

Side track: Was the file name IBMBIO.COM due to the 6+3 file name limit on the DEC system? (Not sure if that was a thing on
TOPS-20. It seems to have been a thing on TOPS-10, which Microsoft might have used at some point in time.)

Michal Necasek says:


November 19, 2021 at 4:28 pm

I doubt it. COMMAND.COM does not conform to 6+3 format. I don’t believe there is any reason to think the 8+3 naming
convention did not come from CP/M (even if CP/M itself may have gotten that from DEC systems).

Yuhong Bao says:


January 14, 2022 at 12:10 pm

I wonder what about MS-DOS 3.x. I’d hope they put the OEM_HANDLER at the right position this time so redirectors do not
have to be different between MSVER and IBMVER.

Michal Necasek says:
January 16, 2022 at 5:37 pm

I’m not aware of MS- vs. IBM-specific redirectors. I believe the data structures were the same but the MS versions had a
function call to set the OEM handler to something non-zero.
