Project - 2ine - Ryan C. Gordon On Patreon

Ryan C. Gordon
Jan 21 at 6:33am
Project: 2ine
You have no idea how much effort went into getting this stupid white square on
the screen.
I thought I’d explain what all this is about in more detail, since I mentioned it in the
December wrapup. This is a long and technical post about weird low-level stuff
and you can totally skip this if you don't care about weird low-level stuff.
I've been building a project called “2ine,” pronounced “twine,” which emulates
OS/2 binaries in the same way that Wine emulates Windows binaries. I named it as
such to continue the fine tradition of programmers producing good code and bad
product names.
Any introduction to computers starts with a whole lot of lies. It’s complexity all the
way down to the bottom, and you don’t need to know it all when you just want to
write a program that prints out “hello world.” This is a blessing and a curse, but
mostly a survival mechanism; if you knew everything you were about to stumble
into, you’d never start. In this way, I stumbled into writing an OS/2 emulator.
In 2003, when I was working on what would become Unreal Tournament 2004, I
was trying to get the software renderer working on Linux. The market was rotten
with fast CPUs and lousy GPUs, and OpenGL support on Linux was spotty back
then in any case, so having a software fallback was super useful. For a software
renderer, UT2004 uses RAD Game Tools’ Pixomatic, which is one of those
amazing pieces of code that seem to defy physics.
Pixomatic is itself a technical marvel of low-level x86 sorcery; Michael Abrash
wrote about it in detail here and here and here. But of all the things it can do, all
you need to know for now is that the “Linux port” of Pixomatic consists of a
Windows .dll and a small piece of C code to load it.
I’m not kidding.
All Pixomatic cares about is lighting up pixels in a memory buffer and only needs
the win32 API to manage pages of memory. So this C code would get the Windows
DLL loaded, provide a few function pointers that would call mmap() when
Pixomatic thought it was calling VirtualAlloc() and such, and then everything just
sort of works on Linux. At the end of our trip to Win32 Land, UT2004 would use
SDL to flip that memory buffer to the screen as if nothing unusual happened here
at all. It’s wild.
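As a rough illustration of that trick (not RAD's actual glue code), here is a minimal sketch of a callback table that hands out pages via mmap(). The `MemoryCallbacks` struct and the `shim_*` names are hypothetical, and a real VirtualAlloc() replacement would have to honor many more flags than this does.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical table of callbacks the loaded DLL would be handed in place of
   its Win32 imports. Names and signatures are illustrative only. */
typedef struct {
    void *(*alloc_pages)(size_t bytes);
    int   (*free_pages)(void *ptr, size_t bytes);
} MemoryCallbacks;

/* Stand-in for a VirtualAlloc-style "give me zeroed, read/write pages" call. */
static void *shim_alloc_pages(size_t bytes)
{
    void *p;
    assert(bytes > 0);
    p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Stand-in for the matching release call; returns 1 on success, 0 on failure. */
static int shim_free_pages(void *ptr, size_t bytes)
{
    return munmap(ptr, bytes) == 0;
}

/* The table the loader would pass to the DLL after loading it. */
static const MemoryCallbacks linux_callbacks = { shim_alloc_pages, shim_free_pages };
```

The DLL never learns it is on Linux: it calls through the table, and the table happens to point at native code.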
What surprised me about this wasn’t the approach—although that was unexpected
too—but the simplicity of the C code that loads the DLL. 618 lines of code! Is this
all a DLL needs?
The answer, as always, is yes and no.

Shared libraries on most operating systems work like this: you get a table of
information of where to place data in memory and the system loader blasts it out
there. There’s more to the effort than that, but the juiciest tasks are:
- Load a data block into memory. It might be code, or initial data for global
  variables, or some constant thing like string literals.
- Fix up specific bytes in memory with real addresses of things it needs. When
  your program calls printf(), this puts the actual address of printf into memory
  for the call instruction. There are lots of different kinds of fixups for different
  needs.
- Keep track of symbols exported from the library so you can fix up other
  things that need this library later.
And that’s it! In most cases, programs you run use the same format as shared
libraries with minor differences. (Linux users! Did you ever try to run libc.so.6 like
it was a program instead of a shared library? I’ll wait while you try it.)
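The three loader tasks described above can be sketched in a few lines of C. This is a toy under stated assumptions, not any real file format: the `Export` and `Fixup` structs and `load_block()` are made-up names, and the only fixup kind handled is "store an absolute address here."

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A symbol some already-loaded library exports. */
typedef struct { const char *name; void *address; } Export;

/* "At this offset in the block, store the address of this symbol." */
typedef struct { size_t offset; const char *symbol; } Fixup;

/* Look a name up in the export table; NULL if nobody exports it. */
static void *find_export(const Export *exports, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].address;
    }
    return NULL;
}

/* Step 1: copy the block into place. Step 2: patch in real addresses.
   Returns 0 on success, -1 if a symbol cannot be resolved. */
static int load_block(uint8_t *dest, const uint8_t *src, size_t len,
                      const Fixup *fixups, size_t num_fixups,
                      const Export *exports, size_t num_exports)
{
    memcpy(dest, src, len);
    for (size_t i = 0; i < num_fixups; i++) {
        void *addr = find_export(exports, num_exports, fixups[i].symbol);
        if (addr == NULL)
            return -1;
        assert(fixups[i].offset + sizeof(addr) <= len);
        memcpy(dest + fixups[i].offset, &addr, sizeof(addr)); /* absolute fixup */
    }
    return 0;
}
```

Real formats add relative fixups, ordinals, page permissions and so on, but the overall shape stays about this simple.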
Pixomatic’s freakishly simple DLL loader got stuck in my head. Once you do the
loading and fixing up, this isn’t really a win32 program at all any more. It’s code in
a Linux process. Could we load other things like this? You bet.
Ten years later, I stumbled into writing an ELF loader for Linux, because dlopen()
needs a filename and I wanted to load shared libraries from a memory buffer. ELF
is more complicated than the Windows format, and this code is more robust in
general, getting us to about 1200 lines of code, in a project I called MojoELF.
Once I had built that, I thought: if I load an ELF binary on a Mac, it’s no longer a
Linux program. It’s code in a Mac process. And since the Mac has all that POSIX
goodness and a quality SDL port, if we fix up the POSIX calls to function
addresses that bridge differences in data layout, and fix up calls into SDL to the
real Mac SDL library, well…maybe we can play a Linux build of Quake 3.
None of this is novel at all: this is roughly how Wine has always done things. They
just had to work harder to deal with system calls into an OS that’s completely alien
on Linux.
But in any case, this is an idea that works.
I don’t know what prompted me to do an OS/2 loader in the first place, but it was
probably a straightforward case of nerd-sniping. I keep a list of interesting-waste-
of-time projects and sometimes when my mind wanders, I foolishly look at this list
and my productivity drops to zero until I can build some ridiculous thing that
caught my eye.
Documents that explain the OS/2 file format (the binaries are called Linear
Executable format, LX for short) are easy to find on the Internet, so why not try? It
should send up alarms when I tell myself “Why not spend an hour and see how far
you get?” There should be a Degrassi High School episode about this exact
scenario, to serve as a warning to future programmers during their formative years!
Like that Pixomatic C code, loading an LX binary into memory is mostly easy. But
even if you discount the ugly corner cases, you still have to implement the OS/2
API for the loader to be useful.
The first version of my OS/2 loader ran exactly one program, a Hello World thing,
written in assembly because loading a C runtime was too complex at this point.
(Strictly speaking, the true first version was probably a program that set EAX to 42
and returned, just to see if the process exit code was also 42 when 2ine
terminated, demonstrating we could bounce into OS/2 Land and back safely.)
OS/2, like Windows, doesn’t offer a single C runtime like Linux and macOS do, and
all of them do some complicated tap dancing with system calls before main() even
runs, and I didn’t want to mess with that yet. Eventually, I got to filling in some
basic Unix-like bits, with the naming OS/2 uses: DosOpen() to open files,
DosWrite() to write to a file handle, etc. The nice thing about DosWrite() is that file
handles 0, 1, and 2 match up with Unix stdin, stdout and stderr; this helped get a
bunch of OS/2 command line programs running without added drama, and you can
even pipe them through to other Linux processes.
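To give a feel for how thin such a wrapper can be, here is a minimal sketch of a DosWrite-style function sitting on top of write(). It is not 2ine's implementation: the typedefs are simplified stand-ins for the OS/2 headers, the error values are illustrative, and treating the OS/2 handle as a raw Unix file descriptor is an assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified stand-ins for OS/2 types; the real headers define these differently. */
typedef uint32_t APIRET;
typedef uint32_t HFILE;
typedef uint32_t ULONG;

/* Illustrative error values; check the OS/2 toolkit headers for the real table. */
enum {
    SKETCH_NO_ERROR = 0,
    SKETCH_ERROR_INVALID_HANDLE = 6,
    SKETCH_ERROR_WRITE_FAULT = 29
};

/* DosWrite-style call: handles 0-2 pass straight through because they line up
   with Unix stdin/stdout/stderr. Assumes handle == file descriptor. */
static APIRET sketch_DosWrite(HFILE handle, const void *buf, ULONG len, ULONG *actual)
{
    ssize_t rc;
    assert(actual != NULL);
    rc = write((int)handle, buf, (size_t)len);
    if (rc < 0) {
        *actual = 0;
        return (errno == EBADF) ? SKETCH_ERROR_INVALID_HANDLE
                                : SKETCH_ERROR_WRITE_FAULT;
    }
    *actual = (ULONG)rc;   /* bytes actually written, reported OS/2-style */
    return SKETCH_NO_ERROR;
}
```

Because the handle ends up as an ordinary descriptor, piping into another Linux process needs no extra work in a sketch like this.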
System APIs are written in C, as native Linux code. When the OS/2 module reports
that it needs the (system-provided) DOSCALLS.DLL, 2ine dlopens its own
libdoscalls.so with these reimplemented APIs, and uses the native Linux entry
points to fix up the OS/2 module. Now when the app calls DosWrite, the CPU calls
directly into a Linux ELF shared library where this function was reimplemented,
not knowing the difference. The 32-bit calling conventions happen to match up
well enough between the two platforms that it just happens to work.
An OS/2 app can run under 2ine using a mix of native Linux libraries that
reimplement system APIs and real OS/2 DLLs, so long as those DLLs don't do
weird things or depend on DLLs that do weird things (what qualifies as a "weird
thing" could fill a whole other blog post, though). Right now, some things that
would otherwise fail to operate will run on 2ine if they have access to a handful
of IBM's system DLLs, with native libraries spackling in the cracks.
With some effort, and with enough APIs filled in, the OS/2 port of GCC that I used
in high school (2.8.1! It doesn’t even support Pentium 1 instructions!) started
running, since it doesn’t need much more than stdio and a way to launch child
processes, allowing me to build OS/2 programs on Linux, using the compiler under
my emulator. Now we’re getting somewhere!
And then I thought, hey, let’s get Watcom C working too, and here my troubles
began. Specifically, my troubles began with the compiler’s help screen saying
“press return to continue.”
(That’s right, you can debug OS/2 binaries on Linux with GDB by running them
under 2ine, as long as you don’t expect debug symbols or source code views! Also,
as long as you can convert 16:16 pointers to a linear address in your head!)
OS/2 2.0 was a 32-bit operating system. 1.0 was not. Many of the APIs from 1.0
survived into the 32-bit transition, but they never got converted to 32-bit APIs
themselves, even in the final 4.5 releases, years later. I suppose this was because
IBM wanted developers to write Presentation Manager (the new GUI/window
functions) programs instead of VIO (text mode/command line) programs. APIs for
things like file management and thread primitives continued to exist as 16-bit APIs
while also adding new 32-bit entry points for the same functions, but things that
dealt with text-based programs (Vio* for writing to a console, Kbd* for keyboard
input, etc) never got 32-bit equivalents.
(The mythical PowerPC port of OS/2 fixed this, making these APIs 32-bit clean,
apparently, but these features never returned to the Intel port.)
Now one could definitely write a 32-bit command line program on OS/2, but if one
needed to call these older system functions, one had to call into 16-bit code. This
is done through the magic of thunking and some wizardry with memory segments.
Imagine my surprise when Watcom C would print out a page of command line
information, then wait for a keypress with KbdCharIn(). To do this, it would jump
into a 16-bit code segment, which would save off some registers and call the
never-updated-for-32-bit API call, restore some registers afterwards and jump
back to 32-bit land with the results.
First problem: I don’t have a 16-bit code segment! Second: I don’t have a way to
generate 16-bit code with GCC.
After some googling, I found there’s a Linux-specific system call to help with this,
which Wine and dosemu use to support Win16 and MS-DOS. It’s called
modify_ldt(), and it lets you map pages from your 32-bit linear address space to a
16-bit selector. LDTs are a feature of the x86 processor; you can read up on them
on Wikipedia. Operating systems rely heavily on them, and userspace code doesn't
unless it's doing wacky things like emulating ancient OSes.
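For the curious, a minimal sketch of what asking for one 16-bit segment might look like. This assumes x86 Linux with kernel headers installed; the helper names are hypothetical and this is not lifted from 2ine, Wine or dosemu. glibc has no wrapper for modify_ldt(), so it goes through syscall().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>   /* struct user_desc; x86 Linux only */

/* Describe one 64 KB, 16-bit data segment that starts at linear address `base`. */
static struct user_desc make_16bit_data_desc(unsigned int index, uint32_t base)
{
    struct user_desc d;
    memset(&d, 0, sizeof(d));
    d.entry_number    = index;
    d.base_addr       = base;
    d.limit           = 0xFFFF;                   /* byte-granular: 64 KB */
    d.seg_32bit       = 0;                        /* 16-bit segment */
    d.contents        = MODIFY_LDT_CONTENTS_DATA;
    d.read_exec_only  = 0;                        /* writable */
    d.limit_in_pages  = 0;
    d.seg_not_present = 0;
    d.useable         = 1;
    return d;
}

/* Ask the kernel to install the entry; returns 0 on success, -1 on error. */
static int install_ldt_entry(const struct user_desc *d)
{
    return (int)syscall(SYS_modify_ldt, 1, d, sizeof(*d));
}

/* Selector for an LDT slot: index in the top 13 bits, then TI=1 (LDT), RPL=3. */
static uint16_t ldt_selector(unsigned int index)
{
    assert(index < 8192);
    return (uint16_t)((index << 3) | 7);
}
```

Loading the selector returned by `ldt_selector()` into a segment register is what actually puts the CPU into that 64 KB window; that part needs assembly and is left out here.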
Okay, now I can create 16-bit segments, so what segments do I create?
If you’re OS/2 2.0, the answer is: all of them. OS/2 would “tile” the entire address
space, so any 32-bit pointer you might have automatically exists in some 16-bit
segment, and you could do some simple math on the pointer itself to determine it
(shift the top 16 bits left by 3 and bitwise OR with 7 to get the selector, bottom 16
bits are your offset).
The problem with this approach is that you only have 8192 possible selectors (you
only get to use 13 of the bits!), times 64 kilobytes in each segment, which means
32-bit OS/2 apps can only access the first 512 megabytes of their address space in
this system. In IBM’s defense, if your machine had more than 4 megabytes of total
physical RAM at the time, that was a powerhouse computer.
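The pointer math and the 512 megabyte ceiling are easy to check in code. A minimal sketch, with made-up names (`FarPtr16`, `flat_to_far`, `far_to_flat`), that only holds while the address space really is tiled:

```c
#include <assert.h>
#include <stdint.h>

#define TILE_SELECTORS     8192u    /* 13 usable selector bits */
#define TILE_SEGMENT_BYTES 65536u   /* 64 KB per 16-bit segment */
#define TILED_LIMIT (TILE_SELECTORS * TILE_SEGMENT_BYTES)   /* 512 MB */

/* A 16:16 pointer: segment selector plus offset within that segment. */
typedef struct { uint16_t selector; uint16_t offset; } FarPtr16;

/* Top 16 bits shifted left by 3 and ORed with 7 give the selector;
   the bottom 16 bits are the offset. */
static FarPtr16 flat_to_far(uint32_t flat)
{
    FarPtr16 p;
    assert(flat < TILED_LIMIT);   /* beyond this, no selector exists */
    p.selector = (uint16_t)(((flat >> 16) << 3) | 7);
    p.offset   = (uint16_t)(flat & 0xFFFFu);
    return p;
}

/* The inverse: drop the low 3 selector bits and glue the offset back on. */
static uint32_t far_to_flat(FarPtr16 p)
{
    return ((uint32_t)(p.selector >> 3) << 16) | p.offset;
}
```

For example, flat address 0x00012345 lands in selector 0x000F at offset 0x2345, and the last tiled byte (0x1FFFFFFF) is 0xFFFF:0xFFFF.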
Later versions of OS/2 stopped tiling like this, and offered an API to do the
conversion for you (“DosFlatToSel”), but lots of programs rely on tiling and do the
pointer math themselves without using the API. In hopes that this matches what
OS/2 ended up doing, I tile the main thread’s stack (since this is probably where
most data you want to reach in 16-bit code lives; temporary local variables for an
API call) and any memory segments in the LX module that were marked as 16-bit.
Non-tiled LDTs are then allocated when a DosFlatToSel() call is made, using
cached selectors from previous allocations and tiles when possible. So far, it’s
working out okay and we aren’t limited to 512 megabytes of memory, under the
assumption most 16-bit calls happen at a handful of locations and the things that
assume the pointer math works only try it in unsurprising ways. Knock on wood.
Now I probably have the address space politics worked out (minus Thread Local
Storage, which does something bonkers I'll explain some other time), but I still
can’t generate 16-bit code with GCC. The solution is not to. Just because the OS/2
app wants to call a function in a 16-bit code segment doesn’t mean we need the
function to be 16-bit code. All we need is a little bit of bridge code for the OS/2
app to land in that moves us into our native implementation. As an added benefit,
it means these APIs are usable to OS/2 apps recompiled from source as native
Linux apps with no 16-bitness at all; think Wine vs Winelib.
So you get some macro salsa to define functions:
As you can see, this writes the bytes of the 16-bit x86 instructions directly to a
memory buffer, instead of trying to get GCC to assemble them. The macro is used
once for each “16-bit” API we export. The code got assembled with Netwide
Assembler, then disassembled with ndisasm and pushed through a perl script to
produce this code.
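A minimal sketch of the byte-emitting idea, for illustration only: this is not the macro from 2ine, and the `emit*` helper names are hypothetical. It writes a single instruction, the far jump a 16-bit bridge could use to hop back to 32-bit code, assuming the `66 EA` encoding (JMP FAR with a 32-bit offset, executed from a 16-bit code segment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one byte to the buffer, advancing the write position. */
static void emit8(uint8_t *buf, size_t *pos, uint8_t v)
{
    buf[(*pos)++] = v;
}

/* x86 is little-endian: low byte first. */
static void emit16(uint8_t *buf, size_t *pos, uint16_t v)
{
    emit8(buf, pos, (uint8_t)(v & 0xFF));
    emit8(buf, pos, (uint8_t)(v >> 8));
}

static void emit32(uint8_t *buf, size_t *pos, uint32_t v)
{
    emit16(buf, pos, (uint16_t)(v & 0xFFFF));
    emit16(buf, pos, (uint16_t)(v >> 16));
}

/* Emit "66 EA <offset32> <selector16>": from 16-bit code, a far jump to
   selector:offset with a 32-bit offset. Returns the number of bytes written. */
static size_t emit_far_jmp32(uint8_t *buf, size_t cap, uint32_t offset, uint16_t selector)
{
    size_t pos = 0;
    assert(cap >= 8);
    emit8(buf, &pos, 0x66);      /* operand-size prefix: 32-bit offset follows */
    emit8(buf, &pos, 0xEA);      /* JMP FAR ptr16:32 */
    emit32(buf, &pos, offset);
    emit16(buf, &pos, selector);
    return pos;
}
```

The buffer these bytes land in would then need to be mapped as a 16-bit code segment before anything can jump into it.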
And then, like doing skateboard tricks, there’s nothing left to do but say “watch
this,” and see if you pull off something awesome or just crash.
This is the sort of inefficiency and trouble that drives engineers mad, but here’s
how we kept the CPU happy through this process:
- OS/2 app saves some things and calls into a 16-bit code segment that likely
  exists just to call into a 16-bit API.
- 16-bit code segment saves some things and calls into 2ine’s 16-bit bridge
  code.
- 2ine’s 16-bit bridge saves some things and jumps directly back to 2ine’s
  32-bit bridge code.
- 2ine’s 32-bit bridge code saves some stuff and calls the native
  implementation of whatever API.
- Whatever API is implemented in C as a real piece of Linux code in an ELF
  shared library, talking directly to various Linux interfaces.
- Native implementation returns to 32-bit bridge code.
- 32-bit bridge code restores things, jumps back to 16-bit bridge code.
- 16-bit bridge code restores things, returns to OS/2 app’s 16-bit code segment.
- OS/2 app’s 16-bit code segment restores things, returns to OS/2 app’s 32-bit
  code segment.
- OS/2 app’s 32-bit code carries on like it made a simple function call.
Whew!
One more piece of magic for 16-bit support: when writing x86 Linux code, you
probably don’t think about your 32-bit linear address space as having a “code
segment,” but it does. It’s not guaranteed to be any specific value, but is currently
hardcoded (0x23 if you’re running a 32-bit app on an amd64 kernel, 0x73 if you’re
on a real 32-bit kernel). You have one because code segments are how x86
processors keep track of code privilege level; the kernel runs in a different
segment with higher privileges, which lets it have instructions that your userland
code can’t use.
OS/2 also has a hardcoded code segment for 32-bit code, too. It’s 0x5B. I spent a
week trying to get IBM’s command line FTP.EXE client to not crash because it
wants to read passwords from the keyboard without echoing them to the screen,
and that needs a 16-bit API, even though all the rest of the input is just a 32-bit
DosRead() on stdin. After much head scratching about why it was trying to jump
back from 16-bit land to a totally bogus 32-bit code segment, I found an ancient
IBM CourseWare document on the Internet Archive that explained this. Since
the OS/2 kernel hardcoded code segment 0x5B, IBM’s CSet/2 compiler hardcoded
it into a bunch of apps, too, to get back to 32-bit land. EMX (which was basically
an OS/2 port of GCC) was smart enough to save off the CS register and not do
this, avoiding this problem.
2ine can’t map code segment 0x5B; it’s a GDT entry, not an LDT entry, which you
can’t really mess with in userland, so the best we can do is sniff through 16-bit
code segments we load from an OS/2 binary for far jumps to that segment and fix
them up. That code is dirty-nasty-gross, though.
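To show the flavor of that fixup (and why it is gross), here is a deliberately naive sketch. It assumes the offending jumps are encoded as `66 EA <offset32> 5B 00`, which is a guess at one plausible encoding rather than a description of what 2ine scans for, and the function name is hypothetical. A raw byte scan like this can also match data that merely looks like a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OS2_FLAT_CS 0x5Bu   /* the 32-bit code selector OS/2 apps hardcode */

/* Scan a 16-bit code segment for far jumps to selector 0x5B and swap in the
   selector our process really uses. Returns the number of jumps patched. */
static size_t patch_far_jumps(uint8_t *code, size_t len, uint16_t real_cs)
{
    size_t patched = 0;
    size_t i = 0;
    assert(code != NULL || len == 0);
    while (i + 8 <= len) {
        if (code[i] == 0x66 && code[i + 1] == 0xEA &&
            code[i + 6] == OS2_FLAT_CS && code[i + 7] == 0x00) {
            code[i + 6] = (uint8_t)(real_cs & 0xFF);   /* selector, low byte */
            code[i + 7] = (uint8_t)(real_cs >> 8);     /* selector, high byte */
            patched++;
            i += 8;   /* skip past the instruction we just rewrote */
        } else {
            i++;
        }
    }
    return patched;
}
```

Passing 0x23 as `real_cs` would match the amd64-kernel case mentioned above; a more careful version would read the CS register at runtime instead of trusting a constant.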
With that fix in place, and enough implementation of TCPIP32.dll, we were really
flying now.
There were other fixes to be made, and features to implement, but now that most
of the 16-bit drama was handled, and most of the “fun” problems with command
line apps were done, it was time to move on to Presentation Manager apps, which
generally don’t make 16-bit calls at all. The problem here isn’t binary compatibility
but that Presentation Manager is a massive API surface that I’d have to write from
scratch.
My pep-talk sounds a lot like this, echoing in my dark office at 2am: “this had to
run on 386 machines with 2 megabytes of RAM and was built with caveman-
primitive development tools. It couldn’t be that complex.” Or, more succinctly:
“Simplicity encourages speed.” Often when trying to imagine how some system
was implemented, in 1992, I tried to imagine the cleanest, easiest way to write it,
and prayed that’s actually how it went down at Big Blue, too. This is straight-up
cockeyed optimism on my part.
I tried to write the simplest PM program possible. I can’t even call it a “hello world”
app because rendering text is an extra layer of complexity. Instead, I went for
something that creates a 100x100 pixel window, at screen coordinate (100, 100). It
paints it white when it needs painting, and if you click on it, it quits the program.
Here’s the code:
The function names are different, but this should look familiar to you if you’ve ever
done any Windows programming at the win32 (or win16) API level. It quickly
becomes apparent that Windows and OS/2 started with the same API and drifted
apart as their parents slid into divorce.
That program looks like this, running on OS/2:
(“Netscape Communicator” is not the most obsolete web browser installed on this
system, believe it or not.)
See that white square? That’s our app! If you’re wondering why it’s so low on the
desktop, OS/2’s coordinate system puts (0, 0) at the lower left corner, so that’s 100
pixels from the bottom, not the top.
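Anything that maps these windows onto a top-left-origin system has to flip the y axis. A minimal sketch of that conversion; the function name is made up, and the 600-pixel screen height in the example is just an assumed value.

```c
#include <assert.h>

/* Given a window's bottom edge in bottom-left-origin coordinates, return the
   y of its top edge in top-left-origin coordinates. */
static int pm_to_topleft_y(int pm_y, int window_height, int screen_height)
{
    assert(window_height >= 0 && screen_height >= window_height);
    return screen_height - (pm_y + window_height);
}
```

On a 600-pixel-tall screen, the 100x100 window at (100, 100) has its top edge at y = 400 when measured from the top.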
To get this working, I just need to implement 14 functions! Unfortunately, some of
these functions are hella complex. WinCreateWindow(), for example, is basically
the core of the entire paradigm, so there are tons of arguments that do different
things, flags that alter behavior, etc. WinGetMsg() needs to produce hundreds of
possible window system events, and WinDefWindowProc() needs to recognize all
of them.
Don’t panic: aim for simplicity first. We don’t need all those events right now, and
even if we did, WinDefWindowProc() responds to almost all of them by just
returning zero.
I’ve been collecting up books on OS/2 programming from the web (the Internet
Archive has PDFs of so many programming books that are otherwise cluttering
landfills now) and physical copies from Amazon where I can. I’m looking for quirks
and tiny implementation details of these APIs. But, to my surprise, the best
resource turned out to be IBM’s SDK documentation.
Most things are covered, not just in basic function call info, but subtle
interactions, window messages it generates, etc. I’m not sure why I assumed this
would be lacking, but it seems to be the best route towards reimplementation.
So let’s reimplement! I knew immediately that I didn’t want to talk to X11 directly,
because X11 sucks in general to work with and with luck it’s a dying system
anyhow. Since I am (ahem) familiar with SDL, and Epic Games and I had spent so
much time working with SDL as the backend of a GUI toolkit, I figured I’d start
there. In 2ine, top-level windows (things with the desktop as their parent) generate
an SDL window. Using the terminology of Java’s Swing framework, this is a
“heavyweight” window. Child windows slice that heavyweight window into chunks,
and since they don’t make an operating system window of their own, but just
maintain some logical state about themselves, they’re “lightweight” windows.
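A lightweight window, in this sense, is little more than a rectangle and a parent pointer. The sketch below is one way to model that, not 2ine's data structures: the `Window` and `Rect` types are hypothetical, and it uses a top-left origin for simplicity. It computes where a child may draw inside its top-level window by intersecting its rectangle with every ancestor's.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;

typedef struct Window {
    struct Window *parent;   /* NULL for a top-level ("heavyweight") window */
    Rect rect;               /* position is relative to the parent */
} Window;

/* Intersect two rectangles; an empty result has w == 0 or h == 0. */
static Rect rect_intersect(Rect a, Rect b)
{
    int x1 = a.x > b.x ? a.x : b.x;
    int y1 = a.y > b.y ? a.y : b.y;
    int x2 = (a.x + a.w < b.x + b.w) ? a.x + a.w : b.x + b.w;
    int y2 = (a.y + a.h < b.y + b.h) ? a.y + a.h : b.y + b.h;
    Rect r = { x1, y1, x2 > x1 ? x2 - x1 : 0, y2 > y1 ? y2 - y1 : 0 };
    return r;
}

/* Origin of a window inside its top-level window. The top-level window's own
   position is a desktop coordinate, so the walk stops before adding it. */
static void window_origin(const Window *win, int *x, int *y)
{
    *x = 0;
    *y = 0;
    for (; win->parent != NULL; win = win->parent) {
        *x += win->rect.x;
        *y += win->rect.y;
    }
}

/* Area a window may draw to, in top-level coordinates, clipped by all ancestors. */
static Rect window_clip_rect(const Window *win)
{
    Rect clip;
    assert(win != NULL);
    window_origin(win, &clip.x, &clip.y);
    clip.w = win->rect.w;
    clip.h = win->rect.h;
    for (const Window *p = win->parent; p != NULL; p = p->parent) {
        Rect bounds;
        window_origin(p, &bounds.x, &bounds.y);
        bounds.w = p->rect.w;
        bounds.h = p->rect.h;
        clip = rect_intersect(clip, bounds);
    }
    return clip;
}
```

No operating system window is created for the children; the clip rectangle is all the renderer would need to know about them.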
The heavyweight window also does something else interesting: it creates an
SDL_Renderer and an SDL_Texture to draw to. This lets us render drawing
primitives to the window with OpenGL and keep a backing store of any rendering
(so no need to send paint messages just because a window got dragged out of the
way). Other benefits: clipping is basically free when a child window is drawing, and
we can scale up apps that thought 800x600 was an impossibly massive screen
resolution.
So about 1600 lines of C later, I had enough of the Presentation Manager
implemented to get our little white square popping up on a Linux desktop,
produced by an OS/2 binary.
Going further on this is non-trivial, though. The effort involved is probably about
equivalent to the man-hours the Wine project needed to work before you could
reasonably assume a Win16 program would function correctly. Trust me: there’s a
lot of work to be done.
I do hope to do this work at some point, but I’m probably at the point where
continuing on this is trying everyone’s patience, so I’m going to move back to
something more practical next (and something impractical, like a video game). Still,
it feels good to have set out to climb a mountain, as ridiculous as that mountain
might be to climb, and stand on a hill some distance above the ground. I will set
camp here for now and examine some other mountains for a while.
Jeff Joshua Rollin 2d
I have never run OS/2 - I got into PCs, as opposed to home computers, just as OS/2 was on
the rocks - and have literally never even seen it outside of a book or a YouTube video,
except maybe once at a bank; but I am seriously impressed by the work and by the writeup.
Kendall Bennett 2d
That’s awesome! We did similar stuff years before at SciTech for our Binary Portable DLL
project we used to load and run graphics device drivers we developed across multiple OSes.
We built them into PE executable libraries using the Watcom compiler (and later GCC, for
64-bit support) and loaded the core on any OS using stubs to call back into a
portable library of OS support functions. It’s the tech IBM licensed and used for years to
keep OS/2 alive for those banks still using it until about 2007! Alas none of the graphics
card companies would let us open source the device driver code so it never saw the light of
day but the front end stuff and loader code was all Open Sourced as GPL years ago. I threw
it up on GitHub a while ago and the PE loader code is still there :)
https://github.com/kendallb/scitech-mgl/blob/master/src/common/peloader.c
© 2018 Patreon, Inc.
