
STATEMENT ABOUT THE PUBLIC DISTRIBUTION OF WINDOWS SOURCE



THE FACTS

It is the source code

The Windows 2000 source code leak was first reported by Neowin on 12 February 2004 and soon confirmed by a public statement from Microsoft. There are two major downloads circulating: one 229 MB archive with source code for Windows NT 4 and one 203 MB archive with source code for Windows 2000. No Windows XP, Windows 2003 or Windows "Longhorn" source code is being circulated.

What is source code?

What is source code? Many reports have called it a blueprint, but that's wrong. A blueprint for a building tells you how to create that building. Source code isn't a blueprint for how to create a program. Source code is the program.

The difference between source code and the program you use is one of translation. Programmers write programs in a computer language: a language for writing computer programs, but one that humans can learn to read and write. The text they write - in one or more computer languages - is source code. That source code is translated into a form you can actually use, and that form is called an executable. The source code and the executable are the same program. They are not different programs, just different forms of the same program.

The important thing about the availability of source code is that the source code is not only written in a language humans can understand, but also contains suggestive names and helpful comments to aid in understanding it. It is possible to learn a lot about how a program works by studying just the executable, but it is much easier to understand a program by studying its source code.

Not the first Microsoft source code leak

This is not the first time that Microsoft source code has leaked onto the net. In 2000, the source code for MS-DOS 6 was leaked. It received considerably less attention, as most journalists considered it obsolete, despite the fact that it still had millions of users around the world, and that MS-DOS is actually the basis for many versions of Windows still in use today.
That leaked source is still being passed around.

Security breach

In October of 2000, Microsoft had to confirm that crackers had broken into its network and actually gained access to the Windows source code. That breach was done using the Qaz trojan. Microsoft has stated that this time round, its security has not been breached.

How did this happen

It is a little-known fact that Microsoft has been providing access to Windows source code to universities, strategic partners and consulting companies for a long time. As Microsoft expanded the number of license programs and the number of licensees, the risk of a source code leak increased.

It has been reported in a BetaNews exclusive that evidence inside the Windows 2000 source code leaked on Thursday 12 February 2004 suggests that this particular leak originated at longtime Microsoft partner MainSoft. The leaked source would implicate Eyal Alaluf, MainSoft's Director of Technology.

MainSoft

MainSoft is a commercial company that provides a product called MainWin. The MainWin product makes it relatively easy for third-party software companies to make the programs they already created for Windows available on Unix as well.

The MainWin product is based on actual Windows source code. Microsoft provided MainSoft with its first Windows source code license in 1994. Access to the actual Windows source code makes it easier for MainSoft to ensure that the resulting Unix programs made with the MainWin toolkit work just like the original Windows programs.

Neither Microsoft nor MainSoft has yet acknowledged that MainSoft is the source of the leak. Both companies have issued statements that they are investigating.

Windows source code

The availability of Windows source code is a big issue, but not in the way many reported it. Access to Windows source code can be had legally from Microsoft. Microsoft has a number of source sharing programs. Many of these require you to sign paper agreements, but a package containing the source code for Windows CE 3.0 can be downloaded from the Microsoft web site after agreeing to an electronic one.

Apple makes source code available too. The full source code for the latest version of MacOS can be downloaded from Apple's web site.

So, the mere availability of source code isn't a big issue. The issue is that this particular source code, which Microsoft has always presented and is still protecting as a trade secret, has leaked outside its licensing program. It has been made available for download to anyone, without Microsoft's consent, and without any of the recipients agreeing to any license first. This creates all kinds of problems (see below).

The leaked Windows 2000 source code is old

The leaked source is more than three years old. The newest files in the Windows 2000 source code are dated 25 July 2000. The source probably corresponds to Windows 2000 Service Pack 1, while the current Service Pack for Windows 2000 is Service Pack 4. The Windows NT 4 source code probably corresponds to Windows NT 4 Service Pack 3, while the current Service Pack for Windows NT 4 is Service Pack 6a, and a Service Roll-up Pack has already followed it.

The leaked Windows 2000 source code is incomplete

The distributed Windows 2000 source is reported to consist of 30,915 files that take up roughly 650 MB of disk space (just about as much as will fit on a single CD) and contain some 13.5 million lines of code. That sounds like a lot, but it is only part of Windows. One question people keep asking is just how large a part it is. To answer that question, you must know how big Windows is.

How big is Windows

Many news sources keep quoting each other, saying that the estimated size of Windows 2000 is 35 million lines of code, with no attribution to any reliable source. The 35 million is just someone's estimate. Most code sizes mentioned are either estimates or numbers for other versions of Windows. Most numbers mentioned are too high.

The actual numbers are all available from Microsoft sources if you know where to look. A Microsoft press announcement dated 15 Feb 2000 about the introduction of Windows 2000 in the Middle East clearly states that Windows 2000 consists of 29 million lines of code. On 24 Sep 1997, Microsoft Senior Vice President Jim Allchin told the attendees of the Professional Developers Conference that Windows NT 3.1 was six million lines of code, Windows NT 4 was 16.5 million lines of code, and that Windows NT 5.0 Beta 1 (Windows NT 5.0 was later known as Windows 2000) was 27 million lines of code.

Some sources mentioned 50 million lines of code for Windows 2000. A document on Microsoft's web site makes clear that that's the number for Windows 2003. Another document reveals that another number that is being passed around a lot, 45 million lines of code, is the number for Windows XP.

Some very authoritative figures for the size of Windows 2000 are in a presentation that Microsoft Distinguished Engineer Mark Lucovsky gave at the Usenix 2000 conference. He told
attendees that Windows NT 3.1 consisted of 6 million lines of code and that the Windows 2000 source code consisted of 180 projects that total 29 million lines of code and take up 50 gigabytes of disk space.

Here's a table summarising the official figures collected from various Microsoft sources:

year  product         MLOC
1993  Windows NT 3.1  6
1996  Windows NT 4.0  16.5
1999  Windows 2000    29
2001  Windows XP      45
2003  Windows 2003    50

MLOC = million lines of code

Most numbers reported so far for the size of Windows 2000 are too high and, as a result, the estimates for the percentage of code that has leaked are too low.

CALCULATIONS

Explaining the 15 % quote; Microsoft's calculation?

Microsoft reportedly told analysts that the leaked source is roughly 15 % of the source for Windows, and many have doubted that number. Apparently, Microsoft later told the same reporters that it was just one or two percent. Both numbers can be explained.

Many analysts believe the actual percentage to be higher than 15 %, despite assuming high estimates for Windows's size, but what if you assume that the 15 % is right? If 13.5 million lines of code is 15 % of the code, then 100 % would be 90 million lines of code. It is not impossible that 90 million lines of code is the actual size of the code base for Windows Longhorn, the next major release of Windows.

Assuming that this is the calculation Microsoft made, the problem is that the calculation just isn't right. If you were to look up that 15 % in the current Windows source and compare it with the leaked source, you would undoubtedly find that it had been changed in thousands of places. It just isn't the same source. You may also find that the parts that do actually correspond with the leaked source aren't 15 % of the current source, but say 12 % or 18 %, depending on both the nature of the changes and your definition of corresponding.
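
The arithmetic behind these claims is easy to check. The following is a minimal Python sketch, not anything Microsoft published: the function names share_percent and implied_total are made up for illustration, and the inputs are simply the figures quoted in the text (13.5 million leaked lines, and 29, 35 and 50 million lines as the official, widely quoted and Windows 2003 sizes).

```python
# All sizes are in millions of lines of code (MLOC), as quoted in the text.
LEAKED_MLOC = 13.5  # reported size of the leaked Windows 2000 source


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return part / whole * 100


def implied_total(part, percent):
    """Return the total for which `part` would be `percent` percent."""
    return part * 100 / percent


if __name__ == "__main__":
    # The larger the assumed size of Windows 2000, the smaller the leak looks.
    assumed_sizes = [
        ("official figure", 29.0),
        ("widely quoted estimate", 35.0),
        ("figure that belongs to Windows 2003", 50.0),
    ]
    for label, size in assumed_sizes:
        pct = share_percent(LEAKED_MLOC, size)
        print(f"{label} ({size:g} MLOC): {pct:.1f} % leaked")

    # Working backwards from the 15 % quote to the code base it implies.
    total = implied_total(LEAKED_MLOC, 15)
    print(f"15 % implies a code base of {total:.0f} MLOC")
```

Run as a script, this prints the leaked share under each assumed size and the 90 million lines implied by the 15 % quote. It only restates the reported figures; it says nothing about how those figures were counted.
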
The simple truth is that the leaked source isn't any percentage of the current code base at all; it is three-and-a-half-year-old code that's a percentage of the official code base for Windows 2000 Service Pack 1.

If this is indeed Microsoft's calculation, Microsoft would consider the distribution of the complete Windows NT 3.1 source (6M LOC) as just 7 % of the Windows source, while it is in fact 100 % of a complete, working operating system.

That does not mean that the Microsoft calculation is meaningless. It is only natural for Microsoft to compare to the current code base, as it represents the current extent of its intellectual property, and the quoted percentage is a rough, quick-and-dirty initial indication of how much of the current intellectual property has leaked. Whether this is indeed Microsoft's calculation or not, I make different calculations.

The real calculation

I believe we should compare the size of the leaked source to the size of the corresponding operating system. The leaked source is 13.5 million lines of code; Windows 2000 is 29 million
lines of code. 13.5 million is roughly 47 % of 29 million. Thus, if both figures are correct, almost half the source code leaked out.

However, another calculation can be made. The leaked source takes up 650 MB of disk space. The complete source takes up 50 GB of disk space. Thus, again assuming that both figures are correct, only slightly more than one percent got out.

Which calculation is better?

With Microsoft's official figures public, it is possible to calculate the percentage of source code and possible to calculate the percentage of disk space. These percentages differ considerably. The question of which calculation is best doesn't matter that much. Having 47 % of the source code is nothing to sneeze at, even if it is just 1 % of the total disk space.

Windows NT 4 numbers

What everyone seems to have missed in trying to work out the actual percentage is that the numbers reported for Windows NT 4 do not make sense. Many ignored the Windows NT 4 source code simply because Windows NT 4 itself is considered old news. However, a brief look at the numbers reported for Windows NT 4 leads to a startling conclusion.

Windows NT 4 source code size paradox: leaked source is larger than full source?

One pair of numbers being reported for the size of the leaked Windows NT 4 source code is 95,103 files and 28 million lines - but Microsoft tells us that the full source code for Windows NT 4 is just 16.5 million lines of code, so that would be 170 % of the source code. Some media have reported these numbers repeatedly, apparently without anyone noticing or wondering about the obvious impossibility.

There are two explanations for this paradox: either the distribution contains more than just the source code for Windows NT 4, or the numbers are wrong. Possibly both.

What is a line of code?

A simple explanation for this paradox is that the two numbers do not count the same thing. A line of code is not always a line of code. Just how you count the lines does matter, and matters a lot.
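
To see how much the counting method can matter, here is a minimal Python sketch that counts the same text two ways: every line, and only lines that still contain something after comments are removed. It is purely illustrative. How Microsoft actually counts lines is not known, the helper names strip_comments and count_lines and the sample fragment are invented for this example, and the comment handling is deliberately naive (C-style comments only, string literals ignored).

```python
def strip_comments(line, in_block):
    """Remove C-style comments from one line.

    Returns the remaining text and whether a /* ... */ comment is still
    open at the end of the line. Naive: comment markers inside string
    literals are treated as comments too.
    """
    kept = []
    i = 0
    while i < len(line):
        if in_block:
            end = line.find("*/", i)
            if end == -1:
                return "".join(kept), True  # comment continues on next line
            i = end + 2
            in_block = False
        elif line.startswith("//", i):
            break  # rest of the line is a comment
        elif line.startswith("/*", i):
            in_block = True
            i += 2
        else:
            kept.append(line[i])
            i += 1
    return "".join(kept), in_block


def count_lines(source):
    """Return (all lines, lines that are neither blank nor comment-only)."""
    total = 0
    code = 0
    in_block = False
    for line in source.splitlines():
        total += 1
        kept, in_block = strip_comments(line, in_block)
        if kept.strip():
            code += 1
    return total, code


# Hypothetical fragment, used only to exercise the counter.
SAMPLE = """\
/* Hypothetical C fragment,
   used only to exercise the counter. */

int add(int a, int b)
{
    // add two numbers
    return a + b;  /* trailing comment */
}
"""

if __name__ == "__main__":
    total, code = count_lines(SAMPLE)
    print(f"all lines: {total}, code lines: {code}")  # all lines: 8, code lines: 4
```

In this small sample the all-lines count is twice the code-lines count. The real ratio for any given code base depends on its commenting and layout habits, so the sample shows only that the two counts can differ widely.
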
It is possible that the reported count of 28 million lines of code includes empty lines and comment lines, whereas Microsoft's count of 16.5 million lines does not. If so, an all-lines count and a Microsoft count are not the same thing, and the two numbers are not directly comparable. Thus, the reported 28 million lines could actually be less than the reported 16.5 million lines that Windows NT 4 is made of.

This is confusing. The only certainty is this: uncertainty about just what is being counted throws all reported numbers into doubt.

Another 15 % explanation

So, another explanation for the 15 % statement is that the leaked source code for Windows 2000 is 13.5 million lines of code if you count all empty lines and comment lines, but - assuming Microsoft's 15 % was correctly calculated - only 4.35 million lines of code when counted Microsoft's way.

IMPORTANCE

Having corrected, explained, calculated and doubted the numbers, I'll now say this: none of these numbers means much. The issue is not what percentage got out, but what got out. The real observation is that what got out is not just any part, but an important part of Windows, and you do not even need to read the leaked code to figure that out.

What MainWin does

MainSoft's MainWin product allows developers to create Unix versions of their existing Windows programs. There are all kinds of technicalities, but the basic idea behind the MainWin product is very simple: MainWin pretends to be Windows. MainSoft has incorporated considerable parts of the Windows code into its MainWin product. In a very real sense, large parts of the MainWin product do not just pretend to be Windows, but are Windows.

WISE

In support of the MainWin product, Microsoft provided MainSoft with a license in its Windows Interface Source Environment (WISE) program. The WISE license provides source code access to the very core of Windows, the basis the rest of Windows is built on.

The WISE program is so exclusive that it is not listed on Microsoft's Shared Source Licensing Programs page. What is provided under the WISE license is so essential that only a few companies ever got one. That fact alone already indicates the value Microsoft places on this particular source code license.

The source that leaked is part of what MainSoft got under that rather exclusive WISE license, and what it got is the hottest part of Windows. If the leaked code is indeed 47 % of Windows, it may very well be more than 50 % of Microsoft's most jealously guarded Windows secrets.

Cracker threat

Many commentators have suggested that the availability of the source code is a boon to crackers and virus writers. I do agree with that, but I also believe that many have overstated the extent and misrepresented the nature of the effect.

Keep in mind that virus writers have been quite successful so far without access to the source code. The availability of source code does not make it possible to find programming mistakes that can be exploited; it only makes it easier. Still, crackers are likely to study the code in search of programming mistakes. A temporary increase in the number of new exploits as these are found is not impossible. It is a cause for concern, but not a reason to panic.
Many of the defects in Windows 2000 Service Pack 1 have already been fixed in subsequent Service Packs.

First exploit

The first potential exploit based on the code was reported to SecurityTracker by gta, a white hat hacker, who points out a programming defect in the source for Internet Explorer 5 and explains how this could be exploited by creating particular pictures. This specific defect had already been discovered by Microsoft during an internal audit and has already been fixed in Internet Explorer 6 Service Pack 1, released on 30 August 2002.

Conspiracy theories

There are conspiracy theories that Microsoft leaked this source on purpose. BetaNews's report that it wasn't Microsoft, but MainSoft that somehow leaked the code did not put an end to those. Small details like that do not deter a good conspiracy theorist.

I have been watching Microsoft for most of its existence, and I do not believe Microsoft wanted this to happen. I believe that Microsoft management is seriously embarrassed by and very unhappy about it all. I do not believe that the public distribution of old Windows source code impacts the schedule for Windows "Longhorn" either. Longhorn slips will be caused by the usual reasons.

WHAT TO DO

What can Microsoft do

There is only so much Microsoft can still do about it.

Microsoft will of course continue to provide updates and fixes for current Windows versions, and will continue development of Windows Longhorn, the next version of Windows. Microsoft is investigating the matter, and will certainly try to prevent a repeat occurrence.

The genie is out of the bottle; the distributed source code is public now. Windows was never Open Source, but it is Opened Source now. That is an issue for the whole industry that Microsoft will have to deal with.

One practical thing Microsoft can do is identify what current source corresponds to the leaked source and give it a higher priority in its ongoing Secure Computing initiative. That is not the only thing Microsoft can do. A more radical response would be to offer free upgrades away from Windows NT and Windows 2000.

What developers should do

I strongly advise software developers to resist the temptation to look at the source code. If you are a developer, do not even download it. You should not possess a copy. Once you have looked at the source, you are irrevocably contaminated.

I am not a lawyer, but you don't have to be a lawyer to understand that if you ever look at the source code and later create something very similar, you may have a hard time proving that what you did is not directly or indirectly derived from that source.

Copyright violation may be a complex topic in the Internet age, but it's only the beginning. If it is known that you may have viewed the Windows source code, you are much more likely to find yourself charged with trade secret violation and software patent infringement.

So here's the advice from a Windows internals expert: Do not look at it. Do not download it. Avoid all contact with it. Treat it as an infectious disease.

If you believe that you have a specific problem that could be solved with access to source code, contact Microsoft or one of its licensed partners with your specific request.
If you are just interested in learning how a system like Windows works by studying the source, download the Windows CE 3.0 source from the Microsoft web site. The most interesting source package Microsoft makes available as a free download is the Shared Source CLI (SSCLI), also known as Rotor.

What system managers and users should do

Users of older versions of Internet Explorer should upgrade to the latest version, a free download from the Microsoft web site.

System managers and users of Windows NT 4 and Windows 2000 systems in particular should take proactive action by ensuring that their systems have the latest free Service Packs, or upgrade to a newer version of Windows - Windows XP and Windows 2003 are recommended for clients and servers respectively.

Microsoft provides tools that make keeping up to date relatively easy. Internet Explorer users can visit Windows Update to check for any updates their system might need, simply by choosing the Windows Update item on the Tools menu. Windows 2000 Service Pack 3 and later includes the Windows Automatic Update Client, which will download critical patches automatically. This is not available for Windows NT 4.

Windows NT 4 was released in 1996, more than seven years ago. Microsoft is retiring support for Windows NT 4, and users are recommended to upgrade their systems to a more recent version of Windows.

And oh, do make regular backups and do use an antivirus program with automatic updates.

Copyright 2004 by Tamura Jones
