Franco Milicchio
Dept. Computer Science and Engineering, University Roma Tre Via della Vasca Navale, 79 – 00146 Roma – Italy

ABSTRACT

In this paper we show that the initial philosophy used in designing and developing UNIX in its early times has been forgotten due to “fast practices”. We question the leitmotif that microkernels, though being by design adherent to the KISS principle, have a number of context switches higher than their monolithic counterparts, by running a test suite and verifying the results with standard statistical validation tests. We advocate a wiser distribution of shared libraries by statistically analyzing the weight of each shared object in a typical UNIX system, showing that the majority of shared libraries exist in a common space with no real evidence of need. Finally, we examine the UNIX heritage from an historical point of view, noticing how habits swiftly replaced the intents of the original authors, moving the focus away from the earliest purpose: avoiding complications, keeping a system simple to use and maintain.

KEYWORDS

UNIX; Statistics; Case Studies; Operating Systems.

1. INTRODUCTION

UNIX is the eldest operating system still in use, having its roots in the 1960s Multics system; not by chance, its original name was Unics, later changed to its renowned denomination. It was designed and developed at Bell Labs by Thompson, Ritchie and McIlroy, trying to avoid some complications its ancestor introduced, keeping the system small and simple. This philosophy, originated in complex systems engineering, gained fame under the acronym KISS, “keep it simple, stupid”, and dates back to the 14th century with the lex parsimoniae of the philosopher William of Ockham, who stated “entia non sunt multiplicanda praeter necessitatem”, best known as Ockham's razor: “entities should not be multiplied beyond necessity”. We can easily recognize that the whole project followed this rule of thumb even from its first version: small programs were preferred over big ones, developing programs that do a single task very efficiently. To run complex jobs, these small applications could be, and still are, connected by I/O redirection. After many years, and after many death prophecies, UNIX is still one of the most used operating systems. A question may arise: has UNIX observed the KISS principle throughout its history, or has it forgotten this basic rule and followed other habits?

The operating system core is one of the major concerns. Microkernels have been developed during years of research, but the common understanding of them was poor: they were always addressed as neat and simple academic design projects, following the KISS principle, but with bad performance due to the number of context switches necessary to run an application. Even though there is commercial evidence that microkernels are not just academic proofs of concept, with mission-critical real-time operating systems like QNX, and even end-user UNIX systems like MacOS X, the leitmotif of monolithic kernels being “better” never faded.
Dynamic linking was one of the features UNIX inherited from its ancestor Multics. Reusing code through shared libraries is a common practice not only in the UNIX world, but in all modern operating systems. On one hand this praxis simplifies the developer's job, but on the other hand it can add a high degree of complexity to a system if all software actually shares its code. Again, another common opinion is that sharing prevents a system from being overwhelmed by an ungovernable duplication of resources. However, this belief has never been proved right or wrong.

In this paper we address these issues from an historical and statistical point of view. On the kernel side, we evaluate the number of context switches with a set of tests, validating the results with state-of-the-art statistical analyses. On the libraries side, we take a survey of all shared resources present on our systems, inspecting the weight and impact of such libraries to corroborate or contradict the habit of sharing them. Finally, we point out the historical heritage of UNIX, showing how the KISS philosophy has become less important than common habits.

2. KERNEL WARS

A kernel is the core of an operating system, giving the minimum abstraction layer for hardware handling, inter-process communication and memory management. Along with these, a kernel may provide other services like device management, sound and network control. On monolithic systems all services are implemented in kernel space, affecting the whole system in case of bugs, while on microkernels all these aspects are delegated to userland servers. By construction, monolithic kernels have tight and often non-trivial dependencies between their components, involving a high grade of complexity, evident even in the needed number of lines of code; a monolithic kernel is thus less adherent to the KISS principle. Microkernels, on the other hand, tend to keep all aspects simple and neat, commissioning all services to the servers. This quality affects microkernels on the design side, needing a great care in planning their features.

In the history of operating systems, the debate between the supporters of the microkernel design, opposed to the monolithic approach, was one of the major discussion arguments; the most famous flaming discussion on the topic started between Andrew Tanenbaum and Linus Torvalds, the creators of Minix and Linux respectively. Much of the discussion between these two major schools is about performance: a monolithic kernel should be more efficient than a microkernel because it requires fewer context switches for all tasks. This claim has always been asserted but never sufficiently investigated or statistically proved. We analyzed both MacOS X and Linux with a suite of tests to prove or confute this efficiency issue, focusing exclusively on the context switch counts, and not on performances (e.g. execution times).

2.1 Test Suite

The suite consisted of 31 tests, divided into 7 categories, and was designed to be independent of the underlying hardware. The chosen tests are the following:

1. Multiplication of two randomly-generated integers with 1024, 2048, 5120, 10240, and 20480 digits.
2. Creation of a random file from /dev/random with sizes of 1024 KB, 2048 KB, 5120 KB, 10 MB, 20 MB, 50 MB, and 100 MB.
3. Conversion of images from PNG to JPEG, TIFF and PostScript.
4. Download of a 650 MB CD-ROM ISO image via the HTTP protocol.
5. Compression of random files with sizes of 1024 KB, 2048 KB, and 5120 KB, and compression of the same random sequences repeated 15 times.
6. Behavior of the Secure Shell daemon while uploading and downloading a 650 MB CD-ROM ISO image from a client, running on both high and low TCP ports.
7. Process generation with the classical UNIX calls fork, exec, and wait, and thread spawning.

The chosen hardware models for the tests have been an IBM IntelliStation M Pro running Linux Ubuntu 5.04, based on the Linux 2.6.10 kernel version, and an Apple eMac PowerPC running Apple MacOS X 10.4.6, based on the Darwin 8.6.0 kernel version. We chose Linux and MacOS X as two UNIX (or UNIX-like) operating systems available to the general public; this decision was taken to avoid results biased by the choice of a niche OS, so that we achieve an impartial and fair comparison. Each suite has been repeated fifty times on a freshly-rebooted machine in order to obtain a sufficient number of cases to achieve statistical significance (Casella and Berger, 2001; Freedman, 2005). All the tests have then been evaluated in their statistical significance to validate the results.

2.2 Statistical Analysis

A statistical analysis of the test suite has been conducted to check the validity of each test. The results are presented in Table 1 for both Linux and Darwin (MacOS X), where mean and variance are shown in detail for each set of tests.

Table 1. The statistical results of the test series: mean and variance of the context switch counts under Linux and MacOS X for each test (bc with 1024–20480 digits; dd with 1024 K–102400 K; PNG to JPG, TIFF and PS, two image sets; HTTP download; zip with 1024 K–5120 K, plain and repeated 15 times; scp upload and download on high and low TCP ports; process creation via fork and via threads).

2.2.1 Test Results

As shown in Table 1, the Linux kernel has an overall number of context switches greater than MacOS X. In one test, specifically the HTTP download, the number of context switches on the MacOS X system was lower than the Linux counterpart by a significant margin. Moreover, the variance of MacOS X shows that its kernel switches “irregularly” compared to Linux, which behaves smoothly with very sparse outliers. This fact was confirmed also by the confidence intervals and skewness of both systems.

These results are a clear indication that the assertion that a microkernel has a number of context switches greater than a monolithic one has no real statistical evidence. As a matter of fact, a well-designed microkernel has a number of context switches lower than, or at worst comparable to, its counterpart. These results do not provide any performance comparison between the two kernels, but they supply the important information that if a performance loss is present, it is not due to the number of context switches.
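Each Table 1 cell is the mean and variance of fifty repetitions, and the validation that follows uses 1- and 2-tailed t-Tests plus a sign test. Below is a stdlib-only sketch of that pipeline; Welch's unequal-variance t statistic stands in for the unspecified t-Test variant, and a p-value for it would come from the t distribution (e.g. via scipy.stats):

```python
import math
from statistics import mean, variance

def summarize(runs):
    """Per-test (mean, sample variance), as reported in each Table 1 cell.
    `runs` maps a test name to its list of context switch counts."""
    return {test: (mean(xs), variance(xs)) for test, xs in runs.items()}

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    (no equal-variance assumption between the Linux and MacOS X series)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def sign_test(a, b):
    """Exact two-sided sign test on paired samples, ties dropped:
    the binomial probability of a sign split at least this lopsided."""
    signs = [x - y for x, y in zip(a, b) if x != y]
    n = len(signs)
    k = min(sum(s > 0 for s in signs), sum(s < 0 for s in signs))
    return min(sum(math.comb(n, i) for i in range(k + 1)) / 2 ** (n - 1), 1.0)
```

Pairing the per-test figures of the two systems and feeding them to sign_test gives an overall verdict of the kind reported below; for a symmetric statistic, the 1-tailed p-value is half the 2-tailed one when the effect has the hypothesized sign.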

2.2.2 Test Validation

To validate the results we performed a standard t-Test (Hastie et al., 2003) in order to examine whether the two series of results have statistically significant differences; in particular, we conducted both 1- and 2-tailed t-Tests. A positive match was given to MacOS X if the 1-tailed p-value of the t-Test was less than the standard threshold of 0.05. All the tests proved to be different in means and variances with statistical significance, having a p-value not exceeding the standard statistical limit of 0.05. In Table 2 we show the results for those tests that showed a non-significant difference in the 1-tailed test, sided by the corresponding 2-tailed study. Only one test, the creation of a 10 MB file from /dev/random, has a higher p-value of 0.09, signifying that the two series have no significant difference: an analysis of the data showed the presence of an outlier in the Linux system with a total context switch count of 1755. Applying an outlier filtering, the test showed a p-value < 10^-6. Additionally, a Sign test (Abdi, 2006) was performed in order to compare the performances in the overall contexts. The sign test resulted in a probability of the two series not being statistically different of p < 2.94·10^-3.

Table 2. t-Test p-values (1-tailed and 2-tailed) of the tests with a non-significant 1-tailed difference: dd 10240 K (0.09; 0.05); the remaining entries — PNG to PS (1), PNG to JPG (2), PNG to TIFF (2), HTTP download, zip 1024 K, zip 1024 K (15 times), zip 2048 K, zip 2048 K (15 times), zip 5120 K, and Processes (threads) — have 1-tailed p-values between 0.01 and 0.87, each with a 2-tailed p-value below 10^-6 (one below 10^-4).

3. SHARED LIBRARIES

A common conduct in the UNIX world is to subdivide a complex program into small pieces, usually libraries. This approach obviously conforms to the KISS principle, and of course is not limited to the UNIX world. Over the years, the procedure of using shared libraries has become so widespread that there is an ongoing belief that a software using a shared library should install it system-wide, so that other software could use it as well; another opinion is that system-wide shared libraries decrease the amount of disk space wasted by an “uncontrollable” duplication of resources. The consequence is that on a typical UNIX system we cannot immediately recognize who is a user of a library. In the next sections we analyze the shared library distribution and relative weight on a UNIX system, determining whether the opinions about sharing system-wide libraries have a real foundation.

3.1 Statistical Analysis

We took a survey of all the shared libraries on our Linux system to find out if those beliefs are effectively supported by real evidence. We analyzed all the explicitly shared libraries present on the system, meaning that self-contained applications like VMWare or Matlab, which do not share their dynamic libraries, were not taken into account. In order to obtain the total users of a library, we followed the linkage up to the second order, thus also considering another shared library as a user of a library.
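The survey's linkage-following can be sketched with ldd on a Linux system (an assumption: the paper does not name its tooling). Users are counted up to the second order: every application counts as a user of the libraries it links, and each of those libraries counts, once, as a user of the libraries it links in turn:

```python
import subprocess
from collections import Counter

def direct_deps(path):
    """soname -> resolved path of the shared objects `path` links,
    parsed from ldd output (Linux-specific; {} when ldd is unusable)."""
    try:
        out = subprocess.run(["ldd", path], capture_output=True,
                             text=True, timeout=30).stdout
    except (OSError, subprocess.TimeoutExpired):
        return {}
    deps = {}
    for line in out.splitlines():
        # typical line: "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)"
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "=>" and parts[2].startswith("/"):
            deps[parts[0]] = parts[2]
    return deps

def library_users(binaries):
    """Per-library user count, following the linkage up to the second order."""
    users = Counter()
    first_order = {}                       # unique libraries seen so far
    for binary in binaries:
        deps = direct_deps(binary)
        users.update(deps.keys())          # first order: application -> library
        first_order.update(deps)
    for path in first_order.values():      # second order: library -> library
        users.update(direct_deps(path).keys())
    return users
```

Running library_users over every executable under, say, /usr/bin and sorting the resulting Counter reproduces a ranking of the Table 4 kind.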

3.1.1 Self and Non-self

Roughly counting the number of shared libraries, we reach a total of 500 common resources present on a modern UNIX or UNIX-like system. A mere sorting by number of users shows which libraries can be counted as “self”, and which ones should be classified as “non-self” on an operating system. An operating system by its very definition manages software and hardware resources, and on modern systems provides a minimal set of libraries for third-party software development. As expected, the most used libraries concern low-level operations, the graphic environment and dynamic linking. In Table 4 we show the ten most used libraries on both operating systems, with the total count of users of each library, i.e. the number of times it is dynamically linked by other applications and libraries.

Table 4. The ten most used libraries with their respective linking count
Linux: ld-linux (3294), libc (3291), libm (1879), libdl (1871), libpthreads (1561), libz (989), libX11 (804), libglib (781), libgobject (759), libXext (749).
MacOS X: libSystem (1159), libiconv (185), libgcc_s (127), CoreFoundation (109), libncurses (104), libcups (92), libsasl2 (83), libssl (78), DirectoryService (75), Kerberos (70).

About half of the libraries present on the system are used less than 10 times, and shared libraries linked only once are about 28% of the total: they can hardly be classified as “shared”. It happens that more than one fourth of the libraries are effectively shared just by name, which adds confusion to the maintainability of a system, keeping its structure not simple and certainly not stupid. Software other than the operating system itself has no reason to be used a number of times comparable to a system resource; in this situation it is hard to classify a library as part of the operating system (the “self”). If this happens, the non-self classified library should be taken into account for inclusion in the operating system.

3.1.2 Library Weight

An interesting insight is given by the shared library distribution and relative weight on the Linux system. The weight of each library is the size it would have occupied on a storage medium in case of static linkage. To better understand the importance of these shared objects, we focused on the weight of each group of libraries: we divided the libraries into six main categories by number of users, and the results are shown in Figures 1 and 2. As pictured in Figure 1, half of the libraries are shared less than 10 times, and the most used libraries (with more than 500 linkers) are actually about 2% of the total. Figure 2 shows the weight of each group: the most used libraries, linked more than 100 times, weight 85% of the total while they amount to 12% of the number of libraries; the least used libraries, shared less than 10 times, weight only 2% but are 54% of the total. This fact draws a shadow on the claim that shared libraries are a quasi-necessity.

To avoid this intrinsic disorganization, we can move the least used libraries from a system-wide location to the application itself, with a static linkage or by bundling them into an application directory à la NeXT. The disk space overhead caused by the duplication of resources is clearly dependent on the limit we impose on the number of library users. Although the disk space evidently increases, it remains far under acceptable bounds: for instance, if we had statically linked all libraries with less than 20 users on our system, we would add 450 MB of space while removing 340 shared objects; on a 40 GB hard drive, nowadays considered a small one, this would add 1% of disk space while decreasing the number of shared objects by 68%. The great benefit is evident: reducing the number of unknown libraries, avoiding possible orphans, having a better view of what is part of the system and what is not, and of course helping the system to remain under the KISS principle.
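The grouping behind Figures 1 and 2 can be sketched as follows, assuming a mapping from library path to its user count (e.g. produced by a survey like the one above). The six category boundaries are illustrative, as the paper does not list them:

```python
import os
from collections import defaultdict

# Illustrative user-count categories: the paper divides the libraries into
# six groups by number of users but does not state the exact boundaries.
GROUPS = [(1, 1), (2, 9), (10, 19), (20, 99), (100, 499), (500, float("inf"))]

def group_weights(user_counts):
    """Per-group library count and total 'weight', i.e. the bytes the
    libraries would occupy on disk if statically linked into their users.

    `user_counts` maps a library path to its number of users."""
    counts = defaultdict(int)
    weights = defaultdict(int)
    for path, users in user_counts.items():
        for lo, hi in GROUPS:
            if lo <= users <= hi:
                counts[(lo, hi)] += 1
                try:
                    weights[(lo, hi)] += os.path.getsize(path)
                except OSError:
                    pass  # library vanished since the survey; skip its weight
                break
    return counts, weights
```

Plotting counts gives a Figure 1-style histogram and weights a Figure 2-style one; moving the boundary (e.g. "fewer than 20 users") directly trades shared-object count against duplicated bytes, which is the trade-off quantified above.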

We strongly stress the fact that we chose one of the cleanest and most coherent Linux distributions, having one and only one desktop environment and thus not adding other shared libraries for just a single application. Other UNIX systems that include more environments, such as Solaris, which includes both CDE and JDE, or AIX, which includes KDE as well as Gnome, have worse results. On MacOS X the number of shared libraries is far lower than on other systems: since all applications bundle all their resources, there is no library installed system-wide by applications, with few obvious exceptions (e.g. device drivers, kernel extensions).

Figure 1. Number of libraries per number of linkers

4. UNIX HERITAGE

As UNIX was developed, it followed the KISS principle in almost every aspect. At its birth the directory structure of UNIX was also very simple. In the first UNIX there were just a few directories, as we can see in (Thompson and Ritchie, 1971): bin, etc, and usr, being respectively the place where the system binaries were stored, where “other” things regarding the system were to be found (e.g. configuration files, system libraries), and where users had their own personal space, thus properly the users directory. The first two were an integral part of the operating system; as the authors themselves say, referring to Section 6 of the UNIX manual, “user-maintained programs are not considered part of the UNIX system”. Reading the manual we have a clear perspective of the authors' intention: a clear distinction of roles between system and users.

The first UNIX system already contained the dev directory with the special device files. The “everything is a file” philosophy, which is characteristic of every UNIX system, was present from the very beginning, although it was not a requirement. This abstractive approach is clearly KISS-compliant, as it pursues an extreme simplicity and coherence in handling files, directories, devices and even IPC-related files with a simple API.

Figure 2. Library occupancy if statically linked (per number of linkers)

As UNIX grew, there were more additions to the system. A library directory was added, moving the system shared objects from etc to a more meaningful location. The system administration services were also moved from their original location etc to another directory called sbin. We may notice that this choice had nothing to do with privileged access to system binaries: sbin is by default not present in the path environment variable on many UNIX systems, yet still accessible by users, and those commands were originally kept in etc “to lessen the probability of its being invoked by accident or from curiosity”, as we can read for the boot command in (Thompson and Ritchie, 1971). This fact still holds in these days. Moreover, we may ask what is now the difference between an administrative command stored in sbin and another one stored in usr/sbin.

The usr directory is one of the notable examples of how a system can grow in complexity, becoming full of locations with an unclear meaning. Originally a user directory was a shared resource, as we can see for example in the description of the cal program in the original UNIX manual. Over the years this location has been used more and more to store system binaries, libraries, and even administrative commands. We can track the reason back to history: we may recall that in the days when UNIX was conceived, many directories were mount points for storage mediums, making the swap between tapes easier to handle. The ongoing growth, and of course the lack of standardization in the early UNIX systems, led to the creation of a plethora of directories and mount points. While the habit of using many locations as mount points for different storage mediums was a necessary procedure in old times, none of the causes that originally drove this conduct hold today, and it has become just a habit: the sbin directory, for example, is there because we expect it to be there. From a user data storage location, this directory has over the years been called with many acronyms, like User Shared Resources or UNIX System Resources.

Focusing on the applications, apart from libraries, which have been analyzed in the previous section, this “shared” directory contains application resources such as translations, configurations, documentations, graphic servers and UNIX commands. The question is whether or not these files can be classified as truly shared: by their very definition those resources are certainly not shared by any other software. In fact, we can ask ourselves why a translation file for a particular software should reside in a directory different from the application itself. Finally, we find that almost any UNIX system contains a directory dedicated to the system header files.

A header by itself has almost no usage without the library it describes, so by their purpose the two should not be stored in different places. Storing all the available header files in a single location does not help in keeping a resource simple and immediately recognizable: this habit is comparable to the practice of separating application resources from the application itself. Again, a NeXT approach to software deployment gives a solution to this unreasonable complexity: bundling headers and the respective binary library in a single location, avoiding the spread of files in many directories and increasing the system simplicity.

As for applications, by using application and library bundles we can clearly see an effort in simplifying the system: bundling all the application-related files in a single location makes it simple to recognize an application resource from a system one. We address NeXT in particular because it was a UNIX operating system, but bundles were actually not limited to the NeXT OS: for example, BeOS applications were bundles even though it was not a UNIX. Moreover, NeXT introduced locations with significant names like Applications, Library and System, re-establishing the Users directory as the place for user data. Despite these efforts, in the modern NeXT descendant, MacOS X, the system still has the common UNIX directories, which of course could have been easily avoided while retaining a compatibility with the past.

5. CONCLUSION

We have analyzed some of the main concerns about the adherence to the KISS principle of two of the most used UNIX operating systems available to the general public, MacOS X and Linux. We have proved, with statistical evidence and validation, that a microkernel, complying with the KISS principle by design, as in the MacOS X system, has no more context switches than a classic monolithic kernel, thus negating the opposition to the first family of operating system cores. This also proves that if there is a performance difference between the two, it is not due to the number of context switches.

Moreover, we examined the current shared library situation, showing that a simplification process is needed to satisfy the simplicity of maintenance that modern systems require. Although the habit of sharing might at first seem convincing, strictly limiting the number of shared libraries not only has an insignificant impact on modern storage media, but reduces the number of shared resources by at least 50%.

The KISS principle was of course present in the first version of Bell Labs UNIX, but evidently it was swiftly replaced by habits that still in our times taint the simplicity and logic of the original intents. Born following the KISS principle, UNIX has become a huge and habit-prone system, certainly not simple and stupid to understand and maintain. Vestigial heritages are still present, but have no reason to exist anymore.

ACKNOWLEDGEMENT

A brief acknowledgement.

REFERENCES

Abdi, H., 2006. Binomial Distribution: Binomial and Sign Tests. In Neil J. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Sage Publications, USA.
Casella, G. and Berger, R. L., 2001. Statistical Inference. Duxbury Press, Duxbury Advanced Series, USA.
Freedman, D., 2005. Statistical Models: Theory and Practice. Cambridge University Press, New York, NY, USA.
Hastie, T. et al., 2003. The Elements of Statistical Learning. Springer, New York, NY, USA.
IEEE, 2003. Standard for Information Technology – Standardized Application Environment Profile – POSIX Realtime and Embedded Application Support (AEP). STD 1003.13-2003, Institute of Electrical and Electronics Engineers, Inc., New York, NY, USA.
Ritchie, D. M. and Thompson, K., 1983. The UNIX time-sharing system. Communications of the ACM, Vol. 26, No. 1, pp 84-89.
Salus, P. H., 1994. A Quarter Century of UNIX. Addison-Wesley Professional, Boston, MA, USA.
Thompson, K. and Ritchie, D. M., 1971. UNIX Programmer's Manual. Bell Laboratories.
