
The GNU/Linux Platform and Freedom Respecting Software for Economists

A. Talha Yalta, TOBB University of Economics and Technology

Riccardo Lucchetti, Università Politecnica delle Marche, Ancona

October, 2007

Abstract

The GNU/Linux operating system is rapidly gaining ground as an attractive alternative to proprietary platforms, particularly in government agencies and academic institutions. Here, we assess this resource from the perspective of econometricians and discuss its benefits and disadvantages, along with a basic overview of some of the popular free/libre and open source programs hosted by this platform for conducting research in economics.

1 Introduction

GNU/Linux (pronounced guh-noo / lin-ux) is a Unix-like, free and open source computer operating system, which provides an excellent platform for conducting research in economics. During the last few years, GNU/Linux has reached a level of maturity where it is now considered to be at the same level as, and in some ways superior to, the leading commercial alternative from Microsoft Corporation. While adoption is still in its early stages,[1] GNU/Linux is quickly coming into widespread use, particularly in government agencies and academic institutions. Today, many economics departments around the world are using it for performing everyday tasks as well as for carrying out large scale research projects.

Correspondence to: TOBB University of Economics and Technology, Sogutozu Caddesi No:43, Sogutozu, 06560, Ankara, Turkey. E-mail: yalta@etu.edu.tr

[1] Estimating the total number of GNU/Linux users is difficult. Software companies know how many licenses they have issued. On the other hand, when it comes to Free Software, no one knows how many copies have been produced. A very informal attempt by the authors to measure the popularity of this platform using the Google search engine on 4-13-2007 resulted in about 611,000,000 entries for “Windows”, 359,000,000 entries for “Linux”, 248,000,000 entries for “OS X”, and 142,000,000 entries for “Unix”. Without doubt, GNU/Linux is one of the most talked about operating systems on the Internet.


Author-created Version: The original publication is accessible from http://ideas.repec.org/a/jae/japmet/v23y2008i2p279-286.html .

In this paper, we review this resource from the perspective of econometricians. In the next section, we will supply some background information about GNU/Linux and explain its benefits and disadvantages. In section three, we will discuss hardware requirements, installation, and other methods to start making use of GNU/Linux instantly. This will be followed by an overview of various open source econometric and statistical software packages hosted by this platform. Section five concludes. For additional information, the reader is referred to MacKinnon (1999), which provides a detailed, albeit slightly outdated, review of the Debian GNU/Linux distribution from the point of view of an econometrician.

2 The GNU/Linux Platform

The roots of GNU/Linux go back to 1985, when the programmer Richard Stallman resigned from MIT and established the nonprofit Free Software Foundation (FSF) in Boston, MA, USA. Frustrated by the commercialization of computer programs, and believing that software should be free in accordance with the classical spirit of scientific collaboration, Stallman (1985) had already started working toward a free and Unix compatible operating system called GNU.[2] Over the next several years, Stallman and his team developed a number of tools such as the “gcc” compiler, the “emacs” text editor and the “bash” shell, along with most of the core libraries of a standard Unix system. However, by 1991, the GNU project was still missing a kernel, the essential component of an operating system responsible for communication between hardware and software. This piece came from Linus Torvalds, a second year college student at the University of Helsinki, who used the GNU tool set to start writing the Unix compatible Linux kernel as a hobby project. In the years that followed, with the help of an ever increasing number of volunteers who shared their ideas and contributions over the Internet, GNU/Linux came to be the biggest collaborative project in human history.

GNU/Linux is free/libre and open source software[3] (FLOSS) and is released under the GNU General Public License (GPL) of the FSF.[4] The GPL enables users to freely study, modify, and share programs, while ensuring that the modified versions can never become proprietary. Today, GNU/Linux is a versatile and powerful operating system that offers a number of ethical, technical, and practical advantages, which we describe below.

For some, the main motivation for adopting GNU/Linux is a matter of principle. Among the few users who actually bother reading the “End-User License Agreement” (EULA) of a commercial operating system, many discover that they do not agree with all of its contents. On the other hand, according to most GNU/Linux users, free circulation of ideas, as embodied in a software program, is the key to human progress. Therefore,

[2] GNU is a recursive acronym, which stands for “GNU’s Not Unix”.

[3] The FSF assigns a high value to freedom in software; however, this principle is not properly captured by the word “free” in the English language. Consequently, the French/Spanish word “libre” is often used to distinguish freeware (gratis software) from free (libre) software, which liberates computer users from proprietary software under restrictive licensing terms.

[4] The terms under which software is released are generally referred to as a license. Although the GPL is the predominant type of license in the FLOSS world, some programs are released under different terms. See de Laat (2005) for a review.


source code should be made public and has to be protected from becoming proprietary. This idea goes hand in hand with the ongoing trend toward openness in economic analysis. With GNU/Linux, computer users have access to a completely transparent and restriction-free computing environment, where every protocol and process is open to inspection and intervention. In addition, it is freely available to all, greatly facilitating peer review and research replication. As a result, choosing GNU/Linux is considered a step in the right direction by many users, who feel that it is their responsibility to support the use and development of free/libre software tools for conducting research in economics.

Among the practical reasons for choosing GNU/Linux is the outstanding stability of this platform, which is a particularly important consideration for econometricians who regularly run extensive computational tasks that take days and even weeks to complete. A properly configured GNU/Linux system will run without interruption for as long as the hardware keeps working. In addition, with full multi-tasking support, multiple devices can be accessed simultaneously, and numerous programs can be run in the background. Due to these features, 376 of the world’s top 500 supercomputers are now running on the GNU/Linux platform.[5]

Another reason is that GNU/Linux offers excellent security. This is partly due to its architecture, which makes it almost impossible for harmful programs to cause system-wide damage. Moreover, it is very difficult for a virus to hide itself in open code. Also, virus writers normally choose more popular operating systems as their target. Peeling and Satchell (2001) report that the Windows operating system is threatened by about 60,000 known viruses. These programs cannot cause damage on GNU/Linux, which is known to have around 40 native viruses, most of which were designed as a research curiosity.[6] The majority of GNU/Linux users do not even bother installing virus protection software on their computers.

In addition, GNU/Linux offers the flexibility of being used either via a graphical interface (which is what Windows and Macintosh users are accustomed to) or via the “shell”. The shell, or command line interface, is the traditional way of interacting with the system on Unix machines. Despite its arcane-looking appearance, the shell makes it easy to combine several programs to perform complex operations such as data mining. For example, the command

find / -name "*.csv" -exec grep -l 12345 {} \; | sort

will combine the “find”, “grep”, and “sort” commands to search inside all .csv files on the computer and list in alphabetical order the ones that contain the string “12345”. Moreover, the shell interface allows the user to run commands on remote machines very easily via an ordinary Internet connection, which is extremely useful for computationally demanding tasks.

Finally, another reason why many are adopting GNU/Linux as their main computing platform is the quality of software. Written without commercial concerns, most GNU

[5] Reported by “Top500 Supercomputer Sites” (http://www.top500.org/stats), accessed on 4-13-2007.

[6] One such example is the Bliss virus, which was created as a proof-of-concept. Bliss keeps a log of all its actions in the /tmp/.bliss file and even provides the thoughtful --bliss-uninfect-files-please command line option. It never became widespread.


programs can be considered a labor of love. For example, even the simple calculator program “Calc” is capable of doing arbitrary precision arithmetic, symbolic algebra, and graphics, as well as operations on matrices, complex numbers, calendar dates, and so on.

Notwithstanding its important advantages, GNU/Linux also has some drawbacks. One of the shortcomings frequently cited in the past was the lack of commercial software support. While this is much less of an issue nowadays, prospective users should check the availability of any software that they find indispensable for doing research.

Another disadvantage that GNU/Linux users commonly encounter is the fact that some exotic hardware may not work properly, if at all. This is a side-effect of the dominance of the MS Windows platform: hardware vendors do not always release Linux drivers and are often reluctant to provide detailed specifications of their hardware. In this case, device drivers have to be reverse-engineered. Consequently, some pieces of hardware can be managed by the Linux kernel only weeks, if not months, after their introduction. While the list of devices supported by the Linux kernel is already enormous and rapidly expanding, it is always best to check beforehand. A comprehensive list of supported hardware can be found at http://en.tldp.org/HOWTO/Hardware-HOWTO/index.html.

In addition, migrating to a new and unfamiliar operating system can be discouraging. Therefore, it is important to accept from the beginning that it will take some time and effort to adapt to a new environment. However, the very nature of FLOSS makes users very willing to cooperate with each other. Almost any question one may wish to ask has already been answered on some web page. Search engines are the FLOSS equivalent of customer support for proprietary software. GNU/Linux is designed for computer literate people. Having more control over the computer means doing some manual configuration or using the console to enter commands every so often. Consequently, as the saying goes: “If your VCR is blinking 12:00, you don’t want Linux.”
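As an illustration of the shell pipeline discussed earlier in this section, the following is a minimal sketch rather than a recommended tool. It wraps the same find, grep, and sort combination in a function, restricts the search to a single directory instead of /, and uses the "{} +" form of -exec, which hands many file names to one grep call instead of starting one grep per file. The function name search_csv, the scratch directory, and the sample files are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# search_csv DIR STRING
# List, in sorted order, the .csv files under DIR that contain STRING.
# (Hypothetical helper name; the pipeline itself is the one from the text.)
search_csv() {
    # -type f      : regular files only
    # grep -l      : print only the names of matching files
    # grep -F      : treat STRING as a fixed string, not a regular expression
    # "{} +"       : pass many file names to a single grep invocation
    find "$1" -type f -name "*.csv" -exec grep -lF -- "$2" {} + | sort
}

# Demonstration on a scratch directory, so nothing outside it is touched.
demo_dir=$(mktemp -d)
printf 'id,value\n99999,2.0\n' > "$demo_dir/a.csv"
printf 'id,value\n12345,1.5\n' > "$demo_dir/b.csv"
printf 'id,value\n12345,3.1\n' > "$demo_dir/c.csv"

# Prints the paths of b.csv and c.csv; a.csv does not contain the string.
search_csv "$demo_dir" 12345

rm -rf "$demo_dir"
```

Searching one directory rather than the whole file system also avoids scanning system directories that an ordinary user cannot read. The same command line could be sent to a remote machine through ssh, for instance ssh user@host 'find ...', where user@host is a placeholder.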

3 Obtaining and Installing GNU/Linux

Unlike other operating systems, GNU/Linux does not refer to a single product. In order to better satisfy different needs and preferences, it comes in various flavors called “distributions”. While there are hundreds of distributions,[7] most of these are modified versions of a few well-established ones, tailored for a certain language or a specific type of usage. A typical distribution includes the Linux kernel, the GNU tools and libraries, the X graphical system, and a desktop environment such as KDE or Gnome, along with hundreds of freedom respecting programs for handling different tasks.

Distributions can be either commercial or noncommercial. A commercial distribution such as Mandriva or Red Hat Enterprise Linux usually comes with a printed manual and telephone or email support for a certain number of weeks or months. On the other hand, noncommercial distributions usually offer detailed documentation in electronic format and community support through Internet forums. Some of the leading non-commercial

[7] DistroWatch (http://www.distrowatch.com) is a popular destination for obtaining news and information regarding the many flavors of GNU/Linux.


distributions include Debian, Ubuntu, Fedora, openSUSE, Gentoo, and Slackware. This review focuses on Debian GNU/Linux, which is a truly global and hardware agnostic distribution. Also, Debian supports a very large library of software packages, including those particularly useful for econometricians.

Most noncommercial distributions including Debian are obtainable from the Internet in the form of downloadable .iso files. These large files contain exact images of a CD or a DVD for online distribution, and need to be written on blank media once acquired. As an alternative, the Debian web site (http://www.debian.org) also provides a list of online vendors, which sell various distributions for a little more than the cost of media and shipping charges.

GNU/Linux is capable of running on a large variety of hardware architectures including Intel x86 based processors, Intel IA-64, AMD64, IBM/Motorola PowerPC, SPARC, DEC Alpha, and ARM, with an extremely efficient use of hardware. Accordingly, any PC with a Pentium II or above and at least 128 MB of RAM is capable of running most GNU/Linux distributions. The graphical desktop environment tends to use a large part of system resources. Consequently, the requirements will be significantly lower if the machine is used for less intensive tasks such as running a dedicated web server, a firewall or a database back-end. Recent versions of the Linux kernel support up to 64 GB of RAM and file systems as large as 16 terabytes (1 terabyte = 1024 GB).

Installing GNU/Linux is as simple and as fast as installing other operating systems. Most hardware is supported out of the box and automatically detected during installation. The user is also given the option of keeping the existing setup on the computer, making it possible to have more than one operating system on a single PC. Once the installation is complete, maintaining the system and adding or removing software is easily done using “package managers”. These smart programs use various online software repositories to fetch and configure the latest updates and additional software packages automatically. The Debian distribution has a tradition of being extremely good at this. Nearly all end-user programs (including upgrades) can be fetched from the official Debian archives through simple commands, keeping the system clean, consistent, and well-organized at all times. When a new version of a program is released by its authors, a Debian maintainer repackages it and puts it into the unstable repository; after 10 days, if no critical bugs are discovered, it migrates into the testing repository. At regular intervals, the testing archive is frozen and all bugs are eliminated. This becomes the stable distribution. Debian users can choose to use any of these repositories; most desktop users choose the testing repository. The stable repository, which contains somewhat outdated versions of the programs, is normally chosen for servers that need virtually bug-free software. At the time of writing, the number of packages in the official Debian testing repository was over 19,000.
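The package management workflow described above can be sketched with Debian's apt-get and apt-cache tools. The sketch below is illustrative only: the helper run and the APPLY variable are hypothetical, commands are printed rather than executed unless APPLY=1 is set, and the package names are assumptions that may differ between Debian releases. Which repository (stable, testing, or unstable) is consulted is determined by the entries in /etc/apt/sources.list.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print each command instead of running it,
# unless APPLY=1 is set. Running for real needs root privileges and a
# Debian-based system with network access.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Refresh the package lists from the configured repositories.
run apt-get update

# Look for packages by keyword.
run apt-cache search econometrics

# Install some of the programs reviewed in the next section.
# Package names are assumed here and vary by release.
run apt-get install gretl r-base octave maxima

# Fetch and apply available upgrades for installed packages.
run apt-get upgrade
```

Keeping the commands behind a dry-run switch makes it possible to review exactly what would be executed before any change is made to the system.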

GNU/Linux can be used even without installing on a hard drive. There are various self-configuring “live” distributions which are capable of running entirely from removable media such as a CD or a memory card. These live distributions instantly turn any computer into a full-blown GNU/Linux workstation and offer an effortless way to test drive this platform. They also make excellent rescue disks that can be used to retrieve files in case the computer fails to start. Some of the Debian based live distributions include Knoppix, Ubuntu, Mepis, and Quantian. These distributions also offer a hard drive


installation option and are recommended for new users. Quantian is a remastering of the highly regarded Knoppix distribution, tailored to include a large number of quantitative, numerical, and scientific programs of interest to econometricians. Being a live DVD, however, it cannot be used on older computers. See Eddelbuettel (2000) for a review of this Debian based distribution.

4 Freedom Respecting Software for Economists

One of the main considerations for choosing an operating system is the amount and quality of software supported by the platform. While today most commercial scientific programs are available for GNU/Linux, there is also a large selection of high quality FLOSS alternatives. In fact, the math and science sections of Debian repositories feature hundreds of Free Software packages. In this section, we will attempt to provide a basic overview of some of the popular freedom respecting programs for doing research in economics. The functionality offered gratis by these programs to individual users, as well as economics departments, would require substantial financial resources to acquire through proprietary software. Also, being FLOSS, most of these programs are already ported to commercial platforms. This makes it possible for prospective users to become familiar with the software before actually switching to GNU/Linux.

The leading econometrics package developed for GNU/Linux is GRETL, “GNU Regression, Econometrics and Time Series Library” (http://gretl.sourceforge.net). This sophisticated program features an intuitive graphical user interface, an integrated scripting language, and a wide variety of econometric tests and estimators. It is also known to have high numerical accuracy and is capable of producing publication quality output by integrating with gnuplot (http://www.gnuplot.info), a portable data and function plotting utility. See Mixon and Smith (2006) and Yalta and Yalta (2007) for additional information regarding GRETL and Racine (2006) for a recent review of gnuplot.

In addition to GRETL, GNU-R (http://www.r-project.org) provides a powerful programming language and a statistical environment similar to the S language. R is very extensible, with over 1000 add-on packages obtainable from CRAN, the “Comprehensive R Archive Network”. It is also highly regarded for the quality of its graphical output. Two separate reviews of this program from the perspective of econometricians are provided by Cribari-Neto and Zarkos (1999) and Racine and Hyndman (2002).

Also, for time-series analysis, an interesting cross-breed case is JMulTi; see Lütkepohl and Krätzig (2004) and the project website http://www.jmulti.com. Despite having Gauss, a proprietary product, as its main computational engine, JMulTi itself is released under a GNU license. It is an excellent time-series analysis package, especially for multiple time series methods (VARMAs and cointegrated systems).

GNU/Linux also offers several good quality FLOSS alternatives for numerical computation. GNU Octave (http://www.gnu.org/software/octave) is a high-level, matrix based language mostly compatible with the commercial Matlab program. Alternatively, Scilab (http://www.scilab.org) is also a numerical computational package featuring a large variety of built-in mathematical functions and rich data structures. However, while


Scilab is a gratis and open source program, certain restrictions in its license disqualify it as Free Software. For further information on GNU Octave and Scilab, see Eddelbuettel (2000) and Mrkaic (2001), respectively.

Econometric analysis frequently requires manipulation of symbolic expressions. While GNU Octave and Scilab provide some of this functionality, dedicated alternatives are also available. Maxima (http://maxima.sourceforge.net) is a full-featured computer algebra system capable of symbolic differentiation and integration, handling Taylor series, Laplace transforms, ordinary differential equations and systems of linear equations. It offers arbitrary precision arithmetic and supports number sizes limited by machine memory. There is also an impressive utility named Qalculate! (http://qalculate.sourceforge.net). Despite being designed as a desktop calculator, Qalculate! offers symbolic calculation and arbitrary precision, and supports constants, vectors, matrices, and complex numbers.

For creating technical documents, most GNU/Linux distributions support the teTeX package (http://www.tug.org/tetex), which provides a TeX and LaTeX installation that consists only of free/libre software, although most distributions are now phasing out teTeX in favor of the more modern alternative TeX Live (http://www.tug.org/texlive). A variety of editors are also available, such as LyX (http://www.lyx.org), emacs/AUCTeX (http://www.gnu.org/software/auctex), Kile (http://kile.sourceforge.net), and Texmaker (http://www.xm1math.net/texmaker), to name a few. See Koning (2001) for an overview of different LaTeX editors and installations.

Almost all GNU/Linux distributions come with a preinstalled copy of OpenOffice.org (http://www.openoffice.org), a multilingual and open source office suite that provides all the features expected from a collection of productivity programs. OpenOffice.org is capable of reading and writing Microsoft Office documents and has the additional feature of saving documents as .pdf files as well as in LaTeX format. There is also an excellent spreadsheet program named Gnumeric (http://www.gnome.org/projects/gnumeric), which offers better numerical accuracy than the leading commercial alternative, as discussed by McCullough (2005).

Two other programs worth mentioning are openMosix (http://openmosix.sourceforge.net) and Xen (http://www.cl.cam.ac.uk/Research/SRG/netos/xen). openMosix is a cluster management system for parallel computing. It is used to turn a network of ordinary computers into a supercomputer, which can be put to work for computationally intensive projects such as simulations and bootstraps.[8] Xen, on the other hand, is a virtual machine monitor. Using Xen, it is possible for the user to create a virtual computer running another operating system inside a window in GNU/Linux.

Finally, as wide as the spectrum of available free software is, the possibility that an economist might need to run proprietary software on a GNU/Linux system is worth considering. A few econometric packages are not available under GNU/Linux as yet, but the majority of the most widely used ones are. GNU/Linux versions exist of Gauss, Mathematica, Matlab, RATS, Stata, and TSP, to name but a few.[9] The same goes for

[8] For further information on parallel computing clusters, see Creel (2007), which shows how to reduce the computation of a Monte Carlo study that involves 4,000,000 nonlinear optimizations to several hours using the ParallelKnoppix GNU/Linux distribution.

[9] In fact, Matlab and Stata for GNU/Linux seem to handle large datasets better than their Windows counterparts.


popular compilers and numerical libraries, such as the Intel FORTRAN compiler and the NAG and IMSL libraries. In addition, the Wine project (http://www.winehq.org) should be mentioned: its aim is to create a software compatibility layer which will enable users to run unmodified Windows software with little or no performance penalty. The current version, which is still considered “beta”, can already handle several Windows programs satisfactorily. For example, some versions of EViews can easily be installed on a GNU/Linux machine using Wine.

5 Conclusion

The GNU/Linux platform provides a versatile, stable, and secure computing environment ideal for conducting econometric research. It also hosts a large collection of free/libre and open source programs that surpass their commercial alternatives in terms of both features and quality. Moreover, GNU/Linux offers significant advantages such as freedom and zero cost availability, which make it an excellent operating system in an academic setting.

Changing operating systems is not as difficult as it sounds, especially when the transition is done gradually. The various live CD/DVD distributions provide an effortless way to become familiar with the new platform. Afterwards, a dual installation can be considered, where the new user can easily switch back to the previous system for performing certain tasks he or she is not yet comfortable carrying out under GNU/Linux. A complete migration takes time. However, the stimulation of learning new things and the final feeling of liberation will make the effort worthwhile.

Acknowledgements

We wish to thank James MacKinnon for his suggestions and corrections on earlier drafts. For his comments, we also wish to thank Richard Stallman, who does not endorse the use of any non-free software mentioned in this paper.

References

Creel M. 2007. I ran four million probits last night: HPC clustering with ParallelKnoppix. Journal of Applied Econometrics, 22: 215–223.

Cribari-Neto F, Zarkos SG. 1999. R: Yet another econometric programming environment. Journal of Applied Econometrics, 14: 319–329.

de Laat P. 2005. Copyright or copyleft? An analysis of property regimes for software development. Research Policy, 34: 1511–1532.

Eddelbuettel D. 2000. Econometrics with Octave. Journal of Applied Econometrics, 15: 531–542.


Koning RH. 2001. A comparison of different LaTeX programs. Journal of Applied Econometrics, 16: 81–92.

Lütkepohl H, Krätzig M (eds.). 2004. Applied Time Series Econometrics. Cambridge University Press, Cambridge.

MacKinnon JG. 1999. The Linux operating system: Debian GNU/Linux. Journal of Applied Econometrics, 14: 443–452.

McCullough BD. 2005. Fixing statistical errors in spreadsheet software: the cases of Gnumeric and … [Online; retrieved September 14, 2006].

Mixon JW, Smith RJ. 2006. Teaching undergraduate econometrics with GRETL. Journal of Applied Econometrics, 21: 1103–1107.

Mrkaic M. 2001. Scilab as an econometric programming system. Journal of Applied Econometrics, 16: 553–559.

Peeling N, Satchell J. 2001. Analysis of the Impact of Open Source Software. URL http://www.govtalk.gov.uk/documents/QinetiQ_OSS_rep.pdf, [Online; retrieved September 14, 2006].

Racine J. 2006. gnuplot 4.0: a portable interactive plotting utility. Journal of Applied Econometrics, 21: 133–141.

Racine J, Hyndman R. 2002. Using R to teach econometrics. Journal of Applied Econometrics, 17: 175–189.

Stallman R. 1985. The GNU manifesto. Dr. Dobb’s Journal of Software Tools, 10: 30–35.

Yalta AT, Yalta AY. 2007. GRETL 1.6.0 and its numerical accuracy. Journal of Applied Econometrics, 22: 849–854.
