UNIX / LINUX
ASSIGNMENT SUBMISSION
BT-124
SUBMITTED BY: DK CHETHAN
24-B-BT-09
1. Introduction
In bioinformatics, we deal with huge amounts of biological data such as DNA, RNA, and protein sequences, so
the choice of operating system is very important. Windows or macOS alone is often not enough for advanced
research, because most large-scale pipelines and compute clusters are built around UNIX/Linux.
Here, UNIX and Linux come as saviours. UNIX is the ancestor system, first developed at the end of the 1960s,
and Linux is a modern, open-source UNIX-like system whose kernel was first released in 1991. Today, Linux has
become the backbone of bioinformatics research. Most widely used bioinformatics tools (BLAST for sequence
search, ClustalW for alignments, BWA for genome mapping, AutoDock for protein docking) are designed to run
on UNIX/Linux platforms.
Therefore, for any bioinformatics student or researcher, learning UNIX/Linux is as important as learning the
biological concepts themselves.
2. History of UNIX and Linux
UNIX was developed in 1969 at AT&T Bell Labs by Ken Thompson and Dennis Ritchie (Ritchie also created the
C programming language).
In 1971, the first official UNIX system was released. Later in 1973, it was re-written in C, which made it
portable and flexible.
By the late 1970s and 1980s, different UNIX versions like BSD and System V came out and were used in
universities and industries.
Linux started in 1991 when a Finnish student, Linus Torvalds, developed a small UNIX-like kernel. He
shared it freely, and from 1992 it was licensed under the GNU GPL, which made it open-source.
Soon, the Linux kernel was packaged into full operating systems (distributions) like Slackware, Debian, and
later Ubuntu.
Today, Linux powers supercomputers, Android phones, servers, and almost all bioinformatics clusters.
So we can say that Linux is like the “people’s version of UNIX” – free, powerful, and made by the community.
3. Key Differences Between UNIX and Linux
Feature               UNIX                       Linux
Source Code           Mostly closed              Completely open-source
Cost                  Paid, expensive            Free of cost
License               Proprietary                GNU GPL
Users                 Enterprises, industries    Students, researchers, enterprises
Flexibility           Less flexible              Highly customizable
Bioinformatics Use    Limited                    Widely used, supports most tools
4. Structure of Linux Operating System
Following standard OS concepts, Linux has the following parts:
1. Kernel – The heart of the OS. Manages CPU, memory, processes, and hardware devices.
2. Shell – The interpreter where we type commands (bash, zsh, csh). It connects the user to the kernel.
3. File System – Organises data in directories. In Linux, everything is treated as a file, even devices.
4. System Libraries – Provide functions to programs (glibc, pthread, math libraries).
5. Utilities – Commands like ls, cp, mv, ps which we use daily.
6. User Interface – Mainly CLI (command line), but also GUIs like GNOME and KDE.
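As a small illustration of the parts listed above, the following shell sketch asks a running system about its kernel, shell, file system, and utilities. It is only a sketch, assuming a POSIX-style shell on Linux or another UNIX-like machine; the variable names are chosen here for illustration.

```shell
# Kernel: name and release of the running kernel.
kernel_name=$(uname -s)
kernel_release=$(uname -r)
echo "Kernel: $kernel_name $kernel_release"

# Shell: the login shell is usually stored in $SHELL (it may be unset in some environments).
echo "Shell: ${SHELL:-unknown}"

# File system: devices are exposed as files; /dev/null is a character device.
if [ -c /dev/null ]; then
    dev_is_file=yes
else
    dev_is_file=no
fi
echo "/dev/null is a character device file: $dev_is_file"

# Utilities: everyday commands such as ls are ordinary programs found on the PATH.
ls_path=$(command -v ls)
echo "ls is provided by: $ls_path"
```

The exact kernel release and paths printed will differ from one machine to another.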
5. Features of UNIX/Linux
1. Open Source – Free to use, modify, and share.
2. Multiuser System – Multiple people can work on the same machine simultaneously.
3. Multitasking – Run many processes at the same time.
4. Security – Strong file permissions, firewall, and regular updates.
5. Stability – Rarely crashes, very reliable for long-running jobs.
6. Portability – Runs on different hardware, from laptops to supercomputers.
7. Networking – Built-in tools for data transfer and server management.
8. File System – Everything is treated as a file, even devices.
9. Programming Support – Best environment for running Python, R, Perl, Java, C, etc.
10. Distributions – Many flavours available (Ubuntu, Debian, Fedora, CentOS, Bio-Linux).
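The file-permission part of the security feature can be tried out with a short shell sketch. It assumes a POSIX-style shell and a temporary directory that allows execution; the script name demo.sh is hypothetical and is created only for this example.

```shell
# Create a throwaway directory so that no existing file is touched.
demo_dir=$(mktemp -d)
script="$demo_dir/demo.sh"

# Write a tiny script (hypothetical example).
printf '#!/bin/sh\necho "hello from demo"\n' > "$script"

# 755 = owner can read/write/execute; group and others can read/execute.
chmod 755 "$script"

# The first 10 characters of "ls -l" show the file type and the permission bits.
perms=$(ls -l "$script" | cut -c1-10)
echo "Permissions: $perms"

# Because the execute bit is set, the script can be run directly.
output=$("$script")
echo "Script output: $output"

# Remove the throwaway directory.
rm -r "$demo_dir"
```

Changing 755 to 700 would keep the script usable by its owner while hiding it from other users on a shared multiuser machine.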
6. Installing Linux for Bioinformatics
There are many ways to install Linux for our work:
1. Dual Boot – Install Linux alongside Windows on your laptop/PC.
2. Virtual Machine (VM) – Use software like VirtualBox to run Linux inside Windows.
3. WSL (Windows Subsystem for Linux) – In Windows 10/11, you can run a Linux terminal directly.
4. Cloud Servers – Many bioinformatics labs use Linux servers on AWS, Google Cloud, or institutional HPC
clusters.
5. Bio-Linux – A special Linux distribution preloaded with bioinformatics software like BLAST, ClustalW,
EMBOSS, and Bowtie.
Steps for basic Ubuntu installation:
Download ISO from ubuntu.com
Make a bootable USB
Boot from USB and install Ubuntu
After installation, update system (sudo apt update && sudo apt upgrade)
7. UNIX/Linux Commands in Bioinformatics
Linux is famous for its command-line power. With just a few commands, we can handle huge FASTA/FASTQ or
genomic files.
Some examples:
ls → lists files (e.g., check all sequence files in a folder).
cat file.fa → shows the contents of a file (view FASTA sequences).
grep "ATGC" file.fa → search for a DNA motif in a sequence file.
wc -l file.fastq → count lines; divide by 4 to get the number of reads, since each FASTQ read normally takes four lines.
head -n 20 file.fa → show the first 20 lines of a sequence file.
awk '{print $1}' blast.out → extract the first column from BLAST output.
sed 's/A/T/g' seq.fa → replace every A with T in a sequence file (this also changes header lines).
chmod 755 script.sh → give permission to run a script.
tar -czvf data.tar.gz data/ → archive and compress a dataset folder.
top → monitor system usage while running BLAST.
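To see a few of these commands working together, here is a minimal sketch on toy data. The files toy.fa, toy.fastq, and toy_blast.out and their contents are made up for illustration (they are not real sequencing or BLAST results), and the sketch assumes a POSIX-style shell with the standard grep, wc, awk, and sed utilities.

```shell
# Work in a throwaway directory so that nothing existing is overwritten.
workdir=$(mktemp -d)

# Hypothetical toy FASTA file: two records, one of which contains the motif ATGC.
printf '>seq1 toy record\nATGCATGCAA\n>seq2 toy record\nGGGTTTCCCA\n' > "$workdir/toy.fa"

# Hypothetical toy FASTQ file: two reads, four lines per read.
printf '@read1\nATGC\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$workdir/toy.fastq"

# Hypothetical tab-separated table standing in for BLAST tabular output.
printf 'q1\ts1\t99.0\nq2\ts2\t85.5\n' > "$workdir/toy_blast.out"

# Count FASTA records by counting header lines that start with ">".
n_seqs=$(grep -c '^>' "$workdir/toy.fa")

# Count sequence lines (headers excluded) that contain the motif.
n_motif=$(grep -v '^>' "$workdir/toy.fa" | grep -c 'ATGC')

# Count reads: total number of lines divided by 4.
n_lines=$(wc -l < "$workdir/toy.fastq")
n_reads=$((n_lines / 4))

# Extract the first column (query ID) of the first row of the table.
first_query=$(awk 'NR == 1 {print $1}' "$workdir/toy_blast.out")

# Replace A with T on sequence lines only, then print line 2 (the first sequence).
converted=$(sed '/^>/!s/A/T/g' "$workdir/toy.fa" | sed -n '2p')

echo "FASTA records: $n_seqs"
echo "Sequence lines with ATGC: $n_motif"
echo "FASTQ reads: $n_reads"
echo "First query ID: $first_query"
echo "First sequence after A->T: $converted"

# Remove the throwaway directory.
rm -r "$workdir"
```

The sed address /^>/! restricts the substitution to lines that do not start with ">", so FASTA headers stay unchanged.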
8. Applications of UNIX/Linux in Bioinformatics
1. Sequence Analysis → BLAST, ClustalW, MUSCLE.
2. Genomics → Read mapping (BWA, Bowtie), SNP calling (SAMtools, GATK).
3. Transcriptomics → RNA-seq analysis using STAR, DESeq.
4. Proteomics and Structural Biology → Molecular dynamics and docking (GROMACS, AutoDock).
5. Big Data Pipelines → Snakemake, Nextflow for automation.
6. HPC & Cloud → Running jobs on Linux clusters with SLURM scheduler.
9. Advantages for Indian Bioinformatics Students
Free of Cost – Linux is free, so no software expense, perfect for students.
Works Everywhere – From personal laptops to big supercomputers.
Huge Community – Help is easy to find on forums, and Indian labs also prefer Linux clusters.
Reproducibility – Easy to share code and pipelines with labmates.
Integration with Programming – Runs Python, R, Perl easily for data analysis.
10. Conclusion
UNIX and Linux are not just operating systems; they are tools of discovery in bioinformatics. Whether we are
aligning DNA sequences, predicting protein structures, or analysing RNA-seq data, Linux provides an
efficient and reliable environment.
11. References
1. National Center for Biotechnology Information (NCBI). https://www.ncbi.nlm.nih.gov
2. UniProt Consortium. (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research,
51(D1), D523–D531.
3. Linux Documentation Project. https://tldp.org
4. GeeksforGeeks. Introduction to Linux Operating System. Available at: https://www.geeksforgeeks.org
5. Bio-Linux Project. Bioinformatics Linux Distribution. Available at: http://nebc.nerc.ac.uk/tools/bio-linux