Introduction to
UNIX / Linux
What is an Operating System?
• The operating system (OS) is the program which starts up when you turn on
your computer and runs underneath all other programs - without it nothing
would happen at all.
• In simple terms, an operating system is a manager. It manages all the available
resources on a computer, from the CPU, to memory, to hard disk accesses.
UNIX History
• The UNIX operating system was born in the late 1960s.
• It began as a small project led by Ken Thompson of
Bell Labs, and has since grown into one of the most widely used
families of operating systems.
• In the time since UNIX was first developed, it has gone through
many different generations and even mutations.
– Some differ substantially from the original version, like Berkeley Software
Distribution (BSD) or Linux.
– Others still contain major portions that are based on the original source
code.
Flavors of UNIX
• These can be grouped into two categories:
➢ Open Source
➢ Proprietary
• Proprietary: (redistribution and modification prohibited or restricted; not
free)
– Solaris
– IRIX
– Mac OS X
– and some others...
Linux
The Linux operating system (OS) was first
coded by a Finnish computer programmer
called Linus Benedict Torvalds in 1991,
when he was just 21!
He had just got a new 386 PC, and he found the
existing DOS and UNIX systems either too expensive or
inadequate.
In those days, a UNIX-like tiny, free OS called Minix was
extensively used for academic purposes. Since its source
code was available, Linus decided to take Minix as a model.
What is Linux?
Linux + GNU Utilities = Free Unix
• Linux is an O/S core
written by Linus Torvalds
and others
AND
• a set of small programs
written by Richard
Stallman and others. They
are the GNU utilities.
http://www.gnu.org/
Flavors of Linux
Billions of Users Worldwide
Linux vs. Windows
Linux | Windows
Open source operating system | Proprietary software which needs to be purchased
Free of cost | Not free
Case sensitive | Case insensitive
More efficient; allows the user to use the computer's strengths better | Less efficient; does not allow changes to the source code and offers fewer options for developers
More secure, less prone to viruses | Prone to viruses
ext2 or ext3 file system | FAT32 or NTFS file system
Many Linux flavors available from different developers | Only versions are available, all from the same supplier, i.e. Microsoft
Linux for Bioinformatics
• Analyzing biological problems requires capable workstations, and Linux skills are a must
for handling large datasets
• Much data analysis is done by running commands, and many analysis tools are built for the
Linux command line rather than for Windows
• It is easy to handle data, write scripts, chain scripts together, and work on the command line
• Linux/UNIX has powerful text processing tools which are highly suited to working
with sequence data
• Database handling is easier and open source tools are available on Linux
• Several biological analysis pipelines use Linux
• R, Perl, and Python are all well supported on Linux
General Characteristics of LINUX as an Operating System (OS)
• A Linux distribution offers integrated software worth thousands of rupees at no cost
• The Linux operating system is reliable, stable, and very powerful
• Linux comes with a complete development environment, including compilers, toolkits,
and scripting languages
• Linux utilizes your memory, CPU, and other hardware to the fullest
• A wide variety of open source software is available for installation on Linux
• Supports multiple processors as standard
• Excellent for multitasking
• Linux comes with networking facilities, giving
more flexibility to share hardware
LINUX Interfaces
– Local installation on your own machine
– Dual-bootable PC which can be booted into either Linux or
Windows.
– Connect remotely to one of the UNIX servers (whether
from home or at the labs). You can log on to external
systems from the command line in text mode
– Use a Virtual Machine (VMware and a Linux image) or
Cygwin
https://www.cygwin.com/
https://www.vmware.com/in.html
Installing Linux on Windows
• Create a Virtual Machine using VMware
• Download VMware from
http://www.vmware.com/in/
• Install it; the disk should have about 20 GB of free space
• Download a Linux image (any flavor) from http://www.thoughtpolice.co.uk/vmware/
• Uncompress the Linux image
• Import it into VMware and start using it
Graphical User Interfaces (GUIs)
• When you log on locally, a graphical
environment will appear
• Start at the graphical login screen
• Enter your username and password
• Once you have entered your username and
password, you are presented with
a graphical environment that is as
convenient as current Windows or Mac GUIs
Programming Tools and Utilities Available under Linux
• Text Editors
– Xemacs
– Emacs
– Pico
– Vi
– Gedit, Kedit
• Compilers
– C compiler - gcc
– C++ compiler - g++
– Java compiler & Java Virtual Machine - javac & java
• Debuggers
– C / C++ debugger - gdb
• Interpreters
– Perl - perl, Python
• Miscellaneous
– Web Browsers – Mozilla, Firefox, and Chrome
– Instant Messengers
– Email – like Outlook
Linux for Bioinformatics
• Bioinformatics is the use of computers to analyze biological problems
• As biological data sets have grown larger and biological problems have
become more complex, the requirements for computing power have also
grown.
• Computers that can provide this power generally use the Unix operating
system - so you must learn Unix
• Linux/UNIX has powerful text processing tools which are highly suited to
working with sequence data
• While many bioinformatics tools have Web interfaces, many more are
available via the UNIX command line
A Few Free Bioinformatics Software Packages for Linux
• Linux operating system, MySQL database
• Perl/Python - programming language
• Blast and Fasta - similarity search
• Clustal - multiple alignment
• Phylip - phylogenetics
• Sequence analysis pipelines
• EMBOSS - a complete sequence analysis package created by the
EMBL
Linux directories
• /bin          System binaries, including the command shell
• /boot         Boot-up routines
• /dev          Device files for all your peripherals
• /etc          System configuration files
• /home         User directories
• /lib          Shared libraries and modules
• /lost+found   Lost-cluster files, recovered from a disk-check
• /mnt          Mounted file-systems
• /opt          Optional software
• /proc         Kernel-processes pseudo file-system
• /root         Administrator’s home directory
• /sbin         System administration binaries
• /usr          User-oriented software
• /var          Various other files: mail, spooling and logging
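The directory layout can be checked directly from a shell. The sketch below defines a helper function (the name report_dirs is made up for this example) that reports which of the listed top-level directories exist on the current machine. Not every system has all of them (for example, /lost+found only appears on some file systems), so the output will vary.

```shell
#!/bin/sh
# report_dirs is a hypothetical helper name used only in this example.
# It prints one line per directory from the slide, saying whether it exists here.
report_dirs() {
    for dir in /bin /boot /dev /etc /home /lib /lost+found /mnt /opt /proc /root /sbin /usr /var
    do
        if [ -d "$dir" ]; then
            echo "$dir exists"
        else
            echo "$dir not found on this system"
        fi
    done
}

report_dirs
```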
Why Linux in Bioinformatics ?
• One definition of bioinformatics is “the use of computers to analyze biological
problems.”
Some Advantages
• Linux/Unix is very stable - computers running
Linux/Unix rarely crash
• Linux/Unix is very efficient
– it gets maximum number-crunching power out of your
processor (and multiple processors)
– it can smoothly manage very large amounts of data
– it can give a new life to otherwise obsolete Macs and PCs
• Most new bioinformatics software is created
for Unix - it's easy for the programmers
A Few Free Bioinformatics Software Packages for Linux
• Linux operating system, MySQL database
• Perl - programming language
• Blast and Fasta - similarity search
• Clustal - multiple alignment
• Phylip - phylogenetics
• Phred/Phrap/Consed - sequence assembly
and SNP detection
• EMBOSS - a complete sequence analysis
package created by the EMBL
Linux basics..
• Linux/Unix is case sensitive, i.e. WHO is not the
same as who
• The Unix shell is a command program used to
communicate with the computer
• The shell interprets the commands that you enter
on the keyboard
• Shell commands can be used to automate
various programming tasks
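As an illustration of automating a task with shell commands, here is a minimal sketch that loops over a set of files and reports the number of lines in each. The file names and contents are invented for the example, and everything happens in a temporary directory.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three small example files (hypothetical names and contents).
printf 'ATGC\nGGCC\n' > sample1.txt
printf 'TTAA\n' > sample2.txt
printf 'CCGG\nAATT\nGCGC\n' > sample3.txt

# Automate a repetitive task: report the line count of every .txt file.
report=$(for f in *.txt; do
    lines=$(wc -l < "$f" | tr -d ' ')
    echo "$f has $lines line(s)"
done)
echo "$report"

# Clean up the temporary directory.
cd / && rm -rf "$workdir"
```

The same loop works unchanged for three files or three thousand, which is the main appeal of scripting over clicking through files one by one.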
Connecting to a Unix/Linux system
• Open a terminal:
[Screenshot of a terminal window, with labels: the “prompt”, the current directory (“path”), and the host]
Linux commands
• Usually short and cryptic like
– vi or rm
• Commands may also take options (modifiers) for
advanced behavior, like:
– “ls -l” and “cp -R” behave differently from plain “ls” and “cp”,
respectively
Wildcards
• You can substitute the * as a wildcard symbol for any
number of characters in any filename.
• If you type just * after a command, it stands for all files in
the current directory:
ls * will list all files
• You can mix the * with other characters to form a search
pattern:
ls a*.txt will list all files that start with “a”
and end in “.txt”
• The “?” wildcard stands for any single character:
cp file?.doc backup/ will copy file1.doc, file2.doc,
file3.doc, etc. into a destination directory (here a hypothetical backup/)
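The wildcard rules can be tried safely in a scratch directory. This is a small sketch with made-up file names: it shows that * matches any number of characters while ? matches exactly one, so file10.doc is not matched by file?.doc.

```shell
#!/bin/sh
# Scratch directory with a few empty example files (names are hypothetical).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch apple.txt avocado.txt banana.txt file1.doc file2.doc file10.doc notes.md
mkdir backup

# "*" matches any number of characters: every .txt file starting with "a".
star_match=$(echo a*.txt)
echo "a*.txt    -> $star_match"

# "?" matches exactly one character, so file10.doc is left out.
question_match=$(echo file?.doc)
echo "file?.doc -> $question_match"

# Wildcards work with any command, for example copying the matches.
cp file?.doc backup/
copied=$(ls backup)
echo "copied into backup/: $copied"

# Clean up.
cd / && rm -rf "$workdir"
```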
Help on command line
• man : Type man and the name of a command
to read the manual page for that command.
e.g. “man ls”
Some important commands in Linux
• ls: List the contents of the current directory. Try also ls -l
• cp: Copy a file from source to destination
• mv: Move a file from source to destination. If both are in the same directory, the file is
renamed
• vi: Edit a file. vi is one of the most powerful text editors
• chmod: Change file permissions
• mkdir, rmdir: Make/remove a directory
• cd: Change directory
• rm: Remove a file. With the -r option it can also remove a directory tree
• man ls: Get help for ls. Most commands have a manual page
Commands for handling files
• cat : Concatenate program. Can be used to concatenate multiple files together into a single file, or, much more
frequently, to send the contents of a file to the terminal for viewing.
• more : Scroll through a file page by page. Very useful when viewing large files. Works even with files that are
too big to be opened by a text editor.
• less : A version of more with more features.
• grep : Filter a file for lines matching a specified pattern. With the -v option it prints the lines that don't
match the specified pattern.
• head : Display the first 10 lines of a file. You can control how many lines to view.
• tail : Display the last 10 lines of a file. You can control how many lines to view.
• wc : Count words, lines and/or characters in one or more files.
• sort : Sort the lines in a file alphabetically or numerically.
• uniq : Remove adjacent duplicated lines in a file (so it is usually run after sort).
• cut : Extract selected sections (fields or character ranges) from each line of a file or files.
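These commands are most useful when combined. The sketch below applies several of them to a tiny FASTA-style file; the file name, sequence names, and species labels are invented for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny FASTA-style file (hypothetical content).
cat > seqs.fasta <<'EOF'
>seq1 human
ATGCGT
>seq2 mouse
GGCTAA
>seq3 human
TTGACC
EOF

# grep -c counts matching lines; header lines start with ">".
n_seqs=$(grep -c '^>' seqs.fasta)
echo "number of sequences: $n_seqs"

# grep -v inverts the match, leaving only sequence lines; wc -l counts them.
n_seq_lines=$(grep -v '^>' seqs.fasta | wc -l | tr -d ' ')
echo "sequence lines: $n_seq_lines"

# cut takes the second space-separated field (the species label),
# sort groups identical labels, uniq -c counts each group,
# and awk just tidies the output into "label count".
species_counts=$(grep '^>' seqs.fasta | cut -d ' ' -f 2 | sort | uniq -c | awk '{print $2, $1}')
echo "$species_counts"

# Clean up.
cd / && rm -rf "$workdir"
```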
Command: pwd
• To find your current path use “pwd”
Command: cd
• To change to a specific directory use “cd” followed by the
directory path
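A short sketch of moving around with cd and checking the location with pwd. The directory names project and data are made up for the example and are created inside a temporary directory.

```shell
#!/bin/sh
start_dir=$(pwd)                  # remember where we started
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"  # hypothetical directory names

cd "$workdir/project/data" || exit 1   # absolute path: works from anywhere
pwd

cd ..                             # ".." means the parent directory
here=$(basename "$(pwd)")
echo "after 'cd ..' we are in: $here"

cd data                           # relative path: resolved from the current directory
there=$(basename "$(pwd)")
echo "after 'cd data' we are in: $there"

# Go back to the starting directory and clean up.
cd "$start_dir" && rm -rf "$workdir"
```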
Command: ls
• To list the files in the current directory use “ls”
Command: ls
• ls has many options
– -l long list (displays lots of info)
– -t sort by modification time
– -S sort by size
– -h list file sizes in human readable format
– -r reverse the order
• “man ls” for more options
• Options can be combined: “ls -ltr”
Command: ls -ltr
• List files by time in reverse order
with long listing
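To see the sorting options in action, the following sketch creates two files with different sizes and modification times (set explicitly with touch -t) and lists them in several ways. The file names are invented for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two example files: small.txt is 1 byte, big.txt is 10 bytes.
printf 'x' > small.txt
printf 'xxxxxxxxxx' > big.txt

# Set modification times explicitly (format: CCYYMMDDhhmm).
touch -t 202001010000 small.txt   # 1 January 2020, the older file
touch -t 202201010000 big.txt     # 1 January 2022, the newer file

ls -l                    # long listing: permissions, owner, size, date, name
ls -lh                   # same, with human-readable sizes

by_size=$(ls -S)         # largest first
echo "by size: $(echo $by_size)"

oldest_first=$(ls -tr)   # -t sorts newest first, -r reverses it
echo "oldest first: $(echo $oldest_first)"

# Clean up.
cd / && rm -rf "$workdir"
```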
Wild card: *
• “*” can be used as a wildcard in
Unix/Linux. For example, “ls *” lists all files
Command: mkdir
• To create a new directory use
“mkdir”
Command: rmdir
• To remove a directory, use “rmdir” (the directory must be empty)
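A minimal sketch of mkdir and rmdir in a scratch directory, with made-up directory names. It also shows that rmdir refuses to remove a directory that still contains files.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir results                 # create a single directory
mkdir -p project/data/raw     # -p also creates missing parent directories
touch results/out.txt

# rmdir only removes empty directories, so this attempt is refused.
if rmdir results 2>/dev/null; then
    nonempty_result="removed"
else
    nonempty_result="refused"
fi
echo "rmdir on a non-empty directory: $nonempty_result"

# Empty the directory first, then rmdir succeeds.
rm results/out.txt
rmdir results
if [ -d results ]; then results_exists="yes"; else results_exists="no"; fi
if [ -d project/data/raw ]; then nested_exists="yes"; else nested_exists="no"; fi
echo "results still exists: $results_exists"
echo "project/data/raw exists: $nested_exists"

# Clean up.
cd / && rm -rf "$workdir"
```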
Displaying the contents of a file
• cat
• less
• head
• tail
Command: cat
• Dumps an entire file to standard output
• Good for displaying short, simple files
Command: head
• “head” displays the top part of a file
• By default it shows the first 10 lines
• The -n option allows you to change that
• “head -n 50 file.txt” displays the first 50 lines of
file.txt
Command: tail
• Same as head, but shows the last lines
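The behaviour of head and tail is easy to check on a generated file. This sketch builds a 100-line file (the name file.txt is just an example) and looks at both ends of it.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 100-line example file: "line 1" ... "line 100".
i=1
while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
done > file.txt

# With no options, head prints the first 10 lines.
default_count=$(head file.txt | wc -l | tr -d ' ')
echo "head prints $default_count lines by default"

# -n changes how many lines are shown, for head and tail alike.
first_three=$(head -n 3 file.txt)
last_two=$(tail -n 2 file.txt)
echo "$first_three"
echo "$last_two"

# Clean up.
cd / && rm -rf "$workdir"
```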
File Commands
• Copying a file: cp
• Move or rename a file: mv
• Remove a file: rm
Command: cp
• To copy a file use “cp”
Command: mv
• To move a file to a different location use “mv”
• mv can also be used to rename a file
Command: rm
• To remove a file use “rm”
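The three file commands can be tried together in a scratch directory. In this sketch all file and directory names are invented; note that rm deletes immediately, with no recycle bin to recover from.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "ATGC" > original.txt
mkdir archive

cp original.txt copy.txt      # copy: both files now exist
mv copy.txt archive/          # move: copy.txt now lives in archive/
mv original.txt renamed.txt   # rename: same directory, new name
rm renamed.txt                # remove: the file is gone for good

# List every regular file that is left.
remaining=$(find . -type f)
echo "files left: $remaining"

# Clean up.
cd / && rm -rf "$workdir"
```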
What we learnt in this lecture
• Introduction to Linux
• Basic Linux commands for file handling
What is Linux?
• Linux is a Unix clone written from scratch by Linus
Torvalds with assistance from a loosely-knit team
of hackers across the Net.
• Unix is a multitasking, multi-user computer
operating system originally developed in 1969 by
a group of AT&T employees at Bell Labs.
• Linux and Unix strive to be POSIX compliant.
• 64% of the world’s servers run some variant of
Unix or Linux. The Android phone and the Kindle
run Linux.
The Linux Philosophy
The *Nix Philosophy of Doug McIlroy
(i) Make each program do one thing well. To do a new job, build
afresh rather than complicate old programs by adding new
features.
(ii) Expect the output of every program to become the input to
another, as yet unknown, program. Don't clutter output with
extraneous information. Avoid stringently columnar or binary
input formats. Don't insist on interactive input.
(iii) Use tools in preference to unskilled help to lighten a
programming task, even if you have to detour to build the tools
and expect to throw some of them out after you've finished using
them.
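A classic illustration of this philosophy is a word-frequency count built entirely from small programs joined by pipes. The sketch below uses a made-up sentence; each stage does one job and hands plain text to the next.

```shell
#!/bin/sh
# Example input (hypothetical text).
text='the cat and the dog and the bird'

# tr      : put each word on its own line
# sort    : bring identical words together
# uniq -c : count each run of identical words
# sort -rn: order by count, largest first
# head    : keep the top two
# awk     : tidy the output into "word count"
top_words=$(printf '%s\n' "$text" \
    | tr ' ' '\n' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n 2 \
    | awk '{print $2, $1}')

echo "$top_words"
```

None of these programs knows anything about word counting; the task emerges from the way they are connected.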
What is Linux?
It’s an Operating System
What is Linux?
“Small programs that do one thing well”
(see unix-reference.pdf)
• Network: ssh, scp, ping, telnet, nslookup, wget
• Shells: BASH, TCSH, alias, watch, clear, history, chsh, echo, set,
setenv, xargs
• System Information: w, whoami, man, info, which, free, echo, date,
cal, df
• Command Information: man, info
• Symbols: |, >, >>, <, &, >&, 2>&1, ;, ~, ., .., $!, !:<n>, !<n>
• Filters: grep, egrep, more, less, head, tail
• Hotkeys: <ctrl><c>, <ctrl><d>
• File System: ls, mkdir, cd, pwd, mv, ln, touch, cat, file, find, diff, cmp,
/net/<hostname>/<path>, mount, du, df, chmod
• Line Editors: awk, sed
• File Editors: vim, gvim, emacs -nw, emacs
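Several of the symbols in the list control where a command's input and output go. The sketch below demonstrates >, >>, <, |, ; and 2>&1 in a scratch directory. The file names and the non-existent path are invented for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first" > log.txt       # ">" writes output to a file, replacing its contents
echo "second" >> log.txt     # ">>" appends to the file instead

# "<" feeds a file to a command as its standard input.
n_lines=$(wc -l < log.txt | tr -d ' ')
echo "log.txt has $n_lines lines"

# "|" sends the output of one command to the input of the next.
upper=$(cat log.txt | tr 'a-z' 'A-Z')
echo "$upper"

# "2>&1" sends error messages to the same place as normal output.
# The path below does not exist, so ls writes an error message into out.txt.
ls /nonexistent_path_for_demo > out.txt 2>&1 || true
err_bytes=$(wc -c < out.txt | tr -d ' ')
echo "captured $err_bytes bytes of error output"

# ";" runs commands one after another on a single line.
echo "one"; echo "two"

# Clean up.
cd / && rm -rf "$workdir"
```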
Linux vs. Windows
• The OS does not have to use a graphical interface.
– The OS itself (the kernel) is comparatively small.
– The GUI is just another application (or set of applications) that can be
installed and run on top of the existing text-based OS.
• File system differences.
– Windows typically uses the FAT32 or NTFS file systems.
– Linux typically uses the ext2 or ext3 file systems.
– Larger research and university environments, where file access is
necessary across the network, use the Network File System (NFS).
– Windows lists all drives separately (A:, C:, D:, etc.), with “My
Computer” at the highest level.
– UNIX starts its highest level at “/” and drives can be mounted
anywhere underneath it.
General Characteristics of LINUX as an Operating System (OS)
• A Linux distribution has software worth thousands of dollars, for
virtually no cost
• Linux operating system is reliable, stable, and very powerful
• Linux comes with a complete development environment, including
compilers, toolkits, and scripting languages
• Linux comes with networking facilities, allowing you to share
hardware
• Linux utilizes your memory, CPU, and other hardware to the fullest
• A wide variety of commercial software is also available
• Linux is very easily upgradeable
• Supports multiple processors as standard
• True multitasking. So many apps, all at once
• The GUIs are more powerful than Mac!
Linux OS
GNU is a Unix-like operating system that is free software—it respects your freedom.
Linux-based versions of GNU can be installed as free software.
LINUX Interfaces
Ways of connecting to computers
– Dual-bootable PC that you have booted into Linux and logged
on to. All of your commands are then being run locally on that
computer. When you log on in this manner you have a full GUI
environment.
– Connect remotely to one of the UNIX servers (whether from
home or at the labs). When you log on in this manner you have a
command line (or text based) environment. You can also open
up a command line on local lab machines.
– Use a Virtual Machine (VMware and a Linux image)
– Use Cygwin
Graphical User Interfaces (GUIs)
• On logon locally, you are presented with graphical
environment.
• You start at a graphical login screen.
• Enter your username and password. You also have
the option to choose between a couple of session types. On many
Linux systems, the main choice is between Gnome and KDE.
• Once you enter in your username and password, you are
then presented with a graphical environment that looks
like one of the following.
What is a Desktop Manager?
• Gnome and KDE are examples of desktop managers. Both of these look a
lot like Microsoft Windows.
– They have the equivalent of a Start Menu, have an equivalent of
Windows Explorer, and have some sort of control panel.
• Desktop Manager provides you with the ability to manage all of the details
of your system that would otherwise require you to type in a bunch of
commands in a terminal window.
– These details include managing your files, launching programs,
configuring various aspects of your system, etc.
• It is also worth noting that the desktop manager is optional. Many older
systems did not have a desktop manager that sat in between the X server
and the window manager.
Some Notes on X window, Desktop Managers &
Window Managers
• Most UNIX systems can be installed without the GUI.
• The GUI is just another application that runs on top
of the operating system.
• There are many implementations of all three of these
components.
– It is possible to mix and match implementations and
versions of these.
– They need not be alike and need not be all by the same
organization.
• This is quite a shift in paradigm from Microsoft and
Apple.
Programming Tools and Utilities Available under Linux
• Text Editors
– Xemacs
– Emacs
– Pico
– Vi
– Gedit, Kedit
• Compilers
– C compiler - gcc
– C++ compiler - g++
– Java compiler & Java Virtual Machine - javac & java
• Debuggers
– C / C++ debugger - gdb
• Interpreters
– Perl - perl
– Tcl/Tk - tcl & wish
• Miscellaneous
– Web Browsers – Mozilla, Firefox, and Chrome
– Instant Messengers
– Email – like Outlook
Networking
• telnet
– Log into a remote host machine.
• rlogin
– Almost the same as telnet, but uses a different protocol.
• ping
– See if a remote host is up.
• ftp
– Transfer files using the File Transfer Protocol.
• netscape
– Run the Netscape web browser.
• trn
– Read Internet News.
• pine
– Read your mail using a full-screen display.
• mail
– Read your mail using an ancient command-line program.
• who
– See who else is logged in.
• talk
– Talk to someone else who is currently logged in.
• lp
– Send a file or set of files to a printer.
Remote login on Linux Server
• Install putty.exe from
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• IP Address to connect: 122.168.101.3
• User name: bioinfo
• Password:
Remote login on Linux Server
• ssh bio402@ipaddress
• Accept the Key
• Provide password
What exactly is a “shell”?
• After logging in, Linux/Unix starts another program
called the shell
• The shell interprets commands the user types and
manages their execution
• The shell communicates with the internal part of the operating
system called the kernel
• The most popular shells are: tcsh, csh, ksh (the Korn shell), and bash
• The differences between them are usually subtle
• For this tutorial, we are using bash
• Shell commands are CASE SENSITIVE!
Help!
• Whenever you need help with a command
type “man” and the command name
Unix/Linux File System
NOTE: Unix file names
are CASE SENSITIVE!
[Figure: directory tree illustrating “The Path”, with example paths /home/mary/ and /home/john/portfolio/]
Perl for Bioinformatics
What is PERL
Practical Extraction and Report Language
• Perl is an interpreted language optimized for scanning
arbitrary text files, extracting information from those text
files, and printing reports based on that information.
• The language is intended to be practical (easy to use,
efficient, complete) rather than beautiful (tiny, elegant,
minimal).
• It combines some of the best features of C, sed, awk, and sh.
Why PERL
• Easy to write
• Variables do not require type declarations
• Good at inter-converting incompatible data formats
• Powerful regular expressions and string manipulation operators
• Supports object-oriented programming
• Can use pipes, system calls, or sockets
• Good language for web CGI scripting
• Availability of Perl modules for bioinformatics and the Internet