
Introduction to Bioinformatics

BI 109: Basics in Bioinformatics

Unit I: Biology in the Computer Age, Challenges in biology, intersection of classical biology,
mathematics and computer science and the scope of bioinformatics. Bioinformatics and the
internet.

Unit II: Computational approaches to biological questions: The Nucleic acid world: the structure
of DNA and RNA, Molecular biology’s central dogma, gene structure and control, mathematical
modelling of biochemical systems.

Unit III: Basic Bioinformatics computer skills: Basics of computers, working on a Unix system,
setting up a Linux workstation, different flavours of Unix, File system basics, commands for
working with directories and files, Sharing software among multiple users, commercial software
packages for biological applications, working in a multiuser environment, Unix shell scripts,
communicating with other computers.

Unit IV: Biological research on the web: Using search engines, finding scientific articles, the public
biological databases, searching biological databases, depositing data into the public databases,
finding software, judging the quality of information.

Unit V: Visualizing protein structures and computing structural properties: Protein structure
data, the chemistry of proteins, web-based protein structure tools, structure visualization,
structure analysis and optimization.
Recommended Textbooks and References:
1. Lesk, A. M. (2002). Introduction to Bioinformatics. Oxford: Oxford University Press.
2. Mount, D. W. (2001). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
3. Baxevanis, A. D., & Ouellette, B. F. (2001). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. New York: Wiley-Interscience.
4. Pevsner, J. (2015). Bioinformatics and Functional Genomics. Hoboken, NJ: Wiley-Blackwell.
5. Bourne, P. E., & Gu, J. (2009). Structural Bioinformatics. Hoboken, NJ: Wiley-Liss.
6. Lesk, A. M. (2004). Introduction to Protein Science: Architecture, Function, and Genomics. Oxford: Oxford University Press.
• Bioinformatics is the science of using
information to understand biology;
• it is a tool that can help us answer many kinds of
biological questions.
• Strictly speaking, bioinformatics is a subset of
the larger field of computational biology, the
application of quantitative analytical techniques
to modeling biological systems.
• The field of bioinformatics relies heavily on work by
experts in statistical methods and pattern recognition.
• Researchers come to bioinformatics from many fields,
including mathematics, computer science, and linguistics.
• Unfortunately, biology is a science of the specific as well
as the general.
• Bioinformatics is full of pitfalls for those who look for
patterns and make predictions without a complete
understanding of where biological data comes from and
what it means.
• By providing algorithms, databases, user
interfaces, and statistical tools, bioinformatics
makes it possible to do exciting things such as
compare DNA sequences and generate results
that are potentially significant.
• But once you gain that understanding and
become an intelligent consumer of
bioinformatics methods, the speed at which
your research progresses can be truly amazing.
Why Should Biologists Use Computers?
• Computers are powerful devices for
understanding any system that can be described
in a mathematical way.
• As our understanding of biological processes
has grown and deepened, more of them can be
described mathematically. It isn't surprising,
then, that the disciplines of computational
biology and, more recently, bioinformatics,
have evolved from the intersection of classical
biology, mathematics, and computer science.
How Is Computing Changing Biology?
• An organism's hereditary and functional information is stored as DNA, RNA,
and proteins, all of which are linear chains composed of smaller molecules.
• These macromolecules are assembled from a fixed alphabet of
well-understood chemicals:
• DNA is made up of four deoxyribonucleotides, whose bases are adenine (A),
thymine (T), cytosine (C), and guanine (G);
• RNA is made up of four ribonucleotides, whose bases are adenine (A),
uracil (U), cytosine (C), and guanine (G);
• and proteins are made from the 20 standard amino acids.
• Because these macromolecules are linear chains of defined components, they
can be represented as sequences of symbols.
• These sequences can then be compared to find similarities that suggest the
molecules are related by form or function.
• Sequence comparison is possibly the most useful
computational tool to emerge for molecular biologists.
• The World Wide Web has made it possible for a single
public database of genome sequence data to provide
services through a uniform interface to a worldwide
community of users.
• With a commonly used computer program called
BLAST, a molecular biologist can compare an
uncharacterized DNA sequence to the entire publicly
held collection of DNA sequences.
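Because DNA, RNA, and protein can all be written as strings over a small alphabet, a few lines of Python are enough to check a sequence and compare it with others. The sketch below illustrates two of the ideas above under stated assumptions: percent identity, assuming the two sequences are already aligned and of equal length (real tools also handle gaps), and the short exact "word" matches that BLAST uses as seeds before extending them into alignments. It is a toy, not BLAST: the function names, the word length k=4, and the two-sequence dictionary standing in for a database are hypothetical, and real BLAST also extends and statistically scores its hits.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")

def is_valid_dna(sequence: str) -> bool:
    """Return True if every symbol belongs to the four-letter DNA alphabet."""
    return set(sequence.upper()) <= DNA_ALPHABET

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)

def build_word_index(database: dict, k: int) -> dict:
    """Map every length-k word to the (sequence name, position) pairs where it occurs."""
    index = {}
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], []).append((name, i))
    return index

def find_seeds(query: str, index: dict, k: int) -> list:
    """Return (query position, sequence name, database position) for each exact word match."""
    hits = []
    for q in range(len(query) - k + 1):
        for name, pos in index.get(query[q:q + k], []):
            hits.append((q, name, pos))
    return hits

print(is_valid_dna("ACGTTGCA"))                  # True
print(is_valid_dna("ACGUUGCA"))                  # False: U belongs to RNA
print(percent_identity("ACGTTGCA", "ACGTAGCA"))  # 87.5

# Hypothetical two-sequence "database" used only for illustration
database = {"seqA": "ACGTACGTGG", "seqB": "TTTTTTTTTT"}
index = build_word_index(database, k=4)
seeds = find_seeds("CGTACG", index, k=4)
print(seeds)  # [(0, 'seqA', 1), (1, 'seqA', 2), (2, 'seqA', 3)]
# Rank database sequences by how many seed hits they share with the query
print(Counter(name for _, name, _ in seeds).most_common())  # [('seqA', 3)]
```

Indexing the database once and then scanning the query is what makes word lookup fast: each query word is found with a single dictionary lookup instead of a scan of every database sequence.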
What Does Informatics Mean to Biologists?
• The science of informatics is concerned with the
representation, organization, manipulation,
distribution, maintenance, and use of information,
particularly in digital form.
• There is more than one interpretation of what
bioinformatics—the intersection of informatics and
biology—actually means, and it's quite possible to go
out and apply for a job doing bioinformatics and find
that its expectations are entirely different
from what you thought.
• The functional aspect of bioinformatics is the
representation, storage, and distribution of data.
• Intelligent design of data formats and databases,
creation of tools to query those databases, and
development of user interfaces that bring together
different tools to allow the user to ask complex
questions about the data are all aspects of the
development of bioinformatics infrastructure.
• Developing analytical tools to discover knowledge in data
is the second, and more scientific, aspect of
bioinformatics.
• There are many levels at which we use biological
information, whether we are comparing sequences to
develop a hypothesis about the function of a newly
discovered gene, breaking down known 3D protein
structures into bits to find patterns that can help predict
how the protein folds, or modeling how proteins and
metabolites in a cell work together to make the cell
function.
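One small piece of the infrastructure described above, the design of data formats, can be illustrated by reading a standard format. The sketch below parses FASTA, a widely used plain-text sequence format in which each record is a ">" header line followed by one or more sequence lines. FASTA is not named in these notes and serves only as an example; the function name and sample records are hypothetical, and a production parser (for example, Biopython's SeqIO module) handles many more edge cases.

```python
import io

def parse_fasta(handle) -> dict:
    """Read FASTA records into {identifier: sequence}.

    The identifier is the first word of the header line; sequence lines that
    appear before the first header are ignored.
    """
    records = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records

sample = io.StringIO(">seq1 example record\nACGTACGT\nGGCC\n>seq2\nTTGA\n")
print(parse_fasta(sample))  # {'seq1': 'ACGTACGTGGCC', 'seq2': 'TTGA'}
```

Joining the sequence lines of each record lets downstream tools treat every sequence as one continuous string, regardless of how the file was wrapped.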
• What skills should a bioinformatician have?

• Why should biologists use computers?

• How can I configure a PC to do bioinformatics research?

• Setting up your workstation: working on a Unix system
What Skills Should a Bioinformatician Have?
• There's a wide range of topics that are useful if
you're interested in pursuing bioinformatics,
and it's not possible to learn them all.
• You should have a fairly deep background in some aspect of
molecular biology. It can be biochemistry, molecular biology,
molecular biophysics, or even molecular modeling, but without a core
of knowledge of molecular biology you will, as one person told us,
"run into brick walls too often."
• You must understand the central dogma of molecular
biology. Understanding how and why DNA sequence is transcribed
into RNA and translated into protein is vital.
• You should have substantial experience with at least one or two major
molecular biology software packages, either for sequence analysis or
molecular modeling. The experience of learning one of these packages
makes it much easier to learn to use other software quickly.
• You should be comfortable working in a
command-line computing environment.
Working in Linux or Unix will provide this
experience.
• You should have experience with
programming in a computer language such as
C/C++, as well as in a scripting language such
as Perl or Python.
• Other useful topics include molecular evolution and systematics;
physical chemistry (kinetics, thermodynamics, and
statistical mechanics); statistics and
probabilistic methods; database design and
implementation; algorithm development; and
molecular biology laboratory methods, among
others.
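The central dogma described above, DNA transcribed into RNA and RNA translated into protein, also makes a natural first exercise in Python, one of the scripting languages mentioned above. The sketch below is a deliberately simplified model, assuming the input is the DNA coding (sense) strand, that translation starts at the first base, and that the standard genetic code applies; it ignores introns, promoters, and the search for a start codon. The function names and the example sequence are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order at each position,
# paired with their one-letter amino acids ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
mrna = transcribe(dna)
print(translate(mrna))  # MAIVMGR (translation stops at the UGA stop codon)
```

Building the 64-entry codon table from the enumeration order keeps the code short, but it relies on the amino-acid string being written in exactly the same U, C, A, G order.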
How Can I Configure a PC to Do Bioinformatics Research?
• Up to now you've probably gotten by using word-processing
software and other canned programs that run under user-friendly
operating systems such as Windows or macOS.
• In order to make the most of bioinformatics, you need to learn Unix,
the classic operating system of powerful computers known as servers
and workstations.
• Most scientific software is developed on Unix machines, and serious
researchers will want access to programs that can be run only under
Unix.
• Unix comes in a number of flavors, the two most popular being BSD
and SunOS.
• Recently, however, a third choice has entered the marketplace:
Linux, an open-source, Unix-like operating system.
What Is Computing?
• Computing is the use or operation of computers.
• The term "computing" is also synonymous with counting and calculating.
• Computing is any activity that uses computers to manage, process, and communicate information.
• It includes the development of both hardware and software.
What Is the Purpose of Computing?
• Computing is used for processing, structuring, and managing
various kinds of information; doing
scientific studies using computers; making
computer systems behave intelligently;
creating and using communications and
entertainment media; finding and
gathering information relevant to any
particular purpose; and so on.
• The von Neumann architecture, the fundamental architecture upon which
nearly all digital computers have been based, has a number of characteristics
that have had an immense impact on the most popular
programming languages. These characteristics include a single, centralized
control, housed in the central processing unit (CPU), and a separate storage area,
primary memory, which can contain both instructions and data. The
instructions are executed by the CPU, so they must be brought into the
CPU from primary memory. The CPU also houses the unit that performs
operations on operands, the arithmetic and logic unit (ALU), so data must
also be fetched from primary memory and brought into the CPU in order to be
acted upon. Primary memory has a built-in addressing mechanism, so that
the CPU can refer to the addresses of instructions and operands. Finally, the
CPU contains a register bank that acts as a kind of "scratch pad" where
intermediate results can be stored and read back more quickly than
from primary memory.
Von Neumann Architecture
• Von Neumann architecture was first published by John von
Neumann in 1945.

• His computer architecture design consists of a Control Unit,
Arithmetic and Logic Unit (ALU), Memory Unit, Registers and
Inputs/Outputs.

• Von Neumann architecture is based on the stored-program
computer concept, where program instructions and the data they operate on
are stored in the same memory. This design is still used in
most computers produced today.
Central Processing Unit (CPU)
• The Central Processing Unit (CPU) is the electronic
circuit responsible for executing the instructions of
a computer program.

• It is sometimes referred to as the microprocessor or processor.

• The CPU contains the ALU, CU and a variety of registers.
Registers
• Registers are high-speed storage areas in the
CPU. All data must be stored in a register
before it can be processed.
• MAR (Memory Address Register): holds the memory location of data that needs to be accessed.
• MDR (Memory Data Register): holds data that is being transferred to or from memory.
• AC (Accumulator): where intermediate arithmetic and logic results are stored.
• PC (Program Counter): contains the address of the next instruction to be executed.
• CIR (Current Instruction Register): contains the current instruction during processing.
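To see how these registers cooperate with a single memory that holds both instructions and data (the stored-program concept), the sketch below simulates a toy accumulator machine through its fetch-decode-execute cycle. The instruction set (LOAD, ADD, SUB, STORE, HALT), the encoding of each instruction as an (opcode, address) tuple, and the memory layout are invented for illustration and do not correspond to any real processor.

```python
def run(memory: list) -> list:
    """Run a toy von Neumann machine until HALT and return its (modified) memory.

    Instructions and data share one memory; each instruction is a hypothetical
    (opcode, address) tuple. The memory list is updated in place.
    """
    pc, ac = 0, 0  # Program Counter and Accumulator
    while True:
        # Fetch: the PC's address goes to the MAR, the word read goes to the MDR,
        # and from there into the CIR; the PC then points to the next instruction.
        mar = pc
        mdr = memory[mar]
        cir = mdr
        pc += 1
        # Decode
        opcode, address = cir
        # Execute
        if opcode == "HALT":
            return memory
        mar = address
        if opcode == "LOAD":
            mdr = memory[mar]
            ac = mdr
        elif opcode == "ADD":
            mdr = memory[mar]
            ac = ac + mdr
        elif opcode == "SUB":
            mdr = memory[mar]
            ac = ac - mdr
        elif opcode == "STORE":
            mdr = ac
            memory[mar] = mdr
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Addresses 0-3 hold the program; addresses 4-6 hold data.
memory = [
    ("LOAD", 4),   # AC <- memory[4]
    ("ADD", 5),    # AC <- AC + memory[5]
    ("STORE", 6),  # memory[6] <- AC
    ("HALT", 0),
    7, 35, 0,
]
print(run(memory)[6])  # 42
```

Because the program and its data live in the same list, nothing but the PC distinguishes an instruction from a number: if the PC were allowed to run past HALT, the machine would try to decode a data word as an instruction and fail.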
Arithmetic and Logic Unit (ALU)
• The ALU allows arithmetic (add, subtract, etc.)
and logic (AND, OR, NOT, etc.) operations to be
carried out.
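A minimal sketch of these operations on fixed-width binary values is shown below. The 8-bit width, the operation names, and the function name are assumptions chosen for illustration; every result is masked to 8 bits, so arithmetic wraps around as it would in a fixed-width register.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0b11111111

def alu(op: str, a: int, b: int = 0) -> int:
    """Apply one arithmetic or logic operation and wrap the result to 8 bits."""
    operations = {
        "ADD": a + b,
        "SUB": a - b,
        "AND": a & b,
        "OR": a | b,
        "NOT": ~a,  # unary: b is ignored
    }
    return operations[op] & MASK

print(format(alu("AND", 0b1100, 0b1010), "08b"))  # 00001000
print(format(alu("OR", 0b1100, 0b1010), "08b"))   # 00001110
print(format(alu("NOT", 0b00001111), "08b"))      # 11110000
print(alu("ADD", 200, 100))                       # 44 (300 wraps around modulo 256)
print(alu("SUB", 5, 6))                           # 255 (-1 wraps around)
```

The wrap-around in the last two lines is the kind of overflow that real hardware reports through status flags rather than silently ignoring.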
Control Unit (CU)
• The control unit controls the operation of the
computer’s ALU, memory and input/output devices,
telling them how to respond to the program
instructions it has just read and interpreted
from the memory unit.
• The control unit also provides the timing and
control signals required by other computer
components.
Buses
• Buses are the means by which data is
transmitted from one part of a computer to
another, connecting all major internal
components to the CPU and memory.
• A standard CPU system bus consists of a
control bus, a data bus and an address bus.
• Address bus: carries the addresses of data (but not the data) between the processor and memory.
• Data bus: carries data between the processor, the memory unit and the input/output devices.
• Control bus: carries control signals/commands from the CPU (and status signals from other devices) in order to control and coordinate all the activities within the computer.
Memory Unit
• The memory unit consists of RAM, sometimes referred to as
primary or main memory. Unlike a hard drive (secondary
memory), this memory is fast and directly accessible by
the CPU.
• RAM is split into locations. Each location consists of an
address and its contents (both in binary form).
• The address uniquely identifies every location in the
memory.
• Loading data from permanent memory (the hard drive) into the
faster, directly accessible temporary memory (RAM)
allows the CPU to operate much more quickly.
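The idea that each memory location has a unique binary address and binary contents can be illustrated with a tiny model of RAM. An n-bit address can identify 2^n distinct locations. The 4-bit addresses, 8-bit contents, and stored value below are arbitrary assumptions for illustration; real memories are vastly larger.

```python
ADDRESS_BITS = 4
WORD_BITS = 8

# A tiny RAM: 2**4 = 16 locations, each holding one 8-bit word (all zero to start)
ram = [0] * (2 ** ADDRESS_BITS)
ram[0b0101] = 0b00101010  # store the value 42 at address 5

# Print a few locations as (binary address, binary contents) pairs
for address, contents in enumerate(ram[4:7], start=4):
    print(format(address, f"0{ADDRESS_BITS}b"), format(contents, f"0{WORD_BITS}b"))
# 0100 00000000
# 0101 00101010
# 0110 00000000
```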
Setting Up Your Workstation: Working on a Unix System
• You are probably accustomed to working with
personal computers; you may be familiar with
windowed interfaces, word processors, and
even some data-analysis packages.
• But if you want to use computers as a serious
component in your research, you need to
work on computer systems that run under
Unix or related multiuser operating systems.
What Does an Operating System Do?
• The operating system breathes life into the
inert body of your computer.
• It handles the low-level processes that make
hardware work together and provides an
environment in which you can run and develop
programs.
• The most important function of the operating
system is that it gives you convenient access
to your files and programs.
Why Use Unix?
• So if the operating system is something you're not
supposed to notice, why worry about which one
you're using? Why use Unix?
• Unix is a powerful operating system for multiuser
computer systems. It has been in existence since
the late 1960s, and during that time has been used
primarily in industry and academia, where
networked systems and multiuser high-performance
computer systems are required.
• Unix is the operating system of the World
Wide Web; the software that powers the Web
was invented on Unix, and many if not most
web servers run on Unix or Unix-like systems.
• Because Unix has been used extensively in
universities, where much software for
scientific data analysis is developed, you will
find a lot of good-quality, interesting scientific
software written for Unix systems.
• Computational biology and bioinformatics
researchers are especially likely to have
developed software for Unix, since until the
mid-1990s, the only workstations able to
visualize protein structure data in real time
were Silicon Graphics and Sun Unix
workstations.
