
ITE312 - Basic Bioinformatics

Introduction to Bioinformatics

Five modules
– Introduction to Bioinformatics
– Sequencing Alignment and Dynamic
– Sequence Databases and Uses
– Evolutionary Trees and Phylogeny
– Special Topics in Bioinformatics

Data & Information
• Data is a representation of a fact, figure,
or idea.
• In computer science, data are numbers,
words, images, etc.
• Information is an ordered sequence of


Information Technology
• Information technology (IT) is "the acquisition, processing, storage and dissemination of vocal, pictorial, textual and numerical information by a microelectronics-based combination of computing and telecommunications" – Dennis Longley, Michael Shain (1985) Dictionary of Information Technology.

Informatics
• Informatics is the study of application of computer and statistical techniques to the management of information.
• Biology in the 21st century is being transformed from a purely lab-based science to an information science as well.

Bioinformatics
• Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. – National Center for Biotechnology Information (NCBI)

Bioinformatics
• Bioinformatics is the marriage of biology and information technology.
• Bioinformatics is the application of statistics and computer science to the field of molecular biology.

Bioinformatics
• The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1978 for the study of informatic processes in biotic systems at Utrecht University, Netherlands.

Bioinformatics
• Bioinformatics encompasses any computational tools and methods used to manage, analyze and manipulate large sets of biological data.
Three components:
– Creation of databases allowing the storage and management of large biological data sets.
– Development of algorithms and statistics to determine relationships among members of large data sets.
– Use of tools for the analysis and interpretation of various types of biological data, including DNA, RNA and protein sequences, protein structures, gene expression profiles and biochemical pathways.
– The actual process of analyzing and interpreting data is referred to as computational biology.
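The three components can be illustrated with a toy example. This is only a minimal sketch: the record identifiers, the sequences, and the function names (build_database, gc_content, gc_report) are made up for illustration, and GC content stands in for the far richer statistics used in practice.

```python
import sqlite3

# Hypothetical toy records: (identifier, molecule type, sequence).
RECORDS = [
    ("seq1", "DNA", "ATGGCGTACGCTTGA"),
    ("seq2", "DNA", "ATGAAATTTGGGTAA"),
    ("seq3", "RNA", "AUGGCGUACGCUUGA"),
]


def build_database(records):
    """Component 1: create a database that stores sequence records."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    conn.execute(
        "CREATE TABLE sequences (id TEXT PRIMARY KEY, kind TEXT, residues TEXT)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def gc_content(residues):
    """Component 2: a simple statistic, the fraction of G and C bases."""
    if not residues:
        return 0.0
    return sum(base in "GC" for base in residues.upper()) / len(residues)


def gc_report(conn, kind):
    """Component 3: a tool that retrieves records and summarizes them."""
    rows = conn.execute(
        "SELECT id, residues FROM sequences WHERE kind = ? ORDER BY id", (kind,)
    ).fetchall()
    return {seq_id: round(gc_content(residues), 3) for seq_id, residues in rows}


conn = build_database(RECORDS)
report = gc_report(conn, "DNA")
print(report)
```

Real sequence databases hold millions of records and use dedicated formats and indexes, but the division of labor is the same: store, compute, interpret.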

Computers in Bioinformatics
• Repeat same task millions of times
• Problem solving
Bioinformatics is more of a tool than a discipline.

Scope of Bioinformatics
• Bioinformatics derives knowledge from computer analysis of biological data.
• It is the technology that uses computers for storage, retrieval, manipulation and distribution of information through the analysis of sequence data of biological macromolecules like DNA, RNA and proteins.

Related Fields
• Computational Biology
• Genomics
• Proteomics
• Pharmacogenomics
• Pharmacogenetics
• Cheminformatics
• Structural genomics or structural bioinformatics
• Comparative genomics

Applications
• Molecular medicine
• Personalized medicine
• Preventive medicine
• Gene therapy
• Drug development
• Microbial genome applications
• Waste cleanup
• Climate change studies
• Alternative energy sources
• Biotechnology
• Antibiotic resistance
• Forensic analysis of microbes
• Bio-weapon creation
• Evolutionary studies
• Crop improvement
• Insect resistance
• Improve nutritional quality
• Development of drought-resistant varieties
• Veterinary science

Skills for Bioinformatics
• Molecular Biology – Central Dogma
• Operating System – Unix or Linux
• Sequence Analysis and Molecular modeling software packages – EMBOSS, GCG Wisconsin Package, RasMol, Swiss-PdbViewer
• Programming Language – C, C++, Java
• Markup Languages – HTML, XML
• Scripting Languages – Perl, Python, JavaScript
• DBMS – MySQL, Oracle

Introductory Genetics

Central Dogma of Molecular Biology

Why genetics is important

[Figure: Genetics, Environment, G×E interaction, Health]

[Figure: ISI Web of Science topic search for "genetic AND disease". Number of journal records per year, 1991 to 2005.]

How genes work

What is a gene?
• A gene is a stretch of DNA whose sequence determines the structure and function of a specific functional molecule (usually a protein)
[Figure: DNA (…GAATTCTAATCTCCCTCTC AACCCTACAGTCACCCATTT GGTATATTAAAGATGTGTTG TCTACTGTCTAGTATCC…) → mRNA (working copy) → Protein (specific function), shown beside a computer program analogy (sf(){document.focus()}…)]

Genes are located in the cell nucleus on chromosomes
[Figure: Karyotype]

[Figure: Down syndrome karyotype (trisomy 21)]

DNA (deoxyribonucleic acid) → mRNA → Protein


Transcription movie

Translation

Translation movie

Gene expression movie

Summary
• A gene is a length of DNA that contains instructions for making a specific protein
• Genes are arranged along 23 pairs of chromosomes in the cell nucleus
• Genes work by specifying the amino acid sequence of a protein

Summary
• Post-genomic genetics has enormous promise for tracking down the genes involved in common complex diseases
• Currently our ability to exploit this potential is limited by
– study size
– difficulty of correcting for confounding factors

Components of a Digital Computer System

Bioinformatics and Internet
• Biological information is stored on many different computers around the world.
• The easiest way to access this information is for the computers to be joined together in a network.
• A computer network is a collection of computers and devices interconnected by communications channels that facilitate communications among users and allows users to share resources.

WORLD INTERNET USAGE AND POPULATION STATISTICS
[Table: for each world region (Africa, Asia, Europe, Middle East, North America, Latin America/Caribbean, Oceania/Australia) and the world total: Population (2010 Est.), Internet Users Dec. 31, 2000, Internet Users Latest Data (June 30, 2010), Penetration (% Population), Growth 2000-2010. Source: http://www.internetworldstats.com]

Internet
• The internet is an international network of computers derived from an earlier system, ARPAnet, developed by the US military.
• The foundations of the Internet were formed when packet-switching networks came into operation in the 1960s.
• Transmitted data is broken up into small packets of data, sent to its destination, and reassembled at the other side.
• This means that a single signal can be routed to multiple users, and an interrupted packet may be re-sent without loss of transmission.
• Packets can be compressed for speed and encrypted for security.
• Internet Access – Hardware (network card and/or modem), Software, Permission for network access
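Packet switching can be sketched with a small simulation. This is a toy model under stated assumptions: the function names (split_into_packets, reassemble) are illustrative, the "network" is just a shuffled list, and real protocols add headers, checksums and acknowledgements that are left out here.

```python
import random


def split_into_packets(message: bytes, size: int):
    """Break a message into (sequence_number, payload) packets."""
    return [
        (seq, message[i:i + size])
        for seq, i in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"ATGGCGTACGCTTGAATGAAATTTGGGTAA"
packets = split_into_packets(message, size=8)

# Simulate the network: packets arrive out of order and one is lost.
rng = random.Random(0)
arrived = packets[:]
rng.shuffle(arrived)
lost = arrived.pop()

# The receiver spots the missing sequence number and has that packet re-sent.
received_numbers = {seq for seq, _ in arrived}
missing = [p for p in packets if p[0] not in received_numbers]
arrived.extend(missing)

restored = reassemble(arrived)
print(restored == message)
```

Because every packet carries its own sequence number, arrival order does not matter and only the lost packet has to be sent again.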

TCP/IP
• Information transfer over the internet is governed by a set of protocols (procedures for handling data packets) called TCP/IP.
• TCP is the Transmission Control Protocol, which determines how data is broken into packets and reassembled.
• IP is the Internet Protocol, which determines how the packets of information are addressed and routed over the network.
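The addressing side of IP can be illustrated with Python's standard ipaddress module. This is a deliberately simplified sketch: the addresses come from ranges reserved for documentation, the next_hop function is hypothetical, and real routers consult full routing tables rather than a single rule.

```python
import ipaddress

# Documentation-only address ranges; they stand in for real hosts.
local_network = ipaddress.ip_network("192.0.2.0/24")
gateway = ipaddress.ip_address("192.0.2.1")


def next_hop(destination: str):
    """Deliver directly on the local network, otherwise hand off to the gateway."""
    address = ipaddress.ip_address(destination)
    return address if address in local_network else gateway


print(next_hop("192.0.2.55"))    # destination is on the local network
print(next_hop("198.51.100.7"))  # destination is elsewhere, so use the gateway
```

The decision depends only on the destination address, which is why each packet can be routed independently of the others.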

FTP
• File Transfer Protocol (FTP) is a standard network protocol used to copy a file from one host to another over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server. FTP users may authenticate themselves using a clear-text sign-in protocol but can connect anonymously if the server is configured to allow it.

Telnet
• Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communications facility using a virtual terminal connection.
• Telnet is a user command and an underlying TCP/IP protocol for accessing remote computers. Through Telnet, an administrator or another user can access someone else's computer remotely.
• On the Web, HTTP and FTP protocols allow one to request specific files from remote computers, but not to actually be logged on as a user of that computer.
• With Telnet, one can log on as a regular user with whatever privileges may have been granted to the specific application and data on that computer.

Telnet
• A Telnet command request looks like this: telnet the.…whatis.…
The result of this request would be an invitation to log on with a userid and a prompt for a password. If accepted, one would be logged on like any user who used this computer every day.
• Telnet is most likely to be used by program developers and anyone who has a need to use specific applications or data located at a particular host.

WWW
• The World Wide Web is a way of exchanging information over the Internet using a program called a browser.
• The WWW was developed at CERN starting in 1989 and allows the display of information pages containing multimedia objects in a special format called hypertext.
– URL or Hyperlink

Gateway sites for Bioinformatics on WWW
• NCBI: http://www.ncbi.…nih.gov/
• EBI: …ebi.…