Ppt 1
Intro to bio
Intelligence in Biological Systems - 3
19BIO201
L-T-P-C:1-2-0-3
Syllabus
• Assembling genomes using graph algorithms
• The string reconstruction problem
• String reconstruction as a walk in the overlap graph
• Gluing graphs
• de Bruijn graph
• The seven bridges of Königsberg - Euler’s theorem
• From Euler’s theorem to an algorithm for finding an
Eulerian cycle
• Assembling genomes from read pairs
• Python programming for bioinformatics
Course Objective
• To introduce the basic concepts of
bioinformatics using computational
methods
• To introduce programming for
bioinformatics
• To explore the challenges and the
potential of artificial intelligence in
bioinformatics
Course Outcome
• To understand the basics of genome
assembly
• To learn Python programming for
bioinformatics
• To explore potential challenges and
applications of computational
bioinformatics
Evaluation Pattern
• Assignments: 10 x 4.5= 45
• Quizzes: 5 x 5 = 25
• Project: 30
Gregor Mendel – Father of Genetics
• Law of Segregation
• Law of Independent
Assortment
https://www.youtube.com/watch?v=Mehz7tCxjSE
Genetics
• Each species has a blueprint of its life which is different
from that of other species
• The individuals in a species are similar yet show
differences
• The blueprint is inherited from one generation to
another
• Many traits are also influenced by the environment
And if the sequence has 4.6 x 10^7
Role of Computer Scientist
Important Milestones
• DNA first isolated (as "nuclein") 1869 – Johann
Friedrich Miescher
• Genes on chromosomes are the discrete units of
heredity 1911 – Thomas Hunt Morgan
• Genes make proteins 1941 – George Beadle and
Edward Tatum
• Cytosine complements Guanine and Adenine
complements Thymine 1950 – Erwin Chargaff
• Double helical structure of DNA 1952-1953 – James D.
Watson, Francis H. C. Crick & Rosalind Franklin
Genome Sequencing
• Isolation of the first restriction enzyme 1970 – Hamilton Smith
• Discovery of reverse transcriptase 1970 – Howard Temin and David Baltimore
https://www.nature.com/scitable/topicpage/dna-sequencing-technologies-key-to-the-human-828/
(extra reading material)
DNA
Nitrogenous Bases
Molecular basis of DNA Structure
• Polynucleotide chain – sugar phosphate backbone having
nitrogenous bases attached to it.
• Nucleotide has three elements – phosphate, pentose sugar,
nitrogenous base
• Pairing between nitrogenous bases is not chemical….
• Erwin Chargaff: “A pairs with T (2 H bonds) & C pairs with G (3 H
bonds)”
• Base composition (as % of total bases) differs among species but is
constant in all cells of an organism and within a species.
Human being - A 29.8, T 31.8, G 20.2, C 18.2, G+C 38.4
• C – G bond is stronger than A – T (the amount of heat required to
separate the DNA strands increases with increase in G + C)
• A + G = C + T, that is, purines = pyrimidines
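The base-composition rules above can be checked on a small example. The following is a minimal Python sketch, not part of the original slides: the sequence `strand` and the helper names `base_composition` and `gc_content` are made up for illustration. It builds the complementary strand and confirms that, over both strands together, A = T, G = C, and purines = pyrimidines.

```python
from collections import Counter

# Watson-Crick pairing: A<->T and C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_composition(seq):
    """Return each base (A, C, G, T) as a percentage of the total bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Return the G + C percentage of a sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Hypothetical single strand, chosen only for illustration
strand = "ATGCGGCTTAAT"
partner = strand.translate(COMPLEMENT)  # base-by-base complement
duplex = strand + partner               # all bases of the double strand

comp = base_composition(duplex)
print("Composition (%):", {b: round(v, 1) for b, v in comp.items()})
print("G + C (%):", round(gc_content(duplex), 1))
print("Purines (A + G):", round(comp["A"] + comp["G"], 1))
print("Pyrimidines (C + T):", round(comp["C"] + comp["T"], 1))
```

In a single strand A need not equal T. The equalities hold only when both strands of the duplex are counted.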
DNA …..
• Read from 5’ to 3’ direction. These labels are
indicative of free carbon on sugar phosphate
backbone ( 5’ has terminal phosphate group &
3’ has free OH)
• Length measured in base pair units (bp units)
• 3.2 billion base pair human genome
sequenced
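Because both strands are read 5’ to 3’ and run in opposite directions, the partner strand is written as the reverse complement of the given strand. The following is a minimal Python sketch, with a made-up example sequence and a helper name (`reverse_complement`) chosen only for illustration.

```python
# Watson-Crick pairing: A<->T and C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the partner strand, written in its own 5' to 3' direction."""
    # Complement every base, then reverse so the result reads 5' -> 3'
    return seq.upper().translate(COMPLEMENT)[::-1]


top = "ATGCGT"                    # hypothetical strand, given 5' -> 3'
bottom = reverse_complement(top)  # partner strand, also 5' -> 3'
print("5'-" + top + "-3'")
print("5'-" + bottom + "-3'")
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check for any implementation.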
The Double Helix Structure
• Helix is twisted in the right-hand direction
• Each turn measures 34 angstrom
• Bases are spaced at 3.4 angstrom
• There are ten base pairs in each helical turn
• Bases are perpendicular to the sugar phosphate
backbone but stacked parallel to each other
• Two grooves, the major and the minor groove,
appear on the helix. These provide binding sites for
proteins.
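The helix dimensions above are linked by simple arithmetic: turn length divided by base spacing gives the number of base pairs per turn. The Python sketch below uses the idealized values from these slides (34 angstrom per turn, 3.4 angstrom per base pair) and the 3.2 billion bp human genome figure quoted earlier. The function and constant names are illustrative assumptions, and real DNA deviates slightly from these values.

```python
# Idealized double-helix dimensions taken from the slide
PITCH_ANGSTROM = 34.0       # length of one full helical turn
RISE_ANGSTROM = 3.4         # spacing between adjacent base pairs
ANGSTROM_IN_METRES = 1e-10

# Base pairs per helical turn = turn length / spacing between bases
BP_PER_TURN = PITCH_ANGSTROM / RISE_ANGSTROM


def contour_length_m(n_bp):
    """Stretched-out length in metres of a helix of n_bp base pairs."""
    return n_bp * RISE_ANGSTROM * ANGSTROM_IN_METRES


def helical_turns(n_bp):
    """Number of full helical turns in n_bp base pairs."""
    return n_bp / BP_PER_TURN


human_genome_bp = 3_200_000_000  # 3.2 billion bp, as quoted in the slides
print("Base pairs per turn:", round(BP_PER_TURN, 2))
print("Genome length (m):", round(contour_length_m(human_genome_bp), 3))
print("Helical turns:", round(helical_turns(human_genome_bp)))
```

With these numbers, 3.2 billion base pairs stretch to roughly one metre of DNA.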
https://www.youtube.com/watch?v=ThG_02miq-4
Ppt 2
DNA structure
DNA Code for Life
DNA
Nitrogenous Bases
Molecular basis of DNA Structure
• Polynucleotide chain – sugar phosphate backbone
having nitrogenous bases attached to it.
• Phosphodiester Bond – the backbone of DNA
• Nucleotide has three elements – phosphate,
pentose sugar, nitrogenous base
• Pairing between nitrogenous bases is not chemical….
• Base Stacking – Allows millions of base pairs to lie one
above the other
Chargaff Rule
• Erwin Chargaff: “A pairs with T & C pairs with G”
• C – G bond is stronger than A – T (the amount of heat
required to separate the DNA strands increases with
increase in G + C)
• Base composition – % of G + C in terms of % of total
bases – differs among species but is constant in all cells of an
organism and within a species.
Human being – A 29.8, T 31.8, G 20.2, C 18.2, G+C 38.4
• A + G = C + T, that is, purines = pyrimidines
Evaluate yourself
1. The base sequence for complete set of
chromosomes --------
2. Nucleotide is made of --------, ----------- & -------
3. Phosphodiester bond is the bond between ----
4. Purines and pyrimidines are ------------
5. Number of hydrogen bonds between adenine and
thymine -----------
6. Arrangement of base pairs in DNA ---------
7. The base composition for a species remains --------
Answers
1. Genome
2. Pentose sugar, phosphate group, nitrogenous
base
3. Pentose sugar & phosphate group
4. Nitrogenous bases
5. 2
6. Base stacking
7. constant
Watson Crick Model
The Double Helix Structure
• Right hand twisted helix
• Each turn measures 34 Å
• Bases are spaced at 3.4 Å
• Ten base pairs in each helical turn
• Bases are perpendicular to the sugar
phosphate backbone but stacked parallel to
each other
• Grooves - The major & the minor groove,
provide binding site for the proteins.
• Read from 5’ to 3’ direction.
Labels indicate the free carbon on the
sugar phosphate backbone (5’
has terminal PO4 & 3’ has free
OH)
• DNA length is measured in base
pair (bp) units
• 1 kb = 1000 bp
Shortest human DNA 4.6 x 10^7 bp
• Sugars lie above and below the
plane containing the base pair
Some facts…..
• DNA is a very dynamic molecule
• Satisfies the criteria for genetic material
- can make a copy of itself
- should code for life
- allow for changes in progeny
Points to ponder…..
2. Fragment with adaptor
3. Attachment of bridging group
Cluster Generation
• Clustering is an amplification process in the flow cell
• Each fragment is first attached onto glass
channels on a flow cell and then amplified into
millions of copies
Sequencing
• Begins by adding one nucleotide at a time,
each of which generates a signal
• The reads are generated for the forward as well as
the reverse strand
• Illumina generates paired reads,
that is, two reads for each fragment
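The idea of two reads per fragment can be illustrated in Python. This is a simplified sketch under stated assumptions, not Illumina's actual software: the fragment, the read length and the helper names `paired_reads` and `reverse_complement` are made up for illustration. The forward read is taken from the start of the fragment, and the reverse read is taken from the other end on the opposite strand, so it appears as a reverse complement.

```python
# Watson-Crick pairing: A<->T and C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def paired_reads(fragment, read_length):
    """Return (forward_read, reverse_read) for one fragment."""
    if read_length > len(fragment):
        raise ValueError("read length exceeds fragment length")
    # Forward read: first bases of the fragment as given
    forward = fragment[:read_length]
    # Reverse read: last bases of the fragment, read from the opposite strand
    reverse = reverse_complement(fragment[-read_length:])
    return forward, reverse


fragment = "ATGCGTACCTGA"  # hypothetical 12 bp fragment
fwd, rev = paired_reads(fragment, read_length=4)
print("Forward read:", fwd)
print("Reverse read:", rev)
```

Real fragments are hundreds of base pairs long, so the two reads usually do not overlap, and the approximately known distance between them helps during assembly.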
Data Analysis
• Preliminary data analysis is done
• Data is locally clustered based on the indices given to
each cluster
• Contiguous sequences (contigs) are prepared
• Contigs are aligned to a reference genome for
verification and identification.
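The last two steps can be pictured with a toy Python example. This is only a sketch under strong assumptions: reads are error-free, two reads are merged on their longest exact suffix-prefix overlap, and "alignment" is an exact substring search. Production assemblers and aligners use graph-based methods and tolerate mismatches. All names and sequences below are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two overlapping reads into one contig, or return None."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    # Keep `left` whole and append the non-overlapping tail of `right`
    return left + right[k:]


def locate_in_reference(contig, reference):
    """Return the 0-based position of an exact match, or -1 if absent."""
    return reference.find(contig)


read_1 = "ATGCGTAC"               # hypothetical reads
read_2 = "GTACCTGA"
reference = "GGATGCGTACCTGATT"    # hypothetical reference genome

contig = merge_reads(read_1, read_2)
print("Contig:", contig)
print("Position in reference:", locate_in_reference(contig, reference))
```

Requiring a minimum overlap avoids merging reads that share only one or two bases by chance.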
Nanopore Sequencing
• Unique & scalable technology
• Enables direct, real-time analysis of long DNA or RNA
fragments.
• It works by monitoring changes to an electrical current
as nucleic acids are passed through a protein
nanopore.
• The resulting signal is decoded to provide the specific
DNA or RNA sequence.
• Advantages: fast and cost-effective; disadvantage:
errors in reads