
INTERNATIONAL MOLECULAR BIOLOGY LAB

TRAINING PROGRAMME
(MoBiLab)

Bioinformatics
M B lab
March 2024
Bioinformatics Session 1
https://www.ncbi.nlm.nih.gov/

Which tools are available on NCBI?

What kind of information can you find on your preferred organism?
BLAST: Basic Local Alignment Search Tool
BLAST can be used for many different purposes, including:

Looking for species. If you are sequencing DNA from an unknown species, BLAST may help
identify the species or its closest relatives.

Looking for domains. If you BLAST a protein sequence (or a translated nucleotide sequence),
BLAST will also look for known conserved domains in the query sequence.

Looking at phylogeny. You can use the BLAST web pages to generate a phylogenetic tree
from the BLAST results.

Mapping DNA to a known chromosome. If you are sequencing a gene from a known
species but do not know its chromosomal location, BLAST can help: it will
show you the position of the query sequence relative to the hit sequences.

Annotations. BLAST can also be used to map annotations from one organism to another
or look for common genes in two related species.
Different types of BLAST options
https://blast.ncbi.nlm.nih.gov/Blast.cgi
Exercise 1
➢ Identify the origin of this sequence (organism, gene) by using the right
BLAST function
➢ Download the DNA sequence of the complete ORF of this gene. What is
its size in kb?
➢ Download the corresponding protein sequence of this gene
➢ Search PubMed to find out whether this gene exists in your species of interest

ATCTAGAATGTTCTCTTGCCTCTATTGTTCCCGTAAGTTCTGCACTTCT
CAAGCACTTGGAGGACATCAGAATGCTCACAAACGAGAAAGAGCC
GCTTCTCGCCGGAATATGTTCTCTACCGACCCTAACAATAACAATAATA
GGCTCCAATTCCATTTTCTTCACAACAACAATGAGAATGTGCCTCAAC
AACACAACATCATCAATGCTAATTACTCCTGCCAGCT
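Sequences copied from a slide or web page often carry line breaks, spaces or position numbers. The Python sketch below assumes the query is pasted as plain text. It strips everything that is not a letter, reports the length in bp and kb, and wraps the sequence as a FASTA record ready to paste into BLAST. The function names and the header `query_1` are hypothetical. The same bp-to-kb conversion applies to the full ORF once you have downloaded it.

```python
import re

def clean_sequence(raw: str) -> str:
    """Drop whitespace, digits and punctuation; return the sequence in upper case."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()

def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with `width` residues per line."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

# Query from Exercise 1, pasted as it appears on the slide
raw_query = """
ATCTAGAATGTTCTCTTGCCTCTATTGTTCCCGTAAGTTCTGCACTTCT
CAAGCACTTGGAGGACATCAGAATGCTCACAAACGAGAAAGAGCC
GCTTCTCGCCGGAATATGTTCTCTACCGACCCTAACAATAACAATAATA
GGCTCCAATTCCATTTTCTTCACAACAACAATGAGAATGTGCCTCAAC
AACACAACATCATCAATGCTAATTACTCCTGCCAGCT
"""

query = clean_sequence(raw_query)
print(f"Length: {len(query)} bp = {len(query) / 1000:.3f} kb")
print(to_fasta("query_1", query))
```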
Exercise 2
➢ Identify the origin of this sequence (organism, gene) by using the right BLAST function
➢ What can you tell about the gene structure (exons/introns)?
➢ What is the length of the mRNA?
➢ Who published this sequence?
➢ What functional domains does the corresponding protein have?

attttagtcccccacactcacttcctcggaaaattcgctctcactcttccagctggaatc caaagcagacaatgaagacaaaagcttctctctctctctcttatttctccatctctctat
gattaaacttggattttctgcaatttgtgtttttcttcggatgatttcttaatctgaaaa agttttgaagatctttggataatttagcgatgaaatcttcgactcggcttgactcagttg
cttttcagctcactcccactcgaacccggtaagttaaatttcgtgtaaactcttgttttc gtcagctttagagaagcctgatatggtaattgatcctaaatgttaaatctttctgtgaaa
ttgttctatggaagtcgaaagattgcattgttatgttcattgataagacataaaaattgt tagaagtgttaaactttgtgtgacttgttagagaaattggttttatggaagtcgaaagat
tgcatttttatgttcatttaaaagattaaatattgttaaagtgataaattttgtgttagt tttgtgagaaattggttctatggaagttaaaagattgcatttttgggtttcctaataaga
tcaaaagttgttataatgttaatatttgtgttagttttctgagaaattggttctatggaa gttgaaagattgcattttttgggttcattaataagatcaaaagtcgttgaaaattgtagg
tttcttaccaaacttgcaacgaaaaatttaagtgtttttctttgcattgattaattaggt aggtaacagctaatgtaaaataatttctctcttgtattgtataaactcttgtattcgtga
gctttatttacttaagcagaaaattgttgttaacttttgtgaagaactgttcaatgcaaa tagaaagactgtatctttatgttcattgataggttcaaaagtcactgaattttgttgttt
gtgtaccatattactatggaaaatttaatgtctttttctttgtggttcttacaggtgtga tttgttggtaacagctaatggaaagactgagaaaatagctacagggttacttgatccatt
tcttgctcatttaaagactgcaaaagatcaattggaaaaaggtggctattcgattattct caagcctgaagctagtgacaatgcggcttggttcactaaaggaacaattgaaaggtttgt
gtcatatacattagttgtctttgtatggaactagacactttgagataagagttttttctt cttttgtactcgtttaggtttgttcgatttgtgagtacgccggaagtcatagaacgtgtt
tatactttagaaactgagattatacagatcaaggaagccattggtattcagaacaactcg gaaatggcattgactgttgtaagtatattgggacttttgaatactttctttgttgtatgg
tcagaagtacttaatggtatattgtataaacaggtgaaagacgatcatcgagcaaaaaaa gcagatagtgcagaaggtatgccataaggataaaattatgtaatttatgagtctctatgc
taaagctactgatttggaaaaacaatattcccgtctcaggatttgactgatactaagcaa atctcaactcataaaatcgaaaatagtgtgtctcttggctattgatctttctgtttaacc
aatatgtgagaatgtgaatcaggtagcaggcctttgctacagctcaacgaggagaaagct attgtcctttacgaggtcaattagaactgaaatggtttctataatttaaaggaccttttc
tcttatttttctctttggaacaaacaacttatatcatcttcttgcatttgaacagcctga ttcccatccaaagcaagcaaatcggtctacctcatcagatgaaaactctaagtatacccc
tttgcgtgaattgttttttggttttcaatgtttttcaacaaagagagtagttaagtatct aattggtaatatgtggaattgtatcagagctcaagttatgaaagttctggagacacggaa
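The length of the mRNA is the sum of its exon lengths, not the genomic span covered by the BLAST hit. The sketch below assumes you read exon coordinates from the GenBank record of your hit as 1-based, inclusive ranges, as written in a `join(...)` feature location. Use the mRNA feature rather than the CDS if you want the transcript length including UTRs. The coordinates in the example are hypothetical and are not those of this gene.

```python
def spliced_length(exons: list[tuple[int, int]]) -> int:
    """Total length of exons given as 1-based inclusive (start, end) pairs."""
    return sum(end - start + 1 for start, end in exons)

def intron_lengths(exons: list[tuple[int, int]]) -> list[int]:
    """Lengths of the introns between consecutive exons (sorted by start)."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]

# Hypothetical exons, e.g. from a feature location join(1..120,301..450,600..700)
exons = [(1, 120), (301, 450), (600, 700)]
print(spliced_length(exons))   # 120 + 150 + 101 = 371
print(intron_lengths(exons))   # [180, 149]
```

The arithmetic is the same for a gene on the minus strand (`complement(join(...))`), because only the range lengths matter.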
Exercise 3

Which gene does this sequence correspond to?


Which conserved domain is present in this protein sequence?
Does this conserved domain tell you anything about the putative function of the protein?

MQEQATSSLAASSLPSSSERSSSSALHLEVKEGMESDEEI
RRVPEIGGEPAGTSASGREPGSVTGLDRVQ
AAGEGQRKRGRSPADKENKRLKRLLRNRVSAQQARERK
KAYLNELETRVKDLEKKNSELEERLSTLQNEN
QMLRNILQNTTASRRGGSSDANGDGSL
Bioinformatics Session 2
How to amplify a gene by PCR?
https://www.youtube.com/watch?v=KIcxzSr6IcE
Scheme of what a PCR machine does (figure):
Phase I, denaturation: 95°C
Phase II, amplification: repeated cycles of denaturation, primer annealing (temperature dependent on the Tm of the primers) and DNA amplification
Phase III, final extension, followed by a holding temperature
Exercise 1
Prepare the optimal PCR protocol for the thermocycler, knowing that:
• Your DNA polymerase works best at 72°C
• Your primers work best at an annealing temperature 4°C below their Tm
• The Tm of your primers is 60°C
• The DNA you want to amplify is 750 bp long

X cycles
X °C for X sec    X °C for X sec

X °C for X sec    X °C for X sec

X °C for X sec X °C for X sec
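The reasoning behind the protocol can be written out as simple arithmetic. The sketch below takes three values from the exercise: annealing 4°C below a Tm of 60°C, extension at 72°C, and a 750 bp target. Several values are assumptions, not part of the exercise, so check your polymerase's datasheet:

* the 95°C step durations and the 30 cycles;
* the 5 min final extension and the 4°C hold;
* the rule of thumb of about 1 min of extension per kb, which is typical for Taq polymerase.

```python
import math

def pcr_protocol(primer_tm: float, amplicon_bp: int,
                 anneal_offset: float = 4.0, extension_temp: float = 72.0,
                 sec_per_kb: int = 60, cycles: int = 30) -> list[str]:
    """Build a three-phase thermocycler program as human-readable steps.

    Assumed defaults: 95 °C denaturation, ~1 min extension per kb, 30 cycles,
    5 min final extension and a 4 °C hold.
    """
    anneal_temp = primer_tm - anneal_offset
    # Extension time rounded up to the next 5 s, with a 15 s minimum (assumption)
    ext_sec = max(15, 5 * math.ceil(amplicon_bp * sec_per_kb / 5000))
    return [
        "Phase I:   95 °C for 180 sec (initial denaturation)",
        f"Phase II:  {cycles} cycles of",
        "           95 °C for 30 sec (denaturation)",
        f"           {anneal_temp:.0f} °C for 30 sec (primer annealing)",
        f"           {extension_temp:.0f} °C for {ext_sec} sec (extension)",
        f"Phase III: {extension_temp:.0f} °C for 300 sec (final extension), then hold at 4 °C",
    ]

for step in pcr_protocol(primer_tm=60, amplicon_bp=750):
    print(step)
```

Keeping the assumed values as keyword arguments makes it easy to swap in the settings recommended for a different polymerase.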


Primer design
PCR primers are short, single-stranded segments of DNA that are designed to be
complementary to the beginning and end of the target sequence that will be amplified. In a
PCR, it is the primers that dictate exactly what sequence of DNA gets copied.
Basic rules for a good primer

•A GC content between 40 and 60%, with the 3’ end of the primer ending in G or C to promote binding (GC clamp).
Be mindful not to have too many repeating G or C bases, as this can cause primer-dimer formation.

•A good length for PCR primers is generally around 18-30 bases. Specificity usually depends on length
and annealing temperature. The shorter the primers are, the more efficiently they will bind (anneal) to the
target.

• Melting temperature (Tm) of the primers around 60°C, and within 5°C of each other. Tm depends on the
length and base composition of the primer. If the Tm of your primer is very low, try to find a sequence with
a higher GC content, or extend the primer slightly.

•Try to avoid runs of 4 or more of one base, or dinucleotide repeats (for example, ACCCC or ATATATAT).

•Avoid intra-primer homology (more than 3 bases that complement within the primer) or inter-primer homology
(forward and reverse primers having complementary sequences). These can cause the primers to form self-dimers
or primer-dimers instead of annealing to the desired DNA sequence.
Exercise 2
Use Primer3 https://primer3.ut.ee/ or Primer-BLAST https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi
to design a primer pair for the following gene sequence from Lycopersicon esculentum:

ATGGCGGACCACACCCACCTAGTTTATGATTTCTGGAACCAGTCTAACTCCGTTGCCCCAGATATTATGCTGGAACCACCCATGA
TCCCTAAACCTAAAGTAGCTCCATCTTCATCTAGAATGTTCTCTTGCCTCTATTGTTCCCGTAAGTTCTGCACTTCTCAAGCACTT
GGAGGACATCAGAATGCTCACAAACGAGAAAGAGCCGCTTCTCGCCGGAATATGTTCTCTACCGACCCTAACAATAACAATAAT
AGGCTCCAATTCCATTTTCTTCACAACAACAATGAGAATGTGCCTCAACAACACAACATCATCAATGCTAATTACTCCTGCCAGC
TCCAAGTCCAACCACCGCCTCCTTCAAACAATAGCTTGTGCTCCTCCTCATCCGCCGTTTTCTATCCTCCGCCGAATCATGGATAT
TTTTCTACTACTGGTGAGGATTTGTCATTGTTCCCCATTGATGAACCCCAGCTGAATCTCGACCTTACCCTTCGTTTGTAG

Requirements:
- The amplicon should be around 200 bp
- The melting temperature should be around 60°C
- Primer length between 18 and 25 bp
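Once Primer3 or Primer-BLAST returns a pair, you can confirm where the primers land and how long the product is. The sketch below assumes both primers are written 5′→3′, as the tools report them. The forward primer therefore appears as written on the given strand, and the reverse primer appears as its reverse complement. Only exact matches are found. The function name and the toy template are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the PCR product predicted by exact primer matches, or None.

    The forward primer must match the template as written; the reverse
    primer must match the reverse complement, downstream of the forward one.
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(revcomp(rev), start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev)]

# Toy example (hypothetical sequences)
template = "TTTT" + "ATGCGTACG" + "AAACCCGGG" + "GATTACAGG" + "TTTT"
product = find_amplicon(template, fwd="ATGCGTACG", rev="CCTGTAATC")
print(product, len(product))  # ATGCGTACGAAACCCGGGGATTACAGG 27
```

With the tomato sequence as `template` and your own primer pair, `len(product)` should come out around the required 200 bp.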

• If you were to amplify the whole sequence, what would be your limitation?
• If you wanted to make a primer set for Q-PCR, what additional elements would you look at?
Restriction sites and their use in molecular biology
A restriction site is a nucleotide sequence in a DNA molecule that is recognized and cut by a restriction endonuclease (restriction enzyme).

- The natural function of restriction enzymes in bacteria is to inactivate invading viruses by cleaving the viral DNA
- The recognition sequence is often palindromic
- Sites are usually about 4–8 base pairs long
- Each site is recognized specifically by a given restriction enzyme (EcoRI, SalI, XbaI…)
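"Palindromic" here means that the recognition sequence reads the same 5′→3′ on both strands, i.e. it equals its own reverse complement. The minimal Python check below uses the recognition sequences of the three enzymes named above: EcoRI GAATTC, SalI GTCGAC and XbaI TCTAGA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse: the other strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site: str) -> bool:
    """A DNA site is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)

SITES = {"EcoRI": "GAATTC", "SalI": "GTCGAC", "XbaI": "TCTAGA"}
for enzyme, site in SITES.items():
    print(enzyme, site, is_palindromic(site))   # True for all three
print(is_palindromic("GAATTA"))                 # False
```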
Uses of DNA restriction
Restriction fragment length polymorphism, or RFLP
Cloning a piece of DNA (insert) into a plasmid (vector) for further use:
1. Amplification of a DNA fragment
2. Protein expression

>Knuckles-like genomic sequence (Solyc02g160370.1.1)


ATGGCGGACCACACCCACCTAGTTTATGATTTCTGGAACCAGTCTAACTCCGTTGCCCCAG
ATATTATGCTGGAACCACCCATGATCCCTAAACCTAAAGTAGCTCCATCTTCATCTAGAATGT
TCTCTTGCCTCTATTGTTCCCGTAAGTTCTGCACTTCTCAAGCACTTGGAGGACATCAGAAT
GCTCACAAACGAGAAAGAGCCGCTTCTCGCCGGAATATGTTCTCTACCGACCCTAACAATA
ACAATAATAGGCTCCAATTCCATTTTCTTCACAACAACAATGAGAATGTGCCTCAACAACAC
AACATCATCAATGCTAATTACTCCTGCCAGCTCCAAGTCCAACCACCGCCTCCTTCAAACAA
TAGCTTGTGCTCCTCCTCATCCGCCGTTTTCTATCCTCCGCCGAATCATGGATATTTTTCTACT
ACTGGTGAGGATTTGTCATTGTTCCCCATTGATGAACCCCAGCTGAATCTCGACCTTACCCT
TCGTTTGTAG

Fragment length: 507 bp


XbaI digestion: 392 bp + 115 bp
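Fragment sizes like these can be predicted by locating each recognition site and the position where the enzyme cuts within it. The sketch below handles a linear DNA fragment. It assumes XbaI recognises TCTAGA and cuts the top strand after the first base (T^CTAGA). Only the top strand is searched, which is enough for palindromic sites, and lengths are counted on the top strand. The function names and the toy sequence are hypothetical.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """0-based top-strand positions at which the enzyme cuts.

    cut_offset is the number of site bases left of the cut (1 for T^CTAGA).
    The search advances one base at a time, so overlapping sites are found too.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + cut_offset)
        i = seq.find(site, i + 1)
    return positions

def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths in bp, in order along a linear molecule."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence with a single XbaI site (hypothetical)
print(digest_linear("AAAAATCTAGACCCCCCCCC", "TCTAGA", 1))  # [6, 14]
```

Pasting the Knuckles-like sequence as the first argument gives sizes that can be compared with those stated above.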
Exercise 3

Online tool: https://www.genecorner.ugent.be/rest_map.html

Sequence to analyse:
CTTTGAAGAGAGAGTTAAAGAGGACGTGAAACCGGTGAGATGGAAACGGATAGAGCTAGCGTATCTGTCCTGCATTC
AGCCACTTGGCATTGGCGCCCGCGTAGCCGGCGGCCCCAGATTGGGACGGTTACGCTGGGCAATGTTGGTGCCGAG
TGGTGCATTTGTAGGTGGAGTGCGCCGAGGCGTTCGGGGCAGCGGTGTGAGCTTAGCTTTGAGGCCAGCTTGCTGG
TACCCGGGCTGGGGGAGCATTGTTTGCCTTGGGCGTTTATAGAGGGAGTATGGCTTACGAGCTCGGGTTGGTGTCGA
GCTGAGAGTCGGCGGCGGTCGCAGGCGACACGTGCTGTGCTATTAGTTCGGTCCTGCTCGAGCTCACATGCTCGCTCT
CGGTGTAAAAGCTGGTCATCTTTCCGACCCGTCTTGAAACACGGACCAAGGAGTTTATCGTGTGCGCGAGTCATTGGG
CGTTGAAAACCCAAAGGCGCAATGAAAGTGAATGCTCCGCAAGGAGCTTACGTGCGATCCTGGGCACCGCGGTGTCC
GGGCGCAGCATGGCCCCATCCTGACTGCTTGCAGTGGGGTGGCGGAAGAGCGTACGCGGTGAGACCCGAAAGATG
GTGAACTATTCCTGAGCAGGATGAAGCCAGAGGAAACTCTGGTGGAAGTCCGAAGCGATTCTGACGTGCAAATCGAT
CGTCTGACTTGGGTATAGGGGCGAAAGACTAATC

• Create a restriction map of the sequence above


• Choose a restriction enzyme that you think would allow you to check the sequence identity on an agarose gel
• What will be the size of the obtained restriction fragments?
• Draw the expected restriction profile
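The online tool maps hundreds of enzymes, but the idea can be sketched in a few lines. The sketch below assumes a small hand-picked panel of common enzymes, each given as its recognition sequence and top-strand cut offset (e.g. G^AATTC, CCC^GGG, GGTAC^C). This panel is an illustration, not the tool's enzyme list.

For a linear molecule, the sketch reports every enzyme that cuts, the positions of its cuts and the resulting fragment sizes. Only the top strand is searched, which is sufficient because all sites in the panel are palindromic.

```python
# Illustrative panel: enzyme -> (recognition site, bases left of the top-strand cut)
PANEL = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "XbaI": ("TCTAGA", 1),
    "SalI": ("GTCGAC", 1),
    "XhoI": ("CTCGAG", 1),
    "SmaI": ("CCCGGG", 3),
    "SacI": ("GAGCTC", 5),
    "KpnI": ("GGTACC", 5),
}

def restriction_map(seq: str, panel: dict = PANEL) -> dict:
    """For each enzyme that cuts: (cut positions, linear fragment sizes in bp).

    A cut position p means the enzyme cuts after base number p (1-based).
    """
    seq = "".join(seq.split()).upper()   # remove line breaks and spaces
    result = {}
    for enzyme, (site, offset) in panel.items():
        cuts = []
        i = seq.find(site)
        while i != -1:
            cuts.append(i + offset)
            i = seq.find(site, i + 1)
        if cuts:
            bounds = [0] + cuts + [len(seq)]
            sizes = [b - a for a, b in zip(bounds, bounds[1:])]
            result[enzyme] = (cuts, sizes)
    return result

for enzyme, (cuts, sizes) in restriction_map("AAGAATTCAAAAGGATCCAA").items():
    print(enzyme, cuts, sizes)
```

Enzymes that cut only once or twice, giving fragments of clearly different sizes, are usually the easiest to interpret on an agarose gel.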
