ASHWIN SAXENA
(2017B1A40453P)
Results obtained:
Alanine A 4.8%
Isoleucine I 2.8%
Leucine L 4.4%
Valine V 4.4%
Phenylalanine F 3.6%
Tryptophan W 0.4%
Tyrosine Y 0.79%
Asparagine N 2.00%
Cysteine C 4.00%
Glutamine Q 5.20%
Methionine M 2.40%
Serine S 10.80%
Threonine T 6.00%
Arginine R 6.00%
Histidine H 3.20%
Lysine K 9.20%
Glycine G 11.60%
Proline P 9.60%
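A percentage composition like the one listed above can be reproduced from a raw sequence with a few lines of Python. The following is only a minimal sketch: the function name `amino_acid_composition` and the toy sequence are illustrative assumptions, not the tool or the protein used for this report.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return {one-letter residue: percent of all residues}, most abundant first."""
    # Drop whitespace/newlines (e.g. from a pasted FASTA body) and normalise case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: round(100.0 * n / len(seq), 2) for aa, n in counts.most_common()}


# Hypothetical toy sequence, NOT the protein analysed in this report.
toy = "MGSSGPKKGA"
composition = amino_acid_composition(toy)
for residue, percent in composition.items():
    print(f"{residue} {percent:.2f}%")
```

Applied to the full protein sequence, the same function would give one percentage per residue type, which can then be compared against the list above.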
Inferences -
The presence of hydrophilic amino acids like serine, arginine, glutamic acid, etc. supports the role of the protein as a
secretory product of endothelial cells.
The presence of some hydrophobic amino acids like proline suggests that the core is hydrophobic.
Cysteine is also present in a sufficient amount to allow for the formation of disulfide bridges.
Secondary structure prediction of the protein using different web servers: GOR IV, JPRED, PHD, PREDATOR,
PredictProtein and PSIPRED.
1. GOR IV
2. JPRED
3. PHD
4. PREDATOR
5. PREDICT PROTEIN
6. PSIPRED
Structure visualizations (images not reproduced): Ribbon, Lines, C Alpha Trace, B-Factor Tube, Backbone, Schematic,
Sphere, Sticks, Strand.
1. Across all the web servers, most of the predicted secondary structure is random coil, averaging slightly more
than 50%.
2. In all the predicted structures, extended strand is the next most common state after random coil, averaging close
to 40%.
3. Among all the constituent amino acids, the five most abundant, in decreasing order, are glycine (G), serine (S),
glutamate (E), proline (P) and lysine (K).
4. Some consensus can be drawn: the first eight amino acids are predicted as random coil, and the residues near the
end of the sequence are also random coil.
5. Of all the servers, PREDICTPROTEIN gave the best and clearest prediction.
6. From the images provided above, we see that the protein does indeed contain a majority of alpha-helix
structures and coils.
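The comparison across servers made in the points above can be made systematic with a per-residue majority vote. The sketch below assumes each server's output has already been converted to a string of equal length using H (helix), E (extended strand) and C (coil); the function names and the three toy prediction strings are hypothetical and are not the actual server outputs.

```python
from collections import Counter


def consensus_structure(predictions):
    """Per-residue majority vote over equal-length H/E/C strings.

    Positions where the two most common states tie are reported as '?'.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must be non-empty and of equal length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("?")  # no clear majority at this residue
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


def state_percentages(structure):
    """Percentage of residues assigned to each of H, E and C."""
    return {s: round(100.0 * structure.count(s) / len(structure), 1) for s in "HEC"}


# Hypothetical toy predictions from three servers, NOT real output.
toy_predictions = [
    "CCCEEEHHCC",
    "CCCEEEEHCC",
    "CCCCEEHHCC",
]
toy_consensus = consensus_structure(toy_predictions)
toy_percentages = state_percentages(toy_consensus)
print(toy_consensus, toy_percentages)
```

Averaging `state_percentages` over the individual server outputs, rather than over the consensus, would correspond more directly to the "averaging" figures quoted in points 1 and 2.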
1) For prokaryotes:
The cDNA sequence was obtained using the Reverse Translate tool, with the codon usage parameter set to Escherichia
coli. This cDNA sequence was then submitted to BLASTn. No significant hits were found for prokaryotes.
2) For eukaryotes: BLASTn was performed using the DNA sequence to find the top 20 eukaryotic genes. 20
eukaryotes were considered because no prokaryote hits were obtained.
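The reverse-translation step described above can be illustrated with a short sketch. It maps each amino acid to a single codon; the table below uses codons commonly cited as preferred in Escherichia coli, which is an assumption for illustration and not necessarily the exact table used by the Reverse Translate tool. The BLASTn search itself is not reproduced here.

```python
# One codon per amino acid. These are commonly cited E. coli preferred codons;
# treat the table as an illustrative assumption, not the tool's actual table.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(protein):
    """Return a DNA sequence encoding `protein`, one fixed codon per residue."""
    seq = "".join(protein.split()).upper()
    try:
        return "".join(PREFERRED_CODON[aa] for aa in seq)
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None


# Hypothetical four-residue peptide, NOT the protein analysed in this report.
toy_dna = reverse_translate("MGSK")
print(toy_dna)
```

Because the genetic code is degenerate, a sequence produced this way is only one of many possible coding sequences, and it need not match the natural gene closely at the nucleotide level.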
1) For prokaryotes
2) For eukaryotes: BLASTp was performed using the protein sequence of the human gene lin28B to find the top 10
eukaryote hits from different species. 10 eukaryotes were considered because no prokaryotic hits were obtained.
S. no.  Species          Total score  Query cover  E-value  % identity  Accession number
1.      Pan troglodytes  503          100%         8e-180   99.60       NM_001004317
2.      Gorilla          501          97%          3e-179   99.20       XP_018885752.2
Inferences:
• From the BLASTP and BLASTN results we observe that the closest relatives (sequence-wise) of the human gene
lin28B are from the apes and monkeys, as expected. The next closest relatives are mostly other mammals.
• The high sequence similarity (>90%) obtained through BLASTP indicates that the protein product of lin28B is highly
conserved in the animal kingdom.
Protein –
DNA –
Higher similarity is obtained because the longer sequences allow for a greater number of matching bases.
Both DNA and protein sequence data suggest that the sequences from apes and other primates are most closely related
to the human gene lin28B.
• The * marks above certain columns indicate the columns that are conserved across all the DNA sequences.
Inferences
MSA obtained:
• The * marks above certain columns indicate the columns that are conserved across all the protein sequences.
Inferences –
For the protein sequence MSA, 307 sites are conserved out of 349 aa (in most species), i.e. about 88%; hence we can
say this protein is highly conserved across various animals.
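The conserved-site count can be checked programmatically. The sketch below counts fully conserved columns in a set of aligned sequences (the columns an alignment viewer would mark with *) and converts the reported figures, 307 conserved sites out of 349, into a percentage. The function names and the three-sequence toy alignment are hypothetical; the real MSA is not reproduced here.

```python
def conserved_columns(aligned):
    """Return indices of columns where every sequence has the same non-gap residue."""
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return [
        i
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"
    ]


def percent_conserved(n_conserved, total):
    """Conserved sites as a percentage of total sites, to one decimal place."""
    return round(100.0 * n_conserved / total, 1)


# Hypothetical toy alignment ('-' marks a gap), NOT the real lin28B MSA.
toy_alignment = ["MGS-KK", "MGSAKK", "MGT-KK"]
toy_sites = conserved_columns(toy_alignment)

# Figures reported in this section: 307 conserved sites out of 349.
reported = percent_conserved(307, 349)
print(toy_sites, reported)
```

With the reported figures this gives roughly 88%, which matches the conclusion that the protein is highly conserved.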