Prof J Schymkowitz Evaluation Structural Bio-IT

Structural Bio-informatics
Evaluation 2023

Name: Hayder Al-Gburi


Student number: r0726249
Email Address: hayder.algburi@student.kuleuven.be

Do not forget to fill out your details above!

You are investigating a protein X, whose UniProt ID is P01111. You are interested in
mutation M for this protein (G13D). Answer the following questions:
If the protein has multiple isoforms, answer the questions based on isoform 1.

Question 1: What is the gene name of protein X and from what organism? [1pt]

Gene: NRAS. Organism: Human (Homo sapiens).

Question 2: What are two diseases this protein is involved with? [1pt]

A. Leukemia, juvenile myelomonocytic (JMML)


B. Melanocytic nevus syndrome, congenital (CMNS)

Question 3: What is the FASTA sequence of protein X? [1pt]

MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRD
QYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDLPTRTVDTKQAHELAKSY
GIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQGCMGLPCVVM

Question 4: For the following mutations, indicate whether you expect them to be
detrimental or benign and explain why based only on the physicochemical properties of
the wild type and mutant amino acids. [2pts]

K5W: the mutation is not in an active site, but K has a positively charged linear side chain
while W is hydrophobic with an aromatic ring in its side chain. We expect a detrimental
change, possibly a structural disturbance, since the amino acid change is too drastic.

N85S: both N and S have polar uncharged side chains of very similar length, and the
position is not in an active site, so we expect it to NOT be detrimental.

V112L: not in an active site, and both residues have hydrophobic side chains of similar
length, so we expect it to NOT be detrimental.

E143D: both residues have negatively charged side chains of similar length, and the
position is not in an active site, so we expect it to NOT be detrimental.
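
The reasoning used for these four mutations can be sketched as a lookup of side-chain classes. This is only an illustration: the grouping below is one common coarse textbook classification (an assumption, other groupings exist), and the names `SIDE_CHAIN_CLASS` and `compare` are hypothetical.

```python
# Coarse side-chain classes. This grouping is an assumption; textbooks differ
# on borderline residues such as G, P, C, Y and H.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("STNQCY", "polar"),
    **dict.fromkeys("AVLIMFWPG", "hydrophobic"),
}


def compare(mutation):
    """Parse a mutation such as 'K5W' and report whether the class is kept."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    same = SIDE_CHAIN_CLASS[wt] == SIDE_CHAIN_CLASS[mut]
    return wt, pos, mut, same


for m in ["K5W", "N85S", "V112L", "E143D"]:
    wt, pos, mut, same = compare(m)
    verdict = "conservative" if same else "non-conservative"
    print(f"{m}: {SIDE_CHAIN_CLASS[wt]} -> {SIDE_CHAIN_CLASS[mut]} ({verdict})")
```

A class match only suggests a benign change; it ignores side-chain size and the structural context of the residue.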

Question 5: What is the PDB ID of the best available experimental structure for protein X
and why? [1pt]

7F68 is the best available experimental structure because it has the highest
resolution among the available structures (1.24 Å).

Question 6: What was the experimental method and resolution for this PDB? [1pt]

The experimental method was X-ray diffraction and the resolution was 1.24 Å.

Question 7: What does this resolution allow you to observe? [1pt]


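At 1.24 Å (near-atomic resolution), individual heavy atoms (C, O, N) are resolved and
can be mapped very accurately, including side-chain conformations and ordered water
molecules. Hydrogen atoms are generally not individually visible at this resolution;
that typically requires data better than about 1 Å.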

Question 8: What fraction of the protein is covered by this PDB? [1pt]


About 89.4% (the PDB entry covers 169 residues and the full-length FASTA sequence has
189 residues).
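
The coverage figure is a simple ratio of the two lengths quoted above. A minimal sketch of the arithmetic (the function name `coverage_percent` is hypothetical):

```python
def coverage_percent(covered_residues, full_length):
    """Percentage of the full-length sequence covered by the structure."""
    return 100.0 * covered_residues / full_length


# 169 residues in the PDB entry, 189 residues in the FASTA sequence.
print(f"{coverage_percent(169, 189):.1f}%")
```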

Question 9: What is an AlphaFold structure? What is the coverage of the AlphaFold
structure? [1pt]

It is the 3D structure of a protein predicted from its sequence by an AI system. The
coverage is 100%, since the prediction is made in silico from the whole input FASTA
sequence.

From this point on, use the AlphaFold structure from UniProt to answer the
following questions, regardless of which PDB you chose in question 2A.

Indicate here what structure viewer you are using (YASARA, PyMol...): PyMol

Question 10: What is the secondary structure of residue 20? [1pt]

Residue 20 (T) is in an alpha helix

Question 11: Is residue 25 exposed or buried? [1pt]

Residue 25 is exposed

Question 12: How many hydrogen bonds is residue 51 engaged in? [1pt]

Residue 51 (C) is engaged in 2 hydrogen bonds

Question 13: What is the pLDDT score (confidence score) of residue 75? What does it
mean? [1pt]
The pLDDT score of residue 75 is 97.52. pLDDT is AlphaFold's per-residue confidence in
the predicted local structure, on a scale from 0 to 100. A score above 90 means the
position of this residue is modelled with very high confidence.
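
AlphaFold models store the pLDDT in the B-factor column of the PDB file, so the value can also be read without a viewer. The sketch below assumes a standard fixed-column PDB file; the function name `plddt_of_residue` and the file name in the usage comment are hypothetical.

```python
def plddt_of_residue(pdb_lines, residue_number):
    """Return the B-factor (pLDDT in AlphaFold models) of a residue's CA atom.

    Uses the fixed columns of the PDB ATOM record: atom name in columns 13-16,
    residue number in columns 23-26, B-factor in columns 61-66.
    """
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()
        res_num = int(line[22:26])
        if atom_name == "CA" and res_num == residue_number:
            return float(line[60:66])
    raise ValueError(f"residue {residue_number} not found")


# Usage (hypothetical file name, assuming the model was downloaded as PDB):
# with open("AF-P01111-F1-model.pdb") as handle:
#     print(plddt_of_residue(handle, 75))
```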

Question 14: Looking now at the structure, would you adjust any of your answers from
question 4? Why? [2pt]

With the structure in mind, K5W is even more likely to be detrimental: K is exposed at
the surface, so the mutation to W would expose a hydrophobic side chain to the solvent,
which destabilizes the protein (hydrophobic side chains prefer to be buried with the
other hydrophobic residues). N85S is in a coil and V112L is buried, but in both cases the
wild-type and mutant residues have similar properties, so the conclusions for these two
do not change. For E143D, the side chain of E makes 3 hydrogen bonds; although the
physicochemical properties are similar, the shorter side chain of D might disrupt these
bonds, so this mutation might be detrimental.

Question 15: Is this an enzyme? If so, what is the EC code? [1pt]

Yes, it is an enzyme, with EC code EC:3.6.5.2

Question 16: Does mutation M violate the hydrophobicity rule? [1pt]

In the G13D mutation, both G and D are hydrophilic, so the rule is not violated

Question 17: What secondary structure is mutation M located in? [1pt]

It is located in a coil

Question 18: Is mutation M related to cancer? If so, in which tissue is most distributed?
[1pt]
G13D is indeed related to cancer, and COSMIC shows that haematopoietic and lymphoid
tissue is the most affected

Question 19: Is mutation M located in an active site/binding site? [1pt]


Yes it is located in a binding site

Question 20: What is the Zvelebil score of mutation M? [1pt]


Its Zvelebil score is 5/10

Question 21: FoldX gives a ΔΔG of 1.74 kcal/mol for this mutation, what does that
mean? [1pt]
A positive ΔΔG means that the mutant protein is in a higher free-energy state than the
wild type, so the mutation is destabilizing
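
The sign convention can be sketched as follows, taking ΔΔG as the mutant free energy minus the wild-type free energy. The 0.5 kcal/mol cutoff used to call a value "neutral" is an assumption chosen for illustration, not a value given in the exam, and the function name is hypothetical.

```python
def classify_ddg(ddg_kcal_per_mol, neutral_cutoff=0.5):
    """Interpret a ddG (mutant minus wild type) value by its sign.

    neutral_cutoff is an assumed tolerance below which the change is
    treated as too small to call either way.
    """
    if ddg_kcal_per_mol > neutral_cutoff:
        return "destabilizing"
    if ddg_kcal_per_mol < -neutral_cutoff:
        return "stabilizing"
    return "neutral"


print(classify_ddg(1.74))
```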

Question 22: Have a look at the PDB structure of Protein_1, Protein_2 and Protein_3 (in
Toledo). One of these structures would definitely not be stable in nature. Which one, and
why? [3pts]

The answer is protein 2. Its buried side chains are consistently charged
(hydrophilic), which goes against the hydrophobicity rule. A core made of buried
charged residues cannot be stable, so this structure would not exist in nature.

Question 23: Imagine yourself as a talented structural bioinformatician embarking on a
thrilling expedition into the realm of ancient DNA. Recent excavations in the frozen tundra
have discovered a gene sequence from a long-extinct creature from the Ice Age: the
majestic woolly mammoth.
However, an intriguing twist awaits—only a single point mutation distinguishes this
mammoth gene from an existing mammalian protein, offering a unique opportunity to
model the structural implications of this evolutionary alteration.
The gene sequence is:
ATGATTGAGACGATAACTGGAAAGAATGCCCTGCTGAACTATGGTTTCTATGGCTGCTACTGTGGC
TTGGGTGGCCAAGGGACCCCCAAAGATGGCACTGATTGGTGCTGTTGGGTGGATGACCACTGCTA
CGGGCTTCTGGAGGAGAAAGGCTGCAACATCGTTACCCAGTCATACAAGTACAAAGTCACATGGG
GCTCGGTCACCTGTGAGCTCGGGCCCTTCTGCCAGGTGCATCTCTGTGCCTGTGACCGGAAGCTTGT
CTACTGCCTCCGGAGAAAACTAAGGAGCTACGACTCCAGCTACCAATACTTTCCCCGGGTCTTCTGC
TCCTAG
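
Before searching for orthologs, the coding sequence can be translated to protein. The sketch below uses the standard genetic code and stops at the first stop codon; the function name `translate` is hypothetical, and only the first line of the sequence above is used as a demonstration.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1 up to the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop line breaks and spaces
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


first_line = "ATGATTGAGACGATAACTGGAAAGAATGCCCTGCTGAACTATGGTTTCTATGGCTGCTACTGTGGC"
print(translate(first_line))
```

Passing the full six-line sequence to `translate` gives the protein sequence that can then be used for a protein BLAST search.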

1. Based on existing gene orthologs, do you think this is an actual mammoth gene?
Why? [1pts]
After running BLAST on the sequence, the top hits are gene orthologues in the African
bush elephant and the Asian elephant, which are phylogenetically very closely related to
the mammoth, so this might actually be a mammoth gene

2. What is the potential function of the encoded protein? [1pts]


The encoded protein is most probably an enzyme

3. Given the lack of experimental protein structure data for this mammoth gene,
what are your options to get structural information and explain them briefly?
[3pts]

Since only a single point mutation distinguishes this gene from an existing
mammalian gene, we can use homology modelling, which means building a model
using the known structure as a template. The structure of a protein is determined
by its amino acid sequence, and structure is even more conserved than sequence:
similar sequences adopt nearly the same structure.

Another possibility: since we have the sequence, we can express the protein
in cell culture and then use X-ray crystallography, NMR, or cryo-electron
microscopy to determine its structure experimentally.

We can also use AlphaFold, an AI system that uses neural networks to predict
the structure from the sequence alone.

4. To assess the structural consequences of the point mutation, you decide to use the
structure of the most similar existing protein as a model. Is this point mutation
strongly destabilizing the mammoth protein structure? Could this point mutation
result in a different function for the protein? [2pts]

Question 24: Given the data points reported in the table, find a possible set of parameters
(w1, w2 and c) that would allow the neuron to solve the discriminative problem. [3pts]

The neuron has the following characteristics


● Step function activation function
● Takes 2 features (X1 and X2) as input
● The weights are w1,w2 and the bias is c
Solution: we assume f(x) = w1*X1 + w2*X2 + c, with Y = 1 if f(x) >= 0 and Y = 0 otherwise
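
A minimal sketch of such a neuron is shown below. The data table from the exam is not reproduced here, so the points used are hypothetical (a logical AND), and the parameter values are one possible choice for that hypothetical data only.

```python
def step_neuron(x1, x2, w1, w2, c):
    """Single neuron with a step activation: 1 if w1*x1 + w2*x2 + c >= 0."""
    f = w1 * x1 + w2 * x2 + c
    return 1 if f >= 0 else 0


# Hypothetical data points (logical AND), NOT the table from the exam.
points = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# One parameter set that separates the hypothetical points.
w1, w2, c = 1.0, 1.0, -1.5

for (x1, x2), label in points:
    prediction = step_neuron(x1, x2, w1, w2, c)
    print(f"X1={x1} X2={x2} expected={label} predicted={prediction}")
```

Any parameter set whose line w1*X1 + w2*X2 + c = 0 separates the two classes is a valid answer, so the solution is not unique.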

Before you go
● Please make sure you filled in your name, number and email address at the
top of the document.

● Save this document and upload it to Toledo when you are finished.
● As a backup, email it to yourself with me in cc
(joost.schymkowitz@kuleuven.vib.be)
