ASSIGNMENT 3

BT3040 - Bioinformatics

FATHIMA BENSHA M
BS20B015
19 February 2023

1. All of the information can be found on UniProt.


Amino Acid sequence

Function

Transmembrane segments
There are a total of 19 transmembrane segments as shown below.

4. 17,317 results are found for mouse protein sequences that are
manually annotated.

Of these, 2217 sequences are associated with PDB entries.
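The two counts above come from filtering UniProtKB by organism, review status, and cross-referenced database. As a rough sketch of how the same filters could be expressed programmatically, the snippet below builds search URLs for the UniProt REST service. The endpoint, the field names (`organism_id`, `reviewed`, `database`) and the helper `build_search_url` are assumptions for illustration; the assignment itself may simply have used the website filters.

```python
from urllib.parse import urlencode

# Assumed UniProt REST search endpoint (not stated in the assignment).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fmt="tsv", size=1):
    """Return a search URL for the given query string (hypothetical helper)."""
    params = {"query": query, "format": fmt, "size": size}
    return BASE_URL + "?" + urlencode(params)


# 10090 is the NCBI taxonomy ID for mouse (Mus musculus).
reviewed_mouse = "(organism_id:10090) AND (reviewed:true)"
# Same query, restricted to entries cross-referenced to PDB.
with_pdb = reviewed_mouse + " AND (database:pdb)"

print(build_search_url(reviewed_mouse))
print(build_search_url(with_pdb))
```

The URLs are only constructed here, not fetched, so the counts reported above are not reproduced by this sketch.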

5. Of the 2217 sequences obtained in the previous query, 2099 are mapped to
the STRING database. The entry IDs were selected and mapped from
UniProtKB_AC-ID to STRING using Retrieve/ID Mapping.
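If the mapping result is downloaded as JSON rather than read off the web page, the number of mapped entries could be counted as sketched below. The payload layout (`results` with `from`/`to` pairs and a `failedIds` list) is an assumption about the export format, the helper `count_mapped` is hypothetical, and the identifiers in `example` are made-up placeholders, not real accessions.

```python
def count_mapped(payload):
    """Count distinct source IDs that mapped, and IDs that failed to map.

    Assumes a dict with a "results" list of {"from": ..., "to": ...}
    records and an optional "failedIds" list (assumed layout).
    """
    # One source ID can map to several targets, so count distinct sources.
    mapped = {row["from"] for row in payload.get("results", [])}
    failed = set(payload.get("failedIds", []))
    return len(mapped), len(failed)


# Placeholder data for illustration only.
example = {
    "results": [
        {"from": "ACC_0001", "to": "10090.HYPOTHETICAL_1"},
        {"from": "ACC_0001", "to": "10090.HYPOTHETICAL_2"},
        {"from": "ACC_0002", "to": "10090.HYPOTHETICAL_3"},
    ],
    "failedIds": ["ACC_0003"],
}

mapped_count, failed_count = count_mapped(example)
print(mapped_count, failed_count)
```

Counting distinct source IDs rather than result rows matters because a single UniProtKB entry may map to more than one target identifier.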

6. (a)

• The number of sequences with extremely small or extremely large
sequence lengths is low.

• The average length of a sequence is 361 amino acids in
UniProtKB/Swiss-Prot.

• The average length of a sequence is 351 amino acids in
UniProtKB/TrEMBL.

(b) The shortest sequence in UniProtKB/Swiss-Prot is GWA_SEPOF (P83570).
The longest sequence in UniProtKB/Swiss-Prot is TITIN_MOUSE (A2ASS6).

The shortest sequence in UniProtKB/TrEMBL is A0A0U1RQB9_HUMAN.
The longest sequence in UniProtKB/TrEMBL is A0A5A9P0L4_9TELE.

(c) UniProtKB/TrEMBL

UniProtKB/Swiss-Prot

2. 95 clusters are found for transcription factors with 50% sequence
identity. The Size column was copied into an Excel sheet to calculate the
total number of sequences, which is 2059.
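The same total could also be computed without a spreadsheet. The sketch below sums a `Size` column from a tab-separated export; the column header, the helper `total_cluster_members`, and the sample rows are assumptions for illustration and are not the actual clusters from this query.

```python
import csv
import io


def total_cluster_members(tsv_text, size_column="Size"):
    """Sum the cluster sizes in a tab-separated table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Strip thousands separators in case sizes are written like "1,204".
    return sum(int(row[size_column].replace(",", "")) for row in reader)


# Made-up sample rows, only to show the expected table shape.
sample = (
    "Cluster ID\tSize\n"
    "UniRef50_X1\t12\n"
    "UniRef50_X2\t3\n"
    "UniRef50_X3\t1\n"
)

print(total_cluster_members(sample))
```

For a real export, the file contents would be read into a string and passed to the function in place of `sample`.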

The FASTA sequence is given below.

3. For Homo sapiens, a total of 379,233 results are found, of which
223,648, 102,119 and 52,466 have identity cutoffs of 100%, 90% and 50%,
respectively.
