
Protein Subcellular Localization

Project Progress - LeoDas

1. Light Attention Architecture [1]


Performed inference on the setHard dataset using pre-trained model weights (the model was trained on the DeepLoc dataset).
The setHard.h5 embeddings were given as input.
The corresponding .fasta file contains 490 sequences.
Used the code from [2]. A minimal sketch of this step is given after the results link below.

Inference results
The prediction results are available at
https://drive.google.com/file/d/1kvjckTqCAy5U4qRYZ9p8aMwzM4IAji4d/view?usp=sharing
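
As a rough illustration, the sketch below paraphrases the Light Attention head from [1] and runs it over embeddings stored in setHard.h5. The layer sizes, the weights file name, and the HDF5 layout (per-residue embeddings keyed by sequence ID) are assumptions; the actual implementation in [2] may differ.

```python
# Minimal sketch: a Light Attention head as described in [1], applied to the
# setHard.h5 embeddings. Layer sizes, the weights file name ("la_deeploc.pt"),
# and the HDF5 layout (per-residue embeddings keyed by sequence ID) are
# assumptions, not the exact code from repo [2].
import h5py
import torch
import torch.nn as nn

class LightAttentionSketch(nn.Module):
    def __init__(self, embed_dim=1024, num_classes=10, kernel_size=9):
        super().__init__()
        pad = kernel_size // 2
        self.values = nn.Conv1d(embed_dim, embed_dim, kernel_size, padding=pad)
        self.attention = nn.Conv1d(embed_dim, embed_dim, kernel_size, padding=pad)
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 32), nn.ReLU(), nn.Linear(32, num_classes)
        )

    def forward(self, x, mask):
        # x: [batch, embed_dim, seq_len] per-residue embeddings, mask: [batch, seq_len]
        v = self.values(x)
        a = self.attention(x).masked_fill(~mask.unsqueeze(1), -1e9)
        a = torch.softmax(a, dim=-1)                                     # attention over residues
        attended = (v * a).sum(dim=-1)                                   # attention-weighted sum
        pooled, _ = v.masked_fill(~mask.unsqueeze(1), -1e9).max(dim=-1)  # max pool over residues
        return self.classifier(torch.cat([attended, pooled], dim=-1))

model = LightAttentionSketch().eval()
# model.load_state_dict(torch.load("la_deeploc.pt", map_location="cpu"))  # pre-trained weights

predictions = {}
with h5py.File("setHard.h5", "r") as f, torch.no_grad():
    for seq_id in f:
        emb = torch.tensor(f[seq_id][:]).float().T.unsqueeze(0)  # [1, 1024, seq_len]
        mask = torch.ones(1, emb.shape[-1], dtype=torch.bool)
        predictions[seq_id] = model(emb, mask).argmax(dim=-1).item()
```

The predicted class indices would then be mapped back to the ten DeepLoc localization labels before writing out the results file.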

2. Went through the bio-embeddings repository [3]. Tried to run the notebook code [4], which uses embeddings
and annotations of the DeepLoc dataset to train a supervised classifier that predicts subcellular
localization from the embeddings. A sketch of this supervised step is given below.
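
For reference, a minimal version of that supervised step might look like the following: per-protein DeepLoc embeddings and their annotations feed a scikit-learn classifier. The file names ("deeploc_per_protein.h5", "deeploc_annotations.csv") and the choice of logistic regression are illustrative assumptions, not the exact notebook code from [4].

```python
# Sketch: train a supervised localization classifier from per-protein DeepLoc
# embeddings and annotations. File names, column names, and the classifier
# choice are assumptions for illustration.
import h5py
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

annotations = pd.read_csv("deeploc_annotations.csv")   # assumed columns: identifier, label
labels = dict(zip(annotations["identifier"], annotations["label"]))

X, y = [], []
with h5py.File("deeploc_per_protein.h5", "r") as f:
    for seq_id in f:
        if seq_id in labels:
            X.append(np.array(f[seq_id]))               # fixed-size per-protein embedding
            y.append(labels[seq_id])

X_train, X_test, y_train, y_test = train_test_split(
    np.stack(X), np.array(y), test_size=0.2, random_state=42, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```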

To do next

1. Training the LA model [1] using the DeepLoc embeddings provided in [5]

2. Creating our own embeddings and using them for training and testing (a sketch of a possible embedding pipeline is given below)
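
A possible starting point for generating our own embeddings, assuming the ProtTransT5XLU50Embedder class from bio_embeddings [3] and an input file named setHard.fasta (the input and output file names are illustrative assumptions):

```python
# Sketch: generate per-residue embeddings for our own sequences with
# bio_embeddings [3]. The embedder class, input FASTA, and output file
# name are assumptions for illustration.
import h5py
from Bio import SeqIO
from bio_embeddings.embed import ProtTransT5XLU50Embedder

embedder = ProtTransT5XLU50Embedder()

with h5py.File("our_embeddings.h5", "w") as out:
    for record in SeqIO.parse("setHard.fasta", "fasta"):
        per_residue = embedder.embed(str(record.seq))        # [seq_len, 1024]
        out.create_dataset(record.id, data=per_residue)      # store per-residue embeddings
        # A fixed-size per-protein vector, if needed for the classifier:
        # per_protein = embedder.reduce_per_protein(per_residue)
```

The resulting HDF5 file could then be used in place of the provided embeddings for both the inference and training steps above.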
References

[1] Stärk, H. et al. (2021) ‘Light attention predicts protein location from the language of life’,
Bioinformatics Advances, 1(1). doi:10.1093/bioadv/vbab035.

[2] https://github.com/HannesStark/protein-localization?tab=readme-ov-file

[3] https://github.com/sacdallago/bio_embeddings

[4] https://docs.bioembeddings.com/v0.2.1/notebooks/deeploc_machine_learning.html

[5] https://drive.google.com/drive/folders/1Qsu8uvPuWr7e0sOdjBAsWQW7KvHcSo1y
