Assignment No 1 Part 2

This document describes datasets used to analyze protein-protein interactions between pathogens and their hosts. It discusses: 1. Using positive interaction data from Yersinia pestis and Bacillus anthracis from databases, filtering out records with missing or uncommon amino acids. 2. Encoding the protein interactions into Pseudo amino acid composition representations for machine learning. 3. Constructing negative interaction datasets by randomly selecting protein pairs not in the known interaction datasets. 4. The final datasets have 51 fields including target (positive/negative), pathogen/host IDs, amino acid representations, and are implemented as CSV files in Python for analysis.

Uploaded by Faiza Malik

Computational Analysis of Protein Interactions between Pathogens and Their Hosts

Introduction:

Host-pathogen interactions are important for understanding the mechanism of infection and for
developing better treatment and prevention of infectious diseases. A protein interaction map can
guide research on the key protein-protein interactions (PPIs) that may allow pathogens to adhere
to, colonize, and even invade human cells. Host-pathogen PPI prediction, however, has its challenges.

Dataset description:

We used the positive PPI files for Yersinia pestis and Bacillus anthracis from the PHISTO
database and matched the corresponding sequences from NCBI and UniProt in order to
predict protein interactions.
We took 4040 interactions of Yersinia pestis; many interactions were ignored because their
records have been deleted from databases such as NCBI and UniProt, and data that cannot
contribute anything to the results are considered noise.
Furthermore, interactions whose sequences contain uncommon amino acids were excluded from
our datasets, because amino acids that occur too rarely are also considered data
that cannot contribute much to the results.
Finally, the PPIs were encoded as pseudo amino acid composition (PseAAC), a representation of
protein samples that was introduced to improve protein subcellular localization prediction and
membrane protein type prediction.
The same method was applied to the Bacillus anthracis dataset, whose interaction files also
come from the PHISTO database. The number of positive interactions is 3003.
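
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the datasets: the function names, the id-to-sequence dictionary, and the toy records are all hypothetical, and we assume that "uncommon amino acids" means any letter outside the 20 standard one-letter codes.

```python
# Assumption: "uncommon" means any residue outside the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def has_only_standard_residues(sequence):
    """Return True if the sequence is non-empty and uses only the 20 standard letters."""
    return bool(sequence) and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


def filter_interactions(pairs, sequences):
    """Keep (pathogen_id, host_id) pairs whose sequences both exist and are clean.

    `sequences` is a hypothetical dict mapping a protein id to its sequence.
    """
    kept = []
    for pathogen_id, host_id in pairs:
        pathogen_seq = sequences.get(pathogen_id)
        host_seq = sequences.get(host_id)
        if pathogen_seq is None or host_seq is None:
            continue  # record no longer available in the sequence database
        if has_only_standard_residues(pathogen_seq) and has_only_standard_residues(host_seq):
            kept.append((pathogen_id, host_id))
    return kept


# Toy records, invented for illustration only.
toy_sequences = {"P1": "MKTAY", "P2": "MXXLU", "H1": "ACDEF"}
toy_pairs = [("P1", "H1"), ("P2", "H1"), ("P3", "H1")]
filtered = filter_interactions(toy_pairs, toy_sequences)
```

Here the pair with P2 is dropped because its sequence contains the letters X and U, and the pair with P3 is dropped because no sequence is available for it.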

Negative data preparation:

We construct negative data by randomly selecting protein pairs from all possible
protein pairs except the known interactions, and we label these pairs as negative.
We take positive and negative interactions of equal size, but the size may change after the
experimental results: a previous study selected negative data at ratios of 1:1,
1:2, and 1:3 and found that changing the relative size of the positive and negative data
had only a very minor effect.
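
A minimal sketch of this random negative sampling is given below. The function name, the `ratio` variable, and the toy ids are hypothetical, and enumerating every candidate pair is only practical for small examples; for the full datasets a rejection-sampling loop may be preferable.

```python
import random


def sample_negative_pairs(pathogen_ids, host_ids, positive_pairs, n_negatives, seed=0):
    """Randomly draw pathogen-host pairs that are not known positive interactions."""
    positives = set(positive_pairs)
    # All possible pairs except the known interactions.
    candidates = [
        (p, h) for p in pathogen_ids for h in host_ids if (p, h) not in positives
    ]
    if n_negatives > len(candidates):
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(candidates, n_negatives)


# Toy ids, invented for illustration only.
pathogens = ["P1", "P2", "P3"]
hosts = ["H1", "H2"]
positive = [("P1", "H1"), ("P2", "H2")]

ratio = 1  # negatives per positive, i.e. the 1:1 setting
negative = sample_negative_pairs(pathogens, hosts, positive, ratio * len(positive))

# Label the pairs: 1 for positive, 0 for negative.
labelled = [(p, h, 1) for p, h in positive] + [(p, h, 0) for p, h in negative]
```

Setting `ratio` to 2 or 3 gives the 1:2 and 1:3 settings mentioned above, provided enough candidate pairs exist.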

Our dataset contains the following 51 fields:

1. Target: 1 for positive interactions and 0 for negative interactions.
2. Ids: the pathogen and host ids, from which we can trace the record of a specific sequence.
3. Amino acids: each is represented by its one-letter code combined with the first letter of the
corresponding organism.
For example, in bioinformatics the letter A refers to alanine, and we combine this letter with
the first letter of the organism to which this particular amino acid belongs.
A-H means alanine of humans for that particular record, A-Y means alanine of Yersinia
pestis, and so on.

The format of the list is: amino acid name - 3 letter code - 1 letter code.
alanine - ala - A
arginine - arg - R
asparagine - asn - N
aspartic acid - asp - D
cysteine - cys - C
glutamine - gln - Q
glutamic acid - glu - E
glycine - gly - G
histidine - his - H
isoleucine - ile - I
leucine - leu - L
lysine - lys - K
methionine - met - M
phenylalanine - phe - F
proline - pro - P
serine - ser - S
threonine - thr - T
tryptophan - trp - W
tyrosine - tyr - Y
valine - val - V
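
To illustrate the naming scheme, the sketch below builds columns such as A-H and A-Y from a pair of sequences. It is a simplification under stated assumptions: it computes only the plain amino acid composition (the fraction of each residue), whereas full PseAAC adds sequence-order terms, and it does not reproduce all 51 fields. The function names and toy sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes from the list above


def composition(sequence, suffix):
    """Map each amino acid to its fraction in the sequence, keyed as '<letter>-<suffix>'."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return {f"{aa}-{suffix}": sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS}


def encode_pair(pathogen_seq, host_seq, target):
    """Build one dataset row: the target plus host (H) and Yersinia pestis (Y) columns."""
    row = {"Target": target}
    row.update(composition(host_seq, "H"))
    row.update(composition(pathogen_seq, "Y"))
    return row


# Toy sequences, invented for illustration only.
row = encode_pair(pathogen_seq="MKKA", host_seq="AACD", target=1)
```

In this toy row, A-H is 0.5 because two of the four host residues are alanine, and A-Y is 0.25 because one of the four pathogen residues is alanine.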

DATASET IMPLEMENTATION AS CSV FILE IN PYTHON:

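import pandas as pd

# Load the dataset from its CSV file (file name as given in the original).
dataset = pd.read_csv("/content/golf-dataset1.csv")

# Preview the first five rows.
dataset.head()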
