GENOMANCER

BACKGROUND: Human immunodeficiency virus (HIV) is a retrovirus that causes acquired immunodeficiency syndrome (AIDS), a condition in which the human immune system progressively fails, leading to life-threatening opportunistic infections. HIV is transmitted through blood, semen, vaginal fluid, pre-ejaculate, and breast milk. Within these bodily fluids it is present both as free virus particles and inside infected immune cells. The three major routes of transmission are unprotected sexual intercourse, contaminated needles, and transmission from an infected mother to her baby at birth or through breast milk. HIV infects 40 million people globally and has killed 25 million. Although about 20 different drugs on the market can help control the virus, there is no cure and no vaccine.

PROBLEM STATEMENT:
1) To predict how the gene sequence of HIV changes in a given cellular environment, by calculating the most probable extent to which each nucleotide in the sequence is replaced after a given number of mutations.
2) To determine whether the replacement of a particular nucleotide affects the replacement of other nucleotides and, if so, by how much.

ISSUES IN MEDICAL SCIENCE: The HIV gene sequence undergoes many mutations every 24 hours, because the virus lacks a proofreading (error-correction) mechanism when copying its genome. This high mutation rate acts as a defense mechanism that lets HIV escape the antibodies and drugs designed to attack it. Because the sequence the virus carries at any given moment is not known, medical science has so far been unable to plan an appropriate strategy to fight HIV.

IMPLEMENTATION STRATEGY: We will take an arbitrary gene sequence about 100 nucleotides long, since the sequence present in a particular infection at a given time is not known in advance.
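To make problem statement 1 concrete, here is a minimal sketch that swaps in a technique the proposal does not specify: each mutation is modelled as one step of a first-order Markov chain over the four bases, so the expected base composition after M mutations is the starting composition multiplied by a substitution matrix M times. The matrix `SUBST` and all function names are hypothetical, and its values are illustrative placeholders, not measured HIV substitution rates. The starting sequence is an arbitrary 100-nucleotide string, as the implementation strategy suggests.

```python
import random

BASES = "ACGT"

# HYPOTHETICAL per-mutation substitution matrix: SUBST[i][j] is the assumed
# probability that base BASES[i] becomes BASES[j] in one mutation step.
# Values are illustrative only, not measured HIV rates; each row sums to 1.
SUBST = [
    [0.97, 0.01, 0.01, 0.01],  # from A
    [0.01, 0.97, 0.01, 0.01],  # from C
    [0.02, 0.00, 0.97, 0.01],  # from G (G->A slightly favoured, as an assumption)
    [0.01, 0.01, 0.01, 0.97],  # from T
]


def random_sequence(length=100, seed=0):
    """Arbitrary starting sequence (~100 nt), reproducible via the seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(length))


def composition(seq):
    """Fraction of each base in the sequence, ordered as in BASES."""
    return [seq.count(b) / len(seq) for b in BASES]


def step(dist, matrix):
    """One Markov-chain step: new[j] = sum over i of dist[i] * matrix[i][j]."""
    return [sum(dist[i] * matrix[i][j] for i in range(4)) for j in range(4)]


def composition_after(seq, m, matrix=SUBST):
    """Expected base composition after m mutation steps under the model."""
    dist = composition(seq)
    for _ in range(m):
        dist = step(dist, matrix)
    return dist


if __name__ == "__main__":
    seq = random_sequence()
    for base, frac in zip(BASES, composition_after(seq, 50)):
        print(f"{base}: {frac:.3f}")
```

A per-base matrix like this only says how likely each nucleotide is to be replaced; problem statement 2 (whether one replacement influences others) would need a model over neighbouring bases or motifs, which is what the transcription-factor approach aims at.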
Cells in the human body contain many transcription factors, on the order of 10^3. Different transcription factors are responsible for the replacement of different nucleotide sequences. Using algorithms based on parallel processing and distributed computing, we can report the most probable changes that a nucleotide, or a sequence of nucleotides, undergoes after a given number M of mutations.

Possible Approach:

Input: an arbitrary gene sequence; the cellular environment variables (the transcription factors, each with its percentage contribution in the cell); and the number of mutations M.

Pseudo code:
Each transcription factor is represented by a data structure whose members are its name, its percentage, and the nucleotide or sequence of nucleotides it affects. These structures are stored in an array.
The computation is organised as an n-ary tree (n can be taken up to 100 or 1000; the larger n is, the more accurate the output will be). Each node holds a gene sequence, and each level k below the root corresponds to k mutations.
Step 1: The root node holds the input gene sequence. An equation in the cellular environment variables and the gene sequence will give the most probable changes in the nucleotides and how a change in one nucleotide affects changes in the others. We take the n most probable outcomes, and these become the children of the current node (for the first mutation, the children of the root). The branches of the tree can then be distributed to different computers using appropriate parallel processing algorithms.
Step 2: Each child node now acts as the root of the subtree beneath it, and Step 1 is repeated.
Final step: After M levels (mutations) the computation stops and produces the desired output.

DELIVERABLES:
Output: the affected nucleotides in descending order; the gene sequence (or part of it) most susceptible to mutation; and which nucleotide replaces which, in decreasing order of probability.
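The tree expansion in Steps 1 and 2 can be sketched as follows. This is a minimal illustration under assumptions the proposal does not make: the unspecified "equation" is replaced by a simple rule in which a node's candidate mutations are every occurrence of each factor's target motif, weighted by the factor's percentage and normalised over all candidates at that node; each factor carries a hypothetical `replacement` field saying what its target becomes; and a `ThreadPoolExecutor` stands in for distributing branches to different computers. All class and function names here are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionFactor:
    name: str
    percentage: float  # share of this factor in the cellular environment
    target: str        # nucleotide or motif the factor acts on
    replacement: str   # HYPOTHETICAL: what the target becomes (same length assumed)


@dataclass
class Node:
    sequence: str
    probability: float
    history: tuple = ()  # (position, old, new) substitutions along the path


def candidate_mutations(seq, factors):
    """All (position, factor) pairs where a factor's target occurs in seq."""
    out = []
    for f in factors:
        start = seq.find(f.target)
        while start != -1:
            out.append((start, f))
            start = seq.find(f.target, start + 1)
    return out


def children(node, factors, n):
    """Step 1: the n most probable single-mutation children of a node."""
    cands = candidate_mutations(node.sequence, factors)
    total = sum(f.percentage for _, f in cands)
    if total == 0:
        return []
    kids = []
    for pos, f in cands:
        seq = node.sequence[:pos] + f.replacement + node.sequence[pos + len(f.target):]
        p = node.probability * f.percentage / total
        kids.append(Node(seq, p, node.history + ((pos, f.target, f.replacement),)))
    kids.sort(key=lambda k: k.probability, reverse=True)
    return kids[:n]


def expand(node, factors, n, levels):
    """Step 2: grow the subtree under node for `levels` more mutations; return its leaves."""
    if levels == 0:
        return [node]
    kids = children(node, factors, n)
    if not kids:
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(expand(kid, factors, n, levels - 1))
    return leaves


def run(sequence, factors, n, m):
    """Build the tree to depth m, handing each first-level branch to a worker thread."""
    root = Node(sequence, 1.0)
    if m == 0:
        return [root]
    first = children(root, factors, n)
    if not first:
        return [root]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda k: expand(k, factors, n, m - 1), first))
    return [leaf for part in parts for leaf in part]


def who_replaces_whom(leaves):
    """Deliverable: substitutions ranked by the probability mass of leaves containing them."""
    score = Counter()
    for leaf in leaves:
        for _, old, new in leaf.history:
            score[(old, new)] += leaf.probability
    return score.most_common()


def affected_positions(leaves):
    """Deliverable: sequence positions ranked by the probability mass that mutates them."""
    score = Counter()
    for leaf in leaves:
        for pos, _, _ in leaf.history:
            score[pos] += leaf.probability
    return score.most_common()


if __name__ == "__main__":
    factors = [
        TranscriptionFactor("TF1", 60.0, "G", "A"),
        TranscriptionFactor("TF2", 40.0, "TA", "TG"),
    ]
    leaves = run("GATTACAGG", factors, n=3, m=2)
    for (old, new), score in who_replaces_whom(leaves):
        print(f"{old} -> {new}: {score:.3f}")
```

Because only the n most probable children of each node are kept, leaf probabilities are products of per-step shares and need not sum to 1. The tree also grows quickly: level k holds up to n^k nodes, so n = 100 and M = 10 would already allow up to 10^20 leaves, which is why distributing the branches matters.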

Proposed by:
Vaibhav Saini B.Tech. (ICT), Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India E-mail: vaibhav_saini@daiict.ac.in

Deepak Gupta, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India E-mail: gupta_deepak@daiict.ac.in

Abhishek Jain B.Tech. (ICT), Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India E-mail: abhishek_jain@daiict.ac.in

Faculty Guide: Dr. Sanjay Chaudhary, Associate Professor, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India E-mail: sanjay_chaudhary@daiict.ac.in Tel: (079)-30510560