
Systems Biology Assignment

NCBI
NCBI is the National Center for Biotechnology Information. It was founded in 1988, through legislation sponsored by Senator Claude Pepper, as a division of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH), and is located in Bethesda, Maryland. NCBI is the agency responsible for creating automated systems for storing and analyzing the rapidly growing profusion of genetic and molecular data. One of the most difficult challenges in bioinformatics is how to store, in an easily accessible manner, the overwhelming abundance of new information, including the sequences of entire genomes, the ongoing discoveries of new genes and gene products, and the determinations of their functions and structures. NCBI was established as the government's response to the need for more and better information-processing methods to deal with this challenge. It not only conducts research on biomedical problems at the molecular level using mathematical and computational methods, but also provides a comprehensive website for biologists with numerous free databases, molecular search tools for viewing and analyzing the data in those databases, and extensive support documentation for these resources. NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed and PubMed Central, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.

Basic Research at NCBI


NCBI has a multi-disciplinary research group composed of computer scientists, molecular biologists, mathematicians, biochemists, research physicians, and structural biologists concentrating on basic and applied research in computational molecular biology. These investigators not only make important contributions to basic science but also serve as a wellspring of new methods for applied research activities. Together they are studying fundamental biomedical problems at the molecular level using mathematical and computational methods. These problems include gene organization, sequence analysis, and structure prediction. A sampling of current research projects includes: detection and analysis of gene organization, repeating sequence patterns, protein domains and structural elements, creation of a gene map of the human genome, mathematical modeling of the kinetics of HIV infection, analysis of effects of sequencing errors for database searching, development of new algorithms for database searching and multiple sequence alignment, construction of non-redundant sequence databases, mathematical models for estimation of statistical significance of sequence similarity, and vector models for text retrieval.

NCBI resources are commonly used for the following tasks:
Understanding the genomic organization of genes; mapping a gene
Understanding the exon/intron structure of a gene
Searching for genetic and physical markers
Accessing comprehensive information about a gene, its transcript(s) and protein(s), structure, activity, and location
Studying an integrated genetic/physical/sequence map, with the ability to zoom, using NCBI's Map Viewer
Obtaining genomic sequence, protein function data, gene structure, and associations with diseases

Entrez
Entrez is NCBI's search and retrieval system that provides users with integrated access to sequence, mapping, taxonomy, and structural data. Entrez also provides graphical views of sequences and chromosome maps. A powerful and unique feature of Entrez is the ability to retrieve related sequences, structures, and references. The journal literature is available through PubMed, a Web search interface that provides access to over 11 million journal citations in MEDLINE and contains links to full-text articles at participating publishers' Web sites. Entrez returns search results that can include a combination of many types of data on the query, such as nucleotide sequences, protein sequences, macromolecular structures, and related articles in the literature. Prior to the creation of Entrez, an individual might have to place one query to a nucleotide database to find a nucleotide sequence, submit another query to a structural database to find the published structure of the gene product, and submit a final query to a literature database to find citations for journal articles on the query topic. NCBI recognized the time and effort that could be saved by a tool that could cross-link these databases and integrate all information related to a given query subject into one report. The Entrez Nucleotides database includes sequences from GenBank, RefSeq, and PDB. GenBank is the National Institutes of Health (NIH) genetic sequence database. GenBank, the DNA DataBank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) comprise the International Nucleotide Sequence Database Collaboration. These three organizations exchange data on a daily basis. The number of bases in the Entrez Nucleotides database currently grows at an exponential rate.
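The cross-database search described above can also be driven programmatically through NCBI's E-utilities web service. The following is a minimal sketch, not part of the original assignment: it only builds ESearch request URLs for several Entrez databases and does not contact the network. The helper name build_esearch_url and the query term are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL asking one Entrez database for matching record IDs.

    build_esearch_url is a hypothetical helper written for this example.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The same query is sent to a sequence, a protein, a structure and a
    # literature database, mirroring what Entrez integrates in one report.
    for db in ("nuccore", "protein", "structure", "pubmed"):
        print(build_esearch_url(db, "Parkinson disease"))
```

Each printed URL can be opened in a browser or fetched with any HTTP client; the response lists the record identifiers that match the query in that database.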

Database
A database is a structured collection of data. The data is typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies). The term database is correctly applied to the data and their supporting data structures, and not to the database management system (DBMS). A database together with its DBMS is called a database system.
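The hotel example can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the table layout, the hotel names, and the helper functions make_demo_db and hotels_with_vacancies are all invented for this example. It uses SQLite (through Python's standard sqlite3 module) as the DBMS and an in-memory table as the database.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database modelling room availability (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotels ("
        "name TEXT PRIMARY KEY, total_rooms INTEGER, booked_rooms INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hotels VALUES (?, ?, ?)",
        [("Harbor Inn", 40, 40), ("Lakeview", 25, 19), ("City Lodge", 60, 58)],
    )
    conn.commit()
    return conn


def hotels_with_vacancies(conn: sqlite3.Connection) -> list:
    """Return (name, free_rooms) for every hotel that still has a free room."""
    return conn.execute(
        "SELECT name, total_rooms - booked_rooms AS free_rooms "
        "FROM hotels WHERE booked_rooms < total_rooms ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    for name, free_rooms in hotels_with_vacancies(make_demo_db()):
        print(name, free_rooms)
```

Here the table is the database (the data and its structure), SQLite is the DBMS that stores and queries it, and the two together form a database system.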

The Entrez Databases


The Entrez system comprises 40 molecular and literature databases. New databases are added as biomedical science advances and new kinds of data become available. An alphabetical list of the current databases with a brief description of each is given below.

BioSystems
The BioSystems database collects information on interacting sets of biomolecules involved in metabolic and signaling pathways, disease states, and other biological processes. BioSystems currently contains biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the EcoCyc (Escherichia coli K-12 MG1655) subset of the BioCyc databases, and is designed to accommodate other data in the future. BioSystems records link to related literature, genes, protein sequences, structures, chemical data, and related BioSystems. When available, each record links to detailed diagrams and annotations for individual pathways on the Web sites of the source databases.

Bookshelf
The NCBI Bookshelf contains a collection of full-text books that can be searched online and that are linked to PubMed records through research paper citations within the text. The collection includes biomedical textbooks, other scientific titles, the NCBI News, and NCBI help manuals.

Conserved Domains
Conserved Domains is a database of protein domains conserved in molecular evolution, represented by sequence alignments and profiles. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database. The source databases for Conserved Domains are Pfam, SMART, and COG.

dbGaP
dbGaP (Database of Genotypes and Phenotypes) provides the results of studies that have investigated the interaction of genotype and phenotype including genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.

dbVar
dbVar (Database of Genomic Structural Variation) contains information about large-scale genomic variation, including large insertions, deletions, translocations and inversions. dbVar also provides associations of defined variants with phenotype information.

Epigenomics
The Epigenomics database contains results of genome-wide studies on modifications of chromatin (histone modification, DNA methylation, DNase footprinting) in various cell types that assay programmable changes that affect gene expression (epigenetics). Data from these studies may be displayed graphically on the genome sequence using the NCBI graphical sequence viewer.

EST
The EST database contains sequence records from the bulk EST (Expressed Sequence Tag) division of GenBank. These are typically short single-pass reads from cDNA libraries, often generated as part of large survey projects. Data from EST can be used to catalog expressed genes for a particular organ, tissue or cell type, or generally for a species, and to compare expression levels of genes in various library sources.

Gene
Gene is a searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information in Gene records includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.

Genome
The Genome database contains sequence and map data from the whole genomes of over 1000 species or strains. The genomes represent both completely sequenced genomes and those with sequencing in-progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.

Genome Project
Genome Projects collects information on complete and in-progress large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. The database is organized as a set of organism-specific overviews that allow browsing and retrieving specific projects for that organism.

GEO Datasets
GEO Datasets stores curated gene expression and molecular abundance data sets assembled by NCBI from the Gene Expression Omnibus (GEO) repository of microarray data.

GEO Profiles
GEO Profiles is a database that stores individual gene expression and molecular abundance profiles assembled from the Gene Expression Omnibus (GEO) repository of microarray data.

GSS
The GSS database contains sequence records from the bulk GSS (Genome Survey Sequence) division of GenBank. These are the genomic equivalent of EST records; short single pass reads from gDNA libraries. Insert end and other reads from BAC and other large insert genomic libraries used to identify and assemble candidates for genome sequencing are common examples of GSS records.

HomoloGene
The HomoloGene database contains automatically generated sets of homologous genes and their corresponding mRNA, genomic, and protein sequence data from selected eukaryotic organisms. Potential homologs from other organisms are included through sequence similarity to UniGene clusters.

MeSH
MeSH (Medical Subject Headings) is the National Library of Medicine's controlled vocabulary and classification system (ontology) used for indexing articles in PubMed. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Searches in the Entrez MeSH database provide synonymous MeSH terms that can provide more useful results in PubMed. MeSH database records show subheadings and give access to the MeSH browser, which shows related concepts and hierarchical relationships among MeSH terms.

NCBI Web Site Search


NCBI Site Search is a database of static NCBI web pages, documentation, and online tools. Searching this database is a quick way to find specialized online sequence analysis tools, back issues of newsletters, legacy resource description pages, sample code, and other miscellaneous resources.

NLM Catalog
The NLM Catalog contains records for books, journals, audiovisuals, computer software, electronic resources, and other materials in the National Library of Medicine (NLM) collections. The old Journals database was merged into the NLM Catalog database, and the information once retrieved via Journals is now provided by the NLM Catalog. This includes data such as journal title, MEDLINE abbreviation, NLM ID, ISO abbreviation, and ISSN.

Nucleotide
Apart from sequence data in the EST (Expressed Sequence Tag) and GSS (Genome Survey Sequence) divisions of GenBank, the Nucleotide database contains all the sequence data from GenBank, EMBL, and DDBJ, the members of the International Nucleotide Sequence Databases Collaboration (INSDC). Nucleotide also includes NCBI-curated Reference Sequences (RefSeqs), submitted assemblies and annotations from the Third Party Annotation (TPA) database, and nucleotide sequences extracted from structure records from the Protein Data Bank (PDB).
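Records in the Nucleotide database can be retrieved as FASTA text through the E-utilities EFetch service, optionally restricted to a coordinate range. The sketch below only builds the request URL and computes the expected length of the region; it makes no network call. The helper names are hypothetical, and the accession and coordinates in the usage section are example values assumed for illustration, to be replaced with the record of interest.

```python
from typing import Optional
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_length(start: int, stop: int) -> int:
    """Length in bases of a 1-based, inclusive coordinate range."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    return stop - start + 1


def build_efetch_fasta_url(seq_id: str,
                           start: Optional[int] = None,
                           stop: Optional[int] = None) -> str:
    """Return an EFetch URL for a nucleotide record (or a sub-range) in FASTA format.

    build_efetch_fasta_url is a hypothetical helper written for this example.
    """
    params = {"db": "nuccore", "id": seq_id, "rettype": "fasta", "retmode": "text"}
    if start is not None and stop is not None:
        region_length(start, stop)  # validates the range
        params["seq_start"] = start
        params["seq_stop"] = stop
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example values only: an assumed chromosome accession and a coordinate range.
    print(build_efetch_fasta_url("NC_000001.10", 20959948, 20978004))
    print(region_length(20959948, 20978004), "bases")
```

Because the coordinates are 1-based and inclusive, the length of a region is stop minus start plus one.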

OMIA
OMIA (Online Mendelian Inheritance in Animals) is a database of genes, inherited disorders and traits in animal species (other than human and mouse). The database contains textual information and references, as well as links to relevant records from OMIM, PubMed, and Gene.

OMIM
The OMIM (Online Mendelian Inheritance in Man) database contains review articles on human genes, genetic disorders, and other inherited traits. OMIM articles provide links to associated literature references, sequence records, maps, and related databases.

PopSet
The PopSet database contains related nucleotide sequences that originate from comparative studies: phylogenetic, population, environmental (ecosystem), and mutational. Each record in the database is a set of nucleotide sequences representing the same molecule from the same species (population, mutation), different identifiable species (phylogenetic), or anonymous species from the same biological community (ecosystem).

Probe
Probe is a database of nucleic acid reagents designed for use in a wide variety of biomedical research applications including genotyping, gene expression studies, SNP discovery, genome mapping, and gene silencing. Probe records contain information on reagent distributors, probe effectiveness, and computed sequence similarities.

Protein
The Protein database contains amino acid sequences created from the translations of coding regions provided on nucleotide records in GenBank, EMBL, and DDBJ, the members of the International Nucleotide Sequence Databases Collaboration (INSDC), as well as those from coding regions on NCBI Reference Sequences and Third Party Annotation (TPA) database records. Protein records are also imported from the outside protein-only data sources Protein Information Resource (PIR), SWISS-PROT, and Protein Research Foundation (PRF). Protein sequences are also extracted from structure records from the Protein Data Bank (PDB).

Protein Clusters
Protein Clusters is a collection of related protein sequences (clusters) consisting of Reference Sequence proteins that are encoded by complete prokaryotic genomes, as well as those encoded by eukaryotic organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.

PubChem BioAssay
PubChem BioAssay is a database that contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure.

PubChem Compound
The PubChem Compound database contains unique, validated chemical structures (small molecules) that can be searched using names, synonyms or keywords. The compound records may link to more than one PubChem Substance record if different depositors supplied the same structure. Structures in PubChem Compounds are preclustered and cross-referenced by identity and similarity groups. Additionally, calculated properties and descriptors are available for searching and filtering of chemical structures. Compound records are linked to related PubChem Substance Records, PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay.

PubChem Substance
The PubChem Substance database contains information on chemical substances including mixtures electronically submitted to PubChem by depositors. This includes any chemical structure information submitted, as well as chemical names, comments, and links to the depositor's web site.

PubMed
PubMed is a database of citations and abstracts for biomedical literature from MEDLINE and additional life science journals. Links are provided when full-text versions of the articles are available through PubMed Central or other websites.

PubMed Central
PubMed Central (PMC) is the U.S. National Library of Medicine's digital archive of life sciences journal literature. PMC contains full-text manuscripts deposited by authors or articles provided by the publisher.

SNP
The SNP (Single Nucleotide Polymorphism) database is a central repository for single nucleotide polymorphisms, microsatellites, and small-scale insertions and deletions. Both submitted SNPs and NCBI-produced non-redundant reference records (RefSNPs) that cluster reports of the same polymorphism from different sources are available. SNP also contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral polymorphisms and clinical mutations.

SRA
The SRA (Sequence Read Archive) contains sequencing data from the next generation sequencing platforms. SRA accepts and presents data from all current next-generation sequencing platforms including 454 (Roche), Illumina, SOLiD (Applied Biosystems), HeliScope, and Complete Genomics. Data can include sequence, quality scores, color values, and intensity graphs depending on the platform involved.

Structure
The Structure or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). Structure records link to bibliographic information, the sequence databases, and to the NCBI taxonomy. Cn3D, the NCBI 3D structure viewer, allows for easy interactive visualization of molecular structures from Entrez.

Taxonomy
The Taxonomy database contains the names and phylogenetic lineages of the more than 160,000 organisms that have molecular data in the NCBI databases. New taxa are added to the Taxonomy database as data are deposited for them. The taxonomy records include links to all molecular data for the organism or group as well as links to outside classification resources. The taxonomy provides the major controlled vocabulary for classifying molecular data across the Entrez system.

UniGene
UniGene is a database that provides automatically generated nonredundant sets (clusters) of transcript sequences, each cluster representing a distinct transcription locus (gene or expressed pseudogene). UniGene clusters also provide information on protein similarities, gene expression, cDNA clone reagents, and genomic location.

UniSTS
UniSTS is a comprehensive database of sequence tagged sites (STSs) derived from STS-based maps and other experiments. STSs are defined by PCR primer pairs and are associated with additional information, such as genomic position, genes, and sequences.

Parkinson's disease
Parkinson's disease (PD) most often develops after age 50. It is one of the most common nervous system disorders of the elderly, although it sometimes occurs in younger adults. It affects both men and women. In some cases, Parkinson's disease runs in families. When a young person is affected, it is usually because of a form of the disease that runs in families.

Nerve cells use a brain chemical called dopamine to help control muscle movement. Parkinson's disease occurs when the nerve cells in the brain that make dopamine are slowly destroyed. Without dopamine, the nerve cells in that part of the brain cannot properly send messages. This leads to the loss of muscle function. The damage gets worse with time. Exactly why these brain cells waste away is unknown.

The brain cells and nerves affected in PD normally help to produce smooth, coordinated movements of muscles. Therefore, three common Parkinson's symptoms that gradually develop are:

Slowness of movement (bradykinesia). For example, it may become more of an effort to walk or to get up out of a chair. When this first develops you may mistake it as just 'getting on in years'. The diagnosis of PD may not become apparent unless other symptoms occur. In time, a typical walking pattern often develops. This is a 'shuffling' walk with some difficulty in starting, stopping, and turning easily.

Stiffness of muscles (rigidity). Muscles may feel more tense. Also, your arms do not tend to swing as much when you walk.

Shaking (tremor). This is common, but does not always occur. It typically affects the fingers, thumbs, hands, and arms, but can affect other parts of the body. It is most noticeable when you are resting. It may become worse when you are anxious or emotional. It tends to become less when you use your hand to do something such as picking up an object.

The symptoms tend to become worse slowly. However, the speed at which symptoms become worse varies from person to person.
It may take several years before they become bad enough to have much effect on your life. At first, one side of your body may be more affected than the other.

Most people with Parkinson's disease have idiopathic Parkinson's disease (having no specific known cause). A small proportion of cases, however, can be attributed to known genetic factors. Other factors have been associated with the risk of developing PD, but no causal relationships have been proven. PD traditionally has been considered a non-genetic disorder; however, around 15% of individuals with PD have a first-degree relative who has the disease.[2] At least 5% of people are now known to have forms of the disease that occur because of a mutation of one of several specific genes.[22]

Mutations in specific genes have been conclusively shown to cause PD. These genes code for alpha-synuclein (SNCA), parkin (PRKN), leucine-rich repeat kinase 2 (LRRK2 or dardarin), PTEN-induced putative kinase 1 (PINK1), DJ-1 and ATP13A2. In most cases, people with these mutations will develop PD. With the exception of LRRK2, however, they account for only a small minority of cases of PD.[4] The most extensively studied PD-related genes are SNCA and LRRK2. Mutations in genes including SNCA, LRRK2 and glucocerebrosidase (GBA) have been found to be risk factors for sporadic PD. The role of the SNCA gene is important in PD because the alpha-synuclein protein is the main component of Lewy bodies.

The LRRK2 gene (PARK8) encodes a protein called dardarin. The name dardarin was taken from a Basque word for tremor, because this gene was first identified in families from England and the north of Spain. Mutations in LRRK2 are the most common known cause of familial and sporadic PD, accounting for approximately 5% of individuals with a family history of the disease and 3% of sporadic cases. Many different mutations have been described in LRRK2; however, unequivocal proof of causation exists for only a small number.

Availability of literature: A search of OMIM and PubMed showed that a lot of research has been done on Parkinson's disease: about 258 research articles were found in OMIM and 130 research articles in PubMed.

PINK1 PTEN induced putative kinase 1 [ Homo sapiens ]


This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. The gene is located on chromosome 1.

>gi|224589800:20959948-20978004 Homo sapiens chromosome 1, GRCh37.p10 Primary Assembly
CGCAGAGGCACCGCCCCAAGTTTGTTGTGACCGGCGGGGGACGCCGGTGGTGGCGGCAGCGGCGGCTGCG GGGGCACCGGGCCGCGGCGCCACCATGGCGGTGCGACAGGCGCTGGGCCGCGGCCTGCAGCTGGGTCGAG CGCTGCTGCTGCGCTTCACGGGCAAGCCCGGCCGGGCCTACGGCTTGGGGCGGCCGGGCCCGGCGGCGGG CTGTGTCCGCGGGGAGCGTCCAGGCTGGGCCGCAGGACCGGGCGCGGAGCCTCGCAGGGTCGGGCTCGGG CTCCCTAACCGTCTCCGCTTCTTCCGCCAGTCGGTGGCCGGGCTGGCGGCGCGGTTGCAGCGGCAGTTCG TGGTGCGGGCCTGGGGCTGCGCGGGCCCTTGCGGCCGGGCAGTCTTTCTGGCCTTCGGGCTAGGGCTGGG CCTCATCGAGGAAAAACAGGCGGAGAGCCGGCGGGCGGTCTCGGCCTGTCAGGAGATCCAGGTGAGCGGG GCCGGGTCCTAAGCCGAGCGGAGGACGGAGCTAAGCGCGGGGGCGGGTCCTCAGCTGGGTGGGGGCGGGG CTAGGTGTGGAGGCGGGGCTCTGAGCAGATCGAGGGCCGAGGCGAGGGTCCTTAAAGCTCATCTATTTCA CCATTACTGATCGGCTGCTATAAATAAAGCCAGCACCTCCCATTTGTTTTAATGTTTCCCTTCCTCAAAT GAAGACATGTTGCCGATTACAGCTCCTGTCGCAGCACAGCAAAAGGCTTTGTGTAAATTTTCTAAAATGT ACGGACAACTAAATCATAACATTCCTATCCCTTTGAGGTAGTTGCCGTCCCTAATTTATGGAGAAGGAAA GTCCTCAGGTGAAGGGACTTGCTCGAAGTCACACAGCTAATAAAATGCAGTGCCCTTAACCACTGAGCCA GGCTGCCTCCGCCGTTTAACCAAAGGATTAGTAGTGACAGAGCTGAAACCGCAGTAAAAACTATGAACGG CGAGAAAAACAGTCCTAACATTTTAGTTACCTGTGTAGAGTTATCCCTGCTGACTGGATACACAAGGGTT CTTAGGGTTTTTTAATGCTTAAAATAGCACAAGACTTCTCTTTTTGCCCAACCAAAGTCTGTATTAGGGT TCTCCAAATGGAATCAATAGGATGTGTCTATATAGAGGCAGGTTTATTTTGAGGACCTGGCTCCCTATGG GGATTGGCAAAGTCTTAAAAGCCGCAGAGTAGGCGGGCAGGCTGGAGAGCCAGGGAGGAGCCAGCGGTGC AGTTCAGGTTTGAAGCCTGGCCGCTGGCAGAATTCCATCTTCTTCCAGGGAGGTCACTCTTCTTCCTGGG GGTGTAGAACCACTCACTAAATTAGCATAGCCCCTGCATTTTACAGGTTAGGGCTGAGGTGGTGGAAGAG GGAGTGACTTGCCCAAGGACACAGCTGTTAGGGCCAAGCAGTGGCTCCTGGGTTTCCTGGTTTCCATCCC AGTTCTTATTGCTCATCACCACTGTCTCATGTTTGAGCTCTGGCCAGTTTGGGGTGACAGGTGACATCTG GCCTAGTCCCCAGCCCCTGACCTTGTCTTTTGCCACAGCTTAACTGGCAGAAGCTAAGGATGGGAAATTT GACTAATCCTGCTTAAAACTAAAGAGGCTTTTTTAACTGAGGAGATTGATCCTCCTAAACTTACCATTCA CACACACCTTCTTCGCACACTTCACCCTCCTATGCCTGAAAATGTTATTAGTTATCAATTAATTTCACAT TAAAAAAATTTTTTTTGGTCAGGCACTGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCG GGCGGATCACGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGGAACCCTGTCTCTACTAAAAAT ACAAAAATTACCCGGGCGCAGTGGTGGGCGCCTGTAATCCCAGCTACTTGGGAGACTGAGGCAGGAGAAT 
TCCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCTGAGATCACGCCACTGCACTCCAGCCTGGGTGACAG AGCAAGACTCTGACTGGGGGGAGAAAAAGTTGTTTTCACTGGCTACTTTTGCTGGAATTAATTTCACATT TAAAAAATTCTGGGCCGGGCGGGGTGGCTCATGCCTGTAATCTCAGCACTTTGGGAGGCCAAGGCGGGCG GATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGCCTCTACTAAAAATACAA AAAAATTAGCCAGGCATGGTGGCAGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGG CGTGAACCCGGGAGGCAGGGCTTGCAGTGAGCCAAGATCGCGCCACTGCACTCCAGCCTAGAGGACAGAG TGAGACTCCGTCTCAAAAAAAATAAAAAATAAAAAAATAAAAAATTCTGAAGCCAGGCATGGTGACTCAT GCCTATAATCCTGGTGCTTTGGGAGGCCAAGGTGAGAGAATCTCTTGAGCCCAGGAGTTGAAGACCAGCT GGGACAACATAGTGAGACCTTGTTCCCACAAAATATTAAAAAGTTAGCTAGGCATGGTGGCACATGCCGA

TAGTCCCAGCTACTTAGGAGGCTGAGGTGGGTGGATTGCTTGAGCCCAGGAGTTTGAGGCTGCAGTGAGC TGTAGTTGCAACACTGCACCCCATCCTGGGCAACTAGCAGGAGTGTGCTAGTAGCAAGCTCTAAAAATTT ATATTATAAAAATATATATATAATGTGTATATATATGTATATGTGTATATATGTTGTGTATACTATAATA TGTATGTTCCATATAATAGTATATAGTATATTTTATATATATACACACACACGTTGCTTCTGCTCCATGA AGTTTATAAGGGAAGTTGGATTTTCCTTTTTTTTTTTTGATACGGAGTCTCACACCGTCGCCCAGGCTGT GCAATGGCACGATCTCAGCTCACTGCAACCTCCACCTCCCAGATTCACGCAATTCTTCTGCCTCAGCCTC CTGAGTAGCTGGGATTACATGTGCACACCACCACACCCGGCTAATTTTTTGTATTTTTTAGTAGAGACGG GGTTTCACTATGTTGGCCAGGCTGGTCTTGAACTCCTGACTTCATGATCCGCCTCCCTCGGCCTCCCAAA GTGCTGGGATTACAGGCATGAGCCACCACGCCCGGCCTGGACTTTCATTTTTTTAAAAGGTTCAAAGTTA TAGTGCAAGGTTAAGTTGCTTACAGATAGCAGGTGCTTTAGGAACCCTCTAGAAGGAAAGTTGCCATGGG CTGGGTCATCAGGGCTGGTCACAGAGGAAGGAGGAGCTTGAGTTGGGCTTGATGGCTGAGCAGGGATGAG CCAGACAGAGCAATACTATAGAAACAGGGGCTGGAAGGATGCTGTGAGCAGCTTTCAGAGGGAGGCGTCC AAAGGATGAGTGCAGGCAGCGTGCAGACCAGCCTGGCAGCAGCCAGAACACAGATAACCTCCTGGGCAGT CTGATGGACAGCCGAGTGACACATGAAAGCAACATATTTTGATGTGGCTCAGAGCAGGGAAGGCACGGCA CTGAGAGTGGGGTAGTCTAGAGATGGTGCCTGGTAGGCTATGGCCAAGGGGCATATGCAGAGCTGTGCCT GGCACCCAGGGGTGACTAACCTTGGAAAGGAAGAGCTCAGCAGATCAGACCATTTGAGAGGGAAAGCGGA GCCTGGGTGACTGGAAGAGTGACGTCACCGCTGCCTGAAGCAGGGGAAGACCCACCAGATTTTTGTCACA TGGCCTCTTTCCCAGTGCAGCACTGCCCCCAGCACCTGCACCTCCATCTATTTCTGTCTGAACTCTTGTA CTCTTGCTACCTTTCTGGACAGAGTAGCCTCATCGGGGCCCTAAGAGGCAGAGAGAGAAAGACAGTACTG AATCCCTCTGCTCTGGGGATGCGCCTGATGTCACCCACTTGCCTGAGAGACCCTCTGAGATCACTGTGGA TCACACACTCCTGCTAGTTGCCCAGGAGCCGTCAGCCAAGGTCTTTGCAGAGCGAGCTGTCTCCATAATC AGACACCTCCAGAACTCTGCTGGAGAGTGAAGGCAGCAAGGAGAGGTGCACTGGCTGCGGGACGTGGGGT TTCTGACCTCTCAGATCATTGAGTATTGTGATCCCAGTGAAGCAACAGAGTTTGAGAAAAATCATTCTGA ATAATGAGAAAGAAGATGGCCTGGACCCAGGCTGAGCAGTAGAACCTGGTTGGGTTGTGTTTTTCTGGTT TATTGATCTGGTCGACGTGGACCACGCCTTGCTGCACCTCTCTCTGCCTCCCCTGTTTCCCTTTTCTTGG GCCTTCCTAGGCTCCCTGGCTCACGGTGCATTCTTTTCTCATCACAGGCAATTTTTACCCAGAAAAGCAA GCCGGGGCCTGACCCGTTGGACACGAGACGCTTGCAGGGCTTTCGGCTGGAGGAGTATCTGATAGGGCAG TCCATTGGTAAGGGCTGCAGTGCTGCTGTGTATGAAGCCACCATGCCTACATTGCCCCAGAACCTGGAGG 
TGACAAAGAGCACCGGGTTGCTTCCAGGGAGAGGCCCAGGTACCAGTGCACCAGGAGAAGGGCAGGAGCG AGCTCCGGGGGCCCCTGCCTTCCCCTTGGCCATCAAGATGATGTGGAACATCTCGGTAAGCACCAGGCCT TTCATCTTTAAAGGAGATGTTCTCAAAATGCCCATCTTAGTGGGCCTGGTGAGGATTTTTTCCAGGAAGT AGGTAGGAAAAGACAGATTATCCACAGGAAAGGTGCCTGTATAGTCAGGTTACCTCCCCCTGTTACACAG AAGTCTAAACCAGAATGTTTGAAATTGTATTCTGCCCACAGATCACCCAGAGATCAGGTTCAAAGGTAGA TTTTGGCCAGGCCTGGTGGACCATGCCTGTAATCCCAGCACTTTGGGAGGCCAATCACTTGAGCTCAGGA GTTTGAGACCAGCCTGGGCAACATGATGAAACCCCATCTCTATGAAAAATACAAAAATTAGCCGGGTGTG ACAGCTTGCACCTGTATGTAGTCTTAGCTACTTGGGTGGCTGGTGGCTGAAGTGGGAGGATCACTTGAGT GCAAGAGGCAGAGGCTGCAGTGATCCAAGATTGTGGCACTGCACTCCAGCCTGGGTAACAGAGTAAGACC CTGTCTCAAAACAAAACAAACAAACAAAAAAAACCCAAAGGTAGATTCTGATTCAGCAGGTCAGGGATGG GGCCTAAGATTGTGCCTTCCCAATAAGCTCCCACGTGATACCAATGCCACTGCTCTGTCAACCACACTTT GAGTAACAAGTGCCTAAACCACTGATTTCTACCTTGATTGGTTAGTTCACCATCCCCTCCTTCCTGCAAC CTGCCTCACTGGAGAGGGGGAACTGGATCTCTTTCCTTTAAGCTCACTCTGCAAGCCCACCTCCAGCTCA GAGGCTTGCATGTCAACTCCCTACTCTCAACTCTTGGAAGGACTCAAGACTTTGGGAAATTGAAGAGTAT TTTTCCCCCACGAAAACAATTTCTGTCCACTTAATTTATTTACCTTGCCATTGCTAGGTAGGTTTTAAAA ATGTATTAAGACCCGATGCTCTGTCTCCTGCCCATCCCTGGGCCCATCCATTCAAGGCCGCAGTGGGAAG TGGCCACAGCCGTTCATGGTGGGCCCTAGAGGATGCAGTTTCTTGTTCTGATTTGGGGAATGAAGGTTAT TAGACTGTTGCACTCCAAGCTGGGTTAATAGCTAACAAGTAGCCAAATCCTTTGCCTGTGAAACATCCAG TCTTAGCATCAAAACATGCCCTGGTTGGGGCATGACAGCTGTTGCATTCACAGCAGCTACAAGATTGTTG ATTCTTACCAGATCTTCTTGATACAGGTTTCATATGTCCTGAGAGCCCTTCCCAATACAATGAGGACAAT AAGGTGCTTTCCCCACTCTTTGGTTTTTGTCTGTTTGTCGTGGGGGATGGAGGAAGGCTGTTCTGTGATT CAGTACAACTTCACCCAGGGTGGAGCATGTCAGGCTCTGGCCTGGCTGCTGAAGGGAGCCTGCAGAATGA GTTAGTGATCCAGTGGTCAATTTTGAACTGGTGGGGACCAGAGGCACCGATGGCAGGAAGCAGCACCCCA TATCCTGATCACCTTGGCATCTCCTCCAGCCCTGGCATCTAGGCTGCAAGAGTTTGAGGAGTGTGAAGAA TCCTCTGAGTTGGCATGGATGGTACCTCTGTCTGCCTCCCAGGAGTAACTAGTCTCAGCCTGCCAGTTAA GACAGGTCATCTTATCTCGAAGGTCAGAGCCAATTCTAGGCAGTAGCTGCCCTGCTCCAGGTTACAGGCA GGGCTTACAAGGAACTTACCATTCTGCTCCGGCCTGTGTAACCCTGGGTTCCTTGTGGGTGTTCCAGGCA GGTTCCTCCAGCGAAGCCATCTTGAACACAATGAGCCAGGAGCTGGTCCCAGCGAGCCGAGTGGCCTTGG 
CTGGGGAGTATGGAGCAGTCACTTACAGGTAAGTGCCCTCTGCCTGCCAGACTGACTGGGACTTCTTTGA GAGCAACTTCATCCATCACTTATGTCCTCAGCACCTGGTACAGTGTCTGATATGACAGTAGATAATAAAG

GCTTAATGTTGGTGATGGATTTTCAGTTAGTGGATAATTTCACTTGGGAAAGATTGCAGGTAATCTGACC CCAAATGATGATGCACTTGCGTAATTCACATTGGAGCAGGGGAGAGGAGGCCCCCCAAAAATGCCCAGTT CACAGTGTTGCATGATTGACTGGGGTTCTCAGATTCCTCCTAAGAAATGCACGGGTAGAGCGCCACCTAT CGGAATAAACTGAACTCTGTCCCCACCAGAGGGAACACTCATTTCACTAATAGTTATGTGCACTGATGGT GCCAAGAGATTTTAAAAAAAACAAAAAGTGGTCTGTTGGCCCAGAGGTCTCAGTGTGGCAGCAGAGAGCC CAGGGTGTAAGTGCTAGAATCGAGTATGCTTGGGCTGAGGGAGCCCAGGGGAGGCGTGTGCTGCAGAGGA GGGGCTGCTCAGAAAGCCTTCTCAAAGGGACAGTTTGTGCCACCTTGTGAAGGATGAATTGGCATTACTT GGGCAGAGGAGTAGGGAAGTGGCATTCCCAGCAGAGGGAAGAGCAAGTGAACAGCAGCTACGCTTTCAGC TCAGAAAAGCCCATGTCCACCACCAGCCTCACCACAGGTGGTGAGGACTGACGCGCAGGCTATGACAGAG GAGATACTTGTGCCCACCTCAATATTTACAGCCTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCTG AGGTAGGCAGATCACTTGAGCCCAGGAGTTCAAGAGCAGCCTGGGCAACATGGTGAAACCCCGCCTTTAC AAAAAATAAATTTTAGCCGGGCATGGGGGCAAATGCCTGTGATCCCAGCTACTCAGGAGGCTGAGGCGGG AGGATCATTTGAGTCCAGGGAGGTTGAGGCTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGCCTGGG CGACCCTGGAGTGAGACCCTGTCTCTCATTTGTAGACCCACCAAGAAGAGGTGGGTCTGCTGCATTTTTG GTGATTTACTGGAGGGCTGGTTTATAGATCTTCATTCTGTCAGAAGCAGGGAGGCAAAAGTATGAAATTA AGGGAGTGGTTGTGGAAAACCCCTTCCATGGTTTGGAGGTTTCCAATGTGACTGGGAGTCCCTGCAGGCC GGTGGAGGTAGCTGCTCAAGTGGCTGCTGCTTCTCCTGAGGCCTTTTTGGAGAAAGTGGACACCTGAATG TCAGCTGCTTTGGGGCTAACATGATCCTTGATGCCTCCTTTTGTGGCATGAGTGGCAGCCGGCCGACGTG GTGCTGTCCTGCTGCCGGAGCACCATGATGTCTGCTGCTGAGACCTCCCATCTGACATAGTCCCTGTCCC TCTTCAGGGACTTTGTTCCTTTAGCAGTTCTCACTGTCTGGCCTCAAATAACACGTCTTTTTCTGTTGGT TCCTTTTTGCTCCAGCTGTACTGTAAACACTCTTTGTTTACTGCATACCCTCAGTAAATACCTGTTGAAT GACCGGATAGATGTGGCAGCAGGTACATTACCTCAACTTCTGGTACACCTTGGGAGAAGCTTTCCCAAGA ACACCCTGGGTTCATTTCCTCCTAGCCTCTGCTCTCCTGGGGCCCAGAGATTGAAGGCGCTTAACCTGCT CATCTCACCACGTCTCCCGCCTTATCTCTCACCCTTCTCATCAGCACCCTACACTCCACCACGCTGGCTT CATAGCTCCTTCCCCGAATGTGCCAGGCTTCTCTGTGACCATCTCTTGAGCGTACAGCTGGCTATACCTG GGCTGCCCTCCTCCCACATTTCAGGTCTCAGTTCAGACACCCGCAACCCCACCATGTATCTCCCCAGTAC AGCCATACTCCTCTGCCCCTGAGCTCCCATGACACCTGCTGGGGCCCTGACAGCCTGGGGCTGTGATCAT GACTTGCCCAGGGGCCCGAGGGTGGAAACGATGCTCTGGCTCCTTTGATTGCATAGAACAGGGGCCACTC 
AGGTTGACTCAAGAGCAGGAGCAGCGCGTGGGCACACGTGGACTGCAGCCACACAGCCTGGGGACCATGC AGTGCTGGGAGAGGCCGGTGCCCTGCTCTCTCCCCAGCACCGTCTAGGCTCTGCCCCATTCGCTTCCCTC CACCATTGTTATGCAGCAAAGGGGGCTCTAGCCTGATGTGCTAGAAGCAGTCACACTGGATTTTTGAGAA AAGCCAAGCTTTCTATTGTGAGTCGACTCATATGGAGACAGGAGTTGAATTCAACCCTGTCTCCCTGTGC TAGCTTTAAGGCGGTAATTTTATTAGAGGAGGTTTAAGGGGTGGATTCTAAGATTAGCAGGTGATTGATG GAAGGAAAGGAGAAGTCTGGACAGTCCTTGGACATGCACAGTTATCTGTTCATGCCAACTCATGGGTCCC CTGTGCAGATTTGGGAGGAGTGAGTATGAAACGTGCAGTGGCAATTCAGGCTTTGACATCAGCAAACTTG TTCTGTGCAAGCTGCAATTGGCCTTATTGGTTCCAACCAATTTCAGCCAGTTCTTTTATCTCATAAGCAG AGGGAGTTTCAGCCTTTCAGAAAGTGGTTTCTGCAAACTCAAATTTTCTTTTATTTTTCTGAGACAGGGT CTTACTCTGTTGCCCAGGCTGGAGAGCAGTGGCGTGATCTTGGCTCACTGCAGCCTCAACCTCCCTGGGC TCAAGTGATCCTCCCACCTCAGCCTCCCAAGTAGCTGGGTCTGCAGGCACATGCCACCATGCCCAGCTAA TTTTTGTAGTTTTTGTAGAGACAGAGTTTCTCCATGTTGCCCGGGCTGGCCTCGAACTCCTGGGCTCAAG CAATTGCCTGCCTCAGCCTCCCAAATTGCTGGGATTACAGGAGTGAGCCACGGGAATTTCCATCAGTTAC TGATTCTTGAACTTGTAGGAACGTGGTTCCACTGTCCACATGCTGTTCTTTCCTCGCATCTCCTGTTCCT GCTTCCCTTTGGCCTGCTCACCTCTGGCTCCTCATGGCCCCAGTGAAGTATCGTGGCTTCCTCTCCACTG CAGACTGGCAGATTTCTTCACTCACACTCCCAAAAAGAGAATCTACTTCCCTCCCTTCCCTTCGTCCAAC CATCTGGGTCTCGAGTGTCACTGTAGGTTCACTGCCGTTGGGGCAGGTGACTCCTTGGGCCAGTCAGCTT TGGCTGAAGGGCAGGGCTGGCACACAGCTGGCAGAGCCACCCCATCCAAAATCCAGCCACTGTGCCTCAG CAGGGCTGTGGTTTGGGGGTCGGGGGGCGCTCCCCACAGGATAAGGAGCAGCCAGGGCAGGCAGATGACA CTACCACTACTAGCTGCATCTGGATCTATTCACTTATTCAACAAAGGTTTATTACACCCATATTCTATGC CAACTACTGTACTAGACTTGGAATAATGGAGAAAATTCAGGCACAGTCCTTGCTCTTAGGGAGCTCACAT GATTACTCATTGATGAATATCTCTCTCCCCTGATAGACTGCAAATCCATGAGGACAGGGACTCTGTCTTG TTCTGTGCCATAGTTCCACAGCATTCCAGCATGATGTCTGGTACTCAGCATGCAAACAGTGCGTATTGGT TGAATGAATAATAGTACAGCTAATAGCAGCTGAGAGCTTCCTCTGTGCTAGACACGATTCTAGGTGCCTT CCACACTTTGACAACCTTATGCAGCGTGTTAACACTGCATTCCCCTTGCACCCTGGAGATTGAGCCCAGA GAAGATAAGGACAGAGGGGACACATTCCAAGATAGATCCCCCAGTGGATGCCTGAAGCCACAGATAGTAA CAAACCCTATATATACTGTGTTTTTTCCTATACACACATACCTATGGTAAAGTTTAATTTATAAATTAGG CACAGTAAAAAATTAAAGATAACTAATAAAATAGAATAATTATGACAATATACTGTAATAAAAGTTATGT 
GAATGTGGTCTCTCTCTCTCTCCTCTCTCTAAATATCTTCTTGTACTCACCTATTCTCAAACTGTAGATA ACTGAAATTGCAGAAAGCAAAACTGTGGATAAGGTTGGGGCTACTGCAACTCAGCCAAGGACTTACAGTG

GCAGACCTGGAATGCAAACCCAGGCAGTCTGATACCAGGGTTTCTCCACCTGTGTTCTGCCACATGGGAT GGATAGGATGGATGGGTGTACGGATGGACGGACGGACAGACGGATGGACAAATGGATTGAAGGATGGGTA GGGGAGTCATCAGATGTGTTCTCCAACACCATGTACAGTACCTGGCACATAGCAAATCTATGATAAACAT TTGATAGTAAGTGAATAATGAATGTCAGTGCCAGTGTTGGTGTGGCCTTAGGTTATTCTTTCCAGGTGTT GTATCTGATGCTGGCCTCATATGTTTGTCTCACTTGGCTGACTAGAAAATCCAAGAGAGGTCCCAAGCAA CTAGCCCCTCACCCCAACATCATCCGGGTTCTCCGCGCCTTCACCTCTTCCGTGCCGCTGCTGCCAGGGG CCCTGGTCGACTACCCTGATGTGCTGCCCTCACGCCTCCACCCTGAAGGCCTGGGCCATGGCCGGACGCT GTTCCTCGTTATGAAGAAGTAAGTGACAGCAGCGCGGCAGGGCCTGGAGCTGATACATCTCCCAAGGGGA GCTGGTTCCTGCCCTCCATGTGCACCTTGATCAGGGGGTTTTGGAGAACAGGGTCATCACCCTTCCGGAG AAGAAAGCCATGCAAAGGGAACATATCTGCCCTGGAGAGCATTTTCCCTGTAGGACGATTTTTCATGGAA ACAAACTCTCATCTTCATCCAGAACATACTTGTCACCTAGTCCTTTTGGTCCATTTGACTGTTAACCTTT TCTGTGGCTGGACTTATCTGTTTTTAACATAAAAACCGTTCTCCTTCCTCACCCTCTGTATCCCCTAACT TTGCTATAGTGGGTATTTTATTTTAAGGAAATAATTATCTGCACCATTACTTTGAATATAGGGAGCCCCA ACTCTTACTTCCTAATTTGAGGATGGTGAGTGGGAGGGAACAGAAAGGATGCTGGGGAAAAGTGGGAATC AAAGTGCTCCTGGAAGGGGAAGAGGAACGGCCTAACCCTAACAGTGATTAAGGTTATTAGGAGGCCGGGA ATGGTGGCTGACGCCTGTAATCCCAGCACTTTGGAAGGCGGAGGTGGGTAGATCACTTGAGGTCAGGAGT TTGAGACCAGCCTGGCCAACATGATGAAACCCTGTATCTACTAAACATACAAAAATTAGCCTGGTGTGGT GGCGGGCACCTATAATCCCAGCTACTCGGGAGGCTGAGGTAGGAGAATTGCTTGAACCTGGAAGGTGGAG GTTGCAGTGAGCCAAGATCGTGCTACTGCACTCCAGCTTGGCGACAGAGTGAGACTCCATCTCAAAAAAA AAAAAAAAAACGTATTGGGAGTCGTCGATGTGTGGTAGCCAGAGGCCCTCTCCCCTCTCCGCCAGCTATC CCTGTACCCTGCGCCAGTACCTTTGTGTGAACACACCCAGCCCCCGCCTCGCCGCCATGATGCTGCTGCA GCTGCTGGAAGGCGTGGACCATCTGGTTCAACAGGGCATCGCGCACAGAGACCTGAAATCCGACAACATC CTTGTGGAGCTGGACCCAGGTAGGAACCTGCTGCACCATCAGAGCTCTCCAGGGGCACTAGAGGGTGGGT CAGGAGCATTTAGGACTGACTCTTCAGGTCCTCTCTGGTTTTGTGTTCTAAGTCATGTCTTTATTTAGCT CCGCACACAAGAGGTTAGCAATCTCTCCCTTAGAACGGGGTTTTTTTTTCTCTCTTTGCAGAGAGACAGC ACTTCCCAAGTTCCTTTCTCTAGCCCACTTAAAGAACAAGGACCTCAGTGCTGCAAGTTTTCCTAGGTAA ATAAAGAGGCCCGGCACAGTGGCTGACACCTGTAATCCCAACACTTTGGGAGGCTTGCTTGAGGATTGCT TGAGGCCAGCAGTTTGAGACCAGCCTGGGAAACAGAGTGAGATCCCTTCTCTACAAAAAAAAAATATGTT 
TTAAATTAGCCGGAAAAAAAGTTAGCCAGGCATGGTGGCATGCACCTGTAGCCCCAGCTACTTGGGAGGC TGAGGTCGGAGGATCACTTGAGCCTAGGAGTTAAGAGTCTGCAATGAGCTATGAATGTGCCACTGTACTC CAGCCTGGGCAGCAGAGTGAGATCCTATCTCAAAAAAATAATAAAAAATAATAAAGTAAAAGAGAAGTAG ACTTTAGCTCATTATAAAAAATAACTTTCGGCCGGGGGTAGTGGCTCACGCCTGTAATCCCAGCACTTTG AGAGGCTGAGGCGGGCAGATCATGAGGTCAAGAGATCGAGACCATCCTGGCCAACATGGTGAAACCCCAT CTCTACTAAAAATACAAAAATTAGCTGGGCATGGTGGCGGGCGCCTGTAATCTCAGCTACTTGGGAAGCT GAGGCAGGAGAATTGCTTGAACCCGGGAGGCGGAGGTTGTAGTGAGCCGAGATTGTGCCACTGCATTCCA GCCTGGCGACAGAGTGAGAGTCCATCTCAAAATAAATTAATTAATTAATTAAATTTTTCAAACAATGAAA GCTGTCCAAATGTAAGCCCAGTTGCCTCTGGAAATGAGTTGCCTACCACTGGAAGCATTCAAGTAGAAGC TGAATGGCCACTTGCCTAGGAAAATTGTAAGGAGATTCATACATCTGATAAATTTTTGAACTAGAAGATT TAAAATAATTGACTAGAGGAACTGGCTTTTATTATTTCTTTATTTATTTACTTATTTATTTATTTATTTA TTTGAGACAGAATCTTGCTTTGTTGCCCAGGCTGGAGTGCAGTGGCGCCATCTCAGCTCACTGCAACCTC TGCTTCCCAGGTTCAAGCAATTCTCGTGCCTCCGCCTCCTGAGTAGCTAGGATTACAGGCAGGTGCCACC ACGCCTAGCTAATTTTTGATTTTTTTTTTTTTTTTTTTTTGAGACTGAGTCTCGCTCTGTCACCCAGGCT GGAGTGCAATGGCATGATCTCGGCTCACTGCAAGCTCTACCTCCCAGGTTCACACCATTCTCCTGCTTCA GCCTCCCGAGTAGCTGGGACTACAGGCGCCCACCACCACGCCCGGCTAATTTTTTTGTATTTTTAGTACA GACGGGGTTTCACCATGTTAGCCAGGATGGTCTCTATCTCCTGACTTCGTGATCCACCTGCTTCGGCCTC CCAAAGCGCTGGGATTATTATTTTTAGTGGAGATGGGGTTTCACTGTTGGCCAGGCTGGTCTCAAACTCC TGGCCTCAAGTGATCCACCCACCTCAGCCTCCAAAATGTTGGGATTACAGGTATGAGCCACCACGCCCAG CTGGGAGTTGGCTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCACTCTGTCGCCCAGGCTGGAGTG CAGTGGCATGATCTTGGCTCACTGCAACCCCTGCCTATCGGGTTCAAGCAATTCTCCCGCCTCAGCCTCC TGAGTAGCTGGGATTACAGGCACACGCCACTACACCTGGCTGATTTTTGTATTTTTAGTAGAGACGGGGT TTCACCATGTTGGTCAGGCTGATCTGGAACTCCTGACTTCGTGATCCGCCTGCCTCGGCCTCCCAAAGTT CTGGGATTACAGGCGTGAGCCACTGCGCCCAGCCAGGAACTGGCTTTTTAAAGGAATTTTGTGTGGACCC TTTTACAAATAACCAATTCTTTTTTTATTTTTTCTGAGACAGAGTCTCGCTGTGTTGCTCAGGCTGAAGT GATTCTCCTGTCTCAGCCTCCTTCACCTCCCAGGTTCAAGTAATTCTCCTGTCTCAGCCTCCCGAGTAGC TGGGATACAGGCACACACCACCATGCCTGGCTAGTTTTTTTGTATTTTTAGTAGAGACAGGGGTTTCACC ATGTTGGCCAGGCTGGTCTTGAACTCCTGACCTCAGGTGATCTACCCGCCTCGGCCTCCCAAAATGTTGG 
CATTACAGGCCACCACACCCGGCAATAACTGATTCTTAATGCACCTGGTTCTTAGGTTTGGATTTGGGGT TTCAAATTCAAATCAAAGTCTCCTGGGGTATAAGGGCCCTTGGAGATCATTTGAACCAAGCTCTAGCTCC

TTTGGTCTTGGGGACAGCTCCAATTACTAGAACATGATTTAAATTGAGCCACACAGTCCTTTGCCTGGGG ATTTTGCAGCCTGTACTTACTGGAGGCATTTCCGTGTTCGCACAGCAGGCCCTTCTGATCAGCTCTCAGG CCTTGCTGACCTCCTGGGCCAACACTGAGCCATTAGCCCCTGTCAGCTATGTCTTGCTGGTGGCTTTAGT AGGGACATAGGAGGGCCTCTCAGAGGGAAGGAGGGGAGGAGAAATGGTCACTTTGCTTGCTCCTTCCCAG ACGGCTGCCCCTGGCTGGTGATCGCAGATTTTGGCTGCTGCCTGGCTGATGAGAGCATCGGCCTGCAGTT GCCCTTCAGCAGCTGGTACGTGGATCGGGGCGGAAACGGCTGTCTGATGGCCCCAGAGGTGAGTCCCGAG TGTGTCATGCGCCATCGGCAGCCCTTCCCCCACATGTCCACTGAATGCAGGAGACTCGATGCCTTGTGAT AACCCAACACCTCCATCTTTTCTGACCCATAATTTGGCACAAGTTCCTTCCCTGCCACTTTGCTTTCCTC CGGCGTTCCCTCATGTTCCAGGAGAATGCAAGTCCTGTCACATAAACCAGGTGGTCTAAGCAGACCCCTT CTGGGTCTGAGCCACAGCTCACTCAAGCTCTGGGTTCCTTGGGACAGAGTTCAGATTAGCCCATGGATCA GGTGATGTGCAGGACATGAAAAGGTTAGATGGGCGGGCAGCGTGATGTCTCACCCACTGCTTCTGAGCAG GTGTCCACGGCCCGTCCTGGCCCCAGGGCAGTGATTGACTACAGCAAGGCTGATGCCTGGGCAGTGGGAG CCATCGCCTATGAAATCTTCGGGCTTGTCAATCCCTTCTACGGCCAGGGCAAGGCCCACCTTGAAAGCCG CAGCTACCAAGAGGCTCAGCTACCTGCACTGCCCGAGTCAGTGCCTCCAGACGTGAGACAGTTGGTGAGG GCACTGCTCCAGCGAGAGGCCAGCAAGGTGAGGCTGTCCCCGGCTTCGAGGGGACGGTGTGGGTAGAAAC CTCTGTTCTCGTTCCAGAGTGAAGGTCAGGTTTGGGCCAGAGCCACAGTGACAGATCCTCTGTGTTAGGA AGGTAAAGGCTAGTTACAAGAGAACAAAAAACAGATTTTAATGTAGGTAGGAGTAGGAGCACTAGCCACC ACAGCATAGTCAGAATCCTAGCAGTTCAACTCCTGTGGCTTTTTTAGTTGCTGAAAAAGTTGTTCAGAGG CCAGACACGGTGGCTTACACCTATAATCACAGCACTTTGGGAGGCTGAGGCGGGTGGATCACTTGGGACC AGGAGTTCCAGTCCAGCCTGGCCAACATGGTAAAACCCCGTCTGTACTAAAAATACAAAAATTAGCTGAG TGTCGTGGCACACGCCTGTAATTCCAGGTACTCGGGAGGCTGAAGCAGGAGAATCGTTTGAACCCTGGAG GCAGAGGTTGCAGTGAGCCAAGATCGCGCCACTGCACTCTGGCCTAGGTGACAGTGCAAGTCTTTGTCTC AAAAAAAAAAAAAAAAAAAAAAGGCTATTCAGAGAGAGAAAAGGAGGCATTTTTGAGAAATGTTTAATGG AGATGTAGCTCATGGAAGCAGCTGAGAACTGATCAGAGAGAGATGGAAAACATCTCCTGAGAGCAGATCT GGACATTGTGAAATTAATATAAAGGAATGCAAAGGCAGACCTATCCGAAGCCATAATTGGAGTGGCAGCT GGCTCAGGGGCAGGCTTAGTGCAAAGAGCTGAGCCATACCTGCACCCCAGCACTGTTCTGCCACTCCGTT AACTGCTCTCTGTACGTGGCCTGCTATCTTGGTGCGCAGTGAAGGTTAGAACAACAGCTGCAACCAGTTA TGAAATGATAGAGGAGACTACTTACCTGGTTCAAGGGACCAGATAGCTGTGCACAAGAGGCACTAGGCTT 
TCCACCCAGGGGGAAAGGCTATTTCAACAATGCATGCTGCCCCATGCAGAGGTGTACACATGGAAAAGCT TGGAGCACGGGCAGGGGACAGGCAGTATTTGTCACCTGAGTGAAGGGCATCAGTAGGAGATAGGGTAGAG GAAGAATTGGGTTGGGACCAGAGAAGGGAAGACCCTCACTAACAAAGCAGGCTTTGGGTTGAGACTGTGT TAACAGATGTTCTAGCTACAGCTTCCCTTCCTGTTGCAGAGACCATCTGCCCGAGTAGCCGCAAATGTGC TTCATCTAAGCCTCTGGGGTGAACATATTCTAGCCCTGAAGAATCTGAAGTTAGACAAGATGGTTGGCTG GCTCCTCCAACAATCGGCCGCCACTTTGTTGGCCAACAGGCTCACAGAGAAGTGTTGTGTGGAAACAAAA ATGAAGATGCTCTTTCTGGCTAACCTGGAGTGTGAAACGCTCTGCCAGGCAGCCCTCCTCCTCTGCTCAT GGAGGGCAGCCCTGTGATGTCCCTGCATGGAGCTGGTGAATTACTAAAAGAACATGGCATCCTCTGTGTC GTGATGGTCTGTGAATGGTGAGGGTGGGAGTCAGGAGACAAGACAGCGCAGAGAGGGCTGGTTAGCCGGA AAAGGCCTCGGGCTTGGCAAATGGAAGAACTTGAGTGAGAGTTCAGTCTGCAGTCCTCTGCTCACAGACA TCTGAAAAGTGAATGGCCAAGCTGGTCTAGTAGATGAGGCTGGACTGAGGAGGGGTAGGCCTGCATCCAC AGAGAGGATCCAGGCCAAGGCACTGGCTGTCAGTGGCAGAGTTTGGCTGTGACCTTTGCCCCTAACACGA GGAACTCGTTTGAAGGGGGCAGCGTAGCATGTCTGATTTGCCACCTGGATGAAGGCAGACATCAACATGG GTCAGCACGTTCAGTTACGGGAGTGGGAAATTACATGAGGCCTGGGCCTCTGCGTTCCCAAGCTGTGCGT TCTGGACCAGCTACTGAATTATTAATCTCACTTAGCGAAAGTGACGGATGAGCAGTAAGTAAGTAAGTGT GGGGATTTAAACTTGAGGGTTTCCCTCCTGACTAGCCTCTCTTACAGGAATTGTGAAATATTAAATGCAA ATTTACAACTGCAGATGACGTATGTGCCTTGAACTGAATATTTGGCTTTAAGAATGATTCTTATACTCTG AAGGTGAGAATATTTTGTGGGCAGGTATCAACATTGGGGAAGAGATTTCATGTCTAACTAACTAACTTTA TACATGATTTTTAGGAAGCTATTGCCTAAATCAGCGTCAACATGCAGTAAAGGTTGTCTTCAACTGA

Protein sequence
>gi|14165272|ref|NP_115785.1| serine/threonine-protein kinase PINK1, mitochondrial precursor [Homo sapiens]
MAVRQALGRGLQLGRALLLRFTGKPGRAYGLGRPGPAAGCVRGERPGWAAGPGAEPRRVGLGLPNRLRFF RQSVAGLAARLQRQFVVRAWGCAGPCGRAVFLAFGLGLGLIEEKQAESRRAVSACQEIQAIFTQKSKPGP DPLDTRRLQGFRLEEYLIGQSIGKGCSAAVYEATMPTLPQNLEVTKSTGLLPGRGPGTSAPGEGQERAPG APAFPLAIKMMWNISAGSSSEAILNTMSQELVPASRVALAGEYGAVTYRKSKRGPKQLAPHPNIIRVLRA FTSSVPLLPGALVDYPDVLPSRLHPEGLGHGRTLFLVMKNYPCTLRQYLCVNTPSPRLAAMMLLQLLEGV DHLVQQGIAHRDLKSDNILVELDPDGCPWLVIADFGCCLADESIGLQLPFSSWYVDRGGNGCLMAPEVST ARPGPRAVIDYSKADAWAVGAIAYEIFGLVNPFYGQGKAHLESRSYQEAQLPALPESVPPDVRQLVRALL QREASKRPSARVAANVLHLSLWGEHILALKNLKLDKMVGWLLQQSAATLLANRLTEKCCVETKMKMLFLA NLECETLCQAALLLCSWRAAL
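The record above is in FASTA format: a header line starting with `>` followed by the residues. As a minimal sketch, assuming the legacy NCBI header layout `gi|<number>|<database>|<accession>| <description>`, the record could be read as shown below. The function names `parse_fasta_record` and `split_ncbi_header` are hypothetical helpers introduced only for illustration, and the sequence in the example is a deliberately truncated excerpt of the record above.

```python
from typing import Dict, Tuple


def parse_fasta_record(text: str) -> Tuple[str, str]:
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    # Remove line breaks and the spaces that separate residue blocks.
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return header, sequence


def split_ncbi_header(header: str) -> Dict[str, str]:
    """Split a header of the assumed form gi|<number>|<db>|<accession>| <description>."""
    fields = header.split("|", 4)
    if len(fields) != 5 or fields[0] != "gi":
        raise ValueError("header does not match the assumed gi|...|...|...| layout")
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        "description": fields[4].strip(),
    }


# Header copied from the record above; the sequence is truncated for brevity.
example = """>gi|14165272|ref|NP_115785.1| serine/threonine-protein kinase PINK1, mitochondrial precursor [Homo sapiens]
MAVRQALGRG LQLGRALLLR
FTGKPGRAYG
"""

header, sequence = parse_fasta_record(example)
info = split_ncbi_header(header)
print(info["accession"], len(sequence))
```

Stripping internal whitespace before joining matters here because the sequence is displayed in space-separated blocks, which would otherwise inflate the residue count.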

Gene Regulatory Network: Its uses and advantages


A gene regulatory network or genetic regulatory network (GRN) is a collection of DNA segments in a cell which interact with each other indirectly (through their RNA and protein expression products) and with other substances in the cell, thereby governing the rates at which genes in the network are transcribed into mRNA. In single-celled organisms, regulatory networks respond to the external environment, optimising the cell at a given time for survival in this environment. Thus a yeast cell, finding itself in a sugar solution, will turn on genes to make enzymes that process the sugar to alcohol. In multicellular animals the same principle has been put in the service of gene cascades that control body shape [2]. Each time a cell divides, two cells result which, although they contain the same genome in full, can differ in which genes are turned on and making proteins. Gene regulation is a general name for a number of sequential processes, the best known and understood being transcription and translation, which control the level of a gene's expression and ultimately result in a specific quantity of a target protein. A gene regulation system consists of genes, cis-elements, and regulators. The regulators are most often proteins, called transcription factors, but other molecules, such as RNAs and metabolites, sometimes also participate in the overall regulation. The binding of regulators to cis-elements in the cis-region of a gene controls the level of gene expression during transcription. The cis-regions serve to aggregate the input signals, mediated by the regulators, and thereby effect a very specific gene expression signal. The genes, the regulators, and the regulatory connections between them, together with an interpretation scheme, form gene networks. Depending on the degree of abstraction and the availability of empirical data, there are different levels of modeling of gene networks.
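One of the most abstract modeling levels treats each gene as simply on or off and each cis-region as a logical function of its regulators. The following is a minimal sketch of such a Boolean model, loosely following the yeast-in-sugar example above. The gene names (`activator`, `repressor`, `enzyme`) and the rules are assumptions made up for illustration, not a description of real yeast regulation.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical update rules: each gene's next state is a logical
# function of the current states of its regulators.
RULES: Dict[str, Callable[[State], bool]] = {
    # External input: sugar availability is held constant.
    "sugar": lambda s: s["sugar"],
    # The activator is expressed whenever sugar is present.
    "activator": lambda s: s["sugar"],
    # The repressor is assumed to stay off in this toy example.
    "repressor": lambda s: False,
    # The enzyme gene's cis-region aggregates two inputs:
    # it is on only if the activator is on AND the repressor is off.
    "enzyme": lambda s: s["activator"] and not s["repressor"],
}


def step(state: State) -> State:
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


initial = {"sugar": True, "activator": False, "repressor": False, "enzyme": False}
trajectory = simulate(initial, 3)
for t, s in enumerate(trajectory):
    print(t, s)
```

Because all genes are updated synchronously, the enzyme gene switches on one step after the activator does, which illustrates how a signal propagates through a regulatory cascade. More detailed models replace the Boolean rules with continuous or stochastic rate equations.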
One of the goals of systems biology is to elucidate functionally relevant regulatory interactions [1,2]. Since changes in gene expression are in part determined by such interactions between regulators and their target genes, genome-wide expression data can be used effectively to infer transcriptional regulatory networks. Networks are ubiquitous in biology (genetic, protein, post-translational, metabolic, cellular, organismal, social, etc.). Knowing both the connectivity and the dynamics of a network is important for understanding the functions and constraints of the system. Attempts to tackle regulatory networks have come from experimental techniques (expression profiling, motif discovery, ChIP-based assays, and RNAi) and from computational techniques. Computational approaches concentrate on a particular level of focus, be it the high-level connectivity (gene networks) or the more detailed stochastic and thermodynamic molecular models. Gene networks are directed graphs in which the edges do not necessarily represent direct interactions. The goal is to capture the interactions among genes, that is, the connectivity. The canonical examples include the Lee et al. and Harbison et al. papers on S. cerevisiae.

The purpose of a developmental GRN is not to provide biochemical insights into how the cell biological functions that actually effect differentiation and development work, but rather to explain why these functions happen when and where they do, and how their appropriate execution is organized. The GRN is couched in the sequence language of the genomic regulatory code. It is an immensely potent tool for understanding development. It makes it possible to relate developmental processes to the genomic regulatory sequence at whatever level of biological organization these processes are perceived. Intervention at the GRN level offers the only general, canonical approach to experimental alteration of the developmental process. Temporal changes in the amplitude of expression and spatial changes in the locations of the regulatory states controlled by the underlying GRN will cause predictable temporal and spatial changes in cell fate and function. Evolutionary change in body plans can be viewed as just such a process: these changes have occurred as a consequence of changes in the genome that re-engineer GRNs so as to alter their developmental output.
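The directed-graph view of a gene network described above can be sketched with a plain adjacency list. The example below is illustrative only: the node names (`TF_A`, `gene_1`, and so on) and the edges are hypothetical, and an edge here means "influences the expression of", which need not be a direct physical interaction.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical network: each regulator maps to the genes it influences.
NETWORK: Dict[str, List[str]] = {
    "TF_A": ["TF_B", "gene_1"],
    "TF_B": ["gene_2", "gene_3"],
    "gene_1": [],
    "gene_2": ["TF_A"],  # feedback loop back onto TF_A
    "gene_3": [],
}


def downstream(network: Dict[str, List[str]], source: str) -> Set[str]:
    """Return every gene reachable from `source` by following directed edges."""
    seen: Set[str] = set()
    queue = deque(network.get(source, []))
    while queue:
        gene = queue.popleft()
        if gene in seen:
            continue
        seen.add(gene)
        queue.extend(network.get(gene, []))
    return seen


def in_degree(network: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many regulators point at each gene."""
    counts = {gene: 0 for gene in network}
    for targets in network.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


print(sorted(downstream(NETWORK, "TF_B")))
print(in_degree(NETWORK))
```

Reachability captures the connectivity of the network: a perturbation of `TF_B` can in principle propagate to every gene in its downstream set, including back to `TF_B` itself through the feedback loop. The dynamics (how strongly and how fast) require one of the more detailed models mentioned above.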
Experimental reorganization of GRNs offers powerful new approaches to the genetic modification of plants and animals, which could have striking implications for agriculture, animal husbandry, and human health. Fundamental questions of development and evolution will become accessible once it is possible to carry out comparative analyses of GRNs.
