Methods in Molecular Biology 1825

J.B. Brown, Editor

Computational Chemogenomics
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes:
http://www.springer.com/series/7651
Computational Chemogenomics

Edited by

J.B. Brown
Life Science Informatics Research Unit, Laboratory of Molecular Biosciences,
Kyoto University Graduate School of Medicine, Kyoto, Japan
Editor
J.B. Brown
Life Science Informatics Research Unit
Laboratory of Molecular Biosciences
Kyoto University Graduate School of Medicine
Kyoto, Japan

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-4939-8638-5 ISBN 978-1-4939-8639-2 (eBook)
https://doi.org/10.1007/978-1-4939-8639-2
Library of Congress Control Number: 2018952357

© Springer Science+Business Media, LLC, part of Springer Nature 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover Illustration: The cover image shows the inhibitory activity of compounds against aromatase, a critical
hormone-processing enzyme in many organisms. Each point represents one compound. Green and yellow colors indicate weak
or micromolar activity, red points represent strong activity, and purple points indicate single-digit nanomolar
activity or stronger. Compounds are positioned by relative distance using multidimensional scaling. Activity cliffs can be
seen where large changes in activity occur between closely spaced compounds, which are often analogs.
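The activity cliffs mentioned in the cover caption can be flagged numerically as compound pairs that are structurally very similar yet differ strongly in potency. The following is a minimal Python sketch under stated assumptions: fingerprints are given as sets of on-bit indices, potencies are on a logarithmic scale such as pIC50, and the thresholds (Tanimoto similarity of at least 0.8, potency gap of at least 2 log units) are illustrative choices, not the values used to produce the cover image. All compound IDs and data are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_activity_cliffs(fingerprints: dict[str, set[int]],
                         potencies: dict[str, float],
                         min_similarity: float = 0.8,
                         min_potency_gap: float = 2.0) -> list[tuple[str, str, float, float]]:
    """Return (compound_a, compound_b, similarity, potency gap) for every pair
    that is at least `min_similarity` similar but differs by at least
    `min_potency_gap` log units in potency (illustrative thresholds)."""
    cliffs = []
    for a, b in combinations(sorted(fingerprints), 2):
        similarity = tanimoto(fingerprints[a], fingerprints[b])
        gap = abs(potencies[a] - potencies[b])
        if similarity >= min_similarity and gap >= min_potency_gap:
            cliffs.append((a, b, round(similarity, 3), round(gap, 2)))
    return cliffs


if __name__ == "__main__":
    # Hypothetical toy fingerprints and pIC50 values.
    fingerprints = {"cpd1": {1, 2, 3, 4, 5}, "cpd2": {1, 2, 3, 4, 5, 6}, "cpd3": {7, 8, 9}}
    potencies = {"cpd1": 5.0, "cpd2": 8.5, "cpd3": 8.4}
    print(find_activity_cliffs(fingerprints, potencies))
```

In practice, fingerprints would come from a chemoinformatics toolkit; the all-pairs loop grows quadratically with the number of compounds, which is adequate for a single analog series.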

This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of
Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Preface

This book provides a collection of techniques used in the emerging field of computational
chemogenomics. It covers practical processes for executing research and analyses in the field,
which integrates chemoinformatics, bioinformatics, computer science, statistics,
automated pattern recognition and modeling, database usage and data retrieval, and
systems integration. Clearly, mastering the field of computational chemogenomics requires
a considerable variety of knowledge and data processing skills, and this text aims to acquaint
the interested reader with many of the practical skills used in the field and to make the
reader capable of applying them. The target audience includes both readers from the
experimental sciences who are novices to data processing and modeling and readers with
computationally oriented backgrounds who wish to engage in this scientific area, which is
continually growing and is now expected to contribute to industry, academic, and government
research projects.
Historically, testing for chemical effects on biological processes, whether at the level of
organism response, organ response (e.g., organ toxicity), cellular response (e.g., apoptosis),
or individual target protein response in cell lines (e.g., inhibition), has required a large and
orchestrated effort: confirmation of chemical purity, preparation of chemicals at a span of
concentrations, application of those concentration-specific chemical stocks to the process or
target, and precise recording of the outcome have typically been performed manually. At the
same time, methods in genetic manipulation, gene sequence determination, gene expression
measurement, and protein expression measurement have similarly required substantial
investments in human resources and facilities.
The development of specialized equipment for automated high-content and high-throughput
screening, as well as parallel automation developments in genetics and proteomics, made it
possible to obtain chemical activity data for thousands of compounds instead of hundreds,
and to expand measurement of gene expression from a few genes to tens, hundreds, or
thousands. As a result, the technologies needed to systematically unlock the interface
between chemistry and biology on a large scale had arrived. Finally, in 2001, worldwide
efforts to create the first draft of the full human genome sequence were completed, and with
it in hand, the stage was set to integrate the technologies for exploring the chemistry-biology
interface with our newfound knowledge about the genetic underpinnings of human physiology.
Only months after the sequencing of the human genome, the idea of exploring the
protein products of a genome from a chemical perspective was proposed, and the term
chemogenomics was born. This term resembles the names of two other chemically driven
scientific fields, and the reader should be aware of the differences in terminology. First,
scientists often need to know the effect of a chemical on an organism when that organism
carries a genetic defect such as a mutation or complete knockout; this field is known as
chemical genetics. Second, scientists may want to understand the functional impact of
chemicals on coordinated processes occurring within cells encoded by genomes, for example
the multiprotein signaling response to toxic chemicals measured in a variety of organisms.
This field, chemical genomics, is more concerned with chemistry and genomics at a systems
science level, in contrast to the chemogenomic focus on chemical modulation of individual
proteins.


While research and development based on chemogenomics can be pursued in a variety of
ways to ultimately reach project goals, two fundamental directions exist. The first is
forward chemogenomics. Much like forward genetics, which identifies the genes responsible
for a phenotype or disease, forward chemogenomics seeks to identify a set of protein targets
to test for chemical modulation in a biological system. The second, reverse chemogenomics,
is concerned with identifying compounds that achieve the desired modulation by exerting
their effect on the targets identified in the forward chemogenomic analysis. How these two
goals can be achieved is dictated by the state of the art in experimental methods for chemical
and molecular biology research.
While advances in automation enabling chemogenomic science were being made, advances
were simultaneously being made in computing and computer science-related fields. The CPUs
used in workstations and servers were being redesigned to support multiple cores, and
operating systems and compilers designed to support multicore and multithreaded
programming paved the way to many-fold speed-ups in program execution. A key application
area for the expanded computing power was statistics. Analyses based on large amounts of
repeated subsampling or on very large numbers of hypotheses, once prohibitive, became
mainstream, and new methods for meta-analyses of results derived from basic statistical
procedures gained attention. The field of statistical pattern recognition, now commonly
referred to as machine learning or artificial intelligence, leveraged both statistical theory and
these advances in computing. Algorithms capable of modeling the patterns found in large,
nonlinear datasets proved to have extraordinary versatility, with applications not only in the
chemical and biological sciences but also in physical sciences such as geology and
meteorology, and in fields outside of natural science such as finance and music.
Hence, science arrived at a new frontier: the vast quantities of data from automation
inspired and gave rise to chemogenomics, yet the computing methods and infrastructure
needed to harvest chemogenomic experimental results still had to be developed. The
computational analyses should make the experimental results intelligible and should lead to
further hypotheses about living systems that can be validated. Thus was born the field of
computational chemogenomics. Interestingly, although chemogenomics has been driven by
high-throughput methods and their computational analyses, accumulated efforts in structural
biology over several decades have also contributed large numbers of publicly available
three-dimensional crystal structures of compounds in complex with proteins; these now
number in the tens of thousands, making structural computational chemogenomics a
valuable option in the practitioner’s toolbox.
Despite its relatively short history, the impact of computational chemogenomics is
already well established. Models for compound-protein interaction in drug discovery are a
prominent application, as their ability to predict the interactions of a compound with a panel
of targets has large implications for safety profiling, drug lead selection and optimization,
and side effect prediction. A closely related application is chemical toxicity screening, which
is concerned with chemical dose tolerance or dose lethality and may incorporate target panel
predictions as information to explain toxicity. The field of drug repurposing leverages
chemogenomics and computational chemogenomics to suggest new targets for existing and
often clinically approved drugs, which then might be applicable to new clinical indications.
Furthermore, computational chemogenomic methods may contribute to the agrochemical
sciences, where the organisms and genomes under study are plants rather than animals. The
concept of mining a chemical-protein activity matrix for knowledge discovery and hypothesis
generation is the same in agricultural life science.
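As a concrete illustration of mining a chemical-protein activity matrix, the sketch below stores a sparse matrix as a Python dictionary keyed by (compound, target) pairs, with untested pairs simply absent. It lists compounds active on several targets, a possible starting point for polypharmacology or off-target hypotheses, and counts how many active compounds each pair of targets shares. The compound and target names, the potency scale (assumed pKi-like, higher is more potent), and the activity threshold of 6.0 are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sparse compound-target activity matrix:
# (compound, target) -> potency on a log scale (e.g., pKi); untested pairs are absent.
ActivityMatrix = dict[tuple[str, str], float]


def active_targets(matrix: ActivityMatrix, threshold: float = 6.0) -> dict[str, set[str]]:
    """Map each compound to the set of targets on which its potency meets the threshold."""
    profiles: dict[str, set[str]] = defaultdict(set)
    for (compound, target), potency in matrix.items():
        if potency >= threshold:
            profiles[compound].add(target)
    return dict(profiles)


def multitarget_compounds(matrix: ActivityMatrix, threshold: float = 6.0,
                          min_targets: int = 2) -> list[str]:
    """Compounds active on at least `min_targets` targets, sorted by name."""
    profiles = active_targets(matrix, threshold)
    return sorted(c for c, targets in profiles.items() if len(targets) >= min_targets)


def shared_actives(matrix: ActivityMatrix, threshold: float = 6.0) -> dict[tuple[str, str], int]:
    """For each pair of targets, count the compounds active on both."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for targets in active_targets(matrix, threshold).values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    toy = {
        ("cpdA", "T1"): 7.2, ("cpdA", "T2"): 6.5,
        ("cpdB", "T1"): 5.1, ("cpdB", "T3"): 8.0,
        ("cpdC", "T1"): 6.8, ("cpdC", "T2"): 7.9,
    }
    print(multitarget_compounds(toy))  # ['cpdA', 'cpdC']
    print(shared_actives(toy))         # {('T1', 'T2'): 2}
```

A dictionary keeps the sketch dependency-free and reflects the sparsity of real chemogenomic matrices, where most compound-target pairs have never been measured.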

This volume on methods in computational chemogenomics is organized so that the reader can
navigate it in any order. The first major unit covers the presentation of public chemogenomic
data resources, where Najnin et al. introduce how to use seven different chemogenomic
databases that each have different focal points, and Kim et al. present a comprehensive,
in-depth tutorial on using the PubChem database, arguably the world’s largest public
chemogenomics information resource. In the second unit, the fundamentals of
chemoinformatics, bioinformatics, and chemogenomic data processing are covered. In keeping
with the discussion above on the importance of statistics, this unit contains a step-by-step
tutorial on processing high-dimensional chemoinformatic data to obtain basic statistical
information and correlations in computer representations of compounds. The third unit
focuses on techniques to analyze specific proteins or compounds based on their structures.
Da Silva and Rognan present a robust workflow for analyzing protein surfaces when structural
data are available, Song and Zhang demonstrate how to use resources dedicated to cataloging
and understanding allosteric binding, Dimova and Bajorath detail methods for examining the
diversity of chemical structures in a large chemogenomic dataset, and Hu and Bajorath give
the steps necessary to derive analyses indicating how small changes in scaffold decoration
correlate with changes in activity across panels of targets. In the fourth unit, statistical
pattern recognition techniques are the focus. Yamanishi provides the reader with the
fundamental methods and knowledge needed for building custom methods of compound-protein
matrix modeling, and Reker and Brown extensively detail the implementation of a new
technique used for identifying points in the ligand-target matrix that result in predictive
protein family models. The fifth and final unit is concerned with the future of chemogenomics
and its application to medical care. Kou et al. describe their implementation of a clinical
platform to analyze patient genomes and select chemical therapies based on the protein
products of potentially altered genes. Jacoby and Brown conclude by discussing what
computational chemogenomics has achieved so far and what directions it is likely to pursue
going forward.
This book is the culmination of many individuals dedicating their time and efforts
toward its completion. I wish to express heartfelt thanks to all of the contributing authors,
who sacrificed their limited time to describe their protocols in detail. Without their efforts,
this book would not have been possible. Continuous support by Springer to guide the completion
of the book and handle unexpected situations during its development was key, with special
thanks to series editors John Walker and Patrick Marton, and coordination by Anna
Rakovsky. I also wish to thank colleagues at the Kyoto University Graduate School of
Medicine and Kyoto University Hospital who have pushed me to new levels in order to
perform chemogenomic research that is not only computationally attractive but equally
helpful in translational research. A very special acknowledgement goes to Professor
Dr. Jürgen Bajorath of the University of Bonn, who provided essential ideas and advice
that played a major role in shaping the organization of the text. I also wish to thank Prof.
Dr. Gisbert Schneider, Dr. Anthony Nicholls, Prof. Dr. Shunichi Takeda, and Prof.
Dr. Yasushi Okuno for the wisdom that they imparted to me over the years of
my career. Finally, my most sincere thanks go to my wife, who accepted countless days
and nights of canceled plans in order to allow me to concentrate on the completion of this
text, as well as to my children and my family, from whose understanding and support I draw
motivation to push my scientific endeavors to new heights that can benefit society.

Kyoto, Japan J.B. Brown


Contents

Preface, p. v
Contributors, p. xi

PART I DATA RESOURCES FOR COMPUTATIONAL CHEMOGENOMICS

1 A Survey of Web-Based Chemogenomic Data Resources, p. 3
Rasel Al Mahmud, Rifat Ara Najnin, and Ahsan Habib Polash
2 Finding Potential Multitarget Ligands Using PubChem, p. 63
Sunghwan Kim, Benjamin A. Shoemaker, Evan E. Bolton, and Stephen H. Bryant

PART II FUNDAMENTAL DATA PROCESSING

3 Fundamental Bioinformatic and Chemoinformatic Data Processing, p. 95
J.B. Brown
4 Parsing Compound–Protein Bioactivity Tables, p. 131
J.B. Brown
5 Impact of Molecular Descriptors on Computational Models, p. 171
Francesca Grisoni, Viviana Consonni, and Roberto Todeschini
6 Physicochemical Property Labels as Molecular Descriptors for Improved Analysis of Compound–Protein and Compound–Compound Networks, p. 211
Masaaki Kotera
7 Core Statistical Methods for Chemogenomic Data, p. 227
Christin Rakers

PART III STRUCTURAL ANALYSIS METHODS IN 2D AND 3D

8 Structure-Based Detection of Orthosteric and Allosteric Pockets at Protein–Protein Interfaces, p. 281
Franck Da Silva and Didier Rognan
9 Single Binding Pockets Versus Allosteric Binding, p. 295
Kun Song and Jian Zhang
10 Mapping Biological Activities to Different Types of Molecular Scaffolds: Exemplary Application to Protein Kinase Inhibitors, p. 327
Dilyana Dimova and Jürgen Bajorath
11 SAR Matrix Method for Large-Scale Analysis of Compound Structure–Activity Relationships and Exploration of Multitarget Activity Spaces, p. 339
Ye Hu and Jürgen Bajorath

PART IV STATISTICAL PATTERN RECOGNITION

12 Linear and Kernel Model Construction Methods for Predicting Drug–Target Interactions in a Chemogenomic Framework, p. 355
Yoshihiro Yamanishi
13 Selection of Informative Examples in Chemogenomic Datasets, p. 369
Daniel Reker and J.B. Brown

PART V EMERGING TOPICS

14 A Platform for Comprehensive Genomic Profiling in Human Cancers and Pharmacogenomics Therapy Selection, p. 413
Tadayuki Kou, Masashi Kanai, Mayumi Kamada, Masahiko Nakatsui, Shigemi Matsumoto, Yasushi Okuno, and Manabu Muto
15 The Future of Computational Chemogenomics, p. 425
Edgar Jacoby and J.B. Brown

Index, p. 451
Contributors

RASEL AL MAHMUD • Department of Radiation Genetics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
JÜRGEN BAJORATH • Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
EVAN E. BOLTON • Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
J.B. BROWN • Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
STEPHEN H. BRYANT • Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
VIVIANA CONSONNI • Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
FRANCK DA SILVA • CNRS, LIT UMR 7200, Université de Strasbourg, Strasbourg, France
DILYANA DIMOVA • Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
FRANCESCA GRISONI • Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
YE HU • Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
EDGAR JACOBY • Janssen Research & Development, Beerse, Belgium
MAYUMI KAMADA • Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
MASASHI KANAI • Department of Therapeutic Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
SUNGHWAN KIM • Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
MASAAKI KOTERA • Department of Chemical System Engineering, School of Engineering, The University of Tokyo, Tokyo, Japan
TADAYUKI KOU • Department of Therapeutic Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
SHIGEMI MATSUMOTO • Department of Therapeutic Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
MANABU MUTO • Department of Therapeutic Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
RIFAT ARA NAJNIN • Department of Radiation Genetics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
MASAHIKO NAKATSUI • Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
YASUSHI OKUNO • Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan
AHSAN HABIB POLASH • Department of Radiation Genetics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
CHRISTIN RAKERS • Graduate School of Pharmaceutical Sciences, Yoshida-shimoadachicho, Kyoto University, Sakyo-ku, Kyoto, Japan; Graduate School of Science, Nagoya University, Nagoya, Japan
DANIEL REKER • Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
DIDIER ROGNAN • CNRS, LIT UMR 7200, Université de Strasbourg, Strasbourg, France
BENJAMIN A. SHOEMAKER • Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
KUN SONG • Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Ministry of Education, Shanghai Jiao-Tong University School of Medicine, Shanghai, China
ROBERTO TODESCHINI • Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
YOSHIHIRO YAMANISHI • Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan; PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan
JIAN ZHANG • Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Ministry of Education, Shanghai Jiao-Tong University School of Medicine, Shanghai, China
Part I

Data Resources for Computational Chemogenomics


Chapter 1

A Survey of Web-Based Chemogenomic Data Resources


Rasel Al Mahmud, Rifat Ara Najnin, and Ahsan Habib Polash

Abstract
Chemogenomics is a comparatively nascent field dealing with the effects of drugs and chemicals on
molecular-level systems. With its emergence, the number of data sources is also increasing at an
unprecedented rate. Despite the plethora of databases, variation in bioactivity measurement,
bias toward specific proteins, varied computational procedures, and redundant information
make data mining tedious, especially for newcomers to the field. In this chapter, we give an
overview of hands-on data collection from several useful Web-based chemogenomic resources
that are accessible with nothing more than a Web browser, together with their domains of
applicability. This overview can help users acquire chemogenomic datasets for their projects.

Key words Chemogenomic resources, World Wide Web, Ligand-target data, ChemProt, STITCH,
PubChem, ChEMBL, ChEBI, ChemSpider, PharmGKB

1 Introduction

The number and volume of bioactivity databases are growing
faster than anticipated, and scientists in different sectors, especially
in biomedical arenas, realize that the vast amounts of data stored
in these continually growing databases are generally difficult to
subject to manual analysis and interpretation. For instance, if we
consider a single cell in a human body, there are myriad factors
interacting with one another at different levels to orchestrate a
single biological effect in response to a particular native or foreign
molecule. The molecule could be a native protein, RNA, DNA, or
other cellular moiety. If foreign, it could be a chemical such as a
drug administered to restore the normal physiological response of a
cell in the host organism, a toxin from pathogenic foreign invaders
such as bacteria or viruses, or even an industrial cytotoxic chemical
inhaled through the respiratory system.

Rasel Al Mahmud, Rifat Ara Najnin, and Ahsan Habib Polash contributed equally to the chapter.

J.B. Brown (ed.), Computational Chemogenomics, Methods in Molecular Biology, vol. 1825,
https://doi.org/10.1007/978-1-4939-8639-2_1, © Springer Science+Business Media, LLC, part of Springer Nature 2018


The discoveries and findings in the biomedical sciences are
stored in various online services so that researchers anywhere in
the world can use the information for further analysis and
discoveries in efforts to combat a disease and its underlying
biological events. The rate of discovery of molecular factors
associated with particular diseases is accelerated by the cooperation
and contributions of these online databases. Not only is it now
easier and more convenient than ever to narrow down the potential
molecular candidates for an illness, saving money, labor, and time,
but it is also possible to use the enhanced tools that accompany
each database.
Hence, it is an opportune time to navigate these databases for the
extraction of quality data using suitable tools and algorithms to
achieve higher reliability and predictability in small-scale
investigations. These resources can also be leveraged to accelerate
the initial hit-to-lead phases of drug discovery protocols.
As a biological motivation for using chemogenomic resources,
let us consider the following biological aspects of DNA. DNA
(deoxyribonucleic acid) topology is defined by the intertwining
capacity of two complementary single strands to maintain a
sustainable double helical structure. The configuration of the
complementary strands immediately suggested a replication
mechanism in which each antiparallel strand (the two strands run in
opposite directions, 5′–3′ and 3′–5′) serves as a template for a
daughter strand. Besides maintaining an elegant semiconservative
replication model, DNA requires untwisting of the double helix for
access to and expression of the information deposited in it [1].
It is easy to conceptualize the “coiled-coil” nature of DNA by
considering a rope. The raw strands making up the rope are the raw
information in DNA. How the rope is twisted over itself (spatial
topology) to make it stronger is mirrored in DNA by a second layer
of coiling. In practice, a rope twisted over itself in this way is
strong enough to moor a boat in a harbor. Nuclear DNA is similarly
packaged tightly like such a rope.
Topoisomerases are a family of proteins encoded in all living
beings that play key roles in DNA information fidelity and the
health of an organism. Topoisomerase II (TopII), a dimeric
enzyme, changes the topology of DNA and thus plays essential
roles in diverse DNA transactions such as replication, transcription,
chromosome condensation, and chromosome segregation.
Drugs that inhibit the action of TopII are broadly known as
TopII inhibitors/poisons [2]. Interestingly, topoisomerase inhibitors
form a ternary complex containing the topoisomerase, DNA, and the
inhibitor molecule, and the inhibitor functions to block dissociation
of the complex. This mode of action differs from that of the classical
inhibitor molecule, which binds to a functional site and inhibits the
function of the target (a dimer complex). Etoposide is one such
TopII inhibitor and is widely used in clinical practice; it can lead to
abortive catalysis by the enzyme and increases the level of TopII–DNA
complexes (Fig. 1). This abnormal complex structure is known as a
TopII adduct, and persistence of this type of intermediate produces
a lesion in the genome, impairing the DNA repair pathway as well as
gene expression, which ultimately can lead to cancer and other
diseases. Because etoposide and other TopII inhibitors stall DNA
synthesis and the cell cycle, etoposide is a well-known
chemotherapeutic agent for cancer patients.

Fig. 1 Role of etoposide as DNA topoisomerase II inhibitor
Having given a brief history of the developments in bioactivity
databases and a practical molecular biology context in which the
databases can be utilized, we provide in this chapter resources and
protocols for mining data in several prominent and progressive
online databases, with a special emphasis on examples useful for
chemogenomic research (see Note 1).

2 Materials

Here, a selection of prominent chemogenomic databases is
introduced and overviews are given. The resources are summarized
with Web addresses and example applications in Table 1.
Table 1
Computational chemogenomic data sources reviewed

| Database | World Wide Web address | Application | Protocol subsection | References |
| ChemProt | potentia.cbs.dtu.dk/ChemProt | Annotation and prediction of chemical–protein interactions; in silico association study of small molecules with diseases and chemicals at the molecular level | 3.1 | [3] |
| STITCH | stitch.embl.de | Broad-range display of interaction networks among proteins and small molecules | 3.2 | [4, 5] |
| PubChem | pubchem.ncbi.nlm.nih.gov | Chemical structure, descriptor, and bioassay repository, including links to relevant protein structure and gene information | 3.3 | [6–8] |
| ChEMBL | www.ebi.ac.uk/chembl | Chemical structure and bioassay information, including automated curation of patent information | 3.4 | [9–11] |
| ChEBI | www.ebi.ac.uk/chebi | Biology-driven database with systematic manual annotation based on standardized ontologies | 3.5 | [12] |
| ChemSpider | www.chemspider.com | Convenient chemical structure search tool using common and systematic names; includes links to vendors as well as interactive spectra | 3.6 | [13, 14] |
| PharmGKB | www.pharmgkb.org | Manually annotated knowledge base linking genetic variation, variation-specific therapy, and clinical information | 3.7 | [15, 16] |

2.1 ChemProt

The conventional drug design paradigm, i.e., one drug selectively
interacting with one or two target molecules, has changed drastically
in recent times. Many drugs are now known to be involved in
multiple pathways with diverse interaction partners. To identify the
broad-spectrum interactome of drugs and targets, an integrative
tool that can analyze the whole set of interactions on a single
platform has become a necessity. ChemProt 3 [3] is such a
Web-based, disease-oriented chemical biology tool, which can display
multiple chemical–protein and protein–protein interactions on a
single heatmap. By aggregating data from related databases such as
ChEMBL, DrugBank, BindingDB, STITCH, PharmGKB, and IUPHAR,
ChemProt can assist in the in silico evaluation of small molecules
(drugs, environmental chemicals, and natural products) together with
molecular- and cellular-level phenotypes. Moreover, it enables
navigation of pharmacological space for small molecules based on the
similarity ensemble approach (SEA) [17], which relates protein
pharmacology to ligand bioactivity profiles. SEA organizes proteins
by clustering them based on their bioactivities with respect to a set
of ligands, and can be viewed in one sense as a chemical version of
the well-known BLAST approach for scoring protein homology.
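The set-versus-set comparison behind SEA can be sketched as follows. This is a simplified illustration, not the ChemProt or original SEA implementation: fingerprints are represented as hypothetical sets of "on" bit indices, pairwise similarity is the Tanimoto coefficient, and the raw score sums the pairwise similarities above a threshold (0.57 here is an assumed, illustrative value). The full method additionally converts raw scores into BLAST-like E-values against a background of random ligand sets, which this sketch omits.

```python
from itertools import product
from typing import FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # indices of "on" bits; a hypothetical representation


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def raw_set_score(ligands_1: Iterable[Fingerprint],
                  ligands_2: Iterable[Fingerprint],
                  threshold: float = 0.57) -> float:
    """Sum of pairwise Tanimoto values above `threshold` between two ligand sets.

    Only the raw score is computed; no normalization to an E-value is done.
    """
    total = 0.0
    for fp1, fp2 in product(ligands_1, ligands_2):
        similarity = tanimoto(fp1, fp2)
        if similarity > threshold:
            total += similarity
    return total


# Hypothetical ligand sets annotated to two proteins.
protein_a = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
protein_b = [frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
print(raw_set_score(protein_a, protein_b))
```

Two proteins whose ligands resemble each other accumulate a larger raw score, which is the intuition behind relating targets through their ligand chemistry rather than their sequences.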

2.2 STITCH

Interaction patterns of proteins and small molecules are pivotal
for understanding metabolism and signaling and for the development
of drugs. Although a large amount of chemical–protein and
chemical–chemical interaction data is stored in several databases,
their discrete nature, varied precision (see above regarding protein
bias and measurement consistency), and differing focus make it
cumbersome to assemble a full picture of all available information.
STITCH (stitch.embl.de) is a consolidated search tool which aggregates
high-throughput experimental data, manually curated datasets, and the
results of several prediction methods into a single global network of
protein–protein and protein–chemical interactions (STITCH 4 and
STITCH 5). (STITCH does not include chemical–chemical interaction
links.)
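Once interaction data have been downloaded from STITCH, assembling the chemical–protein part of the network amounts to filtering scored links. The sketch below assumes a tab-separated file whose header contains the columns `chemical`, `protein`, and `combined_score` (scores on a 0–1000 scale); this layout, the identifiers in the sample data, and the cutoff of 700 are assumptions, so check the header of the file you actually download.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def load_network(handle: TextIO, min_score: int = 700) -> Dict[str, Set[str]]:
    """Build an undirected chemical-protein adjacency map from scored links.

    Assumes tab-separated rows with the columns chemical, protein and
    combined_score; links scoring below `min_score` are discarded.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["combined_score"]) >= min_score:
            graph[row["chemical"]].add(row["protein"])
            graph[row["protein"]].add(row["chemical"])
    return dict(graph)


# Hypothetical excerpt in the assumed file format.
sample = io.StringIO(
    "chemical\tprotein\tcombined_score\n"
    "CHEM_A\tPROT_1\t950\n"
    "CHEM_A\tPROT_2\t400\n"
    "CHEM_B\tPROT_1\t720\n"
)
network = load_network(sample)
print(sorted(network["PROT_1"]))  # chemicals linked to PROT_1 above the cutoff
```

For a compressed download, `gzip.open(path, "rt")` yields a text handle that can be passed to `load_network` directly. Raising `min_score` trades coverage for confidence in the retained links.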

2.3 PubChem

PubChem is one of the most prominent public databases, with a special
emphasis on providing the scientific research community with
information about chemical substances, their specific compound
structures, and their biological activities. The database was
launched in 2004 as a public repository hosted by the National
Center for Biotechnology Information (NCBI), a research center
of the National Library of Medicine, which is part of the US
National Institutes of Health (NIH). Having grown continuously for
more than a decade through data deposited by researchers in
academia, industry, and government agencies worldwide, the database
has become massive. At present PubChem comprises three component
databases; though each is dedicated to a specific area, their
contents are interlinked to accelerate further innovation and
discovery. The component databases are summarized next.

2.3.1 PubChem BioAssay Database

The PubChem BioAssay database contains bioactivity screens of
small molecules and RNAi screening data. The bioactivities stored
in each bioassay are indexed by an assay ID (AID) serving as the
primary accession. At present it is a vital and highly comprehensive
information resource for biological screening results contributed by
the NIH Molecular Libraries Program, other public research
organizations, and industrial companies to aid drug discovery and
chemical biology research. It is integrated with the other NCBI
databases, including PubMed, Protein, and Gene, for a unified
approach to data exploration and discovery. Recent developments of
PubChem BioAssay include the expansion of the sources of bioactivity
data, the redesign of the BioAssay record page, the addition of a new
BioAssay classification browser (Fig. 2a), and new features in its
upload system to facilitate data sharing. The database is equipped
with many services to execute and display analyses of bioactivity
data from within a Web browser (Table 2).
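Besides the browser-based services, bioassay records can also be retrieved programmatically through PUG-REST (listed in Table 3), whose URLs follow the pattern `<domain>/<namespace>/<identifiers>/<operation>/<output>`. The sketch below only composes such URLs for the assay domain; the helper names are hypothetical, AID 1485 is reused from Table 2 purely as an example, and the `description` and `summary` operations should be checked against the current PUG-REST documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_url(aid: int, operation: str = "description", output: str = "JSON") -> str:
    """Compose a PUG-REST URL for one BioAssay record identified by its AID."""
    return f"{PUG_REST}/assay/aid/{aid}/{quote(operation)}/{output}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(assay_url(1485))
# Uncomment to query PubChem directly:
# record = fetch_json(assay_url(1485))
```

Keeping URL construction separate from the network call makes the request easy to inspect and to reuse for batches of AIDs.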

Fig. 2 (a) PubChem BioAssay classification browser. (b) Snapshots of “limit search” and “advanced search”
interfaces both in PubChem Substance and PubChem Compound databases

Fig. 2 (continued)

2.3.2 PubChem Substance and Compound Databases

The PubChem Substance database stores information provided by
depositors; thus, a PubChem Substance summary page is based on the
data submitted by an individual depositor. A depositor may be a
pharmaceutical company, an academic laboratory, or a governmental
research institute, to name a few. The raw deposited data are not
subject to quality control or review before public release. The data
include a chemical structure, that is, the arrangement of atoms and
the bonds between them, and may include other packaging- or
delivery-related information, such as the salt form of the substance
used. In contrast, internally reviewed chemical information is stored
in PubChem Compound, which holds the unique, standardized structures
derived from the records in PubChem Substance. In addition, structures
in the PubChem Compound database are preclustered and cross-referenced
by identity and similarity groups. In this database, a compound
summary page displays data organized by NCBI automated data
processing, and in turn serves as a hub of information for each
unique chemical structure.
The primary identifiers for a substance and a compound are the SID
and CID, respectively. A substance identifier (SID) is the permanent
identifier for a depositor-supplied molecule in the PubChem Substance
database. In addition, each SID corresponds to a unique external
registry ID provided by a PubChem data source. A compound identifier
(CID), on the other hand, is the permanent identifier for a unique
chemical structure in the PubChem Compound database. For instance,
each stereoisomer of a compound has its own CID, and different
tautomeric forms of the same compound may also have different CIDs.
There are many tools and services in these two databases (Table 3).
Both subdatabases provide limit-type as well as advanced-type
searches for exploring the data (Fig. 2b).

Table 2
A list of PubChem BioAssay services available as of writing

Database/Services | World Wide Web address/URL | Application
BioAssay search | www.ncbi.nlm.nih.gov/pcassay/ | Enables users to search the BioAssay database with Entrez
BioAssay record page | pubchem.ncbi.nlm.nih.gov/bioassay/1485 | Enables users to access and download a bioassay record
BioAssay advanced and limit search | www.ncbi.nlm.nih.gov/pcassay/advanced; www.ncbi.nlm.nih.gov/pcassay/limits | Interfaces for searching multiple search fields, reviewing search history, and refining search results with Boolean operations
BioAssay FTP | ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/ | FTP for all PubChem BioAssay records and related information
BioAssay data model | ftp://ftp.ncbi.nlm.nih.gov/pubchem/data_spec/ | Standard XML data specification for PubChem BioAssay data
BioAssay classification | https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=classification | To browse the BioAssay classification tree
Bioactivity data tool | https://pubchem.ncbi.nlm.nih.gov/assay/ | To retrieve a full data table from a single bioassay record
Structure–activity relationship (SAR) analysis | https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=heat | To analyze and visualize structure–activity relationships with clustering tools and a heatmap-style display
Dose–response curve tool | https://pubchem.ncbi.nlm.nih.gov/assay/plot.cgi?Plottype=1 | To analyze bioassay test results and visualize dose–response curves
BioActivity summary, compound-centric | https://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=1 | To summarize and analyze bioactivity data for a set of records, presented from the compound point of view
BioActivity summary, assay-centric | https://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=2 | To summarize and analyze bioactivity data for a set of records, presented from the assay point of view
BioActivity summary, target-centric | https://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=3 | To summarize and analyze bioactivity data for a set of records, presented from the target point of view
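The many-to-one relationship between depositor records (SIDs) and unique structures (CIDs) described above can be illustrated with a short sketch. The SID/CID pairs below are hypothetical; in practice the mapping can be requested from PUG-REST, and `sid_to_cid_url` is a helper introduced here for illustration that composes a request of the form `substance/sid/<SIDs>/cids/<format>` (verify against the PUG-REST documentation before relying on it).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sid_to_cid_url(sids: Iterable[int], output: str = "JSON") -> str:
    """Compose a PUG-REST request that maps SIDs to their standardized CIDs."""
    joined = ",".join(str(sid) for sid in sids)
    return f"{PUG_REST}/substance/sid/{joined}/cids/{output}"


def group_by_cid(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Collect depositor records (SIDs) under the unique structure (CID) they map to."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for sid, cid in pairs:
        groups[cid].append(sid)
    return {cid: sorted(sids) for cid, sids in groups.items()}


# Hypothetical pairs: three depositions of one structure, one of another.
mapping = [(101, 9001), (102, 9001), (103, 9002), (104, 9001)]
print(group_by_cid(mapping))
print(sid_to_cid_url([101, 102]))
```

Working at the CID level collapses duplicate depositions, which is usually what a chemogenomic analysis wants; the SID level remains useful when provenance matters.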

2.4 ChEMBL

ChEMBL is a manually curated database of bioactive drug-like
small molecules. It is hosted by the European Bioinformatics
Institute (EBI) of the European Molecular Biology Laboratory
(EMBL). Among many types of information, it provides online
information about the 2D structures and calculated properties
(logP, molecular weight, Lipinski parameters, etc.) of small
molecules, along with per-protein binding constants, multiprotein
pharmacology, and ADMET data. Regarding database content, first,
data are abstracted and curated from the primary scientific
literature, which covers a significant fraction of the SAR studies
and hit finding of modern drugs. Second, a curated linkage between
indexed 2D chemical structures and biological targets is provided,
along with standardization of measurements to common types and
units where possible. Extended information about target accessions
is also included: whether a target was tested as a single protein or
as part of a protein complex, the target's subcellular
localization(s), the cell lines in which the target is expressed
and/or was tested, the tissues where the target is expressed, and
in vivo information on the host organism. In addition to the
literature-extracted information, ChEMBL integrates deposited
screening results from PubChem BioAssay (see above), along with
information on approved drugs, late-stage clinical development
candidates, and drugs with improved efficacy toward specific targets
(specificity). In this way the data are optimized for quality and
utility across a broad range of chemical biology and drug-discovery
research problems. Hence, for a chemogenomics study this database
is trustworthy as well as potentially useful for assembling a large
and reliable base of information for a project.
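Standardizing measurements to common types and units is what makes values from different publications comparable. As an illustration, ChEMBL reports a pChEMBL value, the negative base-10 logarithm of a molar potency such as an IC50 or Ki. The sketch below converts a concentration-type activity to this scale; the short unit table and the function name are assumptions for illustration, not ChEMBL's curation pipeline.

```python
import math

# Conversion factors to molar; only a few common units, chosen for illustration.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_activity(value: float, unit: str) -> float:
    """Return -log10 of a concentration-type activity (e.g. IC50, Ki) in molar."""
    if value <= 0:
        raise ValueError("activity value must be positive")
    try:
        molar = value * TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit}") from None
    return -math.log10(molar)


# The same potency reported in two different units maps to one value.
print(round(p_activity(100, "nM"), 2))  # 7.0
print(round(p_activity(0.1, "uM"), 2))  # 7.0
```

On this logarithmic scale, larger values mean more potent compounds, and a difference of 1 corresponds to a tenfold change in potency.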

2.4.1 Data Content

The data content of this online resource grows continuously; release
22, published in August 2016, contains information extracted from
more than 65,000 scientific articles, along with 50 deposited data
sets (Table 4). More specifically, this release organizes 1,686,695
distinct compounds, of which 1,678,393 (99.5%) have a stored and
available molecular structure. In addition, the release contains more
than 14 million activity values from 1,246,132 assays. In turn, these
assays are mapped to more than 11,000 targets (single targets,
complexes, etc.) encompassing 9052 proteins, of which 4255 are human
proteins.

Table 3
Tools and services in the PubChem Compound and Substance databases

Database/Services | World Wide Web address/URL | Application
Chemical structure search | https://pubchem.ncbi.nlm.nih.gov/search/search.cgi | Allows users to query the PubChem Compound database by chemical structure or chemical structure pattern
Chemical structure sketcher | https://pubchem.ncbi.nlm.nih.gov/edit/ | A platform-independent 2D molecule drawer, compatible with major Web browsers
Standardization service | https://pubchem.ncbi.nlm.nih.gov/standardize/ | Validates and normalizes an input chemical structure in the same way as the PubChem standardization process
Classification browser | https://pubchem.ncbi.nlm.nih.gov/classification/ | Allows users to browse PubChem data using a classification of interest, or to search for records annotated with the desired classification/term
Identifier exchange service | https://pubchem.ncbi.nlm.nih.gov/idexchange/ | Converts one type of identifier for a given set of chemical structures into a different type of identifier for identical or similar chemical structures
Score matrix service | https://pubchem.ncbi.nlm.nih.gov/score_matrix/ | Computes matrices of 2D and 3D similarity scores for a given set of compounds
Structure clustering | https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=clustering | Clusters compounds/substances based on their structural similarity using the single-linkage algorithm
Widgets | https://pubchem.ncbi.nlm.nih.gov/widget/docs/ | Provides a rapid way to display some commonly requested PubChem data views
Web-based 3D viewer | https://pubchem.ncbi.nlm.nih.gov/vw3d/ | An interactive Web-based viewer for 3D conformations of molecules, which visualizes 3D information available within PubChem
Pc3D viewer | https://pubchem.ncbi.nlm.nih.gov/pc3d/ | An interactive 3D molecular viewer that can be downloaded and installed on local machines
Structure download | https://pubchem.ncbi.nlm.nih.gov/pc_fetch/ | Downloads a set of substance or compound records in PubChem
Power User Gateway (PUG) | https://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html | Provides programmatic access to PubChem services via a single common gateway interface (CGI), called "pug.cgi"
PUG-REST | https://pubchem.ncbi.nlm.nih.gov/pug_rest/ | A representational state transfer (REST)-style Web service access layer to PubChem
PUG-SOAP | https://pubchem.ncbi.nlm.nih.gov/pug_soap/ | A Web service access method that uses the simple object access protocol (SOAP)
PubChemRDF | https://pubchem.ncbi.nlm.nih.gov/rdf/ | The RDF-based resource compatible with Semantic Web standards and technologies

Table 4
Data sources included in ChEMBL release 22

Short name | Source | No. of compounds | No. of assays | No. of activities
LITERATURE | Scientific literature | 967,242 | 963,186 | 5,635,084
PUBCHEM BIOASSAY | PubChem BioAssays | 489,575 | 2937 | 7,559,601
BINDINGDB | BindingDB database | 68,149 | 1317 | 99,061
SUPPLEMENTARY | Deposited supplementary bioactivity data | 1786 | 13 | 4817
CANDIDATES | Clinical candidates | 1633 | 0 | 0
TP TRANSPORTER | TP-search transporter database | 1434 | 3592 | 6765
DRUGMATRIX | DrugMatrix | 930 | 113,678 | 350,929
METABOLISM | Curated drug metabolism pathways | 828 | 0 | 0
ATLAS | Gene Expression Atlas compounds | 378 | 0 | 0
GSK PKIS | GSK published kinase inhibitor set | 366 | 456 | 169,451
SANGER | Sanger Institute Genomics of Drug Sensitivity in Cancer | 137 | 714 | 73,169
FDA APPROVAL | FDA approval packages | 43 | 1386 | 1387
HARVARD | Harvard malaria screening | 37 | 4 | 111

2.4.2 Data Access

ChEMBL is accessible from the European Bioinformatics Institute
(EMBL-EBI) home page under the service section on Tools and
Databases (see Table 1). The ChEMBL interface supports simple
browsing with keyword text searches (Fig. 3). This interface
provides access to versatile tools: the primary ChEMBL database,
which provides bioactivity data to facilitate drug discovery;
SureChEMBL, dedicated to chemical structures from patents; and
UniChem, which is useful for integrating chemical structures across
a number of public sources. In addition, the SARfari collections
provide system-level views of kinases, GPCRs, and ADME biology, and
DrugEBIlity provides users with a way to prioritize drug targets.
Together, these tools make data access, exploration, retrieval, and
analysis more user-friendly and systematic for compounds, targets,
or assays deposited in ChEMBL.
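Beyond the Web interface, ChEMBL data can also be queried programmatically through the EMBL-EBI ChEMBL Web services, which expose resources such as `molecule` and `activity` under `https://www.ebi.ac.uk/chembl/api/data/`. The sketch below only builds query URLs; the resource names and the filter suffix `__iexact` reflect our understanding of the Web services conventions and should be checked against the current documentation, and `CHEMBL_ID_HERE` is a placeholder, not a real identifier.

```python
from urllib.parse import urlencode

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_query_url(resource: str, fmt: str = "json", **filters: str) -> str:
    """Compose a ChEMBL Web services URL for `resource` with field filters."""
    params = dict(filters, format=fmt)  # filters first, then the output format
    return f"{CHEMBL_API}/{resource}?{urlencode(params)}"


# Look up a molecule by preferred name (case-insensitive match).
print(chembl_query_url("molecule", pref_name__iexact="etoposide"))
# Retrieve activity records for one molecule; the ID is a placeholder.
print(chembl_query_url("activity", molecule_chembl_id="CHEMBL_ID_HERE"))
```

Using `urlencode` takes care of escaping, so compound names containing spaces or special characters can be passed unchanged.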

Fig. 3 The ChEMBL interface

2.5 ChEBI

2.5.1 Overview of Database

Chemical Entities of Biological Interest, also known as ChEBI [12],
is maintained by EMBL-EBI. This database manually annotates
small molecular entities, where a molecular entity is defined as any
constitutionally or isotopically distinct atom, molecule, ion, ion
pair, radical, radical ion, complex, conformer, etc., identifiable as a
separately distinguishable entity.
This database provides information on molecules based on their
chemical structure and nomenclature. An ontology is used to describe
the relations among different molecules. For example, if A, B, and C
are three compounds, the relations might be that A is a conjugate
acid of B, and B is a tautomer of C. For nomenclature and
terminology, ChEBI follows the guidelines of the International Union
of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee
of the International Union of Biochemistry and Molecular Biology
(NC-IUBMB).
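The ontology relations described above can be represented as subject–relation–object triples. The sketch below encodes the A/B/C example from the text; the relation labels and the inverse table are illustrative assumptions modeled on the wording above, not ChEBI's data model or API, and the point is only to show how such relations can be queried from either end.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The example from the text: A is a conjugate acid of B, B is a tautomer of C.
TRIPLES: List[Triple] = [
    ("A", "is_conjugate_acid_of", "B"),
    ("B", "is_tautomer_of", "C"),
]

# Inverse labels so that a relation can be read from its object's side.
INVERSE = {
    "is_conjugate_acid_of": "is_conjugate_base_of",
    "is_tautomer_of": "is_tautomer_of",  # tautomerism is symmetric
}


def relations_of(entity: str, triples: List[Triple]) -> List[Tuple[str, str]]:
    """List (relation, other entity) pairs involving `entity` in either direction."""
    found = []
    for subject, relation, obj in triples:
        if subject == entity:
            found.append((relation, obj))
        elif obj == entity:
            found.append((INVERSE[relation], subject))
    return found


print(relations_of("B", TRIPLES))
```

Storing each relation once and deriving its inverse on demand keeps the data consistent, since a conjugate acid/base pair cannot be recorded in only one direction.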

2.5.2 Data Access

The address of the home page is https://www.ebi.ac.uk/chebi/. The
home page provides interfaces for browsing the database, and a
search box is located at the top middle of the page.

Searching ChEBI

There are two types of search in ChEBI. One is the quick search,
where simply a keyword for a compound, e.g., “etoposide,” is provided
as input; this is the most convenient. The other is the advanced
search, where a structure is drawn and additional molecular
parameters are added to search the database.

2.6 ChemSpider

2.6.1 Overview of Database

ChemSpider [13] was initially developed with the goal of accumulating
and indexing the available sources of chemical structures and their
respective information in a single database.
Since its launch in 2007 as a structure-oriented platform for
chemists, ChemSpider has grown to contain more than 58 million unique
chemical structures derived from 484 sources, ranging from chemical
vendors to commercial database vendors and publishers, and members of
the Open Notebook Science community. Through interlinked connections,
ChemSpider can provide important data beyond chemical structure,
including interactive spectra, crystallographic data, patents, and so
forth.

2.6.2 Database Access

To access the database, a Web browser is needed; visiting the
following link takes the user to the ChemSpider home page:
http://www.chemspider.com/.

Searching in ChemSpider

Three types of searches can be performed in ChemSpider: Simple,
Structure-based, and Advanced, where Advanced is a combination
of the first two.
In the Simple search, a plain keyword is used. The latter two are
more complex, combining keywords with structures and additional
molecular parameters.

2.7 PharmGKB Databases

2.7.1 Database Overview

PharmGKB [15] is a pharmacogenomics knowledge resource which curates
the pharmacogenomic data of different drugs. A strength of this
database is that it relates drugs to the metabolic pathways and genes
involved, and it also describes the impact of potential genetic
variation on each drug. The data sources in PharmGKB and their
relations are illustrated in Fig. 4.

2.7.2 Database Access Method

PharmGKB is hosted at the following address:
https://www.pharmgkb.org/index.jsp. By entering this address in a Web
browser, the user can visit the PharmGKB home page.

Searching in PharmGKB

In PharmGKB, several types of keywords, ranging from drug/chemical
names to gene names, variant loci, or phenotypes, can be used as
search criteria. Any such keyword can be entered in the search box to
perform a search. Additionally, the data are arranged by drug
name/labels, related pathways, or dosing guidelines. From the home
page, these links lead to the corresponding arrangement of the data
appropriate to the aspect being investigated (see Note 2).

Fig. 4 Interconnected data sources of PharmGKB. Adapted from [16]

3 Methods

3.1 ChemProt

ChemProt is a Web-based resource of annotated and predicted
protein–protein and chemical–protein interactions which can display
multiple interactions on a single heatmap.
The following steps briefly describe the data mining procedure in
ChemProt, using etoposide as an example.

3.1.1 Searching Data

A user can search ChemProt by typing a compound name in the
“compound” field, by protein sequence or UniProt identifier, by a
common disease name, by a side effect, or by ATC (Anatomical
Therapeutic Chemical Classification System) code (Fig. 5a). The
output varies according to the search option; for instance, if
etoposide is searched as a query compound, ChemProt automatically
looks for similar compounds in the database (based on SEA) and
displays their data in conjunction with those for etoposide. In
Fig. 5b, the heatmap represents the combined data for
etoposide–protein interactions, where the horizontal axis represents
associated proteins and the vertical axis represents bioactivity
data. The color of the heatmap represents the strength of
interaction, i.e., blue and orange represent weak and strong
interactions, respectively (Fig. 5b). See Note 3 for generating a new
heatmap based on substructures within a query compound and target
collection.
On the other hand, searching by side effect or ATC code will
return all chemicals in the database associated with that side effect
or ATC code, respectively. Similarly, searching for a disease will
return all proteins associated with the disease. These types of
search also return heatmaps containing the associated proteins and
bioactivities. All of this functionality is provided in a single,
unified search box.

Fig. 5 (a) Home page of ChemProt with etoposide as a query compound. (b) The etoposide–protein interaction
heatmap for disease-associated proteins. Here the horizontal axis represents associated proteins and the
vertical axis represents bioactivity data. Colors of the heatmap represent the strength of interaction, i.e., blue
and orange colors represent weak and strong interactions, respectively

Fig. 5 (continued)

3.1.2 Analyze the Heatmap Data

The heatmap in Fig. 5b displays the interactions of etoposide with
disease-associated proteins. To navigate functionally related or
pathway-related protein associations of the query compound, the user
selects the corresponding option from the annotated protein bar.
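A heatmap of this kind is, in essence, a compound-by-protein matrix of bioactivity values. The sketch below pivots hypothetical (compound, protein, pActivity) records into such a matrix and labels each cell as weak or strong, loosely mirroring the blue/orange coloring. The records, the use of a pActivity scale, and the cutoff of 6.0 are assumptions for illustration and do not reproduce ChemProt's own color scale.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, float]  # (compound, protein, pActivity)


def pivot(records: List[Record]) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Arrange records as a compound x protein matrix; missing pairs become None."""
    compounds = sorted({c for c, _, _ in records})
    proteins = sorted({p for _, p, _ in records})
    lookup: Dict[Tuple[str, str], float] = {(c, p): v for c, p, v in records}
    matrix = [[lookup.get((c, p)) for p in proteins] for c in compounds]
    return compounds, proteins, matrix


def label(value: Optional[float], cutoff: float = 6.0) -> str:
    """Map a cell to a display category, analogous to blue (weak) / orange (strong)."""
    if value is None:
        return "."
    return "strong" if value >= cutoff else "weak"


# Hypothetical records for a query compound and one similar compound.
records = [("query", "P1", 7.2), ("query", "P2", 5.1), ("similar", "P1", 6.4)]
compounds, proteins, matrix = pivot(records)
for name, row in zip(compounds, matrix):
    print(name, [label(v) for v in row])
```

Keeping missing pairs as `None` rather than zero distinguishes "not measured" from "inactive", which matters when comparing a query compound with its neighbors.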
By clicking on the “flag” logo next to a compound name, a user
can access the chemical structure of the compound; upon selecting a
specific structure from the structure list, detailed chemical
information for the queried compound appears (see Note 4). For the
etoposide example, the chemical information found is shown in Fig. 6.
By clicking on the “fingerprint” logo near the compound name, the
user can perform chemical structure similarity profiling, which makes
it possible to visualize and navigate within that chemical space.
A detailed bioactivity profile based on Ki, AC50, or IC50 values is
available for each compound listed in ChemProt. For the etoposide
example, the available bioactivity information, including the total
number of associated proteins and interactions with etoposide, is
shown in Fig. 7.

Fig. 6 Basic chemical data for etoposide in ChemProt

Fig. 7 ChemProt annotation information about interactions with etoposide

External database information is available via the “Database Info”
icon. For the etoposide query, the majority of the data are acquired
from ChEMBL, while the rest come from DrugBank and BindingDB
(Fig. 8).

3.1.3 Data Acquisition

From the “Download list” icon, a user can download all of the
available data in CSV format; this covers the data sources, ChemProt
ID, chemical structure in SMILES notation, UniProt name, SEA values,
and other related information for the queried compound as well as
for the similar compounds listed.
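After download, the CSV file can be processed with standard tools, for example to count how many proteins each listed compound is associated with. In the sketch below the column names `compound`, `uniprot`, and `source` are hypothetical placeholders, so check the header of the file actually exported from ChemProt and adjust them. Pairs are deduplicated first, on the assumption that the same compound–protein pair may appear once per source database.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Hypothetical column names; replace with those in the downloaded file's header.
COMPOUND_COL, PROTEIN_COL = "compound", "uniprot"


def proteins_per_compound(handle: TextIO) -> Counter:
    """Count distinct proteins per compound, ignoring repeated compound-protein pairs."""
    pairs = {(row[COMPOUND_COL], row[PROTEIN_COL]) for row in csv.DictReader(handle)}
    return Counter(compound for compound, _ in pairs)


# Hypothetical excerpt standing in for the exported file.
sample = io.StringIO(
    "compound,uniprot,source\n"
    "cmpd_1,PROT_X,ChEMBL\n"
    "cmpd_1,PROT_X,BindingDB\n"
    "cmpd_1,PROT_Y,ChEMBL\n"
    "cmpd_2,PROT_X,DrugBank\n"
)
print(proteins_per_compound(sample).most_common())
```

For a file on disk, open it with `open(path, newline="")` as recommended for the `csv` module and pass the handle in place of `sample`.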
Another random document with
no related content on Scribd:
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law in
the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name
associated with the work. You can easily comply with the terms of
this agreement by keeping this work in the same format with its
attached full Project Gutenberg™ License when you share it without
charge with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the terms
of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other


immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears, or
with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:
This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is derived


from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is posted


with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning of
this work.

1.E.4. Do not unlink or detach or remove the full Project


Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute this


electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1 with
active links or immediate access to the full terms of the Project
Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or expense
to the user, provide a copy, a means of exporting a copy, or a means
of obtaining a copy upon request, of the work in its original “Plain
Vanilla ASCII” or other form. Any alternate format must include the
full Project Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,


performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or providing


access to or distributing Project Gutenberg™ electronic works
provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who


notifies you in writing (or by e-mail) within 30 days of receipt that
s/he does not agree to the terms of the full Project Gutenberg™
License. You must require such a user to return or destroy all
copies of the works possessed in a physical medium and
discontinue all use of and all access to other copies of Project
Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of


any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™


electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend


considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except


for the “Right of Replacement or Refund” described in paragraph
1.F.3, the Project Gutenberg Literary Archive Foundation, the owner
of the Project Gutenberg™ trademark, and any other party
distributing a Project Gutenberg™ electronic work under this
agreement, disclaim all liability to you for damages, costs and
expenses, including legal fees. YOU AGREE THAT YOU HAVE NO
REMEDIES FOR NEGLIGENCE, STRICT LIABILITY, BREACH OF
WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE
PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE THAT THE
FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you


discover a defect in this electronic work within 90 days of receiving it,
you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or entity
that provided you with the defective work may elect to provide a
replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.

1.F.4. Except for the limited right of replacement or refund set forth in
paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and distribution
of Project Gutenberg™ electronic works, harmless from all liability,
costs and expenses, including legal fees, that arise directly or
indirectly from any of the following which you do or cause to occur:
(a) distribution of this or any Project Gutenberg™ work, (b)
alteration, modification, or additions or deletions to any Project
Gutenberg™ work, and (c) any Defect you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation

The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.

The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up-to-date
contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact.

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many small
donations ($1 to $5,000) are particularly important to maintaining tax
exempt status with the IRS.

The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.

Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.

Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.
