Omics-Based On Science, Technology, and Applications Omics
The word omics refers to a field of study in biological sciences that ends with -omics, such as
genomics, transcriptomics, proteomics, or metabolomics. The ending -ome is used to address the
objects of study of such fields, such as the genome, proteome, transcriptome, or metabolome,
respectively.
The study of the genome, proteome, and metabolome is called omics. ... These include their role in
determining soil microbial ecology, controlling environmental pollution, identification of
chemicals-induced toxicity and modification of geno-proteome composition of an organism.
Omics aims at the collective characterization and quantification of pools of biological
molecules that translate into the structure, function, and dynamics of an organism or
organisms. ... It combines different -omics techniques such as transcriptomics and proteomics
with saturated mutant collections.
What is Omics
1. Is the part of biotechnology which analyzes the structure and functions of the whole
makeup of a given biological function, at different levels, including the molecular gene level
(“genomics”), the protein level (“proteomics”), and the metabolic level (“metabolomics”). Learn
more in: Risk-Benefit Evaluation in Clinical Research Practice
2. Due to the recent advent of highly parallel assays, it is now possible to monitor the behavior
of not just one or a couple of variables but rather tens of thousands of variables at once. A
growing number of disciplines with the ‘-omics’ suffix like genomics, transcriptomics,
metabolomics and so on, intend to describe and understand completely a given level. Learn more
in: Systems Biology Applied to Cancer Research
3. A useful concept in biology which informally annotates a field of study ending in ‘-omics’.
Omics aims at the collective characterization and quantification of pools of biological molecules
that translate into the structure, dynamics and function, of an organism. Accordingly ‘genomics’
deals with the entirety of an organism's hereditary information coded in its DNA (also called
genome); ‘transcriptomics’ deals with the entirety of RNA transcribed from the DNA
(transcriptome), ‘proteomics’ deals with the entirety of proteins translated from the mRNA
(proteome) and ‘epigenomics’ addresses factors and mechanisms affecting the accessibility of
genomic information by modifications of its structure, e.g. via DNA-methylation or chemical
modifications of the histones serving as DNA-packing proteins (epigenome). Historically, the
first ‘omics’ term used was ‘genome’ created in 1920 by the botanist H. Winkler as a blend of
the words ‘gene’ and ‘chromosome’ to annotate the chromosome set as the material foundations
of an organism. In recent years, however, there has been an inflation of ‘omics’ terms, often used
simply to annotate any field of study.
Since the process of mapping and sequencing the human genome began, new technologies have
made it possible to obtain a huge number of molecular measurements within a tissue or cell.
These technologies can be applied to a biological system of interest to obtain a snapshot of the
underlying biology at a resolution that has never before been possible. Broadly speaking, the
scientific fields associated with measuring such biological molecules in a high-throughput way
are called “omics.”
Many areas of research can be classified as omics. Examples include proteomics,
transcriptomics, genomics, metabolomics, lipidomics, and epigenomics, which correspond to
global analyses of proteins, RNA, genes, metabolites, lipids, and methylated DNA or modified
histone proteins in chromosomes, respectively. There are many motivations for conducting
omics research. One common reason is to obtain a comprehensive understanding of the
biological system under study. For instance, one might perform a proteomics study on normal
human kidney tissues to better understand protein activity, functional pathways, and protein
interactions in the kidney. Another common goal of omics studies is to associate the omics-based
molecular measurements with a clinical outcome of interest, such as prostate cancer survival
time, risk of breast cancer recurrence, or response to therapy. The rationale is that
omics-based measurements offer the potential to develop a predictive or prognostic model of a
particular condition or disease, namely an omics-based test (see definition in the
Introduction), that is more accurate than can be obtained using standard clinical approaches.
This report focuses on the stages of omics-based test development that should occur prior to
use to direct treatment choice in a clinical trial. In this chapter, the discovery phase (see Figures
2-1 and S-1) of the recommended omics-based test development process is discussed, beginning
with examples of specific types of omics studies and the technologies involved, followed by the
statistical, computational, and bioinformatics challenges that arise in the analysis of omics data.
Some of these challenges are unique to omics data, whereas others relate to fundamental
principles of good scientific research. The chapter begins with an overview of the types of omics
data and a discussion of emerging directions for omics research as they relate to the discovery
and future development of omics-based tests for clinical use.
Omics-based test development process, highlighting the discovery phase. In the discovery phase,
a candidate test is developed, precisely defined, and confirmed.
Modern drug discovery, together with the commensurate need to better understand and define
disease, uses bioinformatics techniques to gather information from multiple sources (such as the
Human Genome Project (HGP), functional genomic studies, proteomics, phenotyping, patient medical
records, and bioassay results including toxicology studies), integrate the data, apply algorithms
developed in the life sciences, and generate useful target identification and drug lead identification data. As seen in
Fig. 8.2, the hierarchy of information collection goes well beyond the biodata contained in the
genetic code that is transcribed and translated. A recent National Research Council report for the
US National Academies entitled “Toward Precision Medicine: Building a Knowledge Network
for Biomedical Research and a New Taxonomy of Disease” calls for a new data network that
integrates emerging research on the molecular basis of diseases with the clinical data from
individual patients to drive the development of a more accurate taxonomy of disease that
ultimately improves disease diagnosis and patient outcomes (U.S. National Academies 2011).
The report notes that challenges include both scientific (technical advances needed to correlate
genetic and environmental findings with incidence of disease) and legal and ethical challenges
(privacy issues, electronic health records or EHR, etc.).
Figure 8.2
Examples of the types of omics data that can be used to develop an omics-based test are
discussed below. This list is by no means meant to be comprehensive, and indeed a
comprehensive list would be impossible because new omics technologies are rapidly developing.
Genomics
The genome is the complete sequence of DNA in a cell or organism. This genetic
material may be found in the cell nucleus or in other organelles, such as mitochondria.
With the exception of mutations and chromosomal rearrangements, the genome of an
organism remains essentially constant over time. Complete or partial DNA sequence can
be assayed using various experimental platforms, including single nucleotide
polymorphism (SNP) chips and DNA sequencing technology. SNP chips are arrays of
thousands of oligonucleotide probes that hybridize (or bind) to specific DNA sequences
in which nucleotide variants are known to occur. Only known sequence variants can be
assayed using SNP chips, and in practice only common variants are assayed in this way.
Genomic analysis also can detect insertions and deletions and copy number variation,
referring to loss of or amplification of the expected two copies of each gene (one from the
mother and one from the father at each gene locus). Personal genome sequencing is a
more recent and powerful technology, which allows for direct and complete sequencing
of genomes and transcriptomes (see below). DNA also can be modified by methylation of
cytosines (see Epigenomics, below). There is also an emerging interest in using genomics
technologies to study the impact of an individual’s microbiome (the aggregate of
microorganisms that reside within the human body) in health and disease (Honda and
Littman, 2011; Kinros et al., 2011; Tilg and Kaser, 2011).
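The idea of copy number variation as a departure from the expected two copies of a gene can be illustrated with a short calculation. The sketch below is only an illustration: the function names and the tolerance cutoff are hypothetical choices made for this example, not values taken from any particular genomics platform.

```python
import math

EXPECTED_COPIES = 2  # one maternal and one paternal copy at each gene locus


def copy_number_log2_ratio(observed_copies: float) -> float:
    """Return log2(observed / expected); 0.0 means the expected two copies."""
    if observed_copies <= 0:
        raise ValueError("a complete loss has no finite log2 ratio")
    return math.log2(observed_copies / EXPECTED_COPIES)


def classify_locus(observed_copies: float, tolerance: float = 0.3) -> str:
    """Label a locus 'loss', 'neutral', or 'gain'.

    `tolerance` is a hypothetical cutoff on the log2 ratio, chosen
    only for illustration.
    """
    if observed_copies <= 0:
        return "loss"
    ratio = copy_number_log2_ratio(observed_copies)
    if ratio < -tolerance:
        return "loss"
    if ratio > tolerance:
        return "gain"
    return "neutral"
```

With these assumptions, a locus with one copy is labeled a loss, two copies is neutral, and three or more copies is a gain (amplification).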
Transcriptomics
The transcriptome is the complete set of RNA transcripts from DNA in a cell or tissue.
The transcriptome includes ribosomal RNA (rRNA), messenger RNA (mRNA), transfer
RNA (tRNA), micro RNA (miRNA), and other non-coding RNA (ncRNA). In humans,
only 1.5 to 2 percent of the genome is represented in the transcriptome as protein-coding
genes. The two dominant classes of measurement technologies for the transcriptome are
microarrays and RNA sequencing (RNAseq). Microarrays are based on oligonucleotide
probes that hybridize to specific RNA transcripts. RNAseq is a much more recent
approach, which allows for direct sequencing of RNAs without the need for probes.
Oncotype DX, MammaPrint, Tissue of Origin, AlloMap, CorusCAD, and the Duke case
studies described in Appendix A and B all involve transcriptomics-based tests.
Proteomics
The proteome is the complete set of proteins expressed by a cell, tissue, or organism. The
proteome is inherently quite complex because proteins can undergo posttranslational
modifications (glycosylation, phosphorylation, acetylation, ubiquitylation, and many
other modifications to the amino acids comprising proteins), have different spatial
configurations and intracellular localizations, and interact with other proteins as well as
other molecules. This complexity can lead to challenges in proteomics-based test
development. The proteome can be assayed using mass spectrometry and protein
microarrays (reviewed in Ahrens et al., 2010; Wolf-Yadlin et al., 2009). Unlike RNA
transcripts, proteins do not have obvious complementary binding partners, so the
identification and characterization of capture agents is critical to the success of protein
arrays. The Ova1 and OvaCheck tests discussed in Appendix A are proteomics-based
tests.
Epigenomics
The epigenome consists of reversible chemical modifications to the DNA, or to the
histones that bind DNA, and produce changes in the expression of genes without altering
their base sequence. Epigenomic modifications can occur in a tissue-specific manner, in
response to environmental factors, or in the development of disease states, and can persist
across generations. The epigenome can vary substantially among different cell types
within the same organism. Biochemically, epigenetic changes that are measured at high-
throughput belong to two categories: methylation of DNA cytosine residues (at CpG) and
multiple kinds of modifications of specific histone proteins in the chromosomes (histone
marks). RNA editing is another mechanism for epigenetic changes in gene expression,
measured primarily by transcriptomic methods (Maas, 2010).
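To make the notion of cytosine methylation at CpG sites concrete, the following minimal sketch locates CpG dinucleotides in a DNA string and reports what fraction of them are marked as methylated. The input format (a set of methylated positions) and the function names are assumptions made for this example; real assays report methylation in platform-specific ways.

```python
from typing import Iterable, List


def find_cpg_sites(sequence: str) -> List[int]:
    """Return the 0-based position of the C in every CG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def fraction_methylated(sequence: str, methylated_positions: Iterable[int]) -> float:
    """Fraction of CpG cytosines that appear in `methylated_positions`.

    `methylated_positions` is a hypothetical input: 0-based positions of
    cytosines called as methylated by some upstream assay.
    """
    sites = find_cpg_sites(sequence)
    if not sites:
        raise ValueError("sequence contains no CpG sites")
    methylated = set(methylated_positions)
    return sum(1 for pos in sites if pos in methylated) / len(sites)
```

For the sequence "ACGTCGCG", the CpG cytosines sit at positions 1, 4, and 6.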
Metabolomics
The metabolome is the complete set of small molecule metabolites found within a
biological sample (including metabolic intermediates in carbohydrate, lipid, amino acid,
nucleic acid, and other biochemical pathways, along with hormones and other signaling
molecules, as well as exogenous substances such as drugs and their metabolites). The
metabolome is dynamic and can vary within a single organism and among organisms of
the same species because of many factors such as changes in diet, stress, physical
activity, pharmacological effects, and disease. The components of the metabolome can be
measured with mass spectrometry (reviewed in Weckwerth, 2003) as well as by nuclear
magnetic resonance spectroscopy (Zhang et al., 2011). This method also can be used to
study the lipidome (reviewed in Seppanen-Laakso and Oresic, 2009), which is the
complete set of lipids in a biological sample.
Synthetic biology and metabolic engineering can benefit from systems biology
approaches, which in turn rely on “omics” technologies to characterize and understand
metabolic networks. The considerable amount of knowledge obtained from omics-driven
systems biology experiments can be used in the development of synthetic biology tools
and the advancement of metabolic engineering. This facilitates the manipulation of
complex biological systems toward establishing robust industrial biomanufacturing
platforms (Baidoo and Teixeira Benites, 2019).
Figure 1
An overview of the flow of molecular information from genes to metabolites to function
and phenotype, and the interactions between the “omes” and the “omics” techniques used
to measure them.
In comparison to metabolites and proteins, genes are less chemically heterogeneous. Each
gene is made up of DNA that is composed of only four basic nucleotides (i.e., guanine,
adenine, cytosine and thymine), whereas each protein is composed of a mixture of 20 standard
amino acids, while metabolites are much more variable in their chemical structures
(Wang et al., 2010). Therefore, it is analytically less challenging to perform genomics
and transcriptomics, when compared to proteomics and metabolomics (Aizat et al.,
2018). Consequently, genomics and transcriptomics provide the most comprehensive and
robust platforms for biotechnology applications. However, over the past few decades, research has
shown that genomics and transcriptomics cannot solely provide a complete description of
complex biological systems as genetic information can produce more questions than
answers. For instance, genomics can describe genes and their interactions (measure
genotype) but cannot explain phenotypes. Thus, the attention is turned to the utilization
of other “omics” techniques, such as proteomics and metabolomics, which can bridge the
gap between genetic potential and final phenotype to facilitate a greater understanding of
biological systems (Smith and Figeys, 2006; Wilmes et al., 2015). While transcriptomics
(transcription) and proteomics (translation) provide information on gene expression, the
latter directly links genotype to phenotype. In addition to providing phenotypic
information, the metabolome provides an instant response to genetic and/or
environmental perturbations and, therefore, provides a snapshot of the actual metabolic
and physiological state of a cell (Tang, 2011). However, metabolomics alone is not able
to measure changes at the gene level and correlate them with the observable properties of
organisms, the phenotypes, which are produced by the genotype in the first place (Fiehn,
2001). Therefore, a comprehensive understanding of an organism on a molecular level
requires the integration of “omics” data in order to discover new molecules and pathways
(Wang et al., 2010) (Figure 1). Integration of “omics” data helps to assess the flow of
information from one “omics” level to the other and, therefore, links genotype to
phenotype (Subramanian et al., 2020). Furthermore, the combination of “omics”
techniques is important to address open biological questions (i.e., data driven research)
that accelerate our understanding of the system as a whole and boost the use of systems
metabolic engineering tools in industrial settings (Zhao et al., 2020).
Metagenomics studies the structure and function of genetic material in complex samples
of multi-organisms as well as of entire microbial communities without a cultivation step
and can offer a solution for such challenges and facilitate the discovery of novel genes,
enzymes, and metabolic pathways. Metagenomics analyses are classified as sequence-
based and function-based screening, which are used to discover and identify,
respectively, novel natural genes and compounds from environmental samples
(Chistoserdova, 2010; Gilbert and Heiner, 2015; Kumar Awasthi et al., 2020). For
example, metagenomics is actively used in agricultural research to understand the
microbial communities in the soil system (Durot et al., 2009), to examine various
microbes that can stimulate the cycling of macro- and micro-nutrients, and the release of
essential enzymes, which enhance crop production (Cupples, 2005).
Proteomics
Proteomics focuses on the analysis of proteins and peptides produced by cells at different
stages of development and life cycle and in biological systems under a given growth
condition. Proteomics is also used to elucidate the temporal dynamics of protein
expression levels or post-translational modification (PTM) (VerBerkmoes et al., 2009).
Accurate determination of protein structure helps to define their roles and functions in
biological systems. However, many folded proteins have complex structures, which
complicates their structural elucidation (Yates, 2019). Therefore, cryogenic electron
microscopy and ion-mobility-MS are used to determine the structures of such proteins
(Yates, 2019). Moreover, a combination of MALDI, high resolution MS
(i.e., orbitrap and ion trap MS) and a UV–Vis-based reduction assay is used to elucidate
peptide modification via the analysis of specific fragmentation of synthesized peptides,
which might have inhibitory effects on various diseases (Rühl et al., 2019).
The identification of PTM peptides can be difficult in the case of labile modifications
(e.g., phosphorylation and S-nitrosylation) that might break down during MS/MS
fragmentation. Such modifications require soft fragmentation and high-resolution
methods to identify and determine the location of a PTM. Electron-transfer dissociation is
considered to be the favorable choice for the identification of labile PTMs as it transfers
electrons to multi-protonated proteins or peptides, which leads to N-Cα backbone bond
cleavage (Chen et al., 2017).
Protein Quantification
Besides identification, MS-based technologies have become the tools of choice for the
quantification of proteins in an organism (Karpievitch et al., 2010). Stable isotope
labeling is one approach that can be used to quantify proteins by measuring the relative
abundance of labeled protein to non-labeled protein (VerBerkmoes et al., 2009).
However, the variation in ionization efficiency among peptides and proteins and the low
recovery of some peptides (e.g., hydrophobic peptides adhere to surfaces) can affect the
accuracy of their direct quantification. Recent advances in MS acquisition rate, detection,
and resolution have addressed many of the sensitivity concerns of MS-based
quantification for proteomics (Iwamoto and Shimada, 2018). MS sensitivity was further
enhanced with the application of micro-flow (Krisp et al., 2015; Bian et al., 2020) and
nano-flow (Wilson et al., 2015) LC-MS. Another major advancement for global protein
quantification was the introduction of isobaric tags or multiplexed proteomics, which in a
single experiment enables the quantification of proteins across multiple samples
(Pappireddi et al., 2019). Tandem mass tags are commonly used isobaric tags and have been
applied, for instance, to human cerebrospinal fluid (Dayon et al., 2008).
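The stable isotope labeling approach described above comes down to a ratio of labeled to non-labeled signal. The sketch below shows that arithmetic under stated assumptions: the input is a list of (labeled, unlabeled) intensity pairs for the peptides of one protein, and peptide ratios are summarized by their median. The median is an illustrative choice made here, not a method prescribed by the cited works.

```python
from statistics import median
from typing import Iterable, Tuple


def peptide_ratio(labeled_intensity: float, unlabeled_intensity: float) -> float:
    """Relative abundance of the labeled form versus the non-labeled form."""
    if unlabeled_intensity <= 0:
        raise ValueError("unlabeled intensity must be positive")
    return labeled_intensity / unlabeled_intensity


def protein_ratio(peptide_pairs: Iterable[Tuple[float, float]]) -> float:
    """Summarize a protein by the median of its peptide ratios.

    Using the median (an assumption for this sketch) limits the influence
    of a single poorly recovered or poorly ionized peptide.
    """
    return median(peptide_ratio(lab, unlab) for lab, unlab in peptide_pairs)
```

For three peptides with intensity pairs (200, 100), (300, 100), and (90, 100), the peptide ratios are 2.0, 3.0, and 0.9, so the protein-level ratio is 2.0.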
Metabolomics
Although GC-MS requires more sample preparation steps when derivatizing hydrophilic
non-volatile metabolites, it is more robust than LC-MS. Moreover, method development
is easier for GC-MS than for LC-MS. GC-MS also achieves better identification of
untargeted metabolites due to standardized ionization conditions, which makes it possible
to set up a universal compound identification library/database, such as NIST. While CE
achieves the highest separation efficiency, CE-MS is the least robust and sensitive of the
three separation techniques.
Real-time metabolomics enables the simultaneous and high-throughput analysis of
microbial metabolites without the need for time-consuming sample preparation steps
(Link et al., 2015; Boguszewicz et al., 2019; Nguyen et al., 2020). However, the lack of
chromatographic or electrophoretic separation in this approach reduces the quantitative
capability of this technique (Baidoo and Teixeira Benites, 2019). While MALDI can be
used for high-throughput metabolite screening, MALDI imaging MS has emerged as a
powerful tool for analyzing tissue specimens in unprecedented detail. MALDI imaging
MS has made significant contributions to the understanding of the biology of disease and
its perspectives for pathology research and practice, as well as in pharmaceutical studies
(Aichler and Walch, 2015; Mahajan and Ontaneda, 2017; Schulz et al., 2019).
Metabolomics technologies are regularly applied to metabolic flux analysis (MFA, e.g.,
13C-based MFA) studies (Baidoo and Teixeira Benites, 2019). MFA determines the rates of in vivo
metabolic reactions, thus enabling an understanding of carbon and energy flow
throughout the metabolic network in a cell. Overall, MFA accelerates the discovery of
novel metabolic pathways and enzymes for improved synthetic bioproduction (Feng et
al., 2010; Ando and García Martín, 2019; Babele and Young, 2019; Vavricka et al.,
2020). However, the limited availability and high cost of stable isotope compounds can limit
MFA capability (Gonzalez and Pierron, 2015).
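The notion of reaction rates constrained by a metabolic network can be sketched with a steady-state mass balance, in which production and consumption of each internal metabolite cancel out. The example below covers only this stoichiometric part, on a hypothetical four-reaction toy network invented for illustration. It does not model isotope labeling patterns, which 13C-based MFA additionally relies on.

```python
from typing import Dict

# Hypothetical toy network (all names are illustrative):
#   uptake:    (outside) -> A
#   a_to_b:    A -> B
#   byproduct: A -> (secreted)
#   product:   B -> (secreted)
REACTIONS = ["uptake", "a_to_b", "byproduct", "product"]
STOICHIOMETRY = {
    "A": [1, -1, -1, 0],
    "B": [0, 1, 0, -1],
}


def solve_toy_fluxes(uptake: float, product: float) -> Dict[str, float]:
    """Infer the two unmeasured fluxes from two measured ones at steady state."""
    a_to_b = product              # balance on B: a_to_b - product = 0
    byproduct = uptake - a_to_b   # balance on A: uptake - a_to_b - byproduct = 0
    if byproduct < 0:
        raise ValueError("measured product flux exceeds measured uptake")
    return {"uptake": uptake, "a_to_b": a_to_b,
            "byproduct": byproduct, "product": product}


def mass_balance_residuals(fluxes: Dict[str, float]) -> Dict[str, float]:
    """Net production rate of each metabolite; all zeros at steady state."""
    v = [fluxes[name] for name in REACTIONS]
    return {met: sum(coef * flux for coef, flux in zip(row, v))
            for met, row in STOICHIOMETRY.items()}
```

With a measured uptake of 10 and a measured product secretion of 6 (arbitrary units), the balances give a byproduct flux of 4, and the residuals for A and B are both zero.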
The recent advancement in omics technologies has improved analysis efficiency, not only by
reducing cost and time but also by enabling the collection of informative and meaningful
multi-omics data, thus facilitating the implementation of multi-omics techniques in systems
biology studies. However, integrating multi-omics platforms is still an ongoing challenge due to
their inherent data differences (Saito and Matsuda, 2010; Yizhak et al., 2010; Brunk et
al., 2016; Koh et al., 2018; Pinu et al., 2019; Vavricka et al., 2020). For example,
genomics data are qualitative, accurate and reproducible, while other “omics” data, such
as proteomics and metabolomics are both qualitative and quantitative, not as
reproducible, and noisy (Kuo et al., 2002; MacLean et al., 2010; Guo et al., 2013; Gross
et al., 2018). Further, multi-omics data is normally pre-treated by various data treatment
methods (e.g., deconvolution, normalization, scaling, and transformation) and software
before being integrated. Multi-omics studies also require experts in their respective
“omics” fields (as well as IT support) to validate multi-omics data. While this improves
the accuracy of data interpretation, it also complicates data acquisition and
analysis.
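Three of the pre-treatment steps named above (normalization, transformation, and scaling) can be sketched in a few lines. The specific variants shown here (total-sum normalization, a log2 transform with a pseudocount, and autoscaling to zero mean and unit standard deviation) are common choices picked for illustration. They are assumptions of this sketch, not the methods used in the cited studies.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def total_sum_normalize(values: Sequence[float]) -> List[float]:
    """Divide each value by the sample total so samples become comparable."""
    total = sum(values)
    if total <= 0:
        raise ValueError("sample total must be positive")
    return [v / total for v in values]


def log2_transform(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Compress the dynamic range; the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def autoscale(values: Sequence[float]) -> List[float]:
    """Mean-center and divide by the sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot autoscale a constant feature")
    return [(v - mu) / sd for v in values]
```

Normalization is typically applied per sample and scaling per feature, so the order and axis of these steps should be fixed before data from different omics layers are combined.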
Recently, Pinu et al. discussed some recommendations to overcome the major challenge
facing the implementation of multi-omics techniques in systems biology, which is the
differences among their inherent data. The aim of their recommendations is to make
researchers aware of the importance of having a proper experimental design in the first
place. Thus, the appropriate biological samples should be carefully selected, prepared,
and stored before planning any “omics” study. Afterward, researchers should carefully
collect quantitative multi-omics data and associated meta-data and select better tools for
integration and interpretation of the data. Finally, new resources should be developed for the
deposition of intact multi-omics data sets (Pinu et al., 2019). It is also necessary to select
or develop methods that keep the optimum balance between high recovery and low
degradation of extracted biological features.
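One practical consequence of collecting multi-omics data together with meta-data is that the same biological samples must be traceable across every layer before integration. The sketch below checks this. The data layout (dictionaries keyed by sample ID) and the function names are hypothetical choices for illustration.

```python
from typing import Dict, List, Mapping


def shared_samples(layers: Mapping[str, Mapping[str, object]],
                   metadata: Mapping[str, object]) -> List[str]:
    """Sample IDs that have metadata and a measurement in every omics layer."""
    common = set(metadata)
    for measurements in layers.values():
        common &= set(measurements)
    return sorted(common)


def missing_by_layer(layers: Mapping[str, Mapping[str, object]],
                     metadata: Mapping[str, object]) -> Dict[str, List[str]]:
    """For each layer, the samples that have metadata but no measurement."""
    return {name: sorted(set(metadata) - set(measurements))
            for name, measurements in layers.items()}
```

Only the samples returned by `shared_samples` can be integrated across all layers without imputation. The report from `missing_by_layer` shows where an experimental design has gaps.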