
Metabolomic Data Analysis Case Studies


Dmitry Grapov, PhD

Case Studies
1. Data Exploration and Analysis Planning: Lung Cancer
2. Multifactorial Design: Mouse Cerebellum
3. Time Course: OGTT Metabolomics

Analysis Planning: DOD Lung Cancer Plasma (CARET)

Summary: Analysis of plasma primary metabolites to identify circulating markers related to lung cancer histology type.
Methods:
- Exploratory data analysis using principal components analysis (PCA)
- Analysis of covariance (ANCOVA)
- Orthogonal partial least squares discriminant analysis (OPLS-DA)
- Hierarchical cluster analysis (HCA) and multidimensional scaling (MDS)

Lung Cancer: Exploratory Analysis

Purpose: Overview of the data variance structure.
Methods: Singular value decomposition (SVD) on autoscaled data.
Findings:
- PC1 and PC2 (14% of variance explained) display 2 clusters of points.
- The cluster structure could not be explained by histology or any other metadata.
- The cluster structure is best explained by instrumental acquisition date.

Points colored by acquisition date: black, 110629 to 110701; red, 110702 to 110705.
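A minimal sketch of the PCA step described above (SVD of autoscaled data), assuming a samples x features NumPy matrix. The helper names `autoscale` and `pca_svd` and the simulated two-batch data are hypothetical illustrations, not the original analysis code.

```python
import numpy as np


def autoscale(X):
    """Center each feature (column) to mean 0 and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_svd(X, n_components=2):
    """PCA of a samples x features matrix via SVD of the autoscaled data.

    Returns scores (samples x components), loadings (features x components)
    and the fraction of total variance explained by each retained component.
    """
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # equals Z @ loadings
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical data: 40 samples x 300 features; the second half is scaled up
# to mimic a batch (acquisition date) effect.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(40, 300))
X[20:] *= 1.5
scores, loadings, explained = pca_svd(X)
print(f"PC1 + PC2 explain {explained.sum():.0%} of the variance")
```

Coloring the score plot by each metadata field in turn (histology, acquisition date, and so on) is one way to trace a cluster structure back to its source.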

Lung Cancer: Analysis Planning

Purpose: Identify significant changes in metabolites while adjusting for the noted batch effect and for the gender and smoking status covariates.
Methods:
- Shifted logarithm (natural log) transformation of the data
- ANCOVA: batch + gender + smoking
- False discovery rate (FDR) correction and estimation
- PCA to overview the covariate-adjusted data structure
- OPLS-DA to evaluate covariate adjustments and hypothesis testing strategies

Cluster structure in the adjusted data suggests that there is another unexplained covariate

Modeling histology (control in green)

Modeling control/cancer and histology
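A minimal sketch of a shifted-log transform followed by covariate adjustment (~ batch + gender + smoking), assuming adjustment means regressing each feature on the covariates by least squares and keeping the residuals. The shift of 1.0, the helper names and the simulated data are all hypothetical. The disease group is deliberately left out of the design so that the control/cancer difference is not removed before testing.

```python
import numpy as np


def shifted_log(X, shift=1.0):
    """Natural log after adding a constant; shift=1.0 is an assumed value."""
    return np.log(X + shift)


def indicators(labels):
    """Indicator (dummy) columns for a categorical covariate, first level dropped."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


def adjust_for_covariates(Y, covariates):
    """Regress every feature on the covariates and keep the residuals.

    The feature means are added back so adjusted values stay on the log scale.
    """
    n = Y.shape[0]
    design = np.column_stack([np.ones(n)] + [indicators(c) for c in covariates])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ beta + Y.mean(axis=0)


# Hypothetical example: 24 samples x 50 features and three covariates.
rng = np.random.default_rng(1)
batch = np.repeat(["run1", "run2"], 12)
gender = np.tile(["F", "M"], 12)
smoking = np.tile(["never", "former", "current"], 8)
Y = shifted_log(rng.lognormal(size=(24, 50)))
Y[batch == "run2"] += 0.5  # simulated batch shift
Y_adj = adjust_for_covariates(Y, [batch, gender, smoking])
```

After adjustment, the per-batch means of every feature coincide, so PCA of `Y_adj` no longer shows the simulated batch split.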

Lung Cancer: ANCOVA

Summary: The optimal testing strategy was to use covariate-adjusted data (~ batch + gender + smoking) to test for differences between controls and cancer (adenocarcinoma, NSCLC and squamous).

OPLS-DA overview of optimized modeling strategy

Identified 24 (8%) significantly changed species (3 after FDR correction).
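The drop from 24 to 3 significant species reflects FDR correction. A minimal sketch of the Benjamini-Hochberg procedure, one common FDR method (the deck does not name the method used); the p-values are made up. The statsmodels function `multipletests` with `method="fdr_bh"` implements the same procedure.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D set of p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])  # hypothetical
q = benjamini_hochberg(pvals)
print((pvals < 0.05).sum(), "raw hits,", (q < 0.05).sum(), "after FDR")
```

With these made-up values, 5 features pass p < 0.05 but only 2 pass q < 0.05.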

Lung Cancer: Correlation Analysis

Purpose: Identify relationships between known and unknown metabolic features.
Methods: Hierarchical cluster analysis (Euclidean distances computed from Spearman's correlations, linked by Ward's method).
Summary: The top features could be grouped into 8 major correlated clusters.

Top changed unknown metabolites could be linked to named species:
- 223566: tryptophan
- 225405: 1/ beta-alanine
- 274174: methionine, glucuronic acid
- 228377: tryptophan
- 362112: tryptophan
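A minimal sketch of the correlation clustering, using SciPy. It assumes one reading of "Euclidean distances from Spearman's correlations": each feature is represented by its row of the Spearman correlation matrix, and Euclidean distances between those rows are linked with Ward's method (using 1 - rho as the distance is another common reading). The function name and the three-factor simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def correlation_clusters(X, n_clusters=8):
    """Cluster the features (columns) of X by their Spearman correlation profiles.

    Euclidean distances between rows of the correlation matrix are joined
    with Ward's linkage and the tree is cut into at most n_clusters groups.
    """
    rho, _ = spearmanr(X)                    # features x features
    dist = pdist(rho, metric="euclidean")    # condensed distance matrix
    tree = linkage(dist, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical data: 30 samples, 24 features driven by 3 latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(30, 3))
X = np.repeat(latent, 8, axis=1) + 0.1 * rng.normal(size=(30, 24))
labels = correlation_clusters(X, n_clusters=3)
```

Unknowns that fall in the same cluster as a named species (for example tryptophan) become candidates for being related to it.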

Lung Cancer

Conclusions:
- Metabolic data contained batch effects, which could be partly explained by data acquisition date.
- Univariate analyses were limited by the effects of outliers.
- Multivariate modeling was used to identify 64 features (21%) that best explain differences in plasma metabolites between patients with and without lung cancer.
- Hydroxylamine, aspartic acid, and tryptophan displayed patterns of change consistent with differences in patient cancer histology.
- Correlation analysis was used to link many significant changes in unknowns to tryptophan.

Multifactorial Design: Mouse Cerebellum Metabolomics

Summary: Analysis of mice carrying a mutation in ERCC6, the gene affected in Cockayne syndrome B (CSB), a rare autosomal recessive congenital disorder associated with premature aging. Mutant animals display altered glycolytic and mitochondrial metabolism, which is improved by a high-fat diet.
Study design: 2 genotypes (WT, CSB; n = 20 per genotype), each split across 4 diets (SD, Resv, CR, HFD; n = 5 per group).

Analysis: principal components analysis (PCA); two-way analysis of variance (ANOVA); orthogonal partial least squares discriminant analysis (OPLS-DA); network mapping.

Mouse Cerebellum: PCA

Method: Conducted on autoscaled data using SVD.
Findings: Identified 6 possible outliers, all of which are in the WT genotype.
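The deck does not say how the 6 possible outliers were flagged. A minimal sketch of one standard option, assumed here: Hotelling's T2 on the first PCA scores, with the control limit for in-sample observations taken from a Beta distribution. The function name, `alpha=0.05` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import beta


def t2_outliers(X, n_components=2, alpha=0.05):
    """Flag samples with a large Hotelling T2 on the first PCA components.

    T2 is each sample's squared score distance scaled by the score variances.
    For samples that were part of the model, (n / (n - 1)^2) * T2 follows a
    Beta(k/2, (n-k-1)/2) distribution under multivariate normality, which
    gives the control limit used here.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # autoscale
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    t2 = np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)
    n, k = Z.shape[0], n_components
    limit = (n - 1) ** 2 / n * beta.ppf(1 - alpha, k / 2, (n - k - 1) / 2)
    return t2, limit, t2 > limit


# Hypothetical data: 40 samples x 50 features, first 3 samples shifted.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 50))
X[:3] += 6.0
t2, limit, flagged = t2_outliers(X)
```

At alpha = 0.05 a few ordinary samples can exceed the limit by chance, so flagged samples are candidates for closer inspection rather than automatic removal.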

Mouse Cerebellum: Outliers

Methods: Use PLS-DA to determine whether the outlier samples remain outliers when the model maximizes the difference between WT and CSB animals.


Findings: The noted WT outliers should be removed or analyzed separately.
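A minimal sketch of two-class PLS-DA, written as textbook PLS1 (NIPALS) regression of a 0/1 class vector on autoscaled data. The function name and simulated WT/CSB data are hypothetical, and a full analysis would normally also validate the model (for example by cross-validation), which is omitted here.

```python
import numpy as np


def pls_da(X, y, n_components=2):
    """Two-class PLS-DA as PLS1 (NIPALS) regression of a 0/1 class vector on X.

    X is autoscaled first. Returns sample scores T (n x A), weights W (p x A)
    and X loadings P (p x A).
    """
    Xk = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yk = np.asarray(y, dtype=float) - np.mean(y)
    n, p = Xk.shape
    T = np.zeros((n, n_components))
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # unit-length weight vector
        t = Xk @ w                        # sample scores
        tt = t @ t
        loading = Xk.T @ t / tt
        Xk = Xk - np.outer(t, loading)    # deflate X
        yk = yk - (yk @ t / tt) * t       # deflate y
        T[:, a], W[:, a], P[:, a] = t, w, loading
    return T, W, P


# Hypothetical data: 20 WT (0) and 20 CSB (1) animals, 30 metabolites.
rng = np.random.default_rng(4)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30))
X[y == 1, :5] += 1.0
T, W, P = pls_da(X, y)
```

Plotting `T[:, 0]` against `T[:, 1]` gives the supervised score plot; samples that still sit far from their own class there are the ones whose outlier status "holds".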


Mouse Cerebellum: ANOVA

Methods: shifted log transformed data; two-way ANOVA (genotype, diet).
Findings: Identification of significant changes in metabolites due to genotype, diet (treatment), and the interaction between genotype and diet.

genotype effect

treatment effect

interaction effect
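A minimal sketch of the two-way ANOVA with interaction for one metabolite, assuming the balanced 2 genotypes x 4 diets x 5 animals design described above. The function name and simulated data are hypothetical; in practice the test would be repeated for every metabolite and followed by FDR correction.

```python
import numpy as np
from scipy.stats import f


def two_way_anova(y, a, b):
    """Two-way ANOVA with interaction for one metabolite in a balanced design.

    y: responses (e.g. shifted-log abundances); a, b: factor labels such as
    genotype and diet, with the same number of samples in every a x b cell.
    Returns {effect: (SS, df, F, p)} plus the residual (SS, df).
    """
    y, a, b = np.asarray(y, dtype=float), np.asarray(a), np.asarray(b)
    levels_a, levels_b = np.unique(a), np.unique(b)
    grand = y.mean()
    mean_a = {i: y[a == i].mean() for i in levels_a}
    mean_b = {j: y[b == j].mean() for j in levels_b}
    ss_a = sum((a == i).sum() * (mean_a[i] - grand) ** 2 for i in levels_a)
    ss_b = sum((b == j).sum() * (mean_b[j] - grand) ** 2 for j in levels_b)
    ss_ab = ss_err = 0.0
    for i in levels_a:
        for j in levels_b:
            cell = y[(a == i) & (b == j)]
            ss_ab += cell.size * (cell.mean() - mean_a[i] - mean_b[j] + grand) ** 2
            ss_err += np.sum((cell - cell.mean()) ** 2)
    df_a, df_b = levels_a.size - 1, levels_b.size - 1
    df_err = y.size - levels_a.size * levels_b.size
    ms_err = ss_err / df_err
    result = {"residual": (ss_err, df_err)}
    for name, ss, df in [("factor_a", ss_a, df_a), ("factor_b", ss_b, df_b),
                         ("interaction", ss_ab, df_a * df_b)]:
        F = (ss / df) / ms_err
        result[name] = (ss, df, F, f.sf(F, df, df_err))
    return result


# Hypothetical data matching the design: 2 genotypes x 4 diets x 5 mice.
rng = np.random.default_rng(5)
genotype = np.repeat(["WT", "CSB"], 20)
diet = np.tile(np.repeat(["SD", "Resv", "CR", "HFD"], 5), 2)
y = rng.normal(size=40) + 2.0 * (genotype == "CSB")
res = two_way_anova(y, genotype, diet)
```

These sums-of-squares formulas rely on equal cell sizes; once outliers are removed the design becomes unbalanced, and a regression-based ANOVA (Type II or III sums of squares) is the usual replacement.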

Mouse Cerebellum: Multivariate Modeling

Methods: autoscaled data; classification of sample genotype (OSC-PLS-DA/OPLS-DA).



Mouse Cerebellum: Multivariate Modeling

Methods: autoscaled data; classification of sample genotype and diet (OPLS-DA); evaluation of Y construction (separate and combined).

multiple Y

single Y
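The slides do not spell out how the "multiple Y" and "single Y" responses were built. A minimal sketch of two plausible encodings, both assumptions: genotype and diet as separate dummy blocks, versus one combined class per genotype/diet cell. The `dummy` helper is hypothetical.

```python
import numpy as np


def dummy(labels):
    """One indicator column per level (levels sorted alphabetically)."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float), levels


genotype = np.repeat(["WT", "CSB"], 20)
diet = np.tile(np.repeat(["SD", "Resv", "CR", "HFD"], 5), 2)

# Separate factors: genotype and diet dummies side by side (2 + 4 = 6 columns).
Y_gen, _ = dummy(genotype)
Y_diet, _ = dummy(diet)
Y_separate = np.hstack([Y_gen, Y_diet])

# Combined factor: one class per genotype/diet cell (2 x 4 = 8 columns).
cells = np.array([f"{g}_{d}" for g, d in zip(genotype, diet)])
Y_combined, cell_levels = dummy(cells)
```

The separate encoding asks the model to explain genotype and diet as distinct directions, while the combined encoding treats each of the 8 groups as its own class.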

Mouse Cerebellum: Multivariate Modeling

Methods: autoscaled data; classification of diet (treatment) effects independently in each genotype.



Mouse Cerebellum: Network Analysis

Methods: generate a biochemical and chemical similarity network; map statistical and OPLS-DA model results onto the network; analyze the genotype network and the treatment networks in WT and CSB separately.
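A minimal sketch of how such a network can be assembled and annotated, using only the standard library. Everything here is assumed for illustration: real analyses would take fingerprints from a cheminformatics toolkit and reaction pairs from a pathway database such as KEGG, whereas the metabolites A to D, their fingerprint bits, the reaction pair, the p-values and the 0.7 Tanimoto cutoff below are all made up.

```python
from itertools import combinations


def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def build_network(fingerprints, reaction_pairs, cutoff=0.7):
    """Return {frozenset(pair): edge type} from reaction pairs plus similar structures."""
    edges = {frozenset(pair): "biochemical" for pair in reaction_pairs}
    for a, b in combinations(sorted(fingerprints), 2):
        key = frozenset((a, b))
        if key not in edges and tanimoto(fingerprints[a], fingerprints[b]) >= cutoff:
            edges[key] = "chemical"
    return edges


# Hypothetical metabolites A-D: made-up fingerprint bits, reactions and p-values.
fingerprints = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {7, 8}, "D": {1, 2, 3, 4, 5}}
edges = build_network(fingerprints, reaction_pairs=[("A", "C")])
p_values = {"A": 0.003, "B": 0.20, "C": 0.04, "D": 0.51}
# Map statistical results onto the network: flag nodes and the edges they touch.
significant = {m for m, p in p_values.items() if p < 0.05}
highlighted = [tuple(sorted(e)) for e in edges if e & significant]
```

Separate genotype and treatment networks then differ only in which statistics are mapped onto the same node and edge set.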

Mouse Cerebellum: Genotype Network

Mouse Cerebellum: WT Treatment Network

Mouse Cerebellum: CSB Treatment Network

Mouse Cerebellum
Major differences between CSB and WT: elevation of 2-hydroxyglutaric acid in CSB.
2-Hydroxyglutaric aciduria is inherited in either an autosomal recessive or an autosomal dominant manner.

Perturbations in methionine and (potentially) one-carbon metabolism.

Increases in the related species methionine, homoserine and serine, together with a decrease in adenosine-5'-phosphate, may point to decreased S-adenosylmethionine (SAM-e) synthesis. Reduced SAM-e could have detrimental effects on one-carbon metabolism and methylation reactions, which, through a systemic reduction in choline, would impact phosphatidylcholine synthesis.

Independent of genotype, treatment effects can be ranked on a continuum of metabolic change: CR > HFD > Resv > SD.
Treatment-related changes in citrulline were modified based on genotype (strong genotype/treatment interaction).

Similar changes due to treatment in both genotypes (e.g. 1,5-anhydroglucitol) may be an outcome of diet composition rather than biology.

Time Course
Oral Glucose Tolerance Test Metabolomics
Summary: Analysis of changes in plasma primary metabolites during an oral glucose tolerance test (OGTT) before and after a 14-week diet and exercise intervention.
Study design:
- Overweight women (12-15, obese, sedentary, glucose 100-128 mg/dL)
- Pre and post intervention
- Clinical panel: insulin, glucose, lipids
- Primary metabolites at 0, 30, 60, 90, 120 minutes
Analysis: principal components analysis (PCA); two-way analysis of variance (ANOVA); orthogonal partial least squares discriminant analysis (OPLS-DA); network mapping.

OGTT: Data Properties


Baseline and Area Under the Curve (AUC)

Time Course: Options

Raw (top) vs Baseline adjusted (bottom)

Baseline adjusted vs AUC
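A minimal sketch of baseline adjustment and trapezoidal AUC for the OGTT sampling times (0, 30, 60, 90, 120 min). The function names and the example time course are hypothetical. Here the baseline-adjusted AUC keeps area below the fasting level as negative; some "incremental AUC" definitions ignore that area instead.

```python
import numpy as np

TIMES = np.array([0, 30, 60, 90, 120])  # OGTT sampling times (minutes)


def baseline_adjust(profiles):
    """Subtract the t = 0 value from every time point (last axis = time)."""
    profiles = np.asarray(profiles, dtype=float)
    return profiles - profiles[..., :1]


def auc(profiles, times=TIMES):
    """Trapezoidal area under the curve along the last (time) axis."""
    profiles = np.asarray(profiles, dtype=float)
    widths = np.diff(times)
    return np.sum((profiles[..., 1:] + profiles[..., :-1]) / 2 * widths, axis=-1)


profile = [5.0, 8.0, 7.0, 6.0, 5.0]      # hypothetical metabolite time course
raw_auc = auc(profile)                    # 780.0
adj_auc = auc(baseline_adjust(profile))   # 180.0
```

The baseline-adjusted AUC equals the raw AUC minus the fasting value times 120 minutes, so it isolates the excursion from differences in fasting level.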

OGTT: Data Analysis

Identification of OGTT effects
- Significant metabolomic excursions (one-sample t-test on AUC): pre, post or both
- Intervention-adjusted PLS model
- OGTT biochemical/chemical similarity network
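A minimal sketch of the excursion test: a one-sample t-test of baseline-adjusted AUC against zero for each metabolite, using `scipy.stats.ttest_1samp`. The function name and simulated AUCs are hypothetical; running it separately on pre- and post-intervention AUCs sorts metabolites into pre, post or both.

```python
import numpy as np
from scipy.stats import ttest_1samp


def excursion_test(auc_values):
    """One-sample t-test of baseline-adjusted AUC against 0 for each metabolite.

    auc_values: subjects x metabolites. A mean AUC different from 0 means the
    metabolite moved away from its fasting level during the OGTT.
    """
    res = ttest_1samp(auc_values, popmean=0.0, axis=0)
    return res.statistic, res.pvalue


# Hypothetical AUCs for 14 subjects x 3 metabolites; only the first one responds.
rng = np.random.default_rng(6)
aucs = rng.normal(loc=[200.0, 0.0, 0.0], scale=100.0, size=(14, 3))
t_stat, p = excursion_test(aucs)
```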

Identification of treatment effects

Univariate statistics
- Two-way ANOVA (time and intervention)
- Mixed effects modeling (intervention as the main effect and individual subjects as random effects)

PLS-DA modeling and feature selection of changes in:
- Baseline (t = 0)
- AUC
- Combined baseline and AUC
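A minimal sketch of the three feature sets as per-subject post minus pre changes, assuming arrays shaped subjects x metabolites x time points and assuming "AUC" means the baseline-adjusted trapezoidal AUC. The function names and simulated data are hypothetical.

```python
import numpy as np

TIMES = np.array([0, 30, 60, 90, 120])  # OGTT sampling times (minutes)


def trapezoid_auc(profiles, times=TIMES):
    """Trapezoidal AUC along the last (time) axis."""
    return np.sum((profiles[..., 1:] + profiles[..., :-1]) / 2 * np.diff(times), axis=-1)


def change_features(pre, post):
    """Post - pre changes in baseline, in baseline-adjusted AUC, and both combined.

    pre, post: arrays of shape (subjects, metabolites, time points).
    """
    baseline_change = post[..., 0] - pre[..., 0]
    auc_change = trapezoid_auc(post - post[..., :1]) - trapezoid_auc(pre - pre[..., :1])
    combined = np.hstack([baseline_change, auc_change])  # subjects x (2 * metabolites)
    return baseline_change, auc_change, combined


rng = np.random.default_rng(7)
pre = rng.lognormal(size=(12, 6, 5))   # hypothetical: 12 subjects, 6 metabolites
post = pre + 1.0                       # a pure baseline shift, same OGTT response
base, auc_chg, combined = change_features(pre, post)
```

In this toy case the intervention only shifts fasting levels, so every baseline change is 1 and every AUC change is 0; the combined block lets one model weigh both kinds of change.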

Analysis of correlations

OGTT: effects on primary metabolism

(intervention-adjusted data, modeling time)


OGTT: effects network

OGTT: Treatment Effects


OGTT: Treatment Effects

Learning from the sample score positions

OGTT: Treatment Effects

Variable loadings; feature selection on loadings
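A minimal sketch of feature selection on loadings, assuming a two-group PLS model and a simple top-k rule on the absolute first-component loading (the deck does not state its selection criterion). It treats the two groups as independent for simplicity, ignoring pre/post pairing; the function name, `top_k` and the simulated data are hypothetical.

```python
import numpy as np


def select_by_loadings(X, y, top_k=10):
    """Rank features by the absolute first-component PLS loading (two groups).

    For the first component, the PLS1 weight vector is X'y normalized; scores
    are t = Xw and loadings p = X't / t't.
    """
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    loadings = Xc.T @ t / (t @ t)
    ranked = np.argsort(-np.abs(loadings))
    return ranked[:top_k], loadings


# Hypothetical data: 60 samples in two groups of 30, 50 metabolites; 0-2 change.
rng = np.random.default_rng(8)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 50))
X[y == 1, :3] += 5.0
selected, loadings = select_by_loadings(X, y, top_k=3)
```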

OGTT: Linking biology with our experiment

OGTT: Analysis of Correlations

Each data analysis is unique.
- Which method should be used is defined by how the data look and the goal of the analysis.
- Different analysis techniques are used to get independent perspectives on the data.
- Combining similar evidence from different techniques defines a robust explanation of the experiment.