Variant annotation

Annotating context, function and genotype phasing
We are here in the Best Practices workflow
Variant annotation
Two types of variant annotations:
Context annotations and Functional annotations

Context annotations describe the evidence for and context of variants
  Are there repeats nearby?
  What is the sequence quality like?
  How frequent is the variant in related individuals?
Context annotations help refine our estimate of how likely a variant is to be true

Functional annotations describe their predicted biological effects
  What gene does it affect?
  Is it in a coding or a non-coding region?
  Is it a synonymous or a non-synonymous mutation?
Functional annotations help find links between variants and traits, e.g. disease
Tools for annotating variants

VariantAnnotator
  Built-in context annotations

Genotype phasing tools
  Physical and familial inheritance phasing (per-sample context)

Non-GATK (e.g. Oncotator)
  Add functional annotations to a set of variants

ANNOTATING CONTEXT
GATK's built-in context annotations

... the list goes on! See documentation for details.


TOOL TIPS
Annotating context with VariantAnnotator

java -jar GenomeAnalysisTK.jar -T VariantAnnotator \
    -R human.fasta \
    -A FisherStrand \
    -I input_reads.bam \
    -V original.vcf \
    -o annotated.vcf

Useful for adding annotations that were omitted from the initial variant calling process (hindsight is 20/20)
  Some need to see the reads or other data (sequence context)
  Some need a minimum number of samples (population genetics)
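
To give a feel for what a read-based context annotation such as FisherStrand measures, here is a minimal Python sketch. It is not GATK's implementation: it applies a textbook two-sided Fisher's exact test to a 2x2 table of ref/alt read counts on the forward/reverse strands and Phred-scales the p-value. The function names and the example counts are assumptions made for illustration.

```python
from math import comb, log10

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

def phred_scaled_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Higher values mean stronger evidence that alleles favor one strand."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10 * log10(max(p, 1e-300))  # floor avoids log10(0)

# Hypothetical read counts: balanced strands vs. alt reads on one strand only.
balanced = phred_scaled_strand_bias(10, 10, 10, 10)
skewed = phred_scaled_strand_bias(20, 0, 0, 20)
print(balanced < skewed)
```

This also shows why such annotations need the reads (`-I`): the per-strand counts are not recoverable from genotypes alone.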
Genotype phasing annotations

Phasing genotypes aims to determine whether variants are associated with the same haplotypes in specific samples.

Physical phasing looks at how reads overlap with each other
  Can be produced directly by HaplotypeCaller or annotated after calling by ReadBackedPhasing

Phasing by transmission uses family structure (trios and parent/child pairs)
  Can be used to establish Mendelian inheritance and predict likely de novo mutations
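
The idea behind physical phasing can be sketched in a few lines of Python. This is a toy illustration, not the algorithm used by HaplotypeCaller or ReadBackedPhasing: it assumes two heterozygous sites in one sample and a list of reads that span both, each reduced to the pair of alleles it carries (0 = ref, 1 = alt). The function name and input encoding are hypothetical.

```python
from collections import Counter

def phase_two_sites(read_alleles):
    """Decide whether the alt alleles at two het sites share a haplotype.

    read_alleles: list of (allele_at_site1, allele_at_site2) tuples,
    one per read that overlaps both sites.
    Returns "cis" (alts on the same haplotype), "trans" (alts on
    opposite haplotypes), or None when the reads do not decide.
    """
    counts = Counter(read_alleles)
    cis = counts[(0, 0)] + counts[(1, 1)]    # reads supporting 0-0 / 1-1
    trans = counts[(0, 1)] + counts[(1, 0)]  # reads supporting 0-1 / 1-0
    if cis == trans:
        return None
    return "cis" if cis > trans else "trans"

# Hypothetical reads: three support alt+alt or ref+ref, one disagrees.
print(phase_two_sites([(1, 1), (0, 0), (1, 1), (0, 1)]))
```

A real tool would weigh base and mapping qualities instead of counting reads, and can only phase sites close enough to be covered by the same reads or read pairs.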
Example site showing Mendelian inheritance in a trio

[Figure: the daughter carries Haplotype #1 from the father and a haplotype with no mutations from the mother. The father carries Haplotype #1 and Haplotype #2. The mother carries no mutations.]
TOOL TIPS
PhaseByTransmission

Phases parent/child groups at unambiguous sites
  Based on genotype likelihoods of family members
  Ambiguous sites: all individuals heterozygous, Mendelian violations, etc.
  Will try to model de novo mutations
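
The two kinds of ambiguous site named above can be recognized from hard genotype calls alone. The Python sketch below is a simplification: PhaseByTransmission works from genotype likelihoods, whereas this only inspects called diploid genotypes given as pairs of allele indices. Both function names are hypothetical.

```python
from itertools import product

def is_mendelian_consistent(mother, father, child):
    """True if the child could inherit one allele from each parent.

    Each genotype is a pair of allele indices, e.g. (0, 1) for 0/1.
    """
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible

def is_ambiguous_for_phasing(mother, father, child):
    """True when transmission alone cannot assign parental origin."""
    all_het = all(len(set(gt)) == 2 for gt in (mother, father, child))
    return all_het or not is_mendelian_consistent(mother, father, child)

# A child called 0/1 with two 0/0 parents is a Mendelian violation
# (or a candidate de novo mutation).
print(is_mendelian_consistent((0, 0), (0, 0), (0, 1)))
# A trio in which everyone is 0/1 cannot be phased by transmission.
print(is_ambiguous_for_phasing((0, 1), (0, 1), (0, 1)))
```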


java -jar GenomeAnalysisTK.jar -T PhaseByTransmission \
    -R human.fasta \
    -V original.vcf \
    -ped input.fam \
    -o phased.vcf

Phases genotypes between samples at a particular position

Requires pedigree file describing relationships
  See http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
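
As a rough guide to the pedigree file, the PLINK format linked above starts each line with six whitespace-separated columns: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female) and phenotype. The Python sketch below reads such lines and lists the complete trios. The sample names reuse MOTHER, FATHER and CHILD from the VCF example in these slides, the family ID FAM1 is made up, and the helper names are hypothetical.

```python
from collections import namedtuple

PedRecord = namedtuple(
    "PedRecord", "family individual father mother sex phenotype")

def parse_ped(lines):
    """Parse the first six columns of PLINK-style PED/FAM lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        records.append(PedRecord(*line.split()[:6]))
    return records

def find_trios(records):
    """Return (mother, father, child) for children with both parents listed."""
    known = {(r.family, r.individual) for r in records}
    return [(r.mother, r.father, r.individual) for r in records
            if (r.family, r.father) in known and (r.family, r.mother) in known]

example = """\
FAM1 FATHER 0 0 1 0
FAM1 MOTHER 0 0 2 0
FAM1 CHILD FATHER MOTHER 2 0
"""
trios = find_trios(parse_ped(example.splitlines()))
print(trios)
```

A parent ID of 0 marks a founder, which is why only the CHILD line yields a trio.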
This is what a phased site looks like in the VCF

Original VCF

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MOTHER FATHER CHILD
1 10109 . A T 99 PASS . GT:PL 0/0:0,50,200 0/0:0,40,200 0/1:30,0,200
1 10147 . C A 99 PASS . GT:PL 0/1:0,30,200 0/0:0,50,200 0/1:200,40,0
1 10150 . C T 99 PASS . GT:PL 0/1:0,40,200 0/1:30,0,200 1/1:200,50,0

Phased VCF

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MOTHER FATHER CHILD
1 10109 . A T 99 PASS . GT:PL:TP 0|0:0,50,200:10 0|0:0,40,200:10 0|0:30,0,200:10
1 10147 . C A 99 PASS . GT:PL:TP 1|0:0,30,200:10 0|0:0,50,200:10 1|0:200,40,0:10
1 10150 . C T 99 PASS . GT:PL:TP 1|0:0,40,200:10 1|0:30,0,200:10 1|1:200,50,0:10

The convention is:

  Allele From Mother | Allele From Father
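
Reading the parental origin back out of a phased sample field is straightforward. The Python sketch below assumes a diploid, fully called genotype and the FORMAT string GT:PL:TP from the phased example. It applies the mother | father convention stated above to the CHILD column. The function name is hypothetical.

```python
def parse_phased_gt(sample_field, format_keys="GT:PL:TP"):
    """Split a VCF sample field and report the parental origin of each allele.

    Returns None for unphased genotypes such as 0/1.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    gt = values["GT"]
    if "|" not in gt:
        return None  # "/" separator: phase unknown
    from_mother, from_father = (int(allele) for allele in gt.split("|"))
    return {"from_mother": from_mother, "from_father": from_father}

# CHILD column at position 10147 in the phased VCF above.
child = parse_phased_gt("1|0:200,40,0:10")
print(child)
```

For that site the alt allele (1) sits on the maternal side, which agrees with the mother being the only parent called with the alt allele.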
Ploidy assumptions in annotations / phasing

Problem(s) for non-diploid organisms:
  Many annotations assume samples are diploid
  Phasing tools also assume diploid

No real solution for now; just don't use them

No available resources for annotation / phasing

Problem(s) for non-model organisms
  E.g. functional annotation, no database of transcripts available for your organism!

May need to use another tool or a different approach.

Ask on the forum; we have quite a few users who are able to recommend resources.

ANNOTATING FUNCTION
Oncotator

Annotation tool for point mutations and indels
  Identifies overlapping transcripts and genes
  Determines effect on protein sequences
  Aggregates data from external sources
  Supports multiple input and output formats

Ships with datasources relevant to cancer researchers, but adaptable to any domain of study

Web app: broadinstitute.org/oncotator_beta/
Github: github.com/broadinstitute/oncotator

Ramos AH, Lichtenstein L, et al. Human Mutation. 2015. In press.
Classification Using GENCODE

Variant type:
  SNP, DEL, INS

Variant classification:
  Frame_Shift_Del
  Frame_Shift_Ins
  In_Frame_Del
  In_Frame_Ins
  Silent
  Missense_Mutation
  Nonsense_Mutation
  etc.
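
Part of this classification follows from allele lengths alone, as the Python sketch below illustrates. It is not Oncotator's code. It assumes VCF-style REF/ALT strings and an indel that lies entirely within coding sequence. Telling Silent, Missense_Mutation and Nonsense_Mutation apart needs the transcript model and codon translation, which are left out here. Both function names are hypothetical.

```python
def variant_type(ref, alt):
    """Classify a REF/ALT pair as SNP, DEL or INS by allele length."""
    if len(ref) == len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "DEL"
    if len(ref) < len(alt):
        return "INS"
    return "OTHER"  # e.g. multi-nucleotide substitutions, not covered here

def indel_classification(ref, alt):
    """Name a coding indel as in-frame or frameshift; None for non-indels."""
    vtype = variant_type(ref, alt)
    if vtype not in ("DEL", "INS"):
        return None
    # A length change that is a multiple of 3 keeps the reading frame.
    in_frame = abs(len(ref) - len(alt)) % 3 == 0
    prefix = "In_Frame" if in_frame else "Frame_Shift"
    return f"{prefix}_{vtype.capitalize()}"

print(variant_type("A", "T"))
print(indel_classification("ACGT", "A"))
print(indel_classification("A", "AC"))
```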
Handling Multiple Transcripts

uc002fpe.4 (MC1R)
uc002fpf.2 (TUBB3)
Is this MC1R or TUBB3?

Selection strategies:
  CANONICAL (default)  GAF 3.0 has a confidence score for each transcript. Choose the transcript with the highest confidence.
  EFFECT               Choose the transcript with the most deleterious effect.

Transcripts that are not selected appear in the other_transcripts annotation with variant classification and gene.
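
The two strategies can be pictured with a short Python sketch. Everything numeric in it is made up for illustration: the confidence scores, the classification assigned to each transcript, and the severity ordering are assumptions, not values from GAF 3.0 or Oncotator. The function name and record layout are likewise hypothetical.

```python
# Hypothetical ordering, most deleterious first.
SEVERITY = ["Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins",
            "Missense_Mutation", "In_Frame_Del", "In_Frame_Ins", "Silent"]

def select_transcript(transcripts, strategy="CANONICAL"):
    """Pick one transcript; return it plus the ones not selected."""
    if strategy == "CANONICAL":
        chosen = max(transcripts, key=lambda t: t["confidence"])
    elif strategy == "EFFECT":
        chosen = min(transcripts,
                     key=lambda t: SEVERITY.index(t["classification"]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    others = [t for t in transcripts if t is not chosen]
    return chosen, others

# Transcript IDs and genes from the slide; scores and effects are invented.
transcripts = [
    {"id": "uc002fpe.4", "gene": "MC1R", "confidence": 5,
     "classification": "Silent"},
    {"id": "uc002fpf.2", "gene": "TUBB3", "confidence": 3,
     "classification": "Missense_Mutation"},
]
canonical, _ = select_transcript(transcripts, "CANONICAL")
effect, other_transcripts = select_transcript(transcripts, "EFFECT")
print(canonical["gene"], effect["gene"])
```

With these invented inputs the two strategies disagree, which is exactly the situation where the other_transcripts annotation is worth checking.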
Supported formats

              Input           Output
File formats  *.tsv           *.tcga.maf.txt
              *.vcf           *.vcf
              *.seg           *.bed
Web API       HTTP request    JSON response

Input passes through the Annotator, which draws on Datasources.
Datasource source formats: *.tsv, *.vcf, *.bigwig

Further reading
http://www.broadinstitute.org/gatk/guide/
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_phasing_PhaseByTransmission.php
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_phasing_ReadBackedPhasing.php