In the dialog above, click the Download plug-ins button and select the Annotate With GFF file plug-in. Click the download and install button. Please note that you need an administrator login to do this.
Next we will visit the UCSC Genome Browser: http://genome.ucsc.edu/
Under the Download link on the left, you can download sequence and annotation information for a variety of model organisms.
In this exercise we will download mouse chromosome 2 as a FASTA file and decorate it with annotations for known genes from RefSeq, as well as known Single Nucleotide Polymorphisms (SNPs) from dbSNP.
Start with the backbone sequence of Chromosome 2, found here under Data set by chromosome:
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/chr2.fa.gz
Once the download completes, unzip the file and import it into a new folder in the CLC Workbench.
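For readers who prefer to script the download and unzip step, the following is a minimal Python sketch using only the standard library. The URL is the one given above; the function names and the local file names are hypothetical choices made for this illustration, and the import into the Workbench is still done by hand.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URL taken from the tutorial text above.
CHR2_URL = "http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/chr2.fa.gz"


def download(url: str, dest: Path) -> Path:
    """Stream a remote file to disk without holding it all in memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def gunzip(src: Path, dest: Path) -> Path:
    """Decompress a .gz file to dest, streaming in chunks."""
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest


def fasta_header(path: Path) -> str:
    """Return the first line of a FASTA file (the '>' header line)."""
    with open(path, "r") as handle:
        return handle.readline().rstrip("\n")
```

A typical call would be gunzip(download(CHR2_URL, Path("chr2.fa.gz")), Path("chr2.fa")), followed by fasta_header(Path("chr2.fa")) as a quick check that the file looks like FASTA before importing it.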
At this point the sequence is completely un-annotated. You may wish to add comments to this sequence under the Element info view in the Workbench. Useful comments here would be what build of the genome this is from, important features on the sequence, description, etc.
Next we will retrieve the annotation tracks for RefSeq Genes and SNPs from the UCSC Table Browser and save them to a GTF file:
Set the drop-down choices in the Browser as shown above, and click the Get Output button to download the file.
Once you have completed the download and inflated the file, you can inspect its contents in any simple text editor as shown below.
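As an alternative to eyeballing the file in a text editor, a short Python sketch can summarize it. It relies only on the standard GTF layout of nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes); the names GtfRecord, parse_gtf and feature_counts are hypothetical helpers written for this illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str


def parse_gtf(lines: Iterable[str]) -> Iterator[GtfRecord]:
    """Yield one record per data line, skipping blanks, comments and track lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or line.startswith("track"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF data line
        yield GtfRecord(fields[0], fields[1], fields[2],
                        int(fields[3]), int(fields[4]), *fields[5:])


def feature_counts(lines: Iterable[str]) -> Counter:
    """Count how many records of each feature type (column 3) the file holds."""
    return Counter(record.feature for record in parse_gtf(lines))
```

Passing an open file handle to feature_counts gives a quick tally of the feature types present, which is a handy check before running the annotation step.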
This file can now be used to batch annotate the chr2 sequence previously saved in the Workbench.
After installing the Annotate with GFF plug-in, you should now see it in the Toolbox as shown:
Double-click to launch the tool wizard and you will be prompted for the template sequence and the GFF file.
Save the result, then open the chr2 file and inspect the annotations. You will need to zoom out and scroll to a feature-dense region of the chromosome.
Note the useful sections of the right sidebar settings, which are expanded in the following screen:
The Annotation Layout and Annotation Types settings offer a large degree of flexibility with regard to the display of features and labeling. Adjusting these settings for optimal display at differing zoom levels and feature densities will be important when you generate figures for publication and presentation. Also, note the redundant nature of some feature types. You can choose which to hide or show.
Using a split view of the linear sequence together with the annotation table view makes it easy to inspect and edit annotations. Command-click the small icon for the Annotation Table to do this:
The Annotation Table lets you easily navigate a huge sequence file or contig, because selections made in the table are linked to the graphic display. Filtering, column sorting, and batch renaming with the Advanced Rename function are all useful for creating visual reference sequences.
Next, we will add known SNPs to the protein-coding regions of this chromosome. We return to the UCSC Genome Browser and the Table Browser interface.
This time we will create a GTF file containing SNP features from the group Variations and Repeats and the SNP128 feature track.
Much of the data in this track lies outside protein-coding regions and is of much less interest in many targeted re-sequencing strategies. As a practical matter, annotating just the known SNPs in coding regions, rather than across the entire chromosome or genome, also prevents the file size from exploding. For these reasons, we will use the Table Browser interface to intersect the SNP feature track with the RefSeq Genes track.
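To make the idea of an intersection concrete, here is a minimal Python sketch that keeps only the SNP positions falling inside at least one gene region. It is an illustration of the concept under simplifying assumptions (SNPs as single 1-based positions, regions as inclusive start and end pairs), not a description of how the Table Browser computes its result; all function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GTF


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or are directly adjacent."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snps_in_regions(snp_positions: List[int], regions: List[Interval]) -> List[int]:
    """Keep only the SNP positions that fall inside at least one region."""
    merged = merge_intervals(regions)
    starts = [start for start, _ in merged]
    kept = []
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos <= merged[i][1]:
            kept.append(pos)
    return kept
```

Merging the regions first means each SNP needs only one binary search, which matters when a chromosome carries a very large number of SNPs.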
Click the Create button next to Intersection.
Create an intersection of SNP128 data with RefSeq Genes as shown above, click the submit button, then get the output GTF file and annotate chr2 as done before with the RefSeq track.
Unzip the new GTF file and open it in a text editor.
In this file the SNPs, as you will see below, are still named with the exon feature type. We will correct this with a simple search and replace step, replacing Exon with Variation-SNP.
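The search and replace step can also be scripted. The sketch below rewrites only the feature column (column 3) of each GTF line, so any occurrence of the word elsewhere on the line is left alone. The match is case-insensitive because the exact capitalization in the downloaded file is an assumption here; the function names and file paths are hypothetical.

```python
from typing import Iterable, Iterator


def retype_features(lines: Iterable[str], old: str = "exon",
                    new: str = "Variation-SNP") -> Iterator[str]:
    """Yield GTF lines with column 3 changed from old to new.

    Lines that are not nine tab-separated fields pass through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2].lower() == old.lower():
            fields[2] = new
            yield "\t".join(fields) + "\n"
        else:
            yield line


def retype_file(src_path: str, dest_path: str, old: str = "exon",
                new: str = "Variation-SNP") -> None:
    """Write a copy of src_path to dest_path with the feature type renamed."""
    with open(src_path) as src, open(dest_path, "w") as dest:
        dest.writelines(retype_features(src, old, new))
```

Writing to a new file rather than editing in place keeps the original download available in case the renamed file needs to be regenerated.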
Save the file, and run the Annotate with GFF tool in the Workbench.
An inspection of the chr2 sequence in the Workbench will now show a great many SNPs, but only in those areas overlapping the pre-existing CDS and Exon features from the RefSeq track.
Since the naming of the features is a bit redundant with the feature type, we will use the Advanced Rename function to rename all the features with the /gene_id qualifier.
There are thousands of features in this table, and the Advanced Rename function has a limit of 10,000, so we will need to repeat the following process several times, using a table filter to select the features we want to rename by name.
For example, filter on the name CDS, select about 10,000 rows, right-click the selection, choose Advanced Rename from the pop-up menu, and then choose the gene_id qualifier.
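The two ideas in this step, taking the new name from the gene_id qualifier and working in groups of at most 10,000, can be sketched in Python as follows. This does not drive the Workbench; it only illustrates what the rename and the batching amount to. The attribute syntax (gene_id "value";) is the standard GTF form, the 10,000 limit is the one quoted above, and the function names are hypothetical.

```python
import re
from typing import Iterator, Optional, Sequence

# GTF attributes look like: gene_id "NM_001"; transcript_id "NM_001";
GENE_ID_PATTERN = re.compile(r'gene_id "([^"]*)"')


def gene_id(attributes: str) -> Optional[str]:
    """Return the gene_id value from a GTF attributes column, or None if absent."""
    match = GENE_ID_PATTERN.search(attributes)
    return match.group(1) if match else None


def batches(items: Sequence, size: int = 10_000) -> Iterator[Sequence]:
    """Split items into consecutive slices of at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For a table of 25,000 filtered rows, batches would yield three groups, which corresponds to repeating the rename three times by hand.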
In the following screenshot we are looking at an alignment against the reference sequence just created, with newly found SNPs annotated on the consensus.
Note the table in the lower panel. This is the SNP Detection table, showing over 5000 SNPs filtered down to 3 non-synonymous SNPs that are also noted in dbSNP.