
UCSC Genome Browser

The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of
California, Santa Cruz (UCSC).[2][3][4] It is an interactive website offering access to genome sequence
data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a
large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast
interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for
rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database,
browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome
Bioinformatics website.

The UCSC Genome Browser
Content
Description: The UCSC Genome Browser
Contact
Research center: University of California Santa Cruz
Laboratory: Center for Biomolecular Science and Engineering, Baskin School of Engineering
Primary citation: Navarro Gonzalez & al. (2021)[1]
Access
Website: genome.ucsc.edu (http://genome.ucsc.edu)

History
The UCSC Genome Browser was initially built in 2000 by Jim Kent, then a graduate student, and David
Haussler, professor of Computer Science (now Biomolecular Engineering) at the University of California,
Santa Cruz, who still manage it. It began as a resource for the distribution of the initial fruits of the
Human Genome Project. Funded by the Howard Hughes Medical Institute and the National Human Genome
Research Institute, NHGRI (one of the US National Institutes of Health), the browser offered a graphical
display of the first full-chromosome draft assembly of human genome sequence. Today the browser is used
by geneticists, molecular biologists and physicians as well as students and teachers of evolution for
access to genomic information.[5]

Genomes

In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all
vertebrate species and selected invertebrates for which high-coverage genomic sequence is available,[6]
now including 108 species. High coverage is necessary to allow overlap to guide the construction of larger
contiguous regions. Genomic sequences with less coverage are included in multiple-alignment tracks on
some browsers (multiple-alignment tracks are described further below), but the fragmented nature of these
assemblies makes them unsuitable for building full-featured browsers. The species hosted with
full-featured genome browsers are shown in the table.[7]
Species
great apes: baboon, bonobo, chimpanzee, gibbon, gorilla, human, orangutan
non-ape primates: bushbaby, golden snub-nosed monkey, green monkey, marmoset, mouse lemur, proboscis
monkey, rhesus macaque, squirrel monkey, tarsier, tree shrew
non-primate mammals: alpaca, armadillo, bison, brown kiwi, cat, Chinese hamster, Chinese pangolin, cow,
dog, dolphin, elephant, ferret, guinea pig, Hawaiian monk seal, hedgehog, horse, kangaroo rat, little
brown bat, Malayan flying lemur, manatee, megabat, Minke whale, mouse, naked mole-rat, opossum, panda,
pig, pika, platypus, rabbit, rat, rock hyrax, sheep, shrew, sloth, squirrel, Tasmanian devil, tenrec,
wallaby, white rhinoceros
non-mammal chordates: African clawed frog, American alligator, Atlantic cod, budgerigar, chicken,
coelacanth, elephant shark, Fugu, garter snake, golden eagle, lamprey, lizard, medaka, medium ground
finch, Nile tilapia, painted turtle, stickleback, Tetraodon, Nanorana parkeri, turkey, Xenopus
tropicalis, zebra finch, zebrafish
invertebrates: Anopheles gambiae, Apis mellifera, Caenorhabditis spp (5), California sea hare, Ciona
intestinalis, Drosophila spp. (11), Lancelet, Pristionchus pacificus, sea squirt, sea urchin, yeast
viruses: Ebolavirus, SARS-CoV-2 coronavirus

Figure: UCSC Genomes

Apart from these 108 species and their assemblies, the UCSC Genome Browser also offers Assembly Hubs
(http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Assembly), web-accessible directories of
genomic data that can be viewed on the browser and include assemblies that are not hosted natively on it.
There, users can load and annotate unique assemblies for which UCSC does not provide an annotation
database. A full list of species and their assemblies can be viewed in the GenArk Portal
(https://hgdownload.soe.ucsc.edu/hubs/), including 2,589 assemblies hosted by both the UCSC Genome
Browser database and Assembly Hubs. An example can be seen in the Vertebrate Genomes Project
(http://genome.ucsc.edu/goldenPath/newsarch.html#082819) assembly hub.

Browser functionality
The large amount of data about biological systems that is accumulating in the literature makes it necessary
to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a
diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA
alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association
data (representing the relationships of genes to diseases), and mappings of commercially available gene
chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence in the
horizontal dimension, and show graphical representations of the locations of the mRNAs, gene predictions,
etc. Blocks of color along the coordinate axis show the locations of the alignments of the various data
types. The ability to show this large variety of data types on a single coordinate axis makes the browser a
handy tool for the vertical integration of the data.[8]

To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an
accession number for an RNA, the name of a genomic cytological band (e.g., 20p13 for band 13 on the
short arm of chr20) or a chromosomal position (chr17:38,450,000-38,531,000 for the region around the
gene BRCA1).
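
The chromosomal-position form of a query can be illustrated with a short sketch. The code below is only an illustration, not the browser's own parser: it splits a position string into chromosome, start and end, then builds a link to the browser's `hgTracks` page using `db` and `position` query parameters. The assembly name `hg18` is a placeholder, since the text does not say which assembly the BRCA1 coordinates refer to, and the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Accepts "chrN:start-end" with optional thousands separators in the numbers.
POSITION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """Split a position such as 'chr17:38,450,000-38,531,000' into its parts."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosomal position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end


def browser_url(db, position):
    """Build a browser link; the assembly name in db is chosen by the caller."""
    chrom, start, end = parse_position(position)
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"


chrom, start, end = parse_position("chr17:38,450,000-38,531,000")
print(chrom, start, end)
# "hg18" is a placeholder assembly name (assumption, not from the text).
print(browser_url("hg18", "chr17:38,450,000-38,531,000"))
```

Gene names, accession numbers, cytological bands and raw DNA sequence are resolved by the browser itself and are outside the scope of this sketch.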

Presenting the data graphically allows the browser to link each annotation to detailed information about
it. The gene details page of the UCSC Genes track provides a large number of links to more specific
information about the gene at many other data resources, such as Online Mendelian Inheritance in Man
(OMIM) and SwissProt.

Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed.
By pre-aligning millions of RNA sequences from GenBank to each of the 244 genome assemblies (many of
the 108 species have more than one assembly), the browser allows instant access to the alignments of any
RNA to any of the hosted species.

The juxtaposition of the many types of data allows researchers to display exactly the combination of data
that will answer specific questions. A PDF/PostScript output functionality allows export of a
camera-ready image for publication in academic journals.

Figure: Multiple gene products of FOXP2 gene (top) and evolutionary conservation shown in multiple
alignment (bottom)

One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is the
continuously variable nature of the display. Sequence of any size can be displayed, from a single DNA
base up to the entire chromosome (human chr1 = 245 million bases, Mb) with full annotation tracks.
Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or
hundreds of genes and any combination of the many annotations. A convenient drag-and-zoom feature allows
the user to choose any region in the genome image and expand it to occupy the full screen.

Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature
allows users to upload a file of their own data and view the data in the context of the reference genome
assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with
the Table Browser tool (such as only the SNPs that change the amino acid sequence of a protein) and
display this specific subset of the data in the browser as a Custom Track.
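
As a rough illustration of what a Custom Track upload can look like, the sketch below writes a small file in BED format, one of the text formats the Custom Tracks tool accepts: a `track` header line followed by tab-separated chromosome, start, end and name columns. The track name, the two features and their coordinates are invented for the example, and the helper function is hypothetical.

```python
def make_custom_track(name, description, features):
    """Return BED text: one 'track' header line, then one line per feature.

    Each feature is (chrom, start, end, label). BED starts are 0-based and
    ends are exclusive, so end must be greater than start.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if start < 0 or end <= start:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical features placed inside the chr17 region quoted in the text.
features = [
    ("chr17", 38450000, 38451000, "myRegionA"),
    ("chr17", 38500000, 38502500, "myRegionB"),
]
track_text = make_custom_track("myData", "Example custom track", features)

with open("my_custom_track.bed", "w") as handle:
    handle.write(track_text)
print(track_text)
```

The resulting file could then be uploaded or pasted through the Custom Tracks tool while viewing the matching reference assembly.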

Any browser view created by a user, including those containing Custom Tracks, may be shared with other
users via the Saved Sessions tool.

Tracks

Below the displayed image, the UCSC Genome Browser lists eleven categories of additional tracks that can
be selected and displayed alongside the original data. Researchers can select the tracks that best fit
their query, so that the data displayed match the type and depth of the research being done. These
categories are as follows:
Categories
Category Description Examples of tracks

Category: Mapping and Sequencing
Description: It allows control over the style of sequencing displayed (e.g., genomic coordinates,
sequences, gaps etc.). It can also display a percentage based track to show a researcher if a particular
genetic element is more prevalent in the specified area.
Examples of tracks: Base Position
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=ruler),
Mappability
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1422312387_KS4ZKP11pQnDbDFGBuvvgG4eaP7S&db=hg38&c=chrX&g=mappability),
Gap
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=gap)

Figure: UCSC Genome Browser Tracks for Categories: Mapping and Sequencing, Genes and Gene Predictions,
Phenotype and Literature, COVID-19, Single-Cell RNA-Seq, mRNA and EST.

Category: Genes and Gene Predictions
Description: It offers programs to predict genes and which databases to display known genes from. The
different tracks allow the user to display gene models, protein coding regions, non-coding RNA etc. Users
can quickly compare their query with pre-selected sets of genes to look for correlations between known
sets of genes.
Examples of tracks: GENCODE v24
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=knownGene),
Geneid Genes
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=geneid),
Pfam in UCSC Gene
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=ucscGenePfam)

Figure: UCSC Genome Browser Tracks for Categories: Regulation, Comparative Genomics, Variation, Repeats

Category: Phenotype and Literature
Description: Databases containing specific styles of phenotype data. These tracks are intended for use
primarily by physicians and other professionals concerned with genetic disorders (e.g., genetics
researchers, students in science and medicine). Users can display a track that shows the genomic
positions of natural and artificial amino acid variants.
Examples of tracks: OMIM Alleles
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=omimAvSnp),
Cancer Gene Expr Super-track
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=cancerExpr)

Category: COVID-19
Description: It shows data from Genome-Wide Association Studies (GWAS) and variant calling experiments to
identify genetic variants associated with severity and susceptibility to COVID-19 disease.
Examples of tracks: COVID GWAS v3
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chrX&g=covidHgiGwas),
COVID GWAS v4
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chrX&g=covidHgiGwasR4Pval),
Rare Harmful Vars
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chrX&g=covidMuts)

Category: Single Cell RNA-Seq
Description: It offers RNA expression data at single cell level (scRNA-Seq) from different human tissues
(e.g., kidney, colon, heart, muscle, placenta, peripheral blood mononuclear cells etc.)
Examples of tracks: Blood (PBMC)
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chr1&g=bloodHao),
Heart Cell Atlas
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chr1&g=heartCellAtlas),
Colon Wang
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1420062887_itufADVn6oR7kgVaivqQsiUWlHKg&db=hg38&c=chr1&g=colonWang)

Category: mRNA and EST
Description: It shows Expressed Sequence Tags (ESTs) and messenger RNA. ESTs are single-read sequences,
typically about 500 bases in length, that usually represent fragments of transcribed genes. The mRNA
tracks allow the display of mRNA alignment data in humans as well as other species. There are also tracks
allowing comparison with regions of ESTs that show signs of splicing when aligned with the genome.
Examples of tracks: Human ESTs
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=est),
Other ESTs
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=xenoEst),
Other mRNAs
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=xenoMrna)

Category: Expression
Description: It offers genetic data and related gene expression in tissue areas. This allows users to
discover if a particular gene or sequence is linked with various tissues throughout the body. The
expression tracks also allow for displays of consensus data about the tissues that express the query
region.
Examples of tracks: GTEx Gene
(https://genome.ucsc.edu/cgi-bin/hgGtexTrackSettings?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=gtexGene),
Affy U133
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=affyU133)

Category: Regulation
Description: Information relevant to regulation of transcription from different studies. Users can adjust
the regulation tracks to add a display graph to the genome browser. These displays allow for more detail
about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants,
haplotypes, and other regulatory elements.
Examples of tracks: ENCODE Regulation Super-track Settings
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=wgEncodeReg),
ORegAnno
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=oreganno)

Category: Comparative Genomics
Description: It shows sequence conservation data, including primates, vertebrates, mammals among others.
The comparative alignments give a graphical view of the evolutionary relationships among species. This
makes it a useful tool both for the researcher, who can visualize regions of conservation among a group
of species and make predictions about functional elements in unknown DNA regions, and in the classroom as
a tool to illustrate one of the most compelling arguments for the evolution of species. The Conservation
track on the human assembly clearly shows that the farther one goes back in evolutionary time (this track
includes 100 species), the less sequence homology remains, but functionally important regions of the
genome (e.g., exons and control elements, but not introns typically) are conserved much farther back in
evolutionary time.
Examples of tracks: Conservation
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=cons100way),
Cactus 241-way
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1422312387_KS4ZKP11pQnDbDFGBuvvgG4eaP7S&db=hg38&c=chrX&g=cons241way),
Cons 30 Primates
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=cons30way)

Category: Variation
Description: It compares the searched sequence with known variations. For example, the entire contents of
each release of the dbSNP database from NCBI are mapped to human, mouse and other genomes. This includes
the fruits of the 1000 Genomes Project, as soon as they are released in dbSNP. Other types of variation
data include copy-number variation data (CNV) and human population allele frequencies from the HapMap
project.
Examples of tracks: Common SNPs(150)
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=snp150Common),
All SNPs(146)
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=snp146),
Flagged SNPs(144)
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=snp144Flagged)

Category: Repeats
Description: Allows tracking of different kinds of repeated sequences in the query. Users can quickly see
if their specified search contains large amounts of repeated sequences at a glance and adjust their
search or track displays accordingly.
Examples of tracks: RepeatMasker
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=rmsk),
Microsatellite
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=microsat),
WM + SDust
(https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=666495619_aApp0CcomQnejeEK1SN02ImxjD1f&c=chr1&g=windowmaskerSdust)
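
The track links in the table share a common shape: a `hgTrackUi` page with `db` (assembly), `c` (chromosome) and `g` (track name) query parameters, plus an `hgsid` value that appears to identify the browsing session in which the link was copied. The sketch below, which uses only Python's standard library, reads those parameters from one of the links above and rebuilds the link without the session value. Whether a given link behaves identically without `hgsid` is an assumption, and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def track_params(url):
    """Return the query parameters of a track link as a dictionary."""
    return dict(parse_qsl(urlsplit(url).query))


def strip_session(url):
    """Rebuild the link without the hgsid parameter (assumed session id)."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key != "hgsid"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


mappability = (
    "https://genome.ucsc.edu/cgi-bin/hgTrackUi"
    "?hgsid=1422312387_KS4ZKP11pQnDbDFGBuvvgG4eaP7S&db=hg38&c=chrX&g=mappability"
)
params = track_params(mappability)
print(params["db"], params["c"], params["g"])
print(strip_session(mappability))
```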

Analysis tools
The UCSC site hosts a set of genome analysis tools, including a full-featured graphical interface for
mining the information in the browser database and a FASTA-format sequence alignment tool, BLAT,[9] that
is also useful for simply finding sequences in the massive sequence (human genome = 3.23 billion bases
[Gb]) of any of the featured genomes.
A liftOver tool uses whole-genome alignments to allow conversion of sequences from one assembly to
another or between species. The Genome Graphs tool allows users to view all chromosomes at once and
display the results of genome-wide association studies (GWAS). The Gene Sorter displays genes grouped
by parameters not linked to genome location, such as expression pattern in tissues.
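
The idea behind converting a position from one assembly to another can be sketched with a toy lookup. The code below is not the liftOver implementation and does not read its alignment files. It assumes a hypothetical, pre-sorted list of ungapped aligned blocks on a single chromosome and maps a position only when it falls inside one of them.

```python
from bisect import bisect_right

# Hypothetical aligned blocks: (old_start, new_start, length), sorted by
# old_start and non-overlapping. Real whole-genome alignments are far richer.
BLOCKS = [(100, 1100, 50), (200, 1180, 30), (400, 1500, 100)]


def lift_position(pos, blocks):
    """Map pos from the old assembly to the new one, or return None.

    None means the position falls in a gap between aligned blocks, so this
    toy model has no counterpart for it in the new assembly.
    """
    starts = [block[0] for block in blocks]
    index = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if index < 0:
        return None
    old_start, new_start, length = blocks[index]
    if pos >= old_start + length:
        return None
    return new_start + (pos - old_start)


for query in (120, 160, 499):
    print(query, lift_position(query, BLOCKS))
```

Positions that land between blocks come back as `None`, which mirrors the general point that some regions of one assembly may have no aligned counterpart in another.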

Open source / mirrors


The UCSC Browser code base is open-source for non-commercial use, and is mirrored locally by many
research groups, allowing private display of data in the context of the public data. The UCSC Browser is
mirrored at several locations worldwide, as shown in the table.

Official mirror sites


European mirror (https://genome-euro.ucsc.edu/) — maintained by UCSC at University of Bielefeld, Germany

Asian mirror (https://genome-asia.ucsc.edu/) — maintained by UCSC at RIKEN, Yokohama, Japan

The Browser code is also used in separate installations by the UCSC Malaria Genome Browser and the
Archaea Browser.

See also
Ensembl
ENCODE
List of biological databases

References
1. Navarro Gonzalez, J; Zweig, AS; Speir, ML; Schmelter, D; Rosenbloom, KR; Raney, BJ;
Powell, CC; Nassar, LR; Maulding, ND; Lee, CM; Lee, BT; Hinrichs, AS; Fyfe, AC;
Fernandes, JD; Diekhans, M; Clawson, H; Casper, J; Benet-Pagès, A; Barber, GP; Haussler,
D; Kuhn, RM; Haeussler, M; Kent, WJ (8 January 2021). "The UCSC Genome Browser
database: 2021 update" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779060). Nucleic
Acids Research. 49 (D1): D1046–D1057. doi:10.1093/nar/gkaa1070
(https://doi.org/10.1093%2Fnar%2Fgkaa1070). ISSN 0305-1048 (https://www.worldcat.org/issn/0305-1048).
PMC 7779060 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779060). PMID 33221922
(https://pubmed.ncbi.nlm.nih.gov/33221922).
2. Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP,
Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J,
Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom
KR, Smith KE, Haussler D, Kent WJ (Jan 2011). "The UCSC Genome Browser database:
update 2011" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242726). Nucleic Acids Res.
39 (Database issue): D876-82. doi:10.1093/nar/gkq963 (https://doi.org/10.1093%2Fnar%2Fgkq963).
PMC 3242726 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242726).
PMID 20959295 (https://pubmed.ncbi.nlm.nih.gov/20959295).
3. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (June
2002). "The human genome browser at UCSC" (http://genome.cshlp.org/content/12/6/996.abstract).
Genome Res. 12 (6): 996–1006. doi:10.1101/gr.229102 (https://doi.org/10.1101%2Fgr.229102).
PMC 186604 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC186604).
PMID 12045153 (https://pubmed.ncbi.nlm.nih.gov/12045153).
4. Kuhn, R. M.; Karolchik, D.; Zweig, A. S.; Wang, T.; Smith, K. E.; Rosenbloom, K. R.; Rhead,
B.; Raney, B. J.; Pohl, A.; Pheasant, M.; Meyer, L. (2009-01-01). "The UCSC Genome
Browser Database: update 2009" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686463).
Nucleic Acids Research. 37 (Database): D755–D761. doi:10.1093/nar/gkn875
(https://doi.org/10.1093%2Fnar%2Fgkn875). ISSN 0305-1048 (https://www.worldcat.org/issn/0305-1048).
PMC 2686463 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686463). PMID 18996895
(https://pubmed.ncbi.nlm.nih.gov/18996895).
5. "History | Genomics Institute" (https://genomics.ucsc.edu/about/history/). genomics.ucsc.edu.
Retrieved 2022-08-07.
6. "High-coverage" here means 6x coverage, or six times more total sequence than the size of
the genome.
7. "UCSC Genome Browser: Acknowledgments" (https://genome.ucsc.edu/goldenPath/credits.html).
genome.ucsc.edu. Retrieved 2022-07-27.
8. Navarro Gonzalez, Jairo; Zweig, Ann S.; Speir, Matthew L.; Schmelter, Daniel; Rosenbloom,
Kate R.; Raney, Brian J.; Powell, Conner C.; Nassar, Luis R.; Maulding, Nathan D.; Lee,
Christopher M.; Lee, Brian T. (2021-01-08). "The UCSC Genome Browser database: 2021
update" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779060). Nucleic Acids Research.
49 (D1): D1046–D1057. doi:10.1093/nar/gkaa1070 (https://doi.org/10.1093%2Fnar%2Fgkaa1070).
ISSN 1362-4962 (https://www.worldcat.org/issn/1362-4962). PMC 7779060
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7779060). PMID 33221922
(https://pubmed.ncbi.nlm.nih.gov/33221922).
9. Kent, WJ. (Apr 2002). "BLAT - the BLAST-like alignment tool"
(http://genome.cshlp.org/content/12/4/656.abstract). Genome Res. 12 (4): 656–64.
doi:10.1101/gr.229202 (https://doi.org/10.1101%2Fgr.229202). PMC 187518
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC187518). PMID 11932250
(https://pubmed.ncbi.nlm.nih.gov/11932250).

External links
Official website (http://genome.ucsc.edu)
On-line Training/Tutorials & User's Guides (http://genome.ucsc.edu/training/index.html)
UCSC Genome tutorials (https://www.youtube.com/channel/UCQnUJepyNOw0p8s2otX4RYQ/videos) (videos on
YouTube)

Retrieved from "https://en.wikipedia.org/w/index.php?title=UCSC_Genome_Browser&oldid=1163511539"
