
UCSC Genome Browser: FAQ

http://genome.ucsc.edu/FAQ/FAQdownloads.html


Frequently Asked Questions: Data and Downloads
Downloading sequence and annotation data
Extracting sequence in batch from an assembly
Downloading data from the UCSC DAS server
Downloading the UCSC Genome Browser source
Download restrictions
Opening .fa files
Data differences between downloaded data and browser display
Strange characters in FASTA file
Selection of GenBank ESTs
EST strand direction
Missing RefSeq ID
Finished vs. draft segments
chrN_random tables
Chromosome Un
Chromosome M
N characters at beginning of human chr22
Erroneous duplicated chrY_random region on Mouse Build 34 (mm6)
Problems with Mouse Build 32 (mm4)
Mapping chimp chromosome numbers to human chromosome numbers
Converting genome coordinates between assemblies
Linking gene name with accession number
Obtaining a list of Known Genes
Repeat-masking data
Availability of repeat-masked data
RepeatMasker version differences - UCSC vs. RepeatMasker website
Obtaining promoter sequence
Data from Evolutionary Conservation Score tracks
Minus strand coordinates - axtNet files
Mapping UCSC STS marker IDs to those of other groups
deCODE map data
Direct MySQL access to data
Name of fourth column in BED output

Downloading sequence and annotation data


11/9/2012 12:10 PM

Question: "How do I obtain the sequence and/or annotation data for a release?"

Response: Sequence and annotation data downloads are usually made available within the first week of the release of a new assembly. The download directories are automatically updated nightly to incorporate additions and modifications to the data. You can download sequence and annotation data using our FTP server. We recommend that you save the file locally as gzip. You can also download data from our Downloads page or our DAS server. To download a specific subset of the data or to configure the output format of the data, use the Table Browser. For information on extracting a large set of sequences from an assembly, see Extracting sequence in batch from an assembly. For more information on using the UCSC DAS server, see Downloading data from the UCSC DAS server.

Extracting sequence in batch from an assembly

Question: "I have a lot of coordinates for an assembly and want to extract the corresponding sequences. What is the best way to proceed?"

Response: There are two ways to extract genomic sequence in batch from an assembly:

A. Download the appropriate FASTA files from our FTP server and extract sequence data using your own tools or the tools from our source tree. This is the recommended method when you have very large sequence datasets or will be extracting data frequently. Sequence data for most assemblies is located in the assembly's "chromosomes" subdirectory on the downloads server. For example, the sequence for human assembly hg17 can be found in ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/chromosomes/. You'll find instructions for obtaining our source programs and utilities here. Some programs that you may find useful are nibFrag and twoBitToFa, as well as other fa* programs. To obtain usage information about most programs, execute the program without arguments.

B. Use the Table Browser to extract sequence. This is a convenient way to obtain small amounts of sequence.
1. Create a custom track of the genomic coordinates in BED format and upload it into the Genome Browser.
2. Select the custom track in the Table Browser, then select the "sequence" output format to retrieve data.

Downloading data from the UCSC DAS server

Question: "How do I download data using the UCSC DAS server?"

Response: The UCSC DAS server provides access to genome annotation data for all current assemblies featured in the Genome Browser. To view a list of the assemblies available from the DAS server and their base URLs, see http://genome.ucsc.edu/cgi-bin/das/dsn. To construct a DAS query, combine an assembly's base URL with the sequence entry point and type specifiers available for that assembly. The entry point specifies chromosome position, and the type indicates the annotation table requested. You can view the lists of entry points and types available for an assembly with requests of the form:

http://genome.ucsc.edu/cgi-bin/das/[db_name]/entry_points
http://genome.ucsc.edu/cgi-bin/das/[db_name]/types

where [db_name] is the UCSC name for the assembly, e.g. hg16, mm4.

For example, here is a query that returns all the records in the refGene table for the chromosome position chr1:1-100000 on the hg16 assembly:

http://genome.ucsc.edu/cgi-bin/das/hg16/features?segment=1:1,100000;type=refGene

For more information on DAS, see the Biodas website and the DAS specification.

Downloading the UCSC Genome Browser source

Question: "Where can I download the Genome Browser source code and executables?"

Response: The Genome Browser source code and executables are freely available for academic, nonprofit, and personal use (see Licensing the Genome Browser or Blat for commercial licensing requirements). The latest version of the source code may be downloaded here. See Downloading Blat source and documentation for information on Blat downloads.

Download restrictions

Question: "Do you have restrictions on the amount of downloads one can do?"

Response: Generally, we'd prefer that you not hit our interactive site with programs, unless they are themselves front ends for interactive sites. We can handle the traffic from all the clicks that biologists are likely to generate, but not from programs. Program-driven use is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day. If you need to run batch Blat jobs, see Downloading Blat source and documentation for a copy of Blat you can run locally.

Opening .fa files

Question: "I am trying to look at the final decoding of the human genome. How can I open the *.fa files?"

Response: Microsoft Word or any program that can handle large text files will do. Some of the chromosomes begin with long blocks of N's. You may want to search for an A to get past them. Unless you have a particular need to view or use the raw data files, you might find it more interesting to look at the data using the Genome Browser. Type the name of a gene in which you're interested into the position box (or use the default position), then click the submit button. In the resulting Genome Browser display, click the DNA link on the menu bar at the top of the page. Select the Extended case/color options button at the bottom of the next page. Now you can color the DNA sequence to display which portions are repeats, known genes, genetic markers, etc.

Data differences between downloaded data and browser display

Question: "I downloaded the genome annotations from your MySQL database tables, but the mRNA locations didn't match what was showing in the Genome Browser. Shouldn't they be in sync?"

Response: Yes. The Genome Browser and Table Browser are both driven by the same underlying MySQL database. Check that your downloaded tables are from the same assembly version as the one you are viewing in the Genome Browser. If the assembly dates don't match, the coordinates of the data within the tables may differ. In a very rare instance, you could also be affected by the brief lag time between the update of the live databases underlying the Genome Browser and the time it takes for text dumps of these databases to become available in the downloads directory.

Strange characters in FASTA file

Question: "I noticed several characters other than A, C, G, T, and N in my fasta file, for example y, k, s, etc. Is the file corrupted or are these characters valid?"

Response: The characters most commonly seen in sequence are A, C, G, T, and N, but there are several other valid characters that are used in clones to indicate ambiguity about the identity of certain bases in the sequence. It's not uncommon to see these "wobble" codes at polymorphic positions in DNA sequences. The following chart (IUPAC-IUB Symbols for Nucleotide Nomenclature: Cornish-Bowden (1985), Nucl. Acids Res. 13:3021-3030) lists nucleotide symbols, including those used for ambiguity:

--------------------------------------------
Symbol   Meaning             Nucleic Acid
--------------------------------------------
A        A                   Adenine
C        C                   Cytosine
G        G                   Guanine
T        T                   Thymine
U        U                   Uracil
M        A or C
R        A or G              Purine
W        A or T
S        C or G
Y        C or T              Pyrimidine
K        G or T
V        A or C or G
H        A or C or T
D        A or G or T
B        C or G or T
X        G or A or T or C
N        G or A or T or C
--------------------------------------------

Selection of GenBank ESTs

Question: "I am interested in ESTs. How do you select which ones from GenBank to display in the Genome Browser?"

Response: All ESTs in GenBank on the date of the track data freeze for the given organism are used; none are discarded. When two ESTs have identical sequences, both are retained because this can be significant corroboration of a splice site.

ESTs are aligned against the genome using the Blat program. When a single EST aligns in multiple places, the alignment having the highest base identity is found. Only alignments that have a base identity level within a selected percentage of the best are kept. Alignments must also have a minimum base identity to be kept. For more information on the selection criteria specific to each organism, consult the description page accompanying the EST track for that organism.

The maximum intron length allowed by Blat is 500,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. If an EST aligns non-contiguously (i.e. an intron has been spliced out), it is also a candidate for the Spliced EST track, provided it meets various quality controls for intron and exon length and match quality.

Start and stop coordinates of each alignment block are available from the appropriate table within the Table Browser.

Note that only 250 EST tracks can be viewed at a time within the browser. If more than 250 tracks exist for the selected region, the display defaults to a denser display mode to prevent the user's web browser from being overloaded. You can restore the EST track display to a fuller display mode by zooming in on the chromosomal range or by using the EST track filter to restrict the number of tracks displayed.

For tracks such as Non[Organism] ESTs and Non[Organism] mRNAs, some selection is done on the full set at GenBank. If a sequence is too divergent from the organism's genome to generate a significant Blat hit, it is not included in the track.

EST strand direction

Question: "Could you help me with my interpretation of EST data? If the EST is taken from the minus (-) strand, does this always mean that the transcript is generated on the minus strand? Are two corresponding ESTs that are assigned - and + always complementary? I want to confirm the strand assignment for two human ESTs:
BQ016549 (on chr22, hg18): + strand in text and - strand in graphical display
AA928010 (on chr22, hg18): - strand in text and + strand in graphical display."

Response: From the examples above, it can be seen that the strand to which an EST aligns is not necessarily reflected in the direction of transcription shown by the arrows in the display.

When UCSC downloads mRNAs and ESTs from GenBank and aligns them to a genome assembly using Blat, each EST aligns to the + or - strand (forward or reverse direction) of the genome, which we record as + or - in the strand field of the corresponding database table, e.g. all_ests or chrN_est. The strand information (+/-) therefore indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated.

Determining the direction of transcription for ESTs is not an easy task, so we do some calculations to make the best guess for the transcription direction. ESTs are sequenced from either the 5' or the 3' end. When sequenced from the 5' end, the resulting sequence is the same as that of the mRNA which it represents. With a 3' end read, the resulting sequence matches the opposite strand of the cDNA clone; therefore, it is the reverse complement of the actual mRNA sequence. A problem occurs if the EST contributor reverse-complements the 3'-read sequence before depositing it into GenBank. It is not always possible to determine if this has been done. Therefore, we do some calculations to try to determine the correct direction of transcription for the EST sequence.

To find transcription direction, we use a method that relies on finding gt-ag canonical pairs in one direction more often than in the opposite direction. If an EST alignment produces canonical introns (with gt-ag splice-site pairs), this is used to determine the transcription direction. For example, when an EST is aligned to the genome, a canonical intron would look like this:

NNNNexonNNNNgtnnnnintronnnnnnnnagNNNNexon

Here, the two nucleotides on either end of the intron show the canonical gt-ag splice site pairs. The calculation is:

gt/ag introns minus ct/ac introns = intronOrientation

The sign of this calculated intronOrientation field (stored in the estOrientInfo table) shows the orientation of the transcript relative to the EST. Therefore, if intronOrientation is positive, then the EST appears in the display with the arrows pointing in the same direction as the EST alignment. If intronOrientation is negative, then the arrows point in the opposite direction. If no introns exist or all of the introns are non-canonical, then intronOrientation is set to zero. In both BQ016549 and AA928010 (in the example above), the intronOrientation is negative; therefore, the arrows on the Genome Browser display point in the opposite direction to that indicated by the alignment on the EST details page.

Note: A low intronOrientation number can cause an incorrect assignment of transcription direction when calculated in this way.

The alignment details pages and the Table Browser do not take the intron orientation into account. They show only the alignment of the GenBank sequence (as given) to the genome. If the alignment is used to retrieve DNA sequence from the genome, the DNA sequence will look similar to the GenBank sequence (not its complement). The graphical display goes with the orientation of the gene in that location, with the idea that people will want the mRNA (transcription-direction) sequence.

Missing RefSeq ID

Question: "Why isn't my RefSeq ID in your database?"

Response: It may have been added after we last downloaded data from GenBank, or it may have been replaced or removed. You can check the submission date and status of an accession on the NCBI Entrez Nucleotide site.

Finished vs. draft segments

Question: "Do chrN.fa tables contain both finished and draft segments? If so, how do you determine which segments are finished?"

Response: Yes, these tables contain both finished and draft segments. Use the corresponding chrN_gold table to look them up.

The quality of the draft varies. How do you determine the accuracy? The base-calling program Phred analyzes the traces from the sequencing machines and assigns a quality score to these. These quality scores are used by the Phrap assembly program, which gives quality scores for the bases on the assembly as well. In general, the larger the contig it is in, the better the quality. The quality of the last 500 bases on either end of a contig tends to be lower than the rest of the contig.

chrN_random tables

Question: "What are the chrN_random_[table] files in the human assembly? Why are they called random? Is there something biologically random about the sequence in these tables or are they just not placed within their given chromosomes?"

Response: In the past, these tables contained data related to sequence that is known to be in a particular chromosome, but could not be reliably ordered within the current sequence.

Starting with the April 2003 human assembly, these tables also include data for sequence that is not in a finished state, but whose location in the chromosome is known, in addition to the unordered sequence. Because this sequence is not quite finished, it could not be included in the main "finished" ordered and oriented section of the chromosome.

Also, in a very few cases in the April 2003 assembly, the random files contain data related to sequence for alternative haplotypes. This is present primarily in chr6, where we have included two alternative versions of the MHC region in chr6_random. There are a few clones in other chromosomes that also correspond to a different haplotype. Because the primary reference sequence can only display a single haplotype, these alternatives were included in random files. In subsequent assemblies, these regions have been moved into separate files (e.g. chr6_hla_hap1).

Chromosome Un

Question: "What is ChrUn?"

Response: ChrUn contains clone contigs that can't be confidently placed on a specific chromosome. For the chrN_random and chrUn_random files, we essentially just concatenate together all the contigs into short pseudo-chromosomes. The coordinates of these are fairly arbitrary, although the relative positions of the coordinates are good within a contig. You can find more information about the data organization and format on the Data Organization and Format page.

Chromosome M

Question: "What is chromosome M (chrM)?"

Response: Mitochondrial DNA.

N characters at beginning of human chr22

Question: "When I download human chr22 from your web site, the unzipped file contains only N's."

Response: There is a large block of N's at the beginning and end of chr22. Search for an A to bypass the initial group of N's.

Erroneous duplicated chrY_random region on Mouse Build 34 (mm6)

Question: "On the mm6 assembly, I've found duplicate contigs that are placed on both chrY and chrY_random. Is this intentional?"

Response: On the mm6 assembly, chrY_random erroneously contains a region duplicated from chrY. The duplicated section occupies chrY:1-696,521 and chrY_random:29,311,053-30,615,573 (the end of the chromosome) and includes the following repeated fragments (GenBank accessions, listed here without version suffixes): AC139318, AC145571, AC148319, AC145392, AC134433, AC145393. The fragments are assembled into the contig NT_111995 for chrY_random and also appear (under different names) as regions on contigs MmY_110865_34, MmY_78990_34 and NT_078925. Because NCBI discovered this assembly problem after the UCSC Genome Browser was processed, we were not able to remove it from mm6 prior to the browser's release.
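For the block of N's at the start of chr22 mentioned above, a short script can report how many N's precede the first real base instead of searching for an A by hand. This is a minimal sketch, assuming a single-record FASTA file such as a per-chromosome .fa download; the function name `leading_n_count` is hypothetical and not part of any UCSC tool.

```python
from typing import Iterable


def leading_n_count(fasta_lines: Iterable[str]) -> int:
    """Count N/n characters before the first non-N base.

    Assumes the lines come from a single-record FASTA file.
    Header lines (starting with '>') and blank lines are skipped.
    """
    count = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        stripped = line.lstrip("Nn")
        count += len(line) - len(stripped)
        if stripped:
            # Reached the first real base; stop reading.
            break
    return count


# Usage with an in-memory example; a file handle works the same way,
# e.g. `with open("chr22.fa") as handle: leading_n_count(handle)`.
example = [">chr22", "NNNN", "NNac", "GT"]
print(leading_n_count(example))  # 6
```

Because the function stops at the first real base, it reads only the leading portion of a large chromosome file.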

Problems with Mouse Build 32 (mm4)

Question: "I have heard that the Build 32 mouse assembly isn't as good as the Build 30 assembly. Can you clarify?"

Response: Unfortunately, there appear to be some problems with the Build 32 assembly. Ensembl has conducted an analysis of the assembly and has attributed the problems to incorrect mapping information that led to the generation of artificial duplications and some incorrect flips in orientation. You can read more information about the problems Ensembl identified and review a list of the chromosomes and genes most likely to be affected by these issues on the Ensembl Mus musculus web page.

Mapping chimp chromosome numbers to human chromosome numbers

Question: "How do the chimp and human chromosome numbering schemes compare?"

Response: The following table shows the mapping of chromosomes in the chimp draft assemblies to human chromosomes. Starting with the panTro2 assembly, the numbering scheme has been changed to reflect a new standard that preserves orthology with human chromosomes. Initially proposed by E.H. McConkey in 2004, the new numbering convention was subsequently endorsed by the International Chimpanzee Sequencing and Analysis Consortium. This standard assigns the identifiers "2a" and "2b" to the two chimp chromosomes that fused in the human genome to form chromosome 2 and renumbers the other chromosomes to more closely match their human counterparts. As a result, chromosomes 2 and 23 (present in the panTro1 assembly) do not exist in later versions.

Human Chr    Chimp Chr (panTro1)    Chimp Chr (panTro2)
1            1                      1
2 (part)     12                     2a
2 (part)     13                     2b
3            2                      3
4            3                      4
5            4                      5
6            5                      6
7            6                      7
8            7                      8
9            11                     9
10           8                      10
11           9                      11
12           10                     12
13           14                     13
14           15                     14
15           16                     15
16           18                     16
17           19                     17
18           17                     18
19           20                     19
20           21                     20
21           22                     21
22           23                     22
X            X                      X
Y            Y                      Y

Converting genome coordinates between assemblies

Question: "I've been researching a specific area of the human genome on the current assembly, and now you've just released a new version. Is there an easy way to locate my area of interest on the new assembly?"

Response: You can migrate data from one assembly to another by using the Blat alignment tool or by converting assembly coordinates. There are two conversion tools available on the Genome Browser web site: the Convert utility and the LiftOver tool. The Convert utility, which is accessed from the menu on the Genome Browser annotation tracks page, supports forward, reverse, and cross-species conversions, but does not accept batch input. The LiftOver tool, accessed via the Utilities link on the Genome Browser home page, also supports forward, reverse, and cross-species conversions, as well as batch conversions.

If you wish to update a large number of coordinates to a different assembly and have access to a Linux platform, you may find it useful to try the command-line version of the LiftOver tool. The executable file for this utility can be downloaded here. LiftOver requires a UCSC-generated over.chain file as input. Pre-generated files are available for selected assemblies from the Downloads page. If the desired file is not available, send a request to the genome mailing list and we may be able to provide you with one.

Linking gene name with accession number

Question: "I have the accession number for a gene and would like to link it to the gene name. Is there a table that shows both pieces of information?"

Response: If you are looking at the RefSeq Genes, the refFlat table contains both the gene name (usually a HUGO Gene Nomenclature Committee ID) and its accession number. For the Known Genes, use the kgAlias table.

Obtaining a list of Known Genes

Question: "How can I obtain a complete list of all the genes in the UCSC Known Genes table for a particular organism?"

Response: To obtain a complete copy of the entire Known Genes data set for an organism, open the Genome Browser Downloads page, jump to the section specific to the organism, click the Annotation database link in that section, then click the link for the knownGene.txt.gz table.

Data for a specific region or chromosome may be obtained from the Table Browser by selecting the "Genes and Gene Prediction Tracks" group, the "Known Genes" track and the "knownGene" table. Set the position to the region of interest, then click the "get output" button.

Repeat-masking data

Question: "What version of RepeatMasker do you use on your data? Which flags do you use?"

Response: UCSC uses the latest versions of RepeatMasker and repeat libraries available on the date when the assembly data is processed. RepeatMasker version information can usually be found in the README text for the assembly's bigZips downloads directory. Masking is done using the RepeatMasker -s flag. For mouse repeats, we also use -m. In addition to RepeatMasker, we use the Tandem Repeat Finder (trf) program, masking out repeats of period 12 or less. The repeats are just "soft" masked. Alignments are allowed to extend through repeats, but not initiate in them.

Availability of repeat-masked data

Question: "Are the repeat annotation files available for every chromosome?"

Response: Yes, you can obtain the repeat-masked files via the Table Browser or from the organism's annotation database downloads directory. The RepeatMasker annotation tables are named chrN_rmsk (where N represents the chromosome number) and the Tandem Repeat Finder (TRF) tables are named simpleRepeat.

RepeatMasker version differences - UCSC vs. RepeatMasker website

Question: "When I run RepeatMasker independently from the RepeatMasker web server, my results vary from those of UCSC. What's the cause?"

Response: UCSC occasionally uses updated versions of the RepeatMasker software and repeat libraries that are not yet available on the RepeatMasker website (see Repeat-masking data for more information).
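The "soft" masking described under Repeat-masking data can be read directly from downloaded sequence: in soft-masked FASTA, repeat bases are conventionally written in lowercase while the rest stays uppercase. The sketch below locates lowercase runs in a sequence string. It is an illustration only; the function names are hypothetical, and the authoritative repeat annotations remain the chrN_rmsk and simpleRepeat tables.

```python
import re
from typing import List, Tuple


def soft_masked_intervals(sequence: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]


def masked_fraction(sequence: str) -> float:
    """Return the fraction of characters that are lowercase (soft-masked)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)


print(soft_masked_intervals("ACGTacgtNNacGT"))  # [(4, 8), (10, 12)]
```

The intervals use 0-based starts and exclusive ends, the same convention the database tables use for coordinates.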

Obtaining promoter sequence

Question: "How can I fetch promoter sequence upstream of a gene?"

Response: The UCSC Genome Browser offers several ways to obtain this information, depending on your requirements.

The Stanford Human Promoters track on the UCSC Custom Annotation Tracks page shows promoters for some of the human assemblies.

The Genome Browser downloads site provides prepackaged downloads of 1000 bp, 2000 bp, and 5000 bp upstream sequence for RefSeq genes that have a coding portion and annotated 5' and 3' UTRs. You can obtain these from the bigZips downloads directory for the assembly of interest.

To fetch the upstream sequence for a specific gene, use the Table Browser. Select the genome and assembly, then select the knownGene table. Paste the gene name or accession number in the identifier field. Choose sequence for the output format type, then click the get output button. On the next page, select genomic. On the final page, you will have the opportunity to configure the amount of upstream promoter sequence to fetch, along with several other options. Click Get Sequence when you've finished configuring the output.

You can also use the Genome Browser to obtain sequence for a specific gene. Open the Genome Browser window to display the gene in which you're interested. Click the entry for the gene in the RefSeq or Known Genes track, then click the Genomic Sequence link. Alternatively, you can click the DNA link in the top menu bar of the Genome Browser tracks window to access options for displaying the sequence.

Data from Evolutionary Conservation Score tracks

Question: "Where can I download the conservation score data from the Human/Mouse Evolutionary Conservation Score track?"

Response: The conservation score data are stored in a group of tables in the annotation database downloads directory. The naming conventions of the tables vary among releases. In earlier assemblies, table names are of the form chrN_humMusL, chrN_zoom1_humMusL, and/or chrN_zoom2500_humMusL. In later releases, the tables are named using specific release numbers, such as chrN_hg16Mm3. The tables within a given set differ by the number of bases/score interval and are used to generate the browser displays at different zooming levels.

Minus strand coordinates - axtNet files

Question: "I downloaded the axtNet alignments between the latest human and mouse assemblies. I found that some of the alignments listed in the axtNet files do not agree with what is shown in the browser."

Response: Is this alignment on the minus strand? Minus strand coordinates in axt files are handled differently from how they are handled in the Genome Browser. To convert axt minus strand coordinates to Genome Browser coordinates, use:

start = chromSize + 1 - axtEnd
end = chromSize + 1 - axtStart

See an explanation of coordinate transforms in the genomeWiki.

Mapping UCSC STS marker IDs to those of other groups

Question: "How do I map the STS genetic marker IDs in the genome browser to the IDs assigned by other groups?"

Response: We assign our own IDs to each of the STS markers, but we also track the UniSTS IDs for each marker in the downloadable stsInfo2 table. To determine the location of a specific marker, look up the marker's name in the stsAlias table to determine the UCSC ID assigned to the marker, and then use this ID to look it up in the stsMap table where the marker is located. For example, D10S249 has UCSC ID 2880 and is located at chr10:240791-241019.

deCODE map data

Question: "Where can I get more information about the deCODE map?"

Response: You can obtain this information from the combination of a couple of tables. The stsMap table contains the physical position of all STS markers, including those on the deCODE map. This file also contains information about the position on the genome-wide maps, including the deCODE map. A second file, stsInfo2, contains additional information about each marker, including aliases, primer sequence information, etc. This table is related to the first table by an ID (the identNo field in both files).

Direct MySQL access to data

Question: "Is it possible to run SQL queries directly on the database rather than using the Table Browser interface?"

Response: Yes. See our documentation on Downloading Data using MySQL.

Name of fourth column in BED output

Question: "When using the Table Browser to extract exons from a Gene track, what does the 'Name' column (fourth BED column) refer to?"

Response: The fourth column of the BED output contains a lot of information separated by underscores. For example:

uc009vjk.2_cds_1_0_chr1_324343_f

This information is represented as follows:

ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand

UCSC ID: our identification for the transcripts in the UCSC Genes track.

Sequence Type: exons, introns, cds, utr5, etc.

Sequence Type Number: for every transcript, there will be a row for each sequence type (cds or intron) and this identifies which is represented in this row. So, if you requested exons, and a particular transcript has 10 exons, you will see a row for each one and in this position they will be numbered 0-9; the first is denoted with 0.

Bases Added: number of bases added to the regions requested.

Chromosome: chromosome number the item is on.

Position of First Base of Item: if you have specified bases added to the requested features (for example, Exons plus 10 bases on each end), then columns 2 and 3 of the output wouldn't be the exact coordinates of the exon; they would start and end 10 bases before/after the exon. So, this part of the information is an easy way to see where the actual feature starts as displayed in the browser. It will be the exact coordinate the feature starts on as displayed in the browser. It is "as displayed in the browser" because the coordinates in our tables almost always have 0-based starts (as they do in columns 2 and 3 of this output) but display as 1-based in the browser (for more info see this FAQ), but this start position listed in this section of the 4th column is actually 1-based.

Strand: forward (f) or reverse (-) strand.
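The underscore-separated layout above can be unpacked mechanically. The following is a minimal sketch, not a UCSC utility: the `BedName` type and `parse_bed_name` function are hypothetical names introduced for illustration. It splits from the right so that an ID containing underscores stays intact, and it assumes the chromosome name itself contains no underscore (so a name like chr6_random would need special handling).

```python
from typing import NamedTuple


class BedName(NamedTuple):
    ucsc_id: str
    sequence_type: str
    sequence_type_number: int  # 0-based index of the exon, intron, etc.
    bases_added: int
    chromosome: str
    first_base: int            # 1-based, as displayed in the browser
    strand: str                # strand flag kept exactly as written


def parse_bed_name(name: str) -> BedName:
    """Split the fourth BED column into its seven documented fields."""
    parts = name.rsplit("_", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields: {name!r}")
    ucsc_id, seq_type, seq_number, added, chrom, position, strand = parts
    return BedName(ucsc_id, seq_type, int(seq_number), int(added),
                   chrom, int(position), strand)


item = parse_bed_name("uc009vjk.2_cds_1_0_chr1_324343_f")
print(item.ucsc_id, item.sequence_type, item.chromosome, item.first_base)
# The position in the name is 1-based; the equivalent 0-based start is one less.
print(item.first_base - 1)
```

Subtracting one from `first_base` gives the 0-based start of the feature itself, which matches column 2 of the BED output only when no bases were added.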