GenomeData[ ] - Import FASTA file

Many projects require importing files and then manipulating them as text or strings. Mathematica provides a large set of string-manipulation functions that work much like the functions for other data objects.

Import the mitochondrial (MT) chromosome of the human genome from a FASTA-format file.

HSGenome = Import["ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_MT/hs_ref_chrMT.fa.gz", "FASTA"];
(* Import local copy *)

HSGenome
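
As a rough illustration of what the import step produces: importing with the "FASTA" format should give a list with one string per record, so a single-sequence file still needs First (or similar) to yield one plain string. The sketch below uses ImportString on an in-memory record so that it runs without network access. The record header, the bases, and the names fasta, seqs, and toy are made up for illustration and are not from the notebook.

```wolfram
(* A toy FASTA record; the header and bases are illustrative, not real data *)
fasta = ">toy_record example\nGATCACAGGT\nCTATCACCCT\n";

(* Import in "FASTA" format: expected to give a list, one string per record *)
seqs = ImportString[fasta, "FASTA"];

(* Take the first (here, the only) record to work with a single string *)
toy = First[seqs];
StringLength[toy]
```

The same First[...] step would apply to the result of Import on a file or URL, assuming the file holds a single record.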
Count the number of characters in the file and then display the first 200.

StringCount[HSGenome, LetterCharacter]
16571

StringTake[HSGenome, 200]
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
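
The same counting functions can be tried on a short string first. This is a minimal sketch on a made-up 20-base sequence (the name toy is hypothetical); it also counts each base separately, which the notebook itself does not do.

```wolfram
(* Toy sequence standing in for HSGenome; the bases are illustrative *)
toy = "GATCACAGGTCTATCACCCT";

StringLength[toy]                   (* every character: 20 *)
StringCount[toy, LetterCharacter]   (* letters only: 20 *)
StringTake[toy, 5]                  (* first five characters: "GATCA" *)

(* Count each base separately *)
Table[b -> StringCount[toy, b], {b, {"A", "C", "G", "T"}}]
(* {"A" -> 5, "C" -> 7, "G" -> 3, "T" -> 5} *)
```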


Genome Pattern Matching.nb

Pattern Matching

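Mathematica’s pattern-matching features make it easy to search strings for any pattern of interest. The pattern below matches seven consecutive characters of the form xxyyyxx, where each repeated name must match the same character.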

pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;
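
To see how the repeated names behave, the pattern can be tested against a few short strings with StringMatchQ. This is only a sketch: the test strings are made up, and the stricter variant named strict is a hypothetical addition, not something the notebook defines. It shows that nothing in the original pattern forces x and y to be different characters.

```wolfram
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

StringMatchQ["CCAAACC", pattern]   (* True: x is "C", y is "A" *)
StringMatchQ["CCAAATT", pattern]   (* False: the last two characters differ from the first two *)
StringMatchQ["AAAAAAA", pattern]   (* True: x and y may be the same character *)

(* Hypothetical stricter variant that requires two distinct characters *)
strict = (x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_) /; x =!= y;
StringMatchQ["AAAAAAA", strict]    (* False *)
```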

For example, this searches the full genome for all occurrences of the pattern.

StringCases[HSGenome, pattern]
{CCAAACC, CCAAACC, CCAAACC, GGTTTGG, AACCCAA, CCAAACC, CCAAACC, AAGGGAA, TTAAATT, AATTTAA, AATTTAA, …}
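
A small example may make the result easier to read. The sketch below runs the same kind of search on a made-up 24-character string (toy and matches are hypothetical names), then tallies the matches and extracts the characters bound to x and y.

```wolfram
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;
toy = "TTCCAAACCGGTTTGGACCAAACC";   (* illustrative sequence *)

matches = StringCases[toy, pattern]
(* {"CCAAACC", "GGTTTGG", "CCAAACC"} *)

(* How often each distinct match occurs *)
Tally[matches]
(* {{"CCAAACC", 2}, {"GGTTTGG", 1}} *)

(* A rule returns the characters bound to x and y instead of the whole match *)
StringCases[toy, pattern :> {x, y}]
(* {{"C", "A"}, {"G", "T"}, {"C", "A"}} *)
```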

Here are the positions in the sequence at which the pattern occurs.

StringPosition[HSGenome, pattern]
{{298, 304}, {303, 309}, {304, 310}, {350, 356}, {554, 560}, {654, 660}, {915, 921}, {1686, 1692}, {1722, 1728}, {1791, 1797}, {2050, 2056}, …}
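
Some of the positions above overlap, for example 303 to 309 and 304 to 310. StringPosition reports overlapping matches by default, whereas StringCases does not, so the two functions can return different numbers of results. The sketch below shows this on made-up strings. A run of identical bases is one way such an overlap can arise, though the source does not say what the sequence is at that location.

```wolfram
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;
toy = "TTCCAAACCGGTTTGG";   (* illustrative sequence *)

pos = StringPosition[toy, pattern]   (* {{3, 9}, {10, 16}} *)
StringTake[toy, pos]                 (* {"CCAAACC", "GGTTTGG"} *)

(* Eight identical characters match twice, one position apart *)
StringPosition["CCCCCCCC", pattern]                      (* {{1, 7}, {2, 8}} *)
StringPosition["CCCCCCCC", pattern, Overlaps -> False]   (* {{1, 7}} *)
```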

Color those subsequences in the range 2000 to 2500 that match the specified pattern.

ColorString[HSGenome, pattern, {2000, 2500}]
TACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACT
TTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTA
AATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAA
AATTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACC
TAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAA
CTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGCCTGCGTCAGATCAAAACACTGAA
CTGACAATTAACAGCCCAATATCTACAATCAACCAACAAGTCATTATTACCCTCACTGTC
AACCCAACACAGGCATGCTCATAA
GGAAAGGTTAAAAAAAGTAAAAGGAACTCGGCAAACCTTACCCCGCCTGTT
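
ColorString is not a built-in function, and its definition is not shown here. One possible way to get a similar display is sketched below. The name colorString, its argument layout, and the choice of red bold styling are all assumptions made for illustration, not the notebook's own definition.

```wolfram
(* Hypothetical stand-in for ColorString; not the notebook's own definition *)
colorString[s_String, patt_, {start_Integer, end_Integer}] :=
  Row[
    StringSplit[
      StringTake[s, {start, end}],       (* restrict to the requested range *)
      m : patt :> Style[m, Red, Bold]    (* keep each match, but styled *)
    ]
  ];

pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

(* Illustrative call on a made-up sequence *)
colorString["TTCCAAACCGGTTTGGAC", pattern, {1, 18}]
```

Because this sketch takes the substring before searching, a match that straddles either end of the range is not highlighted. StringSplit is used so that each matched piece is replaced by a styled copy while the unmatched text between matches stays as plain strings.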
