This dictionary describes a number of terms, concepts, tools, and databases in bioinformatics. Descriptions range from a few sentences to a few paragraphs each. (Neither URLs nor references are included. The reader can easily find more information about a term by entering it into a web search engine.) The terms described are specific to bioinformatics; molecular biology and statistical concepts not known to the author to be used significantly in bioinformatics (to date) are excluded.
Arun Jagota teaches a number of bioinformatics courses in the national award-winning certificate program at the University of California, Santa Cruz Extension. He received a PhD in Computer Science from the State University of New York at Buffalo in 1993. He has been a visiting faculty member in the Department of Mathematical Sciences at the University of Memphis and in the Department of Computer Science at the University of North Texas, an affiliated faculty member in the Department of Computer Science at the University of California, Santa Cruz, and an adjunct faculty member in the Department of Mathematical Sciences at San Jose State University. He is presently a part-time instructor and research scientist at the University of California, Santa Cruz, and an adjunct faculty member in the Department of Applied Mathematics and the Department of Computer Engineering at Santa Clara University. Arun Jagota has taught more than thirty different courses in Computer Science, some in Discrete Mathematics, and a few in Probability and Statistics.
These are search tools at the ExPASy server that use the composition of an amino acid sequence, rather than the sequence itself, to find sequences in a database (SWISS-PROT) with similar composition. (The composition of a query is searched against a database of the compositions of sequences in SWISS-PROT.)
AACompIdent inputs the composition of the query and is useful when the query sequence is not known. In fact, it may be used to identify the sequence from its composition. AACompSim inputs a query sequence and computes its composition internally.
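The idea of searching by composition can be illustrated with a minimal Python sketch. It is not the scoring used by the ExPASy tools, which is not described here: the function names, the percentage representation, and the sum-of-squared-differences distance are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the percentage of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_distance(comp_a, comp_b):
    """Sum of squared differences between two compositions (assumed measure)."""
    return sum((comp_a[aa] - comp_b[aa]) ** 2 for aa in AMINO_ACIDS)


def rank_by_composition(query_comp, database):
    """Rank database entries (name -> sequence) by closeness to the query composition."""
    scored = [
        (composition_distance(query_comp, composition(seq)), name)
        for name, seq in database.items()
    ]
    return sorted(scored)  # smallest distance first
```

In this sketch, a query in the style of AACompSim would call `composition` on the query sequence first, while a query in the style of AACompIdent would pass a composition directly to `rank_by_composition`. Note that the order of residues plays no role: two different sequences with the same residue counts have a distance of zero.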
This is the data model that underlies the set of integrated databases at NCBI. This model is similar to object-oriented data structures and databases, and offers rich support for nested and interrelated data structures. The richness of this model facilitates the seamless and well-navigable access to these databases that the NCBI web browser Entrez provides. Contrast this model with the flat-file format of GenBank, which has its deficiencies.
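To make "nested and interrelated" concrete, here is a small Python sketch of a record that contains other records and links to related ones. It is only an analogy: the class names, field names, and example values are hypothetical and do not reproduce the actual NCBI definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    kind: str   # e.g. "domain" (hypothetical label)
    start: int
    end: int


@dataclass
class Citation:
    title: str


@dataclass
class SequenceRecord:
    accession: str
    residues: str
    # Nested structures: each record owns lists of sub-records.
    features: List[Feature] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)
    # Interrelated structures: links to other records that can be followed.
    related: List["SequenceRecord"] = field(default_factory=list)


# Hypothetical example data, for illustration only.
protein = SequenceRecord("HYPO_P1", "MKTW", features=[Feature("domain", 1, 3)])
gene = SequenceRecord("HYPO_G1", "ATGAAAACCTGG", related=[protein])

# Navigating from one record to a linked one, then into its nested parts.
first_feature = gene.related[0].features[0]
```

A flat-file format would instead store such information as lines of text in a single record, leaving any links between records to be resolved by the reader or by extra parsing code.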
BLAST is a sequence database search program. It performs, one by one, a heuristic local alignment of the query sequence to each sequence in the specified database. (Heuristic means that the alignment found is not guaranteed to be optimal. Local means that the alignment may cover only parts of the two sequences, as opposed to a global alignment, which spans their full lengths.) BLAST is much faster than dynamic programming, while remaining effective. (Dynamic programming computes optimal local alignments.) BLAST returns a list of hits ranked by E-value. The lower the E-value of a hit, the more significant it is.
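For comparison, the dynamic programming approach mentioned above can be sketched in Python as follows. This is the exact method (Smith-Waterman with a linear gap penalty) that BLAST approximates, not BLAST itself. The scoring values and function names are assumptions chosen for illustration, and hits are ranked by raw score here because computing E-values is not covered.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Align the query to each database sequence in turn; best score first."""
    scored = [
        (local_alignment_score(query, seq), name)
        for name, seq in database.items()
    ]
    return sorted(scored, reverse=True)
```

Each comparison fills a table whose size is the product of the two sequence lengths, and this is repeated for every database sequence. That cost is what makes a faster heuristic attractive for large databases.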
This is the version of BLAST that searches a protein sequence against a protein database. It supports searches using the BLOSUM or the PAM family of substitution matrices. BLOSUM62 is the most popular matrix used in BLAST searches.