
BIOINFORMATICS

20CS3277RA
CO-2

Introduction to Biological Databases


Session-8

Search Engine Technology

AIM OF THE SESSION

• The aim of this session is to introduce search engine technology as applied to biological databases.

LEARNING OUTCOMES

• To understand the molecular biology databases accessible through the Internet


• To understand the security concepts in the biological lab

Search Engine

The "Search Process" section of this chapter introduces many of the challenges and concepts involved
in a typical search of the molecular biology databases accessible through the Internet, using the Entrez
integrated search environment as the example. The "Search Engine Technology" section explores the
technologies researchers can use to separate the data they need from noise, ranging from portals and
intelligent agents to natural-language processing (NLP) and other user interface tools.

1. Entrez (NCBI)

Entrez is a retrieval system for searching several linked databases. It provides access to PubMed,
GenBank, Structure, Genome, and OMIM.

2. SRS (EBI and DDBJ)


SRS is a data retrieval system that integrates heterogeneous databanks in molecular biology and
genome analysis. There are currently several dozen servers worldwide that provide access to over
300 different databanks via the World Wide Web.
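The core idea behind SRS, one query interface over many heterogeneous databanks, can be illustrated by mapping each databank's own field names onto a shared schema. This is a toy sketch under stated assumptions. The databank names (bank_a, bank_b), their field names, and the records are all hypothetical, and the sketch does not reproduce SRS's actual query language or internals.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Common schema that every databank is mapped onto."""
    databank: str
    accession: str
    description: str


# Hypothetical databanks whose records name the same fields differently
# (not SRS's real schema or API).
FIELD_MAPS = {
    "bank_a": {"accession": "AC", "description": "DE"},
    "bank_b": {"accession": "acc_id", "description": "title"},
}


def normalize(databank, raw):
    """Translate one raw record into the shared Record schema."""
    fields = FIELD_MAPS[databank]
    return Record(databank, raw[fields["accession"]], raw[fields["description"]])


def search_all(sources, keyword):
    """Run one case-insensitive keyword query across every databank."""
    hits = []
    for databank, raw_records in sources.items():
        for raw in raw_records:
            record = normalize(databank, raw)
            if keyword.lower() in record.description.lower():
                hits.append(record)
    return hits


sources = {
    "bank_a": [{"AC": "A00001", "DE": "Hemoglobin alpha chain"}],
    "bank_b": [
        {"acc_id": "B-17", "title": "Insulin precursor"},
        {"acc_id": "B-42", "title": "hemoglobin beta subunit"},
    ],
}
hits = search_all(sources, "hemoglobin")
for record in hits:
    print(record.databank, record.accession)
```

In this sketch, adding a new databank only requires a new entry in FIELD_MAPS. The search function does not change, which is the main benefit of placing sources behind one schema.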

Sequence Retrieval tools

Entrez from NCBI and SRS (Sequence Retrieval System) from EBI.

Sequence Submission tools

Sequin and BankIt from NCBI and Webin from EBI.

• Today, most of the potential links between data in digital form aren't readily available because
the relevant data, when they exist, are in disparate databases.
• Each database is typically built on its own, often incompatible, database technology and uses its
own languages and vocabularies to access data.

• These incompatibilities are especially significant when non-textual data, such as 3D images of
protein structures accessed through author-specified keywords, must be linked with nucleotide
sequences in other databases.

Although static links between databases can be established programmatically, a more common
approach is to create links dynamically by using search engines.
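The difference between static and dynamic links can be sketched with two toy in-memory databases. The structure and sequence IDs, keywords, and texts below are hypothetical. The only point is that a link computed at query time picks up newly added records, while a stored link table does not.

```python
# Hypothetical toy databases: structure entries carry author-specified
# keywords, sequence entries carry free-text descriptions.
structures = {"STRUCT_1": ["hemoglobin", "oxygen transport"]}
sequences = {"SEQ1": "human hemoglobin beta gene", "SEQ2": "insulin mRNA"}

# Static links: computed once and stored; they do not see later updates.
STATIC_LINKS = {"STRUCT_1": ["SEQ1"]}


def dynamic_links(structure_id):
    """Create links at query time by searching sequence text for each keyword."""
    keywords = structures[structure_id]
    return [seq_id for seq_id, text in sequences.items()
            if any(keyword in text for keyword in keywords)]


print(dynamic_links("STRUCT_1"))               # ['SEQ1']
sequences["SEQ3"] = "hemoglobin alpha cDNA"    # a new record is added later
print(dynamic_links("STRUCT_1"))               # ['SEQ1', 'SEQ3']
print(STATIC_LINKS["STRUCT_1"])                # ['SEQ1'] (now stale)
```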

Search Process

• Solving a molecular biology problem with bioinformatics methods invariably involves significant
backtracking, stepping, and jumping around from one database to the next.

• In support of this typical work process, integrated information-retrieval systems have been
created to provide a mesh of "hard" or pre-computed links between the key online
molecular biology databases.
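NCBI exposes Entrez's pre-computed links to programs through the ELink service of the Entrez Programming Utilities (E-utilities). The sketch below only builds an ELink request URL and does not contact NCBI. It uses the basic `dbfrom`, `db`, and `id` parameters, and the ID `12345` is a placeholder rather than a real record.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom, db, ids):
    """Build an ELink request for records in `db` linked to `ids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# Placeholder protein ID; asks for Gene records linked to it.
print(elink_url("protein", "gene", ["12345"]))
```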

• By far the most popular of these integrated systems is Entrez, from the National Center for
Biotechnology Information (NCBI), which includes many of the key molecular biology databases.
• Entrez-enabled search process: Entrez hides the underlying complexity of online molecular
biology databases, facilitating the iterative process of submitting search criteria, viewing
results, and refining or narrowing the search until the desired results are achieved.
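The iterative cycle of submitting, viewing, and refining can be modelled with a toy search history. Every result set is numbered (#1, #2, ...) and later steps combine earlier sets, much as Entrez lets users combine history entries. The class, index, and record IDs below are hypothetical, and the sets stand in for real Entrez results. AND, OR, and NOT map onto set intersection, union, and difference.

```python
class SearchHistory:
    """Toy model of Entrez-style search history: each result set is saved
    as #1, #2, ... so later steps can combine or narrow earlier ones."""

    def __init__(self, index):
        self.index = index    # hypothetical term -> set of record IDs
        self.results = []     # results[0] holds #1, results[1] holds #2, ...

    def search(self, term):
        """Run a single-term query and store its hits as the next history entry."""
        hits = set(self.index.get(term, set()))
        self.results.append(hits)
        return hits

    def combine(self, a, op, b):
        """Combine history entries #a and #b with a Boolean operator."""
        left, right = self.results[a - 1], self.results[b - 1]
        if op == "AND":
            hits = left & right      # narrow: records in both sets
        elif op == "OR":
            hits = left | right      # broaden: records in either set
        elif op == "NOT":
            hits = left - right      # exclude records found in #b
        else:
            raise ValueError(f"unsupported operator: {op}")
        self.results.append(hits)
        return hits


index = {"hemoglobin": {1, 2, 3, 4}, "human": {2, 3, 5}, "mutation": {3}}
h = SearchHistory(index)
h.search("hemoglobin")                 # #1
h.search("human")                      # #2
h.combine(1, "AND", 2)                 # #3 -> {2, 3}
h.search("mutation")                   # #4
print(sorted(h.combine(3, "NOT", 4)))  # #5, prints [2]
```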

• The major search features of the Entrez system include a variety of tools to define and refine
a database search (see Figure 4-4). These tools support selecting a database, linking,
imposing limits on searches, using indexes and the search history in searches, and saving
results to a clipboard. In addition, the tools support searching by a variety of topics,
searching within a specified range, truncating searches, using Boolean operators to narrow
searches, and advanced search authoring capabilities to supplement menu-driven search
commands.
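The search features listed above (field-restricted terms, truncation, ranges, and Boolean operators) all combine into a single query string. This is a minimal sketch of a query builder that assumes PubMed-style syntax: `term[Field]` for field restriction, a trailing `*` for truncation, `low:high[Field]` for ranges, and uppercase AND, OR, NOT. The field names are examples only, because each Entrez database has its own list of valid fields.

```python
def field(term, tag):
    """Restrict a term to one search field, e.g. hemo*[Title]."""
    return f"{term}[{tag}]"


def truncate(stem):
    """Truncation: 'hemo*' matches terms beginning with 'hemo'."""
    return f"{stem}*"


def value_range(low, high, tag):
    """Search within a range, e.g. 2000:2005[Publication Date]."""
    return f"{low}:{high}[{tag}]"


def combine(op, *clauses):
    """Join clauses with an uppercase Boolean operator inside parentheses."""
    if op not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported Boolean operator: {op}")
    return "(" + f" {op} ".join(clauses) + ")"


query = combine("AND",
                field(truncate("hemo"), "Title"),
                value_range(2000, 2005, "Publication Date"))
print(query)  # (hemo*[Title] AND 2000:2005[Publication Date])
```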

Search in the Entrez system

• The first step in an Entrez search is to select, from a pull-down menu, which database to search.

• Entrez supports searching by subject, subject phrase, author, unique identifier, and, where
applicable, molecular weight. Search topics are defined by keying terms into a free-text
query box.
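The first two steps, choosing a database and entering a free-text term, correspond to the `db` and `term` parameters of NCBI's ESearch E-utility. This sketch only builds the request URL. The short list of database names is an illustrative subset chosen for the example, and `retmax` (the maximum number of IDs returned) is set to an arbitrary default.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Illustrative subset of Entrez database names; not an exhaustive list.
KNOWN_DBS = {"pubmed", "nuccore", "protein", "structure", "omim"}


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL: step 1 picks the database, step 2 the search term."""
    if db not in KNOWN_DBS:
        raise ValueError(f"database not in this sketch's list: {db}")
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "hemoglobin AND human"))
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns the matching record IDs, as XML by default. NCBI's usage guidelines limit how often requests may be sent.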

• A search can also be specified by a unique identifier, which can be an accession number for
the complete sequence record in a database or a sequence number assigned by NCBI.
• The format for the accession number depends on the database.

For example, the format of an accession number in GenBank is one letter followed by five digits,
compared to a series of six or seven digits followed by a letter for the PRF database.
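A client could use the accession formats above to guess which database an identifier belongs to. The sketch encodes only the two formats stated in the text. Real GenBank accessions also come in other forms (for example, two letters followed by six digits), so treat this as an illustration of format-based dispatch rather than a complete validator.

```python
import re

# Only the two formats described in the text; real GenBank accessions
# also use other patterns (e.g. two letters followed by six digits).
ACCESSION_PATTERNS = {
    "GenBank": re.compile(r"[A-Z]\d{5}"),   # one letter + five digits
    "PRF": re.compile(r"\d{6,7}[A-Z]"),     # six or seven digits + one letter
}


def guess_database(accession):
    """Return the database whose accession format matches, or None."""
    acc = accession.strip().upper()
    for db, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(acc):
            return db
    return None


for acc in ["M12345", "1234567A", "AF123456"]:
    print(acc, guess_database(acc))
```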

Entrez also supports searching by molecular weight, including a range of weights, based on
calculations from protein structures. This search capability applies only to the Entrez Protein database.
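A molecular-weight range search comes down to computing each protein's weight and keeping those that fall within the range. This sketch assumes weights are computed from the amino-acid sequence, using standard average residue masses plus one water molecule, and filters a few made-up sequences. It does not show how Entrez itself stores or computes these values.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(seq):
    """Average molecular weight (Da): residue masses plus one water.
    Raises KeyError for letters outside the 20 standard amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def in_weight_range(proteins, low, high):
    """Return IDs of proteins whose computed weight lies in [low, high]."""
    return [pid for pid, seq in proteins.items()
            if low <= molecular_weight(seq) <= high]


# Made-up toy sequences.
proteins = {"p1": "GAVL", "p2": "ACDEFGHIKLMNPQRSTVWY", "p3": "G"}
print(round(molecular_weight("G"), 2))        # 75.07 (glycine)
print(in_weight_range(proteins, 300, 500))    # ['p1']
```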

SUMMARY
1. Understand the importance of biological databases
2. Primary database
3. Secondary database
TERMINAL QUESTIONS
1. List the different data formats used for biological data.
2. Compare microarray data with clinical data.
3. Summarize the microarray laboratory network in bioinformatics.
Reference Books:
1. Bioinformatics Computing, Bryan Bergeron, PHI, 2003.
2. Introduction to BioInformatics, Attwood, Smith, Longman, 1999.
3. Bio-Informatics, D Srinivasa Rao, Biotech.
Sites and Web links:
1. https://onlinecourses.nptel.ac.in/noc21_bt06/preview
2. https://www.shomusbiology.com/bioinformatics.html
