
COMPUTATIONAL BIOLOGY, BIG DATA, AND DATABASES

 The difference between computational biology and bioinformatics

Computational Biology: Concerned with the big picture of what is going on biologically; aims at a mechanistic understanding of part or all of a biological process

Bioinformatics: The study of large sets of biological data, biological statistics, and results of scientific studies; focuses on the analysis of biological data

 The application of computational biology

What do you know about Darwin’s Finches?

Galapagos Islands
 Darwin's theory: finches on the Galapagos Islands

Favorable beak adaptations in Darwin's finches were selected for over generations until the populations branched out into new species  Family Tree

 What does every organism have?

- Are the sequences among organisms similar?


- How can long sequences be compared to find similarities among species?
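
One common way to approach this question is sequence alignment. The sketch below is a minimal illustration, not a production tool: it computes a Needleman-Wunsch style global alignment score for two short DNA strings. The function name `global_alignment_score`, the scoring values (match +1, mismatch -1, gap -1), and the example sequences are all assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b.

    Uses dynamic programming and keeps only two rows of the
    score table in memory at a time.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b` costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` against an empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Made-up sequences, used only to show how the function is called.
    species_1 = "GATTACAGGT"
    species_2 = "GATTACAGGA"
    species_3 = "CCGTTAGACC"
    print(global_alignment_score(species_1, species_2))
    print(global_alignment_score(species_1, species_3))
```

A higher score suggests that two sequences are more similar. Real comparisons across species rely on dedicated alignment tools with tuned scoring schemes, so this sketch only shows the underlying idea.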
 Do you remember cells, DNA, RNA, genes, genomes, and proteins?

DNA: a long molecule that contains our unique genetic code; made of two polynucleotide chains
RNA: a polymeric molecule, usually single-stranded, essential in various biological roles in the coding, decoding, regulation, and expression of genes
Genome: simply the sum total of an organism's DNA
Gene: a functional sequence of DNA
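
These terms can be made concrete with plain strings. The following is a minimal sketch under stated assumptions: the genome string is made up, the gene coordinates are hypothetical, and the helper names `reverse_complement` and `transcribe` are chosen for illustration. The genome is the whole DNA string, a gene is a slice of it, the second DNA strand is the reverse complement, and RNA is obtained by replacing T with U on the coding strand.

```python
# Translation table mapping each DNA base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """Return the RNA sequence matching a DNA coding strand (T becomes U)."""
    return coding_strand.replace("T", "U")


if __name__ == "__main__":
    genome = "CCATGGCTTAAGG"          # made-up "genome": all of the organism's DNA
    gene_start, gene_end = 2, 11      # hypothetical gene coordinates
    gene = genome[gene_start:gene_end]  # a gene is one functional stretch of the genome
    print(gene)
    print(reverse_complement(gene))
    print(transcribe(gene))
```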
 Human Genome Project (HGP) major requirements

- DNA is the design of an organism
- The HGP was an international, collaborative research program whose goal was the complete mapping and understanding of all the genes of human beings
- The HGP was a publicly funded project initiated in 1990
- The project used shotgun sequencing (the public effort followed a hierarchical, clone-by-clone shotgun strategy rather than a pure whole-genome shotgun)
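
The idea behind shotgun sequencing is that many short overlapping fragments (reads) are sequenced and then stitched back together by their overlaps. The toy sketch below illustrates that stitching step with a greedy merge. It is not the assembly method used by the HGP: the function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the reads are hypothetical, and real assemblers must also handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Made-up reads covering a short made-up sequence.
    print(greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]))
```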

 How many base pairs and genes?

About 3.2 billion base pairs; approximately 20,000 protein-coding genes
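
To get a feel for the 3.2 billion figure, here is a small worked calculation. It assumes an idealized encoding in which each of the four bases (A, C, G, T) takes 2 bits, and compares it with storing one text character (8 bits) per base. The function name `genome_size_bytes` is an assumption for illustration, and real file formats add further overhead.

```python
def genome_size_bytes(base_pairs, bits_per_base=2):
    """Storage needed for one strand of a genome, in bytes.

    Four possible bases fit in 2 bits; plain text uses 8 bits per base.
    """
    return base_pairs * bits_per_base // 8


if __name__ == "__main__":
    HUMAN_BASE_PAIRS = 3_200_000_000
    packed = genome_size_bytes(HUMAN_BASE_PAIRS)        # 2 bits per base
    as_text = genome_size_bytes(HUMAN_BASE_PAIRS, 8)    # 1 character per base
    print(f"packed:  {packed / 1e9:.1f} GB")
    print(f"as text: {as_text / 1e9:.1f} GB")
```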

 Potential benefits:

- Improved diagnosis of disease
- Earlier detection of genetic predispositions to disease
- Rational drug design
- Gene therapy and control systems for drugs
- Pharmacogenomics “custom drugs”
- Study of the migration of different population groups based on female genetic inheritance
 Big Data

- What is big data?

Big data refers to massive, complex, structured and unstructured data sets that are rapidly generated and transmitted from a wide variety of sources; they are huge in volume and keep growing exponentially over time

- Where is it commonly used?

It is commonly used in transaction processing systems, customer databases, documents, emails, medical records, internet clickstream logs, mobile apps, and social networking

Examples: Instagram, Starbucks, Spotify, Netflix, McDonald's

Big data grows exponentially over time
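
Exponential growth can be written as N(t) = N0 * 2^(t / T), where N0 is the starting size and T is the doubling time. The sketch below evaluates that formula. The function name `projected_size` and the numbers in the example are assumptions for illustration, not measured growth rates.

```python
def projected_size(initial_size, years, doubling_time_years):
    """Size after `years` if the data set doubles every `doubling_time_years`."""
    if doubling_time_years <= 0:
        raise ValueError("doubling_time_years must be positive")
    return initial_size * 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # Hypothetical archive: 1 TB today, doubling every 2 years.
    for year in (0, 2, 4, 10):
        print(year, projected_size(1.0, year, 2.0), "TB")
```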
 What to expect and challenges of Big Data in Biology

- What to expect

o Opportunities to make better decisions


o Anomaly detection
o Identifying the root causes of failures and issues in real time
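
As one simple illustration of anomaly detection, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score). This is only one basic technique among many. The function name `find_anomalies`, the threshold of 2.0, and the readings are assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Made-up measurements with one obvious outlier.
    readings = [10, 11, 9, 10, 12, 10, 11, 50, 9, 10]
    print(find_anomalies(readings))
```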

- Challenges

o What makes a good dataset, and how can reliable knowledge be extracted from big data?
o The availability of big data has the potential to transform many areas of the life sciences and usher in new ways of doing research
o Collaboration between biologists and data scientists
 Databases in Biology

A collection of data that is structured, searchable, periodically updated, and cross-referenced; libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analysis
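
The properties listed above (structured, searchable, updatable, cross-referenced) can be sketched with a tiny in-memory SQLite database. This is a minimal illustration, not the schema of any real biological database: the table names, column names, record identifiers, and sequences are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with sequences and cross-references."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequences (
            id TEXT PRIMARY KEY,
            organism TEXT NOT NULL,
            residues TEXT NOT NULL,
            updated TEXT NOT NULL
        );
        CREATE TABLE xrefs (
            sequence_id TEXT NOT NULL REFERENCES sequences(id),
            source TEXT NOT NULL,
            external_id TEXT NOT NULL
        );
        """
    )
    # Structured records (made-up identifiers and sequences).
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?, ?)",
        [
            ("SEQ001", "Geospiza fortis", "ATGGCTTAA", "2024-01-01"),
            ("SEQ002", "Geospiza magnirostris", "ATGGCATAA", "2024-01-01"),
        ],
    )
    # Cross-references linking a record to other hypothetical resources.
    conn.executemany(
        "INSERT INTO xrefs VALUES (?, ?, ?)",
        [("SEQ001", "literature", "REF-1"), ("SEQ001", "structure", "STR-9")],
    )
    conn.commit()
    return conn


def find_by_organism(conn, organism):
    """Search records by organism and return them with their cross-references."""
    return conn.execute(
        """
        SELECT s.id, x.source, x.external_id
        FROM sequences AS s
        LEFT JOIN xrefs AS x ON x.sequence_id = s.id
        WHERE s.organism = ?
        ORDER BY s.id, x.source
        """,
        (organism,),
    ).fetchall()


def update_residues(conn, seq_id, residues, date):
    """Periodic update: replace a sequence and record the update date."""
    conn.execute(
        "UPDATE sequences SET residues = ?, updated = ? WHERE id = ?",
        (residues, date, seq_id),
    )
    conn.commit()


if __name__ == "__main__":
    db = build_demo_db()
    print(find_by_organism(db, "Geospiza fortis"))
```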

 How did they grow so quickly?

- The first database
Developed in the US in the 1960s; algorithms for sequence analysis were also created

- Appearance of another database
A new database was developed in Europe in the 1970s, in Germany (European Molecular Biology Laboratory)

- DNA sequencing method
In the 1970s, DNA sequencing methods were developed

- Bioinformatics and the internet
In the 1980s, bioinformatics was introduced
 Features of biological databases
- Data heterogeneity
A diversity of complex data types is available; examples: sequences and graphs

- High data volume
In addition to being highly heterogeneous, biological data are voluminous, supporting comprehensive investigations in various fields and directions

- Uncertainty and dynamics
Biological data carry a great deal of uncertainty, as they represent biological phenomena that are observed and assumed

- Data integration and sharing
Data are collected from laboratories into a database and made available for use; biological data are shared via databases

 Classification of biological databases

- Primary
Contain information on sequence or structure only; examples :

- Secondary
Contain information derived from primary databases, such as conserved sequences, active site residues, and signature sequences; examples :

- Composite
Combine a variety of primary databases; examples :
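
The difference between primary and secondary data can be sketched in a few lines. Assuming a handful of made-up, already aligned sequences of equal length play the role of primary records, the code below derives secondary information from them: the conserved positions and a simple signature string. The function names `conserved_positions` and `signature`, the record names, and the use of "x" for variable positions are assumptions for illustration.

```python
def conserved_positions(aligned):
    """Indices of alignment columns where every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) == 1]


def signature(aligned, wildcard="x"):
    """Consensus-like string: conserved residues kept, variable ones masked."""
    return "".join(
        column[0] if len(set(column)) == 1 else wildcard
        for column in zip(*aligned)
    )


if __name__ == "__main__":
    # Primary records: raw (made-up) sequences, already aligned, equal length.
    primary = {"seqA": "ACDEFG", "seqB": "ACDQFG", "seqC": "ACNEFG"}
    aligned = list(primary.values())
    # Secondary records: information derived from the primary ones.
    print(conserved_positions(aligned))
    print(signature(aligned))
```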
