UIMLT
• Data Annotation
• Data annotation includes several activities, such as
labeling measurements, adding structure to data,
describing the collection environment, and
recording provenance.
This information enhances the use of scientific
data in collaborative environments and enables
data integration. Shared, controlled
vocabularies let scientists communicate how
and why data were collected to reduce data
misuse. In some cases the annotations supplant
the original observations to become a new
form of scientific data.
• Using Annotations
Annotated data serve several purposes, such as
enhancing traditional information retrieval
approaches with shared knowledge of
concepts and relationships; tracking the
source and original use of scientific data to
facilitate proper interpretation and use by
third parties; and creating a new, structured
representation of the data that scientists can
reason about.
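A minimal sketch of what an annotated record could look like, assuming a small in-memory vocabulary. The term IDs (`EX:0001`, `EX:0002`), the class `AnnotatedMeasurement`, and the helper `find_by_term` are hypothetical names chosen for illustration, not part of any real ontology or library.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: term ID -> agreed meaning.
VOCABULARY = {
    "EX:0001": "leaf surface temperature",
    "EX:0002": "soil moisture content",
}

@dataclass
class AnnotatedMeasurement:
    value: float
    unit: str
    term_id: str        # label drawn from the shared vocabulary
    environment: str    # description of the collection environment
    provenance: list = field(default_factory=list)  # ordered processing history

    def __post_init__(self):
        # Reject labels that are not in the shared vocabulary.
        if self.term_id not in VOCABULARY:
            raise ValueError(f"unknown vocabulary term: {self.term_id}")

def find_by_term(records, term_id):
    """Retrieve records by concept instead of by free-text matching."""
    return [r for r in records if r.term_id == term_id]

# Made-up example records.
records = [
    AnnotatedMeasurement(24.1, "degC", "EX:0001", "greenhouse, plot 3",
                         ["raw sensor reading", "calibration offset applied"]),
    AnnotatedMeasurement(0.31, "m3/m3", "EX:0002", "greenhouse, plot 3",
                         ["raw sensor reading"]),
]
matches = find_by_term(records, "EX:0001")
print(matches[0].value, matches[0].unit, matches[0].provenance)
```

Because every record carries its term ID, environment, and provenance, a third party can retrieve measurements by concept and see how each value was produced before reusing it.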
• Data Preparation
Observations often require processing before
serving as scientific data. Even then, data may
require further preparation before analysis,
such as normalizing the data to enable the comparison
of results across experiments; filtering the data
to enhance the signal; and estimating the
values of missing observations. When correctly
applied, these steps help ensure the reliability of
scientific results.
• Data Normalization and Filtering
• Normalization counters systematic and uninformative variation in
measurement tools and measured entities.
• Normalization of microarray data combats incidental
variation across experimental settings.
• Normalizations may also transform data to fit a
normal distribution to support the use of statistical
analyses.
• Filters remove unreliable data and irrelevant noise by
scanning for outliers, smoothing trajectories, etc.
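The normalization and filtering steps above can be sketched as follows. This is an illustrative sketch only: z-score scaling, a median-based outlier rule (with the conventional 0.6745 constant and a 3.5 cutoff), and a moving average are assumed here as common examples of each step, and all function names and sample values are hypothetical.

```python
from statistics import mean, median, stdev

def z_score_normalize(values):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]

def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread around the median; nothing to filter
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]

def moving_average(values, window=3):
    """Smooth a series; returns len(values) - window + 1 averaged points."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

# Two made-up experiments measured on different scales.
experiment_a = [10, 12, 14]
experiment_b = [100, 120, 140]
print(z_score_normalize(experiment_a))
print(z_score_normalize(experiment_b))

# A made-up series with one unreliable reading.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]
print(remove_outliers(readings))
print(moving_average([1, 2, 3, 4, 5]))
```

After normalization the two experiments become directly comparable even though their raw scales differ by a factor of ten, which is the kind of incidental variation normalization is meant to remove.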
• Handling Missing Data
Missing data can skew the distribution of a sample.
Imputation builds a (typically shallow) underlying model
of the available data that provides the missing values.
Substituting the mean is no longer encouraged; for series
data, interpolation fits a (localized) curve to the data set
and estimates the missing values from it; maximum
likelihood estimation and multiple imputation are the
most common approaches.
SPSS, SAS, and R include imputation routines.
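A small sketch contrasting two of the approaches named above on series data: mean substitution and linear interpolation (the simplest localized curve). Missing values are assumed to be marked with `None`; the function names and the sample series are hypothetical. Maximum likelihood estimation and multiple imputation are not shown here.

```python
def mean_substitute(series):
    """Replace every missing value with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]

def interpolate_linear(series):
    """Fill interior gaps on a straight line between the nearest observed
    neighbours. Leading or trailing gaps are left as None."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

# A made-up, steadily rising series with three missing observations.
series = [1.0, None, 3.0, None, None, 6.0]
print(mean_substitute(series))
print(interpolate_linear(series))
```

On this series, interpolation follows the upward trend, while mean substitution inserts the same value into every gap and flattens it. That distortion is one reason mean substitution is discouraged.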
• Data Analysis
• Analysis tools can reveal the patterns and
relationships hidden within a scientific data set.
Abstract views of these relationships are gathered
through a combination of descriptive statistics,
correlation tables, and exploratory data analysis.
• These analyses describe the key characteristics of
data sets, helping scientists form conjectures.
Informatics tools supporting these analyses include
Excel, SPSS, Minitab, and R.
• Descriptive Statistics and Correlations
Descriptive statistics include quantitative
measures of central tendency (e.g., mean, median),
variability (e.g., range, standard deviation), and
skewness (whether a distribution leans to one direction).
Correlation tables identify linear
relationships between variables in a multivariate
data set. The correlation coefficient ranges
between -1.0 and 1.0 and provides heuristic
evidence for interesting interactions.
Figure: Example distributions and their correlation coefficients.
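As a rough illustration of these measures, the sketch below computes central tendency, variability, a moment-based skewness, and the Pearson correlation coefficient using only the Python standard library. The function names and the sample data are made up for the example.

```python
from math import sqrt
from statistics import mean, median, pstdev

def describe(values):
    """Return central tendency, variability, and skewness for one variable."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    third_moment = mean([(v - mu) ** 3 for v in values])
    return {
        "mean": mu,
        "median": median(values),
        "range": max(values) - min(values),
        "std_dev": sigma,
        # Positive: long tail to the right. Negative: long tail to the left.
        "skewness": third_moment / sigma ** 3 if sigma else 0.0,
    }

def pearson(xs, ys):
    """Pearson correlation coefficient, a value between -1.0 and 1.0."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

summary = describe([1, 2, 2, 3, 10])
print(summary)

xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2, 4, 6, 8, 10]))   # perfectly increasing linear relation
print(pearson(xs, [10, 8, 6, 4, 2]))   # perfectly decreasing linear relation
```

A coefficient near zero only rules out a linear relationship; variables can still be related in a nonlinear way, which is why the coefficient is heuristic evidence rather than proof.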
• Exploratory Data Analysis
Exploratory data analysis includes a collection of
techniques designed to identify potential causal factors
in a data set; locate outliers for analysis or removal; and
produce other general intuitions about the data.
These techniques complement statistical approaches to
testing hypotheses and providing quantitative summaries.
Informatics support for exploratory data analysis
includes Data Desk, SOCR, and JMP.
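One way to locate outliers during exploratory analysis is the interquartile-range rule behind box plots (Tukey's fences). The sketch below assumes that rule with the conventional multiplier of 1.5; the function name and the sample data are hypothetical, and the tools listed above may use different defaults.

```python
from statistics import quantiles

def locate_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences:
    below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Made-up measurements with one suspicious reading at the end.
measurements = [12, 13, 12, 14, 13, 15, 14, 13, 40]
flagged = locate_outliers(measurements)
print(flagged)
```

Returning the index alongside the value lets the analyst go back to the original record and decide whether the point is an error to remove or a genuine observation worth studying.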
• Data Annotation and Analysis: Summary
Data annotation assists primarily in information
retrieval, but it has potential for data and knowledge
integration.
However, we need rich informatics tools that use
well-established knowledge bases such as the Gene Ontology.
Software for data processing is becoming more common,
but different types of data have different needs.
General informatics tools that are readily specialized to
particular sciences could address this situation.
RECOMMENDED BOOKS
• 1. Baxevanis AD, “Current Protocols in
Bioinformatics: Volumes 1-3”, John Wiley & Sons,
2003.
• 2. Arthur M. Lesk, “Introduction to Bioinformatics”,
Oxford University Press.
• 3. Ignacimuthu SJ, “Basic Bioinformatics”, Narosa
Publishing House.
• 4. Yadav Neelam, “A Hand Book of
Bioinformatics”, Anmol Publications Pvt. Ltd.
• 5. Krawetz, Stephen A., “Introduction to
Bioinformatics: A Theoretical and Practical
Approach”, Humana Press.
Thank You
Any questions?