Module 1 Syllabus
Analytics
Data-Information
Characteristics of data
Data munging: scraping, sampling, cleaning
Importance of data analytics
Data Analysis
Data analysis is a process of obtaining raw data and converting
it into information useful for decision-making by users.
Lots of data is being collected and warehoused
We have?
Big Data is any data that is expensive to manage
and hard to extract value from
Volume
Volume refers to the size of the data.
Velocity
Velocity refers to the speed with which data is
generated. High-velocity data is generated at
such a pace that it requires distinct (distributed)
processing techniques. Examples of data generated
with high velocity are Twitter messages and
Facebook posts.
Characteristics of data
1. Accuracy and Precision
2. Legitimacy and Validity
3. Reliability and Consistency
4. Timeliness and Relevance
5. Completeness and Comprehensiveness
6. Availability and Accessibility
7. Granularity and Uniqueness
Accuracy and Precision: This
characteristic refers to the exactness of the
data.
Ex: Records at the wrong level of precision
(e.g. prices that were originally quoted at
three decimal places, but cut off and stored
at two decimal places)
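The precision example above can be sketched in a few lines. This is only an illustration: the prices are made-up values, and cutting off the third decimal place is modelled here with `ROUND_DOWN` from Python's `decimal` module.

```python
from decimal import Decimal, ROUND_DOWN

# Hypothetical prices, originally quoted at three decimal places.
quoted = [Decimal("10.125"), Decimal("99.999"), Decimal("3.140")]

# Store each price at two decimal places by cutting off the third digit.
stored = [p.quantize(Decimal("0.01"), rounding=ROUND_DOWN) for p in quoted]

# Amount of information lost per record because of the cut-off.
lost = [q - s for q, s in zip(quoted, stored)]

for q, s, d in zip(quoted, stored, lost):
    print(f"quoted={q} stored={s} lost={d}")
```

Each individual loss is small, but it is permanent: the quoted price cannot be recovered from the stored one.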
Legitimacy and Validity: Requirements
governing data set the boundaries of this
characteristic.
Ex: On surveys, items such as gender, ethnicity, and
nationality are typically limited to a set of options and
open answers are not permitted. Any answers other
than these would not be considered valid or legitimate
based on the survey’s requirement.
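A validity check of this kind amounts to testing each answer against the permitted options. The sketch below assumes a hypothetical option set and hypothetical responses; a real survey would define its own list.

```python
# Hypothetical set of permitted answers for one survey item.
ALLOWED_NATIONALITIES = {"Indian", "American", "British", "Other"}

def is_valid(answer, allowed):
    """Return True only if the answer is one of the permitted options."""
    return answer in allowed

# Hypothetical responses, including an open answer and a blank.
responses = ["Indian", "Martian", "British", ""]

# Collect every response that falls outside the permitted options.
invalid = [r for r in responses if not is_valid(r, ALLOWED_NATIONALITIES)]
print(invalid)
```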
Reliability and Consistency: Regardless of
what source collected the data or where it
resides, it cannot contradict a value residing
in a different source or collected by a
different system.
Ex:
• Telephone numbers with commas vs. hyphens
• U.S. vs. European date formats
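One common way to handle the two examples above is to normalize values to a single representation before comparing them. The following is a minimal sketch using only the Python standard library; the two record dictionaries and the helper names are hypothetical, and it assumes the date format of each source system is known in advance.

```python
import re
from datetime import datetime

def normalize_phone(raw):
    """Keep only the digits so that separators no longer matter."""
    return re.sub(r"\D", "", raw)

def normalize_date(raw, fmt):
    """Parse a date using its known source format and return ISO 8601 text."""
    return datetime.strptime(raw, fmt).date().isoformat()

# Hypothetical records for the same customer held in two systems.
system_a = {"phone": "555,010,1234", "date": "03/04/2021"}  # U.S.: month/day/year
system_b = {"phone": "555-010-1234", "date": "04/03/2021"}  # European: day/month/year

same_phone = normalize_phone(system_a["phone"]) == normalize_phone(system_b["phone"])
same_date = (normalize_date(system_a["date"], "%m/%d/%Y")
             == normalize_date(system_b["date"], "%d/%m/%Y"))

print(same_phone, same_date)
```

Compared as raw strings, the two records appear to contradict each other, although both describe the same phone number and the same day.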
Timeliness and Relevance: Data collected
too soon or too late could misrepresent a
situation and drive inaccurate decisions.
Ex:
• An issuance or corporate action not
delivered when it was announced
• A credit rating change not updated on the
day it was issued
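The credit rating example can be checked by comparing the date a change was issued with the date it reached the database. The issuers and dates below are invented for illustration.

```python
from datetime import date

# Hypothetical rating changes: (issuer, date issued, date updated in the database).
changes = [
    ("ACME", date(2021, 3, 1), date(2021, 3, 1)),
    ("GLOBEX", date(2021, 3, 1), date(2021, 3, 4)),
]

# Keep only the changes that were not updated on the day they were issued,
# together with the delay in days.
late = [
    (issuer, (updated - issued).days)
    for issuer, issued, updated in changes
    if updated > issued
]
print(late)
```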
Completeness and Comprehensiveness:
Incomplete data is as dangerous as
inaccurate data.
Ex: Missing data
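A first step in assessing completeness is to count the missing values per field. The sketch below uses a small, made-up list of records and treats `None` as "missing"; real data may mark missing values differently (empty strings, sentinel codes and so on).

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "name": "A", "price": 10.5},
    {"id": 2, "name": None, "price": 7.0},
    {"id": 3, "name": "C", "price": None},
]
fields = ["id", "name", "price"]

# Number of missing values in each field.
missing_counts = {
    f: sum(1 for r in records if r.get(f) is None) for f in fields
}

# Records with no missing values, and the share of such records.
complete = [r for r in records if all(r.get(f) is not None for f in fields)]
completeness = len(complete) / len(records)

print(missing_counts)
print(f"{completeness:.0%} of records are complete")
```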
Availability and Accessibility: This
characteristic presumes that the data exists and is
available for access to be granted.
Ex: This characteristic can be tricky at times due to legal and
regulatory constraints. Regardless of the challenge, though,
individuals need the right level of access to the data in order to
perform their jobs.
• Granularity and Uniqueness: The level of detail at
which data is collected is important, because
confusion and inaccurate decisions can otherwise
occur.
• Aggregated, summarized and manipulated
collections of data could offer a different meaning
than the data implied at a lower level.
• An appropriate level of granularity must be defined
to provide sufficient uniqueness and distinctive
properties to become visible.
• This is a requirement for operations to function
effectively.
Ex:
• Two instances of the same security with different identifiers or
spellings
• A preferred share represented as both an equity and debt
object in the same database
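The first example, one security stored under different identifiers or spellings, can be detected by grouping records on a normalized name. This is a rough sketch: the identifiers, names and the normalization rule (lowercase, drop punctuation, collapse spaces) are assumptions chosen for illustration, and real matching usually needs more careful rules.

```python
from collections import defaultdict

def normalize_name(name):
    """Lowercase, drop periods and commas, and collapse repeated spaces."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

# Hypothetical security master records: (identifier, name).
securities = [
    ("ID001", "Acme Corp."),
    ("ID002", "ACME  Corp"),
    ("ID003", "Globex Inc."),
]

# Group identifiers by normalized name.
groups = defaultdict(list)
for ident, name in securities:
    groups[normalize_name(name)].append(ident)

# Any name that maps to more than one identifier is a likely duplicate.
duplicates = {name: ids for name, ids in groups.items() if len(ids) > 1}
print(duplicates)
```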
• Data science plays a role in virtually all aspects
of our day-to-day lives and is used across nearly
all industries.