
DATA ANALYTICS – the science of analyzing raw data to pull out useful insights so that conclusions can be drawn.

DATA ANALYST – an individual who collects and interprets raw data and reports the conclusions drawn from it.
- Is like a person who picks up the jigsaw puzzle pieces (raw data) and does the work of producing a clear picture or message out of those pieces.

BUSINESS ANALYTICS
- Management of customer relationships
- Financial and marketing activities
- Supply chain management
- Human resource planning
- Pricing decisions
- Sport team game strategies

IMPORTANCE OF BUSINESS ANALYTICS
- There is a strong relationship between BA and:
• Profitability of businesses
• Revenue of businesses
• Shareholder return
- BA enhances understanding of data.
- BA is vital for businesses to remain competitive.
- BA enables creation of informative reports.

DESCRIPTIVE ANALYTICS
- uses data to understand the past and present.

PREDICTIVE ANALYTICS
- analyzes past performance in order to predict future outcomes.

PRESCRIPTIVE ANALYTICS
- uses optimization techniques to identify the best course of action.

RETAIL MARKDOWN DECISIONS
- Most department stores clear seasonal inventory by reducing prices.
- The question is: "When to reduce the price, and by how much?"
- DESCRIPTIVE ANALYTICS: examine historical data for similar products (prices, units sold, advertising, …)
- PREDICTIVE ANALYTICS: predict sales based on price.
- PRESCRIPTIVE ANALYTICS: find the best sets of pricing and advertising to maximize sales revenue.
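The three steps of the retail markdown example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `history` figures are hypothetical, demand is assumed to fall linearly with price, and advertising is ignored.

```python
# Hypothetical history for similar products: (price, units sold).
history = [(50, 120), (45, 150), (40, 185), (35, 210), (30, 250)]

# Descriptive: summarize what happened in the past.
avg_price = sum(p for p, _ in history) / len(history)
avg_units = sum(u for _, u in history) / len(history)

# Predictive: fit a least-squares line, units = a + b * price.
sxy = sum((p - avg_price) * (u - avg_units) for p, u in history)
sxx = sum((p - avg_price) ** 2 for p, _ in history)
b = sxy / sxx
a = avg_units - b * avg_price


def predict_units(price):
    """Predicted demand at a given price (never below zero)."""
    return max(0.0, a + b * price)


# Prescriptive: choose the candidate price with the highest predicted revenue.
candidates = range(20, 61, 5)
best_price = max(candidates, key=lambda p: p * predict_units(p))
print("best price:", best_price)
print("predicted revenue:", round(best_price * predict_units(best_price), 2))
```

A real markdown decision would also consider timing, remaining inventory, and advertising; the sketch only shows how each type of analytics feeds the next.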

SCOPE OF BUSINESS ANALYTICS

DATA – collected facts and figures.
DATABASE – a collection of computer files containing data.
INFORMATION – comes from analyzing data.
METRICS – are used to quantify performance.
MEASURES – are numerical values of metrics.
DISCRETE METRICS – involve counting, e.g. on time or not on time.
- number or proportion of on-time deliveries.
CONTINUOUS METRICS – are measured on a continuum.
• Delivery time
• Package time
• Purchase price

LEVEL OF MEASUREMENT
- When collecting or gathering data, we collect data from individual cases on particular variables.
- A variable is a unit of data collection whose values can vary.
- Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data. There are four types of data, or levels of measurement:
1. Categorical (Nominal)
2. Ordinal
3. Interval
4. Ratio

CATEGORICAL (NOMINAL) DATA
- Nominal or categorical data is data that comprises categories that cannot be rank ordered; each category is just different.
- The categories available cannot be placed in any order, and no judgement can be made about the relative size or distance from one category to another.
- Categories bear no quantitative relationship to one another.
Examples:
- customer's location (America, Europe, Asia)
- employee classification (manager, supervisor, associate)
What does this mean? No mathematical operations can be performed on the data relative to each other.
Therefore, nominal data reflect qualitative differences rather than quantitative ones.

2 subgroups of qualitative data:
• Dichotomic - if it takes the form of a word with two options (gender - male or female)
• Polynomic - if it takes the form of a word with more than two options (education - primary school, secondary school, and college/university).

ORDINAL DATA
- Ordinal data is data that comprises categories that can be rank ordered.
- As with nominal data, the distance between each category cannot be calculated, but the categories can be ranked above or below each other.
- No fixed units of measurement.
Examples: survey responses, performance evaluation (poor, average, good, very good, excellent)
- What does this mean? Can make statistical judgements and perform limited maths.

INTERVAL AND RATIO DATA
■ Both interval and ratio data are examples of scale data.
■ Scale data:
– data is in numeric format ($50, $100, $150)
– data that can be measured on a continuous scale.
– the distance between each value can be observed and, as a result, measured.
– the data can be placed in rank order.
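As a rough illustration of what each level of measurement allows, the sketch below counts nominal categories, ranks ordinal ones, and does arithmetic on scale data. The sample values are hypothetical, and mapping the ratings to rank positions is an assumption made only so a middle rating can be found.

```python
from collections import Counter
from statistics import mean, median_low

# Nominal: categories can only be counted (the most frequent one is meaningful).
locations = ["Asia", "Europe", "Asia", "America", "Asia"]
location_counts = Counter(locations)
most_common_location = location_counts.most_common(1)[0][0]

# Ordinal: categories have a rank order, so a middle rating exists,
# but the distance between two ratings is not defined.
scale = ["poor", "average", "good", "very good", "excellent"]
rank = {label: position for position, label in enumerate(scale)}
ratings = ["good", "poor", "excellent", "good", "average"]
median_rating = scale[median_low(rank[r] for r in ratings)]

# Scale (interval/ratio): arithmetic such as the mean is meaningful.
amounts = [50, 100, 150]
average_amount = mean(amounts)

print(most_common_location, median_rating, average_amount)
```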

RATIO DATA
■ Ratio data is measured on a continuous scale and has a natural zero point.
■ Ratios are meaningful.
■ Examples:
– Monthly sales
– Delivery times
– Weight
– Height
– Age

METHODS OF PRESENTING DATA
- Textual
- Tabular
- Graphical

BIG DATA
- is a term that describes large, hard-to-manage volumes of data, both structured and unstructured, that inundate businesses on a day-to-day basis.
- is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.
- is a term for datasets that are so large or complex that traditional data processing applications are inadequate for them.
Examples:
■ transaction processing systems,
■ customer databases,
■ emails,
■ medical records,
■ internet clickstream logs,
■ mobile apps,
■ social networks

TYPES OF BIG DATA
Structured Data - is the easiest to
work with. It is highly organized with
dimensions defined by set
parameters.
- has certain predefined
organizational properties
and is present in structured
or tabular schema, making
it easier to analyze and
sort.
- Examples of sources:
Spreadsheets like Excel,
Web logs, medical devices,
online forms
Semi-structured Data - refers to data
that is not captured or formatted in
conventional ways. Semi-structured
data does not follow the format of a
tabular data model or relational
databases because it does not have a
fixed schema.
- Examples of sources: Emails
by inbox, draft and sent,
tweets organized by
hashtags, images, and
videos with tags.
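The difference between a fixed schema and tagged records without one can be sketched as follows. The CSV rows and the email-like records are hypothetical and exist only for illustration.

```python
import csv
import io
import json

# Structured: every row follows the same predefined columns.
table = io.StringIO("order_id,customer,amount\n1,Ana,50\n2,Ben,100\n")
rows = list(csv.DictReader(table))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: records carry tags (keys) but no fixed schema,
# so one record may have fields that another lacks.
raw = '[{"folder": "inbox", "subject": "Hi"},' \
      ' {"folder": "sent", "subject": "Re: Hi", "attachments": 2}]'
messages = json.loads(raw)
keys_seen = sorted({key for message in messages for key in message})

print(total_amount)
print(keys_seen)
```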
Unstructured Data
- is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.
- is typically text-heavy, but may contain data such as dates, numbers, and facts as well.
- Examples of sources: Emails, text files, social media, mobile and communications data, media.

CHARACTERISTICS OF BIG DATA (5V's)
1. Velocity – the speed at which data is emanating and changes are occurring between the diverse data sets.
2. Volume – the sheer volume of data being generated every second.
3. Variety – can use structured as well as unstructured data.
4. Veracity – data reliability and trust. Verifying and validating the data.
5. Value – having access to big data is all well and good, but it is only useful if we can turn it into value.

TYPES OF BIG DATA ANALYTICS
DESCRIPTIVE ANALYTICS - helps answer questions about what happened. It summarizes large datasets to describe outcomes to the parties concerned.
DIAGNOSTIC ANALYTICS - helps answer questions about why things happened. It takes the findings from descriptive analytics and digs deeper to find the cause.
PREDICTIVE ANALYTICS - helps answer questions about what will happen in the future. It uses historical data to identify trends and determine if they are likely to recur.
PRESCRIPTIVE ANALYTICS - helps answer questions about what should be done. By using insights from predictive analytics, data-driven decisions can be made.

TYPES OF CHARTS/GRAPHS

LINE GRAPH - is commonly used to display change over time as a series of data points connected by straight line segments on two axes. The line graph therefore helps to determine the relationship between two sets of values, with one data set always being dependent on the other set.

BAR GRAPH - or bar chart is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. Bar graphs are good for plotting data that spans a length of time (for example, for comparing achievement between the beginning and the end of the year) or for comparing different items in a related category (for example, achievement results for different classes).

PIE GRAPH - or pie chart (circle chart) displays data and statistics in an easy-to-understand 'pie-slice' format and illustrates numerical proportion. Each pie slice is proportional to the size of a particular category in the group. In other words, the pie chart breaks down a group into smaller pieces. It shows part-whole relationships.

HISTOGRAM - shows continuous data in ordered rectangular columns. Usually, there are no gaps between the columns. The histogram displays a frequency distribution (shape) of a data set. At first glance, histograms look similar to bar graphs. However, there is a key difference between them: a bar chart represents categorical data, and a histogram represents continuous data.
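The key difference between a bar chart and a histogram can be seen in how the heights are computed. In this minimal sketch the sample values, the bin width, and the starting point of the bins are all assumptions chosen for illustration.

```python
from collections import Counter

# Categorical data -> bar chart: one bar per category.
classes = ["A", "B", "A", "C", "A", "B"]
bar_heights = Counter(classes)

# Continuous data -> histogram: count the values that fall in adjacent bins.
delivery_times = [1.2, 2.5, 2.7, 3.1, 3.8, 4.4, 4.9, 5.0]
low, width, n_bins = 1.0, 1.0, 4  # bins 1-2, 2-3, 3-4, 4-5
hist = [0] * n_bins
for t in delivery_times:
    # The largest value (5.0) is placed in the last bin.
    index = min(int((t - low) // width), n_bins - 1)
    hist[index] += 1

# Text rendering: no gaps between bins, one row per bin.
for i, count in enumerate(hist):
    start = low + i * width
    print(f"{start:.0f}-{start + width:.0f} | " + "#" * count)
```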

PICTOGRAPH - or pictogram is one of the more visually appealing types of graphs and charts. It displays numerical information with the use of icons or picture symbols to represent data sets. Pictographs are a very easy-to-read statistical way of visualizing data. A pictogram shows the frequency of data as images or symbols. Each image/symbol may represent one or more units of a given dataset.

DOT PLOT - or dot chart is a statistical chart consisting of data points plotted on a fairly simple scale, typically using filled-in circles.

PARETO CHART - is a type of chart that contains both bars and a line graph. It is a graph that indicates the frequency of defects, as well as their cumulative impact. Pareto charts are useful for finding the defects to prioritize in order to observe the greatest overall improvement.

STEM AND LEAF PLOTS - represent data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

■ TIME SERIES - Data set is composed of quantitative entries taken at regular intervals over a period, e.g., the amount of precipitation measured each day for one month.

APPLICATIONS/TRENDS OF BIG DATA
- Understanding and targeting customers
- Understanding and improving business practices
- Health care
- Sports
- Science and research
- Optimizing machine and device performance
- Security and law enforcement
- Smart cities
- Financial operation
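Returning to the stem and leaf plot and the Pareto chart described under TYPES OF CHARTS/GRAPHS, the numbers behind both can be sketched as below. The data values and defect counts are hypothetical, and two-digit values are assumed so that the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

# Stem-and-leaf: stem = tens digit, leaf = units digit.
values = [12, 15, 21, 24, 24, 30, 38, 41]
stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))

# Pareto: sort defect counts from largest to smallest (the bars)
# and accumulate their share of the total (the line).
defects = {"scratch": 45, "dent": 30, "misprint": 15, "other": 10}
total = sum(defects.values())
cumulative = []
running = 0
for name, count in sorted(defects.items(), key=lambda kv: kv[1], reverse=True):
    running += count
    cumulative.append((name, count, round(100 * running / total, 1)))

for name, count, cum_pct in cumulative:
    print(f"{name:<10}{count:>4}{cum_pct:>8}%")
```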

MEASURES OF CENTRAL TENDENCY
A single value that is used to identify the "center" of the data.
– it is thought of as a typical value of the distribution.
– precise yet simple
– most representative value of the data

MEAN - Most common measure of the center. Also known as the arithmetic average.
PROPERTIES OF THE MEAN
– may not be an actual observation in the data set.
– can be applied to data measured at least at the interval level.
– easy to compute.
– every observation contributes to the value of the mean.
– subgroup means can be combined to come up with a group mean.
– easily affected by extreme values.

MEDIAN - divides the observations into two equal parts.
If the number of observations is odd, the median is the middle number. If the number of observations is even, the median is the average of the 2 middle numbers.

MODE - the value that occurs most frequently. It is a nominal average and may or may not exist.
PROPERTIES OF THE MODE
– can be used for qualitative as well as quantitative data.
– may not be unique.
– not affected by extreme values.
– can be computed for ungrouped and grouped data.

All three measures describe an "average". Choose the one that best represents a "typical" value in the set.
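A short sketch using Python's standard `statistics` module shows how the three measures behave, including how an extreme value moves the mean but not the median or mode. The scores are hypothetical.

```python
from statistics import mean, median, multimode

scores = [3, 5, 5, 6, 7, 8]
print(mean(scores), median(scores), multimode(scores))

# Adding one extreme value pulls the mean up sharply,
# while the median and the mode barely react.
with_outlier = scores + [100]
print(mean(with_outlier), median(with_outlier), multimode(with_outlier))

# The mode may not be unique: every value here occurs once.
print(multimode([1, 2, 3]))

# The mode also works for qualitative data.
print(multimode(["red", "blue", "red"]))
```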

Mean
- The most familiar average.
- A reliable measure because it considers every entry of a data set.
- May be greatly affected by outliers or skew.

Use the mean when:
- sampling stability is desired.
- other measures are to be computed.

Median
- A common average.
- Not as affected by skew or outliers.

Use the median when:
- the exact midpoint of the distribution is desired.
- there are extreme observations.

Mode
- may be used if there is an overwhelming repeat.

Use the mode when:
- the "typical" value is desired.
- the dataset is measured on a nominal scale.

MEASURES OF DISPERSION

RANGE - the difference between the maximum and minimum value in a data set, i.e.
R = MAX – MIN
- The larger the value of the range, the more dispersed the observations are.
- It is quick and easy to understand.
- A rough measure of dispersion.

INTERQUARTILE RANGE - the difference between the third quartile and first quartile, i.e.
IQR = Q3 – Q1
- Reduces the influence of extreme values.
- Not as easy to calculate as the Range.
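The range and interquartile range can be computed as sketched below. The data are hypothetical, and because textbooks differ on how quartiles are located, the `method="inclusive"` option of `statistics.quantiles` is an assumption rather than the convention these notes necessarily follow.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# R = MAX - MIN
data_range = max(data) - min(data)

# IQR = Q3 - Q1 (quartile convention assumed: "inclusive")
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(data_range, iqr)

# Replace the largest value with an extreme one: the range jumps,
# but the IQR is unchanged.
stretched = data[:-1] + [90]
stretched_range = max(stretched) - min(stretched)
s_q1, _, s_q3 = quantiles(stretched, n=4, method="inclusive")
stretched_iqr = s_q3 - s_q1
print(stretched_range, stretched_iqr)
```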

VARIANCE - an important measure of variation; it shows variation about the mean.

POPULATION VARIANCE
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

SAMPLE VARIANCE
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

STANDARD DEVIATION - the most important measure of variation; it is the square root of the variance and has the same units as the original data.

POPULATION SD
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

SAMPLE SD
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

COEFFICIENT OF VARIATION
- measure of relative variation.
- usually expressed in percent.
- shows variation relative to the mean.
- used to compare 2 or more groups.
\[ CV = \left( \frac{SD}{Mean} \right) \times 100\% \]
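The variance, standard deviation, and coefficient of variation can be checked numerically with a small sketch. The data set is hypothetical and is treated first as a whole population (divide by N) and then as a sample (divide by n - 1).

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(data)

# Population variance written out as the sum of squared deviations over N.
pop_var_manual = sum((x - mu) ** 2 for x in data) / len(data)
pop_sd_manual = sqrt(pop_var_manual)

# The statistics module provides both population and sample versions.
print(pvariance(data), pstdev(data))  # divide by N
print(variance(data), stdev(data))    # divide by n - 1

# Coefficient of variation, in percent (population SD used here).
cv = pstdev(data) / mu * 100
print(cv)
```

The sample versions are always slightly larger than the population versions for the same data because they divide by n - 1 instead of N.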
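
FORMULAS:

RANGE
- R (range) = H (highest value) – L (lowest value)
Example:
R = 50 - 9
R = 41

NUMBER OF CLASSES
C = 1 + 3.322 log N
Example:
C = 1 + 3.322 log 40
C = 6.32 OR 6

CLASS INTERVAL
\[ I = \frac{R}{C} \]
Example:
\[ I = \frac{41}{6} \]
I = 6.83 OR 7

MEAN
\[ \text{Mean} = \frac{\sum FM}{N} \]
where F is the class frequency, M is the class midpoint, and N is the total number of observations.

MEDIAN
\[ \text{Median} = LL + i \left( \frac{\frac{N}{2} - {<}cf}{f} \right) \]
where LL is the lower limit of the median class, i is the class interval, <cf is the cumulative frequency before the median class, and f is the frequency of the median class.

MODE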
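The formulas above can be traced in Python. The range, number of classes, and class interval reuse the figures from the worked examples (H = 50, L = 9, N = 40). The frequency table is hypothetical, and the grouped median assumes the standard form in which the bracketed term is divided by the frequency of the median class and LL is taken as the lower class boundary.

```python
import math

# Figures from the worked examples.
highest, lowest, n = 50, 9, 40
r = highest - lowest                 # R = H - L
c = 1 + 3.322 * math.log10(n)        # C = 1 + 3.322 log N
classes = round(c)
interval = round(r / classes)        # I = R / C, rounded

# Hypothetical frequency table: (lower limit, upper limit, frequency).
table = [(9, 15, 4), (16, 22, 8), (23, 29, 12),
         (30, 36, 9), (37, 43, 5), (44, 50, 2)]
total_f = sum(f for _, _, f in table)

# Grouped mean: sum of frequency x midpoint, divided by N.
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in table) / total_f

# Grouped median: find the class holding the (N / 2)-th observation.
cf_before = 0
grouped_median = None
for lo, hi, f in table:
    if cf_before + f >= total_f / 2:
        lower_boundary = lo - 0.5  # assumption: LL is the class boundary
        grouped_median = lower_boundary + interval * (total_f / 2 - cf_before) / f
        break
    cf_before += f

print(r, round(c, 2), classes, interval)
print(round(grouped_mean, 3), round(grouped_median, 3))
```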
