Descriptive Statistics: Instructor: Maira Sami

Business Analytics
Descriptive Statistics
Instructor : Maira Sami
Modified By : Sophia Ajaz
Objectives
• Overview of Using Data: Definitions and Goals
• Types of data
• Some Definitions
• Types of measurement
• Modifying Data in Excel
• Creating Distributions from Data
• Data Presentation
Overview of Using Data: Definitions and Goals
• Data
• Variable
• Observation
• Variation
• Random variables
• Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
• A characteristic or a quantity of interest that can take on
different values is known as a variable.
• An observation is a set of values corresponding to a set of
variables.
Variation is the difference in a variable measured over observations
(time, customers, items, etc.).
The role of descriptive analytics is to collect and analyse data to gain a
better understanding of variation and its impact on the business
setting.
The values of some variables are under the direct control of the decision
maker (these are often called decision variables).
The values of other variables may fluctuate with uncertainty because
of factors outside the direct control of the decision maker. In general,
a quantity whose values are not known with certainty is called a
random variable, or uncertain variable.
Types of data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data
Population and Sample Data
Data can be categorized in several ways based on how they are
collected and the type collected.
• In many cases, it is not feasible to collect data from the
population of all elements of interest. In such instances, we
collect data from a subset of the population known as a
sample.
What Is a Statistic?
[Figure: several samples drawn from one population]
Parameter: a value that describes a population
Statistic: a value that describes a sample
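The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below is made-up data generated only for illustration (it does not come from the slides), and the sample size of 50 is an arbitrary assumption.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up order values (not from the slides).
random.seed(42)
population = [random.randint(10, 500) for _ in range(1000)]

# Parameter: a value that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: a value that describes a sample drawn from that population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why a statistic is treated as an estimate of the corresponding parameter.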
Sample vs. Population
[Figure: Population and Sample]
Quantitative and Categorical Data
Data are considered quantitative data if they are numeric and
arithmetic operations can be performed on them, such as
• addition,
• subtraction,
• multiplication,
• and division.
For instance, we can sum the values for Volume in the Dow data
in Table 2.1 to calculate a total volume of all shares traded by
companies included in the Dow.
• If arithmetic operations cannot be performed on the data,
they are considered categorical data.
• For instance, the data in the Industry column in Table 2.1 are
categorical.
• We can count the number of observations or compute the
proportions of observations in each category.
Cross-Sectional and Time Series Data
• Cross-sectional data are collected from several entities at the
same, or approximately the same, point in time.
The data in Table 2.1 are cross-sectional because they describe
the 30 companies that comprise the Dow at the same point in
time (July 2015).
• Time series data are collected over several time periods.
Graphs of time series data are frequently found in business and
economic publications.
Some Definitions
• Variable - any characteristic of an individual or entity. A variable can take
different values for different individuals. Variables can be categorical or
quantitative. Per S. S. Stevens…
Types
• Nominal - Categorical variables with no inherent order or ranking sequence, such as
names or classes (e.g., gender, eye colour, and hair colour). Values may be written as
numerals but carry no numerical meaning (e.g., I, II, III). The only operation that can
be applied to nominal variables is enumeration (counting).
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Values
can be compared for equality, or for greater or less, but not for how much greater or less.
Some Definitions
• Interval - Values of the variable are ordered as in Ordinal, and additionally,
differences between values are meaningful; however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction, but not multiplication and division, are meaningful
operations. The difference in temperature between 10 and 20 degrees is the same as
the difference in temperature between 20 and 30 degrees.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g. age, weight, height, temperature (Kelvin). Addition, subtraction,
multiplication, and division are all meaningful operations.
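A short sketch may help show why ratios are not meaningful on an interval scale. It uses the standard Fahrenheit-to-Kelvin conversion; the helper name fahrenheit_to_kelvin is introduced here for illustration only.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Differences are meaningful on an interval scale:
# 10 -> 20 degrees is the same step as 20 -> 30 degrees.
diff_low = 20 - 10
diff_high = 30 - 20

# Ratios are not: 20 °F is not "twice as hot" as 10 °F.
naive_ratio = 20 / 10
kelvin_ratio = fahrenheit_to_kelvin(20) / fahrenheit_to_kelvin(10)

print(f"Differences: {diff_low} and {diff_high} degrees")
print(f"Ratio of Fahrenheit readings: {naive_ratio:.2f}")
print(f"Ratio of the same temperatures in Kelvin: {kelvin_ratio:.4f}")
```

The Fahrenheit readings suggest a ratio of 2, while the same two temperatures on the Kelvin scale, which has a true zero, differ by only about 2 percent.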
Types of measurement
• When gathering data, we collect data from individual cases on
particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of
mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:
1. Categorical (Nominal)
2. Ordinal
3. Interval
4. Ratio
Categorical (Nominal) data
• What does this mean? No mathematical operations can be
performed on the data relative to each other.
• Therefore, nominal data reflect qualitative differences rather
than quantitative ones.
• Nominal measurements only permit you to determine
whether two individuals are the same or different.
Nominal data
Examples:
• What is your gender? (please tick): Male / Female
• Did you enjoy the film? (please tick): Yes / No
Ordinal data
• Ordinal data comprise categories that can be rank ordered.
• As with nominal data, the distance between categories cannot
be calculated, but the categories can be ranked above or
below each other.
• No fixed units of measurement.
• Examples:
• college football rankings
• survey responses (poor, average, good, very good, excellent)
• What does this mean? We can make statistical judgements and
perform limited mathematical operations.
Interval and ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
• are in numeric format ($50, $100, $150).
• can be measured on a continuous scale.
• have distances between values that can be observed and, as a
result, measured.
• can be placed in rank order.
Interval data
• Interval data are ordinal data with constant differences between
observations.
• Examples:
• Time (clock or calendar time) – moves along a continuous
measure of seconds, minutes, and so on, and has no true zero point.
• Temperature (Celsius or Fahrenheit) – moves along a continuous
measure of degrees and has no true zero.
• SAT scores
Ratios
• Ratio data are measured on a continuous scale and have a
natural zero point.
• Ratios are meaningful.
• Examples:
– Monthly sales
– Delivery times
– Weight
– Height
– Age
Data for Business Analytics
Classifying Data Elements in a Purchasing Database
Figure 1.2
Modifying Data in Excel
Sorting Data in Excel
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Sort in the Sort & Filter group
• Step 4. Select the check box for My data has headers
• Step 5. In the first Sort by dropdown menu, select Sales
(March 2010)
• Step 6. In the Order dropdown menu, select Largest to
Smallest (see Figure 2.4)
• Step 7. Click OK
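For readers who want to reproduce the sort outside Excel, the steps above can be sketched in Python. The records below are hypothetical placeholders (the worksheet in A1:F21 is not reproduced in these slides); only the idea of sorting by sales from largest to smallest is taken from the steps.

```python
# Hypothetical records; the real worksheet data are not shown in the slides.
rows = [
    {"model": "Model A", "sales": 120},
    {"model": "Model B", "sales": 310},
    {"model": "Model C", "sales": 205},
]

# Equivalent of "Sort by: Sales, Order: Largest to Smallest".
# sorted() returns a new list and leaves the original order untouched.
sorted_rows = sorted(rows, key=lambda row: row["sales"], reverse=True)

for row in sorted_rows:
    print(row["model"], row["sales"])
```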
• Reference: textbook, p. 24
Filtering
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Filter in the Sort & Filter group
• Step 4. Click on the Filter Arrow in column B, next to
Manufacturer
• Step 5. If all choices are checked, you can easily deselect all
choices by unchecking (Select All). Then select only the check
box for Toyota.
• Step 6. Click OK
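The same filter can be sketched in Python as a list comprehension. The records are hypothetical placeholders; only the manufacturer name Toyota comes from the steps above.

```python
# Hypothetical records; "Maker X" and the model names are made up.
rows = [
    {"manufacturer": "Toyota", "model": "Model A"},
    {"manufacturer": "Maker X", "model": "Model B"},
    {"manufacturer": "Toyota", "model": "Model C"},
]

# Equivalent of ticking only the Toyota check box in the filter list.
toyota_rows = [row for row in rows if row["manufacturer"] == "Toyota"]

for row in toyota_rows:
    print(row["manufacturer"], row["model"])
```

As in Excel, filtering hides the other rows from view without deleting them: the original list rows is unchanged.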
Creating Distributions from Data
• Distributions help summarize many characteristics of a data
set by describing how often certain values for a variable
appear in that data set.
• Distributions can be created for both categorical and
quantitative data, and they assist the analyst in gauging
variation.
Frequency Distributions for Categorical Data
• A frequency distribution is a summary of data that shows the
number (frequency) of observations in each of several
nonoverlapping classes, typically referred to as bins.
Frequency Distribution
Consider a data set of 26 children aged 1-6 years. The frequency
distribution of the variable ‘age’ can be tabulated as follows:

Frequency Distribution of Age
Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2

Grouped Frequency Distribution of Age
Age Group  1-2  3-4  5-6
Frequency    8   12    6
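As a quick check, the grouped table can be derived from the ungrouped one. The sketch below uses only the frequencies shown above; the variable names are chosen here for illustration.

```python
# Ungrouped frequency distribution of age, taken from the table above.
age_frequency = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}

# Age groups as (lowest age, highest age), both inclusive.
age_groups = [(1, 2), (3, 4), (5, 6)]

# Add up the frequencies of all ages that fall inside each group.
grouped = {
    f"{low}-{high}": sum(
        count for age, count in age_frequency.items() if low <= age <= high
    )
    for low, high in age_groups
}

total = sum(age_frequency.values())

print(grouped)
print("Total children:", total)
```

The grouped frequencies add up to the same total of 26 children, as they should for nonoverlapping groups.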
Example 1
A survey was taken on Maple Avenue. In each of 20 homes, people were
asked how many cars were registered to their households. The results were
recorded as follows:
3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2.
Present these data in a frequency distribution table.
Also find the maximum number of cars registered to a household.
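One possible way to check an answer to this exercise is to tally the survey results in Python. This is a sketch using the standard library's Counter, not the method presented in class.

```python
from collections import Counter

# Survey results from the example: cars registered per household.
cars = [3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2]

# Frequency distribution: how many households reported each number of cars.
frequency = Counter(cars)

print("Cars  Frequency")
for number_of_cars in sorted(frequency):
    print(f"{number_of_cars:>4}  {frequency[number_of_cars]:>9}")

# Largest number of cars registered to any single household.
max_cars = max(cars)
print("Maximum cars in a household:", max_cars)
```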
Example 2
Solution
• Discussed in class
Relative Frequency and Percent Frequency Distributions
• A relative frequency distribution is a tabular summary of data
showing the relative frequency for each bin, that is, the fraction
of all observations that fall in that bin.
• A percent frequency distribution summarizes the percent
frequency of the data for each bin, that is, the relative
frequency multiplied by 100.
• For example, with 50 observations in total, the relative frequency
for Coca-Cola is 19/50 = 0.38, for Diet Coke it is 8/50 = 0.16, and so on.
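The arithmetic can be sketched as two small functions. Only the two counts quoted on the slide (19 and 8 out of 50) are used; the function names are introduced here for illustration.

```python
def relative_frequency(count: int, total: int) -> float:
    """Fraction of all observations that fall in one bin."""
    return count / total

def percent_frequency(count: int, total: int) -> float:
    """Relative frequency expressed as a percentage."""
    return 100 * relative_frequency(count, total)

total_observations = 50
# Only the two counts quoted on the slide; the other bins are not listed there.
counts = {"Coca-Cola": 19, "Diet Coke": 8}

for brand, count in counts.items():
    rel = relative_frequency(count, total_observations)
    pct = percent_frequency(count, total_observations)
    print(f"{brand}: relative frequency {rel:.2f}, percent frequency {pct:.0f}%")
```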
Example 3
Frequency Distributions for Quantitative Data
• Consider the quantitative data in Table 2.6
• These data show the time in days required to complete year-end
audits for a sample of 20 clients of Sanderson and
Clifford, a small public accounting firm. The three steps
necessary to define the classes for a frequency distribution
with quantitative data are as follows:
1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
• Number of Bins: Bins are formed by specifying the ranges
used to group the data.
• Width of the Bins: Choose a width for the bins.
• With a largest value of 33, a smallest value of 12, and five bins,
the approximate bin width is (33 - 12)/5 = 4.2, which is rounded up to 5.
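The bin-width calculation can be sketched as follows. Rounding up to the next whole number is an assumption about how the slide arrives at 5; in practice any convenient width at least as large as the raw value would work.

```python
import math

def approximate_bin_width(largest: float, smallest: float, n_bins: int) -> float:
    """Approximate bin width: spread of the data divided by the number of bins."""
    return (largest - smallest) / n_bins

# Values taken from the slide: largest 33, smallest 12, five bins.
raw_width = approximate_bin_width(33, 12, 5)

# Assumption: round up to the next whole number for convenient bin limits.
rounded_width = math.ceil(raw_width)

print(f"Raw width: {raw_width:.1f}, rounded width: {rounded_width}")
```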
• Bin Limits: Bin limits must be chosen so that each data item
belongs to one and only one class.
• The following lower and upper bin limits give a total of five classes:
10–14, 15–19, 20–24, 25–29, 30–34.
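A small sketch can confirm that these limits place each value in one and only one class. It assumes whole-number data (audit times in whole days); the helper find_bin is a name introduced here for illustration.

```python
# Lower and upper bin limits from the slide, both inclusive.
BINS = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

def find_bin(value: int) -> tuple:
    """Return the single bin containing value; assumes whole-number data."""
    matches = [(low, high) for low, high in BINS if low <= value <= high]
    if len(matches) != 1:
        raise ValueError(f"{value} does not belong to exactly one bin")
    return matches[0]

print(find_bin(12))  # smallest value mentioned on the slides
print(find_bin(33))  # largest value mentioned on the slides
```

With these limits a fractional value such as 14.5 would fall between two bins and raise an error, which is why the sketch states the whole-number assumption explicitly.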
Example 4
• Step 1. Select cells B10:B14
• Step 2. Type the formula =FREQUENCY(A2:D6, A10:A14). The
range A2:D6 defines the data set, and the range A10:A14
defines the bins.
• Step 3. Press CTRL+SHIFT+ENTER after typing the formula in
Step 2.
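The counting that Excel's FREQUENCY function performs can be approximated in Python. This is a sketch, not Excel's implementation: it assumes the bins range holds the upper limits 14, 19, 24, 29, 34, and the data values are made-up placeholders because Table 2.6 is not reproduced in these slides.

```python
from bisect import bisect_left

def frequency(data, bin_upper_limits):
    """Count values per bin, where each bin is defined by its upper limit.

    A value goes into the first bin whose upper limit is >= the value.
    The extra last element counts values above the highest limit.
    bin_upper_limits must be sorted in ascending order.
    """
    counts = [0] * (len(bin_upper_limits) + 1)
    for value in data:
        counts[bisect_left(bin_upper_limits, value)] += 1
    return counts

# Assumed upper bin limits (the contents of A10:A14 are not shown).
upper_limits = [14, 19, 24, 29, 34]

# Hypothetical audit times in days, for illustration only.
audit_days = [12, 14, 15, 19, 22, 27, 33]

print(frequency(audit_days, upper_limits))
```

The final element of the result is the overflow count for values above the last limit; because Step 1 selects only five cells, that extra count would not be displayed in the worksheet.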
Data Presentation
Two types of statistical presentation of data: graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking deviations
from that pattern. The overall pattern is usually described by the shape, center, and
spread of the data. An individual value that falls outside the overall pattern is called
an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Histograms
• Step 1. Click the Data tab in the Ribbon
• Step 2. Click Data Analysis in the Analyze group
• Step 3. When the Data Analysis dialog box opens, choose
Histogram from the list of Analysis Tools, and click OK
• In the Input Range: box, enter A2:D6
• In the Bin Range: box, enter A10:A14
• Under Output Options:, select New Worksheet Ply:
• Select the check box for Chart Output (see Figure 2.13)
• Click OK
A common graphical presentation of
quantitative data is a histogram
Data Presentation – Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall
in each category.
[Figure 1: Bar Chart of Subjects in Treatment Groups; x-axis: Treatment Group (1, 2, 3), y-axis: Number of Subjects]

Treatment Group  Frequency  Proportion     Percent (%)
1                15         15/60 = 0.250  25.0
2                25         25/60 = 0.417  41.7
3                20         20/60 = 0.333  33.3
Total            60         1.00           100.0
Data Presentation – Categorical Variable
Pie Chart: Shows each category as a slice of a circle, sized by the percent or count of
individuals who fall in that category.
[Figure 2: Pie Chart of Subjects in Treatment Groups; slices: Group 1 25%, Group 2 42%, Group 3 33%]

Treatment Group  Frequency  Proportion     Percent (%)
1                15         15/60 = 0.250  25.0
2                25         25/60 = 0.417  41.7
3                20         20/60 = 0.333  33.3
Total            60         1.00           100.0
Graphical Presentation – Numerical Variable
Histogram: The overall pattern can be described by its shape, center, and spread. The
following age distribution is right skewed. The center lies between 80 and 100. There
are no outliers.
[Figure 3: Age Distribution; histogram of Number of Subjects against Age in Months, bins 40, 60, 80, 100, 120, 140, More]

Summary statistics for age (months):
Mean                90.41666667
Standard Error      3.902649518
Median              84
Mode                84
Standard Deviation  30.22979318
Sample Variance     913.8403955
Kurtosis            -1.183899591
Skewness            0.389872725
Range               95
Minimum             48
Maximum             143
Sum                 5425
Count               60
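Several of these summary statistics are linked by simple formulas, which a short sketch can verify. The raw ages are not available, so the code starts from the reported Sum, Count, Standard Deviation, Minimum, and Maximum and recomputes the other figures from them.

```python
import math

# Values reported in the summary statistics above.
count = 60
total = 5425
std_dev = 30.22979318
minimum = 48
maximum = 143
median = 84

mean = total / count                         # Mean = Sum / Count
standard_error = std_dev / math.sqrt(count)  # Standard Error = SD / sqrt(n)
variance = std_dev ** 2                      # Sample Variance = SD squared
value_range = maximum - minimum              # Range = Maximum - Minimum

print(f"Mean:            {mean:.8f}")
print(f"Standard error:  {standard_error:.9f}")
print(f"Sample variance: {variance:.4f}")
print(f"Range:           {value_range}")
print("Mean above median:", mean > median)
```

A mean above the median, together with the positive skewness value, is consistent with the description of the distribution as right skewed.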
Thank You !
