
Business Analytics

Descriptive Statistics
Instructor : Maira Sami
Modified By : Sophia Ajaz

1
Objectives
• Overview Of Using Data: Definitions And Goals
• Types of data
• Some Definitions
• Types of measurement
• Modifying Data in Excel
• Creating Distributions from Data
• Data Presentation

Overview of Using Data: Definitions and Goals
• Data
• Variable
• Observation
• Variation
• Random variables

• Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
• A characteristic or a quantity of interest that can take on
different values is known as a variable.
• An observation is a set of values corresponding to a set of
variables.

Variation is the difference in a variable measured over observations
(time, customers, items, etc.).

The role of descriptive analytics is to collect and analyse data to gain a
better understanding of variation and its impact on the business
setting.

The values of some variables are under direct control of the decision
maker (these are often called decision variables).

The values of other variables may fluctuate with uncertainty because
of factors outside the direct control of the decision maker. In general,
a quantity whose values are not known with certainty is called a
random variable, or uncertain variable.

Types of data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data

Population and Sample Data

Data can be categorized in several ways based on how they are
collected and the type collected.
• In many cases, it is not feasible to collect data from the
population of all elements of interest. In such instances, we
collect data from a subset of the population known as a
sample.

What is a Statistic?

[Figure: samples drawn from a population]

Parameter: a value that describes a population

Statistic: a value that describes a sample
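
The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population values below are hypothetical and chosen only for illustration; the mean is used as the describing value, but any summary measure would do.

```python
import random
from statistics import mean

# Hypothetical population: ages (in years) of 10 customers. Illustrative values only.
population = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44]

# Parameter: a value that describes the whole population.
population_mean = mean(population)

# Statistic: a value that describes a sample drawn from the population.
random.seed(1)  # fixed seed so the same sample is drawn on every run
sample = random.sample(population, k=4)
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (sample mean):     {sample_mean}")
```

The sample mean will usually differ from the population mean, which is why a statistic is treated as an estimate of the corresponding parameter.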

Sample vs. Population

[Figure: population compared with a sample]
Quantitative and Categorical Data

Data are considered quantitative if they are numeric and arithmetic
operations can be performed on them, such as
• addition,
• subtraction,
• multiplication,
• and division.
For instance, we can sum the values for Volume in the Dow data
in Table 2.1 to calculate a total volume of all shares traded by
companies included in the Dow.

• If arithmetic operations cannot be performed on the data,
they are considered categorical data.
• For instance, the data in the Industry column in Table 2.1 are
categorical.
• We can count the number of observations or compute the
proportions of observations in each category.

Cross-Sectional and Time Series Data
• Cross-sectional data are collected from several entities at the
same, or approximately the same, point in time.

The data in Table 2.1 are cross-sectional because they describe
the 30 companies that comprise the Dow at the same point in
time (July 2015).

• Time series data are collected over several time periods.

Graphs of time series data are frequently found in business and
economic publications.

Some Definitions
• Variable - any characteristic of an individual or entity. A variable can take
different values for different individuals. Variables can be categorical or
quantitative. Per S. S. Stevens…

Types

• Nominal - Categorical variables with no inherent order or ranking sequence, such as
names or classes (e.g., gender, eye colour and hair colour). Values may be numerals
but carry no numerical meaning (e.g., I, II, III). The only operation that can be applied to
nominal variables is enumeration (counting).

• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Can
be compared for equality, or greater or less, but not how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal and, additionally,
differences between values are meaningful; however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction, but not multiplication and division, are meaningful
operations. The difference in temperature between 10 and 20 degrees is the same as the
difference in temperature between 20 and 30 degrees.

• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g. age, weight, height, temperature (Kelvin). Addition, subtraction,
multiplication, and division are all meaningful operations.
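
The difference between interval and ratio scales can be checked numerically. The following is a minimal sketch using the standard Fahrenheit-to-Kelvin conversion; the function and variable names are illustrative. Equal Fahrenheit differences stay equal after conversion, but the naive ratio 20/10 = 2 does not survive a move to a scale with a true zero.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature to Kelvin (a ratio scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

# Differences are meaningful on an interval scale: both gaps are 10 degrees F,
# and they remain equal to each other after conversion to Kelvin.
diff_low = fahrenheit_to_kelvin(20) - fahrenheit_to_kelvin(10)
diff_high = fahrenheit_to_kelvin(30) - fahrenheit_to_kelvin(20)

# Ratios are not meaningful on an interval scale: 20 F is not "twice as hot" as 10 F.
naive_ratio = 20 / 10
kelvin_ratio = fahrenheit_to_kelvin(20) / fahrenheit_to_kelvin(10)

print(f"Differences in Kelvin: {diff_low:.4f} and {diff_high:.4f}")
print(f"Naive Fahrenheit ratio: {naive_ratio}, ratio on the Kelvin scale: {kelvin_ratio:.4f}")
```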
Types of measurement
• When gathering data, we collect data from individual
cases on particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be defined into types according to the level of
mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:

1. Categorical (Nominal)
2. Ordinal
3. Interval
4. Ratio

Categorical (Nominal) data
• What does this mean? No mathematical operations can be
performed on the data relative to each other.
• Therefore, nominal data reflect qualitative differences rather
than quantitative ones.
• Nominal measurements only permit you to determine
whether two individuals are the same or different.

Nominal data
Examples:

What is your gender? (please tick)
   Male
   Female

Did you enjoy the film? (please tick)
   Yes
   No

Ordinal data
• Ordinal data comprise categories that can be
rank ordered.
• As with nominal data, the distance between categories
cannot be calculated, but the categories can be ranked above or
below each other.
• No fixed units of measurement.
• Examples:
• college football rankings
• survey responses (poor, average, good, very good, excellent)
• What does this mean? Can make statistical judgements and
perform limited maths.

Interval and ratio data

• Both interval and ratio data are examples of scale data.
• Scale data:
• data are in numeric format ($50, $100, $150).
• data can be measured on a continuous scale.
• the distance between values can be observed and, as a result,
measured.
• the data can be placed in rank order.

Interval data
• Ordinal data but with constant differences between
observations.
• Examples:
• Time – moves along a continuous measure of seconds,
minutes and so on and is without a zero point of time.
• Temperature – moves along a continuous measure of
degrees and is without a true zero.
• SAT scores

Ratios
• Ratio data are measured on a continuous scale and have a
natural zero point.
• Ratios are meaningful.
• Examples:
– Monthly sales
– Delivery times
– Weight
– Height
– Age

Data for Business Analytics
Classifying Data Elements in a Purchasing Database

Figure 1.2
Modifying Data in Excel
Sorting Data in Excel
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Sort in the Sort & Filter group
• Step 4. Select the check box for My data has headers
• Step 5. In the first Sort by dropdown menu, select Sales
(March 2010)
• Step 6. In the Order dropdown menu, select Largest to
Smallest (see Figure 2.4)
• Step 7. Click OK

• See reference book, p. 24
Filtering
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Filter in the Sort & Filter group
• Step 4. Click on the Filter Arrow in column B, next to
Manufacturer
• Step 5. If all choices are checked, you can easily deselect all
choices by unchecking (Select All). Then select only the check box for Toyota.
• Step 6. Click OK
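
For readers working outside Excel, the sorting and filtering steps can be sketched in Python. The rows below are hypothetical stand-ins for the worksheet range A1:F21 (the actual data are not reproduced in these slides); only the column names Manufacturer and Sales (March 2010) and the filter value Toyota come from the steps above.

```python
# Hypothetical rows standing in for the worksheet; models and sales figures are illustrative only.
cars = [
    {"Model": "Model A", "Manufacturer": "Toyota", "Sales (March 2010)": 300},
    {"Model": "Model B", "Manufacturer": "Honda", "Sales (March 2010)": 450},
    {"Model": "Model C", "Manufacturer": "Toyota", "Sales (March 2010)": 520},
    {"Model": "Model D", "Manufacturer": "Ford", "Sales (March 2010)": 150},
]

# Sorting: order by Sales (March 2010), Largest to Smallest.
sorted_cars = sorted(cars, key=lambda row: row["Sales (March 2010)"], reverse=True)

# Filtering: keep only the rows whose Manufacturer is Toyota.
toyota_only = [row for row in cars if row["Manufacturer"] == "Toyota"]

for row in sorted_cars:
    print(row["Model"], row["Manufacturer"], row["Sales (March 2010)"])
```

Like Excel's filter, the list comprehension leaves the original data untouched and only changes what is shown.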

Creating Distributions from Data
• Distributions help summarize many characteristics of a data
set by describing how often certain values for a variable
appear in that data set.
• Distributions can be created for both categorical and
quantitative data, and they assist the analyst in gauging
variation.

Frequency Distributions for Categorical Data
• A frequency distribution is a summary of data that shows the
number (frequency) of observations in each of several
nonoverlapping classes, typically referred to as bins.

Frequency Distribution
Consider a data set of 26 children aged 1-6 years. The frequency
distribution of the variable ‘age’ can be tabulated as follows:

Frequency Distribution of Age

Age        1    2    3    4    5    6
Frequency  5    3    7    5    4    2

Grouped Frequency Distribution of Age

Age Group  1-2  3-4  5-6
Frequency  8    12   6
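
As a quick check, the grouped table can be derived from the ungrouped one by summing the frequencies of the ages that fall in each group. This is a minimal sketch; the variable names are illustrative.

```python
# Ungrouped frequency distribution of age, taken from the table above.
age_frequency = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}

# Age groups as (lowest age, highest age), both inclusive.
age_groups = [(1, 2), (3, 4), (5, 6)]

# Sum the frequencies of all ages that fall inside each group.
grouped_frequency = {
    f"{low}-{high}": sum(count for age, count in age_frequency.items() if low <= age <= high)
    for low, high in age_groups
}

total_children = sum(age_frequency.values())

print(grouped_frequency)
print("Total children:", total_children)
```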

Example: 1

A survey was taken on Maple Avenue. In each of 20 homes, people were
asked how many cars were registered to their households. The results were
recorded as follows:
3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2.
Present these data in a frequency distribution table.
Also find the maximum number of cars registered to a household.
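
One way to tabulate these survey results is with Python's `collections.Counter`. This is a sketch of one possible approach, not necessarily the method used in class; the variable names are illustrative.

```python
from collections import Counter

# Number of cars registered to each of the 20 households surveyed.
cars_per_household = [3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2]

# Count how many households reported each number of cars, ordered by number of cars.
frequency_table = dict(sorted(Counter(cars_per_household).items()))

# Largest number of cars registered to a single household.
max_cars = max(cars_per_household)

print("Cars  Frequency")
for cars, frequency in frequency_table.items():
    print(f"{cars:<5} {frequency}")
print("Maximum number of cars:", max_cars)
```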
Example 2

Solution ?
• Discussed in class

Relative Frequency and Percent Frequency
Distributions

• A relative frequency distribution is a tabular summary of data
showing the relative frequency for each bin.

• A percent frequency distribution summarizes the percent
frequency of the data for each bin.

For example, with 50 observations in total, the relative frequency
for Coca-Cola is 19/50 = 0.38,
for Diet Coke it is 8/50 = 0.16, and so on.
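
The calculation can be sketched as follows. Only the two counts quoted above (19 and 8 out of 50) are used, because the remaining counts are not reproduced in these slides; the function and variable names are illustrative.

```python
def relative_frequency(count, total):
    """Relative frequency of a bin: its frequency divided by the number of observations."""
    return count / total

def percent_frequency(count, total):
    """Percent frequency of a bin: its relative frequency expressed as a percentage."""
    return 100 * count / total

total_observations = 50

# Only the two counts quoted in the slides; the other bins are omitted here.
known_counts = {"Coca-Cola": 19, "Diet Coke": 8}

relative = {name: relative_frequency(c, total_observations) for name, c in known_counts.items()}
percent = {name: percent_frequency(c, total_observations) for name, c in known_counts.items()}

for name in known_counts:
    print(f"{name}: relative frequency {relative[name]:.2f}, percent frequency {percent[name]:.0f}%")
```

Across all bins of a complete distribution, the relative frequencies sum to 1 and the percent frequencies sum to 100.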

Example 3

Frequency Distributions for Quantitative Data
• Consider the quantitative data in Table 2.6

• These data show the time in days required to complete year-end
audits for a sample of 20 clients of Sanderson and
Clifford, a small public accounting firm. The three steps
necessary to define the classes for a frequency distribution
with quantitative data are as follows:

1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.

• Number of Bins: Bins are formed by specifying the ranges
used to group the data.
• Width of the Bins: Choose a width for the bins.

Approximate bin width = (largest data value - smallest data value) / number of bins.
Here, (33 - 12)/5 = 4.2, which is rounded up to 5.

• Bin Limits: Bin limits must be chosen so that each data item
belongs to one and only one class.

The following lower and upper bin limits give a total of five classes:
10–14,
15–19,
20–24,
25–29,
30–34.
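
The three steps can be sketched in Python using the figures from the slides (smallest value 12, largest value 33, five bins). The starting limit of 10 is taken from the classes listed above, and subtracting 1 to get each upper limit assumes whole-number data (audit times in days).

```python
import math

smallest, largest = 12, 33   # extreme values used in the bin-width calculation
number_of_bins = 5

# Step 2: approximate width, rounded up to a convenient whole number.
raw_width = (largest - smallest) / number_of_bins
bin_width = math.ceil(raw_width)

# Step 3: bin limits. The first lower limit (10) follows the slides; it must not exceed
# the smallest value so that every data item belongs to exactly one bin.
first_lower_limit = 10
bin_limits = [
    (first_lower_limit + i * bin_width, first_lower_limit + (i + 1) * bin_width - 1)
    for i in range(number_of_bins)
]

print("Raw width:", raw_width, "-> bin width:", bin_width)
for low, high in bin_limits:
    print(f"{low}-{high}")
```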

Example 4

• Step 1. Select cells B10:B14
• Step 2. Type the formula =FREQUENCY(A2:D6, A10:A14). The
range A2:D6 defines the data set, and the range A10:A14 defines the bins.
• Step 3. Press CTRL+SHIFT+ENTER after typing the formula in
Step 2.
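
To see what FREQUENCY computes, here is a rough Python equivalent. It assumes, as in Excel, that each bin value is an upper limit: a data value is counted in the first bin whose limit is greater than or equal to it, and one extra count collects values above the last limit. The data values are hypothetical, and the bin limits are assumed to be the upper limits of the five classes.

```python
from bisect import bisect_left

def frequency(data, bin_upper_limits):
    """Count values per bin; bin_upper_limits must be sorted in ascending order.

    Returns len(bin_upper_limits) + 1 counts; the last one holds values
    larger than the highest limit.
    """
    counts = [0] * (len(bin_upper_limits) + 1)
    for value in data:
        # Index of the first upper limit that is >= value.
        counts[bisect_left(bin_upper_limits, value)] += 1
    return counts

# Hypothetical data values (illustrative only) and assumed upper bin limits.
data = [12, 14, 15, 19, 22, 27, 33]
bin_upper_limits = [14, 19, 24, 29, 34]

counts = frequency(data, bin_upper_limits)
print(counts)
```

Selecting only five output cells in Excel, as in Step 1, simply leaves out the final overflow count.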

Data Presentation

There are two types of statistical presentation of data: graphical and numerical.

Graphical Presentation: We look for the overall pattern and for striking deviations
from that pattern. The overall pattern is usually described by the shape, center, and spread
of the data. An individual value that falls outside the overall pattern is called an
outlier.

Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Histograms
• Step 1. Click the Data tab in the Ribbon
• Step 2. Click Data Analysis in the Analyze group
• Step 3. When the Data Analysis dialog box opens, choose
Histogram from the list of Analysis Tools, and click OK
• In the Input Range: box, enter A2:D6
• In the Bin Range: box, enter A10:A14
• Under Output Options:, select New Worksheet Ply:
• Select the check box for Chart Output (see Figure 2.13)
• Click OK

A common graphical presentation of quantitative data is a histogram.

Data Presentation – Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall
in each category.

Figure 1: Bar Chart of Subjects in Treatment Groups (x-axis: Treatment Group; y-axis: Number of Subjects)

Treatment Group   Frequency   Proportion       Percent (%)
1                 15          15/60 = 0.250    25.0
2                 25          25/60 = 0.417    41.7
3                 20          20/60 = 0.333    33.3
Total             60          1.000            100.0
Data Presentation – Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in
each category.

Figure 2: Pie Chart of Subjects in Treatment Groups (Group 1: 25%, Group 2: 42%, Group 3: 33%)

Treatment Group   Frequency   Proportion       Percent (%)
1                 15          15/60 = 0.250    25.0
2                 25          25/60 = 0.417    41.7
3                 20          20/60 = 0.333    33.3
Total             60          1.000            100.0
Graphical Presentation – Numerical Variable

Histogram: Overall pattern can be described by its shape, center, and spread. The
following age distribution is right skewed. The center lies between 80 and 100. No
outliers.

Figure 3: Age Distribution (x-axis: Age in Months, bins 40, 60, 80, 100, 120, 140, More; y-axis: Number of Subjects)

Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60
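
Several of the statistics reported for the age distribution are linked by simple identities, which gives a quick consistency check. The sketch below uses only values from the table; the raw ages are not available in these slides, so the median, mode, skewness, and kurtosis cannot be recomputed here.

```python
import math

# Values copied from the descriptive statistics reported for the age distribution.
total, count = 5425, 60
minimum, maximum = 48, 143
std_dev = 30.22979318

mean = total / count                          # mean = sum / count
value_range = maximum - minimum               # range = maximum - minimum
sample_variance = std_dev ** 2                # variance = standard deviation squared
standard_error = std_dev / math.sqrt(count)   # standard error of the mean

print(f"Mean:            {mean:.8f}")
print(f"Range:           {value_range}")
print(f"Sample variance: {sample_variance:.4f}")
print(f"Standard error:  {standard_error:.6f}")
```

The recomputed values agree with the reported mean, range, sample variance, and standard error.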
Thank You !
