Descriptive Statistics
Instructor : Maira Sami
Modified By : Sophia Ajaz
Objectives
• Overview Of Using Data: Definitions And Goals
• Types of data
• Some Definitions
• Types of measurement
• Modifying Data in Excel
• Creating Distributions from Data
• Data Presentation
Overview of Using Data: Definitions and Goals
• Data
• Variable
• Observation
• Variation
• Random variables
• Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
• A characteristic or a quantity of interest that can take on
different values is known as a variable.
• An observation is a set of values corresponding to a set of
variables.
• Variation is the difference in a variable measured over observations
(time, customers, items, etc.).
• The values of some variables are under the direct control of the
decision maker; these are often called decision variables.
Types of data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data
Population and Sample Data
What is a Statistic?
[Figure: several samples drawn from a single population]
Sample vs. Population
Quantitative and Categorical Data
• If arithmetic operations cannot be performed on the data,
they are considered categorical data.
• For instance, the data in the Industry column in Table 2.1 are
categorical.
• We can count the number of observations or compute the
proportions of observations in each category.
Cross-Sectional and Time Series Data
• Cross-sectional data are collected from several entities at the
same, or approximately the same, point in time.
Some Definitions
• Variable - any characteristic of an individual or entity. A variable can take
different values for different individuals. Variables can be categorical or
quantitative. Per S. S. Stevens…
Types
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Can
be compared for equality, or greater or less, but not how much greater or less.
Some Definitions
• Interval - Values of the variable are ordered as in Ordinal and, additionally,
differences between values are meaningful. However, the scale has no absolute
zero point. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction are meaningful operations, but multiplication and division
are not. For example, the difference in temperature between 10 and 20 degrees is
the same as the difference between 20 and 30 degrees.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g. age, weight, height, temperature (Kelvin). Addition, subtraction,
multiplication, and division are all meaningful operations.
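The difference between interval and ratio scales can be checked with a short Python sketch. The helper name `fahrenheit_to_kelvin` and the three sample temperatures are illustrative assumptions, not part of the slides.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert a Fahrenheit temperature to Kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


# Illustrative temperatures in degrees Fahrenheit (an interval scale).
low_f, mid_f, high_f = 10.0, 20.0, 30.0

# Interval property: equal differences are meaningful.
diff_low = mid_f - low_f    # 10 degrees
diff_high = high_f - mid_f  # 10 degrees

# A ratio on an interval scale is not meaningful: 20 F is not "twice as hot" as 10 F.
ratio_f = mid_f / low_f

# On the Kelvin (ratio) scale the same two temperatures have a ratio close to 1.
ratio_k = fahrenheit_to_kelvin(mid_f) / fahrenheit_to_kelvin(low_f)

print(diff_low, diff_high, ratio_f, round(ratio_k, 3))
```

The Fahrenheit ratio of 2.0 is an artifact of where that scale places its zero. Only the Kelvin ratio reflects a physical comparison.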
Types of measurement
• When gathering data, we collect data from individual cases on
particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of
mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:
1. Nominal 2. Ordinal 3. Interval 4. Ratio
Categorical (Nominal) data
• What does this mean? No mathematical operations can be
performed on the data relative to each other.
• Therefore, nominal data reflect qualitative differences rather
than quantitative ones.
• Nominal measurements only permit you to determine
whether two individuals are the same or different.
Nominal data
Examples:
• Male / Female
• Yes / No
Ordinal data
• Ordinal data comprise categories that can be rank ordered.
• As with nominal data, the distance between categories cannot
be calculated, but the categories can be ranked above or
below each other.
• No fixed units of measurement.
• Examples:
• college football rankings
• survey responses (poor, average, good, very good, excellent)
• What does this mean? Can make statistical judgements and
perform limited maths.
Interval and ratio data
Interval data
• Ordinal data but with constant differences between
observations.
• Examples:
• Time – moves along a continuous measure of seconds,
minutes, and so on, and has no true zero point.
• Temperature – moves along a continuous measure of
degrees and has no true zero.
• SAT scores
Ratio data
• Ratio data are measured on a continuous scale and have a
natural zero point.
• Ratios are meaningful.
• Examples:
– Monthly sales
– Delivery times
– Weight
– Height
– Age
Data for Business Analytics
Classifying Data Elements in a Purchasing Database
Figure 1.2
Modifying Data in Excel
Sorting Data in Excel
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Sort in the Sort & Filter group
• Step 4. Select the check box for My data has headers
• Step 5. In the first Sort by dropdown menu, select Sales
(March 2010)
• Step 6. In the Order dropdown menu, select Largest to
Smallest (see Figure 2.4)
• Step 7. Click OK
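For readers working outside Excel, the same sort can be sketched in Python. The rows below are hypothetical stand-ins for the worksheet range A1:F21. Only the column names Manufacturer and Sales (March 2010) come from the steps above, and the Model column and all values are assumptions.

```python
# Hypothetical rows standing in for the worksheet data (values are made up).
cars = [
    {"Model": "Model A", "Manufacturer": "Toyota", "Sales (March 2010)": 120},
    {"Model": "Model B", "Manufacturer": "Honda", "Sales (March 2010)": 310},
    {"Model": "Model C", "Manufacturer": "Toyota", "Sales (March 2010)": 205},
    {"Model": "Model D", "Manufacturer": "Ford", "Sales (March 2010)": 95},
]

# Equivalent of Sort by "Sales (March 2010)" with Order "Largest to Smallest".
sorted_cars = sorted(
    cars,
    key=lambda row: row["Sales (March 2010)"],
    reverse=True,
)

for row in sorted_cars:
    print(row["Model"], row["Manufacturer"], row["Sales (March 2010)"])
```

`sorted` returns a new list and leaves `cars` unchanged, whereas the Excel sort reorders the worksheet rows in place.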
• Ref book pg 24
Filtering
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Filter in the Sort & Filter group
• Step 4. Click on the Filter Arrow in column B, next to
Manufacturer
• Step 5. If all choices are checked, you can easily deselect all
choices by unchecking (Select All). Then select only the check box
for Toyota.
• Step 6. Click OK
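The filter steps correspond to keeping only the rows that match a condition. A minimal Python sketch follows, again with hypothetical rows. Only the Manufacturer column and the value Toyota are taken from the steps above.

```python
# Hypothetical rows standing in for the worksheet data (values are made up).
cars = [
    {"Model": "Model A", "Manufacturer": "Toyota"},
    {"Model": "Model B", "Manufacturer": "Honda"},
    {"Model": "Model C", "Manufacturer": "Toyota"},
    {"Model": "Model D", "Manufacturer": "Ford"},
]

# Equivalent of unchecking (Select All) and checking only Toyota.
toyota_only = [row for row in cars if row["Manufacturer"] == "Toyota"]

for row in toyota_only:
    print(row["Model"], row["Manufacturer"])
```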
Creating Distributions from Data
• Distributions help summarize many characteristics of a data
set by describing how often certain values for a variable
appear in that data set.
• Distributions can be created for both categorical and
quantitative data, and they assist the analyst in gauging
variation.
Frequency Distributions for Categorical Data
• A frequency distribution is a summary of data that shows the
number (frequency) of observations in each of several
non-overlapping classes, typically referred to as bins.
Frequency Distribution
Consider a data set of 26 children of ages 1-6 years. Then the frequency
distribution of variable ‘age’ can be tabulated as follows:
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Grouped Frequency Distribution of Age:
Age Group 1-2 3-4 5-6
Frequency 8 12 6
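The two tables above can be reproduced with a short Python sketch. It rebuilds the 26 ages from the stated frequencies, then counts them individually and by age group. The variable names are illustrative.

```python
from collections import Counter

# Rebuild the 26 ages from the frequency table (age -> frequency).
age_frequency = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
ages = [age for age, count in age_frequency.items() for _ in range(count)]

# Ungrouped frequency distribution: how often each age occurs.
frequency = Counter(ages)

# Grouped frequency distribution with non-overlapping bins 1-2, 3-4, 5-6.
bins = [(1, 2), (3, 4), (5, 6)]
grouped = {
    f"{low}-{high}": sum(1 for age in ages if low <= age <= high)
    for low, high in bins
}

print(len(ages))
print(dict(sorted(frequency.items())))
print(grouped)
```

Because the bins do not overlap and together cover every age, the grouped frequencies sum to the same total of 26.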
Example: 1
Solution ?
• Discussed in class
Relative Frequency and Percent Frequency
Distributions
Example 3
Frequency Distributions for Quantitative Data
• Consider the quantitative data in Table 2.6
• These data show the time in days required to complete year-
end audits for a sample of 20 clients of Sanderson and
Clifford, a small public accounting firm. The three steps
necessary to define the classes for a frequency distribution
with quantitative data are as follows:
• Number of Bins: Bins are formed by specifying the ranges
used to group the data.
• Width of the Bins: choose a width for the bins.
• Bin Limits: Bin limits must be chosen so that each data item
belongs to one and only one class.
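The three steps can be sketched in Python. Everything specific here is an assumption for illustration: the 20 values are made up (they are not the Table 2.6 data), the choice of 5 bins and a first lower limit of 10 is arbitrary, and the width rule (range divided by number of bins, rounded up) is a common guideline rather than something stated on the slides.

```python
import math

# Hypothetical completion times in days (NOT the Table 2.6 data).
times = [11, 13, 16, 17, 18, 19, 21, 21, 22, 24,
         25, 26, 27, 29, 30, 31, 33, 34, 12, 15]

# Step 1: choose the number of bins (an analyst's judgment call).
num_bins = 5

# Step 2: choose a bin width; a common guideline is range / number of bins,
# rounded up to a convenient value.
approx_width = (max(times) - min(times)) / num_bins
bin_width = math.ceil(approx_width)

# Step 3: choose bin limits so each value falls in exactly one bin.
# With whole-number data, inclusive limits low..(low + width - 1) do not overlap.
first_lower = 10  # assumed starting point at or below the smallest value
bins = [
    (first_lower + i * bin_width, first_lower + (i + 1) * bin_width - 1)
    for i in range(num_bins)
]

counts = {
    (low, high): sum(1 for t in times if low <= t <= high)
    for low, high in bins
}

for (low, high), count in counts.items():
    print(f"{low}-{high}: {count}")
```

Checking that the counts sum to the number of observations is a quick way to confirm that no value was missed or double-counted by the chosen limits.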
Example 4
• Step 1. Select cells B10:B14
• Step 2. Type the formula =FREQUENCY(A2:D6, A10:A14). The
range A2:D6 defines the data set, and the range A10:A14 defines
the bins.
• Step 3. Press CTRL+SHIFT+ENTER after typing the formula in
Step 2.
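To see what the FREQUENCY formula computes, here is a rough Python equivalent. It assumes the bins are given as upper limits in ascending order and that each value is counted in the first bin whose upper limit it does not exceed, with one extra count for values above the last limit. The function name `excel_style_frequency` and the sample data are hypothetical.

```python
from bisect import bisect_left


def excel_style_frequency(data, bin_upper_limits):
    """Count values per bin, where each bin is defined by its upper limit.

    Returns len(bin_upper_limits) + 1 counts; the last entry counts values
    greater than the largest upper limit.
    """
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # bisect_left gives the index of the first limit >= value,
        # so a value equal to an upper limit stays in that bin.
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical data and bin upper limits (stand-ins for A2:D6 and A10:A14).
data = [11, 13, 16, 17, 18, 19, 21, 21, 22, 24,
        25, 26, 27, 29, 30, 31, 33, 34, 12, 15]
upper_limits = [14, 19, 24, 29, 34]

result = excel_style_frequency(data, upper_limits)
print(result)
```

Selecting exactly as many output cells as there are bins, as in Step 1, displays only the first five counts; the final overflow count is not shown.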
Data Presentation
Graphical Presentation: We look for the overall pattern and for striking deviations
from that pattern. The overall pattern is usually described by the shape, center,
and spread of the data. An individual value that falls outside the overall pattern
is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Histograms
• Step 1. Click the Data tab in the Ribbon
• Step 2. Click Data Analysis in the Analyze group
• Step 3. When the Data Analysis dialog box opens, choose
Histogram from the list of Analysis Tools, and click OK
• In the Input Range: box, enter A2:D6
• In the Bin Range: box, enter A10:A14
• Under Output Options:, select New Worksheet Ply:
• Select the check box for Chart Output (see Figure 2.13)
• Click OK
A common graphical presentation of
quantitative data is a histogram
Data Presentation – Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall
in each category.
Treatment Group   Frequency   Proportion       Percent (%)
1                 15          15/60 = 0.250    25.0
2                 25          25/60 = 0.417    41.7
3                 20          20/60 = 0.333    33.3
Total             60          1.000            100.0
[Bar chart: Number of Subjects (0 to 30) by Treatment Group (1, 2, 3)]
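The proportion (relative frequency) and percent columns follow directly from the counts. A minimal Python sketch for treatment-group counts of 15, 25, and 20 subjects, with illustrative variable names:

```python
# Number of subjects in each treatment group.
counts = {1: 15, 2: 25, 3: 20}
total = sum(counts.values())

# Relative frequency = frequency / total; percent frequency = 100 * relative frequency.
relative = {group: count / total for group, count in counts.items()}
percent = {group: 100 * value for group, value in relative.items()}

for group, count in counts.items():
    print(group, count, round(relative[group], 3), round(percent[group], 1))
print("Total", total, round(sum(relative.values()), 2))
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which gives a quick check on a hand-built table.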
Data Presentation – Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in
each category.
Treatment Group   Frequency   Proportion       Percent (%)
1                 15          15/60 = 0.250    25.0
2                 25          25/60 = 0.417    41.7
3                 20          20/60 = 0.333    33.3
[Pie chart: percent of subjects in treatment groups 1, 2, and 3]
Histogram: The overall pattern can be described by its shape, center, and spread. The
following age distribution is right-skewed. The center lies between 80 and 100. There
are no outliers.
[Figure 3: Age Distribution. Histogram of Number of Subjects by Age in Months, with bins 40, 60, 80, 100, 120, 140, More]
Statistic        Value
Mean             90.41666667
Standard Error   3.902649518
Median           84
Mode             84
Range            95
Minimum          48
Maximum          143
Sum              5425
Count            60
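The reported summary statistics can be cross-checked against one another. The sketch below uses only the figures listed for the age data; the variable names are illustrative.

```python
# Summary figures reported for the age data (in months).
total_age = 5425
count = 60
minimum, maximum = 48, 143
median = 84

# Mean = sum / count; range = maximum - minimum.
mean = total_age / count
value_range = maximum - minimum

# A mean above the median is consistent with (though not proof of) right skew.
mean_exceeds_median = mean > median

print(round(mean, 8), value_range, mean_exceeds_median)
```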
Thank You !