
10/4/2017

Type of Data

www.utm.my innovative ● entrepreneurial ● global


Frequency Distributions


Scores of Mathematics Test (N = 50)


Typical Traditional
Frequency Table


Frequency Table (Raw Scores)

Scores Frequency
0 1
1 1
2 0
3 1
4 6
5 6
6 6
7 8
8 5
9 5
10 2
11 3
12 2
13 1
14 1
15 2
Total 50


Parameter                 Population (N)   Sample (n)
Mean                      µ                M
Variance                  σ²               s²
Standard deviation        σ                s
Correlation coefficient   ρ                r


DATA


Frequency Distributions

• After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
• This is the goal of descriptive statistical techniques.
• One method for simplifying and organizing data is to construct a frequency distribution.


Frequency Distributions (cont.)

• A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement.
• A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.


Grouped Frequency Distribution

Observed Limits (mph)   Real Limits (mph)   Midpoint   Frequency
40 – 44                 39.5 – 44.5         42         5
45 – 49                 44.5 – 49.5         47         9
50 – 54                 49.5 – 54.5         52         18
55 – 59                 54.5 – 59.5         57         27
60 – 64                 59.5 – 64.5         62         35
65 – 69                 64.5 – 69.5         67         26
70 – 74                 69.5 – 74.5         72         25
75 – 79                 74.5 – 79.5         77         16
80 – 84                 79.5 – 84.5         82         15
85 – 89                 84.5 – 89.5         87         10
90 – 94                 89.5 – 94.5         92         4
Total                                                  200

Distribution Table

X f
5 1
4 5
3 8
2 4
1 2


Distribution Table

x f xf
5 1 5
4 5 20
3 8 24
2 4 8
1 2 2


Frequency Distribution Tables (cont.)

• A third column can be used for the proportion (p) for each category: p = f/N. The sum of the (p) column should equal 1.00.
• A fourth column can display the percentage of the distribution corresponding to each x value. The percentage is found by multiplying (p) by 100. The sum of the percentage column is 100%.


Distribution Table

x       f    p      %     cf   c%
5       1    1/20   5     20   100%
4       5    5/20   25    19   95%
3       8    8/20   40    14   70%
2       4    4/20   20    6    30%
1       2    2/20   10    2    10%
Total   20   1      100

where:
p = f/N
% = p*100
cf = cumulative frequency
c% = cumulative percentage = cf(100%)/N
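
The four derived columns can be reproduced with a few lines of Python. This is only a sketch of the definitions above; the function name `distribution_table` is made up for illustration, and the scores are entered lowest first so that the cumulative frequency accumulates upward as in the table.

```python
from fractions import Fraction

# Scores (x) and frequencies (f) from the distribution table, lowest score first.
scores = [1, 2, 3, 4, 5]
freqs = [2, 4, 8, 5, 1]

def distribution_table(scores, freqs):
    """Return rows (x, f, p, %, cf, c%) ordered from lowest to highest score."""
    n = sum(freqs)                   # N, the total number of scores
    rows = []
    cf = 0
    for x, f in zip(scores, freqs):
        cf += f                      # cumulative frequency up to and including x
        p = Fraction(f, n)           # proportion p = f/N
        rows.append((x, f, p, float(p * 100), cf, 100 * cf / n))
    return rows

table = distribution_table(scores, freqs)
# Print the highest score first, matching the layout of the slide.
for x, f, p, pct, cf, cpct in reversed(table):
    print(f"{x:>2} {f:>2} {f}/{sum(freqs)} {pct:>5.0f} {cf:>3} {cpct:>4.0f}%")
```

The proportions are kept as exact fractions so that their sum is exactly 1, which is the check suggested for the p column.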


Regular Frequency Distribution Table

Scores Frequency
0 1
1 1
2 0
3 1
4 6
5 6
6 6
7 8
8 5
9 5
10 2
11 3
12 2
13 1
14 1
15 2
Total 50


Grouped Frequency Distribution (cont.)

• In a grouped table, the X column lists groups of scores, called class intervals, rather than individual values.
• These intervals all have the same width, usually a simple number such as 2, 5, 10, and so on.
• Each interval begins with a value that is a multiple of the interval width. The interval width is selected so that the table will have approximately ten intervals.
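
As a rough illustration of these rules, the sketch below pools the raw-score frequency table for the 50 mathematics test scores into class intervals. The width of 3 is the one used for the grouped scores later in these slides (it gives six intervals rather than ten); the function name `group_frequencies` is an assumption made for this example.

```python
# Raw-score frequency table for the 50 mathematics test scores (score -> frequency).
raw_freq = {0: 1, 1: 1, 2: 0, 3: 1, 4: 6, 5: 6, 6: 6, 7: 8,
            8: 5, 9: 5, 10: 2, 11: 3, 12: 2, 13: 1, 14: 1, 15: 2}

def group_frequencies(freq, width):
    """Pool a score -> frequency table into class intervals of equal width.

    Each interval starts at a multiple of `width`, following the rule above.
    Returns a list of ((lower, upper), frequency) pairs, lowest interval first.
    """
    grouped = {}
    for score, f in freq.items():
        lower = (score // width) * width          # largest multiple of width <= score
        grouped[lower] = grouped.get(lower, 0) + f
    return [((lower, lower + width - 1), grouped[lower]) for lower in sorted(grouped)]

for (lower, upper), f in group_frequencies(raw_freq, width=3):
    print(f"{lower:>2} - {upper:<2} {f}")
```

An interval appears in the result only if at least one score in the input table falls inside it, so a completely empty interval would have to be added by hand.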


Example:

A child psychologist was studying the aggressive behavior of two groups of twenty-four 5-year-old children each by counting the number of aggressive acts they displayed over a 2-week period. The first group consisted of children who lived in the city, while the other consisted of children from suburban areas. The results of her study are as follows:

Group 1:
23 36 36 38 43 56 56 56
57 58 59 59 60 65 66 68
70 77 78 80 86 89 90 91

Group 2:
50 55 55 58 60 62 65 70
79 80 81 81 82 82 83 85
85 89 93 103 104 107 108 117

Compare the results using frequency polygons and/or bar charts or histograms. What can you conclude from the study?


• Estimate the number of classes k for grouping using:
    k = 1 + 3.3 log N          (log to base 10; N = 24 scores in each group)
      = 1 + 3.3 log 24 = 5.554 ≈ 6
• Find the class size using the following rule:
    range = highest score – lowest score
    class size = range / k
               = (117 – 23)/6 = 15.67
    (We can take class size = 15 or 16 as desired)
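
The same arithmetic can be checked in Python. This is a minimal sketch: the helper names are made up for illustration, and rounding k up with `math.ceil` is an assumption about how 5.554 becomes 6 (ordinary rounding gives the same result here).

```python
import math

def number_of_classes(n):
    """Estimate k = 1 + 3.3 log10(N), rounded up to a whole number of classes."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_size(highest, lowest, k):
    """Class size = range / k, where range = highest score - lowest score."""
    return (highest - lowest) / k

k = number_of_classes(24)          # 24 children in each group
size = class_size(117, 23, k)      # 117 and 23 are the extreme scores over both groups
print(k, round(size, 2))           # 6 15.67
```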


• Class size
    Sub-classes: 20 - 24, 25 - 29
• Discrete data
    Data for sub-class 20 - 24: 20, 21, 22, 23, 24
    Class size = 5 (24 – 20 + 1)
• Continuous data
    Data for sub-class 20 - 24
    Class size = upper real limit – lower real limit = 24.5 – 19.5 = 5
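
A short sketch of the real-limit calculation follows. The function names and the `unit` parameter (the measurement unit, taken to be 1) are assumptions introduced for this example.

```python
def real_limits(lower, upper, unit=1):
    """Real limits lie half a measurement unit beyond the observed limits."""
    return lower - unit / 2, upper + unit / 2

def class_size(lower, upper, unit=1):
    """Class size = upper real limit - lower real limit."""
    lo, hi = real_limits(lower, upper, unit)
    return hi - lo

def midpoint(lower, upper):
    """Midpoint of the interval, halfway between its limits."""
    return (lower + upper) / 2

print(real_limits(20, 24))   # (19.5, 24.5)
print(class_size(20, 24))    # 5.0
print(midpoint(40, 44))      # 42.0
```

The last line reproduces the midpoint listed for the 40 – 44 mph interval in the grouped speed table.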

Sub-Class    Frequency
             Group 1   Group 2
20 - 34      1         0
35 - 49      4         0
50 - 64      8         6
65 - 79      6         3
80 - 94      5         10
95 - 109     0         4
110 - 124    0         1
Total        24        24
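
The tally can be reproduced from the raw scores of the example. The sketch below assumes classes of size 15 starting at 20, as in the table; the function name `tally` and its parameters are made up for illustration.

```python
group1 = [23, 36, 36, 38, 43, 56, 56, 56, 57, 58, 59, 59,
          60, 65, 66, 68, 70, 77, 78, 80, 86, 89, 90, 91]
group2 = [50, 55, 55, 58, 60, 62, 65, 70, 79, 80, 81, 81,
          82, 82, 83, 85, 85, 89, 93, 103, 104, 107, 108, 117]

def tally(scores, start=20, width=15, classes=7):
    """Count the scores in each class start + i*width .. start + (i+1)*width - 1."""
    counts = [0] * classes
    for s in scores:
        counts[(s - start) // width] += 1    # index of the class containing s
    return counts

labels = [f"{20 + 15 * i} - {20 + 15 * i + 14}" for i in range(7)]
for label, f1, f2 in zip(labels, tally(group1), tally(group2)):
    print(f"{label:<9} {f1:>2} {f2:>2}")
```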


[Figure: frequencies of Group 1 and Group 2 plotted against sub-class (20 - 34 to 110 - 124); y axis: frequency, 0 to 12]

Frequency Distribution Graphs

• In a frequency distribution graph, the score categories (X values) are listed on the X axis and the frequencies are listed on the Y axis.
• When the score categories consist of numerical scores from an interval or ratio scale, the graph should be either a histogram or a polygon.


Histograms

• In a histogram, a bar is centered above each score (or class interval) so that the height of the bar corresponds to the frequency and the width extends to the real limits, so that adjacent bars touch.


Polygons

• In a polygon, a dot is centered above each score so that the height of the dot corresponds to the frequency. The dots are then connected by straight lines. An additional line is drawn at each end to bring the graph back to a zero frequency.
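
The geometry of both graphs can be sketched without any plotting library. The example below uses the small distribution table from earlier in these slides (X = 1 to 5); the function names and the `unit` and `step` parameters are assumptions for illustration, and the resulting coordinates could be passed to any plotting tool.

```python
def histogram_bars(scores, freqs, unit=1):
    """One bar per score: (left real limit, right real limit, height)."""
    return [(x - unit / 2, x + unit / 2, f) for x, f in zip(scores, freqs)]

def polygon_points(scores, freqs, step=1):
    """Dots at (score, frequency), plus a zero-frequency point at each end."""
    points = list(zip(scores, freqs))
    return [(scores[0] - step, 0)] + points + [(scores[-1] + step, 0)]

scores = [1, 2, 3, 4, 5]
freqs = [2, 4, 8, 5, 1]
print(histogram_bars(scores, freqs))   # each bar ends where the next one begins
print(polygon_points(scores, freqs))   # starts at (0, 0) and ends at (6, 0)
```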


We may combine the HISTOGRAM and FREQUENCY POLYGON on a single graph:


We may also plot two different (relative) FREQUENCY POLYGONS on a single graph:

Observed Limits (mph)   Real Limits (mph)   Midpoint   Frequency
                                                       Event A   Event B
40 – 44                 39.5 – 44.5         42         5         3
45 – 49                 44.5 – 49.5         47         9         12
50 – 54                 49.5 – 54.5         52         18        26
55 – 59                 54.5 – 59.5         57         27        33
60 – 64                 59.5 – 64.5         62         35        37
65 – 69                 64.5 – 69.5         67         26        39
70 – 74                 69.5 – 74.5         72         25        30
75 – 79                 74.5 – 79.5         77         16        24
80 – 84                 79.5 – 84.5         82         15        23
85 – 89                 84.5 – 89.5         87         10        18
90 – 94                 89.5 – 94.5         92         4         5
Total                                                  200       250


Bar graphs

• When the score categories (X values) are measurements from a nominal or an ordinal scale, the graph should be a bar graph.
• A bar graph is just like a histogram except that gaps or spaces are left between adjacent bars.


The Bar Chart

Raw Scores
Scores Frequency
0 1
1 1
2 0
3 1
4 6
5 6
6 6
7 8
8 5
9 5
10 2
11 3
12 2
13 1
14 1
15 2
Total 50

[Figure: bar chart of the raw scores; x axis: scores (0 to 15), y axis: frequency (0 to 9)]

The Bar Chart

Grouped Scores
Scores    Frequency
0 - 2     2
3 - 5     13
6 - 8     19
9 - 11    10
12 - 14   4
15 - 17   2
Total     50

[Figure: bar chart of the grouped scores; x axis: scores (0 - 2 to 15 - 17), y axis: frequency (0 to 20)]


An Example of Inappropriate Use of Statistics*

Means of Faculty CGPA (Sem 2 2004/05)

Faculty      Mean CPA
ATMA         2.60
PPD          2.77
FS           2.82
FAB          2.88
FKM          2.88
FKE          2.90
FKKKSA       2.90
FSKSM        2.93
FKA          2.97
FP           2.99
FPPSM        3.02
FKSG         3.14
University   2.89

[Figure: the faculties arranged in levels by mean CPA, with FKSG at the top and ATMA at the bottom]

* inappropriate statistical extrapolation

An Example of Appropriate Use of Statistics

Rates of Graduating on Time (GoT)

[Figure: bar chart of % GoT (y axis, 0 to 100) for FSKSM, FS, FKM, FKE, FKA, FKKKSA, FKSG, FPPSM and overall (Keseluruhan). Bar labels: 88.4, 89.5, 61.3, 61.1, 60.7, 52.1, 43.8, 46.1, 46.6]


Summaries of CGPA for Each Academic Program (at Entry Point)

Program   Min    Max    Median
SEL       3.48   4.00   4.00
SMT       3.83   4.00   4.00
SEM       3.56   4.00   3.90
SKK       3.13   4.00   3.89
SMM       3.60   4.00   3.88
SKP       3.00   4.00   3.84
SMV       3.00   4.00   3.78
SEP       3.29   4.00   3.73
SEB       3.46   4.00   3.74
SEE       3.16   4.00   3.65
SEW       3.07   4.00   3.57
SET       3.21   4.00   3.55
SEC       3.00   4.00   3.57

[Figure: bar chart of mean CGPA (y axis, 2.00 to 4.00) by program, SEL to SEC. Bar labels in descending order: 3.97, 3.97, 3.88, 3.87, 3.86, 3.85, 3.76, 3.73, 3.70, 3.61, 3.61, 3.59, 3.55]


The Bar Chart

The Data
Types                1980s   1990s   2000's
Crime                16      18      10
Drama                14      15      15
Info Entertainment   0       1       13
Western              0       1       21
Comedy               54      53      26
Others               16      12      15
Total                100     100     100

[Figure: bar chart titled "2000's"; x axis: types of TV programmes (Crime, Drama, Info Entertainment, Western, Comedy, Others), y axis: 0 to 30]


Inappropriate Polygon

Raw Scores
Scores Frequency
0 1
1 1
2 0
3 1
4 6
5 6
6 6
7 8
8 5
9 5
10 2
11 3
12 2
13 1
14 1
15 2
Total 50

[Figure: frequency polygon of the raw scores; x axis: scores (labeled 1 to 16), y axis: frequency (0 to 9)]

The Pie Chart

Grouped Scores
Group   Scores    Frequency
1       0 - 2     2
2       3 - 5     13
3       6 - 8     19
4       9 - 11    10
5       12 - 14   4
6       15 - 17   2
Total             50

[Figure: pie chart of groups 1 to 6 with slices labeled by percentage: 4%, 26%, 38%, 20%, 8%, 4%]


The Pie Chart

Grouped Scores

[Figure: pie chart of the same six groups of scores (0 - 2, 3 - 5, 6 - 8, 9 - 11, 12 - 14, 15 - 17; total 50) with slices labeled by frequency: 2, 13, 19, 10, 4, 2]


Relative frequency

• Many populations are so large that it is impossible to know the exact number of individuals (frequency) for any specific category.
• In these situations, population distributions can be shown using relative frequency instead of the absolute number of individuals for each category.


Smooth curve

• If the scores in the population are measured on an interval or ratio scale, it is customary to present the distribution as a smooth curve rather than a jagged histogram or polygon.
• The smooth curve emphasizes the fact that the distribution is not showing the exact frequency for each category.


Frequency distribution graphs

• Frequency distribution graphs are useful because they show the entire set of scores.
• At a glance, you can determine the highest score, the lowest score, and where the scores are centered.
• The graph also shows whether the scores are clustered together or scattered over a wide range.


Shape

• A graph shows the shape of the distribution.
• A distribution is symmetrical if the left side of the graph is (roughly) a mirror image of the right side.
• One example of a symmetrical distribution is the bell-shaped normal distribution.
• On the other hand, distributions are skewed when scores pile up on one side of the distribution, leaving a "tail" of a few extreme values on the other side.


Positively and Negatively Skewed Distributions

• In a positively skewed distribution, the scores tend to pile up on the left side of the distribution with the tail tapering off to the right.
• In a negatively skewed distribution, the scores tend to pile up on the right side and the tail points to the left.


Shapes of Data Distribution (Generated from Frequency Polygons)


Percentiles, Percentile Ranks, and Interpolation

• The relative location of individual scores within a distribution can be described by percentiles and percentile ranks.
• The percentile rank for a particular X value is the percentage of individuals with scores equal to or less than that X value.
• When an X value is identified by its percentile rank, it is called a percentile.


Interpolation

• When scores or percentages do not correspond to upper real limits or cumulative percentages, you must use interpolation to determine the corresponding ranks and percentiles.
• Interpolation is a mathematical process based on the assumption that the scores and the percentages change in a regular, linear fashion as you move through an interval from one end to the other.


Distribution Table

X f cf c%
5 1 20 100%
4 5 19 95%
3 8 14 70%
2 4 6 30%
1 2 2 10%


Example – real limit

• What is the 95th percentile? (X = 4.5)
• What is the percentile rank for X = 3.5? (70%)


Example - Interpolation

• What is the 50th percentile? (X = 3)
• What is the percentile rank for X = 4? (82.5%)
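
The four answers for the X = 1 to 5 distribution table can be checked with a small linear-interpolation routine. This is a sketch under the linearity assumption described above; the names `interpolate`, `percentile_rank` and `percentile` are made up for this example, and the upper real limits are paired with the cumulative percentages from the table.

```python
# Upper real limits and cumulative percentages from the distribution table.
# The first pair (0.5, 0) is the lower real limit of the lowest score.
upper_limits = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
cum_percent = [0, 10, 30, 70, 95, 100]

def interpolate(value, xs, ys):
    """Linear interpolation of y at `value`, given strictly increasing xs."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= value <= x1:
            fraction = (value - x0) / (x1 - x0)   # how far into the interval
            return y0 + fraction * (y1 - y0)
    raise ValueError("value lies outside the table")

def percentile_rank(x):
    """Cumulative percentage at score x."""
    return interpolate(x, upper_limits, cum_percent)

def percentile(rank):
    """Score below which `rank` percent of the distribution falls."""
    return interpolate(rank, cum_percent, upper_limits)

print(percentile(95))        # 4.5
print(percentile_rank(3.5))  # 70.0
print(percentile(50))        # 3.0
print(percentile_rank(4))    # 82.5
```

The first two results fall exactly on an upper real limit, so no interpolation is needed; the last two lie inside an interval and use the fraction of the interval covered.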


Distribution Table (#58)

X f cf c%
40-49 4 25 100%
30-39 6 21 84%
20-29 10 15 60%
10-19 3 5 20%
0-9 2 2 8%


Example – real limit

• What is the 60th percentile? (X = 29.5)
• What is the percentile rank for X = 39.5? (84%)


Example - Interpolation

• What is the 40th percentile? (X = 24.5)
• What is the percentile rank for X = 32? (66%)
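
Both answers follow from linear interpolation within the interval that contains the target, using the real limits and cumulative percentages of the grouped table (20% at 19.5, 60% at 29.5, 84% at 39.5). The worked arithmetic is sketched below; the symbol $P_{40}$ for the 40th percentile is notation assumed here, not taken from the slides.

```latex
\[
P_{40} = 19.5 + \frac{40 - 20}{60 - 20}\,(29.5 - 19.5)
       = 19.5 + 0.5 \times 10 = 24.5
\]
\[
\text{percentile rank of } X = 32:\quad
60\% + \frac{32 - 29.5}{39.5 - 29.5}\,(84\% - 60\%)
     = 60\% + 0.25 \times 24\% = 66\%
\]
```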


Stem-and-Leaf Displays
• A stem-and-leaf display provides a very efficient method for obtaining and displaying a frequency distribution.
• Each score is divided into a stem consisting of the first digit or digits, and a leaf consisting of the final digit.
• You then go through the list of scores, one at a time, and write the leaf for each score beside its stem.
• The resulting display provides an organized picture of the entire distribution. The number of leaves beside each stem corresponds to the frequency, and the individual leaves identify the individual scores.


STEM AND LEAF DIAGRAM

Examples:
Score: 37    Stem: 3    Leaf: 7
Score: 736   Stem: 73   Leaf: 6


CREATING STEM AND LEAF DIAGRAMS

1. Create a vertical column that contains all the possible stems in the data.

Stem Leaf
3
4
5
6
7
8
9

2. Take each score and list its leaf next to the corresponding stem, e.g. for the first score (83):

Stem Leaf
3
4
5
6
7
8 3
9

3. Continue building the respective stem and leaf for the remaining scores, e.g.
the stem and leaf for 85 is (stem = 8, leaf = 5)
the stem and leaf for 82 is (stem = 8, leaf = 2)
the stem and leaf for 81 is (stem = 8, leaf = 1)

Stem Leaf
3
4
5
6
7
8 3521
9

4. Complete the stem and leaf for all scores.

Stem Leaf
3 23
4 26
5 6279
6 283
7 1643846
8 3521
9 37
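
The four steps can be sketched in Python as follows. The function name `stem_and_leaf` is made up for illustration. The original list of scores is not reproduced in these slides, so the input below is read back from the completed display stem by stem; that ordering is an assumption, although the leaves within each stem keep the order shown.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all digits but the last) to its leaves, in input order."""
    display = defaultdict(list)
    for score in scores:
        stem, leaf = divmod(score, 10)   # e.g. 83 -> stem 8, leaf 3
        display[stem].append(leaf)
    return dict(display)

# Scores read back from the completed display, stem by stem (assumed ordering).
scores = [32, 33, 42, 46, 56, 52, 57, 59, 62, 68, 63,
          71, 76, 74, 73, 78, 74, 76, 83, 85, 82, 81, 93, 97]

display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "".join(str(leaf) for leaf in display[stem]))
```

The number of leaves stored for a stem is the frequency of that stem, so `len(display[7])` gives 7 for the scores in the seventies.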


Exercise

• No 9, #67

• No 23, #69


SPSS
Descriptive Statistics
