Uploaded by mayansun


1. What is Statistics?

2. Where does it come from?

3. Why should ‘we’ study it?

So, what is Statistics? It is the collection of data, and the conversion of this data into information.

What is data, and what is information? Data are recorded attributes of subjects, and there are two main types, Quantitative and Qualitative data. Information is data organized for specific purposes. In statistics data is organized for two main purposes: to describe the data (Descriptive), and to use the information to make decisions (Inferential). So statistical study can be broken into three parts: Statistical Data Collection, Descriptive Statistics, and Inferential Statistics. We shall later expand on each of them.

So, where does Statistics come from? On a bigger scale, rulers have always been interested in knowing the composition of their subjects, especially the number of young men fit to be sent out to kill, or be killed, for the glory of the ruler. Rulers have therefore been very fond of the Census; one of the best known, according to the Bible, led to Jesus Christ being born in a manger. On a smaller scale, statistics is a major part of the basis of most of our personal decisions, because it forms the basis of what we call experience: “I shop at No Frills, because prices there are cheaper (most times) than Loblaws’.” It is intertwined with chance or probability, so it comes from most of our daily routines.

So, why should ‘we’ study it? Variation (according to Derek Stephens of Sick Kids) is what makes the study of statistics necessary. For example, if all the people in a country were alike in every conceivable characteristic, then what satisfies one would satisfy all, and there would be no need for a census. We therefore study statistics primarily to make sense of the variation in a group. The scientific method, which underlies experimental science, technological research and development, and research in the social sciences, relies essentially on statistical methods, and for this alone it is worth studying statistics. Knowledge of statistics makes us ‘better’ consumers of advertisement and propaganda. Statistics is invaluable in planning, especially for large groups like the people in a country.

We will now learn a bit more about the three parts of Statistics: Statistical Data Collection,

Descriptive Statistics, and Inferential Statistics.

Statistics Text - http://www.cimt.plymouth.ac.uk/projects/mepres/allgcse/pbtxt.pdf

Statistical Data Collection:

http://www.cimt.plymouth.ac.uk/projects/mepres/alevel/stats_ch2.pdf

Types of Data: 1. Quantitative Data 2. Qualitative (Category) Data.

Quantitative Data are Numerical Data. There are two types; a) Discrete and b) Continuous.

Discrete Numerical Data: Possible numerical outcomes can be counted. Example, the number of students in a class is either 0, 1, 2, 3, …; that is, it is a non-negative integer. Shoe sizes, since there are a finite number of them (note: some are fractions). Bank balances of Canadians.

Continuous Numerical Data: Possible numerical outcomes cannot be counted. Example, the size of feet measured in any unit of length, e.g. centimetres. The weight of any three oranges measured in any unit of weight, e.g. grams.

Qualitative (Category) Data: Is a ‘measure’ that puts subjects in non-quantifiable groups. There are two types; a) Nominal: category by name, b) Rank: category by rank. Colour (e.g. of cars), place of birth, SIN that starts with 405, 905, etc. are examples of Nominal, and 1st, 2nd, 3rd year is an example of Rank.

Statistical Population: All the subjects under a statistical study. TYP students become the population if the study is restricted to TYP students, for example, finding the height of TYP students. A Census is a statistical study which includes each member of the adult population of a country.

If each member of a population is included in the study, then the problem of Data Collection is

reduced to ‘how ‘information’ is collected’ from the subjects. For most statistical studies, the size of

the population and/or the cost of the study and/or the nature of the study make it impractical to

collect data from each member of the population. In such situations, data is collected from a

representative group or sample chosen from the population. The data from this sample is then

assumed to be applicable to the whole population. So to the problem of ‘how ‘information’ is collected’ is added the problem of ‘how the sample was chosen’.

‘How ‘Information’ is collected’? Data is collected from subjects either passively by not interacting

with the subjects or actively by interacting with the subjects. Collecting data passively is mostly by

observation and counting. An example is data on the number of people passing through a given place in a given time interval. Active data collection involves measuring, usually with an instrument,

and most often by questionnaire. Problems with measuring with instruments are problems

associated with the instruments, which most times are resolved technically. Data collection by

questionnaire on the other hand has many ‘hidden’ problems from the way the questions are framed

to whether subjects respond verbally or in writing. So we reduce the problems we will look at in

data collection to whether the whole population is studied and if not how a sample is chosen

(Sampling), and if questionnaires are used how they are prepared and used.

Sampling: http://www.cimt.plymouth.ac.uk/projects/mepres/book9/bk9_18.pdf

The criterion for choosing a sample to represent a population for statistical study is that each member of the population must have an equal chance of being chosen. This is similar to Lotto 649, for which each of the 49 numbers has an equal chance of being one of the 6 numbers chosen for the jackpot. The best method to achieve this is Random Sampling. So to randomly choose a sample is to give each member of your population the same chance of being chosen. There are many methods of Random Sampling, and one of the most used is by Random Numbers, for example Lotto 649 numbers.
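As a rough illustration of random sampling with random numbers, the sketch below draws a sample in which every member has the same chance of being chosen. The function name `simple_random_sample` and the `seed` parameter are assumptions made for this example; they are not part of the notes.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct members, each equally likely to be chosen.

    The seed argument is hypothetical; it only makes an illustration repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(list(population), sample_size)

# A Lotto 649 style draw: 6 distinct numbers from 1 to 49.
lotto_draw = simple_random_sample(range(1, 50), 6)
print(sorted(lotto_draw))
```

The same function could be applied to a list of student names or ID numbers to choose a sample from a population.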

Questionnaire: http://www.cimt.plymouth.ac.uk/projects/mepres/book8/bk8_20.pdf

Check the above site for criteria a good questionnaire meets.

Descriptive Statistics: This involves Sorting and Grouping, Graphical Illustration, and Calculation of Summary Statistics.

Sorting and Grouping: This brings some sort of order to the data. If it is numerical data, it may be

arranged in increasing or decreasing order. It may also be sorted into Stem and Leaf. Another

method of sorting is to put the data in a Frequency Table. The two types of Frequency Tables are,

the Ungroup Frequency Table, and the Group Frequency Table. The Ungroup Frequency Table is a

score and the frequency of the score in the data. The Group Frequency Table is a group of scores

and the sum of the frequency of each of the scores in the group. (Frequency of a score is the

number of times the score is in the data.)

Graphical Illustration: Some of the illustrations are: 1) Line Graph; 2) Pie Charts; 3) Bar Charts;

4) Histogram; 5) Cumulative Frequency Diagram; etc.

Summary Statistics: These are values derived from the data to give a short description of the data.

These are the Measures of Central Tendency, and the Measures of Dispersion. Besides describing

the data, these measures are convenient for the comparison of two sets of data.

Measures of Central Tendency: These are the Mode, Median, and the Mean. They are also known

as the averages. The Mode is the score with the highest frequency. The Median is the ‘score’ with

the same number of scores greater than it as are less than it. The mean is the sum of all the scores in

the data, divided by the number of scores in the data.

We shall illustrate all these by examples using the following set of Data:

Test 1:

56 40 7 70 31 17 56 71 70 36 71 71

63 56 46 91 46 73 60 97 67 53 86 64

44 92 46 75 77 93 70 97 53 71 79 57

Test 2:

65 60 30 70 63 65 45 56 70 58 48 37

92 40 86 62 60 40 53 47 45 60 31 91

31 35 61 61 72 80 50 60 73 28 58 38

Colour of Cars in a Car Park:

Blue, White, Blue, Blue, Black, Blue, Black, Silver, Silver, Blue, Silver, Green, Black, White,

Silver, Silver, Black, Blue, Green, Blue, Green, Red, Black, Red.

variation in the scores of a data’. That is, if there is no variation, then there is no statistical problem. For example, if the people in a country do not change in any of their attributes, then there is no need for a census, or at most only one census for all time. So the data may be uniformly distributed (that is, the scores have the same frequency), symmetrically distributed (bell curve), skewed to the ‘right’, or skewed to the ‘left’ of a score. The mean and median are the same for a uniform distribution, whilst every score is a mode, so the mode cannot be used as ‘the’ measure of central tendency. For a symmetrical (bell curve) distribution, the mode, median and mean are the same value, so each could be used as a measure of central tendency. For a skewed distribution, the median or mode may be a better measure of central tendency. However, the mean is most often used as the measure of central tendency for quantitative data, because it is influenced by each of the scores, and also because statistical decision theory is more developed for the mean.

Stem and Leaf:

The Stem is the digit of the highest position of the numbers in the data, and the leaf is the remaining

digit(s) of the number. Example if the greatest number in the data is a two digit number, then the

stem is the digit in the tens position, and the leaf the digit in the unit position.

Diagram:

List all the possible stems, that is digits from 0 (sometimes) to 9 for the stem position, and for each

stem the leaf is attached (by listing them) on the right and arranged in order of magnitude.

Stem and Leaf Diagram for Test 1

First list the leaves of each stem as the scores appear in the data (left). Then arrange the leaves of each stem in order of magnitude (right). The diagram on the right is the Final Diagram.

Stem  Leaf                                Stem  Leaf
0     7                                   0     7
1     7                                   1     7
2                                         2
3     1, 6                                3     1, 6
4     0, 6, 6, 4, 6                       4     0, 4, 6, 6, 6
5     6, 6, 6, 3, 3, 7                    5     3, 3, 6, 6, 6, 7
6     3, 0, 7, 4                          6     0, 3, 4, 7
7     0, 1, 0, 1, 1, 3, 5, 7, 0, 1, 9     7     0, 0, 0, 1, 1, 1, 1, 3, 5, 7, 9
8     6                                   8     6
9     1, 7, 2, 3, 7                       9     1, 2, 3, 7, 7
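The construction of the diagram can be sketched in a few lines of code. This is only an illustration, assuming non-negative scores of at most two digits, so that the stem is the tens digit and the leaf is the units digit; the name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

def stem_and_leaf(scores):
    """Map each stem (tens digit) to its leaves (units digits) in order of magnitude."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        leaves[score // 10].append(score % 10)
    # List every stem between the smallest and largest, even if it has no leaf.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

diagram = stem_and_leaf(test1)
for stem, leaf in diagram.items():
    print(stem, "|", ", ".join(str(digit) for digit in leaf))
```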

Measures of Central Tendency or Averages:

Mode - From the Stem and Leaf diagram it is not difficult to see that the mode is 71. It occurs more

often than any other score.
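A small sketch of finding the mode by counting, using the Test 1 data. The helper name `modes` is an assumption for this example; it returns a list because a data set can have more than one score with the highest frequency.

```python
from collections import Counter

test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

def modes(scores):
    """Return every score that has the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == highest)

print(modes(test1))  # [71]
```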

Median – To find the median, the scores are arranged in order of magnitude, and the score in the middle position is the median. There are 36 scores or numbers in the data, so the middle position lies between the 18th and the 19th positions. From the stem and leaf diagram, counting from the least to the greatest score, the score in the 18th position is 64, and the score in the 19th position is 67. The median is the sum of these two numbers divided by 2. Median is 65.5.

For a data with N scores, the Median Position = $\frac{N+1}{2}$. Arrange data in order of magnitude. Then the Median is the score in this position. It is found by counting. If the position falls between two scores as above, then the mean of the two scores is the median.
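The median position rule can be written out as a short sketch. The function name `median` is chosen for this example; positions are counted from 1, as in the text.

```python
test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

def median(scores):
    """Median by position (N + 1) / 2 in the ordered data."""
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2      # 18.5 for the 36 scores of Test 1
    if position == int(position):
        return ordered[int(position) - 1]  # position falls exactly on a score
    below = ordered[int(position) - 1]     # score in the 18th position
    above = ordered[int(position)]         # score in the 19th position
    return (below + above) / 2

print(median(test1))  # 65.5
```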

Arithmetic Mean:

This is the sum of all the scores divided by the number of scores. There are 36 scores, and the sum

can be found from the original data or from the stem and leaf diagram.

Sum of the scores = 2252. Then the Arithmetic mean is 2252 divided by 36. Arithmetic mean = 62.56 ≈ 63 (nearest whole number).

Some Properties of Arithmetic Mean:

The product of the Arithmetic Mean and the Number of scores gives the sum of the scores. This is

useful in finding the required mark to make a certain grade.

Example 1: Akua’s mean mark for her first 3 tests is 78.

i. Akua wants her mean mark for the course to be at least 80. What should be her

minimum mark on the 4th (and last) test if she is to get 80?

ii. What is the highest possible mean mark Akua can get in the course?

Solution:

i. For Akua to get a mean mark of 80 for 4 tests, the sum of her marks for the 4 tests must be equal

to the product of 80 (mean of the tests) and 4 (the number of tests). This is 320. The sum of Akua’s

mark for the first three tests is the product of 78 (mean of the 3 tests) and 3 (number of tests). This

is 234. The difference between the sum of the 4 tests and the 3 tests is Akua’s mark for the 4 th test.

That is the difference of 320 and 234. This is 86. That is Akua must get 86 on the 4 th test for her

mean for the course to be 80.

ii. The highest possible mark Akua can get on the 4th test is 100. The sum of the first three tests is

234. So the sum of the 4 tests cannot be more than the sum of 234 and 100. This is 334. Akua’s

maximum mean mark is 334 divided by 4. This is 83.5. So the highest possible mean mark Akua

can get on the course is 83.5.
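Both parts of the example use the same property, sum = mean × number of scores. A minimal sketch follows; the function names are made up here, and the maximum mark of 100 is taken from the solution above.

```python
def required_mark(current_mean, tests_taken, target_mean):
    """Mark needed on the next test to reach target_mean over all tests."""
    required_total = target_mean * (tests_taken + 1)
    current_total = current_mean * tests_taken
    return required_total - current_total

def highest_possible_mean(current_mean, tests_taken, max_mark=100):
    """Mean over all tests if the next test gets the maximum mark."""
    return (current_mean * tests_taken + max_mark) / (tests_taken + 1)

print(required_mark(78, 3, 80))      # 86
print(highest_possible_mean(78, 3))  # 83.5
```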

The Range is the difference between the greatest number and the least number in the data.

Example: For Test 1, from the Stem and Leaf diagram, the greatest number is 97, and the least

number is 7. So the Range for Test 1, is the difference of 97 and 7. So the Range is 90.

Another Statistical Diagram – Frequency Table: i. Ungroup and ii. Group

How often a score (number) appears in a data is called the frequency of that score. From the Stem

and Leaf diagram for Test 1, the score 46 appears 3 times, so the frequency of 46 is 3.

Frequency Table is a table of the scores (numbers) in a data and how often they appear in the data.

The Stem and Leaf diagram of a data makes it ‘easier’ to make the Frequency Table of the data.

Quite often Frequency Table diagrams are made without first making the Stem and Leaf diagram.

For the Ungroup Frequency Table, the scores in the data are listed and then tallied by going through the data and putting a tally mark opposite a score whenever it appears. The number of tally marks is how often a score appears, and is put under frequency opposite the score.

Scores  Frequency    Scores  Frequency    Scores  Frequency
7       1            56      3            73      1
17      1            57      1            75      1
31      1            60      1            77      1
36      1            63      1            79      1
40      1            64      1            86      1
44      1            67      1            91      1
46      3            70      3            92      1
53      2            71      4            93      1
                                          97      2
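The tallying can also be done by a program. The sketch below uses Python's `collections.Counter` as the tally; the variable name `frequency_table` is an assumption for this example.

```python
from collections import Counter

test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

# Each distinct score paired with its frequency, in increasing order of score.
frequency_table = sorted(Counter(test1).items())

print("Scores  Frequency")
for score, frequency in frequency_table:
    print(f"{score:<7} {frequency}")
```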

Note: The sum of the frequencies in a frequency table is equal to the number of scores. The

frequency table lends itself to many uses in finding statistical measures, the measures of central

tendency, and measures of dispersion. Example to find the Arithmetic Mean from a frequency

table; (i) multiply each score by its corresponding frequency, (ii) find the sum of the products in (i),

(iii) divide the sum in (ii) by the sum of the frequencies.

Exercise: From the frequency table for Test 1, find the Mode, Median, Arithmetic Mean, and

Range. Compare your answers to the answers obtained by using the Stem and Leaf diagram.

Notation : x (or y, or z) represents a score or number in a data, and f the frequency of a score.

N is the number of scores, and is equal to the sum of frequencies of the scores.

Symbol : ∑ (sigma) is the symbol for summation or addition. Example, ∑x means sum all (of)

the x (or scores) . ∑ f means sum all (of) f (or the frequencies).

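Arithmetic Mean: $\bar{x} = \frac{\sum x}{N}$. (Sum of all scores, divided by number of scores.)

Arithmetic Mean from a frequency table: $\bar{x} = \frac{\sum xf}{\sum f}$. (Sum of all products of each score and corresponding frequency, divided by the sum of the frequencies, which is the number of scores.)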

Example: Find the Arithmetic Mean for Test 1, in the Ungroup Frequency Table .

Ungroup Frequency Table for Test 1

x    f   xf       x    f   xf       x    f   xf
7    1   7        56   3   168      73   1   73
17   1   17       57   1   57       75   1   75
31   1   31       60   1   60       77   1   77
36   1   36       63   1   63       79   1   79
40   1   40       64   1   64       86   1   86
44   1   44       67   1   67       91   1   91
46   3   138      70   3   210      92   1   92
53   2   106      71   4   284      93   1   93
                                    97   2   194

From the table, $\sum xf = 2252$ and $\sum f = 36$. By the formula, the arithmetic mean $\bar{x} = \frac{2252}{36} = 62.56 \approx 63$ (nearest whole number). So the mean mark for Test 1 is 63. (As obtained earlier.)

Note: $\sum xf$ is the sum of all numbers under the column 'xf'; similarly $\sum f$ is the sum of all the numbers under the column 'f'.
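The three steps for the mean from a frequency table can be checked with a short sketch. The dictionary `freq_table` (score mapped to frequency) is simply one convenient way to hold the Ungroup Frequency Table for Test 1.

```python
# Ungroup Frequency Table for Test 1: score -> frequency
freq_table = {7: 1, 17: 1, 31: 1, 36: 1, 40: 1, 44: 1, 46: 3, 53: 2,
              56: 3, 57: 1, 60: 1, 63: 1, 64: 1, 67: 1, 70: 3, 71: 4,
              73: 1, 75: 1, 77: 1, 79: 1, 86: 1, 91: 1, 92: 1, 93: 1, 97: 2}

sum_xf = sum(x * f for x, f in freq_table.items())  # steps (i) and (ii)
sum_f = sum(freq_table.values())                    # number of scores
mean = sum_xf / sum_f                               # step (iii)

print(sum_xf, sum_f, round(mean, 2))  # 2252 36 62.56
```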

Exercise: (i) Organize Test 2 in an Ungroup Frequency Table. (ii) Find the Arithmetic Mean of Test

2 (using the above procedure).

Measures of Dispersion: Variance and Standard Deviation

Notation: $\sigma^2$ (or $s^2$) represents Variance. $\sigma$ (or $s$) represents Standard Deviation.

Note: The square of the Standard Deviation is the Variance; or Standard Deviation = $\sqrt{\text{Variance}}$.

Variance

For Data not in a frequency Table:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$$

Sum of the squares of the difference of each score and the mean, divided by the number of scores.

It simplifies to:

$$\sigma^2 = \frac{\sum x^2}{N} - (\bar{x})^2$$

For Data in a frequency Table:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

It simplifies to:

$$\sigma^2 = \frac{\sum x^2 f}{\sum f} - (\bar{x})^2$$

Standard Deviation

For Data not in a frequency Table:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

Square Root of the sum of the squares of the difference of each score and the mean, divided by the number of scores.

It simplifies to:

$$\sigma = \sqrt{\frac{\sum x^2}{N} - (\bar{x})^2}$$

For Data in a frequency Table:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

It simplifies to:

$$\sigma = \sqrt{\frac{\sum x^2 f}{\sum f} - (\bar{x})^2}$$
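The notes state the simplified form without the intermediate steps. One way to see it, for data not in a frequency table and using only $\bar{x} = \frac{\sum x}{N}$, is the following short derivation (the frequency table case follows in the same way with $\sum f$ in place of $N$):

```latex
\begin{align*}
\sigma^2 &= \frac{\sum (x - \bar{x})^2}{N}
          = \frac{\sum x^2 - 2\bar{x}\sum x + N\bar{x}^2}{N} \\
         &= \frac{\sum x^2}{N} - 2\bar{x}\cdot\bar{x} + \bar{x}^2
          = \frac{\sum x^2}{N} - (\bar{x})^2
\end{align*}
```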

Comments: The Standard Deviation and the Variance, as the formulas show, give a measure of the spread of the scores of a data about the Arithmetic Mean (a measure of central tendency). Unlike the Range, which depends only on the two extreme scores, the lowest and highest, the Standard Deviation and the Variance depend on all the scores of a data. They are the most widely used measures of dispersion, especially in Inferential Statistics.

Data with a large number of scores are most often given in a frequency table, or first organized in a

frequency table before any further analysis. So as an example, the Variance and Standard Deviation

will be calculated for Test 1 from the frequency table of Test 1.

Example: Find the Variance and Standard Deviation for Test 1 (in the Ungroup Frequency Table).

Solution:

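$\sigma^2 = \frac{\sum x^2 f}{\sum f} - (\bar{x})^2$ is Variance and $\sigma = \sqrt{\frac{\sum x^2 f}{\sum f} - (\bar{x})^2}$ is Standard Deviation, where $\bar{x} = \frac{\sum xf}{\sum f}$.

So for each score 'x', and corresponding frequency 'f', the following must be found: $xf$ and $x^2 f$.

Notation: x is score; f is frequency; xf is the product of a score and its frequency, and $x^2 f$ is the product of the square of a score and its frequency.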

x    f   xf    x²f        x    f   xf    x²f
7    1   7     49         64   1   64    4096
17   1   17    289        67   1   67    4489
31   1   31    961        70   3   210   14700
36   1   36    1296       71   4   284   20164
40   1   40    1600       73   1   73    5329
44   1   44    1936       75   1   75    5625
46   3   138   6348       77   1   77    5929
53   2   106   5618       79   1   79    6241
56   3   168   9408       86   1   86    7396
57   1   57    3249       91   1   91    8281
60   1   60    3600       92   1   92    8464
63   1   63    3969       93   1   93    8649
                          97   2   194   18818

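From the table: $\sum f = 36$; $\sum xf = 2252$; $\sum x^2 f = 156\,504$.

So Variance $\sigma^2 = \frac{156\,504}{36} - \left(\frac{2252}{36}\right)^2 = 434.14$ (2 decimal places) ∴ $\sigma^2 \approx 434$ (nearest whole number).

Standard Deviation $\sigma = \sqrt{\frac{156\,504}{36} - \left(\frac{2252}{36}\right)^2} = 20.84$ (2 decimal places) ∴ $\sigma \approx 21$ (nearest whole number).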
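Because the arithmetic here is long, it helps to have an independent check. The sketch below recomputes the three sums and applies the simplified frequency table formulas; `freq_table` is the same score-to-frequency mapping as the Ungroup Frequency Table for Test 1.

```python
import math

# Ungroup Frequency Table for Test 1: score -> frequency
freq_table = {7: 1, 17: 1, 31: 1, 36: 1, 40: 1, 44: 1, 46: 3, 53: 2,
              56: 3, 57: 1, 60: 1, 63: 1, 64: 1, 67: 1, 70: 3, 71: 4,
              73: 1, 75: 1, 77: 1, 79: 1, 86: 1, 91: 1, 92: 1, 93: 1, 97: 2}

sum_f = sum(freq_table.values())
sum_xf = sum(x * f for x, f in freq_table.items())
sum_x2f = sum(x * x * f for x, f in freq_table.items())

mean = sum_xf / sum_f
variance = sum_x2f / sum_f - mean ** 2   # simplified formula for a frequency table
std_dev = math.sqrt(variance)

print(sum_f, sum_xf, sum_x2f)
print(round(variance, 2), round(std_dev, 2))
```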

Comment: The standard deviation acts as a unit of the scale of measurement of the scores in the

sense of the number of standard deviations of a score from the arithmetic mean.

Example: For Test 1, find the percentage of the number of scores that are within;

i. One standard deviation of the mean?

ii. Two standard deviations of the mean?

iii. 95, is how many standard deviations from the mean?

iv. Find the number of standard deviations, 7 is from the mean?

Solution:

The mean $\bar{x} = \frac{2252}{36} = 62.56$ and the Standard Deviation $\sigma = \sqrt{\frac{156\,504}{36} - \left(\frac{2252}{36}\right)^2} = 20.84$.

i. A score within one standard deviation of the mean is greater than or equal to $\bar{x} - \sigma$ and less than or equal to $\bar{x} + \sigma$. So: $\bar{x} - \sigma \le$ a score within one $\sigma$ of the mean $\le \bar{x} + \sigma$.

Substituting, $\bar{x} - \sigma = 62.56 - 20.84 = 41.72$ and $\bar{x} + \sigma = 62.56 + 20.84 = 83.40$.

From the Frequency Table or Stem and Leaf, the number of scores greater than or equal to 41.72 and less than or equal to 83.40 is 25. This is the number of scores from 44 to 79. The number of all the scores is 36. Therefore the percentage of the number of the scores that lie within one standard deviation of the mean $= \frac{25}{36} \times 100\% = 69.44\%$.

ii. Substituting, $\bar{x} - 2\sigma = 62.56 - 2 \times 20.84 = 20.88$ and $\bar{x} + 2\sigma = 62.56 + 2 \times 20.84 = 104.24$.

From the Frequency Table or Stem and Leaf, the number of scores greater than or equal to 20.88 and less than or equal to 104.24 is 34. This is the number of scores from 31 to 97. The number of all the scores is 36. Therefore the percentage of the number of the scores that lie within two standard deviations of the mean $= \frac{34}{36} \times 100\% = 94.44\%$.

iii. For any number 'x': $z = \frac{x - \bar{x}}{\sigma}$ is the number of standard deviations of 'x' from the mean. So for x = 95: $z = \frac{95 - 62.56}{20.84} = 1.56$. ∴ 95 is 1.56 standard deviations from the mean.

iv. From (iii), for x = 7: $z = \frac{7 - 62.56}{20.84} = -2.67$. ∴ the number of standard deviations 7 is from the mean is −2.67. The negative sign shows that 7 is below the mean.
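The counting in parts (i) and (ii) and the z value in parts (iii) and (iv) can be sketched as two small functions. The names `percent_within` and `z_score` are chosen for this example; the mean and standard deviation are recomputed from the Test 1 data rather than typed in.

```python
import math

test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

n = len(test1)
mean = sum(test1) / n
std_dev = math.sqrt(sum(x * x for x in test1) / n - mean ** 2)

def percent_within(scores, mean, std_dev, k):
    """Percentage of scores x with mean - k*sigma <= x <= mean + k*sigma."""
    low, high = mean - k * std_dev, mean + k * std_dev
    inside = sum(1 for x in scores if low <= x <= high)
    return 100 * inside / len(scores)

def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean (negative if below)."""
    return (x - mean) / std_dev

print(round(percent_within(test1, mean, std_dev, 1), 2))  # 69.44
print(round(percent_within(test1, mean, std_dev, 2), 2))  # 94.44
print(round(z_score(95, mean, std_dev), 2), round(z_score(7, mean, std_dev), 2))
```

The same two functions can be reused for the Test 2 exercise by replacing the data list.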

Exercise: For Test 2, find the percentage of the number of scores that are within;

i. One standard deviation of the mean? ii. Two standard deviations of the mean? iii. 37, is how

many standard deviations from the mean? iv. Find the number of σ s, 91 is from the mean?

Group Frequency Table: Is a table of groups of scores and sum of the frequencies of the

individual scores in the group. A group of scores is called a class. Each score belongs to a class,

and can belong to only one class. So classes do not overlap. The other aspects of a class are: (i)

Class Limits (Lower and Upper), (ii) Class Size, (iii) Class Boundary (Lower and Upper), (iv)

Class Interval, and (v) Class Mark. These would be discussed at the appropriate points. Whilst

there is only one Ungroup Frequency Table for a given data, there is more than one Group

Frequency Table for the same data. The distinguishing features are the Class Size, which is the

number of scores in a class, and the Lowest or Greatest Class Limit.

Group Frequency Table 1 for Test 1

Scores are the Test Marks of Students; Frequency is the Number of Students.

Scores     Frequency      Scores     Frequency
7 - 11     1              57 - 61    2
12 - 16    0              62 - 66    2
17 - 21    1              67 - 71    8
22 - 26    0              72 - 76    2
27 - 31    1              77 - 81    2
32 - 36    1              82 - 86    1
37 - 41    1              87 - 91    1
42 - 46    4              92 - 96    2
47 - 51    0              97 - 101   2
52 - 56    5

Comment: Each class has the same size, 5 (different scores). The lower limit of the fourth class is

22, and the upper limit of the first class is 11. In general the class sizes need not be equal.

Group Frequency Table 2 for Test 1

Scores Frequency Scores Frequency

4 - 13 1 54 - 63 6

14 - 23 1 64 - 73 10

24 - 33 1 74 - 83 3

34 - 43 2 84 - 93 4

44 - 53 6 94 - 103 2

Comment: Each class size is 10. The lowest limit is a score of 4 and the greatest limit 103. None of

these is a score of the data.
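Both group tables follow from two choices, the lowest class limit and the class size. A minimal sketch of the grouping follows, assuming whole-number scores and equal class sizes; the function name `group_frequency_table` is made up for this example.

```python
test1 = [56, 40, 7, 70, 31, 17, 56, 71, 70, 36, 71, 71,
         63, 56, 46, 91, 46, 73, 60, 97, 67, 53, 86, 64,
         44, 92, 46, 75, 77, 93, 70, 97, 53, 71, 79, 57]

def group_frequency_table(scores, lowest_limit, class_size):
    """Return [((lower limit, upper limit), frequency), ...] for equal-size classes."""
    table = []
    lower = lowest_limit
    while lower <= max(scores):
        upper = lower + class_size - 1  # class holds class_size different scores
        frequency = sum(1 for x in scores if lower <= x <= upper)
        table.append(((lower, upper), frequency))
        lower += class_size
    return table

table1 = group_frequency_table(test1, lowest_limit=7, class_size=5)
table2 = group_frequency_table(test1, lowest_limit=4, class_size=10)
for (lower, upper), frequency in table2:
    print(f"{lower} - {upper}: {frequency}")
```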

Large data is often given in a Group Frequency Table. This summarizes the data at the expense of

details. The larger the class size the shorter the summary and the more detail that is lost. It is

therefore necessary to balance brevity of summary against too much detail. This is comparable to

the assignment of grades to course marks. By the rule of thumb or by convention, the number of

classes must not be less than 5, and it must not be more than 25.

Mean, Variance, and Standard Deviation from Group Frequency Table: Each class is represented

by a Class Mark which then is given the frequency of the class. This ‘reduces’ the Group Frequency

Table to an Ungroup Frequency Table with the Class Marks as the scores, with frequencies of the

corresponding Classes.

Class Mark, x of a Class: Is the mean of the Lower and Upper Class Limits of the class. That is,

(Lower Class Limit + Upper Class Limit) ÷2.

Example: Find the Mean, Variance and Standard Deviation for Test 1 Group Frequency Table 2.

Solution: The following table is in reference to the formulas to be used;

Test Scores   # of students: f   Class Mark: x   xf      x²f = x(xf)
4 - 13        1                  8.5             8.5     72.25
14 - 23       1                  18.5            18.5    342.25
24 - 33       1                  28.5            28.5    812.25
34 - 43       2                  38.5            77      2964.5
44 - 53       6                  48.5            291     14113.5
54 - 63       6                  58.5            351     20533.5
64 - 73       10                 68.5            685     46922.5
74 - 83       3                  78.5            235.5   18486.75
84 - 93       4                  88.5            354     31329
94 - 103      2                  98.5            197     19404.5
Sum Σ         36                                 2246    154981

Mean $\bar{x} = \frac{2246}{36} = 62.388\ldots$ ∴ Mean $\bar{x} = 62$ (nearest whole number)

Variance $\sigma^2 = \frac{154981}{36} - \left(\frac{2246}{36}\right)^2 = 412.654321$ ∴ Variance $\sigma^2 = 413$ (nearest whole number)

and Standard Deviation $\sigma = 20.31$ (2 decimal places) and $\sigma = 20$ (nearest whole number)
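The grouped calculation can be reproduced with a short sketch in which each class is replaced by its Class Mark. The list `grouped` holds the class limits and frequencies of Group Frequency Table 2; the variable names are assumptions for this example.

```python
import math

# Group Frequency Table 2 for Test 1: ((lower limit, upper limit), frequency)
grouped = [((4, 13), 1), ((14, 23), 1), ((24, 33), 1), ((34, 43), 2),
           ((44, 53), 6), ((54, 63), 6), ((64, 73), 10), ((74, 83), 3),
           ((84, 93), 4), ((94, 103), 2)]

# Class Mark = (Lower Class Limit + Upper Class Limit) / 2
marks = [((lower + upper) / 2, f) for (lower, upper), f in grouped]

sum_f = sum(f for _, f in marks)
sum_xf = sum(x * f for x, f in marks)
sum_x2f = sum(x * x * f for x, f in marks)

mean = sum_xf / sum_f
variance = sum_x2f / sum_f - mean ** 2
std_dev = math.sqrt(variance)

print(sum_f, sum_xf, sum_x2f)                                # 36 2246.0 154981.0
print(round(mean, 2), round(variance, 2), round(std_dev, 2)) # 62.39 412.65 20.31
```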

Comment: Compare these values to the corresponding values for the Ungroup Frequency Table.

Frequency Table for Category Data: Is the ‘non-numerical’ attributes of the Category Data with

their corresponding frequencies.

Example: The Frequency Table of the following Category Data of Colour of Cars in a Car Park;

Blue, White, Blue, Blue, Black, Blue, Black, Silver, Silver, Blue, Silver, Green, Black, White,

Silver, Silver, Black, Blue, Green, Blue, Green, Red, Black, Red.

Frequency Table of Colour of Cars in Car Park

Score Frequency

Colour of Car Number of Cars

Blue 7

White 2

Black 5

Silver 5

Green 3

Red 2

Mode is Blue. That is there are more Blue cars than any other Colour of cars.
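For category data the frequency table and the mode come from counting alone. A small sketch with the car colour data, again using `collections.Counter` as the tally:

```python
from collections import Counter

colours = ["Blue", "White", "Blue", "Blue", "Black", "Blue", "Black", "Silver",
           "Silver", "Blue", "Silver", "Green", "Black", "White", "Silver",
           "Silver", "Black", "Blue", "Green", "Blue", "Green", "Red", "Black", "Red"]

counts = Counter(colours)                       # colour -> number of cars
mode_colour, mode_frequency = counts.most_common(1)[0]

for colour, frequency in counts.items():
    print(colour, frequency)
print("Mode:", mode_colour, mode_frequency)     # Mode: Blue 7
```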

Comment:

Numerical Data were organized by (i) Stem and Leaf, and (ii) Frequency Table, both ungroup (single score and frequency) and group (class of scores and frequency). From these were calculated (i) the Measures of Central Tendency or the Averages: Mode, Median, and Mean, and (ii) some of the Measures of Dispersion: Range, Variance, and (from the Variance) the Standard Deviation.

Category Data was organized in a Frequency Table of category attribute and frequency, from which the Mode, a Measure of Central Tendency or Average, was found. The mode is the only measure of central tendency that makes sense for category data. No measure of dispersion was calculated, because none of those above makes sense for category data.

Statistical Graph is the pictorial representation of the relation between statistical variables, example,

scores and frequencies. The Pie, Line, and Bar graphs and the Histogram are examples of pictorial

representation of the relation between scores and frequencies. These graphs are about the most

common statistical graphs.

Numerical data can be represented by any one of the four graphs. Categorical data can be

represented by the Pie, and the Bar graphs, but cannot be represented by the Line graph or the

Histogram. Which graph to use, depends on the type of data, and the purpose of the graph.

Pie Graph or Chart:

The pie chart is a circle divided into sectors, to represent the proportion of the frequency of a

‘Score’, ‘Class’ or ‘Category’ to the number of scores (sum of frequencies of the scores).

Example:

Colour of Cars   Number of Cars
Blue             7
White            2
Black            5
Silver           5
Green            3
Red              2

[Figure: Pie Graph for the Colour of Cars in a Car Park]

i. Find the number of scores or sum of frequencies. For the example: N = 24.

ii. Find the ratio, as a fraction, of the frequency of a category to the sum of frequencies. Find the product of the fraction and 360°. At the centre of a circle measure an angle equal to the product. Draw the sector of the circle subtended by this angle. This sector represents the frequency of the category. For example, for the category Blue, the fraction is 7/24. The product of the fraction and 360° is 105°. The sector subtended by 105° represents the proportion or percent of Blue cars to the number of cars, or the number of Blue cars.

iii. Repeat for each category. Indicate which sector is for what by a legend or by writing in the sectors.

Or use computer software, for example Excel, SPSS, MATLAB, etc.

Comment: Pie charts are good for comparing the relative frequencies of the categories with each other and with the sum.
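The sector angles for all the categories can be worked out in one pass. The sketch below only computes the angles; drawing the circle is left to a protractor or to charting software. The names `car_counts` and `angles` are chosen for this example.

```python
car_counts = {"Blue": 7, "White": 2, "Black": 5, "Silver": 5, "Green": 3, "Red": 2}

total = sum(car_counts.values())  # N = 24

# Sector angle = (frequency / sum of frequencies) x 360 degrees
angles = {colour: 360 * frequency / total for colour, frequency in car_counts.items()}

for colour, angle in angles.items():
    print(f"{colour}: {angle} degrees")
# Blue: 105.0, White: 30.0, Black: 75.0, Silver: 75.0, Green: 45.0, Red: 30.0
```

The angles add up to 360°, which is a useful check before drawing.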

Bar Graph or Chart:

The Bar chart is a graph of rectangular bars (or blocks). The ‘width’ of a bar represents a ‘Score’,

‘Class’ or ‘Category’, and the Area of the bar is equal to the frequency of the score. The ‘length’ is

therefore equal to the frequency of the score divided by the width. If the widths are all equal then

the lengths are taken to be the corresponding frequencies.

Bar graphs are mainly used for pictorial comparison of the frequencies of scores. This includes how

the scores are distributed around the measures of central tendency.

Example: Draw the Bar graph for the data below:

Frequency Table of Colour of Cars in Car Park

Colour of Car Number of Cars

Blue 7

White 2

Black 5

Silver 5

Green 3

Red 2

[Figure: Bar graph of Number of Cars against Colour of Cars]

The line graph has the score as the independent variable and the frequency as the value of a

function. The bar graph has ‘category’ on the horizontal axis as the base of a rectangle (same

width) and the frequency as the height. If it is group numerical data, the class defined by the lower

and upper class limit is the ‘category’ and then, the base of the rectangle is proportional to the class

size and the height is the frequency.

Histogram:

The Histogram is the graph formed by rectangles representing the classes of a group frequency table

of numerical data. The area of the rectangle for a class on a histogram is equal to the frequency of

the class. The lower and upper class boundary is the base, and the height of the rectangle is the

frequency of the class divided by the class interval (width). For equal class intervals the height of a

rectangle is ‘frequency’ of the class. There are no gaps between the rectangles of a histogram.

(There can be gaps between the rectangles of a Bar graph.)

One use of the histogram is to find the ratio or fraction of the scores between numbers of standard

deviations from the mean, (for example, one standard deviation from the mean) and the total

number of scores. This is the ratio or fraction of the area of the rectangles in the region (of interest)

to the total area of the histogram. These ratios interpreted as the probability of a score in the region

are used in statistical decision-making.
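A rough sketch of how the rectangles of a histogram could be computed from a group frequency table. The notes do not define Class Boundaries at this point, so the sketch assumes the common convention that a boundary lies halfway between adjacent class limits (for example 3.5 and 13.5 for the class 4 - 13); that convention and the name `histogram_bars` are assumptions made for this example.

```python
# Group Frequency Table 2 for Test 1: ((lower limit, upper limit), frequency)
classes = [((4, 13), 1), ((14, 23), 1), ((24, 33), 1), ((34, 43), 2),
           ((44, 53), 6), ((54, 63), 6), ((64, 73), 10), ((74, 83), 3),
           ((84, 93), 4), ((94, 103), 2)]

def histogram_bars(classes):
    """Return (lower boundary, upper boundary, height) for each class."""
    bars = []
    for (lower, upper), frequency in classes:
        # Assumed convention: boundaries are half a unit beyond the limits.
        lower_boundary, upper_boundary = lower - 0.5, upper + 0.5
        width = upper_boundary - lower_boundary
        bars.append((lower_boundary, upper_boundary, frequency / width))
    return bars

bars = histogram_bars(classes)
total_area = sum((high - low) * height for low, high, height in bars)
print(bars[0], round(total_area, 6))
```

With this choice the rectangles touch (the upper boundary of one class is the lower boundary of the next), and the total area equals the number of scores.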

Test 1 Scores

Class       # of students: f
4 - 13      1
14 - 23     1
24 - 33     1
34 - 43     2
44 - 53     6
54 - 63     6
64 - 73     10
74 - 83     3
84 - 93     4
94 - 103    2


Statistical Graphs: References

http://www.cimt.plymouth.ac.uk/projects/mepres/book9/bk9_8.pdf

http://www.cimt.plymouth.ac.uk/projects/mepres/book8/bk8_5.pdf

http://www.cimt.plymouth.ac.uk/projects/mepres/allgcse/pbtxt.pdf
