Data Management 2

DATA
MANAGEMENT
LEARNING OUTCOMES
■ Use variety of statistical tools to process and manage numerical data;

■ Use the methods of linear regression and correlations to predict the value of a variable
given certain conditions; and
■ Advocate the use of statistical data in making important decisions.
GATHERING AND
ORGANIZING DATA
CATEGORICAL DATA
1. Nominal Scale consists of a finite set of possible values or categories that have unordered
scales.
Example:
Cause of death (Cancer, heart attack, accident, and others) Blood type, nationality, occupation,
civil status, and so on.
2. Ordinal Scale consists of a finite set of possible values or categories which have ordered
scales
Example:
Pain level (none, mild, moderate, severe)
Social status, attitude toward abortion, cancer stages, socio-economic status.
4
CONTINUOUS SCALE
3. Interval scale –is generally measured on a continuum and differences

between any two numbers on the scale that are of known size.
Examples: tons of garbage, number of arrest, income, age, and so on.
4. Ratio scale – like the interval scale is also measured on a meaningful
continuum. The distinction is that the ratio scale has a meaningful zero
point. Weight in pounds is a good example of ratio scale, another is height
and age.
5
VARIABLE
■ Refers to a property that can take on different values or

categories which can not be predicted with certainty.
Types of variables
■ RESPONSE VARIABLE – a variable which is affected by the
value of the other variable. This may be continuous, ordinal or
nominal.
-dependent variable or Y variables.
■ EXPLANATORY VARIABLE – a variable that is thought to
affect the values of the response variable.
- independent variable or x variables.
■ CONTROL VARIABLE – a variable that must be constant in
an experiment
8
classification OF VARIABLES
Quantitative Variables Qualitative Variables
Categories can be measured and Categories are simply used as
ordered according to quantity. labels to distinguish one group
They are intrinsically numeric. from another, rather than as
For example: basis for saying that one group is
- the heights of adult males, greater or less, higher or lower,
- the weights of preschool children, or better or worse than the other.
- the ages of patients seen in a For example:
dental clinic.
- The number of children in a - Cause of death, nationality, race,
family gender, severity of pain
- (either nominal or ordinal)
9
A discrete variable A continuous variable
refers to each element of a set of Refers to each element of a set of
possible values that is either possible values including all values
finite or countably infinite that in an interval of the real line that
can appear only as whole can be expressed with fractions or
number. digits after a decimal point.
For example:
For example: - Height,
- The number of daily admissions - weight,
to a general hospital, - skull circumference.
- The number of decayed, No matter how close together the
missing or filled teeth per child observed heights of two people, we
in an elementary school. can find another person whose
height falls somewhere in between.
10
DATA PRESENTATIONS
TEXTUAL PRESENTATION
■ Uses statements with numerals in order to describe the

data for the concrete information and in expository
form. It is to discuss the data and the information and
interpretation it carries
Example:
Tabular presentation
■ Uses statistical table to directly display the quantities or values collected
as data.
■ Parts of the table:
Graphical presentation
■ Illustrates data in a form of graphs aiding readers to understand the text
easily. A graph is the most attractive, effective and convincing way. There
are various types of graphs like bar graph, pie graph, line graph or
pictograph
Example:
FREQUENCY DISTRIBUTION TABLE
1. Arrange the data from lowest value to highest value (array).
2. Determine as to estimate number of classes k, k = 1 + 3 log n, where n is the number
of population
3. Determine the range, r = highest value – lowest value
4. Obtain the class size, c = range/k
5. Set the lowest value as the first lower limit and get the upper limit which is equal to the
first lower limit + class size – 1
6. Do the same process again until you reach the last class limit that includes the highest
value from the data.
Construct a frequency polygon for the
following data
11 19 11 15 16 10
16 16 15 17 10 27
21 11 13 21 10 16
11 19 24 12 22 13
19 13 18 20 21 11
19 15 11 25 29 23
16 23 10 17 11 27
16 24 12 21 13 12
26 15 11 14 10 12
11 15 18 12 20 13
Array:
10 11 13 16 19 22
10 11 13 16 19 23
10 11 13 16 19 23
10 11 13 16 19 24
10 12 14 16 20 24
11 12 15 16 20 25
11 12 15 17 21 26
11 12 15 17 21 27
11 12 15 18 21 27
11 13 15 18 21 29
2. no. of classes
K = 1 + 3 log 60
= 1 + 3 (1.7781512)
= 6. 334
3. Range = highest – lowest value
29 – 10 = 19
4. Class size = range/no. classes (k)
= 19/6 = 3.16 class size = 3

FREQUENCY DISTRIBUTION TABLE
CLASS FREQUENC CUMULATIV RELATIVE CUMULATIV TRUE
LIMITS Y E FREQUENC E RELATIVE CLASS
FREQUENC Y FREQUENC LIMITS
Y Y
9 – 11 14 14 0.23 0.23 8.5-11.5
12 – 14 11 25 0.18 0.41 11.5-14.5
15 – 17 13 38 0.22 0.63 14.5-17.5
18 – 20 8 46 0.13 0.76 17.5-20.5
21 – 23 7 53 0.12 0.88 20.5-23.5
24 – 26 4 57 0.07 0.95 23.5-26.5
27 – 29 3 60 0.05 1 26.5-29.5
n = 60 1
total
Construct a frequency distribution and
frequency polygon of the ages of 40
patients with hypertension.
54 35 47 58 20 36 22 41 23 16
67 45 80 40 78 48 48 48 37 27
68 62 74 28 70 39 51 60 42 34
42 54 63 25 63 80 60 75 38 48
MEASURES OF CENTRAL
TENDENCY
Measures of Central Tendency
A measure of central tendency is a measure which

indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
The Mean:
It is the average of the data.
27
The Population Mean:
N
=
 X which is usually unknown, then we use the
i 1
i
sample mean to estimate or approximate it.

The Sample Mean:
x
n
= x
i 1
i
n
Example:
Here is a random sample of size 10 of ages, where
 1 = 42,  2 = 28,  3 = 28,  4 = 61,  5 = 31,
6 = 23,  7 = 50,  8 = 34,  9 = 32,  10 = 37.
x = (42 + 28 + … + 37) / 10 = 36.6
28
Properties of the Mean:
■ Uniqueness. For a given set of data there is one and
only one mean.
■ Simplicity. It is easy to understand and to compute.
■ Affected by extreme values. Since all values
enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126. The
mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The mean =
118, a value that is not representative of the set of data as a whole.
29
MEAN for grouped data:
=
Where:
= classmark (midpoint)
= frequency
n = total number of frequency
The Median:
When ordering the data, it is the observation that divide the set of
observations into two equal parts such that half of the data are
before it and the other are after it.
* If n is odd, the median will be the middle of observations. It will be
the (n+1)/2 th ordered observation.
When n = 11, then the median is the 6th observation.
* If n is even, there are two middle observations. The median will be
the mean of these two middle observations. It will be the (n+1)/2
th
ordered observation.
When n = 12, then the median is the 6.5th observation, which is an
observation halfway between the 6th and 7th ordered observation.
31
Example:
For the same random sample, the ordered observations
will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th observation, i.e. =
(32+34)/2 = 33.
Properties of the Median:
■ Uniqueness. For a given set of data there is one and
only one median.
■ Simplicity. It is easy to calculate.
■ It is not affected by extreme values as is the
mean. 32
MEDIAN for grouped data
+
Where:
median
LL = lower class boundary containing the median class
<cf= less than cumulative frequency preceding the median class
frequency of the class interval containing the modal class
i = class interval
n= total number of frequency
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there are more than one mode.
Example:
For the same random sample, the value 28 is repeated two times, so it is the mode.
Properties of the Mode:

■ Sometimes, it is not unique.
■ It may be used for describing qualitative data.
34
MODE for grouped data
=+
Mode
= Lower class boundary
= frequency of the class interval containing the modal class
= frequency of the class before the modal class
= frequency of the class after the modal class
= class size
n = total number of frequency
Find the mean, median and mode of the
following data:
CLASS FREQUENCY (f) <cf
INTERVAL
18-26 8
27-35 13
36-44 21
45-53 6
54-62 12
MEASURES OF
VARIABILITY
MEASURES OF VARIATION OR
DISPERSION
Measures of Dispersion:
■ A measure of dispersion conveys information regarding the
amount of variability present in a set of data.
■ Note:
1. If all the values are the same
→ There is no dispersion .
2. If all the values are different
→ There is a dispersion:
3.If the values close to each other
→The amount of Dispersion small.
b) If the values are widely scattered
→ The Dispersion is greater.
38
Measures of Dispersion are :
1.Range (R).
2. Variance.
3. Standard deviation.
4.Coefficient of variation (C.V).
39
1.The Range (R):
■ Range =Largest value- Smallest value =x L  x S
■ Note:
Range concern only onto two values
■ Data:
43,66,61,64,65,38,59,57,57,50.
■ Find Range?
Range=66-38=28
40
2.The Variance:
■ It measures dispersion relative to the scatteredness of the values
about the mean.
2
a) Sample Variance (S ) :
n
■  ( x  x ) ,where x is sample mean

i
2
S2  i 1
n 1
■ Find Sample Variance of ages , x = 56

Solution:
S2= [(43-56) 2 +(66-43) 2+…..+(50-56) 2 ]/ 10 = 900/10 = 90
41
2
■ b)Population Variance 
( ):
where , is Population mean
N
 i
( x   ) 2
 2
 i 1
N
3.The Standard Deviation:
■ is the square root of variance= Variance
a) Sample Standard Deviation = S = S2
b) Population Standard Deviation = σ =  2
42
4.The Coefficient of Variation (C.V):
Is a measure use to compare the dispersion in two sets of data which is
independent of the unit of the measurement .
where S: Sample standard deviation.
X : Sample mean.
S
C.V  (100)
X
43
Example
■ Suppose two samples of human males yield the following
data:
Sampe1 Sample2
Age 25-year-olds 11year-olds
Mean weight 145 pound 80 pound
Standard deviation 10 pound 10 pound
44
■ We wish to know which is more variable.
Solution:
■ c.v (Sample1)= (10/145)*100= 6.9
■ c.v (Sample2)= (10/80)*100= 12.5
■ Then age of 11-years old(sample2) has more variation
45
Exercises
Find the range, variance and standard deviation of the following data:
1.) 4, 5, 7, 8, 4, 6, 10, 12, 5, 13,11, 15, 8, 7, 9

2.) 24, 45, 33, 23, 15, 34, 45, 56, 52, 38
46

Data Management 2

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Management 2

Uploaded by

Copyright:

Available Formats

DATA

■ Use variety of statistical tools to process and manage numerical data;

3. Interval scale –is generally measured on a continuum and differences

■ Refers to a property that can take on different values or

■ Uses statements with numerals in order to describe the

= 19/6 = 3.16 class size = 3

A measure of central tendency is a measure which

sample mean to estimate or approximate it.

x = (42 + 28 + … + 37) / 10 = 36.6

Properties of the Mode:

■  ( x  x ) ,where x is sample mean

■ Find Sample Variance of ages , x = 56

■ c.v (Sample2)= (10/80)*100= 12.5

■ Then age of 11-years old(sample2) has more variation

1.) 4, 5, 7, 8, 4, 6, 10, 12, 5, 13,11, 15, 8, 7, 9

You might also like