
LESSON 4

DATA MANAGEMENT

[Figure: cluster chart showing Series 1 to 3 across Categories 1 to 4, vertical axis from 0 to 6]


DATA MANAGEMENT
A. Basic Statistical Concept
B. Measures of Central Tendency
C. Measures of Dispersion
What is Data Management?

• The science of collecting, organizing, presenting, analyzing, and interpreting numerical data.
• It also refers to the mere tabulation of numeric information in reports of stock market transactions, or to the body of techniques used in processing or analyzing data.
BASIC STATISTICAL CONCEPTS
What is statistics?
It is a collection of methods for planning experiments,
obtaining data and then analyzing, interpreting and drawing
conclusions based on the data.

There are two types of Statistics:

Descriptive Statistics
- the branch of statistics that involves the collection, organization, summarization, and presentation of data.
Inferential Statistics
- the branch that interprets the data and draws conclusions from it.
BASIC TERMS IN STATISTICS

• Data are the values that the variables can assume.
• A variable is a characteristic that is observable or measurable in every unit of the universe.
• Population is the set of all possible values of a variable.
• Sample is a subgroup of a population.
CLASSIFICATION OF VARIABLES

• Qualitative Variables
- words or codes that represent a class or category.
• Quantitative Variables
- numbers that represent an amount or a count.
• Continuous Variables
- variables that can be measured and can assume all values between any two specific values, such as 0.5 or 1.2.
LEVELS OF MEASUREMENT
• Nominal Level
- this is characterized by data that consist of names, labels, or categories only.
• Ordinal Level
- this involves data that can be arranged in some order, but differences between data values either cannot be determined or are meaningless.
• Interval Level
- this is like the ordinal level, with the additional property that we can determine meaningful amounts of difference between the data.
• Ratio Level
- this is the interval level modified to include an inherent zero starting point.
SAMPLING AND SAMPLING TECHNIQUES
Sampling refers to the process of obtaining samples from the population.
Sampling may be categorized as either:
• Probability Sampling or
• Non-Probability Sampling

Probability sampling, also referred to as random sampling, is the method of sampling in which every member of the population has an equal chance of being selected for the sample.

In Non-Probability sampling, not every member of the population has an equal chance of being selected. It can rely on the subjective judgement of the researcher.
Probability Sampling Techniques
1. Simple Random Sampling. A probability sampling technique wherein all possible subsets consisting of n elements selected from the N elements of the population have the same chance of selection.

[Figure: a sample drawn from the population]
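A minimal sketch of simple random sampling in Python, assuming the population can be listed in full. The function name `simple_random_sample`, the population of 20 numbered elements, and the seed are illustrative choices, not part of the lesson.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)            # the seed only makes the draw repeatable
    return rng.sample(list(population), n)

population = range(1, 21)                # hypothetical population with N = 20
print(simple_random_sample(population, n=5, seed=1))
```

`random.sample` draws without replacement, which matches the idea of selecting a subset of n elements from the N elements of the population.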
2. Systematic Sampling. This is a probability sampling technique wherein the first element is selected at random and the other elements in the sample are selected systematically by taking every kth element from the random start, where k is the sampling interval.

[Figure: population of elements numbered 1 to 20; the sample takes every 5th element (sampling interval k = 5)]
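The selection rule can be sketched in Python as follows, assuming the population is an ordered list and the random start is drawn from the first k elements. The function name `systematic_sample` and the seed are illustrative.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)             # random start: index 0 .. k-1
    return population[start::k]

population = list(range(1, 21))          # elements 1 to 20, as in the figure
print(systematic_sample(population, k=5, seed=0))
```

With 20 elements and k = 5, every possible start yields a sample of 4 elements spaced 5 apart.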
3. Stratified Random Sampling. A probability sampling method where we partition the population into non-overlapping strata or groups, and then a proportional sample is chosen from each stratum. The actual sample is the sum of the samples derived from each stratum.
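A sketch of proportional allocation in Python, assuming each stratum is given as a list of its members. The function name `stratified_sample`, the three year-level strata, and their sizes are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes about n * (stratum size / N)."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # rounding may shift the overall total slightly
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of 100 students split into three year levels.
strata = {
    "first_year": list(range(0, 50)),
    "second_year": list(range(50, 80)),
    "third_year": list(range(80, 100)),
}
chosen = stratified_sample(strata, n=10, seed=0)
print({name: len(members) for name, members in chosen.items()})
# stratum sample sizes are 5, 3 and 2, in proportion to 50 : 30 : 20
```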
4. Cluster Sampling. A probability sampling technique wherein we partition the population into non-overlapping groups or clusters consisting of one or more elements, and then select a sample of clusters. Every member of each selected cluster is included in the sample.

[Figure: population and sample]
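A sketch of cluster sampling in Python, assuming the clusters are already defined. The function name `cluster_sample` and the four classroom sections are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters at random; every member of a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)     # sample cluster names, not individuals
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population grouped into four classroom sections.
clusters = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
print(cluster_sample(clusters, m=2, seed=0))
```

The randomness applies to the choice of clusters only, so the sample size depends on which clusters are drawn.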
Non-Probability Sampling Techniques
1. Accidental Sampling. The sample is chosen by the researcher by obtaining members of the population in a convenient, often haphazard way.
2. Quota Sampling. A specified number of persons of certain types is included in the sample. The researcher is aware of categories within the population and draws samples from each category. The size of each categorical sample is proportional to the share of the population that belongs to that category.

[Figure: population and sample]
3. Purposive Sampling. The researcher uses his or her judgment to choose the members that he or she believes are representative of the population.
4. Snowball Sampling. This technique is also called referral sampling. A primary set of samples is chosen based on the criteria set by the researcher. Information on where to find a succeeding set of samples meeting the same criteria is gathered from this primary set in order to expand the number of samples.
Methods of Data Collection

1. Survey Method. The survey is a method of collecting data on the variable of interest by asking people questions. This may be done by interview or by using questionnaires.
2. Observation. Observation is a method of obtaining data or information by using our primary senses.
3. Experiment. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
Presentation of Data
After data have been collected, the researcher can present them using the following methods.

1. Textual Form. Data are presented in paragraphs of text. The text highlights the important figures or results that the researcher wishes to focus on.
In the Statistics class of 40 students, 3 obtained the perfect score of
50. Sixteen students got a score of 40 and above, while only 3 got 19
and below. Generally, the students performed well in the test with
23 or 70% getting a passing score of 38 and above.
2. Tabular Form. Data appear in a systematic manner in rows and columns.
The following is an example of a Simple or One-Way Table.

Table 1
Frequency Distribution of the Students Enrolled for the Last 6 Years

Year     Frequency
2012     13,450
2013     13,200
2014     15,389
2015     16,790
2016     18,900
2017     19,500
Total    97,229

A Two-Way Table, in contrast, classifies the data by two variables at once.
3. Graphical Form. Data or relationships among variables can be presented in visual form through graphs or diagrams. In this way, the reader can easily see what the figure means or what trend the data portray.
Types of Statistical Charts

(a) Bar Graph (Vertical Bar/Column Chart) is suited to comparing amounts of a variable of interest collected over time.
(b) Histogram is similar to the bar graph but the base of the
rectangle has a length exactly equal to the class width of the
corresponding interval. Also, there are no spaces between
rectangles.
(c) Pictograph is similar to the bar chart, but instead of bars we use pictures or symbols to represent a value or an amount.
(d) Pie Chart is a circular graph partitioned into several sections, each depicting a relative percentage of the total distribution.
(e) Line Graph is a graph used to visualize data that changes
continuously over time.
(f) Statistical Map is used to show data in geographical
areas.
Sample Size Considerations
The sample size is typically denoted by n and is always a positive integer. No single sample size fits every study; the appropriate size varies across research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population. To determine the sample size, we can apply one of the following methods:
1. Slovin’s Formula
2. Minimum Sample Size for Estimating a Population Mean
3. Minimum Sample Size for Estimating a Population
Proportion
Slovin’s Formula

Slovin’s formula is used to estimate the size n of a random sample from a given population, given the population size and a margin of error E. We can compute

$$ n = \frac{N}{1 + N E^2} $$

where N is the population size.


Example

A researcher plans to conduct a survey about the food preferences of BS Stat students. If the population of students is 1000, use Slovin’s formula to find the sample size if the margin of error is 5%.
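One way to work this example is the short Python sketch below. It assumes the usual form of Slovin's formula, n = N / (1 + N E²), and rounds the result up to a whole respondent; the function name `slovin` is illustrative.

```python
import math

def slovin(N, E):
    """Slovin's formula n = N / (1 + N * E**2), rounded up to a whole respondent."""
    return math.ceil(N / (1 + N * E ** 2))

print(slovin(1000, 0.05))   # 1000 / 3.5 = 285.71..., so n = 286
```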
Minimum Sample Size for Estimating a
Population Mean
The estimated minimum sample size n needed to estimate a population mean μ to within E units at 100(1 − α)% confidence is

$$ n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 $$

where σ is the known population standard deviation, E is the margin of error, and z_{α/2} is a value that can be obtained from the z-table.
Example

Suppose we want to know the average age of STEM students. We would like to be 99% confident about our results. From a previous study, we know that the standard deviation for the population is 1.3. How many students should be chosen for a survey if the margin of error is 0.2?
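A sketch of this computation in Python, assuming the formula n = (z_{α/2} σ / E)² and rounding up to the next whole student. Instead of a z-table, it takes the critical value from `statistics.NormalDist`; the function name `sample_size_for_mean` is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence):
    """Minimum n to estimate a mean to within E at the given confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 2.576 for 99%
    return math.ceil((z * sigma / E) ** 2)

print(sample_size_for_mean(sigma=1.3, E=0.2, confidence=0.99))   # 281
```

A z-table value of 2.575 or 2.58 gives the same answer after rounding up.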
Minimum Sample Size for Estimating a
Population Proportion
The estimated minimum sample size n needed to estimate a population proportion p to within E at 100(1 − α)% confidence is

$$ n = \frac{z_{\alpha/2}^{2}\,\hat{p}\,(1 - \hat{p})}{E^{2}} $$

This is also called the Cochran Formula.

The dilemma here is that the formula for estimating how large a sample to take contains the number p̂, which we know only after we have taken the sample. There are two ways out of this dilemma.
• First, typically the researcher will have some idea as to the value of the population proportion p, hence of what the sample proportion p̂ is likely to be. For example, if last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for p̂ in the formula.
• The second approach to resolving the dilemma is simply to replace p̂ in the formula by 0.5. This is because if p̂ is large then 1 − p̂ is small, and vice versa, which limits their product to a maximum value of 0.25, which occurs when p̂ = 0.5. This is called the most conservative estimate, since it gives the largest possible estimate of n.
Example:

Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. Here, p̂ = 0.5. We want 95% confidence and at least 5% precision.
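The breakfast example can be worked with the Python sketch below, assuming the Cochran form n = z²_{α/2} p̂(1 − p̂) / E² and rounding up. The function name `sample_size_for_proportion` is illustrative, and the critical value comes from `statistics.NormalDist` rather than a z-table.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_hat, E, confidence):
    """Minimum n to estimate a proportion to within E at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / E ** 2)

print(sample_size_for_proportion(p_hat=0.5, E=0.05, confidence=0.95))   # 385
```

Using p̂ = 0.5 gives the largest n for a given E and confidence level, which is why it is the conservative choice.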
Finite Population Correction for
Proportions
If the population is small, then the sample size can be reduced slightly. This is because a given sample size provides proportionately more information for a small population than for a large population. The formula is

$$ n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} $$

where n_0 is Cochran’s sample size recommendation, N is the population size, and n is the new adjusted size.
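A sketch of the correction in Python, assuming the standard form n = n₀ / (1 + (n₀ − 1)/N) and rounding up. The function name and the population of 1,000 households are hypothetical; n₀ = 385 is the Cochran size for 95% confidence, 5% precision, and p̂ = 0.5.

```python
import math

def finite_population_correction(n0, N):
    """Adjusted size n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# Hypothetical town of N = 1000 households.
print(finite_population_correction(n0=385, N=1000))   # 385 / 1.384 = 278.18..., so n = 279
```

For a very large N the correction term vanishes and n stays at n₀.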
MEASURES OF CENTRAL TENDENCY

• Measures of central tendency are methods that can be used to determine information regarding the average, ranking, and category of any data distribution. The Mean, Median, and Mode are the three tools for obtaining measures of central tendency. Only by knowing and using the appropriate tool can the most accurate estimate of centrality be achieved.
IMPORTANCE OF MEASURES OF CENTRAL TENDENCY

• To find a representative value.
• To make data more concise.
• To make comparisons.
• To help in further statistical analysis.
• To estimate "normal" values of a dataset.
What is mean in statistics?

The Mean is obtained by computing the sum, or total, of the entire set of observations or values, then dividing this sum by the number of observations or values.
The mean formula is given as the average of all the observations. It is expressed as Mean = {Sum of Observations} ÷ {Total Number of Observations}.
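As a quick illustration, the formula translates directly into Python. The function name `mean` is illustrative, and the data values 4, 3, 5, 8, 1, 9 are the ones used in the mean deviation example later in this lesson.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

print(mean([4, 3, 5, 8, 1, 9]))   # 30 / 6 = 5.0
```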
What is the median in statistics?
• The median is one of the important measures of central tendency. It is the middle value, or midpoint, of the given data when the data are arranged in ascending order, from smallest to largest.
• With an odd number of scores, list the values in order, and the median is the middle score in the list.
• With an even number of scores, list the values in order, and the median is halfway between the middle two scores.
How to calculate the Median
• Arrange the data set in ascending order (from the lowest to the largest value).
• If the dataset contains an odd number of values, the median is the central value that splits the dataset into halves. For an odd data series, the equation is

$$ \text{Median} = \left( \frac{N + 1}{2} \right)^{\text{th}} \text{ term} $$

• If the dataset contains an even number of values, find the two central values that split the dataset into halves. Then, calculate the mean of the two central values. For an even data series, the equation is

$$ \text{Median} = \frac{\left( \frac{N}{2} \right)^{\text{th}} \text{ term} + \left( \frac{N}{2} + 1 \right)^{\text{th}} \text{ term}}{2} $$

• In both equations, N is the number of observations, values, or scores in the data.
HOW TO FIND THE MEDIAN

To find the median: Arrange the data points from smallest to largest. If the number of data points is odd, the median is the middle data point in the list. If the number of data points is even, the median is the average of the two middle data points in the list.
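The procedure can be sketched in Python as follows. The function name `median` and the two small data sets are illustrative; a non-empty list is assumed.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                       # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

print(median([4, 3, 5, 8, 1]))      # ordered: 1 3 4 5 8   -> 4
print(median([4, 3, 5, 8, 1, 9]))   # ordered: 1 3 4 5 8 9 -> (4 + 5) / 2 = 4.5
```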
What is the Mode in statistics?
• The Mode is the least used measure of central tendency. The Mode of a data set is the number, observation, or value that occurs most frequently in the set. To find the Mode, put the numbers in order from least to greatest and count how many times each number occurs. The number that occurs the most is the Mode.
• In a set of data, there may be no Mode if no value appears more often than any other. There may also be two Modes (bimodal), three Modes (trimodal), or four or more Modes (multimodal).
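A sketch of finding the mode in Python, assuming a non-empty data set and treating "no Mode" as the case where every distinct value occurs equally often. The function name `modes` and the sample lists are illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value appears more than any other."""
    counts = Counter(values)
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []                                  # all values tie: no mode
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 5, 7]))   # [3]      one mode
print(modes([1, 1, 2, 2, 3]))   # [1, 2]   bimodal
print(modes([1, 2, 3]))         # []       no mode
```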
WHAT IS DISPERSION IN STATISTICS?
• Dispersion is the state of being dispersed or spread. Statistical dispersion means the extent to which numerical data is likely to vary about an average value. In other words, dispersion helps us understand the distribution of the data.
Measures of Dispersion

• In statistics, the measures of dispersion help to interpret the variability of data, i.e., to know how homogeneous or heterogeneous the data is. In simple terms, they show how squeezed or scattered the variable is.
THE RANGE
The simplest measure of dispersion is the range. It is the difference between the largest and the smallest values in a data set.
Range = Largest value – Smallest value
For example, if the largest value is 13 and the smallest value is 1:
Range = 13 – 1 = 12
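In Python the range is a one-line computation. The function name `value_range` and the data set are hypothetical; only its largest value 13 and smallest value 1 come from the example.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [1, 4, 6, 9, 13]     # hypothetical data with smallest value 1 and largest value 13
print(value_range(data))    # 13 - 1 = 12
```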
Mean Deviation
A defect of the range is that it is based on only two values, the highest
and the lowest; it does not take into consideration all of the values. The
mean deviation does. It measures the mean amount by which the
values in a population, or sample, vary from their mean.
In terms of a definition: Mean Deviation is the arithmetic mean of the
absolute values of the deviations from the arithmetic mean.

The mean deviation has two advantages.

First, it uses all values in the computation, while the range uses only the highest and lowest values.
Second, it is easy to understand: it is the average amount by which the values deviate from the mean.
Examples of the mean deviation (ungrouped data)

Determine the mean deviation for the data values 4, 3, 5, 8, 1, 9.

First, we will find the mean for the given data:

Mean, μ = (4 + 3 + 5 + 8 + 1 + 9) / 6

μ = 30 (total value) / 6 (number of given data) ⇒ μ = 5

Therefore, the mean value is 5.

• Now, subtract the mean from each data value, and ignore the minus sign:
|4 – 5| = 1
|3 – 5| = 2
|5 – 5| = 0
|8 – 5| = 3
|1 – 5| = 4
|9 – 5| = 4
Now, the obtained data set is 1, 2, 0, 3, 4, 4. Let us find the mean value for the obtained data set. Therefore, the mean deviation is (1 + 2 + 0 + 3 + 4 + 4) / 6 = 14 / 6 ≈ 2.33. Hence, the mean deviation for 4, 3, 5, 8, 1, 9 is about 2.33.
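The worked example can be checked with a short Python sketch. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

print(round(mean_deviation([4, 3, 5, 8, 1, 9]), 2))   # 14 / 6 = 2.33
```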
THE STANDARD DEVIATION
The standard deviation of a set of numerical data makes use of the amount by which each individual data value deviates from the mean. These deviations, represented by (x − x̄), are positive when the data value x is greater than the mean x̄ and negative when x is less than the mean x̄. The sum of all the deviations (x − x̄) is 0 for all sets of data.
Procedures for Computing a Standard Deviation

The procedure to calculate the standard deviation is given below:

Step 1: Compute the mean for the given data set.
Step 2: Subtract the mean from each observation and calculate the square in each instance.
Step 3: Find the mean of those squared deviations.
Step 4: Finally, take the square root of the obtained mean to get the standard deviation.
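The four steps can be sketched in Python as shown below. Because Step 3 takes the plain mean of the squared deviations, this sketch computes the population standard deviation. The function name `population_std` is illustrative, and the data are reused from the mean deviation example.

```python
import math

def population_std(values):
    mean = sum(values) / len(values)               # Step 1: mean of the data
    squared = [(x - mean) ** 2 for x in values]    # Step 2: squared deviations
    mean_sq = sum(squared) / len(squared)          # Step 3: mean of the squared deviations
    return math.sqrt(mean_sq)                      # Step 4: square root

data = [4, 3, 5, 8, 1, 9]
print(round(population_std(data), 2))              # sqrt(46 / 6) = 2.77
```

For a sample rather than a whole population, Step 3 would divide by n − 1 instead, as in the variance steps of this lesson.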
Variance
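A statistic known as the variance is also used as a measure of dispersion. The variance for a given set of data is the square of the standard deviation of the data.

$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Where:
σ² = variance of the population
s² = variance of the sample
x = numerical data value
μ = population mean
x̄ = sample mean
N = number of data values in the population
n = number of data values in the sample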
How Do I Calculate Variance?
Follow these steps to compute variance:
• Calculate the mean of the data.
• Find each data point's difference from the mean value.
• Square each of these values.
• Add up all of the squared values.
• Divide this sum of squares by n – 1 (for a sample) or N (for the population).
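The steps above can be sketched in Python as follows. The function name `variance` and its `sample` flag are illustrative, and the data are reused from the mean deviation example.

```python
def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)

data = [4, 3, 5, 8, 1, 9]
print(variance(data, sample=True))               # 46 / 5 = 9.2
print(round(variance(data, sample=False), 2))    # 46 / 6 = 7.67
```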
