statistics that involves the collection, organization, summarization, and presentation of data.
DATA MANAGEMENT, according to Molly Galetto (2016), is the administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data users.
HYPOTHESIS TEST evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data and justified by the statistical test/s used.
There are two categories of data, namely:
Categorical data are measured on nominal scales (a finite set of possible values with no particular order, like gender, civil status, occupation) and ordinal scales.
Continuous scale has interval scale (like temperature, time, tons of garbage, income, number of arrests, age) and ratio scale (height, money, age, weight).
Continuous data are numerical data that can theoretically be measured in infinitely small units.
PERCENTAGES or RATIOS summarize two pieces of information, namely their constituent numerator and denominator values.
Simple ratios (0 to 1, i.e. the denominator is the maximum possible value that the numerator can take) can be treated as continuous data.
These data sets can be organized and presented using graphs:
Bar graphs
Histogram
Pictograph
Line graphs
Circle graphs / Pie charts
The data gathered should be accurately organized into grouped data called a frequency distribution and presented in a frequency polygon.
FREQUENCY POLYGON is the graph that uses lines to join the midpoints of the classes.
VARIABLE – a variable that can be measured and ordered according to quantity is quantitative, while a qualitative variable is simply used as a label to differentiate one group from another.
HYPOTHESIS – A statement or tentative theory which aims to explain facts about the real world.
A specific statement of prediction.
Describes in concrete terms what is expected by the researcher in the study.
Fisher, Jerzy Neyman, Karl Pearson and Pearson’s son, Egon Pearson.
A statistical method that is used in making statistical decisions using experimental data.
Used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.
Hypotheses are never proven through hypothesis testing; rather, they are rejected or not rejected through the use of statistical tests.
Types of Hypotheses
Null hypothesis (Ho)
• Defined as the hypothesis of no difference.
• Formulated with the purpose of being accepted or rejected.
Alternative hypothesis (H1 or Ha)
• A claim that disagrees with the null hypothesis (Kiernan, 2020).
• Sometimes referred to as the “research hypothesis” (La Trobe University, 2020).
• This indicates that a difference does exist between two or more variables.
• This represents what the researcher hopes to find to be true.
TEST STATISTIC – a quantity calculated from sample data and used in making the decision to “reject Ho” or “do not reject Ho”.
Types of Statistical Tests (Wills, n.d.)
• Parametric tests (Broto, 2008)
• Nonparametric tests
Types of Correct Decision (Febre, 1987)
• Type A: This occurs when the null hypothesis is true and we decide in its favor.
• Type B: This occurs when the null hypothesis is false and the decision is in opposition to the null hypothesis.
Types of Error (Febre, 1987)
• Type I: Reject the null hypothesis when in fact it is true; its probability is denoted by α.
• Type II: Accept (fail to reject) the null hypothesis when in fact it is false; its probability is denoted by β.
Location of the Critical Region
• One-tailed test. This is performed when the results are expected in one direction.
• Two-tailed test. The two-tailed test is performed when the results may be in either direction.
STATISTICAL TOOLS derived from mathematics are useful in processing and managing data in order to describe a phenomenon and predict values.
STATISTICS – A body of knowledge that deals with the collection, organization or presentation, analysis, and interpretation of data.
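The hypothesis-testing terms above (test statistic, critical region, one-tailed and two-tailed tests, Type I error) can be tied together in a small Python sketch. It assumes a one-sample z-test with a known population standard deviation, which is only one possible test and is not named in these notes; the function name `z_test`, its parameters, and all numbers are made up for illustration. Here `alpha` is the probability of a Type I error that we are willing to accept.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tails=2):
    """Return (z, critical value, decision) for a one-sample z-test.

    tails=2 places the critical region in both tails; tails=1 places it
    in the upper tail only (H1: mean > mu0). Illustrative sketch only.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))      # test statistic
    if tails == 2:
        critical = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        critical = NormalDist().inv_cdf(1 - alpha)
        reject = z > critical
    return z, critical, "reject Ho" if reject else "do not reject Ho"


# Made-up numbers: Ho says the population mean is 50.
z, crit, decision = z_test(sample_mean=52, mu0=50, sigma=6, n=36)
print(round(z, 2), round(crit, 2), decision)   # 2.0 1.96 reject Ho
```

With the same made-up data but a sample mean of 51.7, the two-tailed test does not reject Ho while the one-tailed test does, which shows why the location of the critical region has to be chosen before looking at the result.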
Steps in Conducting a Research:
Collection – Gathering of information or data.
Organization/Presentation – Summarizing data or information in textual, graphical, or tabular forms.
Analysis – Describing the data by using statistical methods and procedures.
Interpretation – Process of making conclusions based on analysed data.
History of Statistics:
Ancient Egypt – Prepared a list of all the heads of the families.
Ancient Judea – Census was taken on several occasions.
1st Roman Census – Census was repeated sixty-nine times; the most famous was written in the Bible.
Categories of Statistics:
Descriptive Statistics – statistical procedure concerned with describing the characteristics and properties of a group of persons, places or things.
- Examples: How poor or how rich a group of people in a community is
- How many are literate/illiterate
Inferential Statistics – statistical procedure that is used to draw inferences or information about the properties or characteristics of a large group of people, places or things on the basis of information obtained from a small portion of the large group.
- Example: Suppose we wanted to know the most favourite brand of deodorant soap of a very large community and we do not have enough time and money to interview all the residents of the barangay
Terminologies in Statistics:
Population – large collection of objects, persons, places, or things.
Sample – small portion or part of a population.
Parameter – any numerical or nominal characteristic of a population.
Statistic – value or measurement obtained from a given sample.
Data – facts, or a set of information or observations under study.
a. Qualitative data – cannot be subjected to arithmetic operations. Example: gender and nationality
b. Quantitative data – numerical in nature and obtained from counting or measurement. Example: test scores and height of students
a. Discrete variable – one that can assume a finite number of values (counting).
b. Continuous variable – one that can assume an infinite number of values within a specified interval (measuring).
c. Dependent variable – a variable that is affected or influenced by another variable.
d. Independent variable – a variable which affects or influences the dependent variable.
Constant – a property or characteristic of a population or sample which makes the members of the group similar to each other.
Scales of Measurement
Nominal Scale – Used when we want to say one object is different from another for identification purposes.
Ex. Gender, Nationality & Civil status
Ordinal Scale – Data are arranged in a specified order or rank.
Ex. Ranking of contestants in a beauty contest
Interval Scale – One data value is greater or less than another and the amount of difference can be specified.
Ex. Scores in an examination: Pearl got 48 in the English examination while Jeanne got 35.
Ratio Scale – Like the interval scale; the only difference is that the ratio level always starts from an absolute or true zero point.
Ex. Betty weighs 45 N, while her friend Carla weighs 35 N.
Measures of dispersion – enhance the information given by the measures of central tendency.
Dispersion is said to be relevant when there is variation or lack of uniformity in the size of items of a series.
Measures of Central Tendency – the numbers that describe what is average or typical of the distribution.
Also called measures of the first order.
Descriptive statistics – the branch of statistics that involves the collection, organization, summarization, presentation, and interpretation of data, while the branch that interprets and draws conclusions from the data is called inferential statistics.
Inferential statistics are used when data is viewed as a subclass of a specific population.
Measures of Central Tendency
Median
Mode
Weighted Mean
The MEAN, Mn, is also called the arithmetic mean or average.
A. Ungrouped Data
Mean (Mn) or $\bar{X} = \dfrac{\text{sum of the values}}{\text{total number of values}}$
B. Grouped Data
Md = XLB + ((N/2 – cfb) / fm) i
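As a rough illustration of the two formulas above, the following Python sketch computes the mean of ungrouped data and the Md value for grouped data. The reading of the symbols is an assumption, since the notes do not define them: Md is taken to be the median, XLB the lower boundary of the median class, N the total frequency, cfb the cumulative frequency before the median class, fm the frequency of the median class, and i the class size. The function names, the tuple layout, and the data are made up for the example.

```python
def mean_ungrouped(values):
    """Mean = sum of the values / total number of values."""
    return sum(values) / len(values)


def median_grouped(classes):
    """Md = XLB + ((N/2 - cfb) / fm) * i

    `classes` is a list of (lower_boundary, class_size, frequency) tuples
    in ascending order (a hypothetical layout chosen for this sketch).
    """
    total = sum(freq for _, _, freq in classes)    # N
    half = total / 2
    cum_before = 0                                 # cfb
    for lower, size, freq in classes:
        if cum_before + freq >= half:              # first class that reaches N/2
            return lower + (half - cum_before) / freq * size
        cum_before += freq
    raise ValueError("no classes given")


scores = [2, 4, 4, 5, 7]                                # made-up ungrouped data
table = [(9.5, 10, 3), (19.5, 10, 5), (29.5, 10, 2)]    # made-up grouped data

print(mean_ungrouped(scores))   # 4.4
print(median_grouped(table))    # 23.5
```

In the grouped example N = 10, so N/2 = 5 falls in the second class (cfb = 3, fm = 5), giving 19.5 + ((5 – 3) / 5) × 10 = 23.5.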
The MODE is the category or score with the largest frequency (or percentage) in the distribution.
The Mode of Grouped Data
Mo = XLB + (df1 / (df1 + df2)) i
WEIGHTED MEAN is an average computed by giving different weights to some of the individual values.
It represents the average of a given data set.
Calculated when data is given in a different way compared to an arithmetic mean or sample mean.
A. Ungrouped Data Weighted Mean
WMn = ∑fX / N
DISPERSION – the measure of the variation of items. – A.L. Bowley
DISPERSION – a measure of the extent to which the individual items vary. – L.R. Conor
DISPERSION – spread is the degree of the scatter or variation of the variables about a central value. – B.C. Brooks & W.F.L. Dicks
STATISTICIANS use summary measures to describe the amount of variability or spread in a set of data.
Range
• Easiest measure of variability to calculate.
• Used when the measure of Central Tendency is the Mode (nominal data, or when the most frequent score is of interest) or the Median (ordinal data or skewed data).
• Simply the difference between the highest and lowest scores.
RANGE = Highest score – Lowest score (e.g., 7 – 2 = 5)
Standard Deviation
• Measure of variability used with the Mean (normally distributed interval or ratio data).
• Indicates the amount that all scores differ or deviate from the Mean.
• The more the scores differ from the Mean, the higher the Standard Deviation (s).
• The sum of the deviations of scores from the mean is always 0.
$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
Where;
s = Standard Deviation
X = Scores
X̅ = Mean of scores
n = Number of scores
STANDARD DEVIATION, SD, is the most important and useful measure of variation.
It is the square root of the Variance, SD².
It is an average of the degree to which each score in the distribution deviates from the mean value.
MEAN DEVIATION is a measure of deviation that makes use of all the scores in a distribution.
This is more reliable than the range and quartile deviation.
Mean absolute deviation (MAD) of a data set is the average distance between each data value and the mean, and it is a way to describe variation in a data set.
Mean absolute deviation helps us get a sense of how "spread out" the values in a data set are.
MEASURES OF RELATIVE POSITION – other measures found to be useful in describing a distribution of observations.
These measures are vital in the interpretation of quantitative variables, where we are often interested in where a particular value falls in the distribution.
The most common measures of position are percentiles, quartiles, deciles, standard scores (aka z-scores), and the box-and-whisker plot or boxplot.
PERCENTILE is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
The percentiles are the score points that divide a distribution into 100 equal parts.
The 25th percentile (P25) is also called the first quartile; it separates the lowest 25% from the other 75%.
The 50th percentile is generally the median.
The 75th percentile is also called the third quartile.
Percentiles separate the data set into 100 equal groups.
Pn = XLB + i (nN – F) / f
Where;
Pn = the score corresponding to the nth percentile rank
XLB = the lower limit of the percentile class interval
f = the frequency of the percentile interval
F = the cumulative frequency of the interval before the percentile interval
i = the class size
n = the rank in decimals
N = the total frequency
QUARTILES are points that divide a distribution into four equal parts.
Consider that Q1 = P25; Q2 = P50; Q3 = P75; Q4 = P100.
The lower quartile is Q1 and the upper quartile is Q3.
Qn = XLB + i (nN/4 – F) / f
Where: Qn = the score corresponding to the nth quartile rank
XLB = the lower limit of the quartile class interval
f = the frequency of the quartile interval
F = the cumulative frequency of the interval before the quartile interval
i = the class size
4 = stands for the quartile division
N = the total frequency
DECILES are points that divide a distribution into ten equal parts. Each part is called a decile.
So, D1 = P10, D2 = P20, …, D10 = P100
Dn = XLB + i (nN/10 – F) / f
Where: Dn = the score corresponding to the nth decile rank
XLB = the lower limit of the decile class interval
f = the frequency of the decile interval
F = the cumulative frequency of the interval before the decile interval
i = the class size
10 = stands for the decile division
N = the total frequency
Z-SCORES (also known as standard scores) tell us how many standard deviations an observation is above or below the mean.
A positive z-score measures the number of standard deviations a score is above the mean, and a negative z-score measures the number of standard deviations a score is below the mean.
A z-score describes the position of a raw score in terms of its distance from the mean, when measured in standard deviation units.
Also called the standard score, denoted by the letter z.
BOX AND WHISKER PLOT (sometimes called a boxplot) is a graph that presents information from a five-number summary.
It is a way of summarizing a set of data measured on an interval scale.
A box-and-whisker plot shows the distribution of data.
A box-and-whisker diagram or a boxplot is a graph that provides the five-number summary for a finite data set in pictorial form.
PROBABILITY is the study of random events.
It is used to make predictions about future events.
PROBABILITY DISTRIBUTIONS are a fundamental concept in statistics, and we use probability in daily life to make decisions when we don't know for sure what the outcome will be.
Used in analysing games of chance, genetics, weather prediction, and a myriad of other everyday events.
Standard normal distribution is the distribution of a normal random variable with mean zero and standard deviation equal to 1.
Normal distribution is important because it is the distribution followed by many of the continuous (or measurable) characteristics (or variables) in our day-to-day life.
A distribution that occurs naturally in many situations.
In statistics, it is called the normal curve.
In the social sciences, it is called the bell curve.
In Physics, it is called the Gaussian distribution.
The total area under the curve is 1.
B. Pascal & P. de Fermat – Fathers of Probability.
Kinds of Probability
Theoretical probability deals with the nature of the experiment and events; it is what we expect to happen.
- Based on facts
- Decisions should be made using theoretical data
Experimental or empirical probability relies on the actual occurrence of the experiment and events; it is what actually happens when we try it out (based on experiment or data).
- Results based on experiment
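The difference between theoretical and experimental (empirical) probability described in these notes can be sketched in Python. This is only an illustration under stated assumptions: the experiment is a fair six-sided die, the rolls are simulated with the standard `random` module rather than observed, and the function names and the number of trials are made up for the example.

```python
import random
from fractions import Fraction


def theoretical_probability(favorable, possible):
    """What we expect: favorable outcomes / equally likely possible outcomes."""
    return Fraction(favorable, possible)


def empirical_probability(trials, seed=0):
    """What actually happens: share of simulated die rolls that show a six."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


expected = theoretical_probability(1, 6)    # 1/6 for a fair six-sided die
observed = empirical_probability(60_000)
print(expected, float(expected))
print(observed)                             # close to, but rarely exactly, 1/6
```

The theoretical value is fixed by the nature of the experiment, while the empirical value depends on the trials actually carried out and tends to move closer to the theoretical value as the number of trials grows.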