
• DESCRIPTIVE STATISTICS is the branch of statistics that involves the collection, organization, summarization, and presentation of data.
• According to Molly Galetto (2016), DATA MANAGEMENT is the administrative process by which the required data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data users.
• A HYPOTHESIS TEST evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data and justified by the statistical test/s used.
• There are two categories of data, namely:
  ◦ Categorical data use nominal scales (a finite set of possible values with no particular order, like gender, civil status, occupation) and ordinal scales.
  ◦ Continuous scales include the interval scale (like temperature, time, tons of garbage, income, number of arrests, age) and the ratio scale (height, money, age, weight).
  ◦ Continuous data are numerical data that can theoretically be measured in infinitely small units.
• PERCENTAGES or RATIOS summarize two pieces of information, namely their constituent numerator and denominator values.
  ◦ Simple ratios (0 to 1, i.e. the denominator is the maximum possible value that the numerator can take) can be treated as continuous data.
• These data sets can be organized and presented using graphs:
  ◦ Bar graphs
  ◦ Histogram
  ◦ Pictograph
  ◦ Line graphs
  ◦ Circle graphs / Pie charts
• The data gathered should be accurately organized into grouped data called a frequency distribution and presented in a frequency polygon.
• A FREQUENCY POLYGON is a graph that uses lines to join the midpoints of the classes.
• A VARIABLE that can be measured and ordered according to quantity is quantitative, while a qualitative variable is simply used as a label to differentiate one group from another.
• HYPOTHESIS – A statement or tentative theory which aims to explain facts about the real world.
  ◦ A specific statement of prediction.
  ◦ Describes in concrete terms what is expected by the researcher in the study.
• HYPOTHESIS TESTING – It was introduced by Ronald Fisher, Jerzy Neyman, Karl Pearson, and Pearson’s son, Egon Pearson.
  ◦ A statistical method that is used in making statistical decisions using experimental data.
  ◦ Used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.
  ◦ Hypotheses are never proven through hypothesis testing; rather, they are accepted or rejected through the use of statistical tests.
• Types of Hypotheses
  ◦ Null hypothesis (Ho)
    ▪ Defined as the hypothesis of no difference.
    ▪ Formulated with the purpose of being accepted or rejected.
  ◦ Alternative hypothesis (H1 or Ha)
    ▪ A claim that disagrees with the null hypothesis (Kiernan, 2020).
    ▪ Sometimes referred to as the “research hypothesis” (La Trobe University, 2020).
    ▪ This indicates that a difference does exist between two or more variables.
    ▪ This represents what the researcher hopes to find to be true.
• TEST STATISTIC – A quantity calculated from sample data and used in making the decision to “reject Ho” or “do not reject Ho”.
• Types of Statistical Tests (Wills, n.d.)
  ◦ Parametric tests (Broto, 2008)
  ◦ Nonparametric tests
• Types of Correct Decision (Febre, 1987)
  ◦ Type A: This occurs when the null hypothesis is true and we decide in its favor.
  ◦ Type B: This occurs when the null hypothesis is false and the decision is in opposition to the null hypothesis.
• Types of Error (Febre, 1987)
  ◦ Type I: Rejecting the null hypothesis when in fact it is true; its probability is denoted by α.
  ◦ Type II: Accepting the null hypothesis when in fact it is false; its probability is denoted by β.
• Location of the Critical Region
  ◦ One-tailed test. This is performed when the results are in one direction.
  ◦ Two-tailed test. This is performed when the results are in either direction.
• STATISTICAL TOOLS derived from mathematics are useful in processing and managing data in order to describe a phenomenon and predict values.
• STATISTICS – A body of knowledge that deals with the collection, organization or presentation, analysis, and interpretation of data.
  ◦ Steps in Conducting a Research:
    ▪ Collection – Gathering of information or data.
    ▪ Organization/Presentation – Summarizing data or information in textual, graphical, or tabular forms.
    ▪ Analysis – Describing the data by using statistical methods and procedures.
    ▪ Interpretation – Process of making conclusions based on analysed data.
  ◦ History of Statistics:
    ▪ Ancient Egypt – Prepared a list of all the heads of the families.
    ▪ Ancient Judea – A census was taken on several occasions.
    ▪ 1st Roman Census – The census was repeated sixty-nine times; the most famous is recorded in the Bible.
  ◦ Categories of Statistics:
    ▪ Descriptive Statistics – statistical procedure concerned with describing the characteristics and properties of a group of persons, places, or things.
      - Examples: How poor or how rich a group of people in a community is
      - How many are literate/illiterate
    ▪ Inferential Statistics – statistical procedure that is used to draw inferences or information about the properties or characteristics of a large group of people, places, or things on the basis of information obtained from a small portion of that large group.
      - Example: Suppose we wanted to know the most favourite brand of deodorant soap in a very large community, and we do not have enough time and money to interview all the residents of the barangay.
  ◦ Terminologies in Statistics:
    ▪ Population – large collection of objects, persons, places, or things.
    ▪ Sample – small portion or part of a population.
    ▪ Parameter – any numerical or nominal characteristic of a population.
    ▪ Statistic – value or measurement obtained from a given sample.
    ▪ Data – facts or a set of information or observations under study.
      a. Qualitative data – cannot be subjected to arithmetic operations. Example: gender and nationality
      b. Quantitative data – numerical in nature and obtained from counting or measurement. Example: test scores and height of students
      a. Discrete variable – one that can assume a finite number of values (counting).
      b. Continuous variable – one that can assume an infinite number of values within a specified interval (measuring).
      c. Dependent variable – a variable that is affected or influenced by another variable.
      d. Independent variable – a variable which affects or influences the dependent variable.
    ▪ Constant – a property or characteristic of a population or sample which makes the members of the group similar to each other.
  ◦ Scales of Measurement
    ▪ Nominal Scale – Used when we want to say one object is different from another for identification purposes.
      Ex. Gender, nationality, and civil status
    ▪ Ordinal Scale – Data are arranged in a specified order or rank.
      Ex. Ranking of contestants in a beauty contest
    ▪ Interval Scale – One value is greater or less than another, and the amount of difference can be specified.
      Ex. Scores in an examination: Pearl got 48 in the English examination while Jeanne got 35.
    ▪ Ratio Scale – Like the interval scale; the only difference is that the ratio level always starts from an absolute or true zero point.
      Ex. Betty weighs 45 N, while her friend Carla weighs 35 N.
• Measures of dispersion – enhance the information given by the measures of central tendency.
  ◦ Dispersion is said to be relevant when there is variation or lack of uniformity in the size of the items of a series.
• Measures of Central Tendency – the numbers that describe what is average or typical of the distribution.
  ◦ Also called measures of the first order.
• Descriptive statistics is the branch of statistics that involves the collection, organization, summarization, presentation, and interpretation of data, while the branch that interprets and draws conclusions from the data is called inferential statistics.
  ◦ Inferential statistics are used when data is viewed as a subclass of a specific population.
• Measures of Central Tendency
  ◦ Mean
  ◦ Median
  ◦ Mode
  ◦ Weighted Mean
• The MEAN, Mn, is also called the arithmetic mean or average.
  ◦ A. Ungrouped Data
    Mean (Mn) or X̄ = sum of the values / total number of values
  ◦ B. Grouped Data (median, Md)
    Md = XLB + [(N/2 – cfb) / fm] × i
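As a minimal sketch of the two formulas above, the Python below computes the mean of ungrouped data and the median of grouped data. The function names, the (lower boundary, frequency) table layout, the equal class width, and the sample numbers are all assumptions made for illustration; none of them come from these notes.

```python
def mean_ungrouped(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def median_grouped(classes, class_size):
    """Median of grouped data: Md = XLB + ((N/2 - cfb) / fm) * i.

    classes: list of (lower_boundary, frequency) pairs in ascending order.
    class_size: the class width i, assumed equal for every class.
    """
    total = sum(freq for _, freq in classes)      # N, the total frequency
    if total == 0:
        raise ValueError("the frequency table is empty")
    half = total / 2                              # N/2
    cumulative = 0                                # cfb, built up class by class
    for lower_boundary, freq in classes:
        if cumulative + freq >= half:             # first class reaching N/2 is the median class
            return lower_boundary + (half - cumulative) / freq * class_size
        cumulative += freq


# Hypothetical data, for illustration only.
scores = [2, 3, 5, 7, 8]
table = [(9.5, 3), (19.5, 5), (29.5, 8), (39.5, 4)]  # (lower boundary, frequency)

print(mean_ungrouped(scores))     # 25 / 5 = 5.0
print(median_grouped(table, 10))  # 29.5 + (10 - 8) / 8 * 10 = 32.0
```

In the table, N = 20, so N/2 = 10 falls in the third class (cfb = 8, fm = 8), which is why the lower boundary 29.5 is used.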
• The MODE is the category or score with the largest frequency (or percentage) in the distribution.
  ◦ The Mode of Grouped Data
    Mo = XLB + [df1 / (df1 + df2)] × i
• WEIGHTED MEAN is an average computed by giving different weights to some of the individual values.
  ◦ It represents the average of a given set of data.
  ◦ Calculated when data is given in a different way compared to an arithmetic mean or sample mean.
    A. Ungrouped Data Weighted Mean
    WMn = ∑fX / N
• DISPERSION – the measure of the variation of items. – A. L. Bowley
• DISPERSION – a measure of the extent to which the individual items vary. – L. R. Connor
• DISPERSION or spread is the degree of the scatter or variation of the variables about a central value. – B. C. Brooks & W. F. L. Dick
• STATISTICIANS use summary measures to describe the amount of variability or spread in a set of data.
  ◦ Range
    ▪ Easiest measure of variability to calculate.
    ▪ Used when the measure of central tendency is the mode (nominal data, or when the most frequent score is of interest) or the median (ordinal data or skewed data).
    ▪ Simply the difference between the highest and lowest scores.
      RANGE = Highest score – Lowest score (for example, 7 – 2 = 5)
  ◦ Standard Deviation
    ▪ Measure of variability used with the mean (normally distributed interval or ratio data).
    ▪ Indicates the amount that all scores differ or deviate from the mean.
    ▪ The more the scores differ from the mean, the higher the standard deviation (s).
    ▪ The sum of the deviations of scores from the mean is always 0.
      s = √( ∑(X – X̄)² / (N – 1) )
      Where:
      s = standard deviation
      X = scores
      X̄ = mean of scores
      N = number of scores
• STANDARD DEVIATION, SD, is the most important and useful measure of variation.
  ◦ It is the square root of the variance, SD².
  ◦ It is an average of the degree to which each score in the distribution deviates from the mean value.
• MEAN DEVIATION is a measure of deviation that makes use of all the scores in a distribution.
  ◦ This is more reliable than the range and quartile deviation.
• Mean absolute deviation (MAD) of a data set is the average distance between each data value and the mean, and it is a way to describe variation in a data set.
  ◦ Mean absolute deviation helps us get a sense of how "spread out" the values in a data set are.
• MEASURES OF RELATIVE POSITION – another set of measures found to be useful in describing a distribution of observations.
  ◦ These measures are vital in the interpretation of quantitative variables, where we are often interested in where a particular value falls in the distribution.
• The most common measures of position are percentiles, quartiles, deciles, and standard scores (also known as z-scores), together with the box-and-whisker plot or boxplot.
• PERCENTILE is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
  ◦ The percentiles are the score points that divide a distribution into 100 equal parts.
  ◦ The 25th percentile (P25) is also called the first quartile; it separates the lowest 25% from the other 75%.
  ◦ The 50th percentile is generally the median.
  ◦ The 75th percentile is also called the third quartile.
  ◦ Percentiles separate the data set into 100 equal groups.
  ◦ Pn = XLB + i(nN – F) / f
  ◦ Where:
    Pn = the score corresponding to the nth percentile rank
    XLB = the lower limit of the percentile class interval
    f = the frequency of the percentile interval
    F = the cumulative frequency of the interval before the percentile interval
    i = the class size
    n = the rank in decimals
    N = the total frequency
• QUARTILES are points that divide a distribution into four equal parts.
  ◦ Consider that Q1 = P25; Q2 = P50; Q3 = P75; Q4 = P100.
  ◦ The lower quartile is Q1 and the upper quartile is Q3.
  ◦ Qn = XLB + i(nN/4 – F) / f
  ◦ Where: Qn = the score corresponding to the nth quartile rank (n = 1, 2, 3)
    XLB = the lower limit of the quartile class interval
    f = the frequency of the quartile interval
    F = the cumulative frequency of the interval before the quartile interval
    i = the class size
    4 = stands for the quartile division
    N = the total frequency
• DECILES are points that divide a distribution into ten equal parts. Each part is called a decile. So, D1 = P10, D2 = P20, …, D10 = P100.
  ◦ Dn = XLB + i(nN/10 – F) / f
  ◦ Where: Dn = the score corresponding to the nth decile rank
    XLB = the lower limit of the decile class interval
    f = the frequency of the decile interval
    F = the cumulative frequency of the interval before the decile interval
    i = the class size
    10 = stands for the decile division
    N = the total frequency
• Z-SCORES (also known as standard scores) tell us how many standard deviations an observation is above or below the mean.
  ◦ A positive z-score measures the number of standard deviations a score is above the mean, and a negative z-score measures the number of standard deviations a score is below the mean.
  ◦ A z-score describes the position of a raw score in terms of its distance from the mean, when measured in standard deviation units.
  ◦ Also called a standard score, denoted by the letter z.
• BOX AND WHISKER PLOT (sometimes called a boxplot) is a graph that presents information from a five-number summary.
  ◦ It is a way of summarizing a set of data measured on an interval scale.
  ◦ A box-and-whisker plot shows the distribution of data.
  ◦ A box-and-whisker diagram or boxplot is a graph that provides the five-number summary for a finite data set in pictorial form.
• PROBABILITY is the study of random events.
  ◦ It is used to make predictions about future events.
• PROBABILITY DISTRIBUTIONS are a fundamental concept in statistics, and we use probability in daily life to make decisions when we don't know for sure what the outcome will be.
  ◦ Used in analysing games of chance, genetics, weather prediction, and a myriad of other everyday events.
• The standard normal distribution is the distribution of a normal random variable with mean zero and standard deviation equal to 1.
  ◦ The normal distribution is important because it is the distribution approximately followed by many of the continuous (or measurable) characteristics (or variables) in our day-to-day life.
  ◦ A distribution that occurs naturally in many situations.
  ◦ In statistics, it is called the normal curve.
  ◦ In the social sciences, it is called the bell curve.
  ◦ In physics, it is called the Gaussian distribution.
  ◦ The total area under the curve is 1.
• B. Pascal & P. de Fermat – Fathers of Probability.
• Kinds of Probability
  ◦ Theoretical probability deals with the nature of the experiment and events; it is what we expect to happen.
    - Based on facts
    - Decisions should be made using theoretical data
  ◦ Experimental or empirical probability relies on the actual occurrence of the experiment and events; it is what actually happens when we try it out (based on experiment or data).
    - Results are based on experiment
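To make the contrast between theoretical and experimental probability concrete, here is a rough Python sketch for one event: rolling a six on a fair die. The fair-die setup, the function names, the number of trials, and the seed are all assumptions chosen for illustration; they do not come from these notes.

```python
import random
from fractions import Fraction


def theoretical_probability(favorable_outcomes, possible_outcomes):
    """What we expect to happen: favorable outcomes over equally likely possible outcomes."""
    return Fraction(favorable_outcomes, possible_outcomes)


def experimental_probability(trials, rng):
    """What actually happens: the share of simulated die rolls that show a six."""
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
expected = theoretical_probability(1, 6)
observed = experimental_probability(60_000, rng)

print(f"theoretical P(six)  = {expected} = {float(expected):.4f}")
print(f"experimental P(six) = {observed:.4f} from 60,000 simulated rolls")
```

The theoretical value is fixed at 1/6, while the experimental value changes from run to run (or seed to seed) and tends to move closer to 1/6 as the number of trials grows.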
