SLOVIN’S FORMULA:
Used to determine the appropriate number of samples:
$n = \dfrac{N}{1 + Ne^2}$
Where n = number of samples
N = population size
e = margin of error

IV. 4TH DOCUMENT (PPT)
MEASURES OF CENTRAL TENDENCY
A central value or a typical value for a probability distribution
Occasionally called an average or just the center of the distribution
SOME DEFINITIONS
Simpson and Kafka – “a typical value around which other figures gather”
Waugh – “an average stand for the whole group of which it forms a part yet represents the whole”
Layman’s term – AVERAGE

IMPORTANCE OF CENTRAL TENDENCY:
To find a representative value
To make data more concise
To make comparisons
Helpful in statistical analysis

A. MEAN
The most popular and widely used; sometimes called the arithmetic mean.
The sum of all the measurements divided by the number of measurements in the set.
PROPERTIES OF MEAN
Can be calculated for any set of numerical data, so it always exists
A set of numerical data has one and only one mean
The most reliable measure of central tendency, since it takes into account every item in the set of data
Greatly affected by extreme or deviant values (outliers)
Used only if the data are interval or ratio.
TYPES OF MEAN
1. Arithmetic Mean
For Ungrouped Data:
Sample Mean
o $\bar{x} = \dfrac{\sum x}{n}$
Population Mean
o $\mu = \dfrac{\sum x}{N}$
For Grouped Data:
Direct Method
o $\bar{x} = \dfrac{\sum fx}{\sum f}$
Shortcut Method
o $\bar{x} = A + \dfrac{\sum fd}{\sum f}$
Step Deviation Method
o $\bar{x} = A + \dfrac{\sum fd'}{\sum f} \times i$
Where f = class frequency, x = class mark, A = assumed mean, $d = x - A$, $d' = (x - A)/i$, i = class interval

2. Weighted Mean
Each value has a different weight or degree of importance.
$\bar{x} = \dfrac{\sum xw}{\sum w}$
Where: $\bar{x}$ = the mean
x = measurement value
w = weight of each value (e.g. number of measurements)

3. Harmonic Mean
The quotient of “number of the given values” and “the sum of the reciprocals of the given values”
For Ungrouped Data:
$HM \text{ of } X = \dfrac{n}{\sum \frac{1}{x}}$
For Grouped Data:
$HM \text{ of } X = \dfrac{\sum f}{\sum \frac{f}{x}}$

4. Geometric Mean
Well defined only for sets of positive real numbers
Math definition: the nth root of the product of n numbers
A common example is averaging growth rates
GM is NOT the arithmetic mean and it is NOT a simple average.
For Ungrouped Data:
$G = \operatorname{antilog}\left(\dfrac{\sum \log x}{n}\right)$
For Grouped Data:
$G = \operatorname{antilog}\left(\dfrac{\sum f \log x}{n}\right)$

B. MEDIAN (Md)
Is the middle value of the sample when the data are ranked in order according to size
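As a numerical check of the ungrouped mean formulas and the median definition in these notes, here is a minimal Python sketch. The function names and the sample values are illustrative assumptions, not part of the notes; the geometric mean uses the antilog-of-average-log form with base-10 logarithms, and the median of an even-sized set is taken as the mean of the two middle values.

```python
import math

def arithmetic_mean(xs):
    # Sum of the measurements divided by the number of measurements.
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # Sum of x*w divided by the sum of the weights.
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

def harmonic_mean(xs):
    # Number of values divided by the sum of their reciprocals.
    return len(xs) / sum(1 / x for x in xs)

def geometric_mean(xs):
    # Antilog of the average log; only defined for positive values.
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))

def median(xs):
    # Middle value of the ranked data; mean of the two middle values when n is even.
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

values = [2, 4, 8]  # hypothetical data
print(arithmetic_mean(values), harmonic_mean(values), geometric_mean(values))
print(weighted_mean([80, 90], [1, 3]))  # hypothetical scores and weights
print(median(values), median([2, 4, 8, 10]))
```

For positive data that are not all equal, the three printed means come out in the order harmonic < geometric < arithmetic, which is a quick sanity check on the formulas.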
Connor – “Median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than median”

For Ungrouped Data:
$Md = \text{size of } \left(\dfrac{N + 1}{2}\right)\text{th item}$
For Grouped Data:
$Md = L_1 + \left(\dfrac{\frac{N}{2} + 1}{2}\right)\text{th item}$

ADVANTAGES OF MEDIAN
Can be calculated in all distributions
Can be understood even by common people
Can be ascertained even with the extreme items
Can be located graphically
Most useful when dealing with qualitative data
DISADVANTAGES OF MEDIAN
Is not based on all the values
Is not capable of further mathematical treatment
Is affected by fluctuations of sampling
With an even number of values, it may not be a value from the data.

C. MODE (Mo)
Value which occurs most frequently in a set of values; the most popular value in the given set
Croxton and Cowden – “the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values”
PROPERTIES OF MODE
Used when you want to find the value which occurs most often
A quick approximation of the average
An inspection average
The most unreliable among the three measures of central tendency because its value is undefined in some observations.
ADVANTAGES OF MODE
Readily comprehensible and easily calculated
The best representative of data
Not at all affected by extreme values
Can also be determined graphically
Usually an actual value of an important part of the series
DISADVANTAGES OF MODE
Not based on all observations
Not capable of further mathematical manipulation
Affected to a great extent by sampling fluctuations
Choice of grouping has great influence on the value of mode

RELATIONS BETWEEN THE MEASURES OF CENTRAL TENDENCY
In symmetrical distributions, the mean and median are equal
For normal distributions, mean = median = mode.
In positively skewed data, mean > median.
In negatively skewed data, mean < median.

CONCLUSION:
MCT tells us where the middle of a bunch of data lies.
Mean
o The most common
o Sum of the numbers divided by the number of numbers in a set of data
o Also known as average
Median
o Number present in the middle when the numbers are arranged in order
o If the number of values in a data set is even, then the median is the mean of the two middle numbers.
Mode
o The value that occurs most frequently in a set of data.

V. 5TH DOCUMENT (PPT – GAME)
ADDITIONAL INFO ABT MCT
Bimodal – two distinct modes
Multi-modal – more than 2 distinct modes
CONSIDERATIONS FOR CHOOSING A MCT
For nominal variables – mode
For ordinal variables – mode and median
For interval-ratio variables – mean, mode, and median may all be calculated
o Mean provides the most information, but the median is
preferred if the distribution is skewed.
o Skewed – mode and median
o Symmetrical – mean, median, mode

VI. 6TH DOCUMENT (PPT)
MEASURES OF VARIABILITY
The spread of the values about the mean
Intuitively, a smaller dispersion of scores arising from the comparison often indicates more consistency and more reliability.
1. RANGE
The quickest way to determine dispersion of scores
r = H − L (highest score minus lowest score)
The simplest measure of variability, but its simplicity fails to show any clustering of scores and it is greatly affected by an outlier
2. MEAN DEVIATION
$MD = \dfrac{\sum |x - \bar{x}|}{n}$
Where: MD = mean deviation
x = individual item
$\bar{x}$ = mean
n = the number of items under observation

VII. 7TH DOCUMENT
3. VARIANCE
The average of the squared deviations from the mean
A closer measure of scattering of data about an average
However, it is also easily distorted by outliers because it is affected by individual scores
$V = \dfrac{\sum (x - \bar{x})^2}{n}$

4. STANDARD DEVIATION
The most important measure of dispersion
Like the MD, standard deviation differentiates sets of scores with equal averages.
Has several applications in inferential statistics
Very useful in statistical work
$SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

VIII. 8TH DOCUMENT
DIFFERENT KINDS OF GRAPHS
1. BAR GRAPH
Used to show relative sizes of data
Bars may be vertical or horizontal
POINTERS IN CONSTRUCTION:
1. Write the appropriate title.
2. Label both axes, use legends for multiple bars, and mark the zero point clearly.
3. Bars must be proportional to the quantities they represent.
4. The width of the bars must be equal, and there must be uniform spaces between bars.
5. If necessary, highlight sources or footnotes.

2. LINE GRAPH
Shows the relationship between two or more sets of continuous data
POINTERS IN CONSTRUCTION:
1. State clearly the title of the graph.
2. Label both axes. A legend should be used for multiple lines. The zero point should be clearly indicated.
3. Connect plotted points from left to right.
4. Sources and footnotes should be provided.
5. Multiple lines should be distinguished by using different colors.

3. CIRCLE GRAPH
Used to compare parts to a whole
The size of each sector of the circle is proportional to the size of the category it represents.
POINTERS IN CONSTRUCTION:
1. Organize the data in a table by providing columns for:
a. The fractional part or the percent of the whole that each quantity represents
b. The number of degrees representing each fractional part, obtained by multiplying 360° by the fractional part
2. On a circle, construct successive central angles using the number of degrees representing each part.
3. Label each part and write an appropriate title.
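The table described in step 1 of the circle graph pointers (fraction, percent, and degrees per category) can be sketched as follows. This is a minimal illustration: the function name `sector_table` and the budget figures are hypothetical, chosen only to show the 360° multiplication.

```python
def sector_table(quantities):
    """Return fraction, percent, and central angle for each category."""
    total = sum(quantities.values())
    table = {}
    for label, q in quantities.items():
        table[label] = {
            "fraction": q / total,       # part of the whole
            "percent": 100 * q / total,  # same part as a percent
            "degrees": 360 * q / total,  # 360 degrees times the fractional part
        }
    return table

budget = {"Food": 50, "Rent": 30, "Savings": 20}  # hypothetical data
table = sector_table(budget)
for label, row in table.items():
    print(label, row["percent"], row["degrees"])
```

The central angles should always add up to 360°, which is a useful check before drawing the sectors.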
4. PICTOGRAPH/PICTOGRAM
A picture graph used to show numerical data through symbols
The picture to be used must symbolize the data to be represented
GUIDELINES/POINTERS:
1. Indicate the appropriate title.
2. Decide on the symbol to use for each item.
3. Proportionally represent the given data on the symbols to be used.
4. An appropriate legend should be clearly indicated.

IX. 9TH DOCUMENT (PPT)
FREQUENCY DISTRIBUTION TABLE
The tabular representation of data
A table used in statistics as a method of recording the data collected.
A tally is often used to keep track of scores

STEPS IN CONSTRUCTING FDT FOR GROUPED DATA
1. Find the range. r = H − L
2. Decide on the number of classes.
o A class is a grouping or category. Statisticians said that the ideal number of classes is between 5 and 15.
3. Determine the class interval.
o Class interval is the size of each class. For convenience, intervals are rounded to the nearest integer.
o $i = \dfrac{\text{range}}{\text{desired number of classes}}$
4. Determine the classes starting with the lowest class.
o lowest class = LS + (i − 1)
5. Determine the class frequency for each class by counting the tally.

X. 10TH DOCUMENT (PPT)
LINEAR REGRESSION
In Simple Linear Regression, we predict scores on one variable from scores on the second variable.
Criterion variable – the variable we are predicting (referred to as Y)
Predictor variable – the variable we are basing our predictions on (referred to as X)
When there is only one predictor variable, the prediction method is called simple regression.
Linear regression consists of finding the best fitting straight line through the points. The best fitting line is called a regression line.
A. Linear regression function: $y = a + bx$
Where b = slope of the regression line
a = the y-intercept

1. To solve for b: $b = r\dfrac{S_y}{S_x}$
Where r = the Pearson’s r correlation
$S_y$ = standard deviation of y
$S_x$ = standard deviation of x
a. r or Pearson’s r correlation: measures the strength of the linear relationship between two variables
$r = \dfrac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}$
b. $S_y$ and $S_x$ are solved using the standard deviation formula: $SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

2. To solve for a: $a = \bar{y} - b\bar{x}$
a. $\bar{y}$ and $\bar{x}$ are solved using the formula for mean: $\bar{x} = \dfrac{\sum x}{n}$

XI. 11TH DOCUMENT (PPT)
EXPANDED FDT
Midterm Test of 45 Students in BSE 2-1M
Class   f   CM   LL   UL   LB   UB   “<”Cf   “>”Cf
19-21

1. LL – Lower Limit
The smaller number in a class
Ex: Class 19-21; LL is 19
2. UL – Upper Limit
The larger number in a class
Ex: Class 19-21; UL is 21
3. CM – Class Mark
The middle value in a class
$CM = \dfrac{LL + UL}{2}$
Ex: class 19-21
o $CM = \dfrac{19 + 21}{2}$
o $CM = 20$
Class Boundaries – are often described as true limits because they are more precise expressions of class limits
4. LB – Lower Boundary
$LB = LL - 0.5$
5. UB – Upper Boundary
$UB = UL + 0.5$
Cumulative Frequency
6. “<”Cf – Less than Cumulative Frequency
Can be determined by adding the frequency of the class to the frequencies of the lower classes
Adding from bottom to top
7. “>”Cf – Greater than Cumulative Frequency
Can be determined in the same manner but in reverse order
Adding from the top to bottom
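The columns of the expanded FDT can be filled in mechanically from the class limits and frequencies. The sketch below is an illustration under stated assumptions: only the class 19-21 comes from the notes, while the other classes, all frequencies, and the function name `expand_fdt` are hypothetical. Classes are listed here from lowest to highest, so the “<”Cf accumulates from the first row and the “>”Cf from the last.

```python
from itertools import accumulate

def expand_fdt(classes, freqs):
    """classes: (LL, UL) pairs in ascending order; freqs: matching frequencies."""
    less_cf = list(accumulate(freqs))                     # running total from the lowest class up
    greater_cf = list(accumulate(reversed(freqs)))[::-1]  # running total from the highest class down
    rows = []
    for (ll, ul), f, lcf, gcf in zip(classes, freqs, less_cf, greater_cf):
        rows.append({
            "class": f"{ll}-{ul}",
            "f": f,
            "CM": (ll + ul) / 2,  # class mark
            "LL": ll,
            "UL": ul,
            "LB": ll - 0.5,       # lower boundary
            "UB": ul + 0.5,       # upper boundary
            "<cf": lcf,
            ">cf": gcf,
        })
    return rows

classes = [(19, 21), (22, 24), (25, 27)]  # only 19-21 appears in the notes
freqs = [4, 6, 5]                         # hypothetical frequencies
rows = expand_fdt(classes, freqs)
for row in rows:
    print(row)
```

The last “<”Cf and the first “>”Cf should both equal the total number of scores, which is a quick way to catch tally errors.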
XII. 12TH DOCUMENT (PPT)
CORRELATION
A connection or association between two variables
Is a statistical term describing the degree to which two variables move in coordination with one another
Pearson Product-Moment Correlation Coefficient (or Pearson’s r)
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
STEPS IN USING PEARSON’S CORRELATION
1. Organize data into 5 columns, such that the first 2 columns are filled with every pair of x and y values.
2. Fill the remaining columns accordingly:
o Column 3 (XY): products of pairs of x and y values
o Column 4 ($X^2$): squares of x-values per row
o Column 5 ($Y^2$): squares of y-values per row
3. Get the sum of each column.
4. Substitute the sums for their respective variables in the formula and simplify. Calculated r-values may be rounded off to at least 2 decimal places.
INTERPRETING Pearson’s r
r                    Interpretation
±0.805 − ±0.995      Very High
±0.605 − ±0.795      High
±0.405 − ±0.595      Moderate/Substantial
±0.205 − ±0.395      Low/Slight
±0.005 − ±0.195      Negligible

XIII. 13TH DOCUMENT (PPT)
AREA UNDER THE STANDARD NORMAL DISTRIBUTION
NORMAL DISTRIBUTION
aka Gaussian distribution
Is the most important probability distribution in statistics for independent, random variables
FOUR STEP PROCESS OF FINDING THE AREA UNDER THE NORMAL CURVE (given a z-value)
1. Express the given z-value in 3-digit form
2. Using the z-table, find the first 2 digits in the left column
3. Match the 3rd digit with the appropriate column on the right
4. Read the area (or probability) at the intersection of the row and column

XIV. 14TH DOCUMENT
DIFFERENT KINDS OF EVENTS
PROBABILITY
Sometimes called the game of chance
Refers to the likelihood of an event occurring
CONCEPTS TOWARDS SOLVING PROBABILITY
A. Sample Space (S) – consists of all possible outcomes of an experiment
B. Sample Point – each outcome in S
C. Event E – a subset of S
Two-Tailed Test at 𝜶 = 𝟓% = −𝑡 , .
4. Conclude:
Given: 𝐻 : 𝜇 = 𝜇
𝛼 = 5%
$\bar{x} = \dfrac{\sum x}{n}$
$\bar{x} = 8.11667$
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
$s = 2.38287$
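The mean and standard deviation formulas used here can be computed as in the sketch below. The data set is hypothetical (the scores behind the values 8.11667 and 2.38287 are not given in these notes, so the code does not reproduce them), and the function names are illustrative. The divisor is n, as in the formula written in the notes.

```python
import math

def mean(xs):
    # Sum of the values divided by the number of values.
    return sum(xs) / len(xs)

def variance(xs):
    # Average of the squared deviations from the mean (divisor n).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    # Square root of the variance.
    return math.sqrt(variance(xs))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, not the notes' data
print(mean(sample))                # 5.0
print(variance(sample))            # 4.0
print(standard_deviation(sample))  # 2.0
```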