B26 Notes

BSEM 26: ELEMENTARY STATISTICS AND PROBABILITY

I. 1ST DOCUMENT (PPT)
ORIGIN OF STATISTICS
• Statistics is as old as man's societal existence.
• As a discipline, it began when man started to count and measure.
• Historically, statistics dates back to the ancient Egyptians and Chinese, who used charts and tables to keep state records.
Statistics
• Comes from the Latin word "status", meaning "state"; it involved the compilation of data and graphs describing various aspects of the state or country.
• In ancient days, "state-istics" was the place to find information on revenues.
• Simply put, it is the science of data.
Uses/Applications:
• In making decisions
• In solving problems
• In designing products and processes

STEPS IN SCIENTIFIC METHOD
1. Develop a clear description
2. Identify the important factors
3. Propose a model
4. Refine the model
5. Conduct experiments
6. Manipulate the model
7. Confirm the solution
8. Conclusions and recommendations

TWO MAJOR AREAS OF STATS
A. Descriptive Stat – simply, the analysis and description of data
B. Inferential Stat – analysis of data leading to predictions and conclusions

TYPES OF DATA
A. Quantitative Data – numerical information
B. Qualitative Data – descriptive attributes; cannot be subjected to mathematical operations

VARIABILITY – spread of data

TYPES OF VARIABLES
A. Quantitative Variables
1. Discrete – counted
2. Continuous – measured
B. Qualitative Variables

II. 2ND DOCUMENT (PPT)
STATISTICAL TERMS
1. Data – any quantitative or qualitative information
a. Quantitative – numerical
b. Qualitative – non-numerical
2. Population (N) – totality of elements
3. Sample (n) – part of the population, determined by sampling procedures
4. Parameter – any statistical information taken from a population
5. Statistic – any estimate of a statistical attribute taken from a sample
6. Variable – a factor that differentiates a sample from another group
a. Discrete – counted
b. Continuous – measured

SCALES OF MEASUREMENT
1. Nominal – depicts the presence or absence of a certain attribute; usually involves the arbitrary assignment of numbers to represent attributes
2. Ordinal – provides the degree of presence of an attribute; usually classified according to order or rank
3. Interval – data are arranged in some order and the differences between data are meaningful; the data may lack an inherent zero starting point
4. Ratio – an interval level modified to include an inherent zero starting point

SIGMA NOTATION or Summation (∑)
• Statistical symbol that abbreviates the sum of the quantities in a given range
• The Greek letter ∑ (capital sigma) indicates "summation of"
• $\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4$, wherein 4 is the upper limit and 1 is the lower limit

III. 3RD DOCUMENT (VIDEO)
COLLECTION OF DATA – the first step in any statistical work
CLASSIFICATION OF DATA
1. Primary Data – gathered directly from the source
2. Secondary Data – gathered from secondary sources
METHODS OF GATHERING DATA
1. Interview
2. Questionnaire
3. Observation
4. Registration or Census
5. Experimentation

SLOVIN’S FORMULA
• Used to determine the appropriate number of samples:
$n = \frac{N}{1 + Ne^2}$
Where n = number of samples
N = population size
e = margin of error

IV. 4TH DOCUMENT (PPT)
MEASURES OF CENTRAL TENDENCY
• A central or typical value for a probability distribution
• Occasionally called an average or just the center of the distribution
SOME DEFINITIONS
• Simpson and Kafka – "a typical value around which other figures gather"
• Waugh – "an average stands for the whole group of which it forms a part, yet represents the whole"
• Layman's term – AVERAGE
IMPORTANCE OF CENTRAL TENDENCY:
• To find a representative value
• To make data more concise
• To make comparisons
• Helpful in statistical analysis

A. MEAN
• The most popular and widely used; sometimes called the arithmetic mean
• The sum of all the measurements divided by the number of measurements in the set
PROPERTIES OF MEAN
• Can be calculated for any set of numerical data, so it always exists
• A set of numerical data has one and only one mean
• The most reliable measure of central tendency, since it takes into account every item in the set of data
• Greatly affected by extreme or deviant values (outliers)
• Used only if the data are interval or ratio
TYPES OF MEAN
1. Arithmetic Mean
For Ungrouped Data:
• Sample Mean
o $\bar{x} = \frac{\sum x}{n}$
• Population Mean
o $\mu = \frac{\sum x}{N}$
For Grouped Data (f = class frequency, x = class mark, A = assumed mean, i = class interval):
• Direct Method
o $\bar{x} = \frac{\sum fx}{n}$
• Shortcut Method (d = x − A)
o $\bar{x} = A + \frac{\sum fd}{n}$
• Step Deviation Method (d = (x − A)/i)
o $\bar{x} = A + \frac{\sum fd}{n} \times i$
2. Weighted Mean
• Each value has a different weight or degree of importance.
$\bar{x} = \frac{\sum xw}{\sum w}$
Where: $\bar{x}$ = the mean
x = measurement value
w = weight (degree of importance) of each value
3. Harmonic Mean
• The quotient of the number of given values and the sum of the reciprocals of the given values
For Ungrouped Data:
$HM \text{ of } X = \frac{n}{\sum \frac{1}{x}}$
For Grouped Data:
$HM \text{ of } X = \frac{\sum f}{\sum \frac{f}{x}}$
4. Geometric Mean
• Well defined only for sets of positive real numbers
• Math definition: the nth root of the product of n numbers
• A common example is averaging growth rates
• GM is NOT the arithmetic mean and is NOT a simple average.
For Ungrouped Data:
$G = \text{antilog}\left(\frac{\sum \log x}{n}\right)$
For Grouped Data:
$G = \text{antilog}\left(\frac{\sum f \log x}{n}\right)$

B. MEDIAN (Md)
• The middle value of the sample when the data are ranked in order according to size
• Connor – "Median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than the median"
For Ungrouped Data:
$Md = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item}$
For Grouped Data:
$Md = L_1 + \frac{\frac{N}{2} - cf}{f} \times i$
Where L1 = lower boundary of the median class (the class containing the (N/2)th item), cf = cumulative frequency of the classes before the median class, f = frequency of the median class, i = class interval

ADVANTAGES OF MEDIAN
• Can be calculated in all distributions
• Can be understood even by common people
• Can be ascertained even with extreme items
• Can be located graphically
• Most useful when dealing with qualitative data
DISADVANTAGES OF MEDIAN
• Is not based on all the values
• Is not capable of further mathematical treatment
• Is affected by sampling fluctuations
• In case of an even number of values, it may not be one of the values in the data

C. MODE (Mo)
• The value which occurs most frequently in a set of values; the most popular value in the given set
• Croxton and Cowden – "the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values"
PROPERTIES OF MODE
• Used when you want to find the value which occurs most often
• A quick approximation of the average
• An inspection average
• The most unreliable of the three measures of central tendency, because its value is undefined in some observations
ADVANTAGES OF MODE
• Readily comprehensible and easily calculated
• The best representative of data
• Not at all affected by extreme values
• Can also be determined graphically
• Usually an actual value of an important part of the series
DISADVANTAGES OF MODE
• Not based on all observations
• Not capable of further mathematical manipulation
• Affected to a great extent by sampling fluctuations
• The choice of grouping has great influence on the value of the mode

RELATIONS BETWEEN THE MEASURES OF CENTRAL TENDENCY
• In symmetrical distributions, the mean and median are equal.
• For normal distributions, mean = median = mode.
• In positively skewed data, mean > median.
• In negatively skewed data, mean < median.

CONCLUSION:
• MCT tells us where the middle of a bunch of data lies.
• Mean
o The most common
o Sum of the numbers divided by the number of numbers in a set of data
o Also known as the average
• Median
o The number in the middle when the numbers are arranged in order
o If the number of values in a data set is even, the median is the mean of the two middle numbers.
• Mode
o The value that occurs most frequently in a set of data

V. 5TH DOCUMENT (PPT – GAME)
ADDITIONAL INFO ABOUT MCT
• Bimodal – two distinct modes
• Multimodal – more than 2 distinct modes
CONSIDERATIONS FOR CHOOSING AN MCT
• For nominal variables – mode
• For ordinal variables – mode and median
• For interval-ratio variables – mean, mode, and median may all be calculated
o Mean provides the most information, but the median is preferred if the distribution is skewed.
o Skewed – mode and median
o Symmetrical – mean, median, mode

VI. 6TH DOCUMENT (PPT)
MEASURES OF VARIABILITY
• The spread of the values about the mean
• Intuitively, a smaller dispersion of scores arising from the comparison often indicates more consistency and more reliability.
1. RANGE
• The quickest way to determine the dispersion of scores
• r = H − L (highest value minus lowest value)
• The simplest measure of variability, but its simplicity fails to show any clustering of scores, and it is greatly affected by an outlier
2. MEAN DEVIATION
$MD = \frac{\sum |x - \bar{x}|}{n}$
Where: MD = mean deviation
x = individual item
$\bar{x}$ = mean
n = the number of items under observation

VII. 7TH DOCUMENT
3. VARIANCE
• The average of the squared deviations from the mean
• A closer measure of the scattering of data about an average
• However, it is also easily distorted by outliers because it is affected by every individual score
$V = \frac{\sum (x - \bar{x})^2}{n}$
4. STANDARD DEVIATION
• The most important measure of dispersion
• Like the MD, the standard deviation differentiates sets of scores with equal averages.
• Has several applications in inferential stat
• Very useful in stat works
• $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

VIII. 8TH DOCUMENT
DIFFERENT KINDS OF GRAPHS
1. BAR GRAPH
• Used to show relative sizes of data
• Bars may be vertical or horizontal
POINTERS IN CONSTRUCTION:
1. Write the appropriate title.
2. Label both axes and use legends for multiple bars; zero should be clearly stated.
3. Bars must be proportional to the quantities they are representing.
4. The widths of the bars must be equal, and there must be uniform spaces between bars.
5. If necessary, highlight sources or footnotes.
2. LINE GRAPH
• Shows the relationship between two or more sets of continuous data
POINTERS IN CONSTRUCTION:
1. State clearly the title of the graph.
2. Label both axes. A legend should be used for multiple lines. The zero point should be clearly indicated.
3. Connect plotted points from left to right.
4. Sources and footnotes should be provided.
5. Multiple lines should be distinguished by using different colors.
3. CIRCLE GRAPH
• Used to compare parts to a whole
• The size of each sector of the circle is proportional to the size of the category it represents.
POINTERS IN CONSTRUCTION:
1. Organize the data in a table by providing columns for:
a. The fractional part or percent of the whole that each quantity represents
b. The number of degrees representing each fractional part, obtained by multiplying 360° by the fractional part
2. On a circle, construct successive central angles using the number of degrees representing each part.
3. Label each part and write an appropriate title.
4. PICTOGRAPH/PICTOGRAM
• A picture graph used to show numerical data through symbols
• The picture to be used must symbolize the data to be represented
GUIDELINES/POINTERS:
1. Indicate the appropriate title.
2. Decide on the symbol to use for each item.
3. Proportionally represent the given data with the symbols to be used.
4. An appropriate legend should be clearly indicated.

IX. 9TH DOCUMENT (PPT)
FREQUENCY DISTRIBUTION TABLE
• The tabular representation of data
• A table used in stat as a method of recording the data collected
• Lists a set of scores and their frequencies
• A tally is often used to keep track of scores
STEPS IN CONSTRUCTING FDT FOR GROUPED DATA
1. Find the range: r = H − L.
2. Decide on the number of classes.
o A class is a grouping or category. Statisticians say that the ideal number of classes is between 5 and 15.
3. Determine the class interval.
o The class interval is the size of each class. For convenience, intervals are rounded to the nearest integer.
o $i = \frac{\text{range}}{\text{desired number of classes}}$
4. Determine the classes, starting with the lowest class.
o Lowest class: from LS to $LS + (i - 1)$, where LS is the lowest score
5. Determine the class frequency for each class by counting the tally.

X. 10TH DOCUMENT (PPT)
LINEAR REGRESSION
• In Simple Linear Regression, we predict scores on one variable from scores on a second variable.
• Criterion variable – the variable we are predicting (referred to as Y)
• Predictor variable – the variable we are basing our predictions on (referred to as X)
• When there is only one predictor variable, the prediction method is called simple regression.
Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line is called a regression line.
A. Linear Regression Function: $y = a + bx$
Where b = slope of the regression line
a = the y-intercept
1. To solve for b: $b = r\frac{S_y}{S_x}$
Where r = Pearson's r correlation
$S_y$ = standard deviation of y
$S_x$ = standard deviation of x
a. r or Pearson's r correlation:
• Measures the strength of the linear relationship between two variables
• $r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}$
b. $S_y$ and $S_x$ are solved using the standard deviation formula: $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
2. To solve for a: $a = \bar{y} - b\bar{x}$
a. $\bar{y}$ and $\bar{x}$ are solved using the formula for the mean: $\bar{x} = \frac{\sum x}{n}$

XI. 11TH DOCUMENT (PPT)
EXPANDED FDT
Midterm Test of 45 Students in BSE 2-1M
Class    f    CM    LL    UL    LB    UB    "<"Cf    ">"Cf
19-21
1. LL – Lower Limit
• The smaller number in a class
• Ex: Class 19-21; LL is 19
2. UL – Upper Limit
• The larger number in a class
• Ex: Class 19-21; UL is 21
3. CM – Class Mark
• The middle value in a class
• $CM = \frac{LL + UL}{2}$
• Ex: class 19-21
o $CM = \frac{19 + 21}{2}$
o CM = 20
Class Boundaries – often described as true limits because they are more precise expressions of class limits
4. LB – Lower Boundary
• LB = LL − 0.5
5. UB – Upper Boundary
• UB = UL + 0.5
Cumulative Frequency
6. "<"Cf – Less than Cumulative Frequency
• Can be determined by adding the frequency of each class to the frequencies of the lower classes
• Adding from bottom to top
7. ">"Cf – Greater than Cumulative Frequency
• Can be determined in the same manner, but in reverse order
• Adding from top to bottom
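The construction steps in IX. 9TH DOCUMENT and the columns of the expanded FDT in XI. 11TH DOCUMENT can be sketched in Python as below. This is a minimal sketch under stated assumptions: scores are whole numbers (so the boundaries are the limits ± 0.5), the interval is rounded half-up to the nearest integer as the notes describe, and classes keep being added until the highest score fits, so the table may end up with more classes than requested. `build_fdt` and `circle_graph` are hypothetical helper names; `circle_graph` applies the circle-graph pointer from VIII. 8TH DOCUMENT (degrees = 360° × fractional part). The sample scores are made up for illustration.

```python
import math


def build_fdt(scores, desired_classes):
    """Expanded frequency distribution table for grouped whole-number scores (a sketch)."""
    lowest, highest = min(scores), max(scores)
    r = highest - lowest                                  # step 1: range, r = H - L
    # step 3: class interval, rounded half-up to the nearest integer (at least 1)
    i = max(1, math.floor(r / desired_classes + 0.5))
    rows = []
    ll = lowest                                           # step 4: start at the lowest score (LS)
    while ll <= highest:                                  # add classes until every score fits
        ul = ll + (i - 1)                                 # upper limit = LL + (i - 1)
        f = sum(1 for s in scores if ll <= s <= ul)       # step 5: class frequency
        rows.append({
            "class": (ll, ul),
            "f": f,
            "CM": (ll + ul) / 2,                          # class mark
            "LB": ll - 0.5,                               # lower boundary
            "UB": ul + 0.5,                               # upper boundary
        })
        ll += i
    # Rows run from the lowest class to the highest, so "<"cf accumulates
    # down the list and ">"cf accumulates up the list.
    running = 0
    for row in rows:
        running += row["f"]
        row["<cf"] = running
    running = 0
    for row in reversed(rows):
        running += row["f"]
        row[">cf"] = running
    return rows


def circle_graph(parts):
    """Percent of the whole and central angle (360 x fractional part) per category."""
    total = sum(parts.values())
    return {name: (count / total * 100, count / total * 360)
            for name, count in parts.items()}


if __name__ == "__main__":
    scores = [19, 20, 21, 22, 25, 27, 30, 31, 33, 35]    # hypothetical scores
    for row in build_fdt(scores, 5):
        print(row)
    print(circle_graph({"A": 1, "B": 3}))
```

With these sample scores the first row is class 19-21 with CM 20.0, LB 18.5 and UB 21.5, the same values as the class-mark example in the notes; rounding 3.2 down to an interval of 3 produces six classes instead of the five requested.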
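For ungrouped data, Slovin's formula (III. 3RD DOCUMENT) and the means, median, and mode (IV. 4TH DOCUMENT) translate directly into short Python functions. This is a minimal sketch with hypothetical function names. Rounding Slovin's n up to a whole sample is a common convention assumed here, not something the notes state; the geometric mean uses base-10 logs to mirror the antilog form; and a data set in which every value occurs once is treated as having no mode.

```python
import math
from collections import Counter


def slovin(N, e):
    """Slovin's formula: n = N / (1 + N e^2)."""
    return N / (1 + N * e ** 2)


def arithmetic_mean(xs):
    return sum(xs) / len(xs)                      # sum of x divided by n


def weighted_mean(xs, ws):
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)   # sum(xw) / sum(w)


def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)       # n / sum(1/x)


def geometric_mean(xs):
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean needs positive values")
    # G = antilog(sum(log x) / n), using base-10 logs
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                             # the ((n + 1)/2)th item
    return (s[mid - 1] + s[mid]) / 2              # mean of the two middle items


def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []                                 # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


if __name__ == "__main__":
    print(math.ceil(slovin(1000, 0.05)))          # 286
    data = [2, 3, 3, 5, 7]                        # hypothetical scores
    print(arithmetic_mean(data), median(data), modes(data))   # 4.0 3 [3]
```

Returning a list from `modes` keeps bimodal and multimodal data (V. 5TH DOCUMENT) visible, e.g. `modes([1, 1, 2, 2, 3])` gives `[1, 2]`.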
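The measures of variability in VI. 6TH DOCUMENT and VII. 7TH DOCUMENT, written as a minimal Python sketch with hypothetical function names. Like the notes, `variance` divides by n; Python's `statistics.pvariance` and `statistics.pstdev` use the same n divisor, while `statistics.variance` and `statistics.stdev` divide by n − 1.

```python
import math


def value_range(xs):
    return max(xs) - min(xs)                          # r = H - L


def mean_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)      # sum|x - mean| / n


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)    # sum(x - mean)^2 / n


def standard_deviation(xs):
    return math.sqrt(variance(xs))                    # SD = square root of V


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical scores, mean = 5
    print(value_range(data), mean_deviation(data),
          variance(data), standard_deviation(data))   # 7 1.5 4.0 2.0
```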
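The two-step recipe in X. 10TH DOCUMENT (solve b = r·Sy/Sx, then a = ȳ − b·x̄) can be traced with a short Python sketch. The data and function names are hypothetical. The SD uses n in the denominator to match the notes; this does not affect b, because the same divisor appears in both Sy and Sx.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    # standard deviation with n in the denominator, as in the notes
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(x, y):
    # r = (Σxy - ΣxΣy/n) / sqrt[(Σx² - (Σx)²/n)(Σy² - (Σy)²/n)]
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return num / den


def fit_line(x, y):
    """Return (a, b) for y = a + bx, using b = r*Sy/Sx and a = mean(y) - b*mean(x)."""
    b = pearson_r(x, y) * sd(y) / sd(x)
    a = mean(y) - b * mean(x)
    return a, b


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]          # hypothetical predictor (X) scores
    y = [2, 4, 5, 4, 5]          # hypothetical criterion (Y) scores
    a, b = fit_line(x, y)
    print(f"y = {a:.2f} + {b:.2f}x")   # y = 2.20 + 0.60x
    print(a + b * 6)                   # predicted y for x = 6, about 5.8
```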
MIDTERM ENDED HERE, BUT LET’S JUST INCLUDE LINEAR REGRESSION (X. 10TH DOCUMENT ABOVE) FOR FINALS
(just in case)

XII. 12TH DOCUMENT (PPT)
CORRELATION
• A connection or association between two variables
• A statistical term describing the degree to which two variables move in coordination with one another
• Also refers to the statistical measure that expresses the extent to which two variables are linearly related
CLASSIFYING CORRELATIONS
• A positive correlation means a direct relationship ($0 < r \le 1$).
o As x increases, y also increases.
• A negative correlation means an inverse relationship ($-1 \le r < 0$).
o As x increases, y decreases.
• A perfect correlation happens when the correlation coefficient is equal to −1 or 1.
o This means that the change in one variable is exactly proportional to the change in the other.
VISUALIZING CORRELATION
• Positive – going upward
• Negative – going downward
• No correlation – just level (flat)
TESTS FOR CORRELATION
A. Non-Parametric
• Spearman Rank Correlation (Spearman's rho)
o $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, where d = difference between the ranks of each pair
B. Parametric
• Pearson Product-Moment Correlation Coefficient (or Pearson's r)
o $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
o Attempts to draw a line of best fit through the data of two variables
• Pearson's correlation coefficient, r, indicates how far away all these data points are from this line of best fit.
INTERPRETING Pearson's r
r                      Interpretation
±0.805 to ±0.995       Very High
±0.605 to ±0.795       High
±0.405 to ±0.595       Moderate/Substantial
±0.205 to ±0.395       Low/Slight
±0.005 to ±0.195       Negligible
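Pearson's r in its n∑xy form, Spearman's rank formula, and a lookup against the interpretation table above can be combined into a minimal Python sketch. Assumptions: the function names are hypothetical; tied values get averaged ranks (with ties, the 6∑d² formula only approximates Spearman's rho); |r| values that fall in the small gaps between table rows (such as 0.80) are assigned to the lower row; and anything below 0.205 is labeled Negligible.

```python
import math


def pearson_r(x, y):
    # r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Lower bound of each row of the "Interpreting Pearson's r" table
SCALE = [(0.805, "Very High"), (0.605, "High"),
         (0.405, "Moderate/Substantial"), (0.205, "Low/Slight")]


def interpret(r):
    size = abs(r)
    for lower, label in SCALE:
        if size >= lower:
            return label
    return "Negligible"


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1          # average of 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    # r_s = 1 - 6Σd² / [n(n² - 1)], d = difference between paired ranks
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])     # hypothetical pairs
    print(round(r, 2), interpret(r))                     # 0.77 High
    print(spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
```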
STEPS IN USING PEARSON'S CORRELATION
1. Organize the data into 5 columns, such that the first 2 columns are filled with every pair of x and y values.
2. Fill the remaining columns accordingly:
o Column 3 (XY): products of the pairs of x and y values
o Column 4 ($X^2$): squares of the x-values per row
o Column 5 ($Y^2$): squares of the y-values per row
3. Get the sum of each column.
4. Substitute the sums into their respective variables in the formula and simplify. Calculated r-values may be rounded off to at least 2 decimal places.

XIII. 13TH DOCUMENT (PPT)
AREA UNDER THE STANDARD NORMAL DISTRIBUTION
NORMAL DISTRIBUTION
• aka Gaussian distribution
• The most important probability distribution in statistics for independent, random variables
PROPERTIES OF A NORMAL DISTRIBUTION
• The distribution curve is bell-shaped.
• The curve is symmetrical about its center.
• The mean, median, and mode coincide at the center.
• The tails of the curve flatten out indefinitely along the horizontal axis but never touch it (the curve is asymptotic).
• Area under the curve = 1
EMPIRICAL RULES FOR A NORMAL DISTRIBUTION
• Approximately 68% of the data lie within 1 SD of the mean.
• Approximately 95% of the data lie within 2 SD of the mean.
• Approximately 99.7% of the data lie within 3 SD of the mean.
FOUR-STEP PROCESS OF FINDING THE AREA UNDER THE NORMAL CURVE (given a z-value)
1. Express the given z-value in 3-digit form (e.g., 1.96).
2. Using the z-table, find the first 2 digits in the left column.
3. Match the 3rd digit with the appropriate column on the right.
4. Read the area (or probability) at the intersection of the row and column.

XIV. 14TH DOCUMENT
DIFFERENT KINDS OF EVENTS
PROBABILITY
• Sometimes called the game of chance
• Refers to the likelihood of an event occurring
CONCEPTS TOWARDS SOLVING PROBABILITY
A. Sample Space (S) – consists of all possible outcomes of an experiment
B. Sample Point – each outcome in S
C. Event E – a subset of S
DIFFERENT KINDS OF EVENTS
1. Mutually Exclusive Events – events that cannot happen at the same time
Theorem 1. The probability of A and B occurring together equals 0 (impossible).
$P(A \text{ and } B) = 0$
Theorem 2. The probability of A or B is the sum of the individual probabilities.
$P(A \text{ or } B) = P(A) + P(B)$
2. Independent Events – the result of the 2nd event is not affected by the result of the 1st event
Theorem 3. If A and B are independent events, the probability of both events occurring is the product of the probabilities of the individual events.
$P(A \text{ and } B) = P(A) \cdot P(B)$
3. Dependent Events – the result of one event is affected by the result of another event
Theorem 4. If A and B are dependent events, the probability of both events occurring is the product of the probability of the first event and the probability of the second event once the first event has occurred.
$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$

XV. 15TH DOCUMENT (PPT)
PROBABILITY DISTRIBUTION
DISTRIBUTION:
A. Frequency Distribution – a listing of the observed/actual frequencies of all outcomes of an experiment that actually occurred when the experiment was done
B. Probability Distribution
• A listing of the probabilities of all the possible outcomes that could occur if the experiment was done
• Can be described as a:
o Diagram (probability tree)
o Table
o Mathematical formula
TYPES OF PROBABILITY DISTRIBUTION
Probability Distribution
• Discrete PD: Binomial Distribution, Poisson Distribution
• Continuous PD: Normal Distribution
Discrete Distribution – the random variable can take only a limited (countable) number of values
Ex: number of heads in 2 tosses
Continuous Distribution – the random variable can take any value in a range
Ex: height of the students in class
PROBABILITY
• How frequently we expect different outcomes to occur if we repeat the experiment over and over ("frequentist" view)
Random Variable (x)
• Takes on a defined set of values with different probabilities
Types of Random Variable
• Discrete – has a countable number of outcomes
• Continuous – has an infinite continuum of possible values
Probability Functions P(x)
• Map the possible values of x against their respective probabilities of occurrence
• P(x) ranges from 0 to 1.0
Cumulative Distribution Function (CDF)
• Obtained by adding the probabilities of all values up to and including x

XVI. VIDEO (just watch it, since it is all problem-solving, and the presenter explains it better)

XVII. DOCUMENT (PPT)
HYPOTHESIS
• An "educated" guess
• A predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable
SYMBOLS APPLICABLE
Symbol   Meaning
H₀       Null hypothesis
Hₐ       Alternative hypothesis
α        Greek letter alpha – probability of committing a Type I error (aka Level of Significance)
β        Greek letter beta – probability of committing a Type II error
Δ        Greek letter delta – used for the test statistic
x̄        Sample mean
n        Sample size
s        Standard deviation
t        t-distribution – used when the population SD is unknown
μ        Greek letter mu – mean of the normal population
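The z-table lookup in the four-step process (XIII. 13TH DOCUMENT) can be cross-checked numerically, because the cumulative area of the standard normal curve can be written with the error function: Φ(z) = ½[1 + erf(z/√2)]. A minimal sketch using only `math.erf`; printed z-tables come in two styles (area from 0 to z, or total area to the left of z), so both are shown, and the function names are hypothetical.

```python
import math


def phi(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_0_to_z(z):
    """Area between the mean (z = 0) and z, as listed in many printed z-tables."""
    return abs(phi(z) - 0.5)


def within(k):
    """Area within k standard deviations of the mean."""
    return phi(k) - phi(-k)


if __name__ == "__main__":
    print(round(phi(1.96), 4), round(area_0_to_z(1.96), 4))   # 0.975 0.475
    for k in (1, 2, 3):
        print(k, round(within(k), 4))   # 0.6827, 0.9545, 0.9973 (empirical rule)
```

The z = 1.96 line also shows where the "2 regions of area 0.025" for α = 5% in a two-tailed test come from: 1 − 0.975 = 0.025 in each tail.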
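Theorems 1 to 4 in XIV. 14TH DOCUMENT can be verified by brute-force counting over a sample space with exact fractions. A minimal sketch: the dice and card examples and the helper `prob` are illustrative assumptions, not taken from the notes (in the card deck, rank 0 stands for the ace).

```python
from fractions import Fraction
from itertools import permutations, product


def prob(event, space):
    """Classical probability: outcomes in the event / all outcomes in S."""
    space = list(space)
    return Fraction(sum(1 for s in space if event(s)), len(space))


die = range(1, 7)

# Theorems 1 and 2: mutually exclusive events on a single roll
p_a = prob(lambda s: s == 1, die)
p_b = prob(lambda s: s == 2, die)
p_a_and_b = prob(lambda s: s == 1 and s == 2, die)    # cannot both happen
p_a_or_b = prob(lambda s: s in (1, 2), die)

# Theorem 3: independent events, two rolls of a die
two_rolls = list(product(die, repeat=2))
p_both_six = prob(lambda s: s[0] == 6 and s[1] == 6, two_rolls)

# Theorem 4: dependent events, two cards drawn without replacement
deck = [(rank, suit) for rank in range(13) for suit in range(4)]
two_cards = list(permutations(deck, 2))               # ordered draws, no repeats
p_two_aces = prob(lambda s: s[0][0] == 0 and s[1][0] == 0, two_cards)

if __name__ == "__main__":
    print(p_a_and_b, p_a_or_b, p_a + p_b)             # 0 1/3 1/3
    print(p_both_six, p_a * p_a)                      # 1/36 1/36
    print(p_two_aces, Fraction(4, 52) * Fraction(3, 51))   # 1/221 1/221
```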
Null Hypothesis – H₀
• An assertion that we hold as true unless we have sufficient statistical evidence to conclude otherwise
• $H_0: \mu = \mu_0$
Alternative Hypothesis – Hₐ
• Negation of the null hypothesis
• $H_a: \mu \neq \mu_0$ – "not equal"
• $H_a: \mu > \mu_0$ – "better than" (greater than)
• $H_a: \mu < \mu_0$ – "less than"

Level of Significance and Confidence
Significance – α
• The percentage risk of rejecting the null hypothesis when it is true
• Generally taken as 1%, 5%, or 10%
Confidence – (1 − α)
• The probability of not rejecting the null hypothesis when it is true

Risk of Rejecting a Null Hypothesis When It Is True
Designation     Risk α          Confidence (1 − α)
Supercritical   0.001 (0.1%)    0.999 (99.9%)
Critical        0.01 (1%)       0.99 (99%)
Important       0.05 (5%)       0.95 (95%)
Moderate        0.10 (10%)      0.90 (90%)

HYPOTHESIS TESTING – refers to:
1. Making an assumption (hypothesis)
2. Collecting sample data
3. Calculating a sample statistic
4. Evaluating the hypothesis (how likely it is that our hypothesized parameter is correct)
To test the validity of our assumption, we determine the difference between the hypothesized parameter value and the sample value.
RESULT of hypothesis testing is either:
• Reject H₀ in favor of Hₐ
• Do not reject H₀

SELECTING AND INTERPRETING Significance Level α:
1. Deciding on a criterion for accepting or rejecting H₀. Example: α = 5%
2. α refers to the percentage of sample means that fall outside certain prescribed limits.
For the example:
• We reject H₀ if the sample mean falls in the 2 regions of area 0.025.
• Do not reject H₀ if it falls within the region of area 0.95.
3. The higher the α, the higher the probability of rejecting H₀ when it is true.

TYPE I & TYPE II ERRORS
1. TYPE I ERROR
o Situation when we reject H₀ when it is true
o Probability (Type I error) = α
2. TYPE II ERROR
o Situation when we accept H₀ when it is false
o Probability (Type II error) = β
EXAMPLE:
Given: $H_0: \mu = 300 \text{ days}$
$H_a: \mu > 300 \text{ days}$
Answer:
a. It is better to make a Type II error (where H₀ is false: actually μ > 300 days, but we accept H₀ and assume that μ = 300 days).
b. Since it is better to make a Type II error, we shall choose a low α. Increasing α increases the chances of making a Type I error.

A. ONE-TAILED TEST
• A one-sided test
• A statistical hypothesis test in which the values for which we can reject H₀ are located entirely in one tail of the probability distribution
1. LOWER-Tailed Test
o Will reject H₀ if the sample mean is significantly lower than the hypothesized mean
o $H_0: \mu = \mu_0$; $H_a: \mu < \mu_0$
2. UPPER-Tailed Test
o Will reject H₀ if the sample mean is significantly higher than the hypothesized mean
o $H_0: \mu = \mu_0$; $H_a: \mu > \mu_0$
B. TWO-TAILED TEST
o Will reject H₀ if the sample mean is significantly higher or lower than the hypothesized mean
o $H_0: \mu = \mu_0$; $H_a: \mu \neq \mu_0$
(Figure: Two-Tailed Test at α = 5%)

HYPOTHESIS TEST for Population Mean:
Given: $H_0: \mu = \mu_0$
x = {8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, 8.0} pounds
n = 12
μ₀ = 10 pounds
α = 5%
1. Solve for the test statistic:
a. Solve for the sample mean
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = 8.11667$
b. Solve for the standard deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
s = 2.38287
c. Now, solve Δ
$\Delta = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$
Δ = −2.74
2. Look for the critical value.
Critical t-value = $-t_{df} = -t_{n-1,\ \alpha}$, with df = 12 − 1 = 11
Using the t-table, that is equal to −2.201.
o Note: 2.201 is the t-table entry for df = 11 with 0.025 in one tail (the two-tailed 5% cutoff); the one-tailed 5% cutoff for df = 11 is 1.796. Δ = −2.74 falls below both, so the conclusion is unchanged.
3. Compare Δ and the critical value.
−2.74 < −2.201
4. Conclude:
Since $\Delta < -t_{n-1,\ \alpha}$, we reject H₀ and conclude that the program is overstating.
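The worked example above can be reproduced in a few lines of Python. This is a minimal sketch: s uses n in the denominator to match the notes, and the critical values are hard-coded standard t-table entries for df = 11, since the Python standard library has no t-distribution. For comparison, it also computes Δ with the n − 1 sample standard deviation against the one-tailed 5% cutoff; both versions reject H₀.

```python
import math

# Data from the worked example (pounds)
x = [8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, 8.0]
mu0 = 10
n = len(x)

xbar = sum(x) / n                                     # step 1a: sample mean
s = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)    # step 1b: n in the denominator
delta = math.sqrt(n) * (xbar - mu0) / s               # step 1c: test statistic

# Standard t-table entries for df = n - 1 = 11
T_11_0025 = 2.201   # 0.025 in one tail: the value used in the notes
T_11_005 = 1.796    # 0.05 in one tail: one-tailed test at alpha = 5%

reject = delta < -T_11_0025                           # steps 3-4: lower-tailed comparison

# Same statistic using the n - 1 (sample) standard deviation
s_sample = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
delta_sample = math.sqrt(n) * (xbar - mu0) / s_sample

if __name__ == "__main__":
    print(round(xbar, 5), round(s, 5), round(delta, 2), reject)   # 8.11667 2.38287 -2.74 True
    print(round(delta_sample, 2), delta_sample < -T_11_005)       # -2.62 True
```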
