
STAT 1 Notes by F5XS Elementary Statistics

The term statistics came from the Latin phrase “ratio status”, which means the study of practical politics or the statesman’s art. In the middle of the 18th century, the German term statistik was used, defined as “the political science of several countries.” From statistik came statistics, defined as a statement in figures and facts of the present condition of a state.

The word statistics can be viewed in two contexts. In the singular sense, statistics is a science concerned with the collection, organization, presentation, analysis, and interpretation of data. In the plural sense, statistics is a collection of facts and figures, or processed data.

Two broad categories of statistics
• descriptive statistics
  • used to describe a mass of data in a clear, concise, and informative way
  • deals with the methods of organizing, summarizing, and presenting data
• inferential statistics
  • concerned with making generalizations about the characteristics of a larger set when only a part of it is examined

Data
• facts and figures that are collected, presented, and analyzed
• can be numeric or non-numeric
• must be contextualized

The universe is the collection or set of all individuals or entities whose characteristics are to be studied. A finite universe is one whose elements can be counted for a given time period, while an infinite universe is one whose number of elements is unlimited.

A variable is an attribute or characteristic of interest measurable on each and every unit of the universe. A qualitative variable assumes values that are not numerical but can be categorized. The categories of a qualitative variable may be identified by either non-numerical descriptions or numeric codes. A quantitative variable indicates the quantity or amount of a characteristic. Its data are always numeric and can be discrete or continuous. A discrete quantitative variable has a finite or countable number of possible values, while a continuous quantitative variable can assume any value in a given interval.

The population is the set of all possible values of the variable. A sample is a subset of the population or universe.

Data may be classified into four hierarchical levels of measurement:
• nominal
  • data collected are labels, names, or categories
  • frequencies or counts of observations belonging to the same category can be obtained
  • lowest level of measurement
• ordinal
  • data collected are labels with implied ordering


  • the difference between two data labels is meaningless
• interval
  • data can be ordered or ranked
  • the difference between two data values is meaningful
  • data at this level may lack an absolute zero point
• ratio
  • data have all the properties of the interval scale
  • the number zero indicates the absence of the characteristic being measured
  • highest level of measurement

Methods of collecting data
• objective method
  • the data are collected through measurement, counting, or observation
  • requires the use of a measuring or counting instrument
• subjective method
  • the information is provided by identified respondents
  • the instrument used to gather data may take the form of a questionnaire
  • the researcher collects data by:
    • conducting personal interviews, either face-to-face or by telephone
    • gathering responses using mailed questionnaires
• use of existing records
  • uses data that were previously collected by another person or institution for some other purpose

Types of data
• primary data are acquired directly from the source
• secondary data are not acquired directly from the source

Methods of data presentation
• textual
  • a narrative description of the characteristics of the universe or population, giving the highlights of the data collected and organized
• tabular
  • data are organized into classes or categories by rows and/or columns, with the appropriate pieces of information in the cells of the table
  • relatively more information can be presented, and trends can be easily seen
  • some details are lost when data are summarized in tabular form
• graphical
  • provides a visual presentation of the distributional properties of the data
  • most efficient way of presenting trends
  • some details are lost in using this type of presentation
  • examples: pie chart, bar chart, line graph, scatter plot

Parts of a statistical table
• table heading – includes the table number and title
• caption – designates the information contained in the columns
• body – main part of the table containing the information or figures presented
• stubs/classes – categories that describe the data, usually found at the left side of the table


Stem-and-leaf plot
• presents data in ordered form and gives an idea of the shape of the distribution of a set of quantitative data
• combines the grouping of a frequency distribution and the pictorial display of a histogram
• best for a small number of observations with values greater than zero
• also called a stemplot

Steps in constructing a stem-and-leaf plot
1] arrange the data in ascending or descending order
2] split each datum into a leaf value, which is the last digit, and a stem value, which consists of the remaining digits
3] list the stems vertically in increasing or decreasing order
4] draw a vertical line to the right of the stems
5] for each stem, write its leaves to the right of the vertical line in ascending order

Important facts about a stemplot
• reveals the center of the distribution
• illustrates the overall shape of the distribution (such as symmetry and spread)
• shows marked deviations from the overall shape
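The five construction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stem_and_leaf` and the data are made up, and the sketch handles non-negative integers only, with the last digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the text rows of a stemplot for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):                # step 1: arrange in ascending order
        stem, leaf = divmod(v, 10)          # step 2: leaf = last digit, stem = the rest
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):    # step 3: stems in increasing order
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())   # steps 4-5: bar, then sorted leaves
    return rows

scores = [12, 15, 21, 23, 23, 35, 38, 41]   # made-up data
for row in stem_and_leaf(scores):
    print(row)
```

Stems with no observations are still printed (with an empty leaf list) so that gaps in the distribution stay visible.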

Descriptive Measures
• quantities used to summarize the characteristics of a universe or population
• some of these are measures of location, dispersion, skewness, and kurtosis

Measures of location
• summarize a data set by giving a “typical value” within the range of the data values that describes its location relative to the entire data set
• some common measures
  • minimum and maximum: the minimum is the smallest value in the data set, denoted by MIN; the maximum is the largest value in the data set, denoted by MAX
  • measures of central tendency
    • values about which the set of observations tend to cluster
    • also called averages
    • most commonly used averages: mean, median, and mode
  • percentiles, deciles, quartiles (fractiles)

Mean
• the sum of all observations in the data set divided by the total number of observations
• $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $x_i$ is the $i$th observation of the variable $x$ and $N$ is the total number of observations in the data set
  o there is only one mean for a given data set
  o it is defined only for quantitative data
  o it reflects the magnitude of every observation
  o it is easily affected by the presence of extreme values


  o the sum of the deviations from the mean is equal to zero
  o the means of different sets/groups of comparable data may be combined when properly weighted

Median
• the middle value when the data values are arranged in ascending or descending order of magnitude
• $Md = X_{\left(\frac{N+1}{2}\right)}$ if $N$ is odd; $Md = \frac{X_{\left(\frac{N}{2}\right)} + X_{\left(\frac{N}{2}+1\right)}}{2}$ if $N$ is even
  o there is only one median for a data set
  o it is not amenable to further computations
  o the sum of the absolute deviations of the observations from a value, say $c$, is smallest when $c$ is equal to the median; $\sum_{i=1}^{N} |x_i - c|$ is minimum when $c = Md$

Mode
• the value in the data set which occurs most frequently
  o it may or may not exist
  o if it exists, there can be more than one mode for a given data set
  o it is determined by the frequency and not by the values of the observations
  o it is applicable to both quantitative and qualitative data

Fractiles (percentiles, in this scenario)
• numerical measures that give the position of a data value relative to the entire data set
• the $j$th percentile, denoted as $P_j$, is the data value that separates the bottom $j\%$ of the data from the top $(100 - j)\%$
• finding the $j$th percentile
  1] arrange the data values in ascending order to form an array
  2] find the location of $P_j$ in the array by computing $L = \frac{j}{100} \times N$, where $N$ is the total number of data values and $j$ is the percentile of interest
  3] a] if $L$ is a whole number, then $P_j$ is the mean of the data values in position $L$ and position $L + 1$
     b] if $L$ is not a whole number, then $P_j$ is taken as the data value in the next higher whole number position

Measure of Dispersion
• a quantity that describes the spread or variability of the observations in a given data set
• the higher the value, the greater the variability in the data set
• absolute measures of dispersion: range, inter-quartile range, variance, standard deviation
• relative measure of dispersion: coefficient of variation

Range ($R$)
• the difference between the maximum and minimum values in a data set
• $R = \text{MAX} - \text{MIN}$
• quick and easy to understand

• a rough measure of dispersion
• usually reported together with the median

Inter-Quartile Range ($IQR$)
• the difference between the third quartile and the first quartile
• $IQR = Q_3 - Q_1$
• not affected by the presence of extreme values
• not as easy to calculate as the range

Variance ($\sigma^2$)
• the average squared difference of the observations from the mean
• $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• one of the most useful measures of dispersion
• all observations contribute to the computation
• always non-negative
• expressed in the square of the unit of measure of the given set of values

Standard Deviation ($\sigma$)
• roughly, the average deviation of the observations from the mean
• the positive square root of the variance
• $\sigma = \sqrt{\sigma^2}$
• the unit of measure is the same as that of the observations
• usually reported with the mean

Coefficient of Variation ($CV$)
• a relative measure that indicates the magnitude of variation relative to the magnitude of the mean, expressed in percent
• $CV = \frac{\sigma}{\mu} \times 100\%$
• this measure of dispersion is unitless
• used to compare the dispersion of two or more data sets with the same or different units
• the higher the $CV$, the more variable the data set is relative to its mean

Chebyshev’s Rule
• permits us to make statements about the percentage of observations that must be within a specified number of standard deviations from the mean
• the proportion of any distribution that lies within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where $k$ is any positive number larger than 1
• for any data set with mean $\mu$ and standard deviation $\sigma$, the following statements apply:
  • at least 75% of the observations are within $2\sigma$ of its mean
  • at least 88.9% of the observations are within $3\sigma$ of its mean
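As a rough illustration of the measures of location and dispersion above, the sketch below applies the formulas to a small made-up population. The helper names (`percentile`, `chebyshev_bound`) are assumptions of this example, and `percentile` follows the location rule $L = \frac{j}{100} \times N$ from these notes, which can differ from the interpolation rules used by statistical software.

```python
import math

def percentile(data, j):
    """j-th percentile (integer j, 0 < j < 100) by the location rule in these notes."""
    x = sorted(data)
    pos, rem = divmod(j * len(x), 100)    # L = (j/100) * N as whole part and remainder
    if rem == 0:                          # L is whole: mean of positions L and L + 1
        return (x[pos - 1] + x[pos]) / 2
    return x[pos]                         # otherwise: the next higher position

data = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up population, for illustration only
N = len(data)
mu = sum(data) / N                                    # mean
variance = sum((x - mu) ** 2 for x in data) / N       # population variance
sigma = math.sqrt(variance)                           # standard deviation
cv = sigma / mu * 100                                 # coefficient of variation, in percent
data_range = max(data) - min(data)
median = percentile(data, 50)
iqr = percentile(data, 75) - percentile(data, 25)

def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

# Observed proportion within 2 sigma; Chebyshev guarantees at least 0.75
within_2_sigma = sum(abs(x - mu) <= 2 * sigma for x in data) / N

print(mu, median, data_range, iqr, variance, sigma, cv)
print(chebyshev_bound(2), chebyshev_bound(3), within_2_sigma)
```

The Chebyshev bound is only a guaranteed minimum; the observed proportion for a particular data set is usually well above it.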

A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the “mirror image” of the distribution to the right of the mean.

Measure of Skewness
• describes the degree of departure of the distribution of the data from symmetry
• $SK = \frac{3(\mu - Md)}{\sigma}$


• a symmetric distribution has $SK = 0$ since its mean is equal to its median and its mode

Measure of Kurtosis
• describes the extent of peakedness or flatness of the distribution of the data
• $K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N\sigma^4} - 3$

Box-and-Whiskers Plot
• indicates the symmetry of the distribution and incorporates measures of location to describe the variability of the observations
• used for identifying outliers
• the diagram is made up of a box which lies between the first and third quartiles
• the whiskers are straight lines extending from the ends of the box to the smallest and largest values that are not outliers, i.e. values that lie within $Q_1 - 1.5\,IQR$ and $Q_3 + 1.5\,IQR$
• the outliers are denoted by dots
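The skewness and kurtosis formulas and the box-plot outlier limits can be sketched as follows. The function names and data are made up for illustration, and the quartiles are computed with the percentile location rule from these notes (an assumption of the sketch; software packages often use other quartile rules).

```python
import math

def percentile(data, j):
    """j-th percentile (integer j, 0 < j < 100) by the location rule L = (j/100) * N."""
    x = sorted(data)
    pos, rem = divmod(j * len(x), 100)
    return (x[pos - 1] + x[pos]) / 2 if rem == 0 else x[pos]

def skewness(data):
    """SK = 3 (mu - Md) / sigma."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return 3 * (mu - percentile(data, 50)) / sigma

def kurtosis(data):
    """K = sum (x - mu)^4 / (N sigma^4) - 3."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    return sum((x - mu) ** 4 for x in data) / (n * sigma2 ** 2) - 3

def fences(data):
    """Limits Q1 - 1.5 IQR and Q3 + 1.5 IQR beyond which values count as outliers."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    lo, hi = fences(data)
    return [x for x in data if x < lo or x > hi]

data = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up data
print(skewness(data), kurtosis(data))
print(fences(data), outliers(data + [14]))
```

A positive `skewness` here reflects a mean above the median (a longer right tail), and a negative `kurtosis` a flatter-than-normal shape.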

A random experiment is a process of drawing observations that can be repeated under the same conditions, with well-defined possible outcomes.

Sample space
• set or collection of all possible outcomes of a random experiment
• may be either finite or infinite
• elements of the sample space are referred to as sample points

Event
• a subset of the sample space
• may be either simple or compound
• observing an element of an event indicates the occurrence of the event

A probability is a numerical value ranging from 0 to 1 that measures the likelihood of an event occurring. There are three approaches to assigning probability:
• a priori approach
  • utilizes an experimental model whose underlying assumptions are used to measure the likelihood of an event
  • assumptions are conditions on the likelihood of an event
  • for a random experiment with the assumption of an equally-likely sample space, the probability of event $E$, denoted as $P[E]$, is defined as $P[E] = \frac{N[E]}{N[S]}$, where $N[E]$ is the number of elements of event $E$ and $N[S]$ is the total number of possible outcomes of the random experiment
  • for a random experiment with the assumption of an unequally-likely sample space, the probability of event $E$ is defined as $P[E] = \sum_{\omega_j \in E} P[\{\omega_j\}] = \sum_{\omega_j \in E} P_j$, where $P[\{\omega_j\}]$ or $P_j$ is the probability of the $j$th element of the event
• a posteriori approach
  • utilizes the relative frequency of the occurrence of an event in repeated trials of the random experiment as the probability of the event
  • $P[E] = \frac{\text{number of occurrences of } E}{\text{number of trials}}$


• subjective approach
  • utilizes one’s personal judgment and knowledge in assessing how likely an event is to occur

Properties of $P[E]$
• $0 \le P[E] \le 1$
• for a random experiment with sample space $S$, $P[S] = 1$
• if $E_1, E_2, E_3, \ldots, E_n$ are mutually disjoint events in $S$, then $P[E_1 \cup E_2 \cup E_3 \cup \cdots \cup E_n] = P[E_1] + P[E_2] + P[E_3] + \cdots + P[E_n]$

Given the probability of an event $E$, we say that $E$ is a sure event if $P[E] = 1$, while it is an impossible event if $P[E] = 0$. The complement of an event $E$, denoted by $E^C$, is the set of all outcomes in the sample space $S$ that are not in $E$; $P[E^C] = 1 - P[E]$.

Two events $E_1$ and $E_2$ are said to be mutually exclusive if they have no common element; thus they cannot happen simultaneously. Two events $E_1$ and $E_2$ are said to be independent if the likelihood of the occurrence of $E_1$ is not affected by the occurrence of $E_2$.

The union of two events consists of the elements of $E_1$ that are not in $E_2$, the elements of $E_2$ that are not in $E_1$, and the elements of both $E_1$ and $E_2$. The intersection of two events consists of the elements found in both $E_1$ and $E_2$. Observing the intersection of two events implies the simultaneous occurrence of the two events.

The probability of the union (sum) of two events $E_1$ and $E_2$ in $S$ is defined mathematically as $P[E_1 \cup E_2] = P[E_1] + P[E_2] - P[E_1 \cap E_2]$.

The conditional probability of $E_1$ given $E_2$ is defined mathematically as $P[E_1 \mid E_2] = \frac{P[E_1 \cap E_2]}{P[E_2]}$, $P[E_2] > 0$.
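To make the rules above concrete, here is a small sketch using one roll of a fair die as an equally-likely sample space. The events `E1` and `E2` and the helper `P` are made up for this illustration; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}     # sample space: one roll of a fair die (equally likely outcomes)
E1 = {2, 4, 6}             # event: the roll is even
E2 = {4, 5, 6}             # event: the roll is greater than 3

def P(event):
    """A priori probability N[E] / N[S] for an equally-likely sample space."""
    return Fraction(len(event & S), len(S))

union = P(E1) + P(E2) - P(E1 & E2)     # P[E1 u E2] by the addition rule
conditional = P(E1 & E2) / P(E2)       # P[E1 | E2], valid because P[E2] > 0
complement = 1 - P(E1)                 # P[E1^C]

print(union, conditional, complement)
```

Here `union` agrees with counting the elements of `E1 | E2` directly, which is a quick way to check the addition rule by hand.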

A random variable is a rule or function that assigns exactly one real number to every possible outcome of a random experiment. Discrete random variables take on a finite set of distinct possible values or a countably infinite number of possible values. Continuous random variables take on any value within a specified interval or continuum of values.

Probability of values of $X$ in a binomial experiment:
$P[X = k] = \binom{n}{k} p^k (1 - p)^{n-k}$
where $n$ is the number of trials, $k$ is the number of “successes” (the value $x$ taken by $X$), and $p$ is the probability of getting a “success” on each trial.

Probability of values of $X$ in a random experiment without replacement:
$P[X = k] = \frac{\binom{s}{k}\binom{t-s}{n-k}}{\binom{t}{n}}$
where $n$ is the number of trials, $k$ is the number of “successes”, $s$ is the number of elements under “success”, and $t$ refers to the total number of elements.
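Both probability formulas translate directly into code. The sketch below uses `math.comb` for the binomial coefficients; the function names are illustrative, and the arguments follow the symbols in the notes ($n$, $k$, $p$, $s$, $t$).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P[X = k] for k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, n, s, t):
    """P[X = k] when n of t elements are drawn without replacement and s are 'successes'."""
    return comb(s, k) * comb(t - s, n - k) / comb(t, n)

# Made-up examples
print(binomial_pmf(2, 3, 0.5))            # exactly 2 heads in 3 tosses of a fair coin
print(hypergeometric_pmf(1, 3, 4, 10))    # exactly 1 success when drawing 3 from 10 (4 successes)
```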


Expected Value of a Random Variable
• denoted by $\mu_X$ or $E[X]$
• interpreted as the long-run average of a random variable
• for a discrete random variable, it is computed as $\mu_X = E[X] = \sum_{\text{all } x} x\,P[X = x]$

Standard Deviation of a Random Variable
• denoted by $\sigma_X$
• measures the average deviation of the values of the random variable from its mean
• for a discrete random variable, it is computed as $\sigma_X = \sqrt{\sum_{\text{all } x} (x - \mu_X)^2\,P[X = x]}$

Probability distribution of a continuous random variable
• the probability that a continuous random variable takes on an exact value is zero, i.e. $P[X = x] = 0$
• the probability distribution is specified by a function $f(x)$ from which probability statements are made

Properties of a probability density function
• the total area under the curve is 1
• the probability that the random variable $X$ will take on a value between two quantities $x_1$ and $x_2$ is given by the area under the curve bounded by the lines $X = x_1$ and $X = x_2$
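For a discrete random variable, the expected value and standard deviation are weighted sums over the probability distribution. A minimal sketch, assuming $X$ is the number of heads in two tosses of a fair coin (a made-up example):

```python
import math

# Probability distribution of X = number of heads in two fair coin tosses
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

mu_x = sum(x * p for x, p in pmf.items())                               # E[X]
sigma_x = math.sqrt(sum((x - mu_x) ** 2 * p for x, p in pmf.items()))   # sd of X

print(mu_x, sigma_x)
```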

A continuous random variable $X$ is said to be a normal random variable if it follows a normal probability distribution specified by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

Properties of a normal curve
• it is bell-shaped and unimodal
• it is symmetric about $X = \mu$
• it is asymptotic to the $X$ axis
• the total area under the curve is 1
• it has a distribution with about
  • 68% of the observations within $(\mu - \sigma, \mu + \sigma)$
  • 95% of the observations within $(\mu - 2\sigma, \mu + 2\sigma)$
  • 99.7% of the observations within $(\mu - 3\sigma, \mu + 3\sigma)$

Standard Normal Distribution
• a normal distribution with mean equal to 0 and variance equal to 1
• the random variable which follows the standard normal distribution is referred to as the standard normal variate, denoted by $Z$

The $Z$ table summarizes the cumulative probability distribution for $Z$.

Rules in computing probabilities
• $P[Z = a] = 0$, hence $P[Z \le a] = P[Z < a]$
• $P[Z \le a]$ can be obtained directly from the $Z$ table
• $P[Z > a] = 1 - P[Z \le a]$

• $P[Z > -a] = P[Z < a]$
• $P[Z < -a] = P[Z > a]$
• $P[a_1 < Z < a_2] = P[Z < a_2] - P[Z < a_1]$

Transformation Theorem: Given a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$, $Z = \frac{X - \mu}{\sigma}$ follows the standard normal distribution.
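The rules for computing normal probabilities can be checked numerically with `statistics.NormalDist` from the Python standard library, which plays the role of the $Z$ table here. The values of `a`, `mu`, `sigma`, and `x` are made up for the example.

```python
from statistics import NormalDist

Z = NormalDist()                        # standard normal: mean 0, standard deviation 1
a = 1.96

p_le = Z.cdf(a)                         # P[Z <= a], what the Z table gives directly
p_gt = 1 - Z.cdf(a)                     # P[Z > a] = 1 - P[Z <= a]
p_between = Z.cdf(1) - Z.cdf(-1)        # P[-1 < Z < 1]

# Transformation theorem: X normal with mean 100 and standard deviation 15
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma                    # Z = (X - mu) / sigma
p_x = Z.cdf(z)                          # P[X <= 130] read from the standard normal

print(round(p_le, 4), round(p_gt, 4), round(p_between, 4), round(p_x, 4))
```

Standardizing first and then using the standard normal gives the same probability as working with the distribution of $X$ directly.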

Methods of drawing conclusions
• deductive method
  • draws conclusions from general to specific
  • assumes that any part of the universe will bear the observed characteristics of the universe
  • conclusions are stated with certainty
• inductive method
  • draws conclusions from specific to general
  • assumes that the characteristics observed from a part of the universe are likely to hold true for the whole universe
  • conclusions are subject to uncertainty

Inferential statistics makes use of the inductive method of drawing conclusions.

Reasons why sampling is done:
• reduced cost
• greater speed or timeliness
• greater efficiency and accuracy
• greater scope
• convenience
• necessity
• ethical considerations

Two types of samples
• probability samples
  • samples are obtained using some objective chance mechanism, thus involving randomization
  • require the use of a complete listing of the elements of the universe, called the sampling frame
  • probabilities of selection are known
  • generally referred to as random samples
  • allow drawing of valid generalizations about the universe/population
• non-probability samples
  • samples are obtained haphazardly, selected purposively, or taken as volunteers
  • the probabilities of selection are unknown
  • should not be used for statistical inference
  • result from the use of judgment sampling, accidental sampling, purposive sampling, and the like


Methods of probability sampling
• simple random sampling
  • most basic method of drawing a probability sample
  • assigns equal probabilities of selection to each possible sample
  • results in a simple random sample
  • simple random sampling without replacement does not allow repetitions of selected units in the sample, while simple random sampling with replacement allows repetitions of selected units in the sample
• stratified random sampling
  • the universe is divided into $L$ mutually exclusive sub-universes called strata
  • independent simple random samples are obtained from each stratum
  • advantages of stratification
    • gives a better cross-section of the population
    • simplifies the administration of the survey/data gathering
    • the nature of the population dictates some inherent stratification
    • allows one to draw inferences for various subdivisions of the population
    • generally increases the precision of the estimates
• systematic random sampling
  • adopts a skipping pattern in the selection of sample units
  • gives a better cross-section if the listing is linear in trend, but has a high risk of bias if there is periodicity in the listing of units in the sampling frame
  • allows the simultaneous listing and selection of samples in one operation
• cluster sampling
  • considers a universe divided into $N$ mutually exclusive sub-groups called clusters
  • a random sample of $n$ clusters is selected and their elements are completely enumerated
  • has simpler frame requirements
  • administratively convenient to implement
• simple two-stage sampling
  1] in the first stage, the units are grouped into $N$ sub-groups, called primary sampling units, and a simple random sample of $n$ primary sampling units is selected
  2] in the second stage, from each of the $n$ selected primary sampling units, with $M$ elements, a simple random sample of $m$ units, called secondary sampling units, is obtained

Sampling is a process that…
• can be repeatedly done under basically the same conditions
• can lead to well-defined possible outcomes
• is unpredictable

Stratified random sampling
• total number of units in the universe: $N = \sum_{i=1}^{L} N_i$
• total number of units in the stratified sample: $n = \sum_{i=1}^{L} n_i$
• number of units in a sample stratum with equal allocation: $n_i = \frac{n}{L}$
• number of units in a sample stratum with proportional allocation: $n_i = \frac{N_i}{N} \times n$
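The two allocation formulas for stratified random sampling can be sketched as below. The function names and the stratum sizes are made up, and the results are left unrounded; in practice the $n_i$ would be rounded to whole units.

```python
def equal_allocation(n, strata_sizes):
    """n_i = n / L: the same sample size in each of the L strata."""
    L = len(strata_sizes)
    return [n / L] * L

def proportional_allocation(n, strata_sizes):
    """n_i = (N_i / N) * n: sample size proportional to the stratum size."""
    N = sum(strata_sizes)
    return [N_i * n / N for N_i in strata_sizes]

strata = [500, 300, 200]                     # made-up stratum sizes N_i, so N = 1000
print(equal_allocation(90, strata))          # 30 units from each stratum
print(proportional_allocation(90, strata))   # 45, 27 and 18 units
```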

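| | SWR (sampling with replacement) | SWOR (sampling without replacement) |
|---|---|---|
| Number of possible samples | $N^n$ | $\binom{N}{n}$ |
| Probability of selecting each sample | $\frac{1}{N^n}$ | $\frac{1}{\binom{N}{n}}$ |

$\mu_{\bar{X}} = E[\bar{X}] = \mu$

$\sigma_{\bar{X}}^2 = V[\bar{X}] = \sum_{\text{all } \bar{x}} (\bar{x} - \mu_{\bar{X}})^2\,P[\bar{X} = \bar{x}] = \sum_{\text{all } \bar{x}} \bar{x}^2\,P[\bar{X} = \bar{x}] - \mu_{\bar{X}}^2$

| | SWR | SWOR |
|---|---|---|
| Mean of the sample mean | $\mu_{\bar{X}} = \mu$ | $\mu_{\bar{X}} = \mu$ |
| Variance of the sample mean | $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ | $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$ |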

The central limit theorem states that for any non-normal distribution with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean approaches the normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$ as the sample size $n$ increases (in these notes, a sample size of 25 is treated as large enough).
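The mean and variance of the sample mean under SWR and SWOR can be verified by listing every possible sample from a tiny population. This is a brute-force sketch with a made-up population of five values and samples of size 2; the helper names are assumptions of the example.

```python
from itertools import combinations, product

def mean(values):
    return sum(values) / len(values)

def pvar(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population = [1, 2, 3, 4, 5]                  # made-up universe
N, n = len(population), 2
mu, sigma2 = mean(population), pvar(population)

# Every equally likely sample, with and without replacement
swr_means = [mean(s) for s in product(population, repeat=n)]   # N**n ordered samples
swor_means = [mean(s) for s in combinations(population, n)]    # C(N, n) samples

print(len(swr_means), mean(swr_means), pvar(swr_means))     # variance equals sigma2 / n
print(len(swor_means), mean(swor_means), pvar(swor_means))  # variance includes (N-n)/(N-1)
```

In both schemes the mean of the sample means equals the population mean; only the variance differs, by the factor $\frac{N-n}{N-1}$.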

Estimation is concerned with finding a value or range of values for an unknown parameter.
• an estimator of a parameter is a rule or formula for computing a statistic from the sample data
  • usually denoted by a Greek letter with a “hat”, e.g. $\hat{\theta}$ and $\hat{\mu}$
  • in other cases, special symbols are used, like $\bar{X}$ for the sample mean as estimator of the population mean
• an estimate is a numerical value of the estimator

Some desirable properties of an estimator:
• an estimator must be accurate
  • accuracy measures the closeness of an estimate to the true value
  • the difference between the expected value of the estimator and the parameter, referred to as the bias, measures the accuracy of an estimator: $\text{BIAS}(\hat{\theta}, \theta) = E[\hat{\theta}] - \theta$
• an estimator must be precise
  • precision measures the closeness of the different possible values of the estimator to each other
  • the precision of an estimator can be measured by its variance or by its standard error
• bias and variance are combined in the mean squared error: $\text{MSE}(\hat{\theta}) = \text{BIAS}^2(\hat{\theta}, \theta) + \text{VAR}(\hat{\theta})$

When estimating, the following factors (specifications) must be known to determine the appropriate sample size:
• Level of confidence desired, $(1 - \alpha) \times 100\%$, also called the confidence coefficient, is the measure of confidence that the estimate obtained is near or the same as the true value of the parameter.
• Variability of the population being studied, $\sigma^2$, is a measure of how dispersed the population observations are from each other. A large sample is needed when the population is widely dispersed. When the population variability is unknown, it is estimated using information on the same or a related variable from previous studies.
• Maximum allowable error, $d$, also called the maximum tolerable error or margin of error, is the specified acceptable difference between the estimate and the parameter for a given level of confidence.

The sample size, $n$, necessary to meet the above specifications is $n \ge \left(\frac{Z_{\alpha/2}\,\sigma}{d}\right)^2$. However:
• the formula assumes drawing a simple random sample of size $n$ from a population of size $N$
• always adjust $n$ to $n^*$, where $n^* = \frac{Nn}{N + n}$; only a marginal difference will be noticed between $n$ and $n^*$ when $n$ is very small relative to $N$
• when determining the sample size for estimating proportions, take $\sigma = \sqrt{P(1 - P)}$, where $P$ is the “best” educated guess of the population proportion, usually equal to 0.50 because it is with this value that the maximum sample size is obtained

A point estimate is a single number computed from a random sample which represents a plausible value of the parameter. It pinpoints a location or a point in the distribution of possible values of the random variable.

| Parameter | SRS (simple random sampling) | StRS (stratified random sampling) |
|---|---|---|
| Population Mean $\mu$ | $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ | $\bar{X}_{st} = \sum_{i=1}^{L} w_i \bar{X}_i$ |
| Population Proportion $P$ | $\hat{P} = p = \frac{a}{n}$ | $\hat{P}_{st} = p_{st} = \sum_{i=1}^{L} W_i p_i$ |
| Population Variance $\sigma^2$ | $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$ | $s_{st}^2$ (formula not recoverable from the source) |
| Variance of the Sample Mean (with replacement) | $\hat{V}(\bar{X}) = \frac{s^2}{n}$ | $\hat{V}(\bar{X}_{st}) = \sum_{i=1}^{L} w_i^2 \frac{s_i^2}{n_i}$ |
| Variance of the Sample Mean (without replacement) | $\hat{V}(\bar{X}) = \frac{s^2}{n} \cdot \frac{N - n}{N - 1}$ | $\hat{V}(\bar{X}_{st}) = \sum_{i=1}^{L} w_i^2 \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}$ |
| Standard Error of the Sample Mean (with replacement) | $s_{\bar{X}} = \frac{s}{\sqrt{n}}$ | $s_{\bar{X}_{st}} = \sqrt{\sum_{i=1}^{L} w_i^2 \frac{s_i^2}{n_i}}$ |
| Standard Error of the Sample Mean (without replacement) | $s_{\bar{X}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ | $s_{\bar{X}_{st}} = \sqrt{\sum_{i=1}^{L} w_i^2 \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}}$ |

Here $a$ is taken to be the number of sample units having the characteristic of interest, and the subscript $i$ in the StRS column refers to the $i$th stratum.
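The sample-size formula $n \ge \left(\frac{Z_{\alpha/2}\,\sigma}{d}\right)^2$ and its adjustment $n^* = \frac{Nn}{N+n}$ can be sketched as follows. The function name and the example figures are made up, and rounding up before and after the adjustment is a choice of this sketch rather than something the notes prescribe.

```python
import math
from statistics import NormalDist

def sample_size(sigma, d, confidence=0.95, N=None):
    """Smallest n with n >= (Z_{alpha/2} * sigma / d)^2, optionally adjusted for N."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_{alpha/2}
    n = math.ceil((z * sigma / d) ** 2)         # round up to a whole unit
    if N is not None:
        n = math.ceil(N * n / (N + n))          # n* = N n / (N + n)
    return n

# Estimating a proportion: sigma = sqrt(P (1 - P)) with the guess P = 0.50
sigma = math.sqrt(0.5 * (1 - 0.5))
print(sample_size(sigma, d=0.05))               # no population size supplied
print(sample_size(sigma, d=0.05, N=2000))       # adjusted for a population of 2000
```

A smaller margin of error `d` or a higher confidence level increases the required sample size, as the formula suggests.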

An interval estimate is a range of values computed from a random sample, which represents an interval of plausible values for the unknown value of the parameter of the population. When some measure of certainty or confidence is attached to the interval estimate, the interval is referred to as a confidence interval estimate. The measure of certainty or confidence, also called the confidence level or confidence coefficient, provides information on how “confident” the researcher is in stating that the interval estimate obtained from the random sample contains the true value of the parameter.

Different cases of obtaining the confidence interval estimate of the population mean, $\mu$:
• If a continuous random variable $X$ is normally distributed with known variance $\sigma^2$, then a $(1 - \alpha) \times 100\%$ confidence interval about $\mu$ is $\bar{X} \mp Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
• If a continuous random variable $X$ is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$, then a $(1 - \alpha) \times 100\%$ confidence interval about $\mu$ is $\bar{X} \mp t_{\alpha/2,(n-1)} \frac{s}{\sqrt{n}}$.
• As the sample size $n$ becomes large ($n \ge 25$), a $(1 - \alpha) \times 100\%$ confidence interval about the mean $\mu$ of an approximately normally distributed random variable $X$ with unknown variance $\sigma^2$ is $\bar{X} \mp Z_{\alpha/2} \frac{s}{\sqrt{n}}$.

Estimation of the population proportion, $P$: A point estimator of $P$ is given in the previous table, with estimated standard error $s_p = \sqrt{\frac{p(1 - p)}{n}}$. A $(1 - \alpha) \times 100\%$ confidence interval about $P$ is $p \mp Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$. For estimating the population proportion, the sample size is considered large enough when $np(1 - p) > 3$.
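The $Z$-based intervals can be computed with the standard library alone, as in the sketch below. The function names and the input figures are made up. The $t$-based interval is left out because it needs a $t$ table (or a package such as SciPy) for $t_{\alpha/2,(n-1)}$.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Z_{alpha/2} for a (1 - alpha) * 100% confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_interval(xbar, sd, n, confidence=0.95):
    """xbar -/+ Z_{alpha/2} * sd / sqrt(n); sd is sigma if known, or s when n >= 25."""
    half_width = z_value(confidence) * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def proportion_interval(p, n, confidence=0.95):
    """p -/+ Z_{alpha/2} * sqrt(p (1 - p) / n)."""
    half_width = z_value(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print(mean_interval(xbar=50, sd=10, n=100))     # made-up summary statistics
print(proportion_interval(p=0.4, n=150))        # made-up sample proportion
```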

Hypothesis testing is a technique used to determine whether a specific conjecture about a parameter (or parameters) of the population under study will be accepted or rejected. A statistical hypothesis is an assertion about the value of the population parameter or the form of the distribution. In any test of hypothesis, there are two types of statistical hypotheses involved. The null hypothesis, denoted by Ho, is usually a statement of equality signifying no difference, no change, no relationship, or no effect. The alternative hypothesis, denoted by Ha, is a contrasting statement which is accepted if the sample data do not provide sufficient evidence to support the null hypothesis.

A test statistic is chosen to decide which hypothesis to accept or reject. A decision rule specifies the range of values of the test statistic which leads to the rejection of the null hypothesis in favor of the alternative hypothesis. The critical or rejection region defines the range of values of the test statistic that are very unlikely to be obtained when the null hypothesis is true and that will result in the rejection of the null hypothesis.


| Decision (based on sample evidence) | True state of the population: Ho is true | True state of the population: Ho is false |
|---|---|---|
| Reject Ho | Type I Error | Correct Decision |
| Fail to reject Ho | Correct Decision | Type II Error |

The probability of committing a Type I error, $P(\text{Type I Error}) = P(\text{Reject Ho} \mid \text{Ho is true}) = \alpha$, often called the level of significance, measures the risk of rejecting a true null hypothesis. On the other hand, the probability of making a correct decision when Ho is true, computed as $P(\text{Accept Ho} \mid \text{Ho is true}) = 1 - \alpha$, is known as the level of confidence.

The probability of committing a Type II error, $P(\text{Type II Error}) = P(\text{Accept Ho} \mid \text{Ho is false}) = \beta$, measures the risk of accepting a false null hypothesis, but can only be evaluated when the alternative hypothesis specifies an exact value. On the other hand, the probability of rejecting a false null hypothesis, computed as $P(\text{Reject Ho} \mid \text{Ho is false}) = 1 - \beta$, is known as the power of the test. In general, $\alpha + \beta \ne 1$, i.e. Type I and Type II errors are not complementary events.

For the test on one population mean, the test statistic is given by $t_c = \frac{\sqrt{n}\,(\bar{x} - \mu_o)}{s}$, which is distributed as the Student’s $t$-distribution with $df = n - 1$.

Assumptions of the test on the population mean:
• the sample is randomly taken from the population of interest
• the population is normally distributed

Decision Rule: At a specified $\alpha$, reject Ho if

| Ha | Test Procedure | when $n < 25$ | when $n \ge 25$ |
|---|---|---|---|
| Ha: $\mu \ne \mu_o$ | Two-tailed $t$-test | $\lvert t_c \rvert > t_{\alpha/2,(n-1)}$ | $\lvert t_c \rvert > Z_{\alpha/2}$ |
| Ha: $\mu > \mu_o$ | One-tailed $t$-test | $t_c > t_{\alpha,(n-1)}$ | $t_c > Z_{\alpha}$ |
| Ha: $\mu < \mu_o$ | One-tailed $t$-test | $t_c < -t_{\alpha,(n-1)}$ | $t_c < -Z_{\alpha}$ |

Otherwise, fail to reject Ho.
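A minimal sketch of the test on one population mean, assuming summary statistics are already available and that $n \ge 25$ so the $Z$ critical value applies (for smaller samples the critical value would come from a $t$ table). The function names and figures are made up for illustration.

```python
import math
from statistics import NormalDist

def t_statistic(xbar, s, n, mu0):
    """t_c = sqrt(n) * (xbar - mu0) / s."""
    return math.sqrt(n) * (xbar - mu0) / s

def reject_two_tailed(t_c, alpha=0.05):
    """Large-sample rule for Ha: mu != mu0, reject Ho if |t_c| > Z_{alpha/2}."""
    return abs(t_c) > NormalDist().inv_cdf(1 - alpha / 2)

def reject_upper_tailed(t_c, alpha=0.05):
    """Large-sample rule for Ha: mu > mu0, reject Ho if t_c > Z_alpha."""
    return t_c > NormalDist().inv_cdf(1 - alpha)

t_c = t_statistic(xbar=52, s=5, n=36, mu0=50)    # made-up sample summary
print(t_c, reject_two_tailed(t_c), reject_upper_tailed(t_c))
```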

For the test on one population proportion, the test statistic is given by $Z_c = \frac{\sqrt{n}\,(\hat{P} - P_o)}{\sqrt{P_o(1 - P_o)}}$, which is approximately distributed as standard normal.

Assumptions of the test on the population proportion:
• the mean of $\hat{P}$ is $\mu_{\hat{P}} = P$
• the standard deviation of $\hat{P}$ is $\sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}}$
• the shape of the distribution is approximately normal for large samples

Decision Rule: At a specified $\alpha$, reject Ho if

| Ha | Test Procedure | when $n \ge 25$ |
|---|---|---|
| Ha: $P \ne P_o$ | Two-tailed approximate $Z$-test | $\lvert Z_c \rvert > Z_{\alpha/2}$ |
| Ha: $P > P_o$ | One-tailed approximate $Z$-test | $Z_c > Z_{\alpha}$ |
| Ha: $P < P_o$ | One-tailed approximate $Z$-test | $Z_c < -Z_{\alpha}$ |

Otherwise, fail to reject Ho.
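The test on one population proportion follows the same pattern. The sketch below computes $Z_c$ and applies the three decision rules; the function names and the sample figures are made up.

```python
import math
from statistics import NormalDist

def z_statistic(p_hat, p0, n):
    """Z_c = sqrt(n) * (p_hat - P0) / sqrt(P0 * (1 - P0))."""
    return math.sqrt(n) * (p_hat - p0) / math.sqrt(p0 * (1 - p0))

def reject(z_c, alternative, alpha=0.05):
    """Decision rule for Ha: P != P0 ('two-sided'), P > P0 ('greater'), P < P0 ('less')."""
    z = NormalDist()
    if alternative == "two-sided":
        return abs(z_c) > z.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z_c > z.inv_cdf(1 - alpha)
    return z_c < -z.inv_cdf(1 - alpha)

z_c = z_statistic(p_hat=0.6, p0=0.5, n=100)     # made-up sample: 60 successes out of 100
print(z_c, reject(z_c, "two-sided"), reject(z_c, "greater"), reject(z_c, "less"))
```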

In the estimation and test of hypothesis on two population means, there are two cases of obtaining a test statistic, depending on the relation of the obtained samples.

If the samples are related to each other…

| Parameter | Point estimator |
|---|---|
| Mean difference, $\mu_D$ | $\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$ |
| Standard deviation of the $d_i$’s | $s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$ |
| Standard error of $\bar{d}$ | $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$ |

A $(1 - \alpha) \times 100\%$ confidence interval estimator of the mean difference, $\mu_D$, is $\bar{d} \mp t_{\alpha/2,(n-1)}\, s_{\bar{d}}$ or $\bar{d} \mp Z_{\alpha/2}\, s_{\bar{d}}$.

Decision Rule: At a specified $\alpha$, reject Ho if

| Ha | Test Procedure | when $n < 25$ | when $n \ge 25$ |
|---|---|---|---|
| Ha: $\mu_D \ne D_o$ | Two-tailed $t$-test on two population means using related samples | $\lvert t_c \rvert > t_{\alpha/2,(n-1)}$ | $\lvert t_c \rvert > Z_{\alpha/2}$ |
| Ha: $\mu_D > D_o$ | One-tailed $t$-test on two population means using related samples | $t_c > t_{\alpha,(n-1)}$ | $t_c > Z_{\alpha}$ |
| Ha: $\mu_D < D_o$ | One-tailed $t$-test on two population means using related samples | $t_c < -t_{\alpha,(n-1)}$ | $t_c < -Z_{\alpha}$ |

Otherwise, fail to reject Ho.

If the samples are independent of each other…

| Parameter | Point estimator |
|---|---|
| Mean difference, $\mu_D$ | $\bar{d} = \bar{x}_1 - \bar{x}_2$ |
| Pooled standard deviation | $s_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$ |
| Standard error of $\bar{d}$ | $s_{\bar{d}} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ |

A $(1 - \alpha) \times 100\%$ confidence interval estimator of the mean difference, $\mu_D$, is $\bar{d} \mp t_{\alpha/2,(n_1+n_2-2)}\, s_{\bar{d}}$ or $\bar{d} \mp Z_{\alpha/2}\, s_{\bar{d}}$.
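The point estimators for both cases can be sketched as below. The function names and the data are made up for illustration. The notes do not state the test statistic for these cases explicitly, so the sketch stops at the standard error of $\bar{d}$.

```python
import math

def paired_summary(differences):
    """Return d-bar, s_d and the standard error s_d / sqrt(n) for related samples."""
    n = len(differences)
    d_bar = sum(differences) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return d_bar, s_d, s_d / math.sqrt(n)

def pooled_standard_error(s1_sq, n1, s2_sq, n2):
    """Standard error of x1-bar minus x2-bar using the pooled variance s_p^2."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Related samples: made-up differences d_i (e.g. after minus before)
d_bar, s_d, se_paired = paired_summary([2, 4, 3, 5, 1])

# Independent samples: made-up sample variances and sizes
se_pooled = pooled_standard_error(s1_sq=4, n1=10, s2_sq=6, n2=12)

print(d_bar, s_d, se_paired, se_pooled)
```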

In the test of hypothesis on more than two population means, the appropriate test is the $F$-test, which can be performed using the analysis of variance (ANOVA) technique. The Ho and Ha will always be the same in the ANOVA technique. For the Ho, it is hypothesized that there is no difference between the given population means. For the Ha, it is hypothesized that at least one of the population means differs. There are three assumptions considered essential for the $F$-test to be valid: the samples should come from normally distributed populations; the variances of the populations should be equal, often referred to as homoscedasticity; and the errors must be mutually independent.

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares | $F_c$ |
|---|---|---|---|---|
| Among Populations | $t - 1$ | $ASS = \sum_{i=1}^{t} \frac{T_i^2}{n_i} - \frac{GT^2}{n}$ | $AMS = \frac{ASS}{A\,df}$ | $\frac{AMS}{WMS}$ |
| Within Populations | $n - t$ | $WSS = \sum_{i=1}^{t} s_i^2 (n_i - 1)$ | $WMS = \frac{WSS}{W\,df}$ | |
| Total | $n - 1$ | $ASS + WSS$ | | |

Here $T_i$ is the total of the observations in the $i$th sample, $GT$ is the grand total of all observations, $n_i$ is the size of the $i$th sample, and $n$ is the total number of observations.

| Ha | Test Procedure | Decision Rule: At a specified $\alpha$, reject Ho if |
|---|---|---|
| Ha: $\mu_i \ne \mu_j$ for at least one pair $i, j \in \{1, \ldots, t\}$ | One-way analysis of variance | $F_c > F_{\alpha,(t-1),(n-t)}$ |

Otherwise, fail to reject Ho.
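The entries of the ANOVA table can be computed directly from the group data, as in this sketch. The function name and the three small samples are made up, and the critical value $F_{\alpha,(t-1),(n-t)}$ still has to be read from an $F$ table.

```python
from statistics import variance

def one_way_anova(groups):
    """Return (ASS, WSS, F_c) for a list of samples, one list per population."""
    t = len(groups)                                   # number of populations
    n = sum(len(g) for g in groups)                   # total number of observations
    grand_total = sum(sum(g) for g in groups)         # GT
    ass = sum(sum(g) ** 2 / len(g) for g in groups) - grand_total ** 2 / n
    wss = sum((len(g) - 1) * variance(g) for g in groups)   # sum of s_i^2 (n_i - 1)
    ams = ass / (t - 1)                               # among mean square
    wms = wss / (n - t)                               # within mean square
    return ass, wss, ams / wms

samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]           # made-up data for three populations
ass, wss, f_c = one_way_anova(samples)
print(ass, wss, f_c)
```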

The Pearson’s correlation coefficient, denoted by $\rho$, is a parameter which gives a measure of the linear relationship between two variables in the population and is estimated using $r = \frac{s_{XY}}{s_X s_Y}$. The covariance between $X$ and $Y$ measures the covariation between the two variables and is computed using:
$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{n - 1}$

The test statistic used to test the statistical significance of the sample correlation coefficient is:
$t_c = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$

Decision Rule: At a specified $\alpha$, reject Ho if

| Ha | Test Procedure | when $n < 25$ | when $n \ge 25$ |
|---|---|---|---|
| Ha: $\rho \ne 0$ | Two-tailed $t$-test | $\lvert t_c \rvert > t_{\alpha/2,(n-2)}$ | $\lvert t_c \rvert > Z_{\alpha/2}$ |
| Ha: $\rho > 0$ | One-tailed $t$-test | $t_c > t_{\alpha,(n-2)}$ | $t_c > Z_{\alpha}$ |
| Ha: $\rho < 0$ | One-tailed $t$-test | $t_c < -t_{\alpha,(n-2)}$ | $t_c < -Z_{\alpha}$ |

Otherwise, fail to reject Ho.
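The covariance, the sample correlation coefficient, and its test statistic can be sketched as follows. The function names and the five data pairs are made up for illustration.

```python
import math

def covariance(xs, ys):
    """s_XY = sum (X_i - X-bar)(Y_i - Y-bar) / (n - 1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """r = s_XY / (s_X * s_Y), using s_X^2 = s_XX and s_Y^2 = s_YY."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def t_statistic(r, n):
    """t_c = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

X = [1, 2, 3, 4, 5]        # made-up paired observations
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
t_c = t_statistic(r, len(X))
print(covariance(X, Y), r, t_c)
```

Because the sample here is small ($n < 25$), `t_c` would be compared with a $t$ table value at $n - 2 = 3$ degrees of freedom.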


The estimated line for a sample of $n$ pairs of observations is $\hat{Y}_i = b_0 + b_1 X_i$, where $b_0$ is the sample regression constant and $b_1$ is the sample regression coefficient. A measure of the adequacy of the model is the coefficient of determination, which gives values on a scale of 0% to 100%. It is the proportion of the total variability in $Y$ that can be explained or accounted for by $Y$’s relationship with $X$.
$R^2 = r^2 = \left(\frac{s_{XY}}{s_X s_Y}\right)^2$

In the $\chi^2$ goodness-of-fit test, the null and alternative hypotheses remain the same for every sample:
Ho: The observed frequencies are in agreement with the expected frequencies
Ha: The observed frequencies are not in agreement with the expected frequencies
The test statistic follows the $\chi^2$ distribution with $k - 1$ degrees of freedom and is computed as:
$\chi_c^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n$

In the $\chi^2$ test of independence, the null and alternative hypotheses remain the same for every sample:
Ho: The row variable and the column variable are not associated
Ha: The row variable and the column variable are associated
The test statistic follows the $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom and is computed as:
$\chi_c^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{O_{ij}^2}{E_{ij}} - n$

Decision Rule: At a specified $\alpha$, reject Ho if

| Ha | Test Procedure | Reject Ho if |
|---|---|---|
| Ha: see above | $\chi^2$ goodness-of-fit test | $\chi_c^2 > \chi^2_{\alpha,(k-1)}$ |
| Ha: see above | $\chi^2$ test of independence | $\chi_c^2 > \chi^2_{\alpha,(r-1)(c-1)}$ |

Otherwise, fail to reject Ho.

$\chi^2$ tests always require the following assumptions:
• a simple random sample of size $n$ must be drawn from the population
• the categories are non-overlapping
• an observation can belong to one and only one category
• no more than 20% of the categories have expected frequencies less than 5, and no expected frequency is less than 1
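Both forms of the $\chi^2$ test statistic can be sketched in a few lines. The function names and the data are made up. The rule for expected counts in the test of independence, $E_{ij} = \frac{(\text{row total})(\text{column total})}{n}$, is the usual textbook formula and is an assumption here, since these notes do not state it.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_shortcut(observed, expected):
    """Equivalent form: sum of O^2 / E minus n (valid when the E's also add up to n)."""
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - sum(observed)

def expected_counts(table):
    """Assumed rule for independence: E_ij = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Goodness of fit: 60 made-up rolls of a die against a fair-die expectation
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
print(chi_square(observed, expected), chi_square_shortcut(observed, expected))

# Test of independence on a made-up 2 x 2 table
table = [[10, 20], [30, 40]]
exp_table = expected_counts(table)
flat_o = [o for row in table for o in row]
flat_e = [e for row in exp_table for e in row]
print(exp_table, chi_square(flat_o, flat_e))
```

The computed statistic is then compared with the $\chi^2$ table value at $k - 1$ or $(r - 1)(c - 1)$ degrees of freedom.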