
SESSION 1: INTRODUCTION TO STATISTICS AND DATA COLLECTION

I – Introduction to statistics:
1. What is Statistics?
- Statistics is the science of collecting, organizing, analyzing, interpreting, presenting data and
drawing conclusions from data.
- Example:
A teacher wants a better overview of student performance in her classroom, so she follows these steps:
● The teacher collects the process scores and test scores of students (collect)
● She compiles a score table (organize)
● Then the teacher calculates the average score of each student and the average score of the whole class (analyze)
● Therefore, she is able to evaluate the learning situation of each student and of the whole class (interpret), and adjusts the program and teaching methods (present and draw conclusions).

- Population: The entire group sharing common characteristics.


- Sample: A representative subset of the population chosen for study or analysis.

2. Overview of Statistics:

3. Two Branches of Statistics:

- Statistics: the branch of mathematics that transforms data into useful information for decision makers. Statistics is applied in many fields, including the physical and social sciences, business, government, and manufacturing.
- Statistics is divided into two categories:

Descriptive Statistics:
- Collecting, summarizing, presenting, and analyzing data.
- Focuses on summarizing and describing the main features of a dataset, including measures of central tendency (such as mean, median, and mode), measures of variability (like range and standard deviation), and the distribution of the data.

Inferential Statistics:
- Using data collected from a small group to draw conclusions about a larger group.
- Focuses on making predictions, generalizations, and inferences about a population based on a sample of data, utilizing probability theory and statistical hypothesis testing.

II – Data collection:
1. Variables and data:
a. Data Terminology

- Data: facts and figures collected for analysis, presentation, and interpretation.
- An observation: a single member of a collection of items that we want to study, such as a person, firm, or region.
Example: an employee, or an invoice mailed last month.
- A variable: a characteristic of the items that we want to study (e.g., student name, gender, date of birth).
Example: an employee's income or an invoice amount.
- Data set: all the values of all of the variables for all of the observations we chose.
Data usually are entered into a spreadsheet or database as an n × m matrix.

Example:

Income of a graduate, 2023

Month | Income
Jan | 5.000.000đ
Feb | 6.000.000đ
Mar | 6.500.000đ
Apr | 4.000.000đ
May | 7.000.000đ
Jun | 6.000.000đ
Jul | 5.500.000đ
Aug | 5.500.000đ
Sep | 4.500.000đ
Oct | 6.500.000đ
Dec | 7.000.000đ
From the dataset:
- The variable is the income each month.
- An observation is a single entry, e.g., the income for July.
b. Categorical and Numerical Data
- A data set may contain a mixture of data types. Two broad categories:

● Categorical (qualitative) data: values that are described by words rather than numbers -
nonnumerical values.
+ Verbal label (labels describing different categories or groups)
+ Coded (codes representing different categories or groups)
● Numerical (quantitative) data: arise from counting, measuring something, or some other mathematical operation. Two types: discrete (integers) and continuous (physical measurements, financial variables).

c. Time Series Data and Cross-Sectional Data


- Time series data: each observation in the sample represents a different, equally spaced point in time (years, months, days). The periodicity is the time between observations.
→ reveals trends and patterns over time
Ex: a firm's sales, market share, debt/equity ratio, employee absenteeism, inventory turnover, and product quality ratings

- Cross-sectional data: each observation represents a different individual unit (e.g., a person, firm, geographic area) at the same point in time.
→ reveals variation among observations and relationships between them
Ex: daily closing prices of a group of 20 stocks recorded on December 1, 2015.

- Combine the two data types to get pooled cross-sectional and time series data.

Ex: monthly unemployment rates for the 13 Canadian provinces or territories for the last 60 months

2. Level of measurement:
- Four levels of measurement for data: nominal, ordinal, interval, and ratio

a. Nominal Measurement:
- Nominal data: the weakest level of measurement and the easiest to recognize; values merely identify a category. "Nominal" data are the same as "qualitative," "categorical," or "classification" data. The only permissible mathematical operation is counting (e.g., frequencies).

Example:

● Hair color: Hair color is a nominal variable because there is no order to the different colors.
We can have black hair, brown hair, blonde hair, red hair, etc. These colors are not ranked in any way,
so it makes no sense to say that black hair is better than brown hair, or vice versa.
● Gender: The categories of gender might be "male," "female," "non-binary," and "other". These
categories are not ordered from high to low, and there is no inherent meaning to the order in which
they are listed.
● Zip codes: Zip codes are nominal variables because they are simply labels that are used to
identify different geographic locations. Note that although zip codes are series of numbers, there is no
order to zip codes, so it makes no sense to say that one zip code is better than another.

→ Nominal measurements have no natural order
b. Ordinal Measurement
- Ordinal data codes imply a ranking of data values, but there is no clear meaning to the distance between ranks.
- Like nominal data, ordinal data lack the properties required to compute many statistics, such as the average. Ordinal data can be treated as nominal, but not vice versa.

Example:

● Customer satisfaction ratings: Customer satisfaction ratings can be ranked in order, from least
satisfied to most satisfied, such as very dissatisfied, somewhat dissatisfied, neutral, somewhat
satisfied, and very satisfied. This data is ordinal because there is a clear order to the ratings, but the
exact difference between each rating is not known.
● Educational levels: Educational levels can be ranked in order, from lowest to highest, such as
elementary school, middle school, high school, and college. This data is ordinal because there is a
natural order to the levels, but the exact difference between each level is not known.
→ Ordinal measurements have ordering, but the differences between have no clear meaning

c. Interval Measurement
- Interval data is not only a rank but also has meaningful intervals between scale points.
Intervals between numbers represent distances, which have a meaningful and constant
interpretation. The difference between any two values on the scale is always the same. However,
ratios are not meaningful for interval data, and the measurements do not have a true zero value

Example:

● Temperature: Temperature is often measured in degrees Fahrenheit or Celsius. The difference


between 50 degrees Fahrenheit and 60 degrees Fahrenheit is the same as the difference between 20
degrees Celsius and 30 degrees Celsius.
● IQ scores: IQ scores are often measured on a scale of 0 to 200. The difference between an IQ
score of 100 and an IQ score of 110 is the same as the difference between an IQ score of 150 and an
IQ score of 160.

→ However, it is important to note that interval data does not have an absolute zero point. This
means that we cannot say that 0 degrees Celsius is the same as no temperature, or that an IQ score of
0 represents no intelligence.

→ Interval measurements have ordering and the differences between have meaning, but ratios
have no meaning

d. Ratio Measurement
- The data have all the properties of interval data and the ratio of two values is meaningful. The
measurements have a true zero value. We can recode ratio measurements downward into ordinal or
nominal measurements (but not conversely).

Example:

● Height: A person's height can be measured in centimeters, inches, or feet. Zero represents no height, and the distance between each unit of measurement is equal. For example, the difference between 5 feet and 6 feet is the same as the difference between 1 foot and 2 feet.
● Weight: A person's weight can be measured in kilograms, pounds, or grams. Zero represents
no weight, and the distance between each unit of measurement is equal. For example, the difference
between 50 kilograms and 55 kilograms is the same as the difference between 110 pounds and 121
pounds.
→ Ratio measurements have ordering, the differences between values have meaning, and ratios have meaning

3. Sampling concepts:
- A population: the collection of all items of interest or under investigation; it can be finite or infinite.
- A sample: an observed subset of the population, obtained by looking only at some items selected from the population.

Sampling is used when: infinite population, destructive testing, timely results, accuracy, cost, sensitive information. In general, a sample is usually preferred because observing a whole population consumes far more resources.

- A census: an examination of all items in a defined population. A census is used when: small population, large sample size, database exists, legal requirement.

- Rule of Thumb: A population may be treated as infinite when the population size N is at least
20 times the sample size n (i.e., when N/n ≥ 20)
- Parameters and Statistics:

● A parameter is a specific characteristic of a population
● A statistic is a specific characteristic of a sample
- From a sample of n items, chosen from a population, we compute statistics that can be used
as estimates of parameters found in the population.
● Population mean = µ
● Population proportion = π
● Sample mean = x̄
● Sample proportion = p
Ex: Imagine you are interested in determining the average income (mean) of all residents (population)
in a city. In this case:
● Parameter: The population mean (µ) would be the actual average income of all residents in
the entire city.
Now, conducting a survey of a specific neighborhood in that city, you collect data from a
sample of 100 households:
● Statistic: The sample mean (x̄) would be the average income of the 100 households you
surveyed. This is an estimate that you can use to make an inference about the population
mean.

- Target Population:
● The target population contains all the individuals in which we are interested
● The sampling frame is the group from which we take the sample

4. Sampling methods:

- Two main categories: Statistical Sampling and Nonstatistical Sampling
- Statistical Sampling (Random Sampling Methods)

a. Simple Random Sampling


- We denote the population size by N and the sample size by n. In a simple random sample,
every item in the population of N items has the same chance of being chosen in the sample of n
items.

Example: Select one student at random from a list of 15 students to do an oral test.

Sampling without replacement: once an item has been selected for the sample, it cannot be selected again. This becomes a problem when the sample size n is close to the population size N → it causes bias (a tendency to overestimate or underestimate) in the research.

A finite population is effectively infinite if the sample is less than 5 percent of the population (if
n/N < .05)

Sampling with replacement: the same random number could show up more than once.
Duplicates are unlikely when n is much smaller than N
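A minimal Python sketch of both schemes, using a made-up population of 15 student IDs:

```python
import random

random.seed(42)  # reproducible example

# Hypothetical population: ID numbers of N = 15 students
population = list(range(1, 16))

# Simple random sample of n = 5 WITHOUT replacement:
# every item has the same chance and no duplicates are possible
print(random.sample(population, 5))

# Sampling WITH replacement: the same ID can show up more than once,
# though duplicates are unlikely when n is much smaller than N
print(random.choices(population, k=5))
```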

b. Systematic Sampling
- Systematic sample: choose every kth item from a sequence or list, starting from a randomly chosen entry among the first k items on the list.
● Decide on sample size: n
● Divide the frame of N individuals into n groups of k individuals each: k = N/n
● Randomly select one individual from the first group of k
● Select every kth individual thereafter

Example:

Imagine you are a teacher and you want to survey your students about their favorite subject in
school. You have a class of 30 students and you want to select a sample of 10 students to participate
in the survey.

→ Systematic sampling method:

1. Determine the sampling interval: divide the total population size (30 students) by the desired sample size (10 students) to get the sampling interval. In this case, the sampling interval is k = 3.
2. Select a random starting point: randomly select a student from among the first three on the list. For example, let's say you randomly select student number 2.
3. Select the sample: starting from the randomly selected student, select every third student until you have 10 students. In this case, the sample would be students number 2, 5, 8, 11, 14, 17, 20, 23, 26, and 29.
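The same procedure as a short Python sketch (class size and sample size taken from the example above):

```python
import random

random.seed(7)  # reproducible example

N, n = 30, 10                    # class size and desired sample size
k = N // n                       # sampling interval: every kth student
students = list(range(1, N + 1))

start = random.randrange(k)      # random start among the first k students
sample = students[start::k]      # every kth student thereafter

print(f"k = {k}, starting at student {students[start]}")
print("Systematic sample:", sample)
```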

c. Stratified Sampling
- Divide population into homogeneous subgroups (called strata) according to some common
characteristic (e.g. age, gender, occupation)
- Select a simple random sample from each subgroup
- Combine samples from subgroups into one

Example:

Imagine you're a teacher and you want to conduct a survey to understand the study habits of
your students. Your students come from three different grade levels: 9th grade, 10th grade, and 11th
grade.

In this scenario, stratified sampling involves dividing your students into three strata based on
their grade levels and then randomly selecting samples from each stratum.

1. Identify Strata:
- Stratum 1: 9th-grade students
- Stratum 2: 10th-grade students
- Stratum 3: 11th-grade students
2. Determine Sample Size: Decide how many students you want to survey overall.
3. Randomly Sample Within Each Stratum: Randomly select a certain number of students from
each grade level. For example, if you want to survey 30 students in total, you might choose 10
students randomly from each grade.
4. Collect Data: Survey the selected students about their study habits.
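A compact sketch of the same procedure, with hypothetical ID ranges standing in for the three grade levels:

```python
import random

random.seed(1)  # reproducible example

# Hypothetical strata: student IDs grouped by grade level
strata = {
    "grade 9":  list(range(100, 140)),
    "grade 10": list(range(200, 235)),
    "grade 11": list(range(300, 345)),
}

# Simple random sample within each stratum, then combine into one sample
sample = []
for grade, ids in strata.items():
    sample.extend(random.sample(ids, 10))   # 10 students per grade

print(len(sample), "students selected in total")
```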

d. Cluster Sampling
- Divide population into several “clusters” (e.g. regions), each representative of the population
● One-stage cluster sampling: randomly selected k clusters
● Two-stage cluster sampling: randomly select k clusters and then choose a random sample of
elements within each cluster.

Example:

Let's say you're a researcher studying the eating habits of people in a city. Instead of individually
selecting people, you decide to use cluster sampling.

1. Define Clusters: Divide the city into clusters based on geographical regions. For example, you
might have four clusters: District 1, District 2, District 3, District 4.
2. Randomly Select Clusters: Randomly choose two clusters out of the four. Let's say you select
District 1 and District 2 clusters.
3. Include All Members in Selected Clusters: Instead of surveying individuals, you survey
everyone in the selected clusters. So, in District 1 and District 2 clusters, you would survey all the
residents.
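And a one-stage cluster-sampling sketch with the hypothetical districts from the example:

```python
import random

random.seed(3)  # reproducible example

clusters = ["District 1", "District 2", "District 3", "District 4"]

# One-stage cluster sampling: randomly select k = 2 whole clusters,
# then survey every resident inside the selected clusters
selected = random.sample(clusters, 2)
print("Survey everyone in:", selected)
```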

e. Nonstatistical Sampling (Non-Random Sampling Methods)

Examples in detail:

Judgment Sample: Let's say you're a manager in a company and you want to gather feedback
about a new workplace initiative. Instead of randomly selecting employees, you decide to use
judgment sampling.

1. Define Criteria: Identify specific criteria that you believe are relevant to the success of the new
initiative. For instance, you might consider employees who have been with the company for a
significant amount of time, or those who have experience with similar initiatives.
2. Use Expertise: Utilize your knowledge and experience as a manager to handpick employees
who you believe would provide valuable insights. This could involve considering individuals who have
shown a keen interest in similar projects before.
3. Select Participants: Choose a small number of employees based on your criteria. For example,
you might decide to interview five employees who meet the specific criteria you've outlined.

4. Conduct Interviews: Interview the selected employees to gather their opinions and feedback
on the new initiative.

Convenience sample: Let's say you're curious about your co-workers' opinions on a new office
policy, and you decide to use convenience sampling to gather quick feedback during lunchtime.

1. Select a Convenient Location: Choose a spot where your co-workers often gather during
lunch, like the breakroom or a common area.
2. Approach Easily Accessible Participants: Rather than selecting participants randomly or with
specific criteria, you opt for convenience. As people enter the breakroom, you approach them for a
quick chat about the new office policy.
3. Ask a Few Questions: Keep it brief and ask a few simple questions about their thoughts on the
new policy. For example, "Hey, what do you think about the new office policy on flexible work hours?"
4. Record Responses: Jot down or mentally note their responses. Since this method is more
about ease than representativeness, you're not aiming for a comprehensive survey but rather a quick
sense of opinions.
5. Repeat as Convenient: Continue this process during lunchtime over the next few days,
approaching co-workers as it's convenient for both you and them.

Focus Group: Let's say you work for Apple, which is planning to launch a new version of the iPod,
and you want to gather in-depth insights from current iPod users. You decide to use a focus group
sampling method.

1. Identify Target Participants: Define your target group, in this case, current iPod users. This
might include people who use iPods for music, podcasts, and other media.
2. Recruit Participants: Reach out to a diverse group of iPod users, ensuring you have
participants who use different models and have varied experiences with their devices.
3. Schedule a Focus Group: Set up a time and place for a focus group session. This could be a
meeting room where participants can comfortably discuss their experiences.

4. Moderate the Discussion: As the moderator, guide the discussion by asking open-ended
questions about their experiences with iPods. For example, you might ask about their favorite
features, any challenges they've faced, and what improvements they'd like to see in a new version.
5. Record Insights: Take notes on the participants' responses and interactions during the focus
group. This qualitative data can provide rich, detailed insights into their thoughts and opinions.

f. Sources of Error

Example:
- Nonresponse bias:
● Phone Surveys Bias: If a survey only calls mobile phones and ignores landlines, it could
misrepresent opinions, especially among older individuals who primarily use landlines.
● Extreme Responses Online: In an online customer survey, low responses may lead to bias if
only customers with extreme experiences (very positive or negative) participate, missing
the views of those with moderate experiences.

- Selection bias:
● Hospital Study Bias: Research relying only on hospital data for a disease might introduce bias
by excluding individuals who never sought medical help, giving an incomplete picture.
● Volunteer Clinical Trials: Clinical trials with volunteers may not represent the whole population
if those with milder symptoms are more likely to participate, potentially skewing the results.

5. Surveys:
a. Survey

b. Questionnaire Design

- Begin with short, clear instructions.


- State the survey purpose.
- Assure anonymity.
*Anonymity means the information collected is protected and used only for research purposes; it does not mean personal information is not collected.
- Instruct on how to submit the completed survey.
- Break survey into naturally occurring sections
- Let respondents bypass sections that are not applicable (e.g., “if you answered no to question
7, skip directly to Question 15”).

EXERCISE

1. Determine the level of measurement. (Nominal, Ordinal, Interval, Ratio)

A. Firms described as small, medium, and large.

B. Sales revenue of firms

C. The number of years in operations

D. Statistics software (EViews, SPSS, Stata)

E. Student being rated as fail, average, good, excellent

F. Customer satisfaction based on a 7-point Likert scale.

G. Industry codes

H. Income of people in Ho Chi Minh city

ANSWER:

A. Firms described as small, medium, and large: Ordinal

B. Sales revenue of firms: Ratio

C. The number of years in operations: Ratio

D. Statistics software (EViews, SPSS, Stata): Nominal

E. Student being rated as fail, average, good, excellent: Ordinal

F. Customer satisfaction based on a 7-point Likert scale: Ordinal

G. Industry codes: Nominal

H. Income of people in Ho Chi Minh city: Ratio

SESSION 2: DESCRIBING DATA

I - Center:

1. Arithmetic Mean (mean): not suitable when outliers are present (an outlier is an observation or data point that significantly deviates from the overall pattern or trend of a dataset)
- The most common measure of central tendency

Example:

2. Median: The median (denoted M) is the 50th percentile or midpoint of the sorted sample data
set x1, x2, . . . , xn
- In an ordered array => most appropriate measure of central tendency for ordinal data,
“middle” number (50% above, 50% below)
- Median is not affected by extreme values
- Example: 1,2,7,8,9,10
- Position: (6+1)/2 =3.5
- Median: (7+8)/2=7.5
Explanation: Since there are 6 numbers, the median will be the average of the 3rd and 4th
numbers. => The 3rd number is 7, and the 4th number is 8.

3. Mode: The mode is the most frequently occurring data value


Not affected by extreme values (apply to nominal scale)
● Value that occurs most often
● Used for either numerical or categorical (nominal) data
● A data set may have multiple modes or no mode:

- Example: 3,5,2,8,5,3,9,6,5
Explanation: In this case, the number 5 appears most frequently, three times. Therefore, the
mode of the given set is 5.
- Example:
Lee’s scores: 60, 70, 70, 70, 80 Mean = 70, Median = 70, Mode = 70
Pat’s scores: 45, 45, 70, 90, 100 Mean = 70, Median = 70, Mode = 45

Sam’s scores: 50, 60, 70, 80, 90 Mean = 70, Median = 70, Mode = none
Xiao’s scores: 50, 50, 70, 90, 90 Mean = 70, Median = 70, Modes = 50, 90

4. Geometric mean: The geometric mean (denoted G) is a multiplicative average, obtained by


multiplying the data values and then taking the nth root of the product. This is a measure of central
tendency used when all the data values are positive (greater than zero).
Mean rate of return:
- Measures the profit or loss of an investment (anything from stocks to bonds, real estate, art, projects, etc.) over time.

- Ri is the percentage change from the beginning of period i until its end.
Example:
An investment of $100,000 declined to $50,000 at the end of year one and rebounded to
$100,000 at end of year two:
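Working this through: the year-one return is R1 = −50% and the year-two return is R2 = +100%, so the geometric mean rate of return is [(1 − 0.5)(1 + 1.0)]^(1/2) − 1 = 0%, correctly showing that the investor broke even, while the arithmetic mean (+25%) would be misleading. A quick check in Python:

```python
from math import prod

returns = [-0.50, 1.00]   # -50% in year one, +100% in year two

growth = prod(1 + r for r in returns)        # (0.5)(2.0) = 1.0
geo_mean = growth ** (1 / len(returns)) - 1  # nth root, minus 1

print(f"Geometric mean return: {geo_mean:.1%}")                            # 0.0%
print(f"Arithmetic mean (misleading): {sum(returns) / len(returns):.1%}")  # 25.0%
```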

a. Ordered array:
- Ordered array: a sequence of data ranked in a particular order
● Shows range (min to max)
● Signals variability:
o larger ranges indicate higher variability
o smaller ranges suggest lower variability
● May help identify outliers (unusual observations)
● For large datasets, an ordered array becomes less useful
Example:
● Raw form: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
● Ordered array from smallest to largest : 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
b. The stem-and-leaf:
- The stem-and-leaf: a tool of exploratory data analysis (EDA) that seeks to reveal essential data
features in an intuitive way.

● Method: Separate the sorted data into:


+ leading digits (the stem)
+ trailing digits (the leaves)
Example:

● Reveal:
+ Central tendency
+ Dispersion

- A stem-and-leaf plot works well for small samples of integer data with a limited range but becomes awkward for decimal data (e.g., $60.39) or multi-digit data (e.g., $3,857).
c. Dot Plots:
- A dot plot is another simple graphical display of n individual values of numerical data.
- The basic steps in making a dot plot are to (1) make a scale that covers the data range, (2)
mark axis demarcations and label them, and (3) plot each data value as a dot above the scale at its
approximate location.
- A dot plot shows
● variability by displaying the range of the data.
● the center by revealing where the data values tend to cluster and where the midpoint lies.
● some things about the shape of the distribution if the sample is large enough.

d. Frequency distribution:
- Frequency distribution: is a table formed by classifying n data values into k classes called bins
- Example: Suppose you have a dataset representing the scores of 20 students in a math test:
78,85,92,78,90,85,80,88,92,78,85,88,90,92,78,80,85,90,88,80

Score | Frequency
78 | 4
80 | 3
85 | 4
88 | 3
90 | 3
92 | 3

● A frequency distribution is a table containing class groupings and the corresponding frequencies with which data fall within each grouping.
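The frequency table above can be reproduced with collections.Counter (a minimal sketch):

```python
from collections import Counter

scores = [78, 85, 92, 78, 90, 85, 80, 88, 92, 78,
          85, 88, 90, 92, 78, 80, 85, 90, 88, 80]

freq = Counter(scores)
for score in sorted(freq):   # each distinct score acts as a "bin"
    print(score, freq[score])
```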

e. Histogram:
- Definition: A histogram is a graphical representation of a frequency distribution.

- Y-axis:
● The number of data values (or a percentage) within each bin of a frequency distribution
● Frequency, relative frequency, or percentage
- X-axis:
● Ticks show the end points of each bin
● The class boundaries (or class midpoints)
- No gaps between bars

f. Frequency Polygons and Ogives (cumulative):

- Frequency polygon: a line graph that connects the midpoints of the histogram intervals, plus extra intervals at the beginning and end so that the line touches the X-axis.
- Serves the same purpose as a histogram but is more visually attractive when you need to compare two or more data sets (since more than one frequency polygon can be plotted on the same scale).
- Ogives (%): graphs of cumulative frequencies that can be used to determine how many data values lie above or below a particular value in a data set (Siyavula, 2023).

g. Scatter Plots:
- A scatter plot shows n pairs of observations (x1, y1), (x2, y2), . . . , (xn, yn) as dots (or some other symbol) on an X-Y graph. A scatter plot is a starting point for bivariate data analysis.

Example:

i. Time Series Plot:


- Shows patterns in the values of a variable over time. (Applies only to sequences of discrete-time data.)
- The purpose is to describe features of the data using different types of models → usually applied in time-series analysis or forecasting.

j. Log scales (ratio scale):


- Useful for time series that grow at a compound annual percentage rate
● Ex: GDP, the national debt, or your future income
- Equal distances represent equal ratios
- Reveals the type of quantity growth:
● increasing percent (concave upward, a convex function)
● constant percent (a straight line)
● declining percent (concave downward)
● the distance from 100 to 1,000 is the same as the distance from 1,000 to 10,000
● both have the same 10:1 ratio
- Suited only to positive data values
- Usually displayed on the vertical axis, to reveal more detail for small data values

● Pictograms: a visual display in which data values are replaced by pictures

II- Deceptive graphs: graphical representations of data that are intentionally or unintentionally
designed in a way that misleads the viewer, distorts the data, or presents information in a manner
that can create a false or exaggerated impression.

1. Error 1: Nonzero Origin
- A nonzero origin will exaggerate the trend. Measured distances do not match the stated
values or axis demarcations. The accounting profession is particularly aggressive in enforcing this rule.
- Although zero origins are preferred, sometimes a nonzero origin is needed to show sufficient
detail. For instance, tracking a satellite's movement relative to a reference point on Earth requires a
nonzero origin to accurately represent its position in calculations, ensuring the relative position is
accounted for.

2. Error 2: Elastic Graph Proportions


- By shortening the X-axis in relation to the Y-axis, vertical change is exaggerated. For a time
series (X-axis representing time), this can make a sluggish sales or profit curve appear steep.
Conversely, a wide X-axis and short Y-axis can downplay alarming changes.

3. Error 3: Dramatic Titles and Distracting Pictures
- The title often is designed more to grab the reader’s attention than to convey the chart’s
content
4. Error 4: 3-D and Novelty Graphs
- Depth may enhance the visual impact of a bar chart, but it introduces ambiguity in bar height.

5. Error 5: Rotated Graphs:


- Making a graph 3-dimensional and rotating it through space, can make trends appear to
dwindle into the distance or loom alarmingly toward the reader.

6. Error 6: Unclear Definitions or Scales


- Missing or unclear units of measurement (dollars? percent?) can render a chart useless.
Gridlines help the viewer compare magnitudes but are often omitted to avoid graph clutter. For
maximum clarity in a bar graph, label each bar with its numerical value.

7. Error 7: Vague Sources
- Vague sources may indicate that the author lost the citation, didn’t know the data source, or
mixed data from several sources.
8. Error 8: Complex Graphs
- Complicated visual displays make the reader work harder. Keep your main objective in mind.

9. Error 9: Gratuitous Effects


- Slide shows often use color and special effects (sound, interesting slide transitions, spinning
text, etc.) to attract attention.

10. Error 10: Estimated Data
- In a spirit of zeal to include the “latest” figures, the last few data points in a time series are
often estimated. At a minimum, estimated points should be noted.
11. Error 11: Area Trick
- One of the most pernicious visual tricks is simultaneously enlarging the width of the bars as
their height increases, so the bar area misstates the true proportion.

SESSION 3: DESCRIPTIVE STATISTICS

I - Data types:

1. Quartiles:
- The quartiles (denoted Q1, Q2, Q3) are scale points that divide the sorted data into four
groups of approximately equal size, that is, the 25th, 50th, and 75th percentiles, respectively.

● First quartile position: Q1 = (n+1)/4


● Second quartile position: Q2 = (n+1)/2 (the median position)
● Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values

2. Box-and-whisker Plot:
Box-and whisker Plot shows
● center (position of the median Q2)
● variability (width of the “box” defined by Q1 and Q3 and the range between xmin and
xmax).
● shape (skewness if the whiskers are of unequal length and/or if the median is not in
the center of the box)

A Graphical display of data using 5-number summary:


- Minimum -- Q1 -- Median -- Q3 -- Maximum
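A sketch of the five-number summary in Python; statistics.quantiles with its default method places Q1, Q2, Q3 at the (n+1)/4, (n+1)/2, and 3(n+1)/4 positions used in these notes (the data reuses the ordered array from Session 2):

```python
from statistics import quantiles

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

q1, q2, q3 = quantiles(data, n=4)   # default method matches the (n+1)/4 rule

print("Min, Q1, Median, Q3, Max:", min(data), q1, q2, q3, max(data))
print("IQR:", q3 - q1)
```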

3. Range
- Simplest measure of variation
- Difference between the largest and the smallest values in a set of data:

- Problems:
● The range considers only the two extreme data values. It seems desirable to seek a broad-based measure of variability that is based on all the data values x1, x2, . . . , xn.

- Solution:
● Variance: the average (approximately) of the squared deviations of values from the mean
o Squaring the distance (xi − mean) avoids the case where negative and positive distances cancel each other out, distorting the total.
o Use (n − 1) for a sample because one degree of freedom is used up when the mean, the measure of central tendency, is estimated from the same data.
=> Variance is used to determine how varied the data points are compared to the measure of central tendency.

- Standard deviation:

● Used to measure variation (most common)


● Show variation about the mean
● Same units as the original data
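A quick check with the statistics module; variance/stdev use the sample (n − 1) divisor, while pvariance/pstdev treat the data as a population:

```python
from statistics import variance, stdev, pvariance, pstdev

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

print("Sample variance (n - 1 divisor):", variance(data))
print("Sample standard deviation:      ", stdev(data))
print("Population variance (N divisor):", pvariance(data))
print("Population standard deviation:  ", pstdev(data))
```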

- Interquartile Range:
● Interquartile range = 3rd quartile – 1st quartile
● IQR = Q3 – Q1

- Values outside the inner fences: unusual
- Values outside the outer fences: outliers => used to determine the outliers
- In a boxplot, the whiskers extend to the smallest and largest values inside the inner fences.

- If the median is near the center of the interquartile range, the distribution is symmetric (mean approximately equals median); if mean < median, the distribution is left-skewed; if mean > median, it is right-skewed.

- Comparison Table of five measures of variability for a sample:

- Population:
● Population summary measures: parameters
● Population mean: the sum of the values in the population divided by the population size, N

+ μ = population mean
+ N = population size

● Population variance:

● Population Standard Deviation:

(measure variation, same units)

4. Shape:
- Describes how data are distributed
- Measures of shape:
- Skewness refers to the symmetry or asymmetry of the frequency distribution

- Kurtosis: describes the relative peakedness or flatness of the distribution (too tall / too flat) compared with a normal distribution

- Chebyshev's Theorem: states that a certain proportion of any data set must fall within a particular range around the central mean value, determined by the standard deviation of the data. Regardless of how the data are distributed, at least (1 − 1/k²) × 100% of the values will fall within k standard deviations of the mean (for k > 1).

Ex: For k = 2, (1 − 1/2²) × 100% = 75% (μ ± kσ = μ ± 2σ)
Let μ = 72, σ = 8 (standard deviation)
=> At least 75% of the scores will be within the interval 72 ± 2(8) = 72 ± 16, or [56, 88] (regardless of how the scores are distributed)
Note: Chebyshev's Theorem can be applied to any population with mean μ and standard deviation σ
- The Empirical Rule: Applies only to normal or bell-shaped distributions.
● If the data distribution is approximately bell-shaped, the interval μ ± kσ contains a known percentage of the data (about 68% for k = 1, 95% for k = 2, 99.7% for k = 3)

- Z Scores:
● Measures distance from the mean in standard deviations
=> used to compare a raw data value (observation) to the population or sample mean
Ex:
Z-score of 2.0: the value is 2.0 standard deviations from the mean
Z-score > 3.0 or < −3.0: the value is an outlier

Ex: Mean=14.0, Standard Deviation=3.0
=> What is the Z score of the value 18.5
Answer:

+ The value 18.5 is 1.5 standard deviations above the mean


+ Negative Z-score: the value is less than the mean
+ If only the population standard deviation is given, the standard deviation of the sample mean (the standard error) = population standard deviation / √n
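The example can be verified in a couple of lines (a minimal sketch):

```python
mean, sd = 14.0, 3.0
x = 18.5

z = (x - mean) / sd   # distance from the mean in standard deviations
print(z)              # 1.5 -> 1.5 standard deviations above the mean
```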
- Grouped data: involves categorizing observations into bins with specified intervals, and the
weighted mean and standard deviation can be estimated by assigning weights based on bin
frequencies, using the midpoints of bins in calculations.
● Weighted mean:
+ A sum in which each data value receives a weight wj that represents a fraction of the total (i.e., the k weights must sum to 1).

Ex: Your instructor gives

+ A weight of 30 percent to homework, 20 percent to the midterm exam, 40 percent to the final exam, and 10 percent to a term project (so that .30 + .20 + .40 + .10 = 1.00).
+ Your scores on these were 85, 68, 78, and 90. Your weighted average for the course would be:
= (0.3 × 85) + (0.2 × 68) + (0.4 × 78) + (0.1 × 90) = 79.3
- Application:
● Accounting (weights for cost categories),

● Finance (asset weights in investment portfolios)
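The course-grade example as a sketch (plain arithmetic only):

```python
weights = [0.30, 0.20, 0.40, 0.10]   # homework, midterm, final, project
scores  = [85, 68, 78, 90]

weighted_mean = sum(w * x for w, x in zip(weights, scores))
print(round(weighted_mean, 2))       # 79.3
```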
- Approximations for Grouped Data (the formulas use the bin midpoints mj and bin frequencies fj):
● Sample mean: x̄ ≈ (Σ fj mj) / n
● Sample variance: s² ≈ Σ fj (mj − x̄)² / (n − 1)
● Standard deviation: s = the square root of the variance

4. Linear Relationship:
- The Covariance:
● Measures the direction of the linear relationship between two variables
● Population Covariance:

● Sample Covariance:

- Interpreting:
● cov(X,Y) > 0: X and Y move in the same direction
● cov(X,Y) < 0: X and Y move in opposite directions
● cov(X,Y) = 0: X and Y have no linear relationship
- Coefficient of Correlation:
● Measures the relative strength of the linear relationship between two variables
● Sample Coefficient of correlation

- Features:
● Unit free
● Ranges between −1 and 1
○ the closer r is to −1, the stronger the negative linear relationship
○ the closer r is to +1, the stronger the positive linear relationship
○ the closer r is to 0, the weaker the linear relationship
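Python 3.10+ ships sample covariance and the Pearson correlation in the statistics module; a sketch with made-up paired data:

```python
from statistics import covariance, correlation   # Python 3.10+

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

print("cov(X,Y):", covariance(x, y))   # > 0: X and Y move together
print("r(X,Y):", correlation(x, y))    # unit-free, between -1 and +1
```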

SESSION 4: PROBABILITY
I. Random experiments
1. Sample Space
- Random experiment is an observational process whose results cannot be known in advance.
- The set of all possible outcomes (denoted S) is the sample space for the experiment.
● A sample space with a countable number of outcomes is discrete.
+ Flip a coin, the sample space consists of 2 outcomes S = {Head, Tail}
+ Roll a die, the sample space consists of 6 outcomes S = {1, 2, 3, 4, 5, 6}
- If the outcome of the experiment is a continuous measurement, the sample space cannot be
listed, but it can be described by a rule. E.g. S = {all X such that X > 0}

2. Event

- An event is any subset of outcomes in the sample space


● A simple event or elementary event is a single outcome.
+ Flip a coin: S = {Head, Tail}
● A discrete sample space S consists of all the simple events (Ei): S = {E1, E2, …, En}
● A compound event consists of two or more simple events.
+ Flip a coin twice: S = {HeadHead, HeadTail, TailHead, TailTail}

II. Probability
- The probability of an event is a number that measures the relative likelihood that the event
will occur.
● The probability of an event A, denoted P(A), must lie within the interval from 0 to 1:

0 ≤ P(A) ≤ 1

● If P(A) = 0: The event cannot occur


● If P(A) = 1: The event is certain to occur

1. Assigning Probability

- Three distinct ways of assigning probability:

a. Empirical Approach

- Empirical Approach: estimation of probability for events of interest by using past statistics.
- Count the frequency of observed outcomes (f) defined in our experimental sample space and divide by the number of observations (n). The estimated probability is f/n.

b. Classical Approach
- A priori: the process of assigning probabilities before we actually observe the event or try an experiment.
- For flipping a coin, rolling a pair of dice, drawing cards, lottery numbers, and roulette, the nature of the process allows us to envision the entire sample space.

Example: Assume that the coin has only two equally likely outcomes: heads (H) or tails (T). In this case, the probability of getting heads (P(H)) or tails (P(T)) is 1/2 or 0.5. To generalize the classical approach, let's consider rolling a fair six-sided die. Each face of the die has one number from 1 to 6. The classical probability of rolling any specific number, say 3, is 1/6. This is because there is one favorable outcome (rolling a 3) out of six possible outcomes (rolling a 1, 2, 3, 4, 5, or 6).

c. Subjective Approach
- A subjective probability reflects someone’s informed judgment about the likelihood of an
event - needed when there is no repeatable random experiment.

Note: The subjective approach is usually only useful in the absence of information (when we cannot assign probabilities and there is no historical data). Later, when more data become available, a different approach can be used when revisiting the problem.

III. Rules of probability


1. Complement of an Event

- The complement of an event A is denoted by A′ (or Ā) and consists of everything in the


sample space except event A.
● A and A′ together comprise the entire sample space:
+ P(A) + P(A′ ) = 1
+ P(A′ ) = 1 – P(A)

2. Union of Two Events


- The union of two events consists of all outcomes in the sample space S that are contained
either in event A or in event B or in both.

- The union of A and B is denoted:
● A union B
● “A or B”

3. Intersection of Two Events


- The intersection of two events A and B: the event consisting of all outcomes in the sample
space S that are contained in both event A and event B
- The intersection of A and B is denoted:
● A∩B
● “A and B”

4. General Law of Addition


- The probability of the union of two events A and B is the sum of their probabilities less the
probability of their intersection.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

5. Mutually Exclusive Events


- Events A and B are mutually exclusive (or disjoint) if their intersection is the empty set (a set
that contains no elements). Events that cannot occur simultaneously

If A ∩ B = φ, then P(A ∩ B) = 0
Example:

● Event A = a day in January. Event B = a day in February

6. Special Law of Addition


- In the case of mutually exclusive events, since these events do not overlap, the addition law
reduces to:

P(A ∪ B) = P(A) + P(B)

7. Collectively Exhaustive Events


- Events are collectively exhaustive if their union is the entire sample space S (i.e., all the
events that can possibly occur).

Example: Randomly choose a period of time


Event A = weekday
Event B = weekend
Event C = January
Event D = Spring
● Events A, B, C and D are collectively exhaustive (but not mutually exclusive – a weekday can
be in January or in Spring)
● Events A and B are collectively exhaustive and also mutually exclusive (weekday and weekend
constitute 365 days of a year but cannot happen at the same time)

8. Conditional Probability
- The probability of event A given that event B has occurred is a conditional probability.

- Denoted P(A | B). The vertical line " | " is read as "given".

P(A | B) = P(A ∩ B) / P(B), for P(B) > 0

9. Independent Events
- Two events A & B are independent if and only if:

P(A | B) = P(A)

- Events A and B are independent when the probability of one event is not affected by the fact
that the other event has occurred.

10. Multiplication Rules


- Using algebra, we can rewrite the formula of conditional probability:

P(A ∩ B) = P(A | B) · P(B)

Note: If A and B are independent, then P(A | B) = P(A) and the multiplication rule simplifies to

P(A ∩ B) = P(A) · P(B)

11. Odds of an Event


- In sports and games of chance, we define the odds in favor of an event A as the ratio of the
probability that event A will occur to the probability that event A will not occur. Its reciprocal is the
odds against event A.

a. Relationship with Probability
- The odds in favor of event A occurring is:

Odds = P(A)/P(A')=P(A)/[1-P(A)]

- The odds against event A occurring is:

Odds = P(A')/P(A)=[1-P(A)]/P(A)

IV. Contingency tables


- A contingency table is a cross-tabulation of frequencies into rows and columns. A cell shows a
frequency, which is used to report the results of a survey.

- Collect data of 100 cars:


● Each car either has AC or no AC
● Each car either has GPS or no GPS

1. Joint Probabilities
- A joint probability representing the probability of the intersection of two events.
- Found by dividing the cell (except the total row and column) by the total sample size

● P(GPS∩AC) = 35/100 = 0.35

● P(No GPS∩AC) = 55/100 = 0.55

2. Marginal probability
- The marginal probability of an event is found by dividing a row or column total by the total
sample size.
● P(AC) = 90/100 = 0.9
● P(GPS) = 40/100 = 0.4

3. Conditional probability
- Found by restricting ourselves to a single row or column of the given condition.

- Dividing the cell by the total of the given condition


● P(GPS | AC) = P(GPS ∩ AC) / P(AC) = 35/90 = 0.3889
● P(No AC | GPS) = P(No AC ∩ GPS) / P(GPS) = 5/40 = 0.125
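The whole table can be reproduced with plain arithmetic (a sketch; the four cell counts follow from the joint, marginal, and conditional probabilities quoted above):

```python
# Cell counts consistent with the 100-car example above
gps_ac, gps_no_ac = 35, 5
no_gps_ac, no_gps_no_ac = 55, 5
n = gps_ac + gps_no_ac + no_gps_ac + no_gps_no_ac   # 100 cars

p_joint = gps_ac / n                             # P(GPS and AC) = 0.35
p_ac = (gps_ac + no_gps_ac) / n                  # marginal P(AC) = 0.90
p_gps = (gps_ac + gps_no_ac) / n                 # marginal P(GPS) = 0.40
p_gps_given_ac = gps_ac / (gps_ac + no_gps_ac)   # conditional 35/90

print(p_joint, p_ac, p_gps, round(p_gps_given_ac, 4))
```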

V. Decision tree

Given AC or no AC:

Given GPS or no GPS:

VI. Bayes’ Theorem
- Bayes’ Theorem is used to revise previously calculated probabilities based on new
information.
- Developed by Thomas Bayes in the 18th Century.
- It is an extension of conditional probability.
- The prior (marginal) probability of an event B is revised after event A has been considered to
yield a posterior (conditional) probability.

- In situations where P(A) is not given, Bayes' Theorem is written by expanding P(A) over all of the events Bi:

General Form of Bayes' Theorem:

P(Bi | A) = P(A | Bi) P(Bi) / [P(A | B1) P(B1) + … + P(A | Bk) P(Bk)]

- where:
● Bi = ith event of k mutually exclusive and collectively exhaustive events
● A = new event that might impact P(Bi)
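A small numeric sketch of the revision (the prior, sensitivity, and false-positive rate below are hypothetical values chosen only for illustration):

```python
# Hypothetical screening problem: 1% prior, 95% sensitivity,
# 10% false-positive rate
p_b = 0.01              # prior P(B): person has the condition
p_a_given_b = 0.95      # P(A | B): test positive given the condition
p_a_given_not_b = 0.10  # P(A | B'): test positive without the condition

# Total probability of a positive test, P(A)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Posterior P(B | A): Bayes' Theorem revises the 1% prior upward
posterior = p_a_given_b * p_b / p_a
print(round(posterior, 4))   # ~0.0876
```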

VII. Counting Rules


1. Fundamental Rule of Counting
a. Counting Rule 1:
- If event A can occur in n1 ways and event B can occur in n2 ways, then events A and B can
occur in n1 x n2 ways. In general, the number of ways that m events can occur:

n1 × n2 × … × nm

Example: You want to go to a park, eat at a restaurant, and see a movie. There are 3 parks, 4
restaurants, and 6 movie choices. How many different possible combinations are there?

Answer: (3)(4)(6) = 72 different possibilities

b. Counting Rule 2:
- If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to kⁿ

Example: If you roll a fair die 3 times then there are 6³ = 216 possible outcomes

2. Factorials
- The number of unique ways that n items can be arranged in a particular order: n!, the product
of all integers from 1 to n

n! = 1 × 2 × 3 × … × (n−2) × (n−1) × n

Example: You have five books to put on a bookshelf. How many different ways can these
books be placed on the shelf?

Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities

3. Permutations
- Choose X items at random without replacement from a group of n items. The number of
ways of arranging X objects selected from n objects in order

Example: You have five books and are going to put three on a bookshelf following alphabetical
order. How many different ways can the books be ordered on the bookshelf?

Answer: 5P3 = 5!/(5 − 3)! = 60 different possibilities

4. Combinations
- A combination is a collection of X items chosen at random without replacement from n items.
The number of ways of selecting X objects from n objects, irrespective of order, is

Example: You have five books and are going to randomly select three to read regardless of the
order. How many different combinations of books might you select?

Answer: 5C3 = 5!/(3!(5 − 3)!) = 10 different possibilities
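Python's math module covers all of these counting rules; a quick check of the examples above:

```python
import math

print(3 * 4 * 6)           # Rule 1: 72 park/restaurant/movie combinations
print(6 ** 3)              # Rule 2: 216 outcomes for three rolls of a die
print(math.factorial(5))   # 120 ways to arrange five books
print(math.perm(5, 3))     # 60 ordered arrangements of 3 books out of 5
print(math.comb(5, 3))     # 10 unordered selections of 3 books out of 5
```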

SESSION 5: DISCRETE PROBABILITY DISTRIBUTION

I - Discrete probability distributions

1. Random Variables
- A random variable is a function or rule that assigns a numerical value to each outcome in the
sample space of a random experiment.
- A discrete random variable has a countable number (integer) of distinct values.

● Some have a clear upper limit (e.g., number of absences in a class of 40 students)
● Others do not (e.g., number of text messages you receive in a given hour).

- A continuous random variable produces outcomes from a measurement

Example: your annual salary, or your weight/height

2. Probability Distributions
a. Discrete Probability Distribution
- A discrete probability distribution assigns a probability to each value of a discrete random
variable X. The distribution must follow the rules of probability

P(xi) = P(X=xi)

● The probability for any given value of X:

0 ≤ P(xi) ≤ 1

● The probabilities of all values of X must sum to 1: Σ P(xi) = 1

- More than one random variable value can be assigned to the same probability, but one
random variable value cannot have two different probabilities.

b. Cumulative Probability Function


- The Cumulative Probability Function (CDF), denoted F(x0), shows the probability that X is less
than or equal to x

F(x0) = P(X ≤ x0)

Or in other words: F(x0) = Σ P(x), summed over all x ≤ x0

II - Expected value and variance

1. Expected Value
- The expected value E(X) (of a discrete random variable) is the sum of all X-values weighted by
their respective probabilities.
- Because E(x) is an average (weighted mean) → E(x) is the mean and uses the symbol μ.

2. Variance and Standard Deviation
- The variance Var(X) (of a discrete random variable) is the sum of the squared deviations
about its expected value, weighted by the probability of each X-value.
- Var(X) is a weighted average → measures variability around the mean

- The standard deviation is the square root of the variance and is denoted σ:
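A sketch of E(X), Var(X), and σ for a small hypothetical distribution (the values and probabilities are made up):

```python
from math import sqrt

# Hypothetical discrete distribution: value -> probability
dist = {0: 0.25, 1: 0.50, 2: 0.25}

mu = sum(x * p for x, p in dist.items())               # E(X) = 1.0
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # Var(X) = 0.5
sigma = sqrt(var)                                      # ~0.7071

print(mu, var, round(sigma, 4))
```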

III - Uniform distribution

1. Characteristics of the Uniform Distribution


- The uniform distribution describes a random variable with a finite number of consecutive
integer values from a to b.
- Each value of the random variable is equally likely to occur.

- Ex: rolling a fair die, each of the six faces has the same probability, 1/6:

IV - Binomial distribution

1. Bernoulli Experiments
- A random experiment with only 2 outcomes is a Bernoulli experiment.

● One outcome is labeled a “success” (denoted X = 1) and the other a “failure” (denoted X = 0).
o π is the P(success), 1 – π is the P(failure).
● We assume π < 0.5 for convenience.

2. Possible Bernoulli Settings (extra information)


- A manufacturing plant labels items as either defective or acceptable
- A firm bidding for contracts will either get a contract or not
- A marketing research firm receives survey responses of “yes I will buy” or “no I will not”
- New job applicants either accept the offer or reject it
- We have: P(0) + P(1) = (1 – p) + p = 1 (0 ≤ p ≤ 1)
- The expected value (mean): E(X) = p
- The variance: Var(X) = p(1 - p)

3. Binomial Distribution
- The binomial distribution arises when a Bernoulli experiment is repeated n times.
- In a binomial experiment, X = the number of successes in n trials.

-> Binomial random variable X is the sum of n independent Bernoulli random variables.

- The number of combinations of selecting X objects out of n objects is

- P(X = x) is determined by the two parameters n and π. The binomial probability function:

● P(X) = probability of X successes in n trials, with probability of success p on each trial


● X = number of ‘successes’ in sample, (X = 0, 1, 2, ..., n)
● n = sample size (number of trials or observations)
● p = probability of “success”

Example: Flipping a coin

A fair coin is flipped 4 times. What is the probability of getting exactly 2 heads?

Solution: In this case, we have:

● n = 4 (the number of trials)


● p = 0.5 (the probability of success on each trial)
● x = 2 (the number of successes)

The formula for the binomial distribution is:

P(X = x) = nCx · π^x · (1 − π)^(n−x)

where:

● nCx is the number of combinations of n things taken x at a time
● π^x is the probability of x successes
● (1 − π)^(n−x) is the probability of n − x failures

→ Plugging in the values we have, we get: P(X = 2) = 4C2 · (0.5)² · (1 − 0.5)^(4−2) = 6 × 0.25 × 0.25 = 0.375

Therefore, the probability of getting exactly 2 heads is 0.375.
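The same computation in Python, with math.comb supplying nCx:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(binom_pmf(2, 4, 0.5))   # 0.375: exactly 2 heads in 4 fair flips
```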

4. Binomial Shape
- π < 0.5: skewed right
- π = 0.5: symmetric
- π > 0.5: skewed left

V - Hypergeometric distribution

- The hypergeometric distribution is similar to the binomial except that sampling is without
replacement from a finite population of N items
- The trials are not independent and the probability of success is not constant from trial to trial
- Finding probability of “X=xi” items of interest in the sample (n) where there are “s” items of
interest in the population (N)
- The hypergeometric distribution has three parameters:

● N (the number of items in the population)

● n (the number of items in the sample)
● s (the number of successes in the population).
- Hypergeometric Distribution Formula

Where

● N = population size
● s = number of items of interest in the population
● N – s = number of events not of interest in the population
● n = sample size
● x = number of items of interest in the sample
● n – x = number of events not of interest in the sample

Example: 3 different computers are selected from 10 in the department. 4 of the 10 computers
have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal
software loaded?

● N = 10
● n = 3
● s = 4
● x = 2
-> The probability that 2 of the 3 selected computers have illegal software loaded is 0.30, or 30%.
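A direct check of the example with math.comb:

```python
from math import comb

N, n, s, x = 10, 3, 4, 2   # population, sample, successes in population, hits

p = comb(s, x) * comb(N - s, n - x) / comb(N, n)
print(p)   # 0.3: chance that 2 of the 3 selected computers are affected
```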

VI - Geometric distribution

- The geometric distribution describes the number of Bernoulli trials until the first success.

● X is the number of trials until the first success.


● X ranges from {1, 2, . . .}
● At least one trial to obtain the first success, but the number of trials is not fixed.
● π is the constant probability of success on each trial.

Example:
Suppose you are playing a game of darts. The probability of hitting the bullseye is 0.2.
What is the probability of hitting the bullseye on your second try?
Solution: In this case, we are interested in the probability of getting a success (hitting
the bullseye) after one failure (missing the bullseye).
The formula for the geometric distribution is: P(X = x) = π(1 − π)^(x−1)
where:
● P(X = x) is the probability of getting the first success on the xth trial
● π is the probability of success on each individual trial
● x is the number of trials
In this case, π = 0.2 and x = 2.
→ Plugging these values into the formula, we get: P(X = 2) = 0.2 × (1 − 0.2)^(2−1) = 0.16
Therefore, the probability of hitting the bullseye on your second try is 0.16.
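The same check as a one-function sketch:

```python
def geom_pmf(x: int, p: float) -> float:
    """P(first success occurs on trial x), given success probability p."""
    return p * (1 - p) ** (x - 1)

print(round(geom_pmf(2, 0.2), 2))   # 0.16: one miss, then a bullseye
```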

VII - Poisson distribution

- The Poisson distribution describes the number of occurrences within a randomly


chosen unit of time (e.g., minute, hour) or space (e.g., square foot, linear mile).
- The events must occur randomly and independently over a continuum of time or
space

- Use the Poisson distribution when:
● counting the number of times an event occurs in a given area of opportunity (time, space, volume…)
● the probability that an event occurs in one area of opportunity is the same for all areas of opportunity
● the number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity
● the average number of events per unit is λ (lambda)
- Examples of the Poisson distribution:
● The number of phone calls received by a call center per hour.
● The number of taxis passing a particular street corner per day.
● The number of computer crashes in a day.
● The number of mosquito bites on a person.

a. Poisson Distribution Formula

where:

● x = number of events in an area of opportunity


● λ = expected number of events (average number of events per unit)
● e = base of the natural logarithm system (2.71828...)

Mean = λ,  Variance = λ,  Standard deviation = √λ

- Always right-skewed. The larger the λ, the less right-skewed the distribution.

- Poisson Distribution Example

Example: The average number of houses sold per day by a real estate company is 2.
What is the probability that 3 houses will be sold tomorrow?
x = 3; λ = 2: P(X = 3) = (2³ · e⁻²)/3! ≈ 0.1804
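Verified with a short function (a minimal sketch):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

print(round(poisson_pmf(3, 2), 4))   # ~0.1804: three sales when the mean is 2
```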

Option: Use the Poisson approximation (as an alternative) to the binomial

- The Poisson distribution may be used to approximate a binomial by setting λ = nπ


- This approximation is helpful when n is large and π is small.
- A common rule of thumb says the approximation is adequate if n ≥ 20 and π ≤ 0.05.

Option: Use the binomial approximation (as an alternative) to the Hypergeometric

- Both the binomial and hypergeometric involve sample size of n and the number of successes
X.
- The binomial sample is with replacement while the hypergeometric sample is without
replacement
- Rule of Thumb: If n/N < 0.05, we can use the binomial approximation to the hypergeometric,
using sample size n and p = s/N.

Option: Transformation rules

- A linear transformation of a random variable X is performed by adding a constant, multiplying
by a constant, or both

- Two useful rules about the mean and variance of a transformed random variable aX + b,
where a and b are any constants (a ≥ 0).

● Example: Professor Hardtack gave a tough exam whose scores had μ = 40 and σ = 10. He decided to add 20 points to every student's score → this raises the mean by 20 points.
- Rule 1: adding a constant to all X-values shifts the mean but leaves the standard deviation unchanged.
- Alternatively, he could multiply every exam score by 1.5 (raising the mean from 40 to 40 × 1.5 = 60).
- Rule 2: multiplying by a constant also multiplies the standard deviation, which would rise from 10 to 1.5 × 10 = 15 → increasing the dispersion. In other words, this policy would "spread out" the students' exam scores. Some scores might even exceed 100.

● Sums of Random Variables


- If we consider the sum of two independent random variables X and Y, given as X + Y, then:

● Covariance
- When X and Y are dependent, the covariance of them, denoted by Cov(X,Y) or σxy, describes
how the variables vary in relation to each other.
- Cov(X,Y) > 0 : indicates that the two variables move in the same direction
- Cov(X,Y) < 0 : indicates that the two variables move in opposite direction.
- We use both the covariance and the variances of X and Y to calculate the standard deviation
of the sum of X and Y.

SESSION 6: CONTINUOUS PROBABILITY DISTRIBUTION

I - Continuous probability distributions


- A Continuous Variable is a variable that can assume any value in an interval. These can
potentially take on any value, depending only on the ability to measure accurately.
- Examples:
● thickness of an item
● time required to complete a task
● temperature in a room

- Discrete Variable: each value of X has its own probability P(X).


- Continuous Variable: events are intervals and probabilities are areas underneath smooth
curves. A single point has no probability.

1. PDF and CDF of Continuous Distributions


- Probability Density Function (PDF):
● Denoted f(x); must be nonnegative.
● Total area under curve = 1.
● Mean, variance, and shape depend on the PDF parameters

- Cumulative Distribution Function (CDF):


● Denoted F(x).
● Shows P(X ≤ x).
● Useful for finding probabilities.

2. Probabilities as Areas
- Continuous probability functions are smooth curves.
● Unlike discrete distributions, the area at any single point = 0.
● The entire area under any PDF must be 1.

- P(a < X < b) is the integral of the probability density function f(x) over the interval from a to b. Because P(X = a) = 0, the expression P(a < X < b) is equal to P(a ≤ X ≤ b).

3. Expected Value and Variance


- The mean and variance of a continuous random variable are analogous to E(X) and Var(X) for a discrete random variable, except that the integral sign ∫ replaces the summation sign Σ.

II - Uniform continuous distribution
1. Characteristics of the Uniform Distribution:
- The uniform continuous distribution has equal probabilities for all possible outcomes of the random variable; also known as the rectangular distribution. Denoted U(a, b) for short.
- If X is a random variable that is uniformly distributed between a and b
● PDF constant height: f(x)=1/(b-a)

Area = base x height = (b-a) x 1/(b-a) = 1

● CDF increases linearly to 1

For a ≤ X ≤ b

P(X ≤ x) = (x - a)/(b - a)

SUMMARY TABLE

● Example: For X ~ U(2, 6), use the uniform probability distribution to find P(3 ≤ X ≤ 5):

P(3 ≤ X ≤ 5) = (Base)(Height) = (d - c) x 1/(b - a) = (5 - 3)/(6 - 2) = 0.5
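The same area can be checked with scipy (a sketch; note that scipy parameterizes U(a, b) by loc = a and scale = b - a):

from scipy import stats

u = stats.uniform(loc=2, scale=4)            # X ~ U(2, 6)
print(u.cdf(5) - u.cdf(3))                   # 0.5, matching the base x height computation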

III - NORMAL DISTRIBUTION


1. Characteristics of the Normal Distribution
- A normal probability distribution is defined by two parameters, μ and σ, and is denoted N(μ, σ²).
- Main characteristics of a normal distribution:
● Bell Shaped

● Symmetrical

● Mean= Median = Mode

- Center is determined by the mean, μ


- Spread is determined by the standard deviation, σ

- The random variable has an infinite theoretical range: - ∞ to + ∞
- The Normal PDF
● The formula for the normal PDF is

f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))

Where

● e = the mathematical constant approximated by 2.71828


● π = the mathematical constant approximated by 3.14159
● μ = the population mean
● σ = the population standard deviation
● x = any value of the continuous variable, −∞ < x < ∞

- Changing μ shifts the distribution left or right.


- Changing σ increases or decreases the spread.

2. The Normal CDF


- For a normal random variable X with mean μ and variance σ², i.e., X ~ N(μ, σ²), the CDF is
F(x) = P(X ≤ x), the area under the PDF to the left of x. There is no closed-form formula, so the
CDF is evaluated from tables or software.
3. Finding Normal Probabilities

- The probability for a range of values is measured by the area under the curve

IV - Standard normal distribution


1. Characteristics of the Standard Normal
- Any normal distribution can be transformed into the standardized normal distribution (Z),
with mean 0 and standard deviation 1
- By subtracting the mean and dividing by the standard deviation to produce a standardized
variable (Z-score). -> Z-score = how many standard deviations from the mean each value lies.
- The maximum height of f(z) is at 0 (the mean).

- The shape of the distribution is unaffected by the z transformation; only the scale changes.
We can express the problem in original units (X) or in standardized units (Z).

2. Finding Normal Probabilities

3. Standardized Normal Table


- The Standardized Normal table shows values of the cumulative normal distribution function
- For a given Z-value a , the table shows F(a) (the area under the curve from −∞ to a)
- Normal Areas from Appendix C-1. Appendix C-1 shows areas from 0 to z using increments of
0.01 from z = 0 to z = 3.69 (e.g., for z = 1.96, look up the row for z = 1.9, then the column for 0.06).
Example: P(0 ≤ z ≤ 2) = .4772

- Normal Areas from Appendix C-2. Appendix C-2 shows cumulative normal areas from the left
to z.

V - Normal Approximations
1. Normal Approximation to the Binomial
- Normal approximation to the binomial: If n is sufficiently large then we can actually use the
normal distribution to approximate the probabilities related to the binomial distribution

Example: For 4, 16, and 64 flips of a fair coin with X defined as the number of heads in n tries. As
sample size increases, it becomes easier to visualize a smooth, bell-shaped curve overlaid on the bars.

- The logic of this approximation is that as n becomes large, the discrete binomial bars become
more like a smooth, continuous, normal curve
- Rule of thumb: when nπ > 10 and n(1- π) > 10, then it is appropriate to use the normal
approximation to the binomial.

- Set the normal µ and σ equal to the binomial mean and standard deviation: µ = nπ and σ = √(nπ(1 − π))

2. Normal Approximation to the Poisson


- The normal approximation to the Poisson works best when λ ≥ 20
- Set the normal µ and σ equal to the Poisson mean and standard deviation: µ = λ and σ = √λ

* Tip: Exercises of this type usually state that arrivals follow a Poisson distribution with a given mean.

Example: On Wednesday between 10 a.m. and noon, customer billing inquiries arrive at a mean rate
of 42 inquiries per hour at Consumers Energy. What is the probability of receiving more than 50 calls?

● We set: μ = λ = 42, σ = √λ = √42 = 6.48074


● The continuity-corrected cutoff point for X ≥ 51 is X = 50.5 (halfway between 50 and 51):

. . . 46 47 48 49 50 51 52 53 . . .

● The standardized Z-value for the event “more than 50” is P(X > 50.5) = P(Z > 1.31) since

z = (x − μ)/σ = (50.5 − 42)/6.48074 ≃ 1.31

● Using Appendix C-2 we look up P(Z < -1.31) = .0951, which is the same as P(Z > 1.31) because
the normal distribution is symmetric.
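A sketch checking this approximation against the exact Poisson probability (Python with scipy):

from math import sqrt
from scipy import stats

lam = 42
approx = stats.norm.sf(50.5, loc=lam, scale=sqrt(lam))   # continuity-corrected normal tail, ≈ .095
exact = stats.poisson.sf(50, lam)                        # exact Poisson P(X > 50)
print(round(approx, 4), round(exact, 4))                 # the two values are close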

VI - Exponential distribution
1. Characteristics of the Exponential Distribution (a continuous distribution)
- Often used to model the length of time between two occurrences of an event (the time
between 2 events that happen)

Examples:

● Time between trucks arriving at an unloading dock
● Time between transactions at an ATM Machine
● Time between phone calls to the main operator

- Defined by a single parameter, λ (lambda), the mean arrival rate per unit of time

- The probability that an arrival time is less than some specified time x is

P(X ≤ x) = 1 − e^(−λx)

Where: λ (lambda) = mean arrivals per unit of time (the same λ as in the Poisson distribution)

Example: Between 2 p.m. and 4 p.m. on Wednesday, patient insurance inquiries arrive at Blue
Choice insurance at a mean rate of 2.2 calls per minute. What is the probability of waiting more than
30 seconds for the next call?

● We set λ = 2.2 events/minute and x = 0.5 minute.


● We have:

P(X > 0.50) = e^(−λx) = e^(−(2.2)(0.5)) = .3329, or 33.29%

● There is about a 33 percent chance of waiting more than 30 seconds before the next call
arrives. Since x = 0.50 is a point that has no area in a continuous model, P(X ≥ 0.50) and P(X >
0.50) refer to the same event (unlike, say, a binomial model, in which a point does have a
probability). The probability that 30 seconds or less (0.50 minute) will be needed before the
next call arrives is:

P(X ≤ 0.50) = 1 - .3329 = .6671
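The same two probabilities in scipy (a sketch; note that scipy's exponential is parameterized by scale = 1/λ rather than λ itself):

from scipy import stats

lam = 2.2                                    # mean arrivals per minute
waits = stats.expon(scale=1 / lam)           # waiting-time distribution
print(round(waits.sf(0.5), 4))               # P(X > 0.5) ≈ .3329
print(round(waits.cdf(0.5), 4))              # P(X ≤ 0.5) ≈ .6671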

VII - Poisson distribution relationship with exponential distribution

- The count of customer arrivals is a discrete random variable with a Poisson distribution.
When the count of customer arrivals has a Poisson distribution,
● the distribution of the time between two customer arrivals will be exponential

SUMMARY TABLE

Finding Probability

- Probability of waiting more than x: P(X > x) = e^(−λx)
- Probability of waiting less than x: P(X < x) = 1 − e^(−λx)

Example: Customers arrive at the service counter at the rate of 20 per hour. What is the
probability that the arrival time between consecutive customers is less than 6 minutes?

● The mean number of arrivals per hour is 20: λ = 20


● 6 minutes is 0.1 hours (X < 0.1)
● P(arrival time < 0.1) = 1 − e^(−λx) = 1 − e^(−(20)(0.1)) = 0.864665
● In Excel: =EXPONDIST(0.1, 20, TRUE)
● So there is an 86.47% chance that the arrival time between successive customers is less than 6
minutes

SESSION 7: SAMPLING, SAMPLING DISTRIBUTION, CONFIDENCE
INTERVAL

I - Sampling and estimation

- A sample statistic: a random variable whose value depends on items included in the random
sample.
- Some samples may represent the population well, while other samples could differ greatly
from the population (particularly if the sample size is small)
⇒ In larger samples, the sample means would tend to be even closer to μ. This fact is the basis
for statistical estimation.
- To make inferences about a population we must consider four factors:
● Sampling variation (uncontrollable).
● Population variation (uncontrollable).
● Sample size (controllable).
● Desired confidence in the estimate (controllable).

1. Estimators and estimates

- An estimator: a statistic derived from a sample to infer the value of a population parameter.
- An estimate: the value of the estimator in a particular sample.
- Sample estimator of population parameters.

Examples of estimators

2. Sampling error

- Sampling error is the difference between an estimate and the corresponding population
parameter (the sampled value and the true population value).
- This is because you cannot choose a sample that is perfectly representative of the population.

Sampling error = X̄ - μ

3. Properties of Estimators
a. Bias
- The bias is the difference between the expected value of the estimator and the true
parameter.

Bias = E(X̄) - μ

- An unbiased estimator neither overstates nor understates the true parameter on average.

E(X̄) = μ

- Sample mean (X̄) and sample proportion (p) are unbiased estimators of μ and π
- Sampling error is an inevitable risk of random sampling, whereas bias is a systematic
error
b. Efficiency
- Efficiency refers to the variance of the estimator’s sampling distribution
- Smaller variance means a more efficient estimator. We prefer the minimum variance
estimator

- X̄ and s² are minimum variance estimators of μ and σ²

c. Consistency
- A consistent estimator converges toward the parameter being estimated as the sample size
increases: the larger the sample, the closer its characteristics are to those of the whole
population.

- The variances of three estimators, X̄, s, and p, diminish as n increases, so all are consistent
estimators.

II - Central limit theorem

- Sampling distribution of an estimator: the probability distribution of all possible values the
statistic may assume when a random sample of size n is taken.
- Central limit theorem: in many situations, for independent and identically distributed
random variables, the sampling distribution of the standardized sample mean tends toward
the standard normal distribution even if the population is not originally normally distributed
=> normal-based methods can be applied to problems involving many other types of
distributions, simplifying calculation and analysis.

- The sample mean is an unbiased estimator for μ:

E(X̄) = μ (the expected value of the sample mean)
- The sampling error of the sample mean is described by its standard deviation, the standard
error of the mean:

σX̄ = σ/√n

- Three important facts about the sample mean:


● If the population is normal, the sample mean has a normal distribution centered at μ,
with a standard error equal to σ/√n.

● As sample size n increases, the distribution of sample means converges to the
population mean μ (i.e., the standard error of the mean σ/√n gets smaller).

● Even if the population is not normal, by the Central Limit Theorem, if the sample size
is large enough, the sample means will have approximately a normal distribution.

1. Applying the Central Limit Theorem

a. Uniform Population
- The Rule of Thumb says that n ≥ 30 is required to ensure a normal distribution for the sample
mean, but actually a much smaller n will suffice if the population is symmetric.
- The Central Limit Theorem predicts:
● The distribution of sample means drawn from the population will be normal.
● The standard error of the sample mean X will decrease as the sample size increases.

b. Skewed Population
- The Central Limit Theorem predicts
● The distribution of sample means drawn from any population will approach normality.
● The standard error of the sample mean X will diminish as sample size increases.
- In highly skewed populations, even n ≥ 30 will not ensure normality, though it is not a bad
rule.
- In severely skewed populations, the mean is a poor measure of center to begin with due to
outliers.
c. Range of Sample Means
- The Central Limit Theorem permits us to define an interval within which the sample means
are expected to fall.
- As long as the sample size n is large enough, we can use the normal distribution regardless of
the population shape (or any n if the population is normal to begin with).

- We use the familiar z-values for the standard normal distribution. If we know μ and σ, the CLT
allows us to predict the range of sample means for samples of size n:

2. Sample size and standard error

- The standard error decreases as n increases


- To halve (÷2) the standard error, you must quadruple (x4) the sample size

- You can make the standard error σ/√n as small as you want by increasing n. The mean X̿ of the

sample means X̄ converges to the true population mean μ as n increases.
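A quick numeric illustration of the halving rule (a sketch assuming an arbitrary σ = 8):

import math

sigma = 8
for n in (25, 100, 400):                     # each step quadruples n
    print(n, round(sigma / math.sqrt(n), 2))   # standard error halves: 1.6, 0.8, 0.4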

3. Confidence interval for a mean (μ) with known σ


a. What Is a Confidence Interval?
- A sample mean X̄ calculated from a random sample x1, x2, . . . , xn is a point estimate of the
unknown population mean μ.
- A confidence interval (CI) is a range of values that is likely to contain a population
parameter. The CI is created by taking a sample of data from the population and then using
statistical methods to estimate the parameter; it is expressed as a range of values.
- Construct a confidence interval for the unknown mean μ by adding and subtracting a margin
of error from X̄, the mean of our random sample.
- The confidence level for this interval is expressed as a percentage such as 90, 95, or 99
percent → a 95% confidence level means that 95% of the intervals constructed this way would
contain the true population parameter: “we are 95% confident that the interval contains the
true population parameter”.

- Confidence level using z

b. Choosing a Confidence Level

- In order to gain confidence, we must accept a wider range of possible values for μ. Greater
confidence implies loss of precision (i.e., a greater margin of error)
- Common confidence level

c. When Can We Assume Normality?

- If σ is known and the population is normally distributed, then we can safely construct the
confidence interval for μ.
- If σ is known but we do not know whether the population is normal
- Rule of thumb: n ≥ 30 is sufficient to assume a normal distribution for X (by the CLT) as
long as the population is reasonably symmetric and has no outliers.

4. Confidence interval for a mean (μ) with unknown σ


a. Student’s t Distribution
- When σ is unknown → the formula for a confidence interval resembles the formula for known
σ except that t replaces z and s replaces σ.

- The confidence intervals will be wider (other things being the same) - tα/2 is always greater
than zα/2.

Example:

A random sample of n = 25 has X̄ = 50 and S = 8.

Because n = 25 < 30, we must assume the population is normal, and because σ is unknown we use
the t-distribution.
Form a 95% confidence interval for μ:
d.f. = n - 1 = 24, so t0.025,24 = 2.0639 (0.95 → α/2 = 0.025 → t of α/2 = 2.0639)
The confidence interval is
X̄ - t × S/√n = 50 - 2.0639 × 8/√25 = 46.698
X̄ + t × S/√n = 50 + 2.0639 × 8/√25 = 53.302
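The same interval in Python (a sketch with scipy reproducing the numbers above):

from math import sqrt
from scipy import stats

n, xbar, s = 25, 50, 8
t = stats.t.ppf(0.975, df=n - 1)             # t0.025,24 = 2.0639
half = t * s / sqrt(n)                       # margin of error
print(round(xbar - half, 3), round(xbar + half, 3))   # 46.698 and 53.302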

b. Degrees of Freedom
- Knowing the sample size allows us to calculate a parameter called degrees of freedom (d.f)
used to determine the value of the t statistic used in the confidence interval formula.

d.f. = n - 1 (degrees of freedom for a confidence interval for μ)

- The degree of freedom (d.f) is the number of observations that are free to vary after sample
mean has been calculated.
- They depend on the sample size (n) and the number of restrictions (usually 1).

c. Comparison of z and t

- As degrees of freedom increase, the t-values approach the familiar normal z-values.

d. Outliers and Messy Data

- The original population is assumed normal.


- Confidence intervals using Student’s t are reliable as long as the population is not badly
skewed; Student’s t is used when the sample size (n) is small and the population variance (σ²)
must be estimated by s².
- In statistics, the t-distribution is most often used to:
+ Find the critical values for a confidence interval when the data is approximately normally
distributed.
+ Find the corresponding p-value from a statistical test that uses the t-distribution (t-tests,
regression analysis).

5. Confidence interval for a proportion (π)

- The distribution of a sample proportion p = x/n tends toward normality as n increases.

- The standard error σp will decrease as n increases, like the standard error for X̄. We say that
p = x/n is a consistent estimator of π.

a. Standard Error of the Proportion
- The standard error of the proportion is denoted σp: σp = √(π(1 − π)/n)

- σp will be largest when the population proportion is near π = .50 and smallest
when π is near 0 or 1.
b. Confidence Interval for π

c. Narrowing the Interval


- The width of the confidence interval for π depends on
● Sample size n
● Confidence level
● Sample proportion p
- For a narrower interval (i.e., more precision), increase the sample size or reduce the confidence
level (e.g., from 95 percent to 90 percent)
d. Polls and Margin of Error
- In polls and survey research, the margin of error is typically based on a 95 percent confidence
level and the initial assumption that π = .50
- Each reduction in the margin of error requires a disproportionately larger sample size

e. Rule of Three
- If in n independent trials, no events occur, the upper 95% confidence bound is approximately
3/n

- Example: When proofreading, if you read 50 pages and do not find any mistakes, then the
upper 95% bound for the proportion of pages with a mistake is 3/50 = .06.
6. Sample size determination for a mean
a. Sample Size to Estimate μ

b. Estimate σ
- Method 1: Take a Preliminary Sample
● Take a small preliminary sample and use the sample estimates in place of σ. This method is
the most common, though its logic is somewhat circular (i.e., take a sample to plan a sample).
- Method 2: Assume Uniform Population
● Estimate upper and lower limits a and b and set σ = [(b − a)²/12]^(1/2).
- Method 3: Assume Normal Population
● Estimate upper and lower bounds a and b and set σ = (b − a)/6. This assumes normality, with
most of the data within μ ± 3σ, so the range is 6σ.
- Method 4: Poisson Arrivals
● In the special case when λ is a Poisson arrival rate, then σ = √λ.

7. Sample size determination for a proportion

- The formula for the required sample size for a proportion:

n = (zα/2)² × π(1 − π)/E²
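A sketch applying this formula (the inputs — 95 percent confidence, worst-case π = .50, margin of error E = ±.03 — are assumptions for illustration):

import math
from scipy import stats

z = stats.norm.ppf(0.975)                    # zα/2 ≈ 1.96 for 95 percent confidence
pi, E = 0.5, 0.03                            # assumed π and desired margin of error
n = z**2 * pi * (1 - pi) / E**2
print(math.ceil(n))                          # 1068 — always round up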

8. Confidence interval for a population variance σ2


a. Chi-Square Distribution
- The chi-square test (χ²) is a statistical hypothesis test commonly used to assess the
association between two categorical variables. It determines whether the observed
frequencies of outcomes in a contingency table differ significantly from the expected
frequencies.
- Note: Chi-square intervals for σ² are mainly of theoretical interest for hypothesis testing; in
practice there are not many applications.
- If the population is normal, construct a confidence interval for the population variance σ2
using the chi-square distribution with degrees of freedom equal to d.f. = n – 1
- Lower-tail and upper-tail percentiles for the chi-square distribution (denoted χ²L and χ²U) can
be found in Appendix E.

EXERCISE

1. The width of a confidence interval for μ is not affected by

A. the sample size.
B. the confidence level.
C. the standard deviation.
D. the sample mean.

Answer: D. The width is based on CI = X̄ ± z × (s/√n), which depends on the confidence level (z),
s, and n, but not on the sample mean, which only locates the center of the interval.

2. If a normal population has parameters μ = 40 and σ = 8, then for a sample size n = 4

A. the standard error of the sample mean is approximately 2.
B. the standard error of the sample mean is approximately 4.
C. the standard error of the sample mean is approximately 8.
D. the standard error of the sample mean is approximately 10.

We are given: mean μ = 40, standard deviation σ = 8, sample size n = 4.
Standard error = σ/√n = 8/√4 = 2
→ Answer: A. The standard error of the sample mean is approximately 2.

3. What is the approximate width of an 80% confidence interval for the true population
proportion if there are 12 successes in a sample of 80?
A. ± .078
B. ± .066
C. ± .051
D. ± .094

Answer: C. p = 12/80 = .15, z0.10 = 1.2816, so E = 1.2816 × √(.15 × .85/80) ≈ ± .051.

4. A highway inspector needs an estimate of the mean weight of trucks crossing a bridge on the
Interstate highway system. She selects a random sample of 49 trucks and finds a mean of 15.8 tons
with a sample standard deviation of 3.85 tons. The 90 percent confidence interval for the
population mean is
A. 14.72 to 16.88 tons.
B. 14.90 to 16.70 tons.
C. 14.69 to 16.91 tons.
D. 14.88 to 16.72 tons.

Degrees of freedom: df = n - 1 = 49 - 1 = 48
At the 90% confidence level:
α = 1 - 0.90 = 0.10
α/2 = 0.10/2 = 0.05
tα/2,df = t0.05,48 = 1.677
Margin of error: E = tα/2,df × (s/√n) = 1.677 × (3.85/√49) = 0.92235 ≈ 0.92
The 90% confidence interval estimate of the population mean is
x̄ - E < μ < x̄ + E
15.8 - 0.92 < μ < 15.8 + 0.92
14.88 < μ < 16.72
Answer: D. The 90% confidence interval for the population mean is 14.88 to 16.72 tons.

5. Last week, 108 cars received parking violations in the main university parking lot. Of these, 27
had unpaid parking tickets from a previous violation. Assuming that last week was a random sample
of all parking violators, find the 95 percent confidence interval for the percentage of parking violators
that have prior unpaid parking tickets.
A. 18.1% to 31.9%
B. 16.8% to 33.2%
C. 15.3% to 34.7%
D. 19.5% to 30.5%
Point estimate = sample proportion = p = x/n = 27/108 = 0.25
1 - p = 1 - 0.25 = 0.75
At the 95% confidence level:
α = 1 - 0.95 = 0.05
α/2 = 0.025
Zα/2 = Z0.025 = 1.96
Margin of error: E = Zα/2 × √((p × (1 - p))/n) = 1.96 × √((0.25 × 0.75)/108) = 0.0817
The 95% confidence interval for the population proportion π is
p - E < π < p + E
0.25 - 0.0817 < π < 0.25 + 0.0817
0.1683 < π < 0.3317
Answer: B. The 95% confidence interval is 16.8% to 33.2%.
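A sketch verifying this interval with scipy:

from math import sqrt
from scipy import stats

x, n = 27, 108
p = x / n
z = stats.norm.ppf(0.975)                    # 95% → z ≈ 1.96
E = z * sqrt(p * (1 - p) / n)
print(round(p - E, 4), round(p + E, 4))      # ≈ .1683 to .3317, i.e., choice B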

SESSION 8: ONE-SAMPLE HYPOTHESIS TESTS

I - Logic of hypothesis testing


1. Hypothesis
- A hypothesis is a claim (assertion) about a population parameter. It can be supported or
rejected by empirical evidence.
- Population mean: often denoted by the symbol μ (mu), is a measure of central tendency that
represents the average value of a variable in an entire population. The population mean is calculated
by summing up all the values in the population and dividing the sum by the total number of elements
in the population.

- Population proportion: often denoted by the symbol π (pi), is a statistical measure that
represents the proportion or percentage of a specific characteristic within an entire population. The
population proportion is calculated by dividing the number of elements in the population with the
specific characteristic by the total number of elements in the population

a. The null hypothesis H0 (the maintained hypothesis): states the assertion to be tested


1. Always contains “=” sign.
2. Always about a population parameter.
3. Never about a sample statistic.

b. The Alternative Hypothesis H1: the opposite of the null hypothesis


c. Statistical Hypothesis: A statistical hypothesis is a mathematical statement about a population
parameter that can be tested using data.
d. Hypothesis Test: A hypothesis test is a way of testing a claim or assumption about a population
parameter using statistical evidence from a sample.

- Some examples of hypothesis testing are:
● Testing whether the average height of men is different from the average height of women.
● Testing whether the proportion of voters who prefer candidate A is greater than 50%.
● Testing whether there is a positive correlation between income and education level.
● Testing whether the mean blood pressure of patients who receive a new drug is lower than
the mean blood pressure of patients who receive a placebo.
● Testing whether there is a relationship between gender and academic performance.

2. Errors in hypothesis tests


a. Type I Error (false positive)
- Reject a true null hypothesis
- A serious type of error
- The probability of Type I Error is α (level of significance)
Example:
Scenario: Testing the Effectiveness of Drug X
H0: Drug X has no effect (no difference in effectiveness compared to a placebo).
H1: Drug X is effective (there is a significant difference compared to a placebo).
Suppose Drug X has no actual effect (H0 is true), but the statistical test indicates a significant
difference. The researcher concludes that Drug X is effective when it's not ➔ Type I Error.
Outcome: Patients might be prescribed a drug that is not effective, leading to unnecessary costs and
potential side effects.
b. Type II Error (false negative)
- Fail to reject a false null hypothesis
- The probability of Type II Error is β (which depends on α and the sample size)
Example:
As the above case, but now the researcher supposes that Drug X is indeed effective (H1 is true), but
the statistical test does not show a significant difference. The researcher fails to recognize the
effectiveness of the drug. ➔ Type II Error.
Outcome: Patients might miss out on a potentially beneficial treatment, and the drug may not be
adopted when it could have been useful.

c. Relationship between the two types of error
- The relationship between Type I Error and Type II Error is that they are inversely related. That
is, if you decrease the probability of one type of error, you increase the probability of the other type
of error, and vice versa.
- This is because the two types of errors depend on the same factors, such as the sample size,
the effect size, and the variability of the data. Therefore, there is a trade-off between minimizing
Type I Error and minimizing Type II Error. You have to balance the risks and consequences of both
types of errors and choose an appropriate level of significance and power for your test.

3. Critical value
a. Statistical Hypothesis
- Sample mean x̄ is close to the stated population mean μ
➔ H0 is not rejected. And vice versa.
- The question: How "close" is enough to conclude the hypothesis?
➔ It is based on the Critical Value.

b. Decision Rule

- A decision rule usually specifies a test statistic and a critical value (or a critical region) that
determines the rejection or acceptance of the null hypothesis.
- The test statistic is a numerical summary of the data that reflects the strength of the evidence
against the null hypothesis.
- The critical value (or region) is a threshold that defines the boundary between rejecting and
failing to reject the null hypothesis.
- Depending on the type of test statistic and the alternative hypothesis, there are different
types of decision rules:

● Upper-tailed test: Reject the null hypothesis if the test statistic is greater than the critical
value.
● Lower-tailed test: Reject the null hypothesis if the test statistic is less than the critical value.
● Two-tailed test: Reject the null hypothesis if the test statistic is either greater than the upper
critical value or less than the lower critical value.

- The decision rule is based on a chosen significance level, which is the probability of rejecting
the null hypothesis when it is true. The significance level is usually denoted by α, and it determines
the size of the critical region. A smaller α means a more stringent decision rule, and a larger α means
a more lenient decision rule.

For example, when the significance level is 0.05, the confidence level is 95%. Similarly, α = 0.01
corresponds to 99% confidence, and α = 0.10 to 90%.

4. Hypothesis tests: Testing a mean

- There are two cases when we test a mean:


● σ is known: the test statistic is a z score
● σ is unknown: the test statistic is a t score

- There are 5 main steps to test a hypothesis:


● Step 1: State the hypothesis
+ The direction of the test depends on the alternative hypothesis operator: a greater-than
operator gives a right-tailed test (also called an upper-tail test), a less-than operator gives a
left-tailed test (also called a lower-tail test), and a not-equal operator gives a two-tailed test.
+ Example:
H0: μ ≤ 3
H1: μ > 3
=> Right-tailed test/ Upper-tail test

● Step 2: Specify the decision rule
+ We use the level of significance (α) to find the critical value of the test statistic that
determines the threshold for rejecting the null hypothesis.
+ Example: A right-tailed test with α = .05 and known σ, then the critical value of z will be 1.645,
therefore our decision rule is:
Reject H0 if z > 1.645
Otherwise do not reject H0

● Step 3: Calculate the test statistic


+ Use the formula for test statistic (z or t) to calculate it.
● Step 4: Make the decision
+ If the test statistic falls in the rejection region, we reject the null hypothesis H0 and say the
result is statistically significant.
+ If not, then we fail to reject the null.
● Step 5: Conclusion
+ Based on the result of the hypothesis test, we can make the conclusion and make some
recommendations (if any).
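A sketch of the five steps for a right-tailed z-test with known σ (all numbers here are hypothetical):

from math import sqrt
from scipy import stats

xbar, mu0, sigma, n, alpha = 3.2, 3, 0.8, 64, 0.05   # hypothetical sample, H0: μ ≤ 3 vs H1: μ > 3
z = (xbar - mu0) / (sigma / sqrt(n))         # Step 3: test statistic, here 2.0
z_crit = stats.norm.ppf(1 - alpha)           # Step 2: critical value 1.645
print(round(z, 2), round(z_crit, 3), z > z_crit)   # Step 4: True → reject H0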
5. P- value approach to testing
- The p-value (observed level of significance): Probability of obtaining a test statistic more
extreme ( ≤ or ≥ ) than the observed sample value given H0 is true.
● Smallest value of α for which H0 can be rejected
● Sample Statistic (eg: X ) ---> Test Statistic (eg: Z statistic)

● Compare the p-value with α
+ If p-value < α , reject H0
+ If p-value ≥ α , do not reject H0
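Continuing the hypothetical right-tailed test above (z = 2.0), the p-value comparison looks like this in scipy:

from scipy import stats

p_value = stats.norm.sf(2.0)                 # right-tailed p-value for z = 2.0
print(round(p_value, 4), p_value < 0.05)     # .0228 < α = .05 → reject H0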

6. Connection to confidence interval


- The confidence interval approach applies to two-tailed tests only

- The below example illustrates how to apply this:

7. Hypothesis tests for proportion


- Involves a categorical variable
- 2 possible outcomes: possesses (or does not possess) the characteristic of interest
Example:
A researcher believes that 50% of first-time brides in the country X are younger than their grooms.
He/She can perform a hypothesis test for proportion to determine if the percentage is the same or
different from 50%.
- Proportion of the population in the category of interest: π

- Sample proportion in the category of interest: p

- If both X and n - X are at least 5, then p can be approximated by a normal distribution with
mean π and standard deviation σp = √(π(1 − π)/n)

- Formula to calculate Z:
● p is approximately normal: z = (p − π0)/√(π0(1 − π0)/n)

● An equivalent form (in terms of the number in the category of interest, X): z = (X − nπ0)/√(nπ0(1 − π0))

Sample Exercises
1. “I believe your airplane’s engine is sound,” states the mechanic. “I’ve been over it carefully,
and can’t see anything wrong, I’d be happy to tear the engine down completely for an internal
inspection at a cost of $1,500. But I believe that roughness you heard in the engine on your last
flight was probably just a bit of water in the fuel, which passed harmlessly through the engine and
is now gone”. As the pilot considers the mechanic’s hypothesis, the cost of Type I error is
A. The pilot will experience the thrill of no-engine flight.
B. The pilot will be out $1,500 unnecessarily.
C. The mechanic will lose a good customer.
D. Impossible to determine without knowing α.

Answer: B

Solution: The cost of a Type I error is the cost of rejecting a true null hypothesis. In this scenario,
a Type I error would occur if the pilot decides to pay $1,500 for the internal inspection, even though
the engine is sound and the roughness was caused by water in the fuel. The cost of this error would
be $1,500 unnecessarily.

2. Guidelines for the Jolly Blue Giant Health Insurance Company say that the average
hospitalization for a triple hernia operation should not exceed 30 hours. A diligent auditor studied
records of 16 randomly chosen triple hernia operations at Hackmore Hospital and found a mean
hospital stay of 40 hours with a standard deviation of 20 hours. “Aha!” she cried, “the average stay
exceeds the guideline.” State her hypothesis and test it at α = .025

Hint:
To test this hypothesis, we can use a one-tailed t-test with a significance level of α = .025. The
null hypothesis is that the true mean hospital stay is equal to or less than 30 hours, while the
alternative hypothesis is that the true mean hospital stay is greater than 30 hours.
Calculate the test statistic, then compare it to the critical value to decide whether the null
hypothesis is rejected:
● test statistic: t = (40-30)/(20/√16) = 2

● critical value: α = .025, df = n - 1 = 16 - 1 = 15 => critical value = 2.131

Answer: We fail to reject the null hypothesis. There is not enough evidence at the 0.025
significance level to conclude that the average hospitalization for a triple hernia operation exceeds 30
hours based on the sample data.
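A sketch reproducing this test in scipy:

from math import sqrt
from scipy import stats

xbar, mu0, s, n = 40, 30, 20, 16
t = (xbar - mu0) / (s / sqrt(n))             # test statistic = 2.0
t_crit = stats.t.ppf(1 - 0.025, df=n - 1)    # ≈ 2.131
print(t, round(t_crit, 3), t > t_crit)       # 2.0 < 2.131 → fail to reject H0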

3. In the nation of Gondor, the EPA requires that half the new cars sold will meet a certain
particulate emission standard a year later. A sample of 64 one-year-old cars revealed that only 24
met the particulate emission standard. Test the hypotheses to see whether the proportion is below
the requirement.

Hint:
To test this hypothesis, we can use a one-tailed z-test with a significance level of α = .05. The null
hypothesis is that the true proportion of cars meeting the standard is equal to or greater than 0.5,
while the alternative hypothesis is that the true proportion is less than 0.5.
Calculate the test statistic (using the formula for a proportion), then compare it to the critical value
to decide whether the null hypothesis is rejected.
● test statistic (z score): z = (24/64 - 0.5)/√(0.5 × (1 - 0.5)/64) = (0.375 - 0.5)/0.0625 = -2
● critical value = -1.645 (α = .05)

Answer: The null hypothesis is rejected, we can conclude that there is sufficient evidence to
suggest that the proportion of cars meeting the particulate emission standard in Gondor is less than
0.5, as required by the EPA.
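The same computation as a scipy sketch:

from math import sqrt
from scipy import stats

p, pi0, n = 24 / 64, 0.5, 64
z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)    # test statistic = -2.0
z_crit = stats.norm.ppf(0.05)                # ≈ -1.645
print(round(z, 2), round(z_crit, 3), z < z_crit)   # -2.0 < -1.645 → reject H0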

SESSION 9: TWO-SAMPLE HYPOTHESIS TESTS

I- Independent samples
- Independent samples are often used to test the difference between two population means or
proportions

- Assumptions:
● Samples are randomly and independently drawn
● Populations are normally distributed or both sample sizes are at least 30

- There are 3 major situations:

1. Situation 1

Example: You want to compare the average hours of sleep between students from two different
schools. You know the population variances for both schools.

2. Situation 2

Example: You want to compare the average study hours per week of two different groups of
students. You don't know the population variances, but you assume they are equal.

3. Situation 3

Example: You want to assess whether there's a difference in the average grades of two different
courses. You don't know the population variances, and you cannot assume them to be equal.

Note: If the sample sizes are equal, the Situation 2 and Situation 3 test statistics will always be
identical, but the degrees of freedom (and hence the critical values) may differ. If you have no
information about the population variances, then the best choice is Situation 3.

4. The paired difference test


- When observations are matched pairs, the paired t-test is more powerful because it utilizes
information that is ignored if we treat the samples separately.
Example:
- Suppose you are conducting a study to assess the effectiveness of a new drug to treat
hypertension. You measure the blood pressure of each participant before administering the
drug and then again after the treatment period.

- Now, you have two sets of observations: the blood pressure measurements before and after
treatment for each individual. These measurements are paired because each "after"
measurement is related to a specific "before" measurement.
➔ Paired t-test

5. Population proportion test
- This is used to test hypotheses (and form confidence intervals) for the difference of two
population proportions, π1 – π2

6. Population variance test

- Where to find F critical? F table (Appendix F)


- The 2 d.f. required:
● Numerator: the larger sample variance (column in the F table)
● Denominator: the smaller sample variance (row in the F table)

Sample Exercises
1. Two well-known aviation training schools are being compared using random samples of their
graduates. It is found that 70 of 140 graduates of Fly-More Academy passed their FAA exams on the
first try, compared with 104 of 260 graduates of Blue Yonder Institute. Test the pass rates for
equality at ∝ = .05

Hint:
Firstly, calculate the pooled sample proportion (p) under the assumption that the pass rates are
equal, then calculate the test statistic (z). Compare it to the critical value and conclude that the
hypothesis is rejected or not.
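A sketch of the computation outlined in the hint (Python with scipy):

from math import sqrt
from scipy import stats

x1, n1, x2, n2 = 70, 140, 104, 260
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)               # pooled proportion under H0: π1 = π2
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_value = 2 * stats.norm.sf(abs(z))          # two-tailed p-value
print(round(z, 3), round(p_value, 4))        # compare the p-value with α = .05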

2. Suppose you are working for a fitness company, and you want to determine if a new workout
program is more effective in terms of weight loss compared to the company's previous program.
You select a sample of 20 participants who have completed both programs, and you record the
weight loss (in pounds) for each participant before and after completing both programs. The data is
as follows:
Before Program A: 190, 195, 200, 185, 205, 210, 192, 198, 187, 193, 199, 204, 189, 194, 201,
188, 203, 197, 191, 206
After Program A: 180, 187, 195, 178, 198, 205, 185, 190, 180, 182, 192, 199, 178, 183, 197, 175,
198, 191, 177, 201
Perform a paired-samples t-test to determine if there is a statistically significant difference in
weight loss before and after completing Program A.

Hint:
Step 1: Calculate the differences between the "Before" and "After" weights for each participant.
Step 2: Calculate the mean (average) of the differences.
Step 3: Calculate the standard deviation of the differences. You can use the formula for the
sample standard deviation.
Step 4: Calculate the t-statistic.
Step 5: Determine the degrees of freedom (df). In a paired-samples t-test, df is equal to the
number of pairs minus 1.

Step 6: Find the critical t-value for your desired level of significance (alpha). Let's assume a 0.05
level of significance (95% confidence level) for a two-tailed test. You can use a t-table or a calculator
to find the critical t-value.
Step 7: Compare the calculated t-statistic to the critical t-value to determine statistical
significance.

Answer:
Conclusion: Since the calculated t-statistic falls outside the range of the critical t-values, we can
reject the null hypothesis. This means there is a statistically significant difference in weight loss before
and after completing Program A. In other words, the new workout program appears to be more
effective in terms of weight loss compared to the previous program.
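The whole procedure collapses to one call in scipy (a sketch using the data given above):

from scipy import stats

before = [190, 195, 200, 185, 205, 210, 192, 198, 187, 193,
          199, 204, 189, 194, 201, 188, 203, 197, 191, 206]
after = [180, 187, 195, 178, 198, 205, 185, 190, 180, 182,
         192, 199, 178, 183, 197, 175, 198, 191, 177, 201]
t, p = stats.ttest_rel(before, after)        # paired-samples t-test
print(round(t, 2), p)                        # a large t and a tiny p → significant difference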

SESSION 10: ANOVA

I - Analysis of variance
1. Purpose of analysis of variance
- Compare more than two means simultaneously and how to trace sources of variation to
potential explanatory factors by using analysis of variance (commonly referred to as ANOVA).
- Analysis of variance seeks to identify sources of variation in a numerical dependent variable Y
(the response variable). Variation in the response variable about its mean either is explained by one
or more categorical independent variables (the factors) or is unexplained (random error)
- Note: When conducting multiple t-tests, the probability of making a Type I error (incorrectly
rejecting a true null hypothesis) increases with the number of tests. ANOVA controls the overall Type I
error rate, reducing the risk of false positives when comparing multiple groups.

2. One-Factor ANOVA (includes only one factor)

3. N-Factor ANOVA (involves two or more factors)

4. Treatment
- Each possible value of a factor or combination of factors is a treatment.
- Test if each factor has a significant effect on Y:
H0: μ1 = μ2 = μ3
H1: Not all the means are equal
If we cannot reject H0, we conclude that the observations within each treatment have the same
mean μ.
Note: If μ1 = μ2 ≠ μ3, we also reject H0 (the means are not all equal).
5. ANOVA Assumptions (Only used for One-factor/ One way ANOVA)
- Independence: The observations in each group are independent of each other and the
observations within groups were obtained by a random sample.
- Normality: Each sample was drawn from a normally distributed population
- Homogeneity of Variances / Homoskedasticity: The variances of the populations that the
samples come from are equal

II - One-factor anova
1. Purpose: Compares the means of c treatments (groups)
Example: Paint quality is a major concern of car makers. A key characteristic of paint is its viscosity, a
continuous numerical variable. Viscosity is to be tested for dependence on application temperature
(low, medium, high). Although temperature is a numerical variable, it has been coded into categories
that represent the test conditions of the experiment because the car maker did not want to assume
that viscosity was linearly related to temperature.

2. Data Format
- Sample sizes within each treatment do not need to be equal.
- Total number of observations: n = n1 + n2 + n3 + … + nc
- Advantages to have balanced sample sizes:
1. Equal sample size ensures that each treatment contributes equally to the analysis;
2. Reduces problems arising from violations of the assumptions (e.g., nonindependent Y values,
unequal variances or nonidentical distributions within treatments, or non-normality of Y); and
3. Increases the power of the test (i.e., the ability of the test to detect differences in treatment
means).
- Example of one factor: Is there a difference between the incomes of students from 3 schools?
T1 = UEH; T2 = UAH; T3 = IU
y11: student 1 (UEH)
y21: student 2 (UEH)
y12: student 1 (UAH)
y22: student 2 (UAH)

➔ If the incomes of all students are the same ➔ ȳ is not affected by the factor
➔ If the incomes of the students are not all the same ➔ ȳ is affected by the factor
3. Hypothesis to Be Tested
- The question of interest is whether the mean of Y varies from treatment to treatment. The
hypotheses to be tested are:

4. One-Factor ANOVA as a Linear Model
- An equivalent way to express the one-factor model is to say that observations in treatment j
came from a population with a common mean (μ) plus a treatment effect (Tj) plus random error (εij):

● yij: observation
● μ: common mean
● Tj: treatment effect
● εij: random error
- The random error is assumed to be normally distributed with zero mean and the same
variance for all treatments.
- Testing hypotheses:

● If the null hypothesis is true (Tj = 0 for all j), then knowing that an observation came from
treatment j does not help explain the variation in Y, and the ANOVA model becomes:

● If the null hypothesis is false, then at least some of the Tj must be nonzero. In that case, the Tj
that are negative (below μ) must be offset by the Tj that are positive (above μ ) when weighted by
sample size.

III - Decomposition of variation
1. Group Means
- The mean of each group is calculated in the usual way by summing the observations in the
treatment and dividing by the sample size:

- The overall sample mean or grand mean ȳ can be calculated either by summing all the
observations and dividing by n or by taking a weighted average of the c sample means:

IV - Partition of deviations
1. Partition Of Deviations
- Any deviation of an observation from the grand mean ȳ may be expressed in two parts: the
variation in yij = the variation of the predicted scores + the variation of the errors of prediction

2. Hypothesis Testing
- SSA and SSE are used to test the hypothesis of equal means by dividing each sum of squares by
its degrees of freedom.
- These ratios are called Mean Squares (MSA and MSE)

V - Test statistic
1. F statistic
- F statistic: the ratio of the variance due to treatments (MSA) to the variance due to error (MSE)
-> used to find out whether the treatment means are significantly different.
- When F is near zero ➔ little difference among treatments ➔ do not reject H0
- Decision Rule: Reject H0 if F > Fα, otherwise do not reject
- Note: If the result is significant (we reject the null hypothesis) -> check the p-value of each
individual variable, since the F test only assesses the overall effect on Y; otherwise the F-test
result might be caused by the joint effects of all variables.
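A one-factor ANOVA is a single call in scipy (a sketch; the three treatment groups below are hypothetical):

from scipy import stats

g1 = [20, 22, 19, 24, 25]                    # hypothetical treatment groups
g2 = [28, 30, 27, 26, 29]
g3 = [18, 20, 23, 19, 21]
F, p = stats.f_oneway(g1, g2, g3)            # F = MSA/MSE and its p-value
print(round(F, 2), round(p, 4))              # reject H0 of equal means if p < α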

2. Tukey’s Test


- When? ➔ After rejection of equal means in ANOVA
- Tells which population means are significantly different
Ex: μ1 = μ2 ≠ μ3
- Tukey's studentized range test is a multiple comparison test
➔ For c groups, there are c(c-1)/2 distinct pairs of means to be compared
- Tukey's test is a two-tailed test for equality of paired means from c groups compared
simultaneously

- After running ANOVA, suppose the result indicates that the means of sales revenue for the three
stores A, B, and C are not all equal. We reject the null hypothesis and conclude that the mean sales
revenues for these three stores differ. We then use Tukey's test to examine each pair of stores
(AB, BC, CA). Suppose the result shows that A differs significantly from B and C, while B and C do not
differ from each other (A ≠ B = C). In this case, we would refine our conclusion: only Store A has a
statistically different mean sales revenue, while the difference between Stores B and C is not
statistically significant.

Example: Cupertino, San Jose, Santa Clara are restaurants that sell tofu pizza. We want to
examine if there was a difference in means among the three restaurants.

( In reviewing the graph of the sample means, it appears that Santa Clara has a much higher number
of sales than Cupertino and San Jose. There will be three pairwise post‐hoc tests to run.)
Solutions:
- The hypotheses are
H0a: μ1 = μ2 ; H1a: μ1 ≠ μ2
H0b: μ1 = μ3 ; H1b: μ1 ≠ μ3
H0c: μ2 = μ3 ; H1c: μ2 ≠ μ3

- These three tests will be conducted with an overall significance level of α = 5%.
- Here are the differences of the sample means for each pair, ranked from lowest to highest:

- The HSD critical values (using statistical software) for this particular test:
HSDcrit at the 5% significance level = 1.85
HSDcrit at the 1% significance level = 2.51
- For each test, reject H0 if the difference of means is greater than HSDcrit.
- Test 2 and Test 3 show significantly different means at both the 1% and 5% levels.
- Conclusion: Santa Clara has a significantly higher mean number of tofu pizzas sold compared
to both San Jose and Cupertino. There is no significant difference in mean sales between San
Jose and Cupertino.

- The hypotheses are:

- Decision Rule:

T(c, n-c) is the critical value of the Tukey test statistic, against which T(calc) is compared, for the
desired level of significance.

3. Hartley’s Test
- Hartley’s test is used to test for homogeneity of variances (homoskedasticity)

- The test statistic is the ratio of the largest sample variance to the smallest sample variance:

- Decision Rule: Reject H0 if Hcalc > Hcritical


- Hcritical can be found in Hartley's test statistics, using df1 = c, df2 = n/c-1
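The statistic itself is easy to compute directly (a numpy sketch reusing the hypothetical groups from the ANOVA example; the critical value must still come from the Hartley table):

import numpy as np

groups = [[20, 22, 19, 24, 25], [28, 30, 27, 26, 29], [18, 20, 23, 19, 21]]
variances = [np.var(g, ddof=1) for g in groups]   # sample variance of each group
H_calc = max(variances) / min(variances)          # Hartley's ratio of largest to smallest variance
print(round(H_calc, 2))                           # compare with H_critical from the table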

4. Levene’s Test
- An alternative to Hartley's F test, also used to test homogeneity of variances. Both the Hartley
and Levene tests should be performed prior to a one-way ANOVA to ensure that the ANOVA
assumptions are met.
- Levene's test does not assume a normal distribution.
- It is based on the distances of the observations from their sample medians rather than
their sample means.

Sample Exercises
1. Using the following Excel results:
(a) What was the overall sample size?
(b) How many groups were there?
(c) Write the hypotheses.
(d) Find the critical value of F for α = .10.
(e) Calculate the test statistic.
(f) Do the population means differ at α = .10?

a) Sample size = Total df + 1 = 39 + 1 = 40

b) Number of groups = Group df + 1 = 4 + 1 = 5

c) Hypotheses:

H0: μ1 = μ2 = μ3 = μ4 = μ5

Ha: At least one mean is different from the others.

d) Critical value

We use the F table (Appendix F in the textbook) with α = .10, df1 = 4 (the column) and
df2 = 35 (the row): F.10(4, 35) ≈ 2.11.

e) Test statistic:

F = MSA/MSE = 1.8 (from the Excel output)

f) Since F = 1.8 < 2.11, we fail to reject the null hypothesis at the .10 level.

So the population means do not differ.

SESSION 11: SIMPLE REGRESSION

I - Visual displays and correlation analysis


1. Visual Displays
- Analysis of bivariate data (i.e., two variables) typically begins with a scatter plot that displays
each observed data pair (𝑥𝑖, 𝑦𝑖) as a dot on an X-Y grid.

➔ initial idea of the relationship between two random variables.

2. Correlation Coefficient
- Sample correlation coefficient (Pearson correlation coefficient) - denoted r - measures the
degree of linearity in the relationship between two random variables X and Y.
- Its value will fall in the interval [-1, 1].

- When r is near 0 there is little or no linear relationship between X and Y.


- An r-value near +1 indicates a strong positive relationship: as one variable increases, the
other tends to increase.
- An r-value near -1 indicates a strong negative relationship: as one variable increases, the
other tends to decrease.
=> An r-value of exactly +1 or -1 indicates a perfect linear relationship.

- Negative correlation: when 𝑥𝑖 is above its mean, 𝑦𝑖 tends to be below its mean (and vice versa)

- Positive correlation: 𝑥𝑖 and 𝑦𝑖 tend to be above (or below) their means at the same time

- Three terms called sums of squares:


● The sum of squares measures the deviation of data points away from the mean value.
● Investors can use the sum of squares to help make better decisions about their
investments.

- The formula for the sample correlation coefficient

Correlation coefficient only measures the degree of linear relationship between X and Y.

Figure: Scatter Plots Showing Various Correlation Coefficient Values (n = 100)


3. Critical Value for Correlation Coefficient
- Equivalent approach → Calculate a critical value for the correlation coefficient
- First: look up the critical value of t from Appendix D with d.f. = n - 2 degrees of freedom and
chosen α
- Then, the critical value of the correlation coefficient, r-critical:

● A benchmark for the correlation coefficient.


● No p-value.
● Inflexible when changing α.

● In very large samples, even very small correlations could be “significant”.
● A larger sample does not mean that the correlation is stronger nor does its increased
significance imply increased importance.

4. Tests for Significant Correlation Using Student’s t


- Example: In its admission decision process, a university’s MBA program examines an
applicant’s score on the GMAT (Graduate Management Aptitude Test), which has both verbal and
quantitative components. Figure A shows the scatter plot with the sample correlation coefficient for
30 MBA applicants randomly chosen from 1,961 MBA applicant records at a public university in the
Midwest. Is the correlation (r =.4356) between verbal and quantitative GMAT scores statistically
significant? It is not clear from the scatter plot shown in Figure A that there is a statistically significant
linear relationship.
Figure A: Scatter Plot for 30 MBA Applicants

- Step 1: State the Hypotheses


● We will use a two-tailed test for significance at α = .05. The hypotheses are:
+ H0: ρ = 0
+ H1: ρ ≠ 0
- Step 2: Specify the Decision Rule
● For a two-tailed test using d.f. = n - 2 = 30 - 2 = 28 degrees of freedom, Appendix D gives t.025 =
2.048. The decision rule is: reject H0 if tcalc > 2.048 or if tcalc < -2.048.
- Step 3: Calculate the Test Statistic

● To calculate the test statistic, we first need to calculate the value for r. Using Excel’s function
=CORREL(array1,array2), we find r = .4356 for the variables Quant GMAT and Verbal GMAT. We must
then calculate tcalc:

tcalc = r√((n − 2)/(1 − r²)) = .4356 × √((30 − 2)/(1 − .4356²)) = 2.561

- Step 4: Make a Decision

● The test statistic value (tcalc = 2.561) exceeds the critical value t.025 = 2.048, so we reject the
hypothesis of zero correlation at α = .05. We can also find the p-value using the Excel function
=T.DIST.2T(t,deg_freedom). The two-tailed p-value for GMAT score is =T.DIST.2T(2.561,28) = .0161. We
would reject ρ = 0 since the p-value < .05.
- Step 5: Take Action
● The admissions officers recognize that these scores tend to vary together for applicants.
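In Python, scipy's pearsonr returns r together with the two-tailed p-value for H0: ρ = 0, which is equivalent to the t test above (a sketch in which simulated scores stand in for the GMAT data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
verbal = rng.normal(30, 5, 30)               # hypothetical verbal scores, n = 30
quant = 0.4 * verbal + rng.normal(20, 5, 30) # built to correlate with verbal
r, p = stats.pearsonr(verbal, quant)
print(round(r, 4), round(p, 4))              # reject ρ = 0 when p < α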

II - Simple regression
1. What is Simple Regression?
- The simple linear model in slope-intercept form: Y = slope × X + y-intercept. In statistics, this
straight-line model is referred to as a simple regression equation.
● The Y variable as the response variable (the dependent variable)
● The X variable as the predictor variable (the independent variable)
- Only the dependent variable (not the independent variable) is treated as a random variable
2. Interpreting an Estimated Regression Equation
- Cause and effect are not proven by a simple regression
➔ cannot assume that the explanatory variable is “causing” the variation in the response
variable
● Example: There are 2 scenarios to consider
+ Third-variable problem: A third variable can influence both variables under study, making
them appear causally linked when they are not. For example, vitamin D and bone thinning
are closely correlated, but they are not directly causing each other. Instead, calcium, a
third variable, affects both variables independently. In this case, concluding a causal
relationship would be a research bias.

137
+ Directionality problem: Both variables A and B could have a causal relationship, but the
direction of influence is unclear. For example, vitamin D levels and depression are
correlated, but it's not clear whether low vitamin D causes depression, or whether
depression causes people to consume less vitamin D.

3. Prediction Using Regression


- Predictions from our fitted regression model are stronger within the range of our sample x
values.
- The relationship seen in the scatter plot may not be true for values far outside our observed x
range.
- Extrapolation outside the observed range of x is always tempting but should be cautiously
approached.
- Example: Businesses often use linear regression to understand the relationship between
advertising spending and revenue.

III - Regression models


1. Model and Parameters
- The regression model’s unknown population parameters are denoted by β0 (the intercept) and
β1 (the slope).

y = β0 + β1X+ ε (population regression model)

- Inclusion of a random error ε is necessary because other unspecified variables also may affect
Y
- The regression model without the error term represents the expected value of Y for a given x
value called simple regression equation

138
E(Y|x) = β0 + β1X (simple regression equation)

- The regression assumptions:


● Assumption 1: The errors are normally distributed with mean zero and standard deviation σ.
● Assumption 2: The errors have constant variance, σ2.
● Assumption 3: The errors are independent of each other.
● Assumption 4: There is a linear relationship between the independent variable(s) x and the
dependent variable y.

- The regression equation used to predict the expected value of Y for a given value of X:

ŷ = 𝑏0 + 𝑏1x (estimated regression equation)

● b0 (coefficients): estimated intercept


● b1: estimated slope
- The difference between the observed value 𝑦𝑖 and its estimated value ŷ𝑖 is a residual, e𝑖. The
residual is the vertical distance between each 𝑦𝑖 and the estimated regression line on a scatter plot
of (𝑥𝑖, 𝑦𝑖) values.

139
IV - Ordinary least squares formulas
1. Slope and Intercept
- The ordinary least squares (OLS) method estimates the regression so as to ensure the best fit:
➔ the slope and intercept are selected so that the residuals are as small as possible, producing a
straight line as close as possible to the data points.
- Residuals can be either positive or negative, and the residuals around the regression line
always sum to zero.

- The fitted coefficients b0 and b1 are chosen so that the fitted linear model ŷ = b0 + b1x has the
smallest possible sum of squared residuals (SSE):

- Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE

140
- The OLS formula for the slope can also be:

- The OLS regression line always passes through (x̄, ȳ)
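The fitted line, R², and the zero-slope test are all returned by scipy's linregress (a sketch on hypothetical data):

from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                           # hypothetical predictor values
y = [3.1, 4.4, 5.8, 6.9, 8.2, 9.1, 10.4, 11.8]         # hypothetical response values
fit = stats.linregress(x, y)                           # OLS slope and intercept
print(round(fit.intercept, 3), round(fit.slope, 3))    # b0 and b1
print(round(fit.rvalue**2, 4), round(fit.pvalue, 6))   # R² and p-value for zero slope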

2. Sources of Variation in Y
- The total variation is expressed as a sum of squares (SST), split into two parts:

● SST = total sum of squares

+ Measures the variation of the yi values around their mean, ȳ
● SSR = regression sum of squares
+ Explained variation attributable to the linear relationship between x and y
● SSE = error sum of squares

141
+ Variation attributable to factors other than the linear relationship between x and y
3. Coefficient of Determination
- The coefficient of determination: the portion of the total variation in the dependent variable
that is explained by variation in the independent variable.
=> Measures how well a statistical model predicts an outcome
● The coefficient of determination is called R-squared and denoted R².

Example: If R² = 0.0867, then 8.67% of the variation in y can be explained by the x-variable(s).
- Examples of Approximate R² Values
● The range of the coefficient of determination is 0 ≤ R² ≤ 1

V - Tests for significance


1. Standard Error of Regression

- An estimator for the variance of the population model error

142
- Division by n – 2 → the simple regression model uses two estimated parameters, b0 and b1.
- The standard error of the estimate

- Comparing standard errors

- The magnitude of se should always be judged relative to the size of the y values in the sample
data
● Inferences about the regression model
- The variance of the regression slope coefficient (b1) is estimated by

where:
Sb1 = estimate of the standard error of the least squares slope

se = √(SSE/(n − 2)) = standard error of the estimate

143
Confidence Intervals for Slope and Intercept

These standard errors are used to construct confidence intervals for the true slope and intercept,
using Student's t with d.f. = n - 2 degrees of freedom and any desired confidence level.

2. Hypothesis Tests
- If β1 = 0 ➔ X does not influence Y
→ the regression model collapses to a constant β0 plus a random error term:

- For either coefficient, we use a t test with d.f. = n - 2. The hypotheses and test statistics

144
3. Slope versus correlation
- The test for zero slope is the same as the test for zero correlation.
➔ The t test for zero slope will always yield exactly the same tcalc as the t test for zero correlation.

4. Analysis of variance: overall fit


- Decomposition of Variance

- F Statistic for Overall Fit


● To test a regression for overall significance → use an F test to compare the explained (SSR) and
unexplained (SSE) sums of squares
- ANOVA Table for simple regression

- The formula for the F test statistic

- F Test p-Value and t-Test p-Value

● The F test always yields the same p-value as a two-tailed t test for zero slope → the same
p-value as a two-tailed test for zero correlation
● The relationship between the test statistics is Fcalc = (tcalc)²

145
VI - Confidence and prediction intervals for y
- Construct an Interval Estimate for Y

- Quick Rules for Confidence and Prediction Intervals


● A really quick 95% interval → plug in t = 2 (since most 95 percent t-values are not far from 2)

146
Appendix D:

147
Exercise:
A hypothesis test is conducted at the 5 percent level of significance to test whether the
population correlation is zero. If the sample consists of 25 observations and the correlation coefficient
is 0.60, then the computed test statistic would be:

Key:
H0: ρ = 0
H1: ρ ≠ 0
where ρ is the population correlation coefficient.

tcalc = r√((n − 2)/(1 − r²)) = .60 × √((25 − 2)/(1 − 0.6²)) = 3.597

This is compared with a critical t (23 degrees of freedom two-tailed value) = 2.069 (see Appendix
D: d.f.=23, significance level for two-tailed test=.05)

Since computed t exceeds critical t, reject H0 and conclude the population correlation coefficient
significantly differs from 0 at the 5% level of significance.
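A sketch verifying both numbers with scipy:

from math import sqrt
from scipy import stats

r, n = 0.60, 25
t = r * sqrt((n - 2) / (1 - r**2))           # computed test statistic ≈ 3.597
t_crit = stats.t.ppf(0.975, n - 2)           # two-tailed critical value ≈ 2.069
print(round(t, 3), round(t_crit, 3), t > t_crit)   # True → reject H0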
