values of another variable. Response variable: the outcome variable on which comparisons are made (ex. income level). Explanatory variable: defines the groups to be compared with respect to values on the response variable. Positive correlation (both up); negative correlation (one up, one down). The closer r is to 1 in ABSOLUTE VALUE, the stronger the STRAIGHT-LINE (linear) association; the closer to 0 in absolute value, the weaker the association. -.879 is still a stronger correlation than .750. Positive correlation indicates positive association; negative correlation indicates negative association. r^2 is the coefficient of determination: the proportion of the variability in y that is explained by the linear relationship with x (it is not the % of data points close to the line of best fit). Ex. What proportional reduction in error do we get by using the regression line to make predictions instead of simply using the mean of y? Answer: r^2. It denotes the strength of the linear association of x & y. Correlation does not imply causation; more generally, association does not imply causation. Correlation always falls between -1 & 1. Two variables have the same correlation no matter which is treated as the response variable. Simpson's Paradox: the direction of an association between two variables can change after we include a 3rd variable and analyze the data at separate levels of that variable. It is possible for the association to reverse after adjusting for a 3rd variable. Regression analysis: often used for observations of a quantitative response variable over time. A regression equation is often called a prediction equation. The regression line predicts the response variable y as a straight-line function of the value x of the explanatory variable. *Construct a scatterplot before finding a correlation or regression line. Regression equation y^ = a + bx; find it with LinRegTTest. The correlation and regression line are NONRESISTANT: they are prone to distortion by outliers. Prediction errors are called RESIDUALS (residual = y - y^).
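The correlation, regression line, residuals, and r^2 described above can be checked by hand on a tiny data set. The following is a minimal sketch in Python; the five data points and the helper name `linear_fit` are made up for illustration and are not from the notes. It also shows that r^2 equals the proportional reduction in squared prediction error when the line is used instead of the mean of y.

```python
import math

def linear_fit(xs, ys):
    """Return (a, b, r): intercept and slope of y^ = a + b*x, and correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation, always between -1 and 1
    return a, b, r

# Hypothetical illustrative data (not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = linear_fit(xs, ys)
predictions = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]  # y - y^

mean_y = sum(ys) / len(ys)
sse_line = sum(e ** 2 for e in residuals)          # error using the line
sse_mean = sum((y - mean_y) ** 2 for y in ys)      # error using the mean only
reduction = (sse_mean - sse_line) / sse_mean       # equals r^2

print(f"y^ = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"proportional reduction in error = {reduction:.3f}")
```

Swapping `xs` and `ys` changes the line but not r, which matches the note that the correlation is the same no matter which variable is treated as the response.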
Extrapolation refers to using a regression line to predict y values for x values outside the observed range of data. Predictions about the future using time series data are called forecasts. Regression outliers are well removed from the trend that the rest of the data follows. An observation is influential when it has a large effect on the results of a regression analysis; for it to be influential, its x value must be relatively low or high compared with the rest of the data and it has to be a regression OUTLIER. Sampling Frame is the list of all subjects in the population from which the sample is taken. Sampling Design is the next step after the sampling frame: the method used to select the sample (a sound sampling design can prevent sampling bias but cannot prevent response or nonresponse bias). A lurking variable is a variable, usually unobserved (not measured in the study), that influences the association between the variables of primary interest. A lurking variable may be a common cause of both the response and explanatory variable. Lurking variables have the potential to be confounders. Confounding: when two explanatory variables are both associated with a response variable but are also associated with each other, so their effects cannot be separated. For a LURKING variable to be CONFOUNDING it must be included in the study and associated with both the response and the explanatory variable. Randomization: randomly assigning experimental units (subjects) to the treatments helps to balance out lurking variables. ANECDOTAL EVIDENCE: comes from personal observations; not representative of the entire population. Systematic sample: take every kth subject from the sampling frame. Multiple causes (more common): when a response has several causes, it is difficult to study the effect of any single variable. Crossover Design (a really good design): a matched-pairs design in which subjects cross over during the experiment from using one treatment to using another treatment. Matched pair: two observations for a particular subject; they are matched because they both come from the same person. Completely Randomized Design: subjects are randomly assigned to one of the treatments.
Block: in experiments with matching, a set of matched experimental units (matching of subjects is a type of blocking). Randomized block design: a block design with random assignment of treatments to units within blocks (to reduce possible bias, treatments are usually randomly assigned within a block). Design of Studies: the best way to collect data. REDUCE NOISE, INCREASE SIGNAL. Observational Study: merely observes rather than experiments with study subjects. Some researchers use this term to refer only to studies that use available subjects (ex. a convenience sample) and not to sample surveys that randomly select people. Experimental Study: assigns to each subject a treatment; subjects in an experimental study are often referred to as EXPERIMENTAL UNITS. Researchers "impose" a treatment or condition (such as exposure or non-exposure to cell phone radiation). An experiment CAN CONTROL for lurking variables, gives the strongest INFERENCE, and CAN ESTABLISH cause & effect. A simple random sample is often just called a random sample. RANDOM SAMPLING IS BEST: all subjects in the frame have an equal chance of being selected. Simple Random Sampling: you are much more likely to get a representative sample if you let chance rather than convenience determine the sample. Random Sample Design: implemented by using random numbers to select n subjects from the sampling frame. EU = the "thing" to which a treatment is applied. Replication involves more than one EU per condition (treatment). Methods of collecting sample surveys: personal interviews, telephone interviews, and self-administered questionnaires. Undercoverage: having a sampling frame that lacks representation from parts of the population. Sampling Bias: results from the sampling method (ex. non-random sampling or undercoverage). A subject does NOT have to be a person. Nonresponse Bias: when some sampled subjects cannot be reached or refuse to participate.
To Reduce Bias: experiments should be double blind, with neither the subject nor the data collector knowing which treatment a subject was assigned. Response Bias: occurs when subjects give an incorrect response (ex. lying) or the question wording or the way the interviewer asks the questions is confusing or misleading. Volunteer Sample: the MOST COMMON type of CONVENIENCE SAMPLE (not ideal, but sometimes necessary in both observational studies and experiments). Convenience Sample: not random; an easy and cheap way to obtain data. Key Parts of a Sample Survey (the most common use of sample surveys is to estimate population percentages): 1) Identify the population of all subjects of interest. 2) Construct a sampling frame (attempts to list all the subjects in the population). 3) Use a random sampling design, implemented using random #'s. 4) Be cautious of sampling bias due to non-random samples or undercoverage, and of response bias and nonresponse bias. Stratified Random Sample: divides the population into separate groups, called STRATA, and then selects a simple random sample from each STRATUM. Cluster Random Sample: takes a simple random sample of clusters (such as city blocks, most often defined by location) and uses all the subjects in the selected clusters. Factor: a categorical explanatory variable in an experiment; its categories are the treatments. Experimental Studies are preferable to non-experimental studies but are not always possible. Multi-Factor Experiments (Factorial Design): have at least two explanatory variables (factors) and allow you to test combinations of treatments. Case-Control Study (Case-Control Design): an example of a retrospective study. Subjects who have a response outcome of interest (ex. cancer) serve as cases; other subjects not having that outcome serve as controls. The cases and controls are compared on an explanatory variable, like whether they were smokers.
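The random sampling designs named above (simple random, systematic, stratified, cluster) differ only in how random numbers are applied to the sampling frame. Here is a minimal sketch in Python, assuming a hypothetical frame of 100 numbered subjects; the function names, strata, and clusters are invented for illustration.

```python
import random

def simple_random_sample(frame, n, rng):
    """Select n subjects so that every subject has the same chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k subjects, then take every kth one."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every subject in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

rng = random.Random(0)                 # fixed seed so the run is repeatable
frame = list(range(1, 101))            # hypothetical sampling frame: subjects 1..100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"first_half": frame[:50], "second_half": frame[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"block_{i}": frame[i * 10:(i + 1) * 10] for i in range(10)}
cluster = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, cluster, sep="\n")
```

None of these designs protects against response or nonresponse bias; they only address how subjects are chosen from the frame.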
Census: a complete enumeration of an entire population; also a survey that attempts to count the # of people in the population and to measure certain characteristics. Prospective Studies: follow subjects into the future, tracking exposure and disease status over time. Retrospective Studies: look at the subjects' past.
Cohort Study Design: at the beginning none of the subjects have the disease. Influential Observation: can strongly affect the correlation and regression equation. Cross-Sectional: at one point in time. Contingency Table (used for two categorical variables). Scatterplot (used for two quantitative variables): displays the relationship and shows a positive or a negative association. Probability Distribution: to be valid, the sum of all probabilities must be 1 and each probability must fall between 0 & 1. The probability distribution of a random variable specifies its possible values and their probabilities. It is the randomness of the variable that allows us to specify probabilities for the outcomes. Parameters: numerical summaries of a probability distribution or population; most are denoted by Greek letters (ex. mean μ and SD σ; a population mean or a population proportion). The mean of a probability distribution for a discrete random variable can be interpreted as the expected value of that variable. It is the value that can be expected as the average in a long run of observations (it is not unusual for the expected value of a random variable to equal a number that is not a possible outcome). SD: the larger the SD the greater the spread; it describes how far the random variable falls, on average, from the mean of its distribution. The mean of a continuous distribution is the value of X where the graph would balance. The mean is a weighted average, used when the values of x are not equally likely. If all possible values are equally likely, then the probability distribution is constant (uniform) and its graph is a flat, horizontal line. A RANDOM VARIABLE: a numerical measurement of the outcome of a random phenomenon. X refers to the variable itself; x refers to a particular value of the random variable (ex. X=number of heads in 3 flips defines the random variable; x=2 represents a possible value for the random variable).
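The validity check and the expected value described above can be worked through for the notes' own example, X = number of heads in 3 flips of a fair coin. This is a small Python sketch; the helper names are assumptions made for illustration.

```python
import math

def is_valid_distribution(dist):
    """Each probability is between 0 and 1, and the probabilities sum to 1."""
    probs = dist.values()
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

def mean_and_sd(dist):
    """Mean = sum of x * p(x); SD = square root of sum of (x - mean)^2 * p(x)."""
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, math.sqrt(variance)

# X = number of heads in 3 flips of a fair coin
heads_in_3_flips = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

valid = is_valid_distribution(heads_in_3_flips)
mu, sd = mean_and_sd(heads_in_3_flips)
print(valid, mu, round(sd, 3))
```

The mean comes out to 1.5 heads, a value X can never actually take, which illustrates the remark that the expected value need not be a possible outcome.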
A Discrete Random Variable: X has separate values such as 0, 1, 2, 3. Σ x p(x) = the mean of a probability distribution for a discrete random variable. Continuous Random Variable: can take any value in an interval, for example time, age, and size measurements like height and weight. The interval containing all possible values has a probability equal to 1. In practice, continuous variables are measured in discrete values because of rounding. Normal Distribution (most important): continuous, symmetric around the mean, bell-shaped, and characterized by its mean and standard deviation; the probability is 0.68 within 1 SD, 0.95 within 2 SD's, and 0.997 within 3 SD's of the mean. The z-score for a value x of a random variable is the number of SD's that x falls from the mean: z = (x - mean)/SD. A STANDARD NORMAL DISTRIBUTION has a mean of ZERO and a standard deviation of ONE. The mean and standard deviation completely describe the density curve. A negative (positive) z-score indicates that the value is below (above) the mean. Probabilities for NORMAL CURVES are found using normalcdf(lower bound, upper bound, mean, standard deviation). For normal random variables, the invNorm function is used to find the value (or z) that corresponds to a certain probability: invNorm(area under the curve to the left, mean, sd). Finding probabilities for OTHER normally distributed random variables: 1. State the problem in terms of the observed random variable, ex. P(X<x). 2. Draw a picture to show the desired probability under the given normal curve. 3. Find the area under the normal curve using normalcdf(lower bound, upper bound, mean, standard deviation). Conditions for a BINOMIAL DISTRIBUTION: 0) We count the # of successes in a fixed # of trials. 1) Each trial has exactly two possible outcomes. 2) Each trial has the same probability of success. 3) The trials are independent. Conditions 2 & 3 go together: sampling without replacement breaks both. Ex. with n = 17 and p = 0.6: n * p = 17 * 0.6 = 10.2 expected successes and n * (1 - p) = 17 * 0.4 = 6.8 expected failures. ALWAYS CHECK TO SEE IF BINOMIAL CONDITIONS APPLY: 1) Binary data. 2) The same probability of success for each trial. 3) Independent trials.
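For readers without a TI calculator, the normalcdf and invNorm steps above can be approximated with Python's standard library. This is a sketch, not the calculator's implementation: the wrapper names `normalcdf`, `inv_norm`, and `z_score` are assumptions chosen to mirror the calculator commands, and the value 130 with mean 100 and SD 15 is a made-up illustration.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area (probability) to its LEFT."""
    return NormalDist(mean, sd).inv_cdf(area_left)

def z_score(x, mean, sd):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd

# Empirical rule: probability within 1, 2, and 3 SDs of the mean
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))

# Lower bound of the middle 50% of the standard normal distribution
print(round(inv_norm(0.25), 4))

# Hypothetical value 130 from a distribution with mean 100 and SD 15
print(z_score(130, 100, 15))
```

As on the calculator, a very negative lower bound such as -1e99 stands in for "no lower bound", and the mean and SD default to 0 and 1.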
EX of Binomial conditions: Deal 10 cards from a shuffled deck and count the # of red cards. 1. Two categories? Yes, red card=success & black=failure. 2. Fixed # n? Yes, n=10. 3. Independent observations? No, the cards are not replaced, so they are not independent. 4. Probability is the same? No, the cards are not replaced, so p changes as each new card is drawn. P(X=x)=binompdf(# of trials, probability, # of successes looking for): the probability of exactly x successes out of n trials. P(X<=x)=binomcdf(# of trials, probability, # of successes looking for): the cumulative distribution function (adds up all the probabilities of successes up to and including a certain number).
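The binompdf and binomcdf commands can be reproduced from the binomial formula directly. The sketch below uses Python's `math.comb`; the function names mirror the calculator commands but are otherwise assumptions, and the n = 10, p = 0.5 case is a made-up illustration (the n = 17, p = 0.6 values come from the notes).

```python
import math
from math import comb

def binompdf(n, p, x):
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x): adds the probabilities for 0, 1, ..., x successes."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def at_least(n, p, x):
    """P(X >= x) = 1 - P(X <= x - 1), the complement rule."""
    return 1 - binomcdf(n, p, x - 1)

# Hypothetical illustration: 10 flips of a fair coin
print(binompdf(10, 0.5, 5))   # exactly 5 heads
print(binomcdf(10, 0.5, 5))   # at most 5 heads
print(at_least(10, 0.5, 3))   # at least 3 heads

# Mean and SD of a binomial with n = 17, p = 0.6 (values used in the notes)
n, p = 17, 0.6
mean = n * p
sd = math.sqrt(n * p * (1 - p))
print(round(mean, 1), round(sd, 2))
```

These formulas only apply when the binomial conditions hold; the card-dealing example fails them because the draws are made without replacement.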
Chapter 7. Sampling Distribution: the probability distribution of a sample statistic, such as a sample proportion or sample mean. It tells how close a sample statistic falls to an unknown parameter, and it becomes more nearly normal as the sample size increases. Population Distribution: the probability distribution from which we take the sample. It is described by PARAMETERS, which are usually unknown. Data Distribution: describes the sample data; it is the distribution described by sample statistics.
Standard Error (the standard deviation of the sampling distribution): describes how much a statistic varies from sample to sample. As the sample size increases the standard error decreases. The standard deviation of the sampling distribution of the sample proportion is the standard error of the sample proportion: standard error=sqrt(p(1-p)/n). For a sample mean, SE=SD/sqrt(n). Almost all sample statistics fall within 3 SE's of the mean of the sampling distribution. Approximate margin of error (in %) = 1/sqrt(n)*100.
Central Limit Theorem: states that for random samples of sufficiently large size (at least about 30 is usually enough) the sampling distribution of the sample mean is approximately normal. This theorem holds true no matter the shape of the population distribution. It applies to sample proportions as well, because the sample proportion is a sample mean when the possible values are 0 and 1. If the POPULATION DISTRIBUTION is NORMAL, the sampling distribution of the mean is also normal for any sample size. The 0/1 data behind a proportion are NEVER normal; only the sampling distribution of the sample proportion is approximately normal.
Histogram: graph that uses bars to portray the frequencies or relative frequencies for a quantitative variable.
P(X>=x)=1-binomcdf(n,p,x-1). To find the probability of at least x successes out of n trials, written as P(X ≥ x), there isn't a function to do it directly. We take advantage of the complementary probability rule and realize that P(X ≥ x) = 1 - P(X ≤ x-1) (so if we did NOT get at least 3, we must have gotten at most 2), and use the cumulative distribution function (cdf). *Remember to enter 1 less than what you're looking for.
EX (regression, cell phones): The percent of the population in a country using cell phones can be predicted using y^=-0.14+2.62x, where y is the percentage of the population that uses cell phones and x is gross domestic product (GDP in thousands of dollars per capita). Predict cell phone use at the (i) minimum x value, 0.7, and (ii) maximum x=34.2, roughly the range of observed x values. Answer: (i) y^=-0.14+2.62(0.7)=1.69% (ii) y^=-0.14+2.62(34.2)=89.46%. Interpretation of the slope: for every $1,000 increase in GDP, predicted cell phone usage increases by 2.62 percentage points. The country with the maximum GDP actually has 45.1% of its population using cell phones. Find the residual: y-y^=45.1-89.46=-44.36%.
EX (regression, house prices): The selling price of homes can be predicted using y^=9.1+76.5x, where y is the selling price in thousands of dollars and x is the size of the house in thousands of square feet. How much do you predict the house will sell for if it is 2,000 square feet? Answer: 9.1+76.5(2)=$162,100. How much for 3,000 square feet? Answer: 9.1+76.5(3)=$238,600, so for every thousand square foot increase the predicted price increases by $76,500; also the correlation is positive because as the square footage increases the selling price increases. One 3,000 sq. foot home sold for $300,000. Find the residual: the RESIDUAL is the difference y-y^ = $300,000-238,600=$61,400, so the house sold for $61,400 more than the predicted price.
EX (tricky regression line): There is a regression line y^=-2.7+0.26x for 51 observations on y=murder rate and x=percent with a college education. Find how the predicted murder rate increases as % with a college education increases from x=15% to x=40%. Answer: -2.7+0.26(15)=1.2 & -2.7+0.26(40)=7.7, so the predicted murder rate increases from 1.2 to 7.7. When the line is fitted with only 50 observations, y^=8.2-0.17x. Find how the predicted rate decreases as percent with a college education increases from 15% to 40%. Answer: 8.2-0.17(15)=5.65 & 8.2-0.17(40)=1.4, so it decreases from 5.65 to 1.4. This makes the 51st observation an influential regression outlier, because it pulls the line up on the right and suggests a positive correlation when without the 51st it is a negative correlation.
EX (coin wager): You are given $160 and told to pick one of two wagers for an outcome based on flipping a fair coin. You win $320 if it comes up heads and lose $80 if it comes up tails. Enter 320 & 80 into L1 and .5 & .5 into L2, then 1-Var Stats L1,L2 gives σ=$120.
EX (X=x) binompdf: A balanced die with 4 sides is rolled 40 times and X=number of 3's. What is n and p? Find the mean and standard deviation of the distribution of X. Answer: n=40, p=0.25; for the binomial dist. μ (mean)=40*.25=10 and SD=sqrt(40(.25)(.75))=2.74. If you observe x=0 would you be skeptical of the die? Find the probability that x=0: binompdf(40,0.25,0)=0.0000101, so it basically won't happen with a balanced die.
EX (X>=1) binomcdf: Current estimates suggest that 20% of people in the U.S. with computers have Internet. Suppose 15 people with computers were randomly and independently sampled. Part 1: What is the probability that at least 1 of those sampled has Internet at home? P(X>=1)=1-P(X<=0)=1-binomcdf(15,0.20,0)=0.9648. Part 2: Let X represent the number of people with Internet in the sample of 15. Find the probability that 10 or more of those sampled have Internet at home: 1-binomcdf(15,0.20,9)=0.000113.
EX: Another (X>=x) binomcdf, with n=262 drivers stopped and p=.422: if there was no racial profiling we would not be surprised if between about 87 and 135 of the 262 drivers stopped were Black. The actual # stopped (207) is well above these values; the number of Black drivers stopped is too high, even after accounting for random variation. Answer by P(X>=207): since 207 is evidence of profiling, so would be any # above 207. P(X>=207)=1-P(X<=206)=1-binomcdf(262,.422,206)=1-1=0 (not exactly zero but so close it rounds to zero). So the probability of getting 207 or more without profiling is essentially zero.
EX (true/false): The normal random variable X is the number of successes in n trials. Answer: False, such a random variable would be binomial, not normal.
EX: Find the probability that a normal random variable takes a value greater than 1.43 SD's above the mean: P(z>1.43)=normalcdf(1.43,1e99,0,1)=0.0764. EX: Find the probability that a normal random variable assumes a value within 1.43 SD's of the mean: P(-1.43<z<1.43)=normalcdf(-1.43,1.43)=.8472 (the TI-83 defaults to 0,1 for mean and sd).
EX: The probability that Z is < z is 0.119. What is z? invNorm(0.119,0,1)= -1.18. EX WHAT IF P(X>x)=0.881, what is x? invNorm(1-0.881,0,1)= -1.18.
Z-score EX: Find the z-score such that the probability that X is within z standard deviations of the mean is 0.50, i.e. P(-z<X<z)=0.50. Basically, what are the upper and lower bounds for the middle 50% of the area? There must be 25% below -z and 25% above z, so use invNorm(.25,0,1)=-.6745. So P(-.6745<X<.6745)=.50.
Z-score EX: Find the z-score such that the interval within z standard deviations of the mean for a normal distribution contains a) 29% of the probability, b) 77% of the probability. Answer: a) .29/2=.145, .50+.145=.645, invNorm(.645)=z-score of .37. b) .77/2=.385, .50+.385=.885, invNorm(.885)=z-score of 1.20.
EX: Adult systolic blood pressure is normally distributed with μ=120, σ=20. What percentage of adults have systolic blood pressure less than 100? normalcdf(-1e99,100,120,20)=.1587. What is the first quartile? P(X<x)=0.25, find x: invNorm(0.25,120,20)=106.5. EX (X>x): 10% of adults have systolic blood pressure above what level? P(X>x)=.10, find x: invNorm(.90,120,20)=145.63. *The area entered is the area to the left of the point desired. Since we are told 10% is to the right, 90% must be to the left.*
EX: Readings of blood pressure have a mean of 123 and a SD of 18. A reading above 137 is high. What is the z-score for a blood pressure reading of 137? (137-123)/18=0.78. What proportion is between 117 & 137? normalcdf(117,137,123,18)=.4122.
EX: An index that is a standardized measure used in observing infants over time is approximately normal with a mean of 103 and a SD of 14. What proportion of children has an index of at least 125? normalcdf(125,1e99,103,14)=0.0580. What is the proportion of children having an index of at least 88? normalcdf(88,1e99,103,14)=.8580. Find the index score at the 96th percentile: invNorm(.96,103,14)=127.5. 4% of the population have an index score below invNorm(.04,103,14)=78.5.
EX (pulse), part 1: If we randomly select 1 individual, what is the probability their pulse is greater than 90, with a mean of 73.67 and SD of 11.75? P(X>90)=normalcdf(90,1e99,73.67,11.75)=.0823. Part 2: If we select five individuals, what is the probability the mean of the five is greater than 90? To find the new SD take 11.75/sqrt(5)=5.25, then use normalcdf(90,1e99,73.67,5.25)=0.0009.
EX (a-d): The process of manufacturing a ball bearing results in weights that have an approximately normal distribution with mean 0.15g and standard deviation 0.003g. a. Suppose you select one ball bearing at random; what is the probability that it weighs less than 0.148g? Answer: normalcdf(-9999,0.148,0.15,0.003)=0.2525, so about a 25% chance (one in four). b. Suppose you select a sample of 100 ball bearings at random; what does the central limit theorem tell us about the sampling distribution of the sample mean X of these 100 ball bearings? What is the probability that the sample mean is less than 0.148g? Answer: The distribution of X would be normal, with mean .15 and SE=.003/sqrt(100)=.0003. So the probability that the sample mean is less than 0.148g is given by normalcdf(-9999,.148,.15,.0003)=1.315E-11 or 0.00000000001315, way small. c. What is the 95th percentile of weights of ball bearings? (That is, what weight will have 95% of the distribution below it?) Answer: invNorm(.95,.15,.003)=0.1549. d. Find the number X such that 90% of ball bearings will have weights between (0.15-X)g and (0.15+X)g. Answer: The central 90% would be between the 5th and 95th percentiles, so the 5th percentile is invNorm(.05,.15,.003)=.1451. To find X, we take .15-X=0.1451 and solve for X. This gives X=.15-.1451=.0049.
EX: What is the probability that the sample mean from a randomly selected sample of 36 people will be > 200 but < 250 mg/dL? Answer: normalcdf(200,250,219,50/sqrt(36))=.9886, where 50/sqrt(36) is the SE of the sample mean.
EX: An exit poll in a recent election was conducted in order to predict the winner during the evening news, prior to the polls officially closing. 1000 people were randomly selected as they left the polls, and 53% say they voted for Cleedus Aardvark. If the truth in the population is that it is a tie (i.e. 50% support Cleedus), what is the probability that a poll of 1000 people could have a proportion of .53 or higher? Would you be willing to declare him the winner? Answer: In order to compute a probability for p^ we need to know its probability distribution. By the Central Limit Theorem it is approximately normal with a mean of 0.50 (use the assumed population proportion .50) and standard error of sqrt(.50(.50)/1000)=0.0158, so P(p^ ≥ 0.53)=normalcdf(0.53,1e99,0.50,0.0158)=0.0289.
EX (probability distribution): For the population of people who suffer occasionally from migraine headaches, p=0.30 is the proportion who get some relief from taking a certain medicine. For a particular subject, let x=1 if they get relief, x=0 if they do not. For a random sample of 50 people who suffer from migraines, state the probability distribution: for each observation the probability that the medicine helps is 0.30 and that it does not is 0.70. For a sample of n=50, the sampling distribution of the sample proportion has mean=0.30 and SE=sqrt(.30(.70)/50)=.0648; this SE describes the SD of the sampling distribution.
EX: At a university, 60% of 7,400 students are female. The news takes a random sample of 50 students and surveys them. They report that 25 of the 50 in the sample were female. a) Find the mean and standard error of the sampling distribution of the sample proportion of females. Answer: the mean is .60. The standard error is se=sqrt(0.6(1-0.6)/50)=0.0693. b) Is it "unusual" to get a proportion of 0.50 females in a sample of size 50 from a population that is 0.60 female? Answer: no, since .50 is less than 2 standard errors away from .6; in fact it is z=(.50-.6)/0.0693=-1.44 standard errors away.
EX (population distribution, 7.2): Sunshine City was designed to attract retired people. Its current population of 55,000 residents has a mean age of 60 years with a standard deviation of 13 years. The distribution of ages is skewed to the left. A random sample of 100 residents of Sunshine City has a mean of 57.5 and a SD of about 14. Answer: The center of the population distribution is 60 and its spread is 13. The center of the data distribution is 57.5 and its spread is about 14. Since the population distribution is skewed to the left, the shape of the data distribution is probably skewed to the left. The center of the sampling distribution of the sample mean is 60. The spread of the sampling distribution of the sample mean is 13/sqrt(100)=1.3. Since the sample size n=100 is large, the CLT is applicable and the sampling distribution of the sample mean is approximately a normal distribution.
EX: Jan's all you can eat restaurant charges $8.80 per customer to eat at the restaurant. The expense per customer has a distribution that is skewed to the right with a mean of $8.20 and a SD of $4. If 100 customers have the characteristics of a random sample, find the mean and standard error of the sampling distribution of the restaurant's sample mean expense per customer. Answer: MEAN=$8.20, SE=$4/sqrt(100)=$.40.
EX: The population distribution of # of years of education for self-employed individuals in a certain region has a mean of 13.8 and a SD of 4.8. Identify the random variable. Answer: the # of yrs of education. The mean of the sampling distribution for sample size 36 is 13.8. The SE is 4.8/sqrt(36)=0.8; this describes the variability of the sample mean for samples of size 36. The mean for sample size 144 is still 13.8, with SE of 4.8/sqrt(144)=0.4. The mean stays the same and the SE decreases as n increases.
EX: Suppose weekly income has a dist. that is skewed to the right with a mean of $600 and a SD of $150. They plan to randomly sample 100 farmers and use the sample mean weekly income to estimate the mean. What is the SE? SE=150/sqrt(100)=15. Find the probability that the sample mean is within $21 of $600: normalcdf(579,621,600,15)=.8385.
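The standard-error calculations in the examples above (the ball-bearing sample mean with SD 0.003 g and n = 100, and the exit poll with an assumed tie and n = 1000) can be replayed in Python as a rough cross-check of the calculator answers. This is a sketch under the normal approximation from the Central Limit Theorem; the helper names `se_mean` and `se_proportion` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def se_mean(sd, n):
    """Standard error of a sample mean: population SD divided by sqrt(n)."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Ball bearings: mean 0.15 g, SD 0.003 g, sample of n = 100
se_bearing = se_mean(0.003, 100)
p_one_below = NormalDist(0.15, 0.003).cdf(0.148)        # one bearing below 0.148 g
p_mean_below = NormalDist(0.15, se_bearing).cdf(0.148)  # mean of 100 below 0.148 g

# Exit poll: true proportion assumed to be 0.50, n = 1000, observed 0.53
se_poll = se_proportion(0.50, 1000)
p_poll = 1 - NormalDist(0.50, se_poll).cdf(0.53)

print(f"SE (bearings) = {se_bearing:.4f}")
print(f"P(one bearing < 0.148) = {p_one_below:.4f}")
print(f"P(sample mean < 0.148) = {p_mean_below:.3e}")
print(f"SE (poll) = {se_poll:.4f}, P(p^ >= 0.53) = {p_poll:.4f}")
```

The contrast between the two ball-bearing probabilities shows why the SE matters: a single observation below 0.148 g is common, but a mean of 100 observations that low is almost impossible.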