• Percentiles
• The Bootstrap
• Confidence Intervals
• Using Confidence Intervals
Estimation - Introduction
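• So far, we have developed ways of inferential thinking. In particular, we learned how to use data
to decide between two hypotheses about the world.
• Often, though, we want to estimate an unknown numerical quantity (a parameter).
– To assess the current economy, we might be interested in the median annual income of
households in India.
– We usually cannot measure the whole population, so we estimate the parameter with a
statistic computed from a random sample.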
• But the value of any statistic depends on the sample, and the sample is
based on random draws. So every time data scientists come up with an
estimate based on a random sample, they are faced with a question:
"How different could this estimate have been, if the sample had come
out differently?"
• We will see one way of answering this question. The answer will give you the
tools to estimate a numerical parameter and quantify the amount of
error in your estimate.
Estimation
• The median is the 50th percentile; it is commonly assumed that 50% of the
values in a data set are above the median. Ties and small collections make this
imprecise, so percentiles need a careful definition.
The General Definition - Percentiles
• Let p be a number between 0 and 100. The pth percentile of a collection is the
smallest value in the collection that is at least as large as p% of all the values.
• By this definition, any percentile between 0 and 100 can be computed for
any collection of values, and it is always an element of the collection.
• In practical terms, suppose there are n elements in the collection. To find the
pth percentile:
– Sort the collection in increasing order.
– Find (p/100) * n. Call that k.
– If k is an integer, take the kth element of the sorted collection.
– If k is not an integer, round it up to the next integer, and take that element of
the sorted collection.
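
The sort-and-count recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `percentile` and the sample collection `sizes` are assumptions made for the example, and treating p = 0 as "take the smallest element" is an assumption that follows from the general definition rather than from the k rule itself.

```python
import math


def percentile(p, values):
    """Return the pth percentile of values under the definition above:
    the smallest element that is at least as large as p% of all elements."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    if len(values) == 0:
        raise ValueError("values must not be empty")

    ordered = sorted(values)          # step 1: sort in increasing order
    n = len(ordered)
    # Step 2: k = (p/100) * n, rounded up when it is not an integer.
    # Multiplying before dividing avoids most floating-point surprises.
    k = math.ceil(p * n / 100)
    # Assumption: for p = 0 the rule gives k = 0, so fall back to the
    # smallest element, which is what the general definition implies.
    k = max(k, 1)
    return ordered[k - 1]             # kth element, counting from 1


# Hypothetical collection used only for illustration.
sizes = [12, 17, 6, 9, 7]
for p in (0, 50, 80, 100):
    print(p, percentile(p, sizes))
```

With five elements, p = 50 gives k = 2.5, which rounds up to 3, so the result is the third element of the sorted collection. The result is always an element of the collection, as the definition promises.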
Quantile Plot
• Displays all of the data (allowing the user to assess both the overall
behavior and unusual occurrences)
Quantile-Quantile (Q-Q) Plot
• Allows the user to view whether there is a shift in going from one distribution to
another
• Each point corresponds to the same quantile in both data sets and shows the unit price of items sold at
branch 1 versus branch 2 for that quantile.
• For comparison, the straight line represents the case where, for each given quantile, the unit price at
each branch is the same.
• The darker points mark the data for Q1, the median, and Q3.
• At Q1, the unit price of items sold at branch 1 is lower than at branch 2: 25% of items sold at branch 1
cost $60 or less, versus $64 or less at branch 2.
• At Q2 (the median, i.e. the 50th percentile), 50% of items sold at branch 1 cost $75 or less, versus
$85 or less at branch 2.
• In general, the distribution of branch 1 is shifted relative to branch 2: unit prices of items sold at
branch 1 tend to be lower than those at branch 2.
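
The points of a Q-Q plot can be computed by pairing the same percentile from each data set. The sketch below does this with the percentile definition given earlier in these slides. The names `percentile` and `qq_pairs` and the two price lists are hypothetical; the prices are made-up illustrative values, not the data behind the figure discussed above.

```python
import math


def percentile(p, values):
    """pth percentile: smallest element at least as large as p% of values."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[k - 1]


def qq_pairs(first, second, percents):
    """Return (p, quantile of first, quantile of second) for each p."""
    return [(p, percentile(p, first), percentile(p, second)) for p in percents]


# Made-up unit prices for two branches (illustration only).
branch_1 = [40, 43, 47, 56, 60, 65, 70, 75, 78, 82, 90, 100]
branch_2 = [45, 50, 58, 64, 68, 74, 80, 85, 90, 96, 105, 120]

pairs = qq_pairs(branch_1, branch_2, [25, 50, 75])
for p, q1, q2 in pairs:
    # A point with q1 < q2 lies on the branch 2 side of the equality line,
    # meaning branch 1 is cheaper at that quantile.
    relation = "lower at branch 1" if q1 < q2 else "not lower at branch 1"
    print(f"{p}th percentile: branch 1 = {q1}, branch 2 = {q2} ({relation})")
```

If every pair falls on the same side of the equality line, as it does for these made-up values, the plot shows a shift between the two distributions.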
Bootstrap
• One sample - One estimate
• But the random sample could have come out
differently.
=> Then the estimate would have been different.
Main question:
• How different could the estimate have been?
• The variability of the estimate tells us something
about how accurate the estimate is.
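
To make the main question concrete, the following sketch simulates what would happen if many random samples could be drawn from the same population, and looks at how much the sample median moves from one sample to the next. Everything here is an assumption made for illustration: the population is synthetic (log-normal "incomes" with arbitrary parameters), the sample size and number of repetitions are arbitrary, and `statistics.median` is used as the estimator for brevity (it averages the two middle values of an even-sized sample, unlike the percentile definition earlier in these slides).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population (assumption: stands in for real household incomes).
population = [rng.lognormvariate(10, 0.5) for _ in range(20_000)]


def sample_median(pop, size, generator):
    """Draw one random sample without replacement and return its median."""
    return statistics.median(generator.sample(pop, size))


# One sample gives one estimate; repeat to see how the estimate varies.
estimates = [sample_median(population, 500, rng) for _ in range(1_000)]

print("population median:", round(statistics.median(population)))
print("smallest estimate:", round(min(estimates)))
print("largest estimate: ", round(max(estimates)))
print("std. dev. of estimates:", round(statistics.stdev(estimates)))
```

The spread of `estimates` is the variability that the main question asks about. This simulation is only possible because the whole synthetic population is available, which is not the case with real data.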
Where to Get Another Sample?