
UNIT II

Describing and Summarizing the Data


Previous Lecture

What is a “Sample Size”?
• A sample is a part of the population chosen for a survey or experiment; the sample size is the number of members in that sample.

• For example, you might survey dog owners’ brand preferences. You won’t
want to survey all the millions of dog owners in the country (it would be too
expensive or time-consuming), so you take a sample, perhaps several
thousand owners. The sample stands in for the brand preferences of all dog
owners. If you choose your sample wisely, it will be a good representation.

When Error can Creep in
• When you survey only a small sample of the population, uncertainty creeps into
your statistics.

• When you survey only a certain percentage of the true population, you can never be 100%
sure that your statistics are a complete and accurate representation of the population.

• This uncertainty is called sampling error and is usually expressed with a confidence
interval.

• For example, you might state your results at a 90% confidence level. That
means that if you were to repeat your survey over and over, about 90% of the
confidence intervals you computed would contain the true population value.

How to Find a Sample Size in Statistics
• A sample is a portion of the total population.

• It can be used to make inferences about the population as a whole.

• For example, the standard deviation of a sample can be used to approximate the
standard deviation of a population.

• Finding a sample size can be one of the most challenging tasks in statistics; it
depends on many factors, including the size of the original population.

Some methods for estimating the sample size

1. Conduct a census if the population is small. What counts as a “small” population depends on
budget and time constraints. For example, it may take a day to take a census of the
student body at a small private university of 1,000 students, but you may not have
the time to survey 10,000 students at a large state university.
2. Use a sample size from a similar study. If a similar study has already
been undertaken by someone else, you can reuse its sample size; you will need access to academic databases to
search for such a study (usually your school or college will have access). A pitfall: you
are relying on someone else having calculated the sample size correctly. Any errors they
made in their calculations will carry over to your study.
3. Use a formula. Many different formulas can be used, depending on
what is known (or not known) about the population. For example, if some parameters of the
population are known (such as the standard deviation), Cochran’s formula can be used
to estimate the sample size. If little is known about the population, use Slovin’s
formula.
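
As a rough illustration of method 3, the sketch below implements the commonly quoted forms of the two formulas: Slovin's n = N / (1 + N·e²) and Cochran's proportion form n0 = z²·p·(1 − p) / e². The function names, the 5% margin of error, z = 1.96 and p = 0.5 are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole person."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


def cochran_sample_size(z: float, p: float, margin_of_error: float) -> int:
    """Cochran's formula (proportion form): n0 = z^2 * p * (1 - p) / e^2."""
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0)


# Assumed inputs: 10,000 students, 5% margin of error, 95% confidence (z = 1.96),
# and p = 0.5 as the most conservative guess for the unknown proportion.
print(slovin_sample_size(10_000, 0.05))
print(cochran_sample_size(1.96, 0.5, 0.05))
```

Both formulas round up because a fractional respondent cannot be surveyed.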
Probability
• Probability: what is the chance that a given event will occur?
• Probability is expressed in numbers between 0 and 1. Probability = 0 means the
event never happens; probability = 1 means it always happens.
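• The total probability of all possible events always sums to 1.
• Probability distributions: Permutations
• What is the probability distribution of number of girls in families with two children?
Number of girls   Outcome
2                 GG
1                 BG
1                 GB
0                 BB
• If boys and girls are equally likely, each of the four outcomes has probability 1/4, so P(0 girls) = 1/4, P(1 girl) = 2/4 and P(2 girls) = 1/4.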

Histogram of above example

How about family of three?
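
One way to answer this is to list every birth order and count the girls in each. The sketch below does that for families of two and three children; it assumes boys and girls are equally likely and births are independent, and the function name `girls_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def girls_distribution(children):
    """Map each possible number of girls to its probability."""
    # Every birth order (e.g. ('B', 'G', 'G')) is assumed equally likely.
    outcomes = list(product("BG", repeat=children))
    counts = Counter(outcome.count("G") for outcome in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}


for n in (2, 3):
    print(n, "children:", girls_distribution(n))
```

For three children there are 2³ = 8 equally likely birth orders, so the probabilities of 0, 1, 2 and 3 girls are 1/8, 3/8, 3/8 and 1/8.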

Probability Distributions (PD)

Discrete PD: Bernoulli, Binomial, Poisson, Geometric
Continuous PD: Uniform, Normal, Triangular, Lognormal, Beta, Exponential

Discrete Probability Distributions
Probability Mass Function (PMF)
A mathematical function f(x) specifying the probability of each value of the discrete random
variable X.
xi represents the i-th value of X.
Properties:
0 ≤ f(xi) ≤ 1 for every i
Σ f(xi) = 1 (summed over all i)

Cumulative distribution function:
F(x) = P(X ≤ x)
Discrete Probability Distributions
Example: Probability Mass Function (PMF) for Rolling Two Dice
f(x2) = 1/36
f(x3) = 2/36
f(x4) = 3/36
f(x5) = 4/36
f(x6) = 5/36
⋮
f(x12) = 1/36
Discrete Probability Distributions
Example: Using the Cumulative Distribution
Function (CDF)
• Probability of rolling between 4 and 8:
= P(4 ≤ X ≤ 8)
= P(3 < X ≤ 8)
= F(x8) − F(x3)
= 26/36 − 3/36
= 23/36 ≈ 0.64
• Here F(x8) = f(x2) + … + f(x8) = 26/36 and F(x3) = f(x2) + f(x3) = 3/36.
Discrete Probability Distributions
Example: Computing the Expected Value of the sum of values on two dice
rolls

E[X] = 2(1/36) + 3(2/36) + … + 12(1/36) = 7
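
The two-dice examples can be checked by enumerating all 36 equally likely rolls. The following is a minimal sketch that assumes fair, independent dice; the names `pmf` and `cdf` are illustrative. It builds the PMF, evaluates P(4 ≤ X ≤ 8) through the CDF, and computes E[X].

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice: each of the 36 ordered rolls has probability 1/36.
pmf = {}
for first, second in product(range(1, 7), repeat=2):
    total = first + second
    pmf[total] = pmf.get(total, Fraction(0)) + Fraction(1, 36)


def cdf(x):
    """F(x) = P(X <= x): add up the PMF for every sum not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


prob_4_to_8 = cdf(8) - cdf(3)  # P(4 <= X <= 8) = F(8) - F(3)
expected_value = sum(value * p for value, p in pmf.items())

print(prob_4_to_8)      # 23/36
print(expected_value)   # 7
```

Exact fractions are used so the results can be compared directly with the hand calculation.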


Discrete Probability Distributions
Example: Expected Value on Television

Apprentice example
• Teams were required to select an artist (mainstream or avant-garde) and
sell their art for the most money possible.

Deal or No Deal example


• Contestant had 5 briefcases left, containing $100, $400, $1,000, $50,000 and
$300,000.
• Expected value of the briefcases is ($100 + $400 + $1,000 + $50,000 + $300,000)/5 = $70,300.
• Banker offered contestant $80,000 to quit, which is more than the expected value.
Discrete Probability Distributions

Example: Airline Revenue Management


• Full and discount airfares are available for a flight.
• Full-fare ticket costs $560
• Discount ticket costs $400
• X = ticket price paid for a seat held at full fare ($0 if the seat goes unsold)
• p = 0.75 (the probability of selling the seat at full fare)
• E[X] = 0.75($560) + 0.25($0) = $420
• The airline should not discount full-fare tickets because the expected value
of a full-fare ticket is greater than the price of a discount ticket.
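
The airline decision comes down to comparing one expected value against a fixed price. The sketch below reproduces that comparison and also works out a break-even selling probability. It assumes an unsold full-fare seat earns $0 and a discounted seat always sells; the variable names and the break-even step are illustrative additions.

```python
full_fare = 560.0
discount_fare = 400.0
p_full_sale = 0.75  # probability the seat sells at full fare

# Expected revenue from holding the seat at full fare (unsold seat earns 0).
expected_full = p_full_sale * full_fare + (1 - p_full_sale) * 0.0

# Hold the seat for full fare only if that beats the certain discount revenue.
hold_for_full_fare = expected_full > discount_fare

# Selling probability at which both options give the same expected revenue.
break_even_p = discount_fare / full_fare

print(expected_full, hold_for_full_fare, round(break_even_p, 3))
```

Under these assumptions, discounting only pays off when the chance of a full-fare sale drops below about 0.714.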
•Example: Computing the Variance of a Random Variable

Discrete Probability Distributions
Bernoulli Distribution
• two possible outcomes each with a constant probability of
occurrence
• typically “success” is x = 1 and “failure” x = 0
• p is the probability of a success outcome

E[X] = p
Var[X] = p(1 − p)
Discrete Probability Distributions
Example: Using the Bernoulli Distribution
Model whether an individual responds positively to a telemarketing
promotion.
• You have a box with 20 red and 80 white marbles.
• You ask individuals exposed to the telemarketing promotion to select a
marble and then replace it.
• If the customer selects a red marble, the customer makes a purchase.
• If the customer selects a white marble, the customer does not make a
purchase.
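
The marble box is a Bernoulli trial with p = 20/100 = 0.2. The sketch below computes E[X] and Var[X] from the formulas and then simulates repeated draws with replacement as a sanity check. The seed, the number of draws and the helper `draw_marble` are arbitrary choices for illustration.

```python
import random

red, white = 20, 80
p = red / (red + white)   # probability of "success" (a purchase)

mean = p                  # E[X] = p
variance = p * (1 - p)    # Var[X] = p(1 - p)


def draw_marble(rng):
    """Return 1 for a red marble (purchase) and 0 for a white one."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)    # fixed seed so the run is repeatable
draws = [draw_marble(rng) for _ in range(10_000)]
observed_rate = sum(draws) / len(draws)

print(mean, variance, observed_rate)
```

The observed purchase rate should land close to 0.2, although its exact value depends on the seed.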
Discrete Probability Distributions
Binomial Distribution
• Models n independent replications of a Bernoulli experiment
• X represents the number of successes in these n experiments
Example: Computing Binomial Probabilities
• Suppose 10 individuals receive the telemarketing promotion.
• Each individual has a 0.2 probability of making a purchase.
• Find the probability that exactly 3 of the 10 individuals make a
purchase.
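
The probability asked for here follows from the binomial PMF, f(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ, with n = 10, p = 0.2 and x = 3. A minimal sketch using only the standard library (the function name `binomial_pmf` is an assumption):

```python
from math import comb


def binomial_pmf(successes, trials, p):
    """P(X = successes) for `trials` independent Bernoulli(p) experiments."""
    return comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)


prob_exactly_3 = binomial_pmf(3, 10, 0.2)
print(round(prob_exactly_3, 4))  # about 0.2013
```

So there is roughly a 20% chance that exactly 3 of the 10 individuals make a purchase.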
Discrete Probability Distributions
Poisson Distribution
• Models the number of occurrences in some unit of measure (often
time or distance).
• There is no limit on the number of occurrences.
• The average number of occurrences per unit is a constant denoted as λ.
Discrete Probability Distributions
Example: Computing Poisson Probabilities
Suppose the average number of customers arriving at a Subway restaurant
during lunch hour is λ = 12 per hour. The probability that exactly x customers
arrive during the hour is given by the Poisson distribution, f(x) = e^(−λ) λ^x / x!. Find the
probability that exactly 5 arrive during lunch hour:

f(5) = e^(−12) (12^5)/5!
= (0.000006144)(248,832)/120
= 0.0127
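
The Poisson calculation can be evaluated directly from f(x) = e^(−λ) λ^x / x!. The sketch below is one way to do that with the standard library; the function name `poisson_pmf` is an assumption.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) when occurrences arrive at an average rate of lam per unit."""
    return exp(-lam) * lam ** x / factorial(x)


prob_exactly_5 = poisson_pmf(5, 12)
print(round(prob_exactly_5, 4))
```

With λ = 12, seeing only 5 arrivals is unlikely because 5 sits well below the mean.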
Continuous Probability Distributions
Probability density function
• A curve described by a mathematical function that characterizes a
continuous random variable

Properties of a probability density function


• f(x) ≥ 0 for all values of x
• Total area under the density function equals 1.
• P(X = x) = 0
• Probabilities are only defined over an interval.
• P(a ≤ X ≤ b) is the area under the density function between a and b.
Continuous Probability Distributions
Uniform Distribution
All outcomes between a minimum (a) and a maximum (b) are equally likely.
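• f(x) is the PDF and F(x) is the CDF:
f(x) = 1/(b − a) for a ≤ x ≤ b
F(x) = (x − a)/(b − a) for a ≤ x ≤ b
• The total area under f(x) between a and b equals 1.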
Continuous Probability Distributions
Uniform Distribution

• Expected Value = (a + b)/2

• Variance = (b − a)²/12
Continuous Probability Distributions
Example: Computing Uniform Probabilities
• Sales revenue for a product varies uniformly each week between $1000 and
$2000.
• f(x) = 1/(2000 − 1000)
= 1/1000 for 1000 ≤ x ≤ 2000
Continuous Probability Distributions
Example: (continued) Computing Uniform Probabilities
• Find the probability that sales revenue will be less
than $1,300.
• P(X < 1300) = (1300 − 1000)/1000 = 0.30
Continuous Probability Distributions
Example(continued): Uniform Probabilities
• Find the probability that revenue will be between
$1,500 and $1,700.

• P(1500 ≤ X ≤ 1700) = P(X ≤ 1700) − P(X ≤ 1500)
= F(1700) − F(1500)
= 700/1000 − 500/1000
= 0.20
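
The uniform revenue example can be reproduced with a small CDF helper. This sketch assumes the standard uniform formulas F(x) = (x − a)/(b − a), mean (a + b)/2 and variance (b − a)²/12; the function name `uniform_cdf` is illustrative.

```python
def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 1000, 2000  # weekly sales revenue range in dollars

p_below_1300 = uniform_cdf(1300, a, b)
p_1500_to_1700 = uniform_cdf(1700, a, b) - uniform_cdf(1500, a, b)
mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(p_below_1300, round(p_1500_to_1700, 2), mean, round(variance, 2))
```

Because the density is flat, each probability is simply the interval width divided by b − a.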
Continuous Probability Distributions

Normal Distribution
- f(x) is a bell-shaped curve
- Characterized by 2 parameters
μ (mean)
σ² (variance)
- Properties
1. Symmetric
2. Mean = Median = Mode
3. Unbounded
4. Empirical rules apply
Continuous Probability Distributions
Example: Using Z-table to Compute Normal Probabilities
• The distribution for customer demand (units per month) is normal with:
mean = 750
stdev. = 100
• Find the probability that demand will be:
a) at most 900 units/month
b) exceed 700 units/month
c) be between 700 and 900 units/month
Continuous Probability Distributions
Example: Computing Probabilities with the Standard Normal Tables

• Use the equation: z = (x − μ)/σ

• Use the Z table on the next slide

a) P(X < 900): z = (900 − 750)/100 = 1.50; from the Z table, P(Z < 1.50) = 0.9332

b) P(X > 700): z = (700 − 750)/100 = −0.50; from the Z table, P(Z < −0.50) = 0.3085, so P(X > 700) = 1 − 0.3085 = 0.6915

c) P(700 < X < 900) = 0.9332 − 0.3085 = 0.6247
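
The table lookups can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed Z table; the variable names are illustrative.

```python
from statistics import NormalDist

demand = NormalDist(mu=750, sigma=100)  # monthly customer demand in units

z_900 = (900 - 750) / 100               # standardized value, 1.5
p_at_most_900 = demand.cdf(900)
p_exceeds_700 = 1 - demand.cdf(700)
p_between = demand.cdf(900) - demand.cdf(700)

print(z_900)
print(round(p_at_most_900, 4), round(p_exceeds_700, 4), round(p_between, 4))
```

The rounded results should agree with the Z-table values to four decimal places.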


[Figures: Z table and shaded normal curves showing a) 0.9332, b) 0.6915, c) 0.6247]
Continuous Probability Distributions
Exponential Distribution
• Models the time between randomly occurring events (arrivals, machine failures,
etc.)
• Density function: f(x) = λe^(−λx) for x ≥ 0
• where λ is the mean rate of occurrences (from the
discrete Poisson distribution)
[Figure: exponential density with λ = 1]
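
To make the link with the Poisson distribution concrete, the sketch below reuses the rate of 12 customers per hour from the Subway example and asks how likely it is that the next customer arrives within 5 minutes. That question is a made-up illustration rather than part of the slides. It uses the exponential CDF F(t) = 1 − e^(−λt).

```python
from math import exp


def exponential_cdf(t, rate):
    """P(T <= t) for the waiting time T between events arriving at `rate` per unit."""
    return 1 - exp(-rate * t) if t >= 0 else 0.0


rate_per_hour = 12                   # mean arrivals per hour (the Poisson λ)
mean_gap_hours = 1 / rate_per_hour   # mean time between arrivals: 1/12 hour = 5 minutes

# Hypothetical question: chance that the next arrival comes within 5 minutes.
p_within_5_min = exponential_cdf(5 / 60, rate_per_hour)

print(round(mean_gap_hours * 60, 2), round(p_within_5_min, 4))
```

Time must be expressed in the same unit as the rate, which is why 5 minutes is entered as 5/60 of an hour.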
Keywords
• Probability
• Laws of probability
• Probability distribution
• Binomial distribution
• Poisson Distribution
• Normal distribution
Sampling distributions
• A sampling distribution is the probability distribution for all possible values of the
sample statistic.
• Each sample contains different elements so the value of the sample statistic differs for each
sample selected.
• These statistics provide different estimates of the parameter.

• The sampling distribution describes how these different values are distributed. Technically,
you could choose any statistic to paint a picture; some common ones are:
• Mean
• Mean absolute value of the deviation from the mean
• Range
• Standard deviation of the sample
• Unbiased estimate of variance
• Variance of the sample
• Proportion
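
A sampling distribution can be approximated by simulation: draw many samples, compute the statistic for each, and look at how those values spread. The sketch below does this for the sample mean. The population (the integers 1 to 100), the sample size, the number of samples and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)            # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical population: 1, 2, ..., 100
sample_size = 30
num_samples = 2000

# Each sample contains different elements, so each sample mean is different.
sample_means = [
    statistics.mean(rng.sample(population, sample_size))
    for _ in range(num_samples)
]

population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(population_mean, round(mean_of_means, 2), round(spread_of_means, 2))
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the population itself.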
How measures of central tendency and spread are affected by changes to the data set

What happens to measures of central tendency and spread when we add a constant
value to every value in the data set? To answer this question, let’s pretend we have
the data set 3, 3, 7, 9, 13, and let’s calculate our measures for the set.
• Mean: (3+3+7+9+13)/5=7
• Median: 7
• Mode: 3
• Range: 13-3=10
• IQR: 11-3=8
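Now add 6 to every value, giving the data set 9, 9, 13, 15, 19:
• Mean: (9+9+13+15+19)/5=13
• Median: 13
• Mode: 9
• Range: 19-9=10
• IQR: 17-9=8
What we see is that adding 6 to every value in the data set also adds 6 to the mean,
median, and mode, but that the range and IQR stay the same.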
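
The effect of adding a constant can be checked numerically. In the sketch below, the helper `summarize` is a hypothetical name. It assumes the IQR is the median of the upper half minus the median of the lower half, leaving out the middle value when the count is odd, which matches the 11 − 3 = 8 calculation above.

```python
import statistics


def summarize(data):
    """Return the measures of central tendency and spread for a data set."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]  # halves without the middle value
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "iqr": statistics.median(upper) - statistics.median(lower),
    }


original = [3, 3, 7, 9, 13]
shifted = [value + 6 for value in original]

before = summarize(original)
after = summarize(shifted)
print(before)
print(after)
```

Shifting every value moves the centre of the data but leaves the distances between values unchanged, so only the measures of central tendency change.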
