
IPS

Introduction to Probability & Statistics


Lecture - 13

Prof. Pritam Ranjan



Data

• Population data

• Sample data

Pritam Ranjan / OM&QT 1



Why sampling?

• Collection of population data may be infeasible or expensive

• Analysis of population data would be computationally/technically/financially more intensive

• The sample results are often adequate


Sampling techniques
• A superficial classification of sampling techniques


Sampling technique - 1


Simple random sampling


• SRSWR / SRSWOR: simple random sampling with / without replacement (each item has an equal chance of being chosen)

– Draw a random sample of 10 m&m’s out of 1000

– Form a placement committee of 5 students

– To determine the quality of items produced on a factory line, a group of items would be removed from the line, each given a unique number, and selected randomly one at a time for testing.
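
The m&m's example above can be sketched in a few lines of Python. This is only an illustration, assuming the 1000 candies are labelled 1 to 1000; the function names `srswor` and `srswr` and the seed are hypothetical choices, not part of the lecture.

```python
import random

def srswor(items, n, rng):
    """Simple random sampling without replacement: no item can repeat."""
    return rng.sample(items, n)

def srswr(items, n, rng):
    """Simple random sampling with replacement: items may repeat."""
    return rng.choices(items, k=n)

rng = random.Random(13)          # fixed seed (arbitrary) so the run is repeatable
candies = list(range(1, 1001))   # hypothetical labels 1..1000 for the m&m's

without = srswor(candies, 10, rng)
with_repl = srswr(candies, 10, rng)
print(without)
print(with_repl)
```

In both cases every item has the same chance of selection on each draw; the only difference is whether a drawn item is put back before the next draw.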


Sampling technique - 2


Stratified Sampling
• What is stratified sampling?


Stratified sampling - example


• Form a placement committee of 5 students

– Total number of strata M =

– What should be a good choice of weight $w_i$?


Sampling technique - 3


Sampling techniques - example


• Suppose you wish to conduct a random inspection of the faculty
quarters (numbered 1 - 80)
• You decide to survey only 10 faculty quarters.
• How should you choose your samples?


Systematic sampling
• Suppose you wish to conduct a random inspection of the faculty
quarters (numbered 1 - 80)
• You decide to survey only 10 faculty quarters.
• How should you choose your samples?
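
One way to answer this with systematic sampling is to pick a random start among the first 80/10 = 8 quarters and then take every 8th quarter. The sketch below assumes the population size is an exact multiple of the sample size; the function name `systematic_sample` and the seed are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, rng):
    """Pick a random start in the first interval, then every k-th unit."""
    k = population_size // sample_size      # sampling interval
    start = rng.randint(1, k)               # random start in 1..k (inclusive)
    return [start + i * k for i in range(sample_size)]

rng = random.Random(2024)                   # arbitrary seed for repeatability
quarters = systematic_sample(80, 10, rng)
print(quarters)
```

Only the starting point is random; once it is fixed, the remaining nine quarters are determined by the interval.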


Sampling technique - 4


Cluster sampling-1
• Example: A sociologist wants to estimate the average per
capita income in a certain small city. As no list of resident adults
is available, she decides that each of the city blocks will be
considered one cluster. The clusters are numbered on a city
map from 1 to 415, and the experimenter decides she has
enough time and money to sample n = 25 clusters where every
household will be interviewed within the clusters (blocks)
chosen. The objective is to estimate the average per capita
income in the city.


Cluster sampling - 2
We first divide the population into “clusters”, then select a simple
random sample (SRS) of these clusters, and further use SRS within
these clusters to choose individual units.

• Consider sampling children in an elementary school.


• Take a random sample of classes
• Take random samples of students from the chosen classes.
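
The two steps for the school example can be sketched as follows. The school layout (6 classes of 30 students), the function name `two_stage_sample`, and the sample sizes are all hypothetical values chosen for illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: SRS of clusters. Stage 2: SRS of units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical school: 6 classes of 30 students each, labelled "class-student".
school = {
    f"class_{c}": [f"{c}-{s}" for s in range(1, 31)]
    for c in "ABCDEF"
}

rng = random.Random(7)                      # arbitrary seed for repeatability
sample = two_stage_sample(school, n_clusters=2, n_per_cluster=5, rng=rng)
for name, students in sample.items():
    print(name, students)
```

Interviewing every unit in the chosen clusters, as in the city-block example, corresponds to skipping the second stage.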


Multistage sampling
• Example: In a survey of students from a city, we first select a
sample of schools, then we select a sample of classrooms
within the selected schools, and finally we select a sample of
students within the selected classes.

• Example: Estimate the average number of households in India.


Sampling technique - 5


Non-probability sampling
Market researchers often use this sampling approach to quickly and
easily gather data while minimizing costs in time and money.
• Quota sampling
• Snowball sampling (chain sampling)
– e.g., to understand the behaviour of doctors, pretend you have a complex disease and follow the chain


Skill Development

• How to choose a sample of size 1000 from 20 lakh (approx.)?
Female : 1200659
Male.... : 800310
Transg.. : 27
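
One possible approach, not necessarily the one intended in the lecture, is stratified sampling with proportional allocation: each stratum receives a share of the 1000 units in proportion to its size, and a simple random sample is then drawn within each stratum. The sketch below only computes the allocation; the function name `proportional_allocation` and the largest-remainder rounding rule are assumptions.

```python
def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Uses integer arithmetic plus the largest-remainder rule so that the
    allocations always sum to exactly n.
    """
    total = sum(strata_sizes.values())
    floors, remainders = {}, {}
    for name, size in strata_sizes.items():
        floors[name], remainders[name] = divmod(n * size, total)
    leftover = n - sum(floors.values())
    # Hand the leftover units to the strata with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[name] += 1
    return floors

# Stratum labels are copied as they appear on the slide (two are truncated there).
counts = {"Female": 1200659, "Male....": 800310, "Transg..": 27}
allocation = proportional_allocation(counts, 1000)
print(allocation)   # {'Female': 600, 'Male....': 400, 'Transg..': 0}
```

Under strict proportional allocation the smallest stratum receives no units at all, which suggests why a minimum per stratum or some other weighting might be preferred when every group must be represented.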


• What to do with this data (sample)?

– Data analysis
– Modelling
– Inference (Prediction / forecasting)


Data examples - Applications


• Actual net content of 12 randomly chosen bottles of “Versace
Eros” (labelled net content: 3.4 oz)

• Number of successful SBI credit card subscriptions at Big Bazaar


– special holiday (e.g., Independence day) or not
– day of the week
– age of the customer

• S&P 500 index


Which Distribution

• Discrete: Bernoulli, binomial, Poisson, hypergeometric, geometric, negative binomial, etc.

• Continuous: uniform, beta, exponential, gamma, chi-square, F, normal, t, Cauchy, etc.


Which Distribution

• QQ-plot (rough idea):

– sort the data: $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$
– generate $n$ random observations from $N(0, 1)$
– sort the normal observations: $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$
– if the data are normal, then $\mathrm{Corr}(X_{(i)}, Y_{(i)}) \approx 1$
– plot $(X_{(i)}, Y_{(i)})$ for $i = 1, 2, \dots, n$
– if the data are normal, then the points fall roughly on a straight line.

• Kolmogorov-Smirnov test: test whether $\{X_1, X_2, \dots, X_n\}$ is a random sample from a normal distribution:

$D = \sup_x |F_n(x) - \Phi(x)|$,

where $F_n(x)$ is the empirical CDF and $\Phi(x)$ is the standard normal CDF.
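
Both checks can be tried out numerically. The sketch below uses only the Python standard library; the function names, the seed, and the two simulated data sets (one standard normal, one exponential) are hypothetical choices. It computes the QQ correlation and the statistic $D$ only, not a p-value.

```python
import math
import random
from statistics import NormalDist, fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(data, rng):
    """Correlate sorted data with a sorted N(0, 1) sample of the same size."""
    xs = sorted(data)
    ys = sorted(rng.gauss(0.0, 1.0) for _ in xs)
    return pearson(xs, ys)

def ks_statistic(data):
    """D = sup_x |F_n(x) - Phi(x)| against the standard normal CDF."""
    xs = sorted(data)
    n = len(xs)
    phi = NormalDist().cdf
    # The empirical CDF jumps at each data point, so check both sides of each jump.
    return max(
        max((i + 1) / n - phi(x), phi(x) - i / n)
        for i, x in enumerate(xs)
    )

rng = random.Random(22)                     # arbitrary seed for repeatability
normal_data = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_data = [rng.expovariate(1.0) for _ in range(500)]

print(qq_correlation(normal_data, rng), ks_statistic(normal_data))
print(qq_correlation(skewed_data, rng), ks_statistic(skewed_data))
```

For the normal data the correlation should be close to 1 and $D$ small; for the exponential data $D$ is at least 0.5, because every observation is non-negative while $\Phi(0) = 0.5$.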


Inference

Objective:
– Guess a distribution for the data: $f_X(\theta)$
– Estimate $\theta$ using a random sample $\{X_1, \dots, X_n\}$


Sampling Distribution

– Sampling distribution of θ̂ = S(X)
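
The idea of a sampling distribution can be illustrated by simulation: draw many samples, compute the statistic on each, and look at how the resulting estimates vary. In this sketch the statistic is the sample mean, and the population ($N(10, 2^2)$), the sample size, and the number of replications are hypothetical values.

```python
import random
from statistics import fmean, stdev

def sampling_distribution(statistic, draw_sample, replications):
    """Recompute the statistic S(X) on many independent samples X."""
    return [statistic(draw_sample()) for _ in range(replications)]

rng = random.Random(24)                 # arbitrary seed for repeatability
mu, sigma, n = 10.0, 2.0, 25            # hypothetical population and sample size

def draw_sample():
    """One random sample X = (X_1, ..., X_n) from N(mu, sigma^2)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

estimates = sampling_distribution(fmean, draw_sample, replications=2000)
print(fmean(estimates))    # should be close to mu
print(stdev(estimates))    # should be close to sigma / sqrt(n) = 0.4
```

Swapping `fmean` for another function of the sample, such as the median, gives the sampling distribution of a different estimator from the same code.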


Next

• Estimation
– how do you choose a good S(X) for a given θ?
