Ph.D. B.A. Program Class RES600:

Introduction to Data Analysis

Professor: Dr. Truel

Student: Anh Tran

E-mail: Anh.NTran@my.trident.edu

Phone: 714-904-6209

Subject: SLP #3 for Module 3: Describing data statistically: Randomness, probability, and distributions

Date: 18-Nov-2013

From: A. Tran

References: Module 3 - SLP

1. Comment on the following sampling designs. Are they appropriate? If not, what are the potential problems?

A. A citizens group interested in generating public and financial support for a new university basketball arena printed a questionnaire in area newspapers. Readers return the questionnaires by mail.

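In this survey, the sampling frame consists of the people who read the particular area newspapers in which the questionnaire was printed. This group is only a subset of the general population, so the survey may introduce a population specification error, in which researchers collect data from an inappropriate population. In addition, readers decide for themselves whether to return the questionnaire, so people who feel strongly about the arena, for or against, are more likely to respond than others. The design is therefore not appropriate.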

B. A department store that wishes to examine whether the store is losing or gaining customers draws a sample from its list of credit card holders by selecting every 10th name.

First, this design is subject to sampling frame (coverage) error, which occurs when the list from which the sample is drawn does not represent the population of concern. Customers who use the store's credit card are only a subset of all customers; many others pay by cash or other means and therefore cannot be selected.

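Second, selecting every 10th name can be a valid probability sampling method (systematic sampling), but only if it begins with a randomly chosen name among the first 10 and then takes every 10th name from that point onward. Without a random start, the selection is not random and is prone to systematic error. The sample can also be biased if the list has a periodic pattern that coincides with the interval of 10.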
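
Systematic selection with a random start can be sketched in Python as follows. This is a minimal illustration, assuming the list of card holders is already available as a Python list; the function name systematic_sample and the customer names are hypothetical.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Systematic sample: pick a random start in [0, k), then every k-th element."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = rng.randrange(k)  # random start among the first k positions
    return frame[start::k]    # every k-th element from the start onward

# Hypothetical list of 95 credit card holders
holders = [f"customer_{i:03d}" for i in range(95)]
sample = systematic_sample(holders, 10, random.Random(42))
print(len(sample), sample[:3])
```

Fixing the start at 0 instead of drawing it at random would make the selection fully determined by the order of the list, which is the problem the random start avoids.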

C. A motorcycle manufacturer decided to research consumer characteristics by sending 100 questionnaires to each of its dealers. The dealers would then use their sales records to trace down buyers of their brand of motorcycle and distribute the questionnaires.

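This is a non-probability sampling procedure in which the target group is people who have expertise with motorcycles and use the manufacturer's brand. It is also called expert sampling. Because each dealer decides which buyers receive a questionnaire, the results may reflect the dealers' choices rather than the full population of buyers.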

D. A research company obtains a sample for a focus group through organized groups such as church groups, clubs, schools, etc. The organizations are paid for securing a respondent and no individual is directly compensated.

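This is also a non-probability sampling procedure. Respondents are recruited only through organizations such as churches, clubs, and schools, so people who do not belong to such groups cannot be selected, and members of any one group tend to resemble each other. Because the organizations are paid for each respondent they secure, they have an incentive to supply whoever is easiest to recruit, so the focus group may not represent the population of interest.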

E. A banner ad on a business-oriented web site read, "Are you a large company Sr. Executive? Qualified execs receive $50 for under 10 minutes of time. Take the survey now!" Is this an appropriate way to select a sample of business executives?

This is a form of non-probability sampling known as convenience sampling, which does not allow findings from the sample to be generalized to the population. Respondents select themselves, there is little way to verify that they are in fact senior executives of large companies, and the $50 payment gives others a reason to claim that they qualify. Therefore, this is not an appropriate way to select a sample that represents the population of business executives.

2. Jury duty is supposed to be a totally random process. Comment on the following computer selection procedures and determine if they are indeed random processes.

A. A program instructs the computer to scan the list of names and pluck names that are next to those from the last scan.

This process is not random. It lacks a random start, and because each name is taken from the position next to a name picked in the previous scan, every selection is determined by the earlier ones: once the first names are known, all later selections are predetermined.

B. Three-digit numbers were randomly generated to select jurors from a list of licensed drivers. If the weight information listed on the license matched the random number, the person was selected.

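Generating random three-digit numbers by computer and matching them against the weight listed on each driver's license is an appropriate method to select persons for jury service randomly. One caveat is that a drawn number selects everyone who shares that listed weight at once, and many numbers match no driver at all, so jurors are chosen in same-weight groups rather than one at a time.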
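
The matching step can be illustrated with a short simulation. This is a sketch under stated assumptions: the driver records, their weights, and the function name select_by_weight are all hypothetical, and each draw is taken uniformly from 000 to 999. It shows that each drawn number selects every driver with that listed weight and that many draws select no one.

```python
import random

def select_by_weight(drivers, draws, rng=random):
    """Draw random three-digit numbers; each number selects every driver
    whose listed weight equals it (hypothetical (name, weight) records)."""
    selected = []
    for _ in range(draws):
        number = rng.randrange(1000)  # uniform over 000-999
        # Everyone whose listed weight matches is selected together
        matches = [name for name, weight in drivers if weight == number]
        selected.append((number, matches))
    return selected

# Hypothetical licensed drivers: (name, listed weight in pounds)
drivers = [("Ana", 130), ("Ben", 180), ("Cara", 130), ("Dev", 215), ("Eli", 180)]
for number, matches in select_by_weight(drivers, 5, random.Random(7)):
    print(f"{number:03d}: {matches}")
```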

C. The juror source list was obtained by merging a list of registered voters with a list of licensed drivers.

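This is not a random sampling process, because merging the two lists is a fixed, repeatable step that involves no random selection; it only produces the source list from which jurors must still be drawn at random. In addition, people who appear on both lists will be listed twice unless duplicates are removed, doubling their chance of selection, and people on neither list cannot be selected at all.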
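
If a merged list is used as the juror source list, a random draw still has to be taken from it, and anyone who appears on both lists should be counted once. A minimal sketch, assuming hypothetical name lists in which a name uniquely identifies a person (real records would need a proper matching key):

```python
import random

def merged_source_list(voters, drivers):
    """Merge two lists, keeping each person once in first-seen order."""
    # dict.fromkeys drops repeated names while preserving order
    return list(dict.fromkeys(voters + drivers))

def draw_jurors(source, n, rng=random):
    """Simple random sample of n people, without replacement."""
    return rng.sample(source, n)

voters = ["Ana", "Ben", "Cara", "Dev"]
drivers = ["Ben", "Dev", "Eli", "Fay"]
source = merged_source_list(voters, drivers)
print(source)  # ['Ana', 'Ben', 'Cara', 'Dev', 'Eli', 'Fay']
print(draw_jurors(source, 3, random.Random(3)))
```

Without the deduplication step, Ben and Dev would each appear twice in the source list and have twice the chance of being drawn.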

3. Why is the standard deviation typically used rather than the average deviation (the sum of the absolute deviations divided by the sample size)?

When repeated large samples are drawn from a normally distributed population, the mean deviations of the samples vary more from sample to sample than their standard deviations do, by about 14% (Stigler 1973). Thus, the SD of such a sample is a more stable (more efficient) estimate of the population SD, and it is considered better than its plausible alternatives for estimating the standard deviation of a population from sample measurements (Hinton 1995, p. 50). This is the main reason the SD came to be preferred and why much of later statistical theory is based on it.
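
The comparison can be checked with a short simulation. This is a sketch under stated assumptions: samples come from a standard normal population, and because the mean deviation and the SD estimate different multiples of the population SD, the sketch compares their squared coefficients of variation (sampling variance divided by the squared mean). Under these assumptions the ratio should come out near pi - 2, about 1.14, which is one common way of stating the roughly 14% difference; the source does not say which scaling its figure uses. The function names are hypothetical.

```python
import math
import random
import statistics

def squared_cv(values):
    """Sampling variance relative to the squared mean (squared coefficient of variation)."""
    return statistics.variance(values) / statistics.fmean(values) ** 2

def md_vs_sd_variability(n=100, reps=4000, seed=1):
    """Simulate normal samples and compare how much the mean deviation and the
    standard deviation vary from sample to sample, on a relative scale."""
    rng = random.Random(seed)
    sds, mds = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = math.fsum(xs) / n
        # Sample standard deviation (n - 1 denominator)
        sds.append(math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (n - 1)))
        # Mean absolute deviation about the sample mean
        mds.append(math.fsum(abs(x - m) for x in xs) / n)
    return squared_cv(mds) / squared_cv(sds)

print(f"simulated ratio:    {md_vs_sd_variability():.3f}")
print(f"large-sample value: {math.pi - 2:.3f}")  # pi - 2 is about 1.142
```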

4. What is the sampling distribution? How does it differ from the sample distribution? Please explain with one or two examples.

A sampling distribution is the distribution of a statistic (such as the mean) computed from many samples of the same size drawn from the same population. A sample distribution is the distribution of the data values within a single sample.

For example, Table 1 contains summary statistics for 8 samples, one per column: the sample mean, sample standard deviation, sample skewness, and sample kurtosis. Each column describes a sample distribution. Computing descriptive statistics on the 8 sample means gives the sampling distribution of the mean, which has a mean of 19.961 and a standard deviation of 0.406, as shown in Table 2.

Table 1

Table 2

Descriptive Statistics: Sampling Distribution of the Sample Means

Variable   N   Mean     Median   TrMean   StDev   SE Mean
C1         8   19.961   19.864   19.961   0.406   0.143

Variable   Minimum   Maximum   Q1       Q3
C1         19.392    20.730    19.768   20.242
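
The same kind of summary can be sketched in Python. This is an illustration only, assuming hypothetical data (8 samples of size 25 from a normal population with mean 20 and SD 2), so it will not reproduce the numbers in Table 2. Each inner list is one sample distribution; the list of 8 means is the sampling distribution of the mean. SE Mean is computed as StDev divided by the square root of N, which matches Table 2 (0.406 / sqrt(8) is about 0.143).

```python
import math
import random
import statistics

def sampling_distribution_of_mean(samples):
    """Summarize the distribution of the sample means of several samples."""
    means = [statistics.fmean(s) for s in samples]  # one mean per sample
    sd = statistics.stdev(means)
    return {
        "N": len(means),
        "Mean": statistics.fmean(means),
        "Median": statistics.median(means),
        "StDev": sd,
        "SE Mean": sd / math.sqrt(len(means)),
    }

# Hypothetical data: 8 samples of size 25 from a normal population (mean 20, SD 2)
rng = random.Random(2013)
samples = [[rng.gauss(20, 2) for _ in range(25)] for _ in range(8)]
for name, value in sampling_distribution_of_mean(samples).items():
    shown = f"{value:.3f}" if isinstance(value, float) else str(value)
    print(f"{name:8s} {shown}")
```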

5. As long as the sample size is large enough, we will get a normal distribution. Is this statement true?

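The Central Limit Theorem states that, when the sample size is large enough, the sampling distribution of the sample mean (the way the sample mean varies around the population mean) is approximately normal, even if the population itself is not normally distributed, provided the population has a finite variance. It does not say that the data themselves become normal: the distribution of the values within a single sample still resembles the population, however large the sample. Even the sampling distribution only approaches a normal distribution, so the normal distribution serves as a good approximation rather than an exact description, as shown in Figure 4-19. Therefore, the statement is not strictly correct.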
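
A short simulation illustrates the point. This is a sketch assuming an exponential population with mean 1, a strongly right-skewed distribution chosen only for illustration; the function names are hypothetical. The skewness of the sample means shrinks toward 0 (the skewness of a normal distribution) as n grows, theoretically as 2 / sqrt(n) for this population, but it does not reach exactly 0 for any finite n.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness in moment form: m3 / m2 ** 1.5."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(5)
for n in (1, 5, 30, 100):
    print(f"n = {n:3d}: skewness of sample means = {skewness(sample_means(n, 5000, rng)):.2f}")
```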

Reference: Montgomery, D.C., Runger, G.C., Applied Statistics and Probability for Engineers, 5th edition, John Wiley & Sons, 2011

