# Statistical Tools

Dr. Katherine Sauer, Metropolitan State College of Denver, Health Economics

Outline: I. Hypothesis Testing II. Difference of Means III. Regression Analysis

I. Hypothesis Testing A. Simple Hypothesis "Men and women smoke different numbers of cigarettes." State the hypothesis: Null hypothesis (hypothesis we wish to disprove): H0: cm = cw ex: men and women smoke the same number of cigarettes

Alternative hypothesis (hypothesis that theory suggests to be the case) H1: cm ≠ cw ex: men and women do not smoke the same number of cigarettes

B. Composite Hypothesis "Rich people spend more on health care than do poor people." State the hypothesis: Null hypothesis (hypothesis we wish to disprove): H0: Er = Ep ex: the rich and poor spend the same amount

Alternative hypothesis (hypothesis that theory suggests to be the case) H1: Er > Ep ex: the rich spend more than the poor

II. Difference in Means Consider the example of men's and women's smoking. To compare men's and women's smoking rates we could ask people from the population at large how many cigarettes they smoke per day. Since we can't ask everyone, how do we decide upon the sample to use?

Many things other than gender may affect the number of cigarettes a person smokes. We can account for this by selecting a sample of people randomly from the universe of all people. Alternatively, we could select a sample from a relatively homogeneous group, such as college sophomores at a given college.

Types of Data
Continuous - natural measures that in principle could take on different values for each observation ex: height, weight, income, price
Categorical - refer to arbitrary categories ex: gender (male or female), race (black, white, or other), location (urban or rural)
Is the number of cigarettes smoked continuous or categorical?

Using NIH data for smokers from 2001 and 2002 it was found that: For 4,714 men, cm = 15.60 cigarettes per day. For 4,841 women, cw = 13.47 cigarettes per day. The difference is cm - cw = 2.13 cigarettes per day.

The data shows a difference in the average number of cigarettes smoked per day by men and women. Does the difference represent a true difference between men and women smoking? or Did the sample randomly draw a higher average level for men (15.60) than for women (13.47)?

Let's look at the sample distribution.

Based on the distribution, some men and some women smoked far fewer and some smoked far more than the average. Variance is a measure of the dispersion of cigarettes smoked around the average.
Sample means: men 15.60, women 13.47 cigarettes per day.

The larger the variance, the larger the dispersion around the mean. - another observation may be far from the sample mean The smaller the variance, the smaller the dispersion around the mean. - another observation is likely to be close to the sample mean

In testing a hypothesis, would you rather see a large or small variance in your sample data?

The square root of the variance is called the standard deviation, s. A larger standard deviation indicates more dispersion around the mean. A smaller standard deviation indicates less dispersion around the mean.

The standard error of the mean is the standard deviation divided by the square root of the number of observations.
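As a minimal sketch of these definitions, the snippet below computes the variance, standard deviation, and standard error of the mean for a small sample. The eight responses are invented for illustration and are not taken from the NIH data; the calculation uses the sample variance (dividing by n - 1), which is an assumption about the convention intended here.

```python
import math
import statistics

# Hypothetical cigarettes-per-day answers from eight respondents
# (made up for illustration; not from the NIH data).
sample = [5, 10, 10, 15, 20, 20, 25, 15]

mean = statistics.mean(sample)
variance = statistics.variance(sample)          # sample variance, divides by n - 1
std_dev = math.sqrt(variance)                   # standard deviation, s
std_error = std_dev / math.sqrt(len(sample))    # standard error of the mean

print(f"mean = {mean}, variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}, standard error = {std_error:.2f}")
```

Note that the standard error shrinks as the number of observations grows, even when the standard deviation stays the same.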

To test our smoking hypothesis formally, we can construct a "difference of means" test. - good for continuous data that can be broken up by categories We wish to compare the value difference = cm - cw to zero, the value it takes under the null hypothesis. Recall: difference = 2.13 The standard error of the difference is calculated to be equal to 0.216.
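The slides do not show how the standard error of 0.216 was obtained. One common formula for two independent samples, given here only as an assumption about the method, combines the two sample variances, where s_m and s_w are the (unreported) standard deviations for men and women and n_m = 4,714 and n_w = 4,841 are the sample sizes:

$$
SE(c_m - c_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}
$$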

For an approximately normal distribution:
About 68 percent of the distribution lies within 1 standard error of the estimate: 2.13 - 0.216 = 1.91 to 2.13 + 0.216 = 2.35
About 95 percent of the distribution lies within 2 standard errors of the estimate: 2.13 - (2)(0.216) = 1.70 to 2.13 + (2)(0.216) = 2.56
How does this compare to our null hypothesis that the difference is zero?
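The interval arithmetic can be reproduced with a short sketch. The two inputs come from the slides; the helper name `interval` is introduced here for illustration only.

```python
difference = 2.13   # men minus women, cigarettes per day (from the slides)
std_error = 0.216   # standard error of the difference (from the slides)

def interval(center, se, k):
    """Return the range center +/- k standard errors."""
    return (center - k * se, center + k * se)

one_se = interval(difference, std_error, 1)
two_se = interval(difference, std_error, 2)

# Does the null-hypothesis value of zero fall inside the 2-standard-error range?
null_value = 0.0
null_inside_two_se = two_se[0] <= null_value <= two_se[1]

print(f"within 1 SE: {one_se[0]:.2f} to {one_se[1]:.2f}")
print(f"within 2 SE: {two_se[0]:.2f} to {two_se[1]:.2f}")
print("zero inside the 2 SE range:", null_inside_two_se)
```

Because zero lies far outside the 2-standard-error range, the data are hard to reconcile with the null hypothesis of no difference.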

The t test: The t-statistic is calculated as the estimated value (here, the difference) divided by its standard error. In our example: 2.13 / 0.216 = 9.86

As a rule of thumb, if the absolute value of the t-statistic is greater than 2, you have statistical significance at roughly the 95 percent level.
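The t-statistic and the rule of thumb can be checked with a few lines of code. The p-value step is an addition not found in the slides: it assumes the normal approximation to the t distribution, which is reasonable only because the combined sample is in the thousands.

```python
import math

difference = 2.13   # from the slides
std_error = 0.216   # from the slides

t_stat = difference / std_error

# Rule of thumb: |t| greater than 2 indicates statistical significance.
significant = abs(t_stat) > 2

# Two-sided p-value under a normal approximation (an assumption; the
# slides report only the t-statistic).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}, significant by rule of thumb: {significant}")
print(f"approximate two-sided p-value: {p_value:.3g}")
```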

This experiment would find very good evidence that, among smokers, women smoke fewer cigarettes than men. The men in the sample smoke more than the women, and with a t-statistic this large we can reject the null hypothesis of no difference at well beyond the 95 percent confidence level.

III. Regression Analysis - good for data that is continuous

Suppose we wish to explore the relationship between the cigarette tax and the amount of cigarettes smoked per day. null hypothesis: no effect (b = 0) alternative hypothesis: tax is inversely related to the quantity smoked (b < 0)

We want to know if the coefficient of -3.24 is significantly different from zero.

A coefficient of -3.24 means: A \$1 increase in the tax is correlated with a change in quantity demanded of 3.24 fewer cigarettes.

The elasticity is -0.09. This means a 1% increase in the tax is associated with a 0.09% reduction in quantity demanded.
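To make the slope, its t-statistic, and the elasticity concrete, here is a minimal sketch of a one-variable least-squares regression. The six data points are invented for illustration, so the results will not match the -3.24 and -0.09 reported above. Evaluating the elasticity at the sample means is one common convention and is an assumption here, since the slides do not say how the elasticity was computed.

```python
import math

# Hypothetical (tax in dollars, cigarettes per day) observations,
# invented for illustration only.
tax = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
cigs = [16.0, 15.0, 15.5, 13.5, 13.0, 12.5]

n = len(tax)
mean_x = sum(tax) / n
mean_y = sum(cigs) / n

# Least-squares slope b and intercept a for: cigs = a + b * tax
sxx = sum((x - mean_x) ** 2 for x in tax)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tax, cigs))
b = sxy / sxx
a = mean_y - b * mean_x

# Standard error of the slope from the residual variance (n - 2 degrees of freedom)
residuals = [y - (a + b * x) for x, y in zip(tax, cigs)]
residual_variance = sum(e ** 2 for e in residuals) / (n - 2)
se_b = math.sqrt(residual_variance / sxx)
t_stat = b / se_b

# Elasticity evaluated at the sample means (an assumed convention)
elasticity = b * mean_x / mean_y

print(f"b = {b:.2f}, SE = {se_b:.2f}, t = {t_stat:.2f}")
print(f"elasticity at the means = {elasticity:.2f}")
```

A negative slope with an absolute t-statistic above 2 would lead us to reject the null hypothesis that b = 0.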

A multiple regression includes more than one explanatory variable. ex: gender, race, age, education, income Some of the variables may be continuous, some may be categorical. - interpretation is different


Continuous variables: Notice how adding more variables changes the coefficient on the excise tax. Is it still significant?


Income: Age: Education:

When using categorical variables in a regression, we need to assign them a numerical value. - dummy variables Dummy variables are used in regression analysis to determine whether groups of people differ from others. For example, maybe we would want to know if African Americans smoke more than other groups. We can create a dummy variable that assigns the value 1 if the person is African American or 0 otherwise.
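Creating a dummy variable can be sketched as follows. The respondent records and field names are hypothetical and do not reflect the layout of the NIH data.

```python
# Hypothetical respondents; the field names are illustrative only.
respondents = [
    {"id": 1, "race": "African American"},
    {"id": 2, "race": "white"},
    {"id": 3, "race": "other"},
]

def add_dummy(person):
    """Return a copy of the record with a 0/1 dummy for African American."""
    coded = dict(person)
    coded["african_american"] = 1 if person["race"] == "African American" else 0
    return coded

coded_respondents = [add_dummy(p) for p in respondents]

for person in coded_respondents:
    print(person["id"], person["race"], person["african_american"])
```

The regression then uses the 0/1 column in place of the text category, so its coefficient measures how the group coded 1 differs from everyone coded 0.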


Because the variable is named male, we know that males were assigned a value of 1 (female = 0). Is the male coefficient significant?

The interpretation of a dummy variable is different from that of a continuous variable. An African American female smokes 5.05 fewer cigarettes than a white female. A white male smokes 2.23 more cigarettes than a white female.
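Difference in cigarettes smoked relative to a white female:

| | African American: No = 0 | African American: Yes = 1 |
|---|---|---|
| Male: No = 0 | 0 | -5.05 |
| Male: Yes = 1 | 2.23 | 2.23 - 5.05 = -2.82 |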

An African American male smokes 2.82 fewer cigarettes than a white female.
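The dummy-variable arithmetic can be written as a small function. The two coefficients come from the slides; the function name is introduced here for illustration, and the calculation assumes, as the slides do, that the two effects simply add (no interaction term).

```python
B_AFRICAN_AMERICAN = -5.05  # coefficient on the African American dummy (from the slides)
B_MALE = 2.23               # coefficient on the male dummy (from the slides)

def difference_from_baseline(african_american, male):
    """Predicted difference in cigarettes smoked versus a white female (0, 0)."""
    return B_AFRICAN_AMERICAN * african_american + B_MALE * male

for african_american in (0, 1):
    for male in (0, 1):
        diff = difference_from_baseline(african_american, male)
        print(f"African American={african_american}, male={male}: {diff:+.2f}")
```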

Summary of Statistical Competencies:
- Formulate questions in terms of hypotheses.
- Read statistical test results to determine if the result is significant.
- Understand statistical significance.
- Interpret reported regression results.