
Survey Data Analysis

Daniel Lo a
Taft College

Abstract

Data was collected through a survey and fed into a database. From this database a sample of 50 respondents was taken and the data analysed using techniques learned in an introductory statistics course. This analysis was done in order to answer a series of questions regarding the relationship of different random variables such as gender, income, political association, handedness and opinion on several issues. Some results were inconclusive due to the size of the sample as explained in the results section.

Introduction

The purpose of our research was to better understand how to conduct a scientific study and write a proper report. We also wanted to get some practice using the statistical techniques we have learned throughout this course. We accomplished this by conducting a study on a population constructed by collecting data from random individuals using a survey. We took a sample from this survey and set out to answer several questions based on the data collected.

Our first question was whether there is a relationship between a person's height, weight, shoe size and ring size. It was hypothesized that such a relationship exists, since a larger person will probably have greater values for all of these than a smaller person. The second question was whether there is a difference in income based on gender, for which it was hypothesized that such a difference would be found. The third question had several parts, all regarding the relationship between party association and three other variables: whether the respondent believed Obama would be reelected, whether they were in favor of the health care bill as passed, and their stance on the death penalty. For all of these I hypothesized that a relationship existed, since these issues are often divided along party lines.

The next question also had more than one part. It was whether there was a relationship between handedness and a person's stance on the death penalty, and between handedness and the amount of water they drank. For both it was hypothesized that no relationship would be found. After this I wanted to see who was more likely to switch to the Tea Party, Democrats or Republicans. It was hypothesized that Republicans are more likely to switch because of the Tea Party's clearly conservative ideals. The last question was whether students were less likely than non-students to work more than 30 hours a week. The hypothesis was that students were less likely to work more than 30 hours a week. How the data were collected and analyzed to answer these questions is explained in the following sections.

Method

Participants

The people who took the survey were individuals randomly selected by students in several different statistics classes taught in Bakersfield, CA and Taft, CA. It is likely that a majority of the individuals who took the survey reside in this general area.

Materials

Survey: This was a survey that consisted of 23 simple questions the participants were asked to answer. A copy of this survey can be found in the appendix.

Database: We used an online database to store the data we collected.

Procedure

Once each student had collected their 10 surveys, they added the data to the database, which ended up holding data for 2,628 individuals. From this database a 1-in-k systematic sample was taken with k = 52, giving a sample of n = 50. In this case k was rounded to an integer.
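The sampling step can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run for the study. The function name `systematic_sample`, the random starting point and the choice to round k down are assumptions. Only the database size (2,628), k = 52 and n = 50 come from the report.

```python
import random


def systematic_sample(population_size, k, n, seed=None):
    """Return n record indices chosen by 1-in-k systematic sampling.

    Hypothetical helper: pick a random start among the first k records,
    then take every k-th record after it.
    """
    rng = random.Random(seed)
    start = rng.randrange(k)
    indices = [start + i * k for i in range(n)]
    if indices[-1] >= population_size:
        raise ValueError("k * n overruns the population; lower k or n")
    return indices


N = 2628        # records in the database (from the report)
n = 50          # sample size (from the report)
k = N // n      # 2628 / 50 = 52.56, rounded down to 52 (assumed rounding rule)

sample = systematic_sample(N, k, n, seed=1)
print(k, len(sample))
```

Rounding k down rather than up guarantees that the last selected index still falls inside the database, whatever the random start is.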

We then proceeded to analyze this data as described in the following section.

Results

Each of the questions is addressed below together with the data and the resulting p-value. The analysis of the data can be found in the appendix. All p-values were compared against an alpha value of 0.05.

I predicted that there would be a relationship between a person's height, weight, ring size and shoe size. To get the summary statistics, I entered the data into a TI-84 graphing calculator and used the 1-Var Stats command. Finding a relationship between the variables was a problem because there were 4 variables involved.

Height histogram and summary statistics

Mean    66.7800
S       4.0972
Min     53.0000
Q1      64.0000
M       67.0000
Q3      70.0000
Max     74.0000
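The seven values in each summary table are the kind of output a one-variable statistics command produces. A minimal Python sketch of the same calculation is given here. The height values are hypothetical, since the raw data is not included in the report. The quartile rule used (medians of the lower and upper halves, leaving out the overall median when the sample size is odd) is an assumption chosen to mimic the calculator convention, and `one_var_stats` is a made-up name.

```python
import statistics


def one_var_stats(values):
    """Mean, sample standard deviation and five-number summary."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return {
        "Mean": statistics.mean(data),
        "S": statistics.stdev(data),     # sample standard deviation (n - 1)
        "Min": data[0],
        "Q1": statistics.median(lower),
        "M": statistics.median(data),
        "Q3": statistics.median(upper),
        "Max": data[-1],
    }


# Hypothetical heights in inches, NOT the survey data.
heights = [53, 62, 64, 64, 66, 67, 68, 70, 70, 74]
summary = one_var_stats(heights)
for name, value in summary.items():
    print(name, round(value, 4))
```

Other software may use a different quartile rule, so Q1 and Q3 can differ slightly from calculator output on the same data.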

To analyze the distribution of weight in the sample I used the same procedure as for the previous variable.

Weight histogram and summary statistics

Mean    167.9400
S       40.9775
Min     85
Q1      140
M       169
Q3      187
Max     290

Ring size histogram and summary statistics

Mean    7.4167
S       1.7092
Min     3
Q1      6
M       8
Q3      9
Max     10

The shoe size data was also analyzed in the same manner.

Shoe size histogram and summary statistics

Mean    8.8000
S       2.0677
Min     3
Q1      7.5
M       9
Q3      11
Max     12.5

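The second question I wanted to answer with this sample was whether there is a difference in income based on gender. My hypothesis was that such a difference exists. To better understand the data, I drew histograms and calculated the summary statistics for the income of each gender. Then I performed a Wilcoxon rank-sum test, which returned a p-value = 0.7604, which does not support the hypothesis.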

Male income histogram and summary statistics

Mean    53068.1818
S       688832.8368
Min     0
Q1      13000
M       30000
Q3      69000
Max     300000

Female income histogram and summary statistics

Mean    32259.4286
S       29747.4214
Min     0
Q1      4000
M       30000
Q3      48500
Max     110000

Our next question was whether there is a relationship between political party and a respondent's opinion on a series of issues. The data are summarized in the tables below, which compare each respondent's party with their response to an issue. Reporting the data in this manner makes it easier to perform tests of independence.

The first part was to see if there was a relationship between party and whether or not the respondent believed Obama would be reelected. The hypothesis was that the two variables were dependent. However, the results were inconclusive when the Independent and Other columns were included, even when they were combined. Further collapsing of the table was not possible.

Obama reelection   Republican   Democrat   Independent   Other
Yes                4            15         0             4
No                 15           7          3             2

The next part was to see if there was also a relationship between party and the respondent's approval of the health care bill. The hypothesis was that a relationship existed. In order to meet the assumptions for the test, we collapsed the Independent and Other columns together. This returned a χ² of 10.9271 and a p-value = 0.0042, which supports the hypothesis.

Health care   Republican   Democrat   Other
Yes           5            13         0
No            14           9          9
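The test of independence used for these tables can be reproduced with a short Python sketch. It computes the expected counts from the row and column totals, sums (observed - expected)² / expected over all cells, and converts the statistic to a p-value with (rows - 1)(columns - 1) degrees of freedom. The function names `chi2_sf` and `chi2_independence` are made up for this illustration, and the closed-form tail probability is only one of several ways to get the p-value. It is not necessarily how the calculator does it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i < df/2
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


def chi2_independence(table):
    """Return (statistic, df, p-value, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Health care table from the report, one row per party (Republican,
# Democrat, Other) with the counts in the order Yes, No.
health_care = [[5, 14], [13, 9], [0, 9]]
stat, df, p_value, expected = chi2_independence(health_care)
print(round(stat, 4), df, round(p_value, 4))
```

Transposing the table does not change the statistic, so it does not matter whether the parties are entered as rows or as columns.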

The same was done for the question regarding party and stance on the death penalty. The Independent and Other columns were combined and a test of independence was performed. This produced a χ² of 0.1062 and a p-value = 0.9483, which does not support the hypothesis.

Death penalty   Republican   Democrat   Other
Yes             13           14         6
No              6            8          3


I then wanted to see if a person's handedness had an effect on their opinion about the death penalty and the amount of water they drank. I predicted that a person's handedness should have no effect on either of these. To check the first part of this I collected the data in a table in order to perform a test of independence. This resulted in χ² = 2.6364 and p-value = 0.2676, which would support my hypothesis. However, these values had to be disregarded because the assumptions of the test were not met. Our results are inconclusive.

Death penalty   Right   Left   Ambidextrous
Yes             18      4      1
No              25      1      1

For the next question, I drew histograms and calculated the summary statistics for the water drunk by people of each handedness. The question was whether there was a relationship between handedness and the amount of water drunk. The hypothesis was that there was no difference between left- and right-handed people. The ambidextrous people were not considered because there were only 2 in the sample, too few to represent that group properly. A rank-sum test resulted in a p-value of 0.5375, which is consistent with the hypothesis.
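For readers who want to see how a rank-sum p-value is obtained, here is a minimal sketch of a two-sided Wilcoxon rank-sum test. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption. The report does not say which variant was used, and with only five left-handed respondents an exact test could give a somewhat different p-value. The water amounts below are hypothetical, and the names `midranks` and `rank_sum_test` are made up.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) using the normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    w = sum(midranks(pooled)[:n1])          # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2.0
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p


# Hypothetical ounces of water, NOT the survey data.
left = [8, 16, 60, 93, 100]
right = [1, 32, 48, 60, 64, 120, 128, 256]
w, z, p = rank_sum_test(left, right)
print(w, round(z, 3), round(p, 4))
```

Because the test works on ranks rather than on the raw amounts, a single very large value such as the maximum of 256 ounces has limited influence on the result.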

Water drank by left-handed people histogram and summary statistics

Mean    55.4000
S       39.1127
Min     8
Q1      16
M       60
Q3      92.5
Max     100

Water drank by right-handed people histogram and summary statistics

Mean    76.9767
S       60.5941
Min     1.0000
Q1      32.0000
M       60.0000
Q3      120.0000
Max     256.0000

The next question was which party people in the Tea Party were most likely to come from. My hypothesis was that most of the members of the Tea Party would come from the Republican party. The former party membership is shown in the bar graph below. This question was slightly tricky. I wanted to perform a goodness of fit test, but for that I needed the true proportions, which I didn't have. So I decided to use the proportion of each party's membership in the sample as a replacement, since it seems reasonable to test against these values. The Independent and Other categories were grouped together and the test was performed, resulting in a p-value = 0.8453 that had no meaning due to a violation of assumptions. This test was inconclusive.

                       Democrats   Republicans   Other
Members                22          19            9
Proportion of sample   44%         38%           18%

Bar graph: Former party association of Tea Party members
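The goodness of fit calculation described here can be sketched as follows. The expected proportions (44%, 38%, 18%) are the sample proportions from the report. The observed counts of Tea Party members are hypothetical, because the report shows them only in a bar graph, so the statistic and p-value this sketch produces are not the report's. The sketch is meant to show the mechanics and why small expected counts make the test unusable.

```python
import math

# Expected proportions: each party's share of the sample (from the report).
proportions = {"Democrat": 0.44, "Republican": 0.38, "Other": 0.18}

# Hypothetical former-party counts of Tea Party members, NOT the survey data.
observed = {"Democrat": 3, "Republican": 4, "Other": 2}

total = sum(observed.values())
expected = {party: total * share for party, share in proportions.items()}

# Chi-square goodness of fit statistic: sum of (O - E)^2 / E.
stat = sum(
    (observed[party] - expected[party]) ** 2 / expected[party]
    for party in proportions
)
df = len(proportions) - 1

# Upper-tail chi-square probability; this closed form holds for df = 2 only.
p_value = math.exp(-stat / 2.0)

# Categories whose expected count falls below the usual minimum of 5.
too_small = [party for party, e in expected.items() if e < 5]

print(round(stat, 4), df, round(p_value, 4), too_small)
```

With so few Tea Party members every expected count can fall below 5, which is the kind of assumption violation that left the p-value in the report without meaning.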

Our last question was whether students are less likely than non-students to work 30 hours or more a week. I predicted that students would be more likely to work less than 30 hours a week. This data was then put into a table and a test of independence was performed. The χ² was 13.1335 and the p-value was 0.0003. This supports the hypothesis.

Hours worked per week   Student   Non-student
Less than 30            18        6
30 or more              8         23

Discussion

of this; however, a rank-sum test supported my hypothesis that handedness has nothing to do with how much water somebody drinks.

The results of the following question should be taken lightly since estimates were used in lieu of true proportions. The question was whether individuals from one party were more likely to switch to the Tea Party. My hypothesis that members of one party were more likely to join the Tea Party could be neither supported nor discredited because the assumptions for the test failed, so the p-value was practically useless. The last question was answered using a test of independence, which showed that being a student and working more than 30 hours a week were linked.

Shortly after beginning the analysis it became clear that 50 was not a big enough sample size to reliably answer all of the questions proposed in the introduction. This sample size was selected because it appeared to be manageable. However, it turned out to be so small that some statistical tests could not be performed. A better sample size would have been 75 or 100. This would have substantially increased the effort needed to analyse the data but would also have produced better results. The main purpose of this project was to be a learning experience and provide a meaningful way to practice the techniques learned in the course.

Appendix

The raw data is available upon request. Here are the assumptions and work for each test performed on the data. All of the following p-values were compared against alpha = 0.05.

2: For the second question I planned a t-test on the mean male income $\mu_M$ and the mean female income $\mu_F$ with

$H_0$: $\mu_M = \mu_F$
$H_A$: $\mu_M > \mu_F$
Alpha = 0.05

Checking the assumptions with normal plots showed that the normality assumption was violated. This caused us to switch to a non-parametric test, the Wilcoxon rank-sum test, which resulted in a p-value of 0.7604.

$H_0$: median male income = median female income
$H_A$: median male income > median female income
Alpha = 0.05

There was not enough evidence to suggest median female income was less than median male income.

Normal plots: Female Income, Male Income

3a: For the next section I did a test of independence for each of the tables. The first one was between party and opinion about Obama's reelection.

$H_0$: Party and belief that Obama will be reelected are independent.
$H_A$: Party and belief that Obama will be reelected are dependent.

This first test of independence returned χ² = 12.7055 and p-value = 0.0053. However, the assumptions were violated because half of the expected values were below 5.

Expected   Republican   Democrat   Independent   Other
Yes        8.74         10.12      1.38          2.76
No         10.26        11.88      1.62          3.24

Even when the table was collapsed to have the Other and Independent columns together, the assumptions were still violated, since a third of the expected values were below 5. This prevented a conclusion from being reached.

Expected   Republican   Democrat   Other
Yes        8.74         10.12      4.14
No         10.26        11.88      4.86
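The assumption check applied to these tables is easy to automate. The sketch below recomputes the expected counts for the Obama reelection table, before and after merging the Independent and Other groups, and reports the share of cells with an expected count below 5. The helper names `expected_counts` and `share_below` are made up, and what share is acceptable is left to the reader, because the report does not state its cutoff.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_below(expected, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Observed counts from the report, one row per party in the order
# Republican, Democrat, Independent, Other; each row is [Yes, No].
obama = [[4, 15], [15, 7], [0, 3], [4, 2]]

# Merge the Independent and Other rows into a single Other row.
collapsed = obama[:2] + [[a + b for a, b in zip(obama[2], obama[3])]]

for name, table in (("full", obama), ("collapsed", collapsed)):
    exp = expected_counts(table)
    print(name, [[round(e, 2) for e in row] for row in exp],
          round(share_below(exp), 3))
```

Merging categories raises the expected counts in the merged group, but here the combined group has only 9 respondents, so both of its cells stay below 5.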

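3b: The next part was a test of independence between party and approval of the health care bill.

$H_0$: Party and approval of the health care bill are independent.
$H_A$: Party and approval of the health care bill are dependent.

The expected values are listed below and meet the assumptions, since only one of them is below 5.

Expected   Republican   Democrat   Other
Yes        6.84         7.92       3.24
No         12.16        14.08      5.76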

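3c: The same procedure was performed on the table for the death penalty.

$H_0$: Party and stance on the death penalty are independent.
$H_A$: Party and stance on the death penalty are dependent.

The expected values satisfy the assumptions, so the p-value of 0.9483 is valid. There is not enough evidence to suggest party and stance on the death penalty are dependent.

Expected   Republican   Democrat   Other
Yes        12.54        14.52      5.94
No         6.46         7.48       3.06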

4a: For this question, the first part involved a test of independence between handedness and stance on the death penalty.

$H_0$: Handedness and stance on the death penalty are independent.
$H_A$: Handedness and stance on the death penalty are dependent.

The expected values clearly show that the assumptions are violated, so the p-value has no meaning. This test was inconclusive.

Expected   Right   Left   Ambidextrous
Yes        19.78   2.3    0.92
No         23.22   2.7    1.08

4b: The next part was finding a relationship between handedness and the amount of water drunk. I decided to compare only left- and right-handed people because there were only two ambidextrous people in the sample. I was unable to perform a t-test because the normal plots showed that the normality assumption was violated.

$H_0$: $\mu_R = \mu_L$
$H_A$: $\mu_R \neq \mu_L$
Alpha = 0.05

Normal plots: Left handed, Right handed

This forced me to switch to a rank-sum test, which does not require normality.

$H_0$: median water intake of right-handed people = median water intake of left-handed people
$H_A$: median water intake of right-handed people ≠ median water intake of left-handed people
Alpha = 0.05

The test returned a p-value of 0.5375, so the null hypothesis was not rejected. There is not enough evidence to suggest right-handed people drink different amounts of water than left-handed people.

6: For the sixth and final question I performed a test of independence.

$H_0$: Being a student and working more than 30 hours are independent.
$H_A$: Being a student and working more than 30 hours are dependent.

The test resulted in χ² = 13.1335 and a p-value of 0.0003, and the assumptions are met by the expected values. There is enough evidence to suggest that being a student and working more than 30 hours are dependent.

Expected       Student   Non-student
Less than 30   11.345    12.655
30 or more     14.655    16.345

Questions contained in the survey

1. Gender: Male / Female
2. Ethnicity: White / Black / Hispanic / Other
3. Age (years):
4. Height (in inches, so 5 ft 7 inches would be 67):
5. Weight (pounds):
6. Hours worked per week:
7. Are you currently a Student? Yes / No
8. Education Level (completed, not in progress): (High School Grad = 12, Associate Degree = 14, BA/BS = 16, MA/MS = 18, PhD = 20)
9. Annual Gross Income (numbers only, $39000 is 39000, not 39,000 or $39K):
10. Eye color: Brown / Black / Blue / Hazel / Grey / Each eye is a different color / Other
11. Natural hair color: Brown or Brunette / Black / Blond / Red / Silver / Other
12. Number of ounces of water you drank for the two days prior to submitting this survey. One cup of water is 8 ounces whereas most glasses are 12 or 16 ounces.
13. Are you in favor of the death penalty? Yes / No
    In this context in favor means you are for the death penalty for at least one, but not necessarily all, crimes that are currently punishable by death.
14. What political party do you most closely associate yourself with? Democrat / Republican / Independent / Other
    By associate, I mean what party are you registered to vote with or, if you are not registered, which party would you register with if you had to choose.
15. Are you registered to vote? Yes / No
16. Do you personally know anyone that has been infected with the HIV virus? Yes / No
    By personally know, I mean a personal knowledge of that person. As an example, we all probably know Magic Johnson, but I doubt any of us have actually met him or are friends with him. If Magic is the only person you know who has been infected with HIV, then your answer to this question would be no.
17. Are you in favor of the health care bill as passed? Yes / No / Undecided
18. Are you left-handed, right-handed, or ambidextrous?
19. Do you believe President Obama will be reelected? Yes / No / Uncertain
20. Change of Party Affiliation
    I considered myself a Democrat or Liberal and now associate myself with the Tea Party.
    I considered myself a Republican or Conservative and now associate myself with the Tea Party.
    I considered myself other than Democrat or Republican and now associate myself with the Tea Party.
    Not applicable to me.
21. Consider Proposition 8, the 2008 proposition regarding marriage for same sex couples. The proposition defined marriage as between a man and a woman and prohibits same sex couples from marrying. Do you agree with Proposition 8? Yes / No
22. What is your shoe size?
23. What is your ring size?
