Data Analysis Basic

Attribution Non-Commercial (BY-NC)

25 views

Data Analysis Basic

Attribution Non-Commercial (BY-NC)

- Statistics for Research With a Guide to Spss
- PASW Categories 18
- Regression With SPSS Chapter 5_ Additional Coding Systems for Categorical Variables in Regression Analysis
- 11890 Chapter 3 Statistical Parameters
- Factors Relating to Labor Productivity Affecting the Project Schedule Performance in Indonesia
- Data and Representation
- Residential Satisfaction in Students Housing
- Assignment MB0050 Research Methodology Set 1 Set 2
- 2 klmpk
- Logistic & Multinomial Logistic
- docslide.us_statistics-s1-revision-papers-with-answers.pdf
- Test 1 Notes
- Statistics Project
- 10-044
- Measurement Scales 13-7-10
- Spatial Median Filter
- grade 6 unit 6 mini touchstone - sp 5
- The 10 Minute Talk
- correia1996.pdf
- building a histogram task

You are on page 1of 29

Goals

Describe the steps of descriptive data analysis Be able to define variables Understand basic coding principles Learn simple univariate data analysis

Types of Variables

Continuous variables:

Always numeric Can be any number, positive or negative Examples: age in years, weight, blood pressure readings, temperature, concentrations of pollutants and other measurements Information that can be sorted into categories Types of categorical variables ordinal, nominal and dichotomous (binary)

Categorical variables:

Categorical Variables:

Ordinal Variables

Ordinal variablea categorical variable with some intrinsic order or numeric value Examples of ordinal variables:

Education (no high school degree, HS degree, some college, college degree) Agreement (strongly disagree, disagree, neutral, agree, strongly agree) Rating (excellent, good, fair, poor) Frequency (always, often, sometimes, never) Any other scale (On a scale of 1 to 5...)

Categorical Variables:

Nominal Variables

Nominal variable a categorical variable without an intrinsic order Examples of nominal variables:

Where a person lives in the U.S. (Northeast, South, Midwest, etc.) Sex (male, female) Nationality (American, Mexican, French) Race/ethnicity (African American, Hispanic, White, Asian American) Favorite pet (dog, cat, fish, snake)

Categorical Variables:

Dichotomous Variables

Dichotomous (or binary) variables a categorical variable with only 2 levels of categories

Often represents the answer to a yes or no question Did you attend the church picnic on May 24? Did you eat potato salad at the picnic? Anything with only 2 categories

For example:

Coding

Coding process of translating information gathered from questionnaires or other sources into something that can be analyzed Involves assigning a value to the information givenoften value is given a label Coding can make data more consistent:

Example: Question = Sex Answers = Male, Female, M, or F Coding will avoid such inconsistencies

Coding Systems

0=No 1=Yes (1 = value assigned, Yes= label of value) OR: 1=No 2=Yes

When you assign a value you must also make it clear what that value means

In first example above, 1=Yes but in second example 1=No As long as it is clear how the data are coded, either is fine

You can make it clear by creating a data dictionary to accompany the dataset

A dummy variable is any variable that is coded to have 2 levels (yes/no, male/female, etc.) Dummy variables may be used to represent more complicated variables

Example: # of cigarettes smoked per week--answers total 75 different responses ranging from 0 cigarettes to 3 packs per week Can be recoded as a dummy variable: 1=smokes (at all) 0=non-smoker

Many analysis software packages allow you to attach a label to the variable values Makes reading data output easier:

Without label: Variable SEX 0 1 Variable SEX Male Female Example: Label 0s as male and 1s as female

Frequency 21 14 Frequency 21 14

With label:

Coding process is similar with other categorical variables Example: variable EDUCATION, possible coding:

0 1 2 3 = = = = Did not graduate from high school High school graduate Some college or post-high school education College graduate

Could be coded in reverse order (0=college graduate, 3=did not graduate high school) For this ordinal categorical variable we want to be consistent with numbering because the value of the code assigned has significance

0 1 2 3 = = = = Some college or post-high school education High school graduate College graduate Did not graduate from high school

Data has an inherent order but coding does not follow that orderNOT appropriate coding for an ordinal categorical variable

For coding nominal variables, order makes no difference Example: variable RESIDE

1 2 3 4 5 = = = = = Northeast South Northwest Midwest Southwest

Order does not matter, no ordered value associated with each response

Creating categories from a continuous variable (ex. age) is common May break down a continuous variable into chosen categories by creating an ordinal categorical variable Example: variable = AGECAT

1 2 3 4 5 = = = = = 09 years old 1019 years old 2039 years old 4059 years old 60 years or older

Example: Why did you choose not to see a doctor about this illness?

Example: didnt feel sick enough to see a doctor, symptoms stopped, and illness didnt last very long Could all be grouped together as illness was not severe Typically, dont know is coded as 9

Coding Tip

Though you do not code until the data is gathered, you should think about how you are going to code while designing your questionnaire, before you gather any data. This will help you to collect the data in a format you can use.

Data Cleaning

One of the first steps in analyzing data is to clean it of any obvious data entry errors:

Outliers? (really high or low numbers) Example: Age = 110 (really 10 or 11?) Value entered that doesnt exist for variable? Example: 2 entered where 1=male, 0=female Missing values? Did the person not give an answer? Was answer accidentally not entered into the database?

Many data entry systems allow double-entry ie., entering the data twice and then comparing both entries for discrepancies Univariate data analysis is a useful way to check the quality of the data

Examples: Only allowing 3 digits for age, limiting words that can be entered, assigning field types (e.g. formatting dates as mm/dd/yyyy or specifying numeric values or text)

Frequencies can tell you if many study participants share a characteristic of interest (age, gender, etc.)

Serves as a good method to check the quality of the data Inconsistencies or unexpected results should be investigated using the original data as the reference point

Do all subjects have data, or are values missing? Are most values clumped together, or is there a lot of variation? Are there outliers? Do the minimum and maximum values make sense, or could there be mistakes in the coding?

Mean average of all values of this variable in the dataset Median the middle of the distribution, the number where half of the values are above and half are below Mode the value that occurs the most times Range of values from minimum value to maximum value

Example Scatter Chart: Age 90

84 = Maximum (an outlier)

80

70

60

Age (in years) ,

50

36 = Median (50th Percentile) 33 = Mean

40

30

20

10

2 = Minimum

Standard Deviation

Example Scatter Chart 2: Age

90 80 70 60 50 40 30 20 10 0

90 80 70 60

Age (in years) ,

50 40 30 20 10 0

Figure left: narrowly distributed age values (SD = 7.6) Figure right: widely distributed age values (SD = 20.4)

Distribution

14 12

whether most values occur low in the range, high in the range, or grouped in the middle Percentiles the percent of the distribution that is equal to or below a certain value

Frequency

10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11

Age (years)

14 12

Frequency

10 8 6 4 2 0 1 2 3

10

11

Age (years)

Number of people answering example questionnaire who reside in 5 regions of the United States

30

Number of People

Another way to look at the data is to list the data categories in tables Table shown gives same information as in previous figure but in a different format

Table: Number of people answering sample questionnaire who reside in 5 regions of the United States Frequency Percent Midwest Northeast Northwest South Southwest Total 16 13 19 24 8 80 20% 16% 24% 30% 10% 100%

Education variable

Distribution of Education Level Observed dataExample on level Questionnaire of education Data from a hypothetical questionnaire

35 30 25 20 15 10 5 0 Less than high school High school graduate Some college College graduate

Observed distribution of education levels (top) Expected distribution of education (bottom) (1) Comparing graphs shows a more educated study population than expected

Percent

variable: EDUCATION

Data on the education level of theLevels US population aged 20 Expected Education years or older, from the US Census US Population aged 20 Years Bureau or Older

35 30 25 20 15 10 5 0 Less than high school High school graduate Some college College graduate

Percent

Are the observed data really that different from the expected data? Answer would require further exploration with statistical tests

variable: EDUCATION

Conclusion

Defining variables and basic coding are basic steps in data analysis Simple univariate analysis may be used with continuous and categorical variables Further analysis may require statistical tests such as chi-squares and other more extensive data analysis

References

1. US Census Bureau. Educational Attainment in the United States: 2003---Detailed Tables for Current Population Report, P20-550 (All Races). Available at: http://www.census.gov/population/www/socdemo/edu cation/cps2003.html. Accessed December 11, 2006.

- Statistics for Research With a Guide to SpssUploaded byEr. Rajendra Acharaya
- PASW Categories 18Uploaded byJairo Vargas Caleño
- Regression With SPSS Chapter 5_ Additional Coding Systems for Categorical Variables in Regression AnalysisUploaded byNadia Alam
- 11890 Chapter 3 Statistical ParametersUploaded byJosé Juan Góngora Cortés
- Factors Relating to Labor Productivity Affecting the Project Schedule Performance in IndonesiaUploaded byGeorge Nunes
- Data and RepresentationUploaded byMohammed Ahmed Ali
- Residential Satisfaction in Students HousingUploaded byShaiful Hussain
- Assignment MB0050 Research Methodology Set 1 Set 2Uploaded byChhavi Bajaj Khurana
- 2 klmpkUploaded bydellajuliatum
- Logistic & Multinomial LogisticUploaded byprakash19791234
- docslide.us_statistics-s1-revision-papers-with-answers.pdfUploaded byashique
- Test 1 NotesUploaded byDan Vu Pham
- Statistics ProjectUploaded bynist2017
- 10-044Uploaded byChandan Rungta
- Measurement Scales 13-7-10Uploaded byAnkur Dixit
- Spatial Median FilterUploaded by4164
- grade 6 unit 6 mini touchstone - sp 5Uploaded byapi-302577842
- The 10 Minute TalkUploaded byCristian Peña
- correia1996.pdfUploaded byNelson G. Souza
- building a histogram taskUploaded byapi-298215585
- stata stathisticsUploaded bymelonpp
- STUDY HABIT AT CLASS ROOMUploaded bynaveeen jose
- F3T4 - StatisticsUploaded byAteef Hatifa
- EpdintroUploaded byTiza Queenchy
- 6262833Uploaded byAgustín Rodríguez Aké
- 1. Basic StatisticsUploaded byNURUL HIKMAH ARSHAD
- Basic Data Analysis in Action Research With ComputerUploaded byIroal Joanne Narag Person
- GridDataReport MagnetUploaded byluluapriyantiinas
- 1-Descriptive Statistics (2).pptxUploaded bySuchismita Sahu
- 1-Descriptive Statistics (2).pptxUploaded bySuchismita Sahu

- Word Families TableUploaded bysanduong175
- Wordstress RulesUploaded byvivre_pooh
- Brownson - Applied Epidemiology - Theory to Practice.pdfUploaded byHung Tran
- Professional English for Students of Logistics DisclaimerUploaded byKhaled Samir
- Health Economics for Developing Countries - A Practical Guide 2000Uploaded byTrần Anh Tú
- An Introduction to ROC AnalysisUploaded byTrần Anh Tú
- Workflow Slides JSLong 110410Uploaded byTrần Anh Tú
- WISN_Eng_UsersManual.pdfUploaded byTrần Anh Tú
- Useful Stata CommandsUploaded bySuman Chattopadhyay
- The Sexual and Reproductive Health of Young Adolescents in Developing CountriesUploaded byTrần Anh Tú
- English for Mathematics (HoThiPhuong)Uploaded byHocLieuMo
- English is Not a Big Deal!Uploaded byTrần Anh Tú
- Thesis GuideUploaded byTrần Anh Tú
- Measles elimination programUploaded byTrần Anh Tú
- Get Ready for IELTS Reading Pre-Intermediate A2 (RED).pdfUploaded byKẹo Bông
- Health Research MethodologyUploaded byApollo Institute of Hospital Administration
- Sample of ReportUploaded byTrần Anh Tú
- Sampling and Descriptive StaticsUploaded byTrần Anh Tú
- SampleSize.tascott.handoutUploaded byTrần Anh Tú

- lesson 2 low levelUploaded byapi-297113268
- Nur Razimah (940907015834)Uploaded byNur Razimah Razali
- Case WritingUploaded byRoyce Untalan
- UNIT 4 Statistical Methods BTEC NATIONALS LEVEL 3Uploaded byJohn David Hoyos Marmolejo
- odvguideUploaded byRafael Estiadi Setiawan
- PTG Analysis and Reporting Quick RefUploaded byMuskegon ISD PowerSchool Support Consortium
- A STUDY ON CAREER MATURITY OF SECONDARY SCHOOL STUDENTS IN RELATION TO SCHOOL MANAGEMENTUploaded byAnonymous CwJeBCAXp
- Data Preprocessing -DWMUploaded byValan Pradeep
- UntitledUploaded byapi-262707463
- algebra ii unit 7 statisticsUploaded byapi-287816312
- Assessment and Evaluation of LearningUploaded byClifford Borromeo
- Chapter 2Uploaded byhemant023
- Advances in Binder-Treatment Technology Statistical Data on Ancorbond™ PlusUploaded byEdwar Villavicencio Jaimes
- CT3_QP_0507Uploaded byNitai Chandra Ganguly
- MBA SEMESTER 1 ASSIGNMENT QAMR.docUploaded bySumaya Tambal Almalik
- OCR S1 Revision SheetUploaded byDKS215
- Statistics MeanUploaded byGary Nugas
- PORTFOLIO in ASSESSMENT OF LEARNING 1Uploaded byLouie King Joaquin
- PROBABILITY DISTRIBUTIONUploaded byalborz99
- Descriptive Statistics W2Uploaded byKhusairi Anwar
- As Statistics 1Uploaded byrtb101
- New Microsoft Office Word Document (6)Uploaded byAnonymous N6Wk3B
- AlgebraUploaded byRonel Fillomena
- 1991 Mathematics Paper2Uploaded byGeorge Chiu
- 13. ISC EconomicsUploaded byPrincess Soniya
- phd assign.Uploaded byNiño R. Felix
- 5. Michael S. Lewis-Beck-Data Analysis_ an Introduction, Issue 103-SAGE (1995)Uploaded bymichelglais
- FinalUploaded byjay11493
- Maths Yr 9 KH T3 Data ProjectUploaded bymathsattrafalgar
- Chapter 8 I STATISTICS I+II (Enhance)Uploaded byciknuyuhuda