Introduction to Statistics: Topic 2: Exploratory Data Analysis

Introduction (EDA), 1 of 2

Learning Objectives

* Understand the structure of a dataset and identify different types of variables.

Before we dive into Exploratory Data Analysis and really appreciate its importance in the process of statistical analysis, let's take a step back for a minute and ask: "What do we really mean by data?"

Data are pieces of information about individuals, organized into variables. By an individual, we mean a particular person or object. By a variable, we mean a particular characteristic of the individual.

A dataset is a set of data identified with particular circumstances. Datasets are typically displayed in tables, in which rows represent individuals and columns represent variables.

Medical Records

The following dataset shows medical records from a particular survey:

[Table: one row per patient, Patient #1 through Patient #75; columns (variables): Gender, Age, Weight (lbs), Height, Smoking, Race. Cell values are not recoverable.]

In this example, the individuals (sometimes also called units) are the patients, and the six variables are Gender, Age, Weight, Height, Smoking, and Race. Each row, then, gives information about a particular individual (in this case, a patient), and each column gives information about a particular characteristic of all the patients. In this example, the sample size (the number of individuals) is 75.

[Table: dollar amounts by U.S. state (Alaska through Wyoming); columns: Hospital Care, Physician Services, Other Prof. Services, Prescriptions, Nursing Home Care, Dental Services. Cell values are not recoverable.]

Types of Variables and Levels of Measurement

Variables can broadly be divided into two types: quantitative and categorical.

* Quantitative variables represent a measurement or count and generally answer the question "how much" or "how many". Examples of quantitative variables are the time you wait in line, the distance between a person's home and work, the number of text messages a person sends in a day, a person's income, a person's foot length, and the outside temperature (in degrees °F).
* Categorical variables represent labels or ranks and place (classify) an individual into one of several groups. Examples of categorical variables are a person's eye color, a person's socioeconomic status (low, medium, or high), a person's political affiliation (Democrat, Republican, or Independent), a person's handedness (right-hand or left-hand), and a person's view of the death penalty ("strongly agree", "agree", "neutral", "disagree", or "strongly disagree").

Comments:

1. Quantitative variables always take numerical values. For example, the outside temperature (in degrees °F) can be 50, 66, -20, etc.; the time you wait in line (in minutes) can be 5, 10, or 60. It is important to mention that categorical variables may also take numerical values; however, those values are only ranks or codes. For example, a person's handedness might be coded as 1 = right-hand, 2 = left-hand. The numbers 1 and 2 have no numeric meaning; they just represent codes. Another example would be when a person is asked about his/her view of the death penalty: 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, 5 = strongly disagree.
Again, this is clearly a categorical variable, and the numbers 1 to 5 are only codes for the different values.
2. Categorical variables are sometimes called qualitative variables. Throughout this course, we will refer to them as categorical variables.

On the next page you will have a chance to explore a dataset in the context of a clinical study and have hands-on practice of all the concepts we have learned on this page.

© 2017 Acrobatiq
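The ideas on this page (rows as individuals, columns as variables, and numeric codes that are still categorical) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three records are made-up placeholder values, not the survey's data, and the names `patients`, `VARIABLE_TYPES`, `HANDEDNESS_LABELS`, `mean_of`, and `counts_of` are hypothetical. The handedness coding (1 = right-hand, 2 = left-hand) follows the example in Comment 1.

```python
from statistics import mean

# Hypothetical placeholder records (NOT the survey's actual values).
# Each dict is one individual (a row); each key is a variable (a column).
patients = [
    {"gender": "M", "age": 45, "weight_lbs": 180, "handedness": 1},
    {"gender": "F", "age": 32, "weight_lbs": 135, "handedness": 2},
    {"gender": "F", "age": 61, "weight_lbs": 150, "handedness": 1},
]

# Type of each variable. Handedness is stored as a number, but the numbers
# are only codes (1 = right-hand, 2 = left-hand), so it is categorical.
VARIABLE_TYPES = {
    "gender": "categorical",
    "age": "quantitative",
    "weight_lbs": "quantitative",
    "handedness": "categorical",
}

HANDEDNESS_LABELS = {1: "right-hand", 2: "left-hand"}


def mean_of(dataset, variable):
    """Average a variable, but only if it is quantitative."""
    if VARIABLE_TYPES[variable] != "quantitative":
        raise TypeError(f"{variable!r} is categorical; its values are labels or codes")
    return mean(row[variable] for row in dataset)


def counts_of(dataset, variable):
    """Count how many individuals fall into each value of a variable."""
    counts = {}
    for row in dataset:
        counts[row[variable]] = counts.get(row[variable], 0) + 1
    return counts


sample_size = len(patients)              # number of individuals (rows)
average_age = mean_of(patients, "age")   # meaningful: age is quantitative
handedness_counts = {
    HANDEDNESS_LABELS[code]: n
    for code, n in counts_of(patients, "handedness").items()
}
```

Calling `mean_of(patients, "handedness")` raises a `TypeError` in this sketch, which mirrors the point that averaging codes such as 1 and 2 has no numeric meaning; counting individuals per category is the sensible summary instead.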
