Running head: Analytics Internship: Data Mining Individual 1

Analytics Internship: Data Mining Individual

Hal Hagood

u02a1

Applies Descriptive Statistics to a Survey Data Set: Three Variables, Central Tendency, Measures of Dispersion, and a Graphical Display of the Data

Data Import

For this data set, three variables were chosen: Age, Sex, and Race. Measures of

central tendency (the mean, median, and mode) and measures of dispersion (variability,

scatter, or spread) are given in the tables above and below. These are often called descriptive

statistics because they can help you describe your data.

The mean value for Age was 8.32, with a mode of 12.00 and a standard deviation of 4.03.

The mean value for Sex was 1.64, with a mode of 2.0 and a standard deviation of 0.477.

The mean value for Race was 1.30, with a mode of 1.0 and a standard deviation of 1.30.

Mean, Median and Mode

These are all measures of central tendency. They help summarize a set of scores with a

single number. Suppose you want to describe to a friend the data you collected for a

particular variable, such as the height of students in your class. One way would be to read each height you

recorded to your friend. Your friend would listen to all of the heights and then come to a conclusion about

how tall students generally are in your class, but this would take too much time, especially if you are in a

class of 200 or 300 students. Another way to communicate with your friend would be to use measures of

central tendency like the mean, median and mode. They help you summarize sets of numbers with

one or just a few numbers. They make telling people about your data easy (Statistics, 2017).
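
As a minimal sketch of the idea above, the three measures can be computed with Python's standard statistics module. The heights below are hypothetical values chosen only for illustration; they are not taken from the survey data set.

```python
import statistics

# Hypothetical heights in inches (illustrative only, not from the survey data).
heights = [64, 66, 66, 67, 68, 70, 71, 72, 75]

mean_height = statistics.mean(heights)      # arithmetic average of all values
median_height = statistics.median(heights)  # middle value once the data are sorted
mode_height = statistics.mode(heights)      # most frequently occurring value

print(f"mean={mean_height:.2f}, median={median_height}, mode={mode_height}")
```

Instead of reading out all nine heights, the three printed numbers give a friend a quick sense of how tall the students typically are.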

Range, Variance and Standard Deviation

These are all measures of dispersion. They help you to know the spread of scores within a

set of scores. Are the scores close together or are they really far apart? For example, if you were

describing the heights of students in your class to a friend, they might want to know how much the heights

vary. Are all the men about 5 feet 11 inches, give or take an inch or so? Or is there a lot of variation

where some men are 5 feet and others are 6 feet 5 inches? Measures of dispersion like the range,

variance and standard deviation tell you about the spread of scores in a data set. Like central tendency,

they help you summarize a set of numbers with one or just a few numbers (Statistics, 2017).
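
The sketch below illustrates the same point with two hypothetical groups of heights that share a mean but differ in spread. The values and the helper name describe_spread are assumptions made for illustration; the sample (n - 1) form of the variance is used here, which may differ from the form used by other statistical packages.

```python
import statistics

# Two hypothetical groups of heights in inches (illustrative only).
# Both groups have the same mean (71) but very different spread.
tight_group = [70, 71, 71, 71, 72]
wide_group = [60, 66, 71, 76, 82]


def describe_spread(values):
    """Return the range, sample variance, and sample standard deviation."""
    value_range = max(values) - min(values)
    variance = statistics.variance(values)  # sample variance (n - 1 denominator)
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return value_range, variance, std_dev


for name, group in [("tight", tight_group), ("wide", wide_group)]:
    value_range, variance, std_dev = describe_spread(group)
    print(f"{name}: mean={statistics.mean(group)}, range={value_range}, "
          f"variance={variance:.2f}, sd={std_dev:.2f}")
```

The mean alone cannot tell the two groups apart; the range, variance, and standard deviation show that the second group is far more spread out.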

Presents Results of Descriptive Statistics Analysis and Data Exploration of a Data Set in a Format Appropriate for a Key Stakeholder Audience

From a stakeholder perspective, the recorded age at admission initially spikes at around 2 years of age

and then drops at around 5 years of age. The highest spike occurs at around 12 years of age,

after which the frequency drops to its lowest point on the graph at around 15 years of age.

The graph showing gender (male or female) shows a higher frequency in the second category,

which we assume to be female. The recorded ethnicity consists of White, Black, Asian, Pacific Islander

and Indian. We can assume that the most frequent racial category is White, with the least frequent being Asian or

Pacific Islander.
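
For a coded categorical variable such as Sex, a frequency count is usually easier to read than a mean. The sketch below uses hypothetical coded responses, and the mapping of 1 to male and 2 to female is an assumption, just as it is in the text. If Sex were coded only as 1 and 2, the reported mean of 1.64 would correspond to roughly 64% of records falling in the second category.

```python
from collections import Counter

# Hypothetical coded responses (illustrative only, not the survey data).
# The code-to-label mapping is an assumption; the data set does not confirm it.
sex_codes = [1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2]
labels = {1: "Male (assumed)", 2: "Female (assumed)"}

counts = Counter(sex_codes)    # frequency of each code
total = sum(counts.values())

for code in sorted(counts):
    share = counts[code] / total
    print(f"{labels[code]}: n={counts[code]}, share={share:.1%}")

# The mode of a categorical variable is simply its most frequent category.
most_common_code = counts.most_common(1)[0][0]
print("Most frequent category:", labels[most_common_code])
```

The same counting approach would apply to the Race variable once its code-to-category mapping is confirmed.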

Population Density is included as a general control variable for predicting response rates. In

terms of socioeconomics and demographics, population density can impact anything from resource

availability to commuting travel time. Thus, for the purpose of this relationship, it is considered a general

proxy for the hustle and bustle of modern life. In addition, population density is

included as a control when examining relationships between other variables and hospital-level VBP

scores. In this context, population density is treated as a loose proxy for patient volume. This operates on

the assumption that, generally speaking, hospitals primarily receive patients from surrounding

populations, and thus more dense populations lead to a higher influx of patients. It is recognized that this

is not a perfect proxy for patient volume, however; this will be further discussed in the limitations

section below.

Measures of % White and % Hispanic are included to test for possible population demographic

effects on hospital performance and HCAHPS response rates. Again, this operates on the principle that

hospitals likely draw in patient populations that are reflective of their surrounding environment.

Furthermore, previous research has indicated that differences and similarities in both race and ethnicity

impact patient-physician relationships, patient communication, care delivery, and perceptions of bias.

Thus, potential impacts of race and ethnicity should be included in the analytical model (cdn2, 2017).

Please see the attached appendix for a further description of the HCAHPS data. The HCAHPS

survey contains 21 patient perspectives on care and patient rating items that encompass nine key topics.

References

Cdn2. (2017). HCAHPS Surveys Response Rates, Demographics, and Performance. Retrieved January 20, 2017, from http://cdn2.hubspot.net/hub/249362/file-1678371031-pdf/HCAHPS_Surveys_-_Response_Rates_Demographics

Hcahpsonline.org. (2017). CAHPS Hospital Survey. Retrieved January 20, 2017, from http://www.hcahpsonline.org/home.aspx

Statistics. (2017). What are measures of central tendency and dispersion? Retrieved January 18, 2017, from http://statistics-help-for-students.com/What_are_measures_of_central_tendency_and_dispersion.htm#.WH-zURsrJQJ

Appendix

The intent of the HCAHPS initiative is to provide a standardized survey instrument and data

collection methodology for measuring patients' perspectives on hospital care. While many hospitals have

collected information on patient satisfaction, prior to HCAHPS there was no national standard for

collecting or publicly reporting patients' perspectives of care information that would enable valid

comparisons to be made across all hospitals. In order to make "apples to apples" comparisons to support

consumer choice, it was necessary to introduce a standard measurement approach: the HCAHPS survey,

which is also known as the CAHPS Hospital Survey, or Hospital CAHPS. HCAHPS is a core set of

questions that can be combined with a broader, customized set of hospital-specific items. HCAHPS

survey items complement the data hospitals currently collect to support improvements in internal

customer services and quality related activities.

Three broad goals have shaped the HCAHPS survey. First, the survey is designed to produce

comparable data on the patient's perspective on care that allows objective and meaningful comparisons

between hospitals on domains that are important to consumers. Second, public reporting of the survey

results is designed to create incentives for hospitals to improve their quality of care. Third, public reporting

will serve to enhance public accountability in health care by increasing the transparency of the quality of

hospital care provided in return for the public investment. With these goals in mind, the HCAHPS project

has taken substantial steps to assure that the survey is credible, useful, and practical. This methodology

and the information it generates are available to the public.

In May 2005, the National Quality Forum (NQF), an organization established to standardize

health care quality measurement and reporting, formally endorsed the CAHPS Hospital Survey. The

NQF endorsement represents the consensus of many health care providers, consumer groups,

professional associations, purchasers, federal agencies, and research and quality organizations.

The HCAHPS survey contains 21 patient perspectives on care and patient rating items that

encompass nine key topics: communication with doctors, communication with nurses, responsiveness of

hospital staff, pain management, communication about medicines, discharge information, cleanliness of

the hospital environment, quietness of the hospital environment, and transition of care. The survey also

includes four screener questions and seven demographic items, which are used for adjusting the mix of

patients across hospitals and for analytical purposes. The survey is 32 questions in length

(hcahpsonline.org, 2017).