Data Analysis

Florenda F. Cabatit RN MA Facilitator

Data analysis is the process by which information is rendered meaningful and intelligible (Polit and Hungler, 1995). It is the systematic organization and synthesis of research data and the testing of research hypotheses using those data (2004).

Statistical Analysis
Quantitative analysis deals with numerical analysis of information. It is the manipulation of numeric data through statistical procedures for the purpose of describing phenomena or assessing the magnitude and reliability of relationships among them. Statistics is the scientific method used in quantitative analysis.

Statistics helps to:
• Organize data
• Summarize data
• Evaluate data
• Present data in an easily understood form.

Two branches of statistics:
• Descriptive statistics – statistics used to describe and summarize data
• Inferential statistics – statistics that permit inferences about whether relationships observed in a sample are likely to occur in the larger population.

Considerations in the choice of appropriate statistical methods
• The purpose of the research
• The level of measurement of the variables
• The number of groups/variables involved
• The type of groups being studied

Levels of Measurement
• Nominal – the lowest level
- involves assigning numbers to classify characteristics into categories
- the numeric codes assigned in nominal measurement do not convey quantitative information
- the numbers are merely symbols that represent different values
- categories must be mutually exclusive and collectively exhaustive

Ordinal Measurement
• This involves sorting objects on the basis of their relative standing or ranking on an attribute.
• The numbers are not arbitrary: they signify incremental values. They do not, however, tell anything about how much greater one level is than another.

Interval Measurement
• A measurement in which an attribute of a variable is rank ordered on a scale that has equal distances between points on that scale.

Ratio Scale
• A quantitative measurement in which intervals are equal and there is a true zero point.
• The highest level of measurement.
• All arithmetic operations are permissible with this measurement (numbers on this scale can be added, subtracted, multiplied, and divided).

Descriptive Statistics
Three characteristics fully describe a set of data:
• the shape of the distribution of values
• central tendency
• variability

Review of Descriptive Stats.
• Descriptive statistics are used to present quantitative descriptions in a manageable form.
• This method works by reducing lots of data into a simpler summary.
• Examples:
- 37°C as the average adult body temperature
- SU’s quality-point system

Univariate Analysis
• This is the examination across cases of one variable at a time.
• Frequency distributions are used to group data.
• One may set up category boundaries (margins) that allow cases to be grouped into categories. Examples include:
- Age categories
- Price categories
- Temperature categories

Two ways to describe a univariate distribution:
• A table
• A graph (histogram, bar chart)

Distributions may also be displayed using percentages. For example, one could use percentages to describe the following:
• Percentage of people under the poverty level
• Percentage of people over a certain age
• Percentage of people over a certain score on a standardized test

Distributions (cont.)
A Frequency Distribution Table

Category    Percent
Under 35       9%
36-45         21%
46-55         45%
56-65         19%
66+            6%
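A frequency distribution like the one above can be produced by sorting each case into a category and converting the counts to percentages. The Python sketch below uses hypothetical ages, not the data behind the table. The table does not say where age 35 belongs, so the sketch assumes it goes in the first category. The function names are illustrative only.

```python
from collections import Counter

CATEGORIES = ["Under 35", "36-45", "46-55", "56-65", "66+"]

def age_category(age: int) -> str:
    """Map an age to a category label from the table.
    Assumption: age 35 is placed in "Under 35", since the table leaves it unassigned."""
    if age <= 35:
        return "Under 35"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

def percent_distribution(ages):
    """Count cases per category and express each count as a percentage of n."""
    counts = Counter(age_category(a) for a in ages)
    n = len(ages)
    return {cat: 100 * counts.get(cat, 0) / n for cat in CATEGORIES}

sample_ages = [25, 30, 38, 40, 47, 50, 52, 60, 62, 70]  # hypothetical data
for category, percent in percent_distribution(sample_ages).items():
    print(f"{category:<9} {percent:4.0f}%")
```

For these ten hypothetical ages the percentages come out as 20, 20, 30, 20 and 10. Like the table, they sum to 100.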

Distributions (cont.)
A Histogram
[Figure: histogram of the age-category percentages, with categories starting at “Under 35” and a vertical axis running from 0 to 45]

Central Tendency
• An estimate of the “center” of a distribution
• Three different types of estimates:
- Mean
- Median
- Mode

Mean
• The most commonly used method of describing central tendency.
• One totals all the results and then divides by the number of units, or “n”, of the sample.
• Example: The NCM 104 Quiz mean was determined by dividing the sum of all the scores by the number of students taking the exam.
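The mean described above is simply the sum of the scores divided by n. The minimal Python sketch below reuses the score set from the mode and range examples in this section. The function name is illustrative. The standard library's `statistics.mean` gives the same result.

```python
def mean(scores):
    """Total all the scores, then divide by n (the number of scores)."""
    if not scores:
        raise ValueError("the mean needs at least one score")
    return sum(scores) / len(scores)

quiz_scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(mean(quiz_scores))  # 167 / 8 = 20.875
```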

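Median
• The median is the score found at the exact middle of the set.
• One must list all scores in numerical order and then locate the score in the center of the sample.
• Example: If there are 500 scores in the list, there is no single middle score, so the median is the average of scores #250 and #251. With an odd number of scores, such as 501, score #251 would be the median.
• The median is useful when the data contain outliers, because extreme scores affect it far less than they affect the mean.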
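The rule for locating the median, including the even-count case, can be sketched in Python as follows. The function name is illustrative. It follows the same convention as the standard library's `statistics.median`.

```python
def median(scores):
    """Sort the scores and take the middle one, or the average of the two middle ones."""
    if not scores:
        raise ValueError("the median needs at least one score")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: a single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle scores

print(median([15, 20, 21, 20, 36, 15, 25, 15]))  # 20.0
print(median(list(range(1, 501))))  # 500 scores: average of #250 and #251 -> 250.5
print(median([10, 11, 12, 13, 100]))  # 12; the outlier 100 barely matters
```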

Mode
• The mode is the most repeated score in the set of results.
• Let’s take the set of scores: 15, 20, 21, 20, 36, 15, 25, 15.
• Again, we first line up the scores: 15, 15, 15, 20, 20, 21, 25, 36.
• 15 is the most repeated score and is therefore labeled the mode.
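Finding the mode amounts to counting how often each score occurs. The Python sketch below returns every score tied for the highest count, because a set of scores can have more than one mode. The function name is illustrative. The standard library's `statistics.multimode` (Python 3.8+) does the same job.

```python
from collections import Counter

def modes(scores):
    """Return all scores that share the highest frequency, in ascending order."""
    if not scores:
        raise ValueError("the mode needs at least one score")
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

print(modes([15, 20, 21, 20, 36, 15, 25, 15]))  # [15]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]: a bimodal set
```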

Central Tendency
• If the distribution is normal (i.e., bell-shaped), the mean, median and mode are all equal.
• In our analyses, we’ll use the mean.

Variability
• Two types of estimates:
- Range
- Standard deviation
• The standard deviation is more accurate and detailed, because a single outlier can greatly extend the range.

Range
• The range is used to identify the highest and lowest scores.
• Let’s take the set of scores: 15, 20, 21, 20, 36, 15, 25, 15.
• The scores run from 15 to 36, so the range is 36 - 15 = 21. In other words, 21 points separate the highest score from the lowest.
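The range calculation above reduces to the lowest score, the highest score and their difference. Here is a minimal Python sketch with an illustrative function name:

```python
def score_range(scores):
    """Return (lowest, highest, highest - lowest)."""
    lowest, highest = min(scores), max(scores)
    return lowest, highest, highest - lowest

print(score_range([15, 20, 21, 20, 36, 15, 25, 15]))  # (15, 36, 21)
```

Because only the two extreme scores enter the calculation, a single outlier can change the range drastically. This is why the standard deviation is usually preferred.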

Standard Deviation
• The standard deviation is a value that shows how far individual scores typically lie from the mean of the sample.
• If scores follow a normal curve, several further statistical procedures can be performed to analyze the data set.

Standard Dev. (con’t)
• Assumptions may be made about the percentage of scores as they deviate from the mean.
• If scores are normally distributed, one can assume that approximately 68% of the scores in the sample fall within one standard deviation of the mean, and approximately 95% fall within two standard deviations of the mean.
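The 68% and 95% figures come from the normal curve itself. The Python sketch below uses the standard library's `statistics.NormalDist` (Python 3.8+) to compute the share of a normal distribution within k standard deviations of the mean. The helper name is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

This prints about 68.3%, 95.4% and 99.7% for one, two and three standard deviations.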

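Standard Dev. (con’t)
• The standard deviation is calculated by summing the squared deviations of all the scores from the mean, dividing that sum by the number of scores, and taking the square root of the result. When a sample is used to estimate the population value, the sum is usually divided by n - 1 instead of n.
• Squaring the deviations makes negative and positive deviations both count, so they do not cancel each other out.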
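The calculation just described can be written step by step in Python. This is a sketch with an illustrative function name and hypothetical scores. The `sample` flag switches the divisor from n to n - 1. The standard library's `statistics.pstdev` and `statistics.stdev` implement the two versions.

```python
import math
import statistics

def standard_deviation(scores, sample=False):
    """Square root of (sum of squared deviations from the mean / divisor).
    The divisor is n by default, or n - 1 when sample=True."""
    n = len(scores)
    if n < (2 if sample else 1):
        raise ValueError("not enough scores")
    m = sum(scores) / n
    squared_deviations = [(x - m) ** 2 for x in scores]  # squaring removes the sign
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_deviations) / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(standard_deviation(scores))  # 2.0 (sum of squares 32, divided by 8, square root)
print(standard_deviation(scores, sample=True))  # divides by 7 instead
print(statistics.pstdev(scores), statistics.stdev(scores))  # standard-library equivalents
```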


NOMINAL
• Distribution: frequency distribution, contingency table
• Central tendency: mode

ORDINAL
• Distribution: frequency distribution, contingency table, scatter plot
• Central tendency: mode, median

RATIO/INTERVAL
• Distribution: frequency distribution, contingency table, scatter plot
• Central tendency: mode, median, mean
• Variability: range, variance, standard deviation

Inferential Statistics
• Based on the laws of probability
• Provides a means for drawing conclusions about a population, given data from a sample
• Estimates population parameters from sample statistics

Statistical Inference
Inferential statistics consists of two techniques:
1. Estimation of parameters
2. Hypothesis testing

Hypothesis Testing
• Statistical hypothesis testing provides objective criteria for deciding whether hypotheses are supported by empirical evidence.
• It is a process of disproof or rejection.
• Researchers seek to reject the null hypothesis through various statistical tests.
• Hypothesis testing uses samples to draw conclusions about relationships within the population.

Type I and Type II Errors
• Type I error – researchers make a Type I error when a true null hypothesis is rejected.
• Type II error – researchers make a Type II error when a false null hypothesis is accepted.

Level of Significance
This refers to the risk of making a Type I error in a statistical analysis. The value selected beforehand signifies the risk, or probability, of rejecting a true null hypothesis. The two most frequently used significance levels (referred to as alpha, or α) are .05 and .01.

Level of Significance
• With a .05 significance level, we accept the risk that if 100 samples were drawn from a population in which the null hypothesis is true, the null hypothesis would be wrongly rejected in about 5 of them.

• With a .01 level of significance, the risk of a Type I error is lower: in only about 1 sample out of 100 would we erroneously reject a true null hypothesis.
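The long-run meaning of α can be checked by simulation. The Python sketch below is an illustration built on stated assumptions, not a procedure from the source. It draws many samples from a hypothetical normal population in which the null hypothesis (mean = 50) is true. It applies a two-tailed z-test with a known σ = 10 and counts how often the true null is rejected. All parameter values are hypothetical, and the rejection rate should land near the chosen α.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha, trials=10_000, n=25, mu=50.0, sigma=10.0, seed=1):
    """Fraction of samples, drawn from a population where H0 (mean = mu) is TRUE,
    for which a two-tailed z-test with known sigma rejects H0 anyway."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true null
    return rejections / trials

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: true null rejected in {type_i_error_rate(alpha):.1%} of samples")
```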

Critical Region
This refers to the area in the sampling distribution representing values that are “improbable” if the null hypothesis is true. It is defined by the level of significance.

Statistical Tests
In a two-tailed test, both ends, or tails, of the sampling distribution are used to determine improbable values. In a one-tailed test, the critical region of improbable values lies entirely in one tail of the distribution: the tail corresponding to the direction of the hypothesis.
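The boundaries of the critical region follow directly from α and the number of tails. The Python sketch below assumes a z statistic (standard normal sampling distribution) and a one-tailed hypothesis that predicts a positive difference. The function names are illustrative.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Boundary of the critical region for a z statistic at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed: alpha is split between both tails. One-tailed: all of alpha sits in one tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

def in_critical_region(z, alpha, tails=2):
    """True if z is 'improbable' under H0. One-tailed assumes an upper-tail hypothesis."""
    boundary = critical_z(alpha, tails)
    return abs(z) > boundary if tails == 2 else z > boundary

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-tailed ±{critical_z(alpha, 2):.3f}, one-tailed {critical_z(alpha, 1):.3f}")

print(in_critical_region(1.8, 0.05, tails=1))  # True: beyond 1.645
print(in_critical_region(1.8, 0.05, tails=2))  # False: inside ±1.960
```

The last two lines show the design trade-off. With all of α in one tail, a result in the predicted direction reaches the critical region sooner.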

[Figure: An example of critical regions of a two-tailed test]

Types of Statistical Tests

Parametric tests – a class of inferential statistical tests that involve:
a. Assumptions about the distribution of the variables
b. The estimation of a parameter
c. The use of interval or ratio measures

Statistical Tests
Non-parametric tests – statistical tests that do not estimate parameters; also called distribution-free statistics.

Steps in Hypothesis Testing
1. State the alternative hypothesis.
2. State the null hypothesis.
3. Establish the level of significance.
4. Select a one-tailed or two-tailed test.
5. Compute a test statistic.
6. Calculate the degrees of freedom.
7. Obtain a tabled value for the statistical test.
8. Compare the test statistic with the tabled value.
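The eight steps can be traced in a one-sample t-test. This Python sketch makes several assumptions: the scores and the null mean of 50 are hypothetical, only α = .05 two-tailed is supported, and the "tabled values" are a few critical t values copied from a standard t table. A full analysis would use a complete table or a statistics package.

```python
import math
from statistics import fmean, stdev

# Step 7 stand-in: two-tailed critical t values at alpha = .05, from a standard t table.
T_TABLE_TWO_TAILED_05 = {8: 2.306, 9: 2.262, 10: 2.228}

def one_sample_t_test(scores, null_mean, alpha=0.05):
    # Steps 1-2: H1: population mean != null_mean; H0: population mean == null_mean.
    # Steps 3-4: this sketch only supports alpha = .05 with a two-tailed test.
    if alpha != 0.05:
        raise ValueError("only alpha = .05 (two-tailed) is tabulated in this sketch")
    n = len(scores)
    # Step 5: test statistic, t = (sample mean - null mean) / (s / sqrt(n)).
    t = (fmean(scores) - null_mean) / (stdev(scores) / math.sqrt(n))
    # Step 6: degrees of freedom for a one-sample t-test.
    df = n - 1
    # Step 7: look up the tabled value (KeyError if df is not in the small table above).
    critical = T_TABLE_TWO_TAILED_05[df]
    # Step 8: reject H0 if the test statistic exceeds the tabled value in absolute size.
    reject_null = abs(t) > critical
    return t, df, critical, reject_null

scores = [52, 55, 48, 60, 57, 53, 49, 58, 54, 56]  # hypothetical data, mean 54.2
t, df, critical, reject_null = one_sample_t_test(scores, null_mean=50)
print(f"t = {t:.3f}, df = {df}, tabled value = {critical}, reject H0: {reject_null}")
```

Here t comes out at about 3.47, which exceeds the tabled 2.262 for 9 degrees of freedom, so the null hypothesis would be rejected at the .05 level.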

The Decision Matrix
Rows show what we conclude; columns show what is true in reality.

What we conclude:
• Top row – accept null, reject alternative. We say: there is no real program effect; there is no difference or gain; our theory is wrong.
• Bottom row – reject null, accept alternative. We say: there is a real program effect; there is a difference or gain; our theory is correct.

In reality:
• Left column – null true, alternative false: there is no real program effect; there is no difference or gain; our theory is wrong.
• Right column – null false, alternative true: there is a real program effect; there is a difference or gain; our theory is correct.

The four cells:
• Top row, left column: 1-α, THE CONFIDENCE LEVEL. The odds of saying there is no effect or gain when in fact there is none (# of times out of 100, when there is no effect, that we’ll say there is none).
• Top row, right column: β, TYPE II ERROR. The odds of saying there is no effect or gain when in fact there is one (# of times out of 100, when there is an effect, that we’ll say there is none).
• Bottom row, left column: α, TYPE I ERROR. The odds of saying there is an effect or gain when in fact there is none (# of times out of 100, when there is no effect, that we’ll say there is one).
• Bottom row, right column: 1-β, POWER. The odds of saying there is an effect or gain when in fact there is one (# of times out of 100, when there is an effect, that we’ll say there is one).

Decision Matrix
If you try to increase power, you increase the chance of winding up in the bottom row (rejecting the null hypothesis) and therefore of making a Type I error. If you try to decrease Type I errors, you increase the chance of winding up in the top row (accepting the null hypothesis) and therefore of making a Type II error.
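The trade-off between α and power can be made concrete with a one-tailed z-test. The Python sketch below assumes a hypothetical true effect of 5 points, a known σ of 10 and a sample size of 25, and computes power as 1 - β at two significance levels. The function name and all numbers are illustrative assumptions.

```python
from statistics import NormalDist

def one_tailed_z_power(alpha, effect, sigma, n):
    """Power (1 - beta) of an upper-tail z-test when the true mean
    exceeds the null mean by `effect` (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # edge of the critical region under H0
    shift = effect * n ** 0.5 / sigma  # true effect measured in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

for alpha in (0.05, 0.01):
    power = one_tailed_z_power(alpha, effect=5, sigma=10, n=25)
    print(f"alpha = {alpha}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Under these assumptions, power is about .80 at α = .05 but only about .57 at α = .01. Lowering the Type I error risk raises β, the Type II error risk.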
