One example here is histological abbreviations. If one metric is histology, your column name would be "histology", and
instead of entering "ductal carcinoma in situ" for each data point, you would use the abbreviation DCIS and include a table legend in the
Statistics in Research Analysis Excel file for the statisticians to use as needed.
Many of you will be using the UW-L stat center for help on your data analysis. It is very
important that you submit your data to them in a universal format to avoid misunderstandings
and time delays in analysis.
7 steps in preparing data for analysis
1. Check data for accuracy and completeness
The data should be checked by someone other than the person who entered it. Ideally, the data should be checked by more than one person
(in groups of 3, the other 2 group members should check the data; in groups of 2, the other group member should check the data).
2. Label each variable in a codebook, on the instrument, or in a direct data entry program (Excel, Statistical Analysis Software (SAS),
Statistica, or the Statistical Package for the Social Sciences (SPSS); the latter is used by the UW-L statistical center)
For many of you, this will mean entering each variable you intend to measure, or anything else that has value to your study
results. For example, if you are measuring 8 metrics in planning, you would have 8 variables to list in your data entry program. If
those metrics depend on age or histology of disease, those would also be noted per data point. However, it's important to
think about the goal of your research. If you want to study the incidence of breast cancer among 20-30 year olds, age is important to
include. If you're only looking at the overall incidence of breast cancer, age is not relevant.
3. Assign variable labels to computer locations using the naming conventions of the particular statistical application you plan to use.
4. Develop a comprehensive codebook
Steps 3 and 4 go hand in hand. You can include the labels and the codebook in one file or keep them in separate files.
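The labeling, codebook, and checking steps above can be sketched in a few lines of Python. This is only an illustration, not the stat center's required format: the variable names, the "IDC" code, and the sample rows are all made up, and only the DCIS abbreviation comes from this lesson.

```python
import csv
import io

# Hypothetical codebook: variable names, labels, and allowed codes are
# illustrative only (DCIS is the abbreviation used in this lesson).
CODEBOOK = {
    "patient_id": {"label": "Study ID number", "type": "id"},
    "age": {"label": "Age in years", "type": "numeric"},
    "histology": {
        "label": "Histology of disease",
        "type": "coded",
        "codes": {
            "DCIS": "ductal carcinoma in situ",
            "IDC": "invasive ductal carcinoma",
        },
    },
}


def check_rows(rows, codebook):
    """Return a list of (row_number, variable, problem) tuples."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for name, spec in codebook.items():
            value = (row.get(name) or "").strip()
            if value == "":
                problems.append((number, name, "missing value"))
            elif spec["type"] == "numeric" and not value.replace(".", "", 1).isdigit():
                problems.append((number, name, "not a number"))
            elif spec["type"] == "coded" and value not in spec["codes"]:
                problems.append((number, name, "code not in codebook"))
    return problems


# Small in-memory stand-in for an exported spreadsheet (made-up data).
SAMPLE_CSV = """patient_id,age,histology
001,54,DCIS
002,,IDC
003,61,ductal
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_rows(rows, CODEBOOK)
for problem in problems:
    print(problem)
```

With this made-up data, the check flags the missing age in row 2 and the uncoded histology entry in row 3. A second group member would still need to compare the entries against the source records, since a script like this cannot tell whether a valid-looking value is the correct one.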
8/14/23, 12:06 PM Statistics in Research Analysis
Preparing and Organizing Data-Quantitative Continued
1. Transcription: involves transforming an audio recording into a written record
2. Organizing data
Similar to the quantitative approach, you must determine the best way to organize your data for ease of statistical interpretation.
Manual data entry in a program such as Excel is useful, but many survey tools, such as Qualtrics and SurveyMonkey, offer transcription and data
organization directly in their software.
Nominal: means "name"; simply labels attributes as a way to classify (e.g., cancer site, eye color, yes/no, 1/2).
No ordered relationship. The numbers are discrete codes (a value such as 2.5 has no meaning), and the numbers assigned have no
meaning when added or subtracted.
Ordinal: means to provide order. This value is also arbitrary or symbolic; it shows rank order but does
not show the magnitude of the differences between levels.
Rank or Likert scales, e.g., grade of skin reaction
Interval: similar to ordinal and nominal data, but has equal spacing between categories.
Fits temperature readings, IQ scores
Ratio: highest order of measurement; has equal intervals and a true zero point.
Frequencies
Usually displayed in tabular or graphic form; the most basic descriptive statistic

Year    # of programs with open seats    Avg % open    # open seats
2000    12                               52%           107
2001    17                               40.5%         105
2002    17                               34%           86
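As a rough illustration of a frequency table, the sketch below counts how often each category occurs and prints the counts with percentages. The skin reaction grades are made-up values, not data from this lesson.

```python
from collections import Counter

# Made-up ordinal data: skin reaction grade recorded for 10 patients.
grades = [1, 2, 2, 0, 1, 2, 3, 1, 2, 0]

frequency = Counter(grades)  # maps each grade to how many times it occurs
total = len(grades)

print("Grade  Count  Percent")
for grade in sorted(frequency):
    count = frequency[grade]
    print(f"{grade:<6} {count:<6} {100 * count / total:.0f}%")
```

The same counts could be passed to a charting tool to produce a bar chart, which is the graphic form of the same information.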
Charts
Charts also show frequencies
Central Tendencies
Mode
Score that occurs most frequently
https://softchalkcloud.com/lesson/files/iLo901ScPqgMQH/StatisticsInResearch_2020_print.html
Median
Score that divides the sample in half
Mean
The arithmetic average = Σx / n (the sum of all scores divided by the number of scores)
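The three measures of central tendency can be checked on a small set of scores. The sketch below uses Python's standard statistics module and made-up test scores.

```python
import statistics

# Made-up test scores for five students.
scores = [70, 85, 85, 90, 100]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score once sorted
mean = sum(scores) / len(scores)    # sum of x divided by n

print("mode:", mode)
print("median:", median)
print("mean:", mean)
```

Here the mode and median are both 85 and the mean is 86; the single high score of 100 pulls the mean slightly above the other two measures.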
Percentile = percent of scores below a particular score. (Would you rather be in the 98th percentile or the
2nd percentile of IQ scores? What percentile is the median score in a distribution?)
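The percentile definition above can be sketched directly as "percent of scores below a given score". The IQ values are made up, and percentile_rank is a hypothetical helper name; statistical packages may handle tied scores differently.

```python
import statistics


def percentile_rank(scores, score):
    """Percent of scores strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Made-up IQ scores for ten people.
iq_scores = [85, 90, 95, 100, 100, 105, 110, 115, 120, 130]

median_iq = statistics.median(iq_scores)
median_rank = percentile_rank(iq_scores, median_iq)
top_rank = percentile_rank(iq_scores, 130)

print("median:", median_iq, "percentile:", median_rank)
print("percentile of a score of 130:", top_rank)
```

In this sample, half of the scores fall below the median, which is why the median is described as the 50th percentile.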
In measuring anything in the natural world or any large population, a normal (bell-shaped) curve
is described. In other words, it is normal to see fewer individuals at either end of a continuum
than in the middle.
Frequencies can be asymmetrical, symmetrical, or skewed
Based on the mean and standard deviations.
In general, 68% of values will fall within 1 SD above or below the mean
In a normal curve, the mean, median, and mode are in the same location. In a skewed
distribution, the three measures fall in different places.
Variance
How scores are spread out from center
The range
Difference between highest and lowest scores
Variance = sum of the squared deviations from the mean, divided by the number of scores (n - 1 is used for a sample)
Standard deviation: square root of variance
Variance
Variance is a measure of dispersion, or how much the individual scores differ from the mean. If there
is little dispersion, the scores are similar. It shows how scores or numbers vary and provides
information about the scoring patterns of the entire group.
Standard deviation tells the researcher how scores deviate on average from the mean. On a normal
curve, 68% of scores will fall within one standard deviation of the mean.
Correlational Analysis
Expresses quantitatively the degree and direction of the relationship between variables.
Helps determine validity and reliability
Cause and effect
May be positive or negative (-1 to +1)
1 = perfect positive correlation (each variable changes at the same rate), -1 = perfect negative correlation
e.g., Reading level goes up one level with each year of school = perfect positive correlation
Scatter Diagrams
Scatter diagrams show correlation
Positive
As the value of one variable increases or decreases, the other variable changes in the same
direction
Negative
The variables change in opposing directions
Non-linear
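The measures of spread (range, variance, standard deviation) and the correlation coefficient described in this section can be worked through by hand or in a short script. The sketch below is one way to do it under stated assumptions: the scores are made up, the variance uses the sample form (dividing by n - 1), and sample_variance and pearson_r are hypothetical helper names. The reading-level data follows the lesson's own example of a perfect positive correlation.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / (len(values) - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient for paired values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Made-up scores for eight participants.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)
variance = sample_variance(scores)
standard_deviation = math.sqrt(variance)

# Reading level rises one level with each year of school.
years_of_school = [1, 2, 3, 4, 5]
reading_level = [1, 2, 3, 4, 5]
r = pearson_r(years_of_school, reading_level)

print("range:", value_range)
print("variance:", round(variance, 2))
print("standard deviation:", round(standard_deviation, 2))
print("r:", r)
```

Reversing one of the two lists in the reading-level example would give r = -1, a perfect negative correlation.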
Correlation Coefficients
Compare pairs of numbers
Pearson's r for parametric data
Spearman's rho (ρ) for non-parametric data
Can be used to predict the value of one variable given the value of the other once calculated
Inferential Analysis (Level Two)
Tools to determine the extent to which the observations of the sample are representative of the
population
Also determines whether conclusions about the population can be drawn from the sample
Statistical difference: Answers the question: What is the probability that this change occurred
because of events in the research study, and what is the probability that this change would
have occurred anyway, by chance?
Parametric vs Nonparametric
Parametric includes data that is either interval or ratio type
T-Test, ANOVA
Nonparametric includes nominal and ordinal data.
Chi-square, Kruskal-Wallis one-way analysis of variance
Category affects the type of statistical tests that can be utilized with that data
Parametric vs Nonparametric Continued
Rules that must apply to use parametric tests:
Sample must represent the target population so that the variables fall within the normal
distribution for that population
Must generate interval or ratio data
Random assignment to groups or matching must have occurred
Steps in the Inferential Process
I. State the hypothesis: restate the working hypothesis into the null hypothesis
II. Select a significance level: convention is p<0.05
III. Compute a calculated value: choose a test related to type of data
IV. Obtain a critical value from a table
V. Reject or fail to reject the null hypothesis
Steps in the Inferential Process Continued
Null Hypothesis
The null hypothesis states that there is NO relation between groups; it is easier to test the claim of no relation than to prove that a
relationship definitely exists.
Significance level: the researcher decides the level at which a result counts as significant
Choose a test, parametric or non-parametric (again, parametric for ratio and interval data, non-parametric
for nominal or ordinal data)
Chi-square: tests whether the observed frequencies of events in certain categories fall within the range
of frequencies expected to fall there.
Can compare two groups or the same group pre- and post-test
Other tests include the Mann-Whitney U test, Kruskal-Wallis one-way analysis of variance, and
Friedman two-way analysis of variance by ranks
Identify relationships between one set of variables, then determine if this relationship can be
applied to other sets of data.
Multiple regression, factor analyses, discriminant function analysis
YOU NEVER USE THE WORD "ACCEPT" TO DESCRIBE HYPOTHESES BECAUSE THERE IS THE POTENTIAL FOR TYPE 1 OR TYPE
2 ERRORS. REFER TO THE SAMPLE PAPERS FOR EXAMPLES.
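The five steps of the inferential process can be traced with a small chi-square example. This is a sketch under stated assumptions, not a substitute for the stat center's analysis: the 2x2 counts are made up, the null hypothesis is that the outcome is unrelated to group, no continuity correction is applied, and the critical value 3.841 is the standard chi-square table entry for 1 degree of freedom at a 0.05 significance level.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if group and outcome were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Steps I and II: null hypothesis of no relation; significance level 0.05.
# Made-up counts: rows are two groups, columns are reaction yes / no.
observed = [
    [10, 20],
    [20, 10],
]

# Step III: compute the calculated value.
statistic = chi_square_statistic(observed)

# Step IV: critical value from a chi-square table (df = 1, alpha = 0.05).
CRITICAL_VALUE = 3.841

# Step V: reject or fail to reject (never "accept") the null hypothesis.
if statistic > CRITICAL_VALUE:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print("chi-square:", round(statistic, 2))
print("decision:", decision)
```

With these counts every expected frequency is 15, so the calculated value is about 6.67, which exceeds the critical value. A larger table would need a different critical value because the degrees of freedom change.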