20231005-RESEARCH METHODS-Ch1

Academic Research Methods and Ethics
Looking at Data
• INSTRUCTOR : Prof. Dr. Ufuk Türen
• E-MAIL : ufuk.turen@ostimteknik.edu.tr
• SCHEDULE : Friday - 18.00-20.50
• PLACE : Class no - 424
Course Objective
• The main purpose of this course is to examine the research process (problem identification, data
collection, data analysis and interpretation of results), to review certain scientific research methods
(experimental method, descriptive method, historical method, etc.), and to enable students to learn
literature research, data collection, data evaluation and report writing through practice. The statistics
and software package (SPSS 25.0) required for data evaluation and report writing will also be used
during the course.
• This course covers the structure of science and scientific research, scientific methods and different
views on these methods, the research problem, the research model, population (universe) and sample,
data collection methods (quantitative and qualitative techniques), data recording, analysis,
interpretation and reporting. It explains research and writing techniques together with basic concepts
of the social sciences. The course also discusses ethical considerations in conducting and reporting
scientific research.
Course Content
WEEK 1   Introduction
WEEK 2   Introduction to Scientific Research Methods
WEEK 3
WEEK 4   Qualitative Research Methods
WEEK 5   Hypothesis Development; Questionnaire Design
WEEK 6   The Concept of Measurement, Attitude Measurement and Attitude Scales
WEEK 7   Sampling Fundamentals
WEEK 8   Midterm Exam
WEEK 9   Introduction to SPSS
WEEK 10  SPSS-I
WEEK 11
WEEK 12  SPSS-II
WEEK 13  Sample Analysis
WEEK 14  Ethics in Scientific Research; Research Report Preparation
WEEK 15  In-class presentations
WEEK 16  Final Exam
Grading
• Midterm Exam (30%)
• Final Exam (50%)
• Homework (20%)
Homeworks
• Problem-based homework every week
Clinical Data Example
• 1. Kline et al. (2002)
– The researchers analyzed data from 934 emergency room patients
with suspected pulmonary embolism (PE). Only about 1 in 5 actually
had PE. The researchers wanted to know what clinical factors
predicted PE.
– I will use four variables from their dataset today:
• Pulmonary embolism (yes/no)
• Age (years)
• Shock index = heart rate/systolic BP
• Shock index categories = take shock index and divide it into 10 groups (lowest
to highest shock index)
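As a minimal sketch of the shock index definition above, the ratio can be computed as follows. The function name `shock_index` and the example patient values are illustrative assumptions, not taken from Kline et al.

```python
def shock_index(heart_rate_bpm: float, systolic_bp_mmhg: float) -> float:
    """Shock index = heart rate / systolic blood pressure."""
    if systolic_bp_mmhg <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate_bpm / systolic_bp_mmhg


# Hypothetical patient (made-up values, for illustration only).
example_si = shock_index(heart_rate_bpm=90, systolic_bp_mmhg=120)
print(example_si)  # 0.75
```

The result is a continuous variable; cutting it into 10 ordered groups turns it into an ordinal one.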
Descriptive Statistics
Types of Variables: Overview
Categorical: Binary, Nominal, Ordinal
Quantitative: Discrete, Continuous
• Binary: 2 categories
• Nominal: + more categories
• Ordinal: + order matters
• Discrete: + numerical
• Continuous: + uninterrupted
Categorical Variables
Also known as “qualitative.”
Categories.
• Treatment groups
• Exposure groups
• Disease status
• Dichotomous (binary) – two levels
• Dead/alive
• Treatment/placebo
• Disease/no disease
• Exposed/Unexposed
• Heads/Tails
• Pulmonary Embolism (yes/no)
• Male/female
• Nominal variables – Named categories. Order doesn’t matter!
• The blood type of a patient (O, A, B, AB)

• Marital status
• Occupation
• Ordinal variable – Ordered categories. Order matters!
• Staging in breast cancer as I, II, III, or IV

• Birth order—1st, 2nd, 3rd, etc.
• Letter grades (A, B, C, D, F)
• Ratings on a scale from 1-5
• Ratings on: always; usually; many times; once in a while; almost never; never
• Age in categories (10-20, 20-30, etc.)
• Shock index categories (Kline et al.)
Quantitative Variables
• Numerical variables; may be arithmetically
manipulated.
– Counts
– Time
– Age
– Height
• Discrete Numbers – a limited set of distinct values, such as
whole numbers.
• Number of new AIDS cases in CA in a year (counts)

• Years of school completed
• The number of children in the family (cannot have a half a child!)
• The number of deaths in a defined time period (cannot have a partial death!)
• Roll of a die
• Continuous Variables - Can take on any number within a
defined range.
• Time-to-event (survival time)

• Age
• Blood pressure
• Serum insulin
• Speed of a car
• Income
• Shock index (Kline et al.)
Looking at Data
• How are the data distributed?
– Where is the center?

– What is the range?
– What’s the shape of the distribution (e.g., Gaussian, binomial,
exponential, skewed)?
• Are there “outliers”?
• Are there data points that don’t make sense?
The first rule of statistics:
USE COMMON SENSE!
90% of the information is contained in the graph.
Frequency Plots (univariate)
Categorical variables
– Bar Chart
Continuous variables
– Box Plot
– Histogram
Bar Chart
• Used for categorical variables to show frequency or proportion in each category.
• Translate the data from frequency tables into a pictorial representation.
Bar Chart: categorical variables
[Figure: bar chart with two bars, NO and YES]
Bar Chart for SI categories
[Figure: bar chart; x-axis: Shock Index Category (1 to 10); y-axis: Number of Patients (0 to 200)]
Much easier to extract information from a bar chart than from a table!
Box plot and histograms: for continuous variables
To show the distribution (shape, center, range, variation) of continuous variables.
Box Plot: Shock Index
[Figure: box plot of SI; y-axis: Shock Index Units, 0.0 to 2.0]
• maximum (1.7)
• Outliers: points beyond the upper “whisker”
• Upper “whisker”: Q3 + 1.5 IQR = 0.8 + 1.5(0.25) = 1.175
• 75th percentile (0.8)
• median (0.66)
• 25th percentile (0.55)
• interquartile range (IQR) = 0.8 - 0.55 = 0.25
• Lower “whisker”: minimum (or Q1 - 1.5 IQR)
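The whisker arithmetic on the shock index box plot can be checked with a short sketch. The quartiles and maximum are the values quoted on the slide; the variable names are illustrative.

```python
# Values read from the shock index box plot.
q1, median, q3 = 0.55, 0.66, 0.80
maximum = 1.7

iqr = q3 - q1                  # interquartile range
upper_fence = q3 + 1.5 * iqr   # upper whisker limit
lower_fence = q1 - 1.5 * iqr   # lower whisker limit

# Points beyond a fence are drawn individually as outliers.
has_high_outliers = maximum > upper_fence

print(round(iqr, 3), round(upper_fence, 3), round(lower_fence, 3))
print(has_high_outliers)
```

Because the maximum (1.7) lies above the upper limit, the upper whisker stops short of it and the points beyond appear as outliers.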
Histogram of SI
[Figure: histogram of SI; x-axis: SI, 0.0 to 2.0; y-axis: Percent, 0.0 to 25.0]
Histogram
100 bins (too much detail)
[Figure: histogram of SI with 100 bins; x-axis: SI, 0.0 to 2.0; y-axis: Percent, 0.0 to 6.0]
Histogram
2 bins (too little detail)
[Figure: histogram of SI with 2 bins; x-axis: SI, 0.0 to 2.0; y-axis: Percent, 0.0 to 200.0]
Box Plot: Shock Index
[Figure: box plot of SI; y-axis: Shock Index Units, 0.0 to 2.0]
Box Plot: Age
More symmetric
[Figure: box plot of AGE; y-axis: Years, 0.0 to 100.0; marked from top to bottom: maximum, 75th percentile, median, 25th percentile, minimum; the interquartile range spans the 25th to 75th percentile]
Histogram: Age
[Figure: histogram of AGE; x-axis: AGE (Years), 0.0 to 100.0; y-axis: Percent, 0.0 to 14.0]
Some histograms from your class (n=24)
Starting with politics.
Feelings about math and writing
Optimism
Diet
Habits
Measures of central tendency
• Mean
• Median
• Mode
Central Tendency
• Mean – the average; the balancing point
calculation: the sum of values divided by the sample size
In math shorthand:
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Mean: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{17 + 19 + 21 + 22 + 23 + 23 + 23 + 38}{8} = 23.25 \]
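The same calculation as a minimal Python sketch, using the eight participant ages from the example (the variable names are illustrative):

```python
ages = [17, 19, 21, 22, 23, 23, 23, 38]

# Mean = sum of the values divided by the sample size.
mean_age = sum(ages) / len(ages)
print(mean_age)  # 23.25
```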
Mean of age in Kline’s data
Means Section of AGE
Parameter        Value
Mean             50.19334
Median           49
Geometric Mean   46.66865
Harmonic Mean    43.00606
Sum              46730
Mode             49
556.9546
[Figure: histogram of AGE; x-axis: 0.0 to 100.0; y-axis: Percent, 0.0 to 14.0]
Mean of age in Kline’s data
[Figure: the same histogram with the mean marked as “The balancing point”]
Mean
• The mean is affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis; left: Mean = 3, right: Mean = 4]
\[ \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4 \]
Central Tendency
• Median – the exact middle value
Calculation:
• If there are an odd number of observations, find the middle value
• If there are an even number of observations, find the middle two
values and average them.
Median: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
Median = (22+23)/2 = 22.5
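The odd/even rule can be sketched as a small function. The helper name `median_of` is hypothetical; the result is cross-checked against `statistics.median` from the Python standard library.

```python
import statistics


def median_of(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [17, 19, 21, 22, 23, 23, 23, 38]
median_age = median_of(ages)
print(median_age)  # 22.5
print(median_age == statistics.median(ages))  # True
```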

Median of age in Kline’s data
Means Section of AGE: Median = 49
[Figure: histogram of AGE (Years); x-axis: 0.0 to 100.0; y-axis: Percent, 0.0 to 14.0; the median splits the distribution with 50% of mass on each side]
Does PE have a median?
• Yes, if you line up the 0’s and 1’s, the middle number is 0.
Median
• The median is not affected by extreme values (outliers).
[Figure: two dot plots on a 0 to 10 axis; left: Median = 3, right: Median = 3]
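A short sketch of this contrast, using the two five-point data sets from the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative:

```python
import statistics

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_before = statistics.mean(without_outlier)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(without_outlier)
median_after = statistics.median(with_outlier)

# The outlier pulls the mean from 3 to 4; the median stays at 3.
print(mean_before, mean_after)
print(median_before, median_after)
```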
Central Tendency
• Mode – the value that occurs most frequently
Mode: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
Mode = 23 (occurs 3 times)
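As a minimal sketch, the mode and its frequency can be found with `collections.Counter` from the Python standard library:

```python
from collections import Counter

ages = [17, 19, 21, 22, 23, 23, 23, 38]

# most_common(1) returns the most frequent value together with its count.
mode_age, mode_count = Counter(ages).most_common(1)[0]
print(mode_age, mode_count)  # 23 3
```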

Mode of age in Kline’s data
Means Section of AGE: Mode = 49

Mode of PE?
• 0 appears more than 1, so 0 is the mode.
Measures of Variation/Dispersion
• Range
• Percentiles/quartiles
• Interquartile range
• Standard deviation/Variance
Range
• Difference between the largest and the smallest observations.
Range of age: 94 years - 15 years = 79 years
[Figure: histogram of AGE (Years); x-axis: 0.0 to 100.0; y-axis: Percent, 0.0 to 14.0]
Range of PE?
• 1-0 = 1
Quartiles
[Figure: ordered data split by Q1, Q2 and Q3 into four parts of 25% each]
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3
Interquartile Range
• Interquartile range = 3rd quartile - 1st quartile = Q3 - Q1
Interquartile Range: age
[Figure: box plot of age with 25% of the observations in each of the four segments]
minimum = 15, Q1 = 35, Median (Q2) = 49, Q3 = 65, maximum = 94
Interquartile range = 65 - 35 = 30
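A small sketch of both spread measures computed from the five-number summary of age quoted above. The dictionary layout is an illustrative choice, not part of the source.

```python
# Five-number summary of age from the slide.
age_summary = {"min": 15, "q1": 35, "median": 49, "q3": 65, "max": 94}

age_range = age_summary["max"] - age_summary["min"]  # largest minus smallest
age_iqr = age_summary["q3"] - age_summary["q1"]      # spread of the middle 50%

print(age_range, age_iqr)  # 79 30
```

The range depends only on the two most extreme observations, while the IQR ignores the outer quarters of the data.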
Variance
• Average (roughly) of squared deviations of values from the mean
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \]
Why squared deviations?
• Adding deviations will yield a sum of 0.
• Absolute values are tricky!
• Squares eliminate the negatives.
• Result:
– Increasing contribution to the variance as you go farther from
the mean.
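These points can be illustrated with the eight participant ages used in the mean example. This is a sketch with illustrative variable names.

```python
ages = [17, 19, 21, 22, 23, 23, 23, 38]
mean_age = sum(ages) / len(ages)  # 23.25

deviations = [x - mean_age for x in ages]

# Raw deviations cancel out: negatives and positives sum to zero.
sum_of_deviations = sum(deviations)

# Squaring removes the signs and weights distant points more heavily:
# the single age of 38 contributes 14.75**2 = 217.5625 of the total.
sum_of_squares = sum(d ** 2 for d in deviations)
largest_contribution = max(d ** 2 for d in deviations)

print(sum_of_deviations, sum_of_squares, largest_contribution)
```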
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}} \]
Calculation Example:
Sample Standard Deviation
Age data (n=8) : 17 19 21 22 23 23 23 38
n = 8, Mean = \bar{X} = 23.25
\[ S = \sqrt{\frac{(17 - 23.25)^2 + (19 - 23.25)^2 + \cdots + (38 - 23.25)^2}{8 - 1}} = \sqrt{\frac{281.5}{7}} \approx 6.3 \]
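The calculation can be reproduced step by step in Python. This sketch follows the n - 1 formula and cross-checks the result against `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

ages = [17, 19, 21, 22, 23, 23, 23, 38]
n = len(ages)
mean_age = sum(ages) / n

# Sum of squared deviations from the mean, divided by n - 1.
sum_of_squares = sum((x - mean_age) ** 2 for x in ages)
sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 1))  # 6.3
print(math.isclose(sample_sd, statistics.stdev(ages)))  # True
```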
Std. dev is a measure of the “average” scatter around the mean.
Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here rough guess for SD is 79/6 = 13
[Figure: histogram of AGE (Years); x-axis: 0.0 to 100.0; y-axis: Percent, 0.0 to 14.0]
Std. Deviation age
Variation Section of AGE
Parameter            Value
Variance             333.1884
Standard Deviation   18.25345
Std Dev of Shock Index
Std. dev is a measure of the “average” scatter around the mean.
Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here rough guess for SD is 1.4/6 = .23
[Figure: histogram of SI; x-axis: SI, 0.0 to 2.0; y-axis: Count, 0.0 to 250.0]
Std. Deviation SI
Variation Section of SI
Parameter             Value
Variance              4.155749E-02
Standard Deviation    0.2038566
Std Error of Mean     6.681129E-03
Interquartile Range   0.2460432
1.430856
Std. Dev of binary variable, PE
\[ S = \sqrt{\frac{181 (1 - 0.1944)^2 + 750 (0 - 0.1944)^2}{931 - 1}} = \sqrt{\frac{145.8}{930}} = 0.3959 \]
Std. dev is a measure of the “average” scatter around the mean.
[Figure: bar chart of PE; 80.56% without PE, 19.44% with PE]
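For a variable coded 0/1, the mean is simply the proportion of 1s, and the standard deviation follows from the same n - 1 formula. Here is a minimal sketch using the counts on the slide (181 patients with PE, 750 without); the variable names are illustrative.

```python
import math

n_with_pe = 181     # coded 1
n_without_pe = 750  # coded 0
n = n_with_pe + n_without_pe  # 931

# Mean of a 0/1 variable = proportion of 1s.
mean_pe = (n_with_pe * 1 + n_without_pe * 0) / n

# Each group contributes (count) x (squared deviation from the mean).
sum_of_squares = (n_with_pe * (1 - mean_pe) ** 2
                  + n_without_pe * (0 - mean_pe) ** 2)
sd_pe = math.sqrt(sum_of_squares / (n - 1))

print(round(mean_pe, 4), round(sd_pe, 4))
```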
Std. Deviation PE
Variation Section of PE
Parameter            Value
Variance             0.156786
Standard Deviation   0.3959621

Comparing Standard Deviations
[Figure: three dot plots on a common axis from 11 to 21]
• Data A: Mean = 15.5, S = 3.338
• Data B: Mean = 15.5, S = 0.926
• Data C: Mean = 15.5, S = 4.570
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Bienaymé-Chebyshev Rule
• Regardless of how the data are distributed, a certain percentage of values must fall
within k standard deviations from the mean:
Note use of σ (sigma) to represent “standard deviation.”
Note use of μ (mu) to represent “mean”.
At least             within
(1 - 1/1²) = 0%      k=1 (μ ± 1σ)
(1 - 1/2²) = 75%     k=2 (μ ± 2σ)
(1 - 1/3²) = 89%     k=3 (μ ± 3σ)
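The bound 1 - 1/k² can be tabulated with a short sketch. The function name `chebyshev_lower_bound` is a hypothetical helper, not a library call.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k ** 2)


bounds = {k: chebyshev_lower_bound(k) for k in (1, 2, 3)}
for k, bound in bounds.items():
    print(f"k={k}: at least {bound:.0%}")
```

For k = 1 the bound is 0%, so the rule only becomes informative for k greater than 1.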
Symbol Clarification
• S = Sample standard deviation (example of a “sample
statistic”)
• σ = Standard deviation of the entire population (example
of a “population parameter”) or from a theoretical
probability distribution
• X̄ = Sample mean
• µ = Population or theoretical mean
**The beauty of the normal curve:
No matter what μ and σ are, the area between μ-σ and μ+σ is about 68%; the area
between μ-2σ and μ+2σ is about 95%; and the area between μ-3σ and μ+3σ is
about 99.7%. Almost all values fall within 3 standard deviations.
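These areas can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2), whatever μ and σ are. This sketch uses `math.erf` from the Python standard library; the function name `normal_area_within` is illustrative.

```python
import math


def normal_area_within(k: float) -> float:
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))


areas = {k: normal_area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.1%}")
```

These percentages are much larger than the Bienaymé-Chebyshev minimums because they assume a normal shape rather than an arbitrary distribution.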
68-95-99.7 Rule
[Figure: normal curve with 68% of the data within μ ± 1σ, 95% of the data within μ ± 2σ, and 99.7% of the data within μ ± 3σ]
Summary of Symbols
• S² = Sample variance
• S = Sample standard dev.
• σ² = Population (true or theoretical) variance
• σ = Population standard dev.
• X̄ = Sample mean
• µ = Population mean
• IQR = interquartile range (middle 50%)
What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.69
Notice the X-axis
Correctly scaled X-axis…
Report of the Presidential Commission on the Space Shuttle Challenger Accident, 1986 (vol 1, p. 145)
The graph excludes the observations where no O-rings failed.
Smooth curve at least shows the trend toward failure at high and low temperatures…
• http://www.math.yorku.ca/SCS/Gallery/
Even better: graph all the data (including non-failures) using a logistic regression model
Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87, 423-426
What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.74
What’s the message here?
[Two figures] Diagraphics II, 1994
[Four figures] From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.
For more examples…
• http://www.math.yorku.ca/SCS/Gallery/
“Lying” with statistics
• More accurately, misleading with statistics…
Example 1: projected statistics
Lifetime risk of melanoma:
1935: 1/1500
1960: 1/600
1985: 1/150
2000: 1/74
2006: 1/60
http://www.melanoma.org/mrf_facts.pdf
• How do you think these statistics are calculated?
• How do we know what the lifetime risk of a person born in 2006 will be?
Interestingly, a clever clinical researcher recently went back and calculated (using
SEER data) the actual lifetime risk (or risk up to 70 years) of melanoma for a person
born in 1935.
The answer?
Closer to 1/150 (one order of magnitude off)
(Martin Weinstock of Brown University, AAD conference 2006)

Example 2: propagation of statistics
• In many papers and reviews of eating disorders in women athletes, authors cite
the statistic that 15 to 62% of female athletes have disordered eating.
• I’ve found that this statistic is attributed to about 50 different sources in the
literature and cited all over the place with or without citations...
For example…
• In a recent review (Hobart and Smucker, The Female Athlete Triad, American Family
Physician, 2000):
• “Although the exact prevalence of the female athlete triad is unknown, studies have
reported disordered eating behavior in 15 to 62 percent of female college athletes.”
• No citations given.
And…
• Fact Sheet on eating disorders:
• “Among female athletes, the prevalence of eating disorders is reported to be
between 15% and 62%.”
Citation given: Costin, Carolyn. (1999) The Eating Disorder Source Book: A
comprehensive guide to the causes, treatment, and prevention of eating disorders. 2nd
edition. Lowell House: Los Angeles.
And…
• From a Fact Sheet on disordered eating from a college website:
• “Eating disorders are significantly higher (15 to 62 percent) in the athletic
population than the general population.”
• No citation given.
And…
• “Studies report between 15% and 62% of college women engage in problematic weight
control behaviors (Berry & Howe, 2000).” (in The Sport Journal, 2004)
• Citation: Berry, T.R. & Howe, B.L. (2000, Sept). Risk factors for disordered eating
in female university athletes. Journal of Sport Behavior, 23(3), 207-219.
And…
• 1999 NY Times article
• “But informal surveys suggest that 15 percent to 62 percent of female athletes are
affected by disordered behavior that ranges from a preoccupation with losing weight to
anorexia or bulimia.”
And…
• “It has been estimated that the prevalence of disordered eating in female athletes
ranges from 15% to 62%.” (in Journal of General Internal Medicine 15 (8), 577-590.)
• Citations:
Steen SN. The competitive athlete. In: Rickert VI, ed. Adolescent Nutrition:
Assessment and Management. New York, NY: Chapman and Hall; 1996:223-47.
Tofler IR, Stryer BK, Micheli LJ. Physical and emotional problems of elite female
gymnasts. N Engl J Med. 1996;335:281-3.
Where did the statistics come from?
• The 15%: Dummer GM, Rosen LW, Heusner WW, Roberts PJ, and Counsilman JE.
Pathogenic weight-control behaviors of young competitive swimmers. Physician
Sportsmed 1987; 15: 75-84.
• The “to”: Rosen LW, McKeag DB, Hough DO, Curley VC. Pathogenic weight-control
behaviors in female athletes. Physician Sportsmed 1986; 14: 79-86.
• The 62%: Rosen LW, Hough DO. Pathogenic weight-control behaviors of female college
gymnasts. Physician Sportsmed 1988; 16: 140-146.
Where did the statistics come from?
• Study design? Control group?
– Cross-sectional survey (all)
– No non-athlete control groups
• Population/sample size?
– Convenience samples
– Rosen et al. 1986: 182 varsity athletes from two midwestern universities
(basketball, field hockey, golf, running, swimming, gymnastics, volleyball, etc.)
– Dummer et al. 1987: 486 swimmers aged 9 to 18 at a swim camp
– Rosen et al. 1988: 42 college gymnasts from 5 teams at an athletic conference
Where did the statistics come from?
• Measurement?
– Instrument: Michigan State University Weight Control Survey
– Disordered eating = at least one pathogenic weight control behavior:
• Self-induced vomiting
• Fasting
• Laxatives
• Diet pills
• Diuretics
• In the 1986 survey, they required use 1/month; in the 1988 survey, they required use
twice-weekly
• In the 1988 survey, they added fluid restriction
Where did the statistics come from?
• Findings?
– Rosen et al. 1986: 32% used at least one “pathogenic weight-control behavior”
(ranges: 8% of 13 basketball players to 73.7% of 19 gymnasts)
– Dummer et al. 1987: 15.4% of swimmers used at least one of these behaviors
– Rosen et al. 1988: 62% of gymnasts used at least one of these behaviors
References
• http://www.math.yorku.ca/SCS/Gallery/
• Kline et al. Annals of Emergency Medicine 2002; 39: 144-152.
• Statistics for Managers Using Microsoft® Excel, 4th Edition. Prentice-Hall, 2004.
• Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87, 423-426.
• Tufte, ER. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983.
• Wainer, H. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. 1997.
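Mean of Pulmonary Embolism? (Binary variable?)
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{181 \times 1 + 750 \times 0}{931} = \frac{181}{931} = 0.1944 \]
[Figure: histogram of PE; x-axis: PE, 0.0 to 1.0; y-axis: Percent, 0.0 to 100.0; bar at 0: 80.56% (750); bar at 1: 19.44% (181)]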
Third-generation, innovative and entrepreneurial university model
www.ostimteknik.edu.tr
