Advertisement
Does it work?
What are the effects?
For whom does it work?
Different effects for different people
Why does it work (or not)?
What scientific research is
*a systematic process of gathering theoretical knowledge through observation
Observation= empiricism
Empirical
- based on social reality
Systematic and cumulative
-Builds on previous research
-In search for patterns and associations
Public
- verifiable
- open for criticism
- results are always preliminary
Objective
- findings are not personal/subjective
-Uniform rules (comparable results)
Predictive
- results in predictions
Communication research
Communication scientists study an aspect of communication
Social media
Organizational/group
Interpersonal
Rhetoric or persuasion
Communication technology
Corporate, persuasive, entertainment, and political communication
Reasons for research
Theory -> knowledge gap -> fundamental research. The purpose of fundamental (or basic, or
pure) research is to contribute to science
Assignment -> problem of practice -> applied research. The purpose of applied research is to
acquire knowledge to solve a practical problem
Research is a systematic process of
Posing questions
Answering questions
Demonstration that your results are valid
Sharing your research results
Communication research is a systematic process of asking and answering questions about
human communication
Research strategies
Quantitative -measuring with numbers
-measurements
-numbers
- testing theory
Qualitative - measuring with words
- no measurements
-words
-generating theory
Research design
Experimental - used to look into effects
Cross-sectional - used to look into association correlation
Longitudinal - association/relationship/correlation
Data collection methods
Observation - during an experiment or in real life
Posing questions
- survey,
- in-depth interviews ,
- focus groups interviews
Content analysis
- content of existing sources (text, images)
2A
Conducting research
World view I- human communication is objectively measurable and can be summarized in
rules - nomothetic approach
World view II- human communication is subjective, individualistic and must be described as
such - idiographic approach
ONTOLOGY - what is out there ?
what is real? How can we view them?
Can we compare social behavior and non-material things with physical objects? Is the
process of studying them similar?
Assumptions based on ontology
Objectivism
the underlying reality (e.g., an attitude) has the characteristics of an object (objective reality).
Allows generalization.
Constructionism
social entities such as attitude are considered social constructions, not objects.
Researchers prefer specific versions of social reality rather than one that can be
regarded as definite.
EPISTEMOLOGY- how should we know that ?
questions of how communication should be understood: what is knowledge, what counts as
knowledge?
Assumptions based on epistemology
Positivism - generalizing
- application of methods of natural sciences for the study of social reality
- knowledge is what can be perceived by the senses
- knowledge is arrived at through the gathering of facts that provide the basis for
rules and laws
- Theory should lead to hypothesis
Interpretivism
- there is a difference between people and the objects of the natural sciences.
Researchers should focus on the subjective meaning of social action
Two scientific approaches
Empirical-analytical
- objectivism
-positivism
- observe, measure from researcher's perspective
- empiricism rather than logic
- Explaining- from researchers perspective
- rule out alternative explanations
- nomothetic approach (worldview I)
- quantitative and experiment, survey, content analysis
Empirical-Interpretive (less often)
-constructionism
-interpretivism
- observing and interpreting from participants' perspectives
- starting without theory, but observations
- idiographic (worldview II)
- qualitative, individual interviews, focus groups, ethnography
Empirical cycle
Induction- empirical-interpretive approach
Deduction- empirical-analytical approach
Scientific method:
Hypothesis should be
Empirically testable - systematic, collect empirical data, research plan
Replicable
Objective
Transparent
Logically consistent
- don’t change interpretation/hypothesis after data collection
- be consistent in what confirms and disconfirms your hypothesis
Falsifiable - a statement that is not falsifiable usually needs some sort of exhaustive
search of all possibilities to disprove it
- verification - confirming; fits the interpretive approach
- falsification - refuting; fits the analytical approach
- provisional truth - I found support for my hypothesis, but it is not ACCEPTED fully
and forever
Problem definition and hypotheses
Problem definition - observation and induction stages
Based on the research objective and research question
Research objective (the aim of the study)
- Indicates the scientific and/or societal impact of study
- indicates the goal of the study (exploration, explanation, prediction, control,
description, interpretation, criticism )
- fundamental (about science with literature) or applied research (practice) - not
mutually exclusive they require each other
Formulating research objectives
- complete
- not vague
- clear (unambiguous)
- not too broad
- Societal relevance (more complete understanding is required, penetrating social life)
- scientific relevance (little is known, results will help us to understand it)
Formulating research questions (what does the researcher want to know?) - the starting point
for the research
- clear
- researchable
Connection with established research and theory
Linked to other RQs
Original contribution unless replication study
Not too broad/too narrow
Always end with ?
Should fit the research objectives
May not contain unsubstantiated or incorrect assumptions
Is not the same as a survey question
Clear not vague
Open-ended (is there a relationship between variables?) or closed-ended (what is the
direction of the relationship between variables? Decline/growth)
RQs
Descriptive (how)
Explanatory (why?) - 2 or more variables
Predictive - 2 or more variables
Relational (to what extent are variables related to each other?) - we do not know what
the cause is
Causal (does one variable affect/influence another?)
Hypothesis
Two-tailed - there is a relationship (non-directional)
One-tailed - there is a direction - one grows and another declines, etc.
Null - there is no relationship; used in statistics (no effect in the study)
Sometimes hypotheses are non-testable (definitions, speculative/normative(should)
statements or statements without specific definitions of place and time)
3A
Conceptual model - schematic overview of my hypothesis (theoretical assumptions)
Includes main concepts(variables)
Independent (predictor) and dependent (outcome) variable
Indicate relationships with arrows ((a)symmetrical)
Indicate the type of the relationship - positive/negative
POSITIVE RELATIONSHIP
More of one, more of another and vice versa
NEGATIVE RELATIONSHIP
More of one, less of the other and vice versa
asymmetrical
Causal or predictive: asymmetrical
One variable affects or predicts the other variable
Independent variable ------------------------> dependent one
How to draft the conceptual model ?
Identify variables and state their nature ( predictor and outcome)
Identify the relationships between them (positive/negative,
symmetrical/asymmetrical)
Building blocks of hypothesis
Units of analysis / cases
All subjects or objects that are mentioned
E.g., people, organisations, newspapers, countries...
Variables
Characteristics of the units of analysis
E.g., age, sex, level of education, aggression
Variables vary (units of analysis vary on these variables)
Values
Possible categories per characteristic (variable)
0‐99 years, male/female, low/high, elementary school/university
You observe these categories
Operationalization- making my concept measurable
1. Come up with measurable definition of a concept(construct/variable)
2. Choose indicators
3. Develop questions/statements (items)
4. Assign measurement scales
Why is it important?
I want my research to be replicable
I want to be able to compare it with results from different studies
Definition - makes the concept measurable; it's a description of the concept in MY research that
is as accurate as possible
You can't use the concept itself in the description
You can't use synonyms of the concept in the description
You can't include concepts that need their own definition to be understood correctly
(unless you provide it)
It should contain all info needed (completeness)
It should exclude what is not a part of the concept
The definition should not have a normative character
Measurement scale
Likert answer scale
- 5/7/10-point
- completely disagree- completely agree
- neutral option possible
- several Likert items (indicators) that intend to measure the same concept - Likert
SCALE
Semantic differential scale
- semantic= word, differential = distinction
- different words that define/describe the concept/variable
- several semantic differential items together form a SEMANTIC DIFFERENTIAL SCALE
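Several Likert items measuring the same concept are usually combined into one scale score by averaging; a minimal Python sketch (the concept, items, and answers are made up for illustration):

```python
def likert_scale_score(items):
    """Average a respondent's answers on several Likert items measuring one concept."""
    return sum(items) / len(items)

# One respondent's answers to three 5-point "trust in media" items
# (1 = completely disagree ... 5 = completely agree)
answers = [4, 5, 3]
print(likert_scale_score(answers))  # 4.0
```

In practice you would first check (e.g., with a reliability analysis) that the items really do measure the same concept before averaging them.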
Manifest variables
Directly measurable
E.g. Sex, age, education level
Latent variables
Not directly measurable
E.g. trust in media, gender (complex), attitude towards something, opinions
Concept/variable often measured by means of several aspects (questions)
What can go wrong?
You don’t measure what you define
You measure what you define but not in a consistent way
Validity
Degree to which our measure is free from systematic error (bias)
E.g. not measuring what we intended to measure, socially desirable answers, respondents
know the goal of the study, respondent wants to meet expectations of researchers
Reliability
Degree to which our measure is free from random error (noise)
E.g. not measuring something in a consistent way, as many negative as positive errors (they
cancel each other out)
Summary
4A
External validity
- Can what we observe in our research context be generalized to a wider context
(population)?
Population validity -is our sample representative for our target population?
Ecological validity - Is the research context true to life? Applicable in other cultures,
circumstances?
MODERATION
Moderated relationship - referred to as an interaction or specification
o Moderator variables influence the strength of a relationship between two other
variables
o The relationship between two variables is different for different levels of the
moderator
o It is a third variable
How a third variable influences the relationship that we are interested in:
No moderation- hypothesis : exposure to media violence positively affects
aggression
Moderation - hypothesis: the positive influence of exposure to media violence
on aggression is stronger for females than for males (moderated by sex)
MODERATOR INFLUENCES the relationship between variables
MEDIATION
Mediated relationship-
o Mediator variable explains the relationship between the other two variables
o Mediator variable is influenced by the independent variable (A) and the mediator variable
influences the dependent variable (B).
o The mediator variable influences the dependent variable but not the independent
one!
How it works
No mediation- hypothesis: exposure to media violence positively affects aggression
Mediation- hypothesis: the positive influence of exposure to media violence on
aggression is explained by social norms. Media violence influences social norms and
social norms influence aggression.
There is a direct relationship between the independent variable and dependent
variable ( C )
The relationship is mediated by a third variable, indicating an indirect relationship
between the independent and dependent variable
There is an asymmetrical relationship between the independent variable and the third
variable
There is an asymmetrical relationship between the third variable and the dependent
variable
MEDIATOR EXPLAINS THE RELATIONSHIP BETWEEN THE VARIABLES
Partial mediation - the direct relationship between the independent variable and the
dependent variable becomes less strong when you add the mediator to the model
Full mediation - the direct relationship between the independent variable and the
dependent variable disappears when you add the mediator to the model.
If a third variable influences both the independent variable and the dependent variable -
SPURIOUS RELATIONSHIP - it seems like there is a relationship between X and Y
but there really is not (such a third variable is a confounder, not a mediator).
1B
Statistics- quantitative methods of analysis
Types of statistics
1. Descriptive statistics - techniques that are used to summarize a set of numbers
E.g. Mean, pie charts, percentage etc.
2. Inferential statistics - techniques that are used in making decisions based on data
The research process
Hypothesis - A statement about the relationship that we expect to find between variables
Variables
Operationalize variables
Based on theory or previous studies
Use these variables in your predictions
Type of statistical analysis depends on type of variables you use
Predictor---> Outcome
Independent ---> Dependent
X---> Y
Asymmetric relationship
- when the proposed relationship is based on an independent (predictor) variable and a
dependent (outcome) variable
Symmetric Relationship
- when it is not clear which variable is independent or dependent, e.g., is there a relation
between checking Facebook and understanding the teacher?
Types of analysis
Univariate analysis - statements about only 1 variable - we measure only one thing
e.g. How many hours a day are students online on social media ?- HOURS/DAY only
variable
Bivariate analysis - statements about 2 variables
E.g. Are students online on social media for more hours a day than lecturers? -
STUDENT/LECTURER and HOURS/DAY
Multivariate analysis - statements about more than 2 variables
e.g. Is the influence of drinking coffee on feeling cranky different for students than for
lecturers? - DRINKING COFFEE (yes/no),
STUDENT/LECTURER, FEELING CRANKY (yes/no)
Types of measurements
NOIR
Nominal -lowest
- labels e.g. Germany is 2 and 20% of students are Dutch
-assign individuals to categories
-magnitude of number not meaningful
- calculations are not possible
Ordinal
- ordered labels, e.g. "How do you feel today?" on a 1-5 scale - but what is the difference between the values?
-calculations not possible
- magnitude of values indicates order of events
-magnitude of differences between events not meaningful
Interval
-magnitude of values indicates order of events
- magnitude of differences between events is meaningful
- addition and subtraction is possible
- No meaningful zero point - you cannot multiply/divide scores
e.g. temperature in °C or °F (0° does not mean "no temperature")
Ratio - highest
- magnitude of values indicates orders of events
- magnitude of differences between events is meaningful
- addition and subtraction is possible
- meaningful zero point - you can multiply and divide
SUMMARY
Level of measurement determined by operationalization of variables
Choose level as high as possible = more types of statistical analyses
Sometimes fixed level e.g. sex or eye color
Interval and ratio data are numerical
Nominal and ordinal data are not numerical
Measures of central tendency
MMM
Mode
the most frequent score
nominal level of measurement
can be used for every level of measurement, but for nominal variables the mode is the only option
E.g. THE MODE IS 3! Although dragon fruit was rated a 4, the value 3 occurs most often:
1. Acai - 2
2. Pineapple - 3
3. Dragon fruit - 4
4. Coconut - 3
Median
the middle score when scores are ranked - ordered
If the number of scores is odd you get the middle one
If the number of scores is even you take the average of the two in the middle
Mean
The sum of the scores divided by the number of scores
only used with interval/ratio variables
Can be easily affected by extreme scores
IS NOT INFORMATIVE WITH NOMINAL LEVEL VARIABLES
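The three measures above can be checked with Python's standard library; the smoothie ratings are the made-up values from the notes:

```python
import statistics

# Hypothetical smoothie ratings: acai=2, pineapple=3, dragon fruit=4, coconut=3
scores = [2, 3, 4, 3]

print(statistics.mode(scores))    # 3   (most frequent value, not the highest)
print(statistics.median(scores))  # 3.0 (even n: average of the two middle scores of 2,3,3,4)
print(statistics.mean(scores))    # 3.0 (sum 12 divided by 4 scores)
```

Note that the mode is 3 even though dragon fruit received the highest single rating (4); the mode counts frequency, not magnitude.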
The dispersion in a distribution
Dispersion for different types of variables
At least ordinal - range, IQR
At least interval - deviance, variance, SD
Nominal - no dispersion
RANGE - the smallest score subtracted from the largest
E.g. "the age ranges from the youngest to the oldest score"
Affected by extreme scores/ outliers
IQR - interquartile range - represents variability in (at least) ordinal data
The three values (quartiles) that split the sorted data into 4 equal parts
Put the data in order, split them in half at the median (Q2)
Second quartile Q2 - median
Lower Quartile Q1 - median of the lower half of the data
Upper Quartile Q3 - median of the upper half of the data
A lot of data is lost, but the IQR is not affected by extreme scores
IQR= Q3-Q1
Dispersion - Indicate the spread of scores , How different is each score from the center of
distribution
most often used measures of dispersion are
Deviance
- size is dependent on the number of scores in the data
- Individual score minus mean
- Total deviance - calculate deviance for every person and add them up
- Because deviances have negative and positive signs (they sum to zero), all the individual
deviances have to be squared before adding up
- Total sum of squares -> take the average
Variance
- more useful to work with the average dispersion known as the variance
- SS divided by N-1 ( number of cases)
Standard deviation (SD)
-The variance gives us a measure in units squared, SD is easier in interpretation
- is square root of variance
Always include average (mean) and SD
APA6 guidelines: M and SD in italics and round to 2 decimals (calculate with three decimals)
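The deviance → sum of squares → variance → SD chain above can be sketched in Python (the scores are made up for illustration):

```python
import statistics

scores = [2, 4, 4, 6, 9]
mean = statistics.mean(scores)                 # 5.0

deviances = [x - mean for x in scores]         # individual score minus mean
ss = sum(d ** 2 for d in deviances)            # total sum of squares (squaring removes signs)
variance = ss / (len(scores) - 1)              # SS divided by N-1
sd = variance ** 0.5                           # SD = square root of variance

print(max(scores) - min(scores))               # range: largest minus smallest = 7
print(variance)                                # 7.0 (same as statistics.variance(scores))
print(round(sd, 3))                            # 2.646
```

The manual result matches `statistics.variance` and `statistics.stdev`, which also use the N-1 denominator.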
Z-scores - Comparing scores on exams from different courses , level of measurement at
least interval
Standardizing a score with respect to the other scores in the group
Expresses a score in terms of how many SDs it is away from mean
The distribution of z-scores has a mean of 0 and SD of 1
Take each score and subtract the mean from it, then divide by the SD
We can use it to determine the chance of something occurring
The number of SD that a variable is below or above the mean (positive or negative z-score)
If we have a negative z-score we flip the smaller and larger portion values from the
table(from the appendix)
1.96 cuts off the top 2.5% of the distribution
-1.96 cuts off the bottom 2.5% of the distribution
As such, 95% of z-scores lie between -1.96 and 1.96.
99% of z-scores lie between -2.58 and 2.58
99.9% of z-scores lie between -3.29 and 3.29
See appendix A in Field and memorize the above and try to figure out why it is so
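A small Python sketch of z-scores and the memorized cut-offs; the exam grades are made up, and `statistics.NormalDist` plays the role of the table in Field's appendix:

```python
from statistics import NormalDist, mean, stdev

grades = [6.0, 7.5, 5.5, 8.0, 7.0]   # hypothetical exam grades
m, sd = mean(grades), stdev(grades)
z_scores = [(g - m) / sd for g in grades]   # each grade expressed in SDs from the mean

# Standard normal distribution: mean 0, SD 1
nd = NormalDist()
print(round(nd.cdf(1.96) - nd.cdf(-1.96), 3))  # proportion between -1.96 and 1.96 -> ~0.95
print(round(nd.cdf(2.58) - nd.cdf(-2.58), 2))  # proportion between -2.58 and 2.58 -> ~0.99
```

This is why ±1.96 cuts off 2.5% in each tail: the area between the two cut-offs is 95% of the standard normal distribution.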
Standard normal distribution
When a sample is large enough (>100), data are approximately normally distributed
Many variables, like IQ, exam grades, age, income, are (approximately) normally distributed
To visualize all the data in the same type of distribution, we standardize scores
- standard normal distribution
- we standardize with z-scores
In normal distribution the mean, mode and median are exactly in the middle. - bell shape
distribution
A distribution can also be skewed to the left or right (and is then not normal) - the median then
still marks the middle of the data
3B
Degrees of freedom - n-1: that many scores are free to vary
Inferential statistics
Techniques that are used in making decisions based on data
Enables you to draw inferences about your data
You try to estimate what happens in a population based on data from sample
We estimate; we are never fully certain - usually 95% confident (sometimes 99%)
Can we generalize our data to wider population?
Population - the collection of units to which we want to generalize a set of findings or a
statistical model
Sample- a smaller (hopefully representative) collection of units from a population used to
determine truths about that population
Fit of the Model to the data
The mean is a model of what happens in real world but it is never perfect (good fit or
poor fit)
It's a hypothetical value
It’s a measure that we use to summarize our data
How can we assess how well the mean represents reality?
Inferences: from a Sample to a Population
Mean and SD describe the sample
Sample Mean and SD are used to estimate the mean and SD of the population
The better our model fits the data, the better the sample statistic (X̄) estimates the
population parameter (μ)
Sample mean is the best guess of population mean (but we are never 100% certain)
If we draw an infinite number of means from the samples, we get the population mean
-sampling distribution
Sample distribution- frequency distribution of sample data (from 1 sample)
Sampling distribution - frequency distribution of sample means (from many samples)
Approaches normality, means from all samples are normally distributed IF we sample
infinite times and do it with replacement.
Problem- it's not possible to draw infinite samples; typically it's just one sample, THUS we do
not know the sampling distribution of the means
Then we use:
CENTRAL LIMIT THEOREM
If the sample size is large enough (>30 at least but better >100)
Our sample mean is the best estimate of the population mean (parameter)
Sample mean = best estimate of population mean , SD informs us of fit of the sample
mean to the data we collected
HOW GOOD IS THE FIT------> Calculate STANDARD ERROR (the SD of the sample means
in a sampling distribution)
Standard error - we use the sample statistics (mean) as a point estimate for the population
parameter (mean in the population)
Tells us the size of the estimation error we are likely to make
Difference between sample and population mean
Confidence
Mean is a point estimate of population parameter
How confident are we that this point estimate precisely reflects the population
parameter? ---->Calculate an interval around our point estimate CONFIDENCE
INTERVAL
Confidence interval - the interval we think contains the population mean; the sample is only an
estimate, so the true value is unknown
Sample mean - starting point
Variation in samples indicated by standard error of the mean
Confidence interval = sample mean+/- (Z-score*SE)
Z-distribution = probability distribution
Frequency distribution : distribution of values that you measure (empirically observed)
Probability distribution : distribution of all possible outcomes by the likelihood of each one
occurring
The difference
- probability distribution based on expected values
-frequency distribution based on real/measured values
Z-score = critical value
Sampling distribution= normal distribution
95% confidence - to a certain (here 95%) degree we are certain that our interval will contain
the population mean
Z= +1.96 cuts off the top 2.5% of the distribution
Z= -1.96 cuts off the bottom 2.5% of the distribution
To get certain confidence level interval
CI = [sample mean] +/‐ (z‐score*SE)
If CI small: sample mean close to population mean (true mean)
If CI wide: bad representation of the population
99% of z-scores lie between -2.58 and 2.58
The values below -1.96 and above 1.96 are in the region of rejection - those are very
special and extraordinary values
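The CI formula above can be sketched in Python; the sample values are hypothetical, and 1.96 is the z critical value for 95% confidence:

```python
from statistics import mean, stdev
from math import sqrt

sample = [5.2, 6.1, 5.8, 6.4, 5.5, 6.0, 5.9, 6.2]   # hypothetical measurements

m = mean(sample)
se = stdev(sample) / sqrt(len(sample))   # standard error = sample SD / sqrt(n)

z = 1.96                                 # critical value for 95% confidence
ci = (m - z * se, m + z * se)            # CI = sample mean +/- (z-score * SE)
print(round(m, 3), [round(v, 3) for v in ci])
```

A small SE (little variation, or a large sample) gives a narrow interval; for 99% confidence you would swap in z = 2.58, which widens the interval.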
Hypothesis testing and statistical significance.
Types of hypothesis:
null hypothesis H0 - there is no effect/difference/association in the population
The alternative hypothesis - H1 - there is an effect/difference/association in the
population
When we reject the null hypothesis, it gives us some support for the alternative hypothesis but
does not prove it 100%!
Statistical significance
A measure of how unlikely it is that an event occurred by chance
Refers to probability
It is the chance of rejecting the null hypothesis while in reality the null hypothesis is
true (false positive)
We want it to be small !
Does not inform us about how practically relevant or important an outcome is
Significance NHST - null hypothesis significance testing
How unlikely (or special) are the sample data assuming that null hypothesis is true?
Significant result: REJECT the null hypothesis
The chances of obtaining the data we've collected assuming that the null hypothesis
is true
P-value - we set alpha (significance level) at .05
The chance of making a mistake (rejecting null hypothesis while we should not) is 5%
or less
We want the chance of obtaining the data we've collected assuming that the null
hypothesis is true to be 5% or less
P ≤ .05 = significant (reject H0)
P > .05 = not significant (retain H0)
Retain :
If difference between the values based on sample is not statistically significant, we
can't assume they are different in population
Reject:
If difference between values based on sample is statistically significant, we found
support for the alternative hypothesis.
P‐value = .032 (p ≤ .05, reject H0)
Interpretation:
It is very unlikely (3.2%) that you found this difference between means
if you assume that the null hypothesis (no difference between means)
is true
One-sample t-test
You can see if the mean age in your sample is the same as mean age in the population
We want to compare a sample mean to a known or hypothetical population mean
does our sample mean differ significantly from a known or hypothesized value?
(meaning it does not differ due to random effects of chance)
Steps for testing hypotheses
1. Specify the statistical hypotheses
2. Select the significance level of the test (0.05)
3. Execute the test
4. Decide on the null hypothesis
5. Estimate effect size
6. Report the test results
Degrees of freedom -> N-1 always
If we have more than 100 cases -> Critical value for α =.05 = 1.96 ---> Normal distribution
If t (the test statistic) is below or above the critical values, it is in the region of rejection ->
then it's very special (unlikely to occur if the null hypothesis were true) and thus it is significant
at p ≤ .05
To calculate the CI, we add/subtract the margin to/from the test value ->
Interval does not include the value of the null hypothesis:
The test is significant
Interval includes the value of the null hypothesis:
The test is NOT significant
Effect size Cohen's d - is the difference between means practically relevant?
Cohen’s d effect sizes:
0.2 = small
0.5 = medium
0.8 = large
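A minimal sketch of the one-sample t-test and Cohen's d, using made-up ages and a hypothesized population mean of 21:

```python
from statistics import mean, stdev
from math import sqrt

sample = [21, 23, 20, 25, 22, 24, 23, 22]   # hypothetical ages in a sample
mu0 = 21                                     # hypothesized population mean (H0)

m, sd, n = mean(sample), stdev(sample), len(sample)
t = (m - mu0) / (sd / sqrt(n))   # test statistic, compared against critical value with df = n-1
d = (m - mu0) / sd               # Cohen's d: difference between means in SD units

print(round(t, 2), round(d, 2))  # t with df = 7; d is the effect size
```

Here t ≈ 2.65 and d ≈ 0.94, so by the conventions above this would count as a large effect; whether t is significant is decided against the critical value for df = 7 (which SPSS or a t-table provides).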
Some important points of attention
Check APA6 guidelines:
• Calculate with 3 decimals
• Round all results to 2 decimals
• Exception: p‐values
report the exact p‐value and round to 3 decimals (e.g., p = .231)
But…if your p‐value is smaller than .001, then report as: p<.001
• M, SD, p, t, d: italics
4B
Pearson's r
Correlation
Way of measuring the extent to which two variables are related
A measure of the degree of association among variables
Indicates whether a variable changes in a predictable manner
Examines whether as one variable increases, the other one increases, decreases or
stays the same
Pearson Product-Moment Correlation: degree of association between two
interval/ratio variables
Sign of the correlation indicates the direction of the relationship
o It always varies from -1 to +1, with 0 meaning there is no relationship between the
variables.
o -1 or +1 indicates a perfect correlation. Positive - both decrease or increase. Negative -
one increases and the other decreases or vice versa.
o +/- .1=weak correlation. +/- .3= moderate relationship +/- .5=strong relationship
o Magnitude of correlation indicates the strength of the association
Correlation- Covariance
When two variables covary: knowing how one variable changes helps you in predicting
how another variable changes.
When variables covary, they are related in some way: correlation among the variables.
Coefficient of determination r2
By squaring the value of r you get the proportion of variance in one variable shared by
the other. The higher % the more variance is shared.
% we get can be reported as: the percentage of the variance in the extent to which one
likes "the support act" that is explained by the extent to which one likes the band (and vice
versa)
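Pearson's r and r² can be computed from the covariance and sums of squares; the ratings below are hypothetical:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two interval/ratio variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # how the variables covary
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

liking_band = [1, 2, 3, 4, 5]          # hypothetical ratings
liking_support_act = [2, 3, 3, 5, 6]

r = pearson_r(liking_band, liking_support_act)
print(round(r, 3), round(r ** 2, 3))   # r and the coefficient of determination
```

Here r ≈ .96, so r² ≈ .93: about 93% of the variance in liking the support act is shared with liking the band (and vice versa).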
Correlation analysis
Bivariate analysis- two variables that I measure
Both measured on interval level
Dependent/independent relationship?
Statistical test: Pearson's correlation coefficient
Test statistic: t
Steps for testing hypothesis
1. Specify the statistical hypothesis
2. Select the significance level of the test ( alpha at 5% -> looking for p < .05)
3. Execute the test (run the scatter plot and interpret it)
4. Decide on the null hypothesis
5. Estimate effect size
6. Report the test results
Spearman's rho - non-parametric test, used with nonlinear (but monotonic) associations.
Curvilinear - you cannot determine Pearson's correlation coefficient, but you can use
Spearman's rho.
Also used with associations between one ordinal and one interval or ratio level variable
Interpretation - the results revealed some association between RANKED variable A and
RANKED variable B
A NOMINAL LEVEL VARIABLE CANNOT BE RANKED AND IT CANNOT BE USED IN A SCATTER
PLOT OR ANY ASSOCIATION/CORRELATION
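A sketch of Spearman's rho as "Pearson's r on the ranked scores" (the data are made up; y is a perfectly monotonic but curvilinear function of x):

```python
from statistics import mean

def ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share the average rank."""
    s = sorted(values)
    return [sum(i + 1 for i, v in enumerate(s) if v == x) / s.count(x) for x in values]

def spearman_rho(x, y):
    """Spearman's rho = Pearson's correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # curvilinear, but the order is preserved
print(round(spearman_rho(x, y), 3))   # 1.0: a perfect monotonic association
```

Pearson's r on these raw scores would be below 1 because the relationship is not linear; rho is 1 because only the rank order matters.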
Correlation and causality
Correlation does not imply causation - third variable problem
The third variable problem
- in any correlation, causality between two variables cannot be assumed because there
may be other measured or unmeasured variables affecting the results
Direction of causality
- Correlation coefficients say nothing about the direction of the association
- Symmetrical relationships
Simple regression analysis
i. Correlation= symmetrical relationship
ii. A correlation does not indicate how a prediction (asymmetrical) will be quantified
iii. Regression analysis - when we fit a linear model to our data and use it to predict
values of an outcome variable from one or more predictor variables
iv. One predictor variable: simple regression
v. More predictor variables : multiple regression
Simple regression
A step beyond mere correlation
Not cause and effect
Predict Y from X (predict outcome variable from predictor variable)
Model the asymmetrical relationship between 2 variables
Linear model
Relationship: equation for straight line (y=ax+b)
How to define a line : Intercept and slopes
Where does the line cross the y-axis? = Intercept: at X=0, the value of Y is the intercept
Does the line go down or up (positive/negative)
By which degree does it go up or down (SLOPE/GRADIENT)
Sum of Squared residuals-
The regression line is only a model based on the data
This model may not reflect reality
Large SSr indicates poor fit: line not representative of data
Small SSr indicates good fit: line representative of data
We need some way of testing how well the model fits the observed data (SPSS)
Model fit R2- what is proportion of improvement?
R2= amount of variance in the outcome explained by the model
Coefficient of determination
R2 = size of model fit how well does the line fit the data ?
Only for simple regression:
Pearson's r=square root of R2
R2 between 0 and 1 (may be presented as a %)
R2 = 0: no variance explained (useless model)
R2 = 1: variance perfectly explained; model predicts Y(outcome variable) perfectly
The variance in X explains the variance in Y for ….%.
B values for the constant and the independent variable - UNSTANDARDIZED COEFFICIENTS
The B for the independent variable predicts how the dependent variable changes for
every one-unit increase in the independent variable
The B for the constant is the intercept
Allows us to understand the contribution of the actual variable we are looking at
(the independent one)
t = test statistic; if significant: b is significantly different from 0
We can also predict for the population
Confidence interval : we are 95% certain that the confidence interval contains
unstandardized population regression coefficient ( SLOPE)- B value for independent variable
The standardized coefficient (beta, b*) is in simple regression the same as r - it tells us about the
strength and direction of the predictive effect/association of the variables, and which aspects of
our model have more influence.
The regression model with smelliness of surf shirt as dependent variable and
number of days surfing in the past two weeks as independent variable was
significant, F(1, 154) = 69.50, p < .001. Number of days surfing in the past two
weeks has a significant association with smelliness of the shirt, b* = 0.56, t =
8.34, p < .001, 95% CI [0.26, 0.42] and explains 31.1% of the variance in
smelliness of the shirt (R2 = .31). For every unit increase in number of days
surfing, smelliness of shirt increases with 0.34 unit.
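The intercept and slope from a simple regression can be computed by least squares; the data below are made up (they are not the surf-shirt dataset from the report above):

```python
from statistics import mean

# Hypothetical data: days surfing (X) predicting shirt smelliness (Y)
x = [0, 1, 2, 3, 4, 5]
y = [1.0, 1.5, 1.9, 2.3, 2.6, 3.1]

mx, my = mean(x), mean(y)
# slope b1 = covariance of X and Y divided by the variance of X
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx   # intercept: the predicted value of Y when X = 0

def predict(v):
    """Linear model: Y-hat = b0 + b1 * X."""
    return b0 + b1 * v

print(round(b0, 2), round(b1, 2))   # intercept and slope of the regression line
print(round(predict(3), 2))        # predicted smelliness after 3 days surfing
```

For every extra day of surfing, predicted smelliness rises by the slope b1; SPSS reports these as the unstandardized B values.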
5B
Dichotomous predictors
Can we include a dichotomous predictor in a regression analysis?
Dichotomous = categorical with two categories
Regression analysis requires interval/ratio variables
E.g. biological sex, experimental group
If we assign the values 0 and 1 to the dichotomous variable, we can include it in a
regression model (treat it as a ratio variable with meaningful zero point - DUMMY
VARIABLE)
B0- also a or constant
intercept ( value of Y when X=0)
Point at which the regression line crosses y axis
B1 - regression coefficient
slope/ gradient
strength and direction
B- values for constant and independent variable - unstandardized coefficients
We cannot compare them because they are based on different units of measurements
They tell us the intercept (for constant) and the value of change if the independent
variable increases by one unit
Multiple regression
Assess ‘influence’ of X1 while controlling for ‘influence’ of variables X2, X3, Xi…
Benefit :
Better prediction of Y
Examine unique predictors
Compare predictors
Controlling for /holding constant
Examine the unique relationship between calorie intake and weight while
controlling for height
Null hypothesis for the multiple regression
In the population general level of anxiety of earthquakes, number of previous
earthquakes and place of living , together do not explain the variance in fear of
earthquakes
Alternative hypothesis:
At least one of the three predictors explains the variance in fear of earthquakes
Null hypothesis for predictors:
In the population predictor 1 is not a predictor of fear of earthquakes.
Alternative hypothesis for predictors:
In the population predictor 1 is a predictor of fear of earthquakes.
Standardized coefficient - we can compare standardized regression coefficients (beta)
because the level of measurement is standardized
b* coefficients expressed in standard deviation of variables
Reflect the relative importance of each predictor X
A bigger b* -> variable more influential than a variable with a smaller b*
Use b-values (unstandardized regression coefficient) to calculate predicted value of Y
Use b* (beta; standardized regression coefficient) values to compare the different
independent predictor variables
Dummy variable
If we recode the dichotomous variable into 0 and 1 we can include it in a regression
model (treat it as numerical)
The unstandardized regression coefficient of a dummy variable indicates the
difference between the 0 category and the 1 category.
If it is negative - the higher one scores on the dummy variable, the lower the score on the
dependent variable
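The claim that a dummy's unstandardized coefficient is the difference between the 0 and 1 categories can be verified with a small sketch (data made up for illustration):

```python
from statistics import mean

# Dummy-coded predictor: 0 = group A, 1 = group B (hypothetical groups and scores)
x = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mx, my = mean(x), mean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

# Intercept = mean of the 0 group; slope = difference between the group means
print(b0, b1)  # 5.0 3.0
```

Group A's mean is 5.0 and group B's is 8.0, so the regression recovers exactly those: b0 = 5.0 (the 0 category) and b1 = 3.0 (how much higher the 1 category scores).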