Advertisement
Does it work?
What are the effects?
For whom does it work?
Different effects for different people
Why does it work (or not)?
What scientific research is
*a systematic process of gathering theoretical knowledge through observation
Observation= empiricism
Empirical
- based on social reality
Systematic and cumulative
-Builds on previous research
-In search for patterns and associations
Public
- verifiable
- open for criticism
- results are always preliminary
Objective
- findings are not personal/subjective
-Uniform rules (comparable results)
Predictive
- results in predictions
Communication research
Communication scientists study an aspect of communication
Social media
Organizational/group
Interpersonal
Rhetoric or persuasion
Communication technology
Corporate, persuasive, entertainment, and political communication
Reasons for research
Theory -> knowledge gap -> fundamental research. The purpose of fundamental (or basic, or
pure) research is to contribute to science
Assignment -> problem of practice -> applied research. The purpose of applied research is to
acquire knowledge to solve a practical problem
Research is a systematic process of
Posing questions
Answering questions
Demonstration that your results are valid
Sharing your research results
Communication research is a systematic process of asking and answering questions about
human communication
Research strategies
Quantitative -measuring with numbers
-measurements
-numbers
- testing theory
Qualitative - measuring with words
- no measurements
-words
-generating theory
Research design
Experimental - used to look into effects
Cross-sectional - used to look into association correlation
Longitudinal - association/relationship/correlation
Data collection methods
Observation - during an experiment or in real life
Posing questions
- survey,
- in-depth interviews ,
- focus groups interviews
Content analysis
- content of existing sources (text, images)
2A
Conducting research
World view I- human communication is objectively measurable and can be summarized in
rules - nomothetic approach
World view II- human communication is subjective, individualistic and must be described as
such - idiographic approach
ONTOLOGY - what is out there ?
what is real? How can we view them?
Can we compare social behavior and non-material things with physical objects? Is the
process of studying them similar?
Assumptions based on ontology
Objectivism
the underlying reality (e.g., an attitude) has the characteristics of an object (objective reality).
Allows generalization.
Constructionism
social entities such as attitude are considered social constructions, not objects.
Researchers prefer specific versions of social reality rather than one that can be
regarded as definite.
EPISTEMOLOGY- how should we know that ?
questions of how communication should be understood: what is knowledge, what counts as
knowledge?
Assumptions based on epistemology
Positivism - generalizing
- application of methods of natural sciences for the study of social reality
- knowledge is what can be perceived by the senses
- knowledge is arrived at through the gathering of facts that provide the basis for
rules and laws
- Theory should lead to hypothesis
Interpretivism
- there is a difference between people and the objects of the natural sciences.
Researchers should focus on the subjective meaning of social action
Two scientific approaches
Empirical-analytical
- objectivism
-positivism
- observe, measure from researcher's perspective
- empiricism rather than logic
- Explaining- from researchers perspective
- rule out alternative explanations
- nomothetic approach (worldview I)
- quantitative and experiment, survey, content analysis
Empirical-Interpretive (less often)
-constructionism
-interpretivism
- observing and interpreting from participants' perspectives
- starting without theory, but observations
- idiographic (worldview II)
- qualitative, individual interviews, focus groups, ethnography
Empirical cycle
Induction- empirical-interpretive approach
Deduction- empirical-analytical approach
Scientific method:
Hypothesis should be
Empirically testable - systematic, collect empirical data, research plan
Replicable
Objective
Transparent
Logically consistent
- don’t change interpretation/hypothesis after data collection
- be consistent in what confirms and disconfirms your hypothesis
Falsifiable - a statement that is not falsifiable usually needs some sort of exhaustive
search of all possibilities to disprove it
- verification - confirming; fits the interpretive approach
- falsification - refuting; fits the analytical approach
- provisional truth - I found support for my hypothesis, but it is not ACCEPTED fully
and forever
Problem definition and hypotheses
Problem definition - observation and induction stages
Based on the research objective and research question
Research objective (the aim of the study)
- Indicates the scientific and/or societal impact of study
- indicates the goal of the study (exploration, explanation, prediction, control,
description, interpretation, criticism )
- fundamental (about science with literature) or applied research (practice) - not
mutually exclusive they require each other
Formulating research objectives
- complete
- not vague
- clear (unambiguous)
- not too broad
- Societal relevance (more complete understanding is required, penetrating social life)
- scientific relevance (little is known, results will help us to understand it)
Formulating research questions (what does the researcher want to know?) - the starting point
for the research
- clear
- researchable
Connection with established research and theory
Linked to other RQs
Original contribution unless replication study
Not too broad/too narrow
Always end with ?
Should fit the research objectives
May not contain unsubstantiated or incorrect assumptions
Is not the same as a survey question
Clear not vague
Open-ended (is there a relationship between variables?) or closed-ended (what is the
direction of the relationship between variables? Decline/growth)
RQs
Descriptive (how)
Explanatory (why?) - 2 or more variables
Predictive - 2 or more variables
Relational (to what extent are variables related to each other?) - we do not know what
the cause is
Causal (does one variable affect/influence another?)
Hypothesis
Two-tailed - there is a relationship (non-directional)
One-tailed - there is a direction - one grows and another declines, etc.
Null - there is no relationship; used in statistics (no effect in the study)
Sometimes hypotheses are non-testable (definitions, speculative/normative(should)
statements or statements without specific definitions of place and time)
3A
Conceptual model - schematic overview of my hypothesis (theoretical assumptions)
Includes main concepts(variables)
Independent (predictor) and dependent (outcome) variable
Indicate relationships with arrows ((a)symmetrical)
Indicate the type of the relationship - positive/negative
POSITIVE RELATIONSHIP
More of one, more of another and vice versa
NEGATIVE RELATIONSHIP
More of one, less of the other and vice versa
asymmetrical
Causal or predictive: asymmetrical
One variable affects or predicts the other variable
Independent variable ------------------------> dependent one
How to draft the conceptual model ?
Identify variables and state their nature ( predictor and outcome)
Identify the relationships between them (positive/negative,
symmetrical/asymmetrical)
Building blocks of hypothesis
Units of analysis / cases
All subjects or objects that are mentioned
E.g., people, organisations, newspapers, countries...
Variables
Characteristics of the units of analysis
E.g., age, sex, level of education, aggression
Variables vary (units of analysis vary on these variables)
Values
Possible categories per characteristic (variable)
0‐99 years, male/female, low/high, elementary school/university
You observe these categories
Operationalization- making my concept measurable
1. Come up with measurable definition of a concept(construct/variable)
2. Choose indicators
3. Develop questions/statements (items)
4. Assign measurement scales
Why is it important?
I want my research to be replicable
I want to be able to compare it with results from different studies
Definition - makes the concept measurable; it's a description of the concept in MY research that
is as accurate as possible
You can't use the concept itself in the description
You can't use synonyms of the concept in the description
You can't include concepts that need their own definition to be understood correctly
(unless you provide it)
It should contain all info needed (completeness)
It should exclude what is not a part of the concept
The definition should not have a normative character
Measurement scale
Likert answer scale
- 5/7/10-point
- completely disagree- completely agree
- neutral option possible
- several Likert items (indicators) that intend to measure the same concept - Likert
SCALE
Semantic differential scale
- semantic= word, differential = distinction
- different words that define/describe the concept/variable
- several semantic differential items together form a SEMANTIC DIFFERENTIAL SCALE
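Several Likert items measuring the same concept are usually combined into one scale score by averaging; a minimal Python sketch (the concept, items, and answers are made up for illustration):

```python
def likert_scale_score(items):
    """Average a respondent's answers on several Likert items measuring one concept."""
    return sum(items) / len(items)

# One respondent's answers to three 5-point "trust in media" items
# (1 = completely disagree ... 5 = completely agree)
answers = [4, 5, 3]
print(likert_scale_score(answers))  # 4.0
```

In practice you would first check (e.g., with a reliability analysis) that the items really do measure the same concept before averaging them.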
Manifest variables
Directly measurable
E.g. Sex, age, education level
Latent variables
Not directly measurable
E.g. trust in media, gender (complex), attitude towards something, opinions
Concept/variable often measured by means of several aspects (questions)
What can go wrong?
You don’t measure what you define
You measure what you define but not in a consistent way
Validity
Degree to which our measure is free from systematic error (bias)
E.g. not measuring what we intended to measure, socially desirable answers, respondents
know the goal of the study, respondent wants to meet expectations of researchers
Reliability
Degree to which our measure is free from random error (noise)
E.g. not measuring something in a consistent way, as many negative as positive errors (they
cancel each other out)
Summary
4A
External validity
- Can what we observe in our research context be generalized to a wider context
(population)?
Population validity -is our sample representative for our target population?
Ecological validity - Is the research context true to life? Applicable in other cultures,
circumstances?
MODERATION
Moderated relationship - referred to as an interaction or specification
o Moderator variables influence the strength of a relationship between two other
variables
o The relationship between two variables is different for different levels of the
moderator
o It is a third variable
How a third variable influences the relationship that we are interested in:
No moderation- hypothesis : exposure to media violence positively affects
aggression
Moderation - hypothesis: the positive influence of exposure to media violence
on aggression is stronger for females than for males (moderated by sex)
MODERATOR INFLUENCES the relationship between variables
MEDIATION
Mediated relationship-
o Mediator variable explains the relationship between the other two variables
o Mediator variable is influenced by the independent variable (A) and the mediator variable
influences the dependent variable (B).
o The mediator variable influences the dependent variable but not the independent
one!
How it works
No mediation- hypothesis: exposure to media violence positively affects aggression
Mediation- hypothesis: the positive influence of exposure to media violence on
aggression is explained by social norms. Media violence influences social norms and
social norms influence aggression.
There is a direct relationship between the independent variable and dependent
variable ( C )
The relationship is mediated by a third variable, indicating an indirect relationship
between the independent and dependent variable
There is an asymmetrical relationship between the independent variable and the third
variable
There is an asymmetrical relationship between the third variable and the dependent
variable
MEDIATOR EXPLAINS THE RELATIONSHIP BETWEEN THE VARIABLES
Partial mediation - the direct relationship between the independent variable and the
dependent variable becomes less strong when you add the mediator to the model
Full mediation - the direct relationship between the independent variable and the
dependent variable disappears when you add the mediator to the model.
If a third variable influences both the independent variable and the dependent variable -
SPURIOUS RELATIONSHIP - it seems like there is a relationship between X and Y
but there really is not (such a third variable is a confounder, not a mediator).
1B
Statistics- quantitative methods of analysis
Types of statistics
1. Descriptive statistics - techniques that are used to summarize a set of numbers
E.g. Mean, pie charts, percentage etc.
2. Inferential statistics - techniques that are used in making decisions based on data
The research process
Hypothesis - A statement about the relationship that we expect to find between variables
Variables
Operationalize variables
Based on theory or previous studies
Use these variables in your predictions
Type of statistical analysis depends on type of variables you use
Predictor---> Outcome
Independent ---> Dependent
X---> Y
Asymmetric relationship
- when the proposed relationship is based on an independent (predictor) variable and a
dependent (outcome) variable
Symmetric Relationship
- when it is not clear which variable is independent or dependent, e.g., is there a relation
between checking Facebook and understanding the teacher?
Types of analysis
Univariate analysis - statements about only 1 variable - we measure only one thing
e.g. How many hours a day are students online on social media ?- HOURS/DAY only
variable
Bivariate analysis - statements about 2 variables
E.g. Are students online on social media for more hours a day than lecturers? -
STUDENT/LECTURER and HOURS/DAY
Multivariate analysis - statements about more than 2 variables
e.g. Is the influence of drinking coffee on feeling cranky different for students than for
lecturers? - DRINKING COFFEE (yes/no),
STUDENT/LECTURER, FEELING CRANKY (yes/no)
Types of measurements
NOIR
Nominal -lowest
- labels e.g. Germany is 2 and 20% of students are Dutch
-assign individuals to categories
-magnitude of number not meaningful
- calculations are not possible
Ordinal
- ordered labels, e.g. "How do you feel today?" on a 1-5 scale - but what is the difference between the values?
-calculations not possible
- magnitude of values indicates order of events
-magnitude of differences between events not meaningful
Interval
-magnitude of values indicates order of events
- magnitude of differences between events is meaningful
- addition and subtraction is possible
- No meaningful zero point - you cannot multiply/divide scores
e.g. temperature in °C or °F (0° does not mean "no temperature")
Ratio - highest
- magnitude of values indicates orders of events
- magnitude of differences between events is meaningful
- addition and subtraction is possible
- meaningful zero point - you can multiply and divide
SUMMARY
Level of measurement determined by operationalization of variables
Choose level as high as possible = more types of statistical analyses
Sometimes fixed level e.g. sex or eye color
Interval and ratio data are numerical
Nominal and ordinal data are not numerical
Measures of central tendency
MMM
Mode
the most frequent score
nominal level of measurement
can be used for every level of measurement, but for nominal variables the mode is the only option
E.g. THE MODE IS 3! Although dragon fruit was rated a 4, the value 3 occurs most often:
1. Acai - 2
2. Pineapple - 3
3. Dragon fruit - 4
4. Coconut - 3
Median
the middle score when scores are ranked - ordered
If the number of scores is odd you get the middle one
If the number of scores is even you take the average of the two in the middle
Mean
The sum of the scores divided by the number of scores
only used with interval/ratio variables
Can be easily affected by extreme scores
IS NOT INFORMATIVE WITH NOMINAL LEVEL VARIABLES
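The three measures above can be checked with Python's standard library; the smoothie ratings are the made-up values from the notes:

```python
import statistics

# Hypothetical smoothie ratings: acai=2, pineapple=3, dragon fruit=4, coconut=3
scores = [2, 3, 4, 3]

print(statistics.mode(scores))    # 3   (most frequent value, not the highest)
print(statistics.median(scores))  # 3.0 (even n: average of the two middle scores of 2,3,3,4)
print(statistics.mean(scores))    # 3.0 (sum 12 divided by 4 scores)
```

Note that the mode is 3 even though dragon fruit received the highest single rating (4); the mode counts frequency, not magnitude.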
The dispersion in a distribution
Dispersion for different types of variables
At least ordinal - range, IQR
At least interval - deviance, variance, SD
Nominal - no dispersion
RANGE - the smallest score subtracted from the largest
E.g. "the age ranges from the youngest to the oldest score"
Affected by extreme scores/ outliers
IQR - interquartile range - represents variability in (at least) ordinal data
The three values (quartiles) that split the sorted data into 4 equal parts
Put the data in order, split them in half at the median (Q2)
Second quartile Q2 - median
Lower Quartile Q1 - median of the lower half of the data
Upper Quartile Q3 - median of the upper half of the data
A lot of data is lost, but the IQR is not affected by extreme scores
IQR= Q3-Q1
Dispersion - Indicate the spread of scores , How different is each score from the center of
distribution
most often used measures of dispersion are
Deviance
- size is dependent on the number of scores in the data
- Individual score minus mean
- Total deviance - calculate deviance for every person and add them up
- Because deviances have negative and positive signs (they sum to zero), all the individual
deviances have to be squared before adding up
- Total sum of squares -> take the average
Variance
- more useful to work with the average dispersion known as the variance
- SS divided by N-1 ( number of cases)
Standard deviation (SD)
-The variance gives us a measure in units squared, SD is easier in interpretation
- is square root of variance
Always include average (mean) and SD
APA6 guidelines: M and SD in italics and round to 2 decimals (calculate with three decimals)
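The deviance → sum of squares → variance → SD chain above can be sketched in Python (the scores are made up for illustration):

```python
import statistics

scores = [2, 4, 4, 6, 9]
mean = statistics.mean(scores)                 # 5.0

deviances = [x - mean for x in scores]         # individual score minus mean
ss = sum(d ** 2 for d in deviances)            # total sum of squares (squaring removes signs)
variance = ss / (len(scores) - 1)              # SS divided by N-1
sd = variance ** 0.5                           # SD = square root of variance

print(max(scores) - min(scores))               # range: largest minus smallest = 7
print(variance)                                # 7.0 (same as statistics.variance(scores))
print(round(sd, 3))                            # 2.646
```

The manual result matches `statistics.variance` and `statistics.stdev`, which also use the N-1 denominator.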
Z-scores - Comparing scores on exams from different courses , level of measurement at
least interval
Standardizing a score with respect to the other scores in the group
Expresses a score in terms of how many SDs it is away from mean
The distribution of z-scores has a mean of 0 and SD of 1
Take each score and subtract the mean from it, then divide by the SD
We can use it to determine the chance of something occurring
The number of SD that a variable is below or above the mean (positive or negative z-score)
If we have a negative z-score we flip the smaller and larger portion values from the
table(from the appendix)
1.96 cuts off the top 2.5% of the distribution
-1.96 cuts off the bottom 2.5% of the distribution
As such, 95% of z-scores lie between -1.96 and 1.96.
99% of z-scores lie between -2.58 and 2.58
99.9% of z-scores lie between -3.29 and 3.29
See appendix A in Field and memorize the above and try to figure out why it is so
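A small Python sketch of z-scores and the memorized cut-offs; the exam grades are made up, and `statistics.NormalDist` plays the role of the table in Field's appendix:

```python
from statistics import NormalDist, mean, stdev

grades = [6.0, 7.5, 5.5, 8.0, 7.0]   # hypothetical exam grades
m, sd = mean(grades), stdev(grades)
z_scores = [(g - m) / sd for g in grades]   # each grade expressed in SDs from the mean

# Standard normal distribution: mean 0, SD 1
nd = NormalDist()
print(round(nd.cdf(1.96) - nd.cdf(-1.96), 3))  # proportion between -1.96 and 1.96 -> ~0.95
print(round(nd.cdf(2.58) - nd.cdf(-2.58), 2))  # proportion between -2.58 and 2.58 -> ~0.99
```

This is why ±1.96 cuts off 2.5% in each tail: the area between the two cut-offs is 95% of the standard normal distribution.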
Standard normal distribution
When a sample is large enough (>100), data are approximately normally distributed
Many variables, like IQ, exam grades, age, income, are (approximately) normally distributed
To visualize all the data in the same type of distribution, we standardize scores
- standard normal distribution
- we standardize with z-scores
In normal distribution the mean, mode and median are exactly in the middle. - bell shape
distribution
A distribution can also be skewed to the left or right (and is then not normal) - the median then
still marks the middle of the data
3B
Degrees of freedom - n-1: that many scores are free to vary
Inferential statistics
Techniques that are used in making decisions based on data
Enables you to draw inferences about your data
You try to estimate what happens in a population based on data from sample
We estimate; we are never fully certain - usually 95% confident (sometimes 99%)
Can we generalize our data to wider population?
Population - the collection of units to which we want to generalize a set of findings or a
statistical model
Sample- a smaller (hopefully representative) collection of units from a population used to
determine truths about that population
Fit of the Model to the data
The mean is a model of what happens in real world but it is never perfect (good fit or
poor fit)
It's a hypothetical value
It’s a measure that we use to summarize our data
How can we assess how well the mean represents reality?
Inferences: from a Sample to a Population
Mean and SD describe the sample
Sample Mean and SD are used to estimate the mean and SD of the population
The better our model fits the data, the better the sample statistic (X̄) estimates the
population parameter (μ)
Sample mean is the best guess of population mean (but we are never 100% certain)
If we draw an infinite number of means from the samples, we get the population mean
-sampling distribution
Sample distribution- frequency distribution of sample data (from 1 sample)
Sampling distribution - frequency distribution of sample means (from many samples)
Approaches normality, means from all samples are normally distributed IF we sample
infinite times and do it with replacement.
Problem- it's not possible to draw infinite samples; typically it's just one sample, THUS we do
not know the sampling distribution of the means
Then we use:
CENTRAL LIMIT THEOREM
If the sample size is large enough (>30 at least but better >100)
Our sample mean is the best estimate of the population mean (parameter)
Sample mean = best estimate of population mean , SD informs us of fit of the sample
mean to the data we collected
HOW GOOD IS THE FIT------> Calculate STANDARD ERROR (the SD of the sample means
in a sampling distribution)
Standard error - we use the sample statistics (mean) as a point estimate for the population
parameter (mean in the population)
Tells us the size of the estimation error we are likely to make
Difference between sample and population mean
Confidence
Mean is a point estimate of population parameter
How confident are we that this point estimate precisely reflects the population
parameter? ---->Calculate an interval around our point estimate CONFIDENCE
INTERVAL
Confidence interval - the interval we think contains the population mean; the sample is only an
estimate, so the true value is unknown
Sample mean - starting point
Variation in samples indicated by standard error of the mean
Confidence interval = sample mean+/- (Z-score*SE)
Z-distribution = probability distribution
Frequency distribution : distribution of values that you measure (empirically observed)
Probability distribution : distribution of all possible outcomes by the likelihood of each one
occurring
The difference
- probability distribution based on expected values
-frequency distribution based on real/measured values
Z-score = critical value
Sampling distribution= normal distribution
95% confidence - to a certain (here 95%) degree we are certain that our interval will contain
the population mean
Z= +1.96 cuts off the top 2.5% of the distribution
Z= -1.96 cuts off the bottom 2.5% of the distribution
To get certain confidence level interval
CI = [sample mean] +/‐ (z‐score*SE)
If CI small: sample mean close to population mean (true mean)
If CI wide: bad representation of the population
99% of z-scores lie between -2.58 and 2.58
The values below -1.96 and above 1.96 are in the region of rejection - those are very
special and extraordinary values
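The CI formula above can be sketched in Python; the sample values are hypothetical, and 1.96 is the z critical value for 95% confidence:

```python
from statistics import mean, stdev
from math import sqrt

sample = [5.2, 6.1, 5.8, 6.4, 5.5, 6.0, 5.9, 6.2]   # hypothetical measurements

m = mean(sample)
se = stdev(sample) / sqrt(len(sample))   # standard error = sample SD / sqrt(n)

z = 1.96                                 # critical value for 95% confidence
ci = (m - z * se, m + z * se)            # CI = sample mean +/- (z-score * SE)
print(round(m, 3), [round(v, 3) for v in ci])
```

A small SE (little variation, or a large sample) gives a narrow interval; for 99% confidence you would swap in z = 2.58, which widens the interval.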
Hypothesis testing and statistical significance.
Types of hypothesis:
null hypothesis H0 - there is no effect/difference/association in the population
The alternative hypothesis - H1 - there is an effect/difference/association in the
population
When we reject the null hypothesis, it gives us some support for the alternative hypothesis but
does not prove it 100%!
Statistical significance
A measure of how unlikely it is that an event occurred by chance
Refers to probability
It is the chance of rejecting the null hypothesis while in reality the null hypothesis is
true (false positive)
We want it to be small !
Does not inform us about how practically relevant or important an outcome is
Significance NHST - null hypothesis significance testing
How unlikely (or special) are the sample data assuming that null hypothesis is true?
Significant result: REJECT the null hypothesis
The chances of obtaining the data we've collected assuming that the null hypothesis
is true
P-value - we set alpha (significance level) at .05
The chance of making a mistake (rejecting null hypothesis while we should not) is 5%
or less
We want the chance of obtaining the data we've collected assuming that the null
hypothesis is true to be 5% or less
P ≤ .05 = significant (reject H0)
P > .05 = not significant (retain H0)
Retain :
If difference between the values based on sample is not statistically significant, we
can't assume they are different in population
Reject:
If difference between values based on sample is statistically significant, we found
support for the alternative hypothesis.
P‐value = .032 (p ≤ .05, reject H0)
Interpretation:
It is very unlikely (3.2%) that you found this difference between means
if you assume that the null hypothesis (no difference between means)
is true
One-sample t-test
You can see if the mean age in your sample is the same as mean age in the population
We want to compare a sample mean to a known or hypothetical population mean
does our sample mean differ significantly from a known or hypothesized value?
(meaning it does not differ due to random effects of chance)
Steps for testing hypotheses
1. Specify the statistical hypotheses
2. Select the significance level of the test (0.05)
3. Execute the test
4. Decide on the null hypothesis
5. Estimate effect size
6. Report the test results
Degrees of freedom -> N-1 always
If we have more than 100 cases -> Critical value for α =.05 = 1.96 ---> Normal distribution
If t (the test statistic) is below or above the critical values, it is in the region of rejection ->
then it's very special (unlikely to occur if the null hypothesis were true) and thus it is significant
at p ≤ .05
To calculate the CI, we add/subtract the margin to/from the test value ->
Interval does not include the value of the null hypothesis:
The test is significant
Interval includes the value of the null hypothesis:
The test is NOT significant
Effect size Cohen's d - is the difference between means practically relevant?
Cohen’s d effect sizes:
0.2 = small
0.5 = medium
0.8 = large
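A minimal sketch of the one-sample t-test and Cohen's d, using made-up ages and a hypothesized population mean of 21:

```python
from statistics import mean, stdev
from math import sqrt

sample = [21, 23, 20, 25, 22, 24, 23, 22]   # hypothetical ages in a sample
mu0 = 21                                     # hypothesized population mean (H0)

m, sd, n = mean(sample), stdev(sample), len(sample)
t = (m - mu0) / (sd / sqrt(n))   # test statistic, compared against critical value with df = n-1
d = (m - mu0) / sd               # Cohen's d: difference between means in SD units

print(round(t, 2), round(d, 2))  # t with df = 7; d is the effect size
```

Here t ≈ 2.65 and d ≈ 0.94, so by the conventions above this would count as a large effect; whether t is significant is decided against the critical value for df = 7 (which SPSS or a t-table provides).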
Some important points of attention
Check APA6 guidelines:
• Calculate with 3 decimals
• Round all results to 2 decimals
• Exception: p‐values
report the exact p‐value and round to 3 decimals (e.g., p = .231)
But…if your p‐value is smaller than .001, then report as: p<.001
• M, SD, p, t, d: italics
4B
Pearson's r
Correlation
Way of measuring the extent to which two variables are related
A measure of the degree of association among variables
Indicates whether a variable changes in a predictable manner
Examines whether as one variable increases, the other one increases, decreases or
stays the same
Pearson Product-Moment Correlation: degree of association between two
interval/ratio variables
Sign of the correlation indicates the direction of the relationship
o It always varies from -1 to +1, with 0 meaning there is no relationship between the
variables.
o -1 or +1 indicates a perfect correlation. Positive - both decrease or increase. Negative -
one increases and the other decreases or vice versa.
o +/- .1=weak correlation. +/- .3= moderate relationship +/- .5=strong relationship
o Magnitude of correlation indicates the strength of the association
Correlation- Covariance
When two variables covary: knowing how one variable changes helps you in predicting
how another variable changes.
When variables covary, they are related in some way: correlation among the variables.
Coefficient of determination r2
By squaring the value of r you get the proportion of variance in one variable shared by
the other. The higher % the more variance is shared.
% we get can be reported as: the percentage of the variance in the extent to which one
likes "the support act" that is explained by the extent to which one likes the band (and vice
versa)
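Pearson's r and r² can be computed from the covariance and sums of squares; the ratings below are hypothetical:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two interval/ratio variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # how the variables covary
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

liking_band = [1, 2, 3, 4, 5]          # hypothetical ratings
liking_support_act = [2, 3, 3, 5, 6]

r = pearson_r(liking_band, liking_support_act)
print(round(r, 3), round(r ** 2, 3))   # r and the coefficient of determination
```

Here r ≈ .96, so r² ≈ .93: about 93% of the variance in liking the support act is shared with liking the band (and vice versa).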
Correlation analysis
Bivariate analysis- two variables that I measure
Both measured on interval level
Dependent/independent relationship?
Statistical test: Pearson's correlation coefficient
Test statistic: t
Steps for testing hypothesis
1. Specify the statistical hypothesis
2. Select the significance level of the test ( alpha at 5% -> looking for p < .05)
3. Execute the test (run the scatter plot and interpret it)
4. Decide on the null hypothesis
5. Estimate effect size
6. Report the test results
Spearman's rho - non-parametric test, used with nonlinear (but monotonic) associations.
Curvilinear - you cannot determine Pearson's correlation coefficient, but you can use
Spearman's rho.
Also used with associations between one ordinal and one interval or ratio level variable
Interpretation - the results revealed some association between RANKED variable A and
RANKED variable B
A NOMINAL LEVEL VARIABLE CANNOT BE RANKED AND IT CANNOT BE USED IN A SCATTER
PLOT OR ANY ASSOCIATION/CORRELATION
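A sketch of Spearman's rho as "Pearson's r on the ranked scores" (the data are made up; y is a perfectly monotonic but curvilinear function of x):

```python
from statistics import mean

def ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share the average rank."""
    s = sorted(values)
    return [sum(i + 1 for i, v in enumerate(s) if v == x) / s.count(x) for x in values]

def spearman_rho(x, y):
    """Spearman's rho = Pearson's correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # curvilinear, but the order is preserved
print(round(spearman_rho(x, y), 3))   # 1.0: a perfect monotonic association
```

Pearson's r on these raw scores would be below 1 because the relationship is not linear; rho is 1 because only the rank order matters.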
Correlation and causality
Correlation does not imply causation - third variable problem
The third variable problem
- in any correlation, causality between two variables cannot be assumed because there
may be other measured or unmeasured variables affecting the results
Direction of causality
- Correlation coefficients say nothing about the direction of the association
- Symmetrical relationships
Simple regression analysis
i. Correlation= symmetrical relationship
ii. A correlation does not indicate how a prediction (asymmetrical) will be quantified
iii. Regression analysis - when we fit a linear model to our data and use it to predict
values of an outcome variable from one or more predictor variables
iv. One predictor variable: simple regression
v. More predictor variables : multiple regression
Simple regression
A step beyond mere correlation
Not cause and effect
Predict Y from X (predict outcome variable from predictor variable)
Model the asymmetrical relationship between 2 variables
Linear model
Relationship: equation for straight line (y=ax+b)
How to define a line : Intercept and slopes
Where does the line cross the y-axis? = Intercept: at X=0, the value of Y is the intercept
Does the line go down or up (positive/negative)
By which degree does it go up or down (SLOPE/GRADIENT)
Sum of Squared residuals-
The regression line is only a model based on the data
This model may not reflect reality
Large SSr indicates poor fit: line not representative of data
Small SSr indicates good fit: line representative of data
We need some way of testing how well the model fits the observed data (SPSS)
Model fit R2- what is proportion of improvement?
R2= amount of variance in the outcome explained by the model
Coefficient of determination
R2 = size of model fit how well does the line fit the data ?
Only for simple regression:
Pearson's r=square root of R2
R2 between 0 and 1 (may be presented as a %)
R2 = 0: no variance explained (useless model)
R2 = 1: variance perfectly explained; model predicts Y(outcome variable) perfectly
The variance in X explains the variance in Y for ….%.
B values for the constant and the independent variable - UNSTANDARDIZED COEFFICIENTS
The B for the independent variable predicts how the dependent variable changes for
every one-unit increase in the independent variable
The B for the constant is the intercept
Allows us to understand the contribution of the actual variable we are looking at
(the independent one)
t = test statistic; if significant: b is significantly different from 0
We can also predict for the population
Confidence interval : we are 95% certain that the confidence interval contains
unstandardized population regression coefficient ( SLOPE)- B value for independent variable
The standardized coefficient (beta, b*) is in simple regression the same as r - it tells us about the
strength and direction of the predictive effect/association of the variables, and which aspects of
our model have more influence.
The regression model with smelliness of surf shirt as dependent variable and
number of days surfing in the past two weeks as independent variable was
significant, F(1, 154) = 69.50, p < .001. Number of days surfing in the past two
weeks has a significant association with smelliness of the shirt, b* = 0.56, t =
8.34, p < .001, 95% CI [0.26, 0.42] and explains 31.1% of the variance in
smelliness of the shirt (R2 = .31). For every unit increase in number of days
surfing, smelliness of shirt increases with 0.34 unit.
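The intercept and slope from a simple regression can be computed by least squares; the data below are made up (they are not the surf-shirt dataset from the report above):

```python
from statistics import mean

# Hypothetical data: days surfing (X) predicting shirt smelliness (Y)
x = [0, 1, 2, 3, 4, 5]
y = [1.0, 1.5, 1.9, 2.3, 2.6, 3.1]

mx, my = mean(x), mean(y)
# slope b1 = covariance of X and Y divided by the variance of X
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx   # intercept: the predicted value of Y when X = 0

def predict(v):
    """Linear model: Y-hat = b0 + b1 * X."""
    return b0 + b1 * v

print(round(b0, 2), round(b1, 2))   # intercept and slope of the regression line
print(round(predict(3), 2))        # predicted smelliness after 3 days surfing
```

For every extra day of surfing, predicted smelliness rises by the slope b1; SPSS reports these as the unstandardized B values.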
5B
Dichotomous predictors
Can we include a dichotomous predictor in a regression analysis?
Dichotomous = categorical with two categories
Regression analysis requires interval/ratio variables
E.g. biological sex, experimental group
If we assign the values 0 and 1 to the dichotomous variable, we can include it in a
regression model (treat it as a ratio variable with meaningful zero point - DUMMY
VARIABLE)
B0- also a or constant
intercept ( value of Y when X=0)
Point at which the regression line crosses y axis
B1 - regression coefficient
slope/ gradient
strength and direction
B- values for constant and independent variable - unstandardized coefficients
We cannot compare them because they are based on different units of measurements
They tell us the intercept (for constant) and the value of change if the independent
variable increases by one unit
Multiple regression
Assess ‘influence’ of X1 while controlling for ‘influence’ of variables X2, X3, Xi…
Benefit :
Better prediction of Y
Examine unique predictors
Compare predictors
Controlling for /holding constant
Examine the unique relationship between calorie intake and weight while
controlling for height
Null hypothesis for the multiple regression
In the population general level of anxiety of earthquakes, number of previous
earthquakes and place of living , together do not explain the variance in fear of
earthquakes
Alternative hypothesis:
At least one of the three predictors explains the variance in fear of earthquakes
Null hypothesis for predictors:
In the population predictor 1 is not a predictor of fear of earthquakes.
Alternative hypothesis for predictors:
In the population predictor 1 is a predictor of fear of earthquakes.
Standardized coefficient - we can compare standardized regression coefficients (beta)
because the level of measurement is standardized
b* coefficients expressed in standard deviation of variables
Reflect the relative importance of each predictor X
A bigger b* -> variable more influential than a variable with a smaller b*
Use b-values (unstandardized regression coefficient) to calculate predicted value of Y
Use b* (beta; standardized regression coefficient) values to compare the different
independent predictor variables
Dummy variable
If we recode the dichotomous variable into 0 and 1 we can include it in a regression
model (treat it as numerical)
The unstandardized regression coefficient of a dummy variable indicates the
difference between the 0 category and the 1 category.
If it is negative - the higher one scores on the dummy variable, the lower the score on the
dependent variable
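The claim that a dummy's unstandardized coefficient is the difference between the 0 and 1 categories can be verified with a small sketch (data made up for illustration):

```python
from statistics import mean

# Dummy-coded predictor: 0 = group A, 1 = group B (hypothetical groups and scores)
x = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mx, my = mean(x), mean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

# Intercept = mean of the 0 group; slope = difference between the group means
print(b0, b1)  # 5.0 3.0
```

Group A's mean is 5.0 and group B's is 8.0, so the regression recovers exactly those: b0 = 5.0 (the 0 category) and b1 = 3.0 (how much higher the 1 category scores).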