
CHAPTER 1: STATISTICS AND SCIENTIFIC METHOD

Definition of Statistics
• a set of mathematical procedures for organizing, summarizing, and interpreting information
• help ensure that the information or observations are presented and interpreted in an accurate and informative way
• help researchers bring order out of chaos
• provide researchers with a set of standardized techniques that are recognized and understood throughout the scientific community

Statistics serve two general purposes:
• used to organize and summarize the information so that the researcher can see what happened in the research study and can communicate the results to others
• help the researcher to answer the questions that initiated the research by determining exactly what general conclusions are justified

POPULATIONS AND SAMPLE
• population is the set of all the individuals of interest in a particular study
• the entire group that a researcher wishes to study
• sample is a set of individuals selected from a population, usually intended to represent the population in a research study
• a sample should always be identified in terms of the population from which it was selected

VARIABLES
Variable is a characteristic or condition that changes or has different values for different individuals
• variables can be characteristics that differ from one individual to another, such as height, weight, gender, or personality. Also, variables can be environmental conditions that change, such as temperature, time of day, or the size of the room in which the research is being conducted
• Independent variable (IV): the variable that is systematically manipulated by the investigator
• Dependent variable (DV): the variable that the investigator measures to determine the effect of the independent variable

Variables: IV and DV
• an investigator might be interested in the effect of alcohol on social behavior
• the effect of sleep deprivation on aggressive behavior is studied
• How long you sleep affects your test score
• You want to compare brands of paper towels, to see which holds the most liquid
• You want to know whether caffeine affects your appetite

DATA
• Data (plural) are measurements or observations.
• Data set is a collection of measurements or observations
• Datum (singular) is a single measurement or observation and is commonly called a score or raw score.

PARAMETER AND STATISTICS
• Parameter is a value, usually a numerical value, that describes a population. A parameter is usually derived from measurements of the individuals in the population.
• Statistic is a value, usually a numerical value, that describes a sample. A statistic is usually derived from measurements of the individuals in the sample.

Descriptive and Inferential Statistical Methods
• Descriptive statistics
- are statistical procedures used to summarize, organize, and simplify data.
- techniques that take raw scores and organize or summarize them in a form that is more manageable
- another common technique is to summarize a set of scores by computing an average
• Inferential statistics
- consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected.
- Sampling error is the naturally occurring discrepancy, or error, that exists between a sample statistic and the corresponding population parameter.

CHAPTER 2: BASIC MATHEMATICAL AND MEASUREMENT CONCEPTS
• sum of the squared X scores (ΣX²)
- square first, then add
• sum of the X scores, quantity squared ((ΣX)²)
- add first, then square

MEASUREMENT SCALE
• a measuring scale can have one or more of the following mathematical attributes: magnitude, an equal interval between adjacent units, and an absolute zero point
• Four types of scales are commonly encountered in the behavioral sciences: nominal, ordinal, interval, and ratio

NOMINAL SCALES
• the lowest level of measurement and is most often used with variables that are qualitative in nature rather than quantitative
• one that has categories for the units
• does not possess any of the mathematical attributes of magnitude, equal interval, or absolute zero point
• EX: brands of jogging shoes, kinds of fruit, types of music, days of the month, nationality, religious preference, and eye color
• When using a nominal scale, you cannot do operations of addition, subtraction, multiplication, division, or ratios.

ORDINAL SCALES
• With an ordinal scale, we rank-order the objects being measured according to whether they possess more, less, or the same amount of the variable being measured
• numbers on the scale represent rank orderings, rather than raw score magnitudes
• possesses a relatively low level of the property of magnitude
• When using an ordinal scale, you cannot do operations of addition, subtraction, multiplication, division, or ratios.

INTERVAL SCALES
• possesses the properties of magnitude and equal interval but doesn’t have an absolute zero point
• When using an interval scale, you can do operations of addition and subtraction. You cannot do multiplication, division, or ratios.

RATIO SCALES
• has all the properties of an interval scale and, in addition, has an absolute zero point
• When using a ratio scale, you can perform all mathematical operations
• examples of variables measured with ratio scales include reaction time, length, weight, age, and frequency of any event

CONTINUOUS AND DISCRETE VARIABLES
• Continuous variable is one that theoretically can have an infinite number of values between adjacent units on the scale.
• Discrete variable is one in which there are no possible values between adjacent units on the scale
• Real limits of a continuous variable are those values that are above and below the recorded value by one-half of the smallest measuring unit of the scale.
• EX: a recorded value of 50 kg, measured to the nearest kg
- Lower real limit 49.5 kg
- Upper real limit 50.5 kg

SIGNIFICANT FIGURES
• It is standard practice to carry all intermediate calculations to two or more decimal places further than will be reported in the final answer.
• Thus, when the final answer is required to have two decimal places, you should carry intermediate calculations to at least four decimal places and round the final answer to two places.

CHAPTER 3: FREQUENCY DISTRIBUTIONS

UNGROUPED FREQUENCY DISTRIBUTIONS
• Frequency distribution
- presents the score values and their frequency of occurrence. When presented in a table, the score values are listed in rank order, with the lowest score value usually at the bottom of the table.
- the major purpose of a frequency distribution is to present the scores in such a way to facilitate ease of understanding and interpretation

Constructing a Frequency Distribution of Grouped Scores
1. Finding the range. Range = Highest score minus lowest score
2. Determining interval width (i). Let’s assume we wish to group the data into approximately 10 class intervals; then i = Range / 10, rounded to a convenient value.
3. Listing the intervals.
4. Tallying the scores
5. Summing into frequencies

• Relative frequency distribution indicates the proportion of the total number of scores that occurs in each interval.
• Cumulative frequency distribution indicates the number of scores that fall below the upper real limit of each interval.
• Cumulative percentage distribution indicates the percentage of scores that fall below the upper real limit of each interval

PERCENTILES and PERCENTILE RANK
• measures of relative standing
• used extensively in education to compare the performance of an individual to that of a reference group
• percentile or percentile point is the value on the measurement scale below which a specified percentage of the scores in the distribution fall.
• percentile rank of a score is the percentage of scores with values lower than the score in question.

PERCENTILE EXAMPLE
• For instance, if you received a score of 95 on a math test and this score was greater than or equal to the scores of 88% of the students taking the test, then your percentile rank would be 88. You would be in the 88th percentile.

PERCENTILE VS PERCENTAGE
• Percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall; it tells you what percentage of students scored lower than you on a given test
• Percentage tells you how well you performed yourself, expressed as your score out of 100 marks.

GRAPHING FREQUENCY DISTRIBUTIONS
• Frequency distributions are often displayed as graphs rather than tables.
• Since a graph is based completely on the tabled scores, the graph does not contain any new information.
• However, a graph presents the data pictorially, which often makes it easier to see important features of the data

GRAPHS
• vertical axis is called the ordinate, or Y axis, and the horizontal axis is the abscissa, or X axis.
• independent variable is plotted on the X axis and the dependent variable on the Y axis
• score values are usually plotted on the X axis and the frequency of the score values is plotted on the Y axis

FOUR MAIN TYPES OF GRAPHS:
• bar graph
• histogram
• frequency polygon
• cumulative percentage curve

BAR GRAPH
• used with nominal or ordinal data
• height of the bar represents the frequency or number of members of that category
• no numerical relationship between the categories
• bars for each category in a bar graph do not touch each other. This further emphasizes the lack of a quantitative relationship between the categories

HISTOGRAM
• used to represent frequency distributions composed of interval or ratio data
• a bar is drawn for each class interval. The class intervals are plotted on the horizontal axis such that each class bar begins and terminates at the real limits of the interval

FREQUENCY POLYGON
• the horizontal axis is identical to that of the histogram. However, for this type of graph, instead of using bars, a point is plotted over the midpoint of each interval at a height corresponding to the frequency of the interval

HISTOGRAM VS FREQUENCY POLYGON
• histogram displays the scores as though they were equally distributed over the interval, whereas the frequency polygon displays the scores as though they were all concentrated at the midpoint of the interval.
• Some investigators prefer to use the frequency polygon when they are comparing the shapes of two or more distributions.
• The frequency polygon also has the effect of displaying the scores as though they were continuously distributed, which in many instances is actually the case.

CHAPTER 4: MEASURES OF CENTRAL TENDENCY AND VARIABILITY

MEASURES OF CENTRAL TENDENCY
• a summary statistic
• represents the center point or typical value of a dataset
• indicate where most values in a distribution fall
• also referred to as the central location of a distribution
• three most often used measures of central tendency: arithmetic mean, the median, and the mode

ARITHMETIC MEAN
• value you ordinarily calculate when you average something
• the sum of the scores divided by the number of scores.

Notations: Population vs. Sample
                     Population   Sample
Mean                 μ            X̄; M
Standard Deviation   σ            s
Size                 N            N

Arithmetic Mean: In equation form
μ = ΣX / N (population)
X̄ = ΣX / N (sample)

PROPERTIES OF THE MEAN
• sensitive to the exact value of all the scores in the distribution
• the sum of the deviations about the mean equals zero
• very sensitive to extreme scores
• the sum of the squared deviations of all the scores about their mean is a minimum
• under most circumstances, of the measures used for central tendency, the mean is least subject to sampling variation
• Overall mean is equal to the sum of the mean of each group times the number of scores in the group, divided by the sum of the number of scores in each group
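As a rough illustration of the arithmetic mean and the overall mean, here is a minimal Python sketch. The scores are made up, and the function names `mean` and `overall_mean` are assumptions chosen for this example, not part of the notes.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def overall_mean(groups):
    """Overall mean of several groups: each group mean is weighted by its size."""
    weighted_sum = sum(mean(g) * len(g) for g in groups)
    total_n = sum(len(g) for g in groups)
    return weighted_sum / total_n


scores = [2, 4, 6, 8, 10]            # made-up scores for illustration
m = mean(scores)
deviations = [x - m for x in scores]  # deviation of each score from the mean

group_a = [2, 4, 6]                  # group mean 4, N = 3
group_b = [10, 20]                   # group mean 15, N = 2

print(m)                                   # the mean of the scores
print(sum(deviations))                     # deviations about the mean cancel out
print(overall_mean([group_a, group_b]))    # (4*3 + 15*2) / (3 + 2)
```

The overall mean is not simply the average of the two group means, because the larger group counts for more.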
MEDIAN
• second most frequently encountered measure of central tendency
• the scale value below which 50% of the scores fall. It is therefore the same thing as P50
• centermost score if the number of scores is odd. If the number is even, the median is taken as the average of the two centermost scores.
• less sensitive than the mean to extreme scores.
• under usual circumstances, the median is more subject to sampling variability than the mean but less subject to sampling variability than the mode.

MODE
• most frequent score in the distribution
• easiest of the three measures to determine
• found by inspection of the scores; there isn’t any calculation necessary
• when all the scores in the distribution have the same frequency, it is customary to say that the distribution has no mode
• most distributions are unimodal; that is, they have only one mode
• it is possible for a distribution to have more than one mode; a distribution with two modes is bimodal
• not used very much in the behavioral sciences because it is not very stable from sample to sample and often there is more than one mode for a given set of scores.

Measures of Central Tendency and Symmetry
• in a unimodal, symmetrical distribution the mean, median, and mode are equal; in a skewed distribution the mean is pulled toward the extreme scores in the tail

MEASURES OF VARIABILITY
• variability specifies how far apart the scores are spread
• whereas measures of central tendency quantify the average value of the distribution, measures of variability quantify the extent of dispersion
• Three measures of variability are commonly used in the behavioral sciences: the range, the standard deviation, and the variance

Range
• difference between the highest and lowest scores in the distribution.
• in equation form: Range = highest score − lowest score
• range really measures the spread of only the extreme scores and not the spread of any of the scores in between

STANDARD DEVIATION
• deviation score tells how far away the raw score is from the mean of its distribution
• deviation score is defined as X − μ for population data, or X − X̄ for sample data

Standard deviation: raw score method
SS = ΣX² − (ΣX)² / N, where SS is the sum of squares
• Using this equation to find SS allows us to use the raw scores without the necessity of calculating deviation scores
• avoids the decimal remainder difficulties
• raw score method is generally easier to use and avoids potential errors

PROPERTIES OF STANDARD DEVIATION
• gives us a measure of dispersion relative to the mean
• is sensitive to each score in the distribution: If a score is moved closer to the mean, then the standard deviation will become smaller
• like the mean, the standard deviation is stable with regard to sampling fluctuations
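The deviation method and the raw score method for the sum of squares can be compared in a short Python sketch. The scores are made up. The raw score formula SS = ΣX² − (ΣX)²/N and the N − 1 divisor for a sample standard deviation are the usual conventions, assumed here rather than taken from these notes.

```python
import math

scores = [3, 4, 6, 7]                 # made-up scores for illustration
n = len(scores)
mean = sum(scores) / n

sum_of_squares = sum(x ** 2 for x in scores)   # sum of squared X: square first, then add
square_of_sum = sum(scores) ** 2               # sum of X, squared: add first, then square

# Deviation method: SS is the sum of the squared deviation scores.
ss_deviation = sum((x - mean) ** 2 for x in scores)

# Raw score method: SS from the two sums above, no deviation scores needed.
ss_raw = sum_of_squares - square_of_sum / n

sample_sd = math.sqrt(ss_raw / (n - 1))        # s, assuming the N - 1 sample formula
population_sd = math.sqrt(ss_raw / n)          # sigma, treating the scores as a population
sample_variance = sample_sd ** 2               # variance is the square of the standard deviation

print(sum_of_squares, square_of_sum)
print(ss_deviation, ss_raw)
print(sample_sd, population_sd)
```

Both methods give the same SS. The raw score method simply avoids subtracting the mean from every score first.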

VARIANCE
• the variance of a set of scores is the square of the standard deviation

CHAPTER 5: THE NORMAL CURVE AND STANDARD SCORES

NORMAL CURVE
• theoretical distribution of population scores.
• a bell-shaped curve
• has two inflection points, one on each side of the mean.
• the inflection points are located where the curve changes from being convex downward to being convex upward.
• The curve is said to be asymptotic to the horizontal axis: it approaches the horizontal axis and gets closer and closer to it, but it never quite touches it

• score transformation - process by which the raw score is altered
• z transformation results in a distribution having a mean of 0 and a standard deviation of 1.
• z scores allow us to determine the number or percentage of scores that fall above or below any score in the distribution.
• another important use of z scores is to compare scores that are not otherwise directly comparable.

THREE CHARACTERISTICS OF Z SCORES
• z scores have the same shape as the set of raw scores
• mean of the z scores always equals zero (μz = 0)
• the standard deviation of z scores always equals 1 (σz = 1)
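The z transformation can be sketched in a few lines of Python. This is a minimal example with made-up raw scores, assuming the usual definition z = (X − μ) / σ and treating the scores as a whole population; the function name `z_scores` is chosen for this illustration.

```python
import math


def z_scores(scores):
    """Convert raw scores to z scores, treating the scores as a population."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]


raw = [50, 60, 70, 80, 90]           # made-up raw scores
z = z_scores(raw)

# Mean and standard deviation of the transformed scores.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))

print(z)
print(z_mean, z_sd)                  # approximately 0 and 1, up to rounding error
```

A raw score equal to the mean gets z = 0, and scores above or below the mean get positive or negative z values in standard deviation units.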

To calculate the number of scores in each area under the normal curve, all we need to do is multiply the relevant percentage by the total number of scores.

STANDARD SCORE (Z SCORE)
• a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean.

CHAPTER 6: CORRELATION

RELATIONSHIPS: LINEAR RELATIONSHIP
• relationship between variables can best be seen by plotting a graph using the paired X and Y values as the points on the graph.
• Such a graph is called a scatter plot (a graph of paired X and Y values)
• linear relationship between two variables is one in which the relationship can be most accurately represented by a straight line.
POSITIVE AND NEGATIVE RELATIONSHIP
• The slope of the line tells us whether the relationship is positive or negative.
• Positive relationship
- indicates a direct relationship between the variables
- Positive slope
- as X increases, Y increases.
• Negative relationship
- indicates an inverse relationship between X and Y
- Negative slope
- as X increases, Y decreases
• Perfect relationship: a positive or negative relationship exists and all of the points fall on the line. (image a)
• Imperfect relationship: a relationship exists, but all of the points do not fall on the line. (image b, c, d)

CORRELATION
- focuses on the direction and degree of the relationship
• Direction of the relationship:
- positive or negative
• Degree of relationship:
- refers to the magnitude or strength of the relationship.
- can vary from nonexistent to perfect

Correlation coefficient
- expresses quantitatively the magnitude and direction of the relationship
- can vary from +1 to -1
- the sign of the coefficient tells us whether the relationship is positive or negative
- the numerical part of the correlation coefficient describes the magnitude of the correlation
- the higher the number, the greater is the correlation
- +1 means the correlation is perfect and the relationship is positive
- -1 means the correlation is perfect and the relationship is negative
- 0 means the relationship is nonexistent

CORRELATION COEFFICIENT INTERPRETATION

Linear Correlation Coefficient Pearson r
• Pearson r is a measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.
• Correlation must be independent of the units used in measuring the two variables
• the most frequently encountered correlation coefficient in behavioral science research

CALCULATING PEARSON r
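One common way to compute Pearson r from raw scores is r = [ΣXY − (ΣX)(ΣY)/N] / √(SSx · SSy). The Python sketch below uses that formula as an assumption, since the notes do not show one, with made-up paired scores; the function name `pearson_r` is chosen for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r for paired scores, computed from raw-score sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n   # sum of squares for X
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n   # sum of squares for Y
    return (sum_xy - sum_x * sum_y / n) / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]                  # made-up paired scores
y_perfect_pos = [2, 4, 6, 8, 10]     # Y rises exactly with X
y_perfect_neg = [10, 8, 6, 4, 2]     # Y falls exactly as X rises
y_imperfect = [1, 3, 2, 5, 4]        # points do not all fall on one line

r_pos = pearson_r(x, y_perfect_pos)
r_neg = pearson_r(x, y_perfect_neg)
r_imp = pearson_r(x, y_imperfect)

# Changing the measurement units of X should leave r unchanged.
r_rescaled = pearson_r([100 * v for v in x], y_imperfect)

print(r_pos, r_neg, r_imp, r_rescaled)
```

The sign of each result gives the direction of the relationship and its absolute value gives the degree.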
Relationship of r² and explained variability
• r² is called the coefficient of determination
• r²: the proportion of the total variability of Y that is accounted for or explained by X
• If one of the variables is causal, then r² is a measure of the size of its effect

OTHER CORRELATION COEFFICIENTS
- In deciding which correlation coefficient to calculate, the shape of the relationship and the measuring scale of the data are the two most important considerations
• Correlation coefficient η (eta)
- for curvilinear relationship (relationship between motor skills and age)
• Spearman rank order correlation coefficient rho (rs)
- Both variables on ordinal scales
• Biserial correlation coefficient (rb)
- One variable on the interval scale, one dichotomous variable
• phi (φ) coefficient
- Each of the variables is dichotomous

Effect of Range on Correlation
- restricting the range of either of the variables will have the effect of lowering the correlation

Effect of Extreme Scores
- an extreme score can drastically alter the magnitude of the correlation coefficient

Correlation Does Not Imply Causation
- whenever two variables are correlated, there are four possible explanations of the correlation:
1. the correlation between X and Y is spurious,
2. X is the cause of Y,
3. Y is the cause of X,
4. a third variable is the cause of the correlation between X and Y.

CHAPTER 7: REGRESSION
• Regression and correlation are closely related
- both involve the relationship between two variables
- both utilize the same set of basic data: paired scores taken from the same or matched subjects
• Regression focuses on using the relationship for prediction
• Regression is a topic that considers using the relationship between two or more variables for prediction.
• Regression line is a best fitting line used for prediction.

REGRESSION LINE
• is called the regression line of Y on X, or simply the regression of Y on X, because we are predicting Y given X.
• The regression line represents our best estimate of the Y scores, given their corresponding X values.
• The standard error of estimate gives us a measure of the average deviation of the prediction errors about the regression line.

Homoscedasticity
• assumption of homoscedasticity: the variability of Y remains constant as we go from one X score to the next
• The homoscedasticity assumption implies that if we divided the X scores into columns, the variability of Y would not change from column to column
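The regression of Y on X and the standard error of estimate can be sketched in Python as follows. This is a minimal least-squares example with made-up paired scores. The form Y' = bX + a and the N − 2 divisor in the standard error are the usual conventions, assumed here because the notes do not state them, and the function names are chosen for this illustration.

```python
import math


def regression_line(xs, ys):
    """Least-squares slope b and intercept a for predicting Y from X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    return b, a


def standard_error_of_estimate(xs, ys, b, a):
    """Typical size of the prediction errors, using N - 2 in the denominator."""
    residual_ss = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / (len(xs) - 2))


x = [1, 2, 3, 4, 5]                  # made-up paired scores
y = [1, 3, 2, 5, 4]

b, a = regression_line(x, y)
predicted = [b * xi + a for xi in x]     # best estimate of Y for each X
s_yx = standard_error_of_estimate(x, y, b, a)

print(b, a)
print(predicted)
print(s_yx)
```

A smaller standard error of estimate means the points lie closer to the regression line, so predictions made from it are more accurate.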
MULTIPLE REGRESSION
- an extension of simple regression to situations that involve two or more predictor variables.
- the general form of the multiple regression equation for two predictor variables is Y′ = b1X1 + b2X2 + a

R² or r²
- coefficient of determination
- the proportion of the variance in the dependent variable that is predictable from the independent variable(s)
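For two predictors, a least-squares fit and its R² can be sketched in plain Python. The example assumes the usual two-predictor form Y' = b1·X1 + b2·X2 + a and solves the normal equations on deviation scores. The data and the function names `fit_two_predictors` and `r_squared` are made up for illustration.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares b1, b2, a for Y' = b1*X1 + b2*X2 + a."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]            # deviation scores for X1
    d2 = [v - m2 for v in x2]            # deviation scores for X2
    dy = [v - my for v in y]             # deviation scores for Y
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2           # zero if X1 and X2 are perfectly correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a


def r_squared(y, predicted):
    """Proportion of the variance in Y accounted for by the predictions."""
    my = sum(y) / len(y)
    ss_total = sum((v - my) ** 2 for v in y)
    ss_residual = sum((v - p) ** 2 for v, p in zip(y, predicted))
    return 1 - ss_residual / ss_total


x1 = [1, 2, 3, 4, 5]                 # made-up predictor scores
x2 = [2, 1, 4, 3, 6]
y = [10, 8, 19, 17, 30]              # made-up criterion scores

b1, b2, a = fit_two_predictors(x1, x2, y)
predicted = [b1 * u + b2 * v + a for u, v in zip(x1, x2)]
r2 = r_squared(y, predicted)

print(b1, b2, a)
print(r2)
```

The sketch breaks down when the two predictors are perfectly correlated, because `det` is then zero and the weights cannot be separated.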
