
Reinforcement is the process in which a behavior is strengthened by the immediate consequence that reliably follows its occurrence. When a behavior is strengthened, it is more likely to occur again in the future.
Thorndike placed a hungry cat in a cage and put food outside of the cage where the cat could
see it. Thorndike rigged the cage so that a door would open if the cat hit a lever with its paw.
The cat was clawing and biting the bars of the cage, reaching its paws through the openings
between the bars, and trying to squeeze through the opening. Eventually, the cat accidentally hit
the lever, the door opened, and the cat got out of the cage and ate the food. Each time
Thorndike put the hungry cat inside the cage it took less time for the cat to hit the lever that
opened the door. Eventually, the cat hit the lever with its paw as soon as Thorndike put it in the
cage (Thorndike, 1911). Thorndike called this phenomenon the law of effect.

In this example, when the hungry cat was put back in the cage, the cat was more likely to hit the
lever because this behavior had resulted in an immediate consequence: escaping the cage and
getting food. Getting to the food was the consequence that reinforced (strengthened) the cat's
behavior of hitting the lever with a paw.

Reinforcement is defined as follows.

1. The occurrence of a particular behavior

2. is followed by an immediate consequence

3. that results in the strengthening of the behavior. (The person is more likely to engage in the
behavior again in the future.)

We can determine that a behavior is strengthened when there is an increase in its frequency,
duration, intensity, or speed (decreased latency). A behavior that is strengthened through the
process of reinforcement is called an operant behavior. An operant behavior acts on the
environment to produce a consequence and, in turn, is controlled by, or occurs again in the
future as a result of, its immediate consequence. The consequence that strengthens an operant
behavior is called a reinforcer.

In the first example in Table 4-1, the child cried at night when her parents put her to bed. The
child's crying was an operant behavior. The reinforcer for her crying was the parents' attention.
Because crying at night resulted in this immediate consequence (reinforcer), the child's crying
was strengthened: She was more likely to cry at night in the future.
There are two types of reinforcement: positive reinforcement and negative reinforcement. It is
extremely important to remember that both positive reinforcement and negative reinforcement
are processes that strengthen a behavior, that is, they both increase the probability that the
behavior will occur in the future. Positive and negative reinforcement are distinguished only by
the nature of the consequence that follows the behavior.

Positive reinforcement is defined as follows.

1. The occurrence of a behavior

2. is followed by the addition of a stimulus (a reinforcer) or an increase in the intensity of a stimulus,

3. which results in the strengthening of the behavior.

Negative reinforcement, by contrast, is defined as follows.

1. The occurrence of a behavior

2. is followed by the removal of a stimulus (an aversive stimulus) or a decrease in the intensity
of a stimulus,

3. which results in the strengthening of the behavior.

A stimulus is an object or event that can be detected by one of the senses, and thus has the
potential to influence the person (stimuli is the plural form of the word stimulus). The object or
event may be a feature of the physical environment or the social environment (the behavior of
the person or of others).

In positive reinforcement, the stimulus that is presented or that appears after the behavior is
called a positive reinforcer.

(A positive reinforcer often is seen as something pleasant, desirable, or valuable that a person will try to get.) In negative reinforcement, the stimulus that is removed or avoided after
the behavior is called an aversive stimulus. (An aversive stimulus often is seen as
something unpleasant, painful, or annoying that a person will try to get away from or
avoid.) The essential difference, therefore, is that in positive reinforcement, a response
produces a stimulus (a positive reinforcer), whereas in negative reinforcement, a response
removes or prevents the occurrence of a stimulus (an aversive stimulus). In both cases, the
behavior is more likely to occur in the future.

Consider Example 8 in Table 4-1. The mother's behavior of buying her child candy results in
termination of the child's tantrum (an aversive stimulus is removed). As a result, the mother is
more likely to buy her child candy when he tantrums in a store. This is an example of negative
reinforcement. On the other hand, when the child tantrums, he gets candy (a positive reinforcer
is presented).
As a result, he is more likely to tantrum in the store. This is an example of positive
reinforcement.

SOCIAL VERSUS AUTOMATIC REINFORCEMENT

As you have learned, reinforcement can involve the addition of a reinforcer (positive reinforcement) or the removal of an aversive stimulus (negative reinforcement) following the
behavior. In both cases, the behavior is strengthened. For both positive and negative
reinforcement, the behavior may produce a consequence through the actions of another person
or through direct contact with the physical environment. When a behavior produces a reinforcing
consequence through the actions of another person, the process is social reinforcement. An
example of social positive reinforcement might involve asking your roommate to bring you the
bag of chips. An example of social negative reinforcement might involve asking your roommate
to turn down the TV when it is too loud. In both cases, the consequence of the behavior was
produced through the actions of another person. When the behavior produces a reinforcing
consequence through direct contact with the physical environment, the process is automatic
reinforcement. An example of automatic positive reinforcement would be if you went to the
kitchen and got the chips for yourself. An example of automatic negative reinforcement would
be if you got the remote and turned down the volume on the TV yourself. In both cases, the
reinforcing consequence was not produced by another person.

One type of positive reinforcement involves the opportunity to engage in a high-probability behavior (a preferred behavior) as a consequence for a low-probability behavior (a less-preferred behavior), to increase the low-probability behavior (Mitchell & Stoffelmayr, 1973). This is called the Premack principle (Premack, 1959).
For example, the Premack principle operates when parents require their fourth grade son to
complete his homework before he can go outside to play with his friends. The opportunity to
play (a high-probability behavior) after the completion of the homework (low-probability
behavior) reinforces the behavior of doing homework; that is, it makes it more likely that the
child will complete his homework.

ESCAPE AND AVOIDANCE BEHAVIORS


When defining negative reinforcement, a distinction is made between escape and avoidance. In
escape behavior, the occurrence of the behavior results in the termination of an aversive
stimulus that was already present when the behavior occurred. That is, the person escapes
from the aversive stimulus by engaging in a particular behavior, and that behavior is
strengthened. In avoidance behavior, the occurrence of the behavior prevents an aversive
stimulus from occurring. That is, the person avoids the aversive stimulus by engaging in a
particular behavior, and that behavior is strengthened.

In an avoidance situation, a warning stimulus often signals the occurrence of an aversive stimulus, and the person engages in an avoidance behavior when this warning stimulus is present. Both escape and avoidance are types of negative reinforcement; therefore, both result in an increase in the rate of the behavior that terminated or avoided the aversive stimulus.

The distinction between escape and avoidance is shown in the following situation. A laboratory
rat is placed in an experimental chamber that has two sides separated by a barrier; the rat can
jump over the barrier to get from one side to the other. On the floor of the chamber is an electric
grid that can be used to deliver a shock to one side or the other. Whenever the shock is
presented on the right side of the chamber, the rat jumps to the left side, thus escaping from the
shock. Jumping to the left side of the chamber is escape behavior because the rat escapes from
an aversive stimulus (the shock). When the shock is applied to the left side, the rat jumps to the
right side. The rat learns this escape behavior rather quickly and jumps to the other side of the
chamber as soon as the shock is applied.

In the avoidance situation, a tone is presented just before the shock is applied. (A tone is used because rats have better hearing than vision.) The rat quickly learns to jump to the other side of the chamber as soon as the tone is presented, thus avoiding the shock before it occurs. Jumping at the sound of the tone is avoidance behavior: the tone is the warning stimulus, and responding to it prevents the aversive stimulus from being presented.

Conditioned and Unconditioned Reinforcers

Reinforcement is a natural process that affects the behavior of humans and other animals.
Through the process of evolution, we have inherited certain biological characteristics that
contribute to our survival. One characteristic we have inherited is the ability to learn new
behaviors through reinforcement. In particular, certain stimuli are naturally reinforcing because
the ability of our behaviors to be reinforced by these stimuli has survival value (Cooper, Heron, & Heward, 1987, 2007). For example, food, water, and sexual stimulation are natural positive reinforcers because they contribute to survival of the individual and the species. Escape from painful stimulation or extreme levels of stimulation (cold, heat, or other discomforting or aversive stimulation) is naturally negatively reinforcing because escape from or avoidance of these stimuli also contributes to survival. These natural reinforcers are called unconditioned reinforcers because they function as reinforcers the first time they are presented to most
human beings; no prior experience with these stimuli is needed for them to function as
reinforcers. Unconditioned reinforcers sometimes are called primary reinforcers. These stimuli
are unconditioned reinforcers because they have biological importance (Cooper et al., 1987, 2007).

Another class of reinforcers is the conditioned reinforcers. A conditioned reinforcer (also called a
secondary reinforcer) is a stimulus that was once neutral (a neutral stimulus does not currently
function as a reinforcer; i.e., it does not influence the behavior that it follows) but became
established as a reinforcer by being paired with an unconditioned reinforcer or an already
established conditioned reinforcer. For example, a parent's attention is a conditioned reinforcer for most children because attention is paired with the delivery of food, warmth, and other reinforcers many times in the course of a young child's life. Money is perhaps the most common conditioned reinforcer. Money is a conditioned reinforcer because it can buy (is paired with) a wide variety of unconditioned and conditioned reinforcers throughout a person's life. If you could no longer use money to buy anything, it would no longer be a conditioned reinforcer. People would not work or engage in any behavior to get money if it could not be used to obtain other reinforcers. This illustrates one important point about conditioned reinforcers: They continue to be reinforcers only if they are at least occasionally paired with other reinforcers.

Nearly any stimulus may become a conditioned reinforcer if it is paired with an existing
reinforcer. For example, when trainers teach dolphins to perform tricks at aquatic parks, they
use a handheld clicker to reinforce the dolphin's behavior. Early in the training process, the
trainer delivers a fish as a reinforcer and pairs the sound of the clicker with the delivery of the
fish to eat. Eventually, the clicking sound itself becomes a conditioned reinforcer. After that, the trainer occasionally pairs the sound with the unconditioned reinforcer (the fish) so that the
clicking sound continues to be a conditioned reinforcer (Pryor, 1985). A neutral stimulus such as
a plastic poker chip or a small square piece of colored cardboard can be used as a conditioned
reinforcer (or token) to modify human behavior in a token reinforcement program. In a token
reinforcement program, the token is presented to the person after a desirable behavior, and
later the person exchanges the token for other reinforcers (called backup reinforcers). Because
the tokens are paired with (exchanged for) the backup reinforcers, the tokens themselves
become reinforcers for the desirable behavior.

When a conditioned reinforcer is paired with a wide variety of other reinforcers, it is called a generalized conditioned reinforcer. Money is a generalized conditioned reinforcer because it is paired with (exchanged for) an almost unlimited variety of reinforcers. As a result, money is a powerful reinforcer that is less likely to diminish in value (to become satiated) when it is accumulated. That is, satiation (losing value as a reinforcer) is less likely to occur for generalized reinforcers such as money.

Factors That Influence the Effectiveness of Reinforcement

The effectiveness of reinforcement is influenced by a number of factors. These include the immediacy and consistency of the consequence, motivating operations, the magnitude of the reinforcer, and individual differences.

Immediacy

Immediate reinforcement is crucial for effective behavior modification. When a consequence closely follows a behavior, it reinforces that behavior. Delayed consequences weaken the connection between behavior and consequence, reducing effectiveness. For instance, in dog training, giving a treat immediately after a desired behavior strengthens that behavior. Similarly, in social interactions, immediate responses such as smiles or laughter reinforce appropriate communication, shaping future behavior.

Contingency
If a response is consistently followed by an immediate consequence, that consequence is more likely to reinforce the response. When the response produces the consequence and the consequence does not occur unless the response occurs first, we say that a contingency exists between the response and the consequence. When a contingency exists, the consequence is more likely to reinforce the response. Turning the key in your ignition to start your car is an example of contingency: Every time you turn the key, the car starts. The behavior of turning the key is reinforced by the engine starting. If the engine started only sometimes when you turned the key, and if it started sometimes when you did not turn the key, the behavior of turning the key in this particular car would not be strengthened very much. A person is more likely to repeat a behavior when it results in a consistent reinforcing consequence. That is, a behavior is strengthened when a reinforcer is contingent on the behavior (when the reinforcer occurs only if the behavior occurs).

Motivating Operations

Some events can make a particular consequence more or less reinforcing at some times than at
other times. These antecedent events, called motivating operations (MOs), alter the value of a
reinforcer. There are two types of MOs: establishing operations and abolishing operations. An establishing operation (EO) makes a reinforcer more potent (it establishes the effectiveness of a reinforcer). An abolishing operation (AO) makes a reinforcer less potent (it abolishes or decreases the effectiveness of a reinforcer). Motivating operations have two effects: (a) they alter the value of a reinforcer, and (b) they make the behavior that produces that reinforcer more
or less likely to occur at that time. An EO makes a reinforcer more potent and makes a behavior
that produces the reinforcer more likely. An AO makes a reinforcer less potent and makes a
behavior that produces that reinforcer less likely.

Let's consider some examples of establishing operations. Food is a more powerful reinforcer for a person who hasn't eaten recently. Not having eaten in a while is an EO that makes food more reinforcing at that time and makes the behavior of getting food more likely to occur. Likewise, water is a more potent reinforcer for someone who has not had a drink all day or who just ran 6 miles. Water or other beverages are more reinforcing when a person has just eaten a large amount of salty popcorn than when a person has not. (That is why some bars give you free salty popcorn.) In these examples, going without food or water (deprivation), running 6 miles, and eating salty popcorn are events called establishing operations because they increase the effectiveness of a reinforcer at a particular time or in a particular situation and make the behavior that results in that reinforcer more likely to occur.
Deprivation is a type of establishing operation that increases the effectiveness of most unconditioned reinforcers and some conditioned reinforcers. A particular reinforcer (such as food or water) is more powerful if a person has gone without it for some time. For example, attention may be a more powerful reinforcer for a child who has gone without attention for a period of time. Similarly, although money is almost always a reinforcer, it may be a more powerful reinforcer for someone who has gone without money (or enough money) for a period of time. In addition, any circumstances in which a person needs more money (e.g., unexpected doctor bills) make money a stronger reinforcer.

Individual Differences

The likelihood of a consequence being a reinforcer varies from person to person, so it is important to determine that a particular consequence is a reinforcer for a particular person. It is important not to assume that a particular stimulus will be a reinforcer for a person just because it appears to be a reinforcer for most people. For example, praise may be meaningless to some
people, even though it is a reinforcer for most. Chocolate candy may be a reinforcer for most children, but it won't be for the child who is allergic to chocolate and gets sick when she eats it. Chapter 15 discusses various ways to identify which consequences function as reinforcers for people.

Magnitude

The other characteristic of a stimulus that is related to its power as a reinforcer is its amount or
magnitude. Given the appropriate establishing operation, generally, the effectiveness of a
stimulus as a reinforcer is greater if the amount or magnitude of a stimulus is greater. This is
true for both positive and negative reinforcement. A larger positive reinforcer strengthens the
behavior that produces it to a greater extent than a smaller amount or magnitude of the same
reinforcer does. For example, a person would work longer and harder for a large amount of
money than for a small amount. Likewise, the termination of a more intense aversive stimulus
strengthens the behavior that terminates it more than the termination of a lower magnitude or
intensity of the same stimulus would. For example, a person would work harder or engage in
more behavior to decrease or eliminate an extremely painful stimulus than a mildly painful
stimulus. You would work a lot harder to escape from a burning building than you would to get
out of the hot sun.

SCHEDULES OF REINFORCEMENT

The schedule of reinforcement for a particular behavior specifies whether every response is
followed by a reinforcer or whether only some responses are followed by a reinforcer. A
continuous reinforcement schedule (CRF schedule) is one in which each occurrence of a
response is reinforced. In an intermittent reinforcement schedule, by contrast, each
occurrence of the response is not reinforced. Rather, responses are occasionally or
intermittently reinforced.
A CRF schedule is used when a person is learning a behavior or engaging in the behavior for
the first time. This is called acquisition: The person is acquiring a new behavior. Once the
person has acquired or learned the behavior, an intermittent reinforcement schedule is used so
that the person continues to engage in the behavior. This is called maintenance: The behavior
is maintained over time with the use of intermittent reinforcement. A supervisor, for example, could not stand by an employee (call her Maria) and praise her for every correct behavior every day that she works. Not only is this impossible, but it is also unnecessary. Intermittent reinforcement is more effective than a CRF
schedule for maintaining a behavior.

Fixed Ratio

In fixed ratio and variable ratio schedules of reinforcement, the delivery of the reinforcer is based on the number of responses that occur. In a fixed ratio (FR) schedule, a specific or fixed
number of responses must occur before the reinforcer is delivered. That is, a reinforcer is
delivered after a certain number of responses. For example, in a fixed ratio 5 (FR 5) schedule,
the reinforcer follows every fifth response. In an FR schedule, the number of responses needed
before the reinforcer is delivered does not change. Ferster and Skinner (1957) found that
pigeons would engage in high rates of responding on FR schedules; however, there was often a
brief pause in responding after the delivery of the reinforcer. Ferster and Skinner investigated
FR schedules ranging from FR 2 to FR 400, in which 400 responses had to occur before the
reinforcer was delivered. Typically, the rate of responding is greater when more responses are
needed for reinforcement in an FR schedule.

FR schedules of reinforcement sometimes are used in academic or work settings to maintain
appropriate behavior. Consider the example of Paul, a 26-year-old adult with severe intellectual
disability who works in a factory packaging parts for shipment. As the parts come by on a
conveyor belt, Paul picks them up and puts them into boxes. Paul's supervisor delivers a token
(conditioned reinforcer) after every 20 parts that Paul packages. This is an example of an FR
20. At lunch and after work, Paul exchanges his tokens for backup reinforcers (e.g., snacks or
soft drinks). An FR schedule could be used in a school setting by giving students reinforcers (such as stars, stickers, or good marks) for correctly completing a fixed number of problems or
other academic tasks. Piece-rate pay in a factory, in which workers get paid a specified amount
of money for a fixed number of responses (e.g., $5 for every 12 parts assembled), is also an
example of an FR schedule.
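The counting rule behind an FR schedule is simple enough to sketch in code. The following Python snippet is purely illustrative (the function and variable names are mine, not from the behavioral literature): it delivers a "reinforcer" after every nth response, the same rule as Paul's FR 20 token program.

```python
def fr_schedule(n):
    """Fixed ratio: deliver a reinforcer after every nth response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:       # the fixed response requirement is met
            count = 0        # the count starts over after each reinforcer
            return True      # reinforcer delivered
        return False
    return respond

fr5 = fr_schedule(5)                    # an FR 5 schedule
results = [fr5() for _ in range(10)]
print(results)
# [False, False, False, False, True, False, False, False, False, True]
```

Note that the response requirement never changes; the brief post-reinforcement pause that Ferster and Skinner observed is a property of the organism's responding, not of this delivery rule.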

Variable Ratio

In a variable ratio (VR) schedule, as in an FR schedule, delivery of a reinforcer is based on the number of responses that occur, but in this case, the number of responses needed for
reinforcement varies each time, around an average number. That is, a reinforcer is delivered
after an average of x responses. For example, in a variable ratio 10 (VR 10) schedule, the
reinforcer is provided after an average of 10 responses. The number of responses needed for
each reinforcer may range from just 2 or 3 up to 20 or 25, but the average number of responses
equals 10. Ferster and Skinner (1957) evaluated VR schedules with pigeons and found that
such schedules produced high, steady rates of responding; in contrast with FR schedules, there
is little pausing after the delivery of the reinforcer. In their research, Ferster and Skinner
evaluated various VR schedules, including some that needed a large number of responses for
reinforcement.
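A VR delivery rule can be sketched the same way as an FR rule; the only change is that the response requirement is redrawn after each reinforcer. The text specifies only that requirements vary around an average, so the uniform draw below (and the names used) are illustrative assumptions of this sketch.

```python
import random

def vr_schedule(mean, seed=0):
    """Variable ratio: the required number of responses is redrawn
    around `mean` after each reinforcer, so delivery is unpredictable."""
    rng = random.Random(seed)
    required = rng.randint(1, 2 * mean - 1)   # uniform draw averaging `mean`
    count = 0
    def respond():
        nonlocal required, count
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean - 1)  # new requirement each time
            return True
        return False
    return respond

vr10 = vr_schedule(10)                        # a VR 10 schedule
reinforced = sum(vr10() for _ in range(1000))
print(reinforced)   # roughly 1000 / 10, i.e., about 100 reinforcers
```

Because the requirement is unpredictable, every response might be the one that produces the reinforcer, which is consistent with the high, steady response rates Ferster and Skinner reported.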

Fixed Interval

With interval schedules (fixed interval, variable interval), a response is reinforced only if it occurs
after an interval of time has passed. It does not matter how many responses occur; as soon as
the specified interval of time has elapsed, the first response that occurs is reinforced. In a fixed
interval (FI) schedule, the interval of time is fixed, or stays the same each time. For example, in
a fixed interval 20-second (FI 20-second) schedule of reinforcement, the first response that
occurs after 20 seconds has elapsed results in the reinforcer. Responses that occur before the
20 seconds are not reinforced; they have no effect on the subsequent delivery of the reinforcer
(i.e., they don't make it come any sooner). Once the 20 seconds has elapsed, the reinforcer is
available, and the first response that occurs is reinforced. Then, 20 seconds later, the
reinforcer is available again, and the first response that occurs produces the reinforcer.
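The FI rule depends on elapsed time rather than response counts. In this illustrative sketch (simulated time is passed in as an argument; the names are mine), responses before the interval elapses have no effect, and the clock restarts when a reinforcer is delivered:

```python
def fi_schedule(interval):
    """Fixed interval: the first response after `interval` time units
    have elapsed is reinforced; earlier responses have no effect."""
    next_available = interval
    def respond(t):
        nonlocal next_available
        if t >= next_available:
            next_available = t + interval   # interval restarts at reinforcement
            return True
        return False
    return respond

fi20 = fi_schedule(20)                      # an FI 20-second schedule
outcomes = [fi20(t) for t in (5, 19, 21, 25, 41)]
print(outcomes)
# [False, False, True, False, True]
```

The responses at t = 5 and t = 19 illustrate the point in the text: responding before the interval elapses does not make the reinforcer come any sooner.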

Variable Interval

In a variable interval (VI) schedule of reinforcement, as in an FI schedule, the reinforcer is delivered for the first response that occurs after an interval of time has elapsed. The difference is that in a VI schedule, each time interval is a different length. The interval varies around an
average time. For example, in a variable interval 20-second (VI 20-second) schedule,
sometimes the interval is more than 20 seconds and other times it is less than 20 seconds. The
interval length is not predictable each time, but the average length is 20 seconds. Ferster and
Skinner (1957) investigated various VI schedules of reinforcement. They found that the pattern
of responding on a VI schedule was different from that on an FI schedule. On the VI schedule,
the pigeon's behavior (pecking the key) occurred at a steady rate, whereas on the FI schedule,
the frequency decreased in the early part of the interval and increased near the end of the
interval. Because the length of the interval, and thus the availability of the reinforcer, was unpredictable in a VI schedule, this off-and-on pattern of responding did not develop.
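A VI rule differs from an FI rule only in that each interval length is drawn around an average. In this sketch the uniform range (0.5x to 1.5x the mean) is an arbitrary illustrative choice, since the text specifies only that intervals vary around an average:

```python
import random

def vi_schedule(mean, seed=0):
    """Variable interval: like FI, but each interval length is drawn
    around `mean`, so reinforcer availability is unpredictable."""
    rng = random.Random(seed)
    next_available = rng.uniform(0.5 * mean, 1.5 * mean)
    def respond(t):
        nonlocal next_available
        if t >= next_available:
            next_available = t + rng.uniform(0.5 * mean, 1.5 * mean)
            return True
        return False
    return respond

vi20 = vi_schedule(20)                      # a VI 20-second schedule
# Steady responding, one response per simulated second for 200 seconds:
reinforcers = sum(vi20(t) for t in range(200))
print(reinforcers)   # close to 200 / 20, i.e., about 10 reinforcers
```

Because the next availability time cannot be predicted, steady responding (as modeled by the once-per-second loop) is the pattern that contacts the reinforcer most reliably, which mirrors the steady rates observed on VI schedules.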
