
Behavioural Processes 127 (2016) 52–61


Extending unified-theory-of-reinforcement neural networks to steady-state operant behavior

Olivia L. Calvin∗,1, J.J. McDowell

Department of Psychology, Emory University, Atlanta, Georgia

Article info

Article history: Received 1 February 2016; Received in revised form 16 March 2016; Accepted 23 March 2016; Available online 24 March 2016.

Keywords: Agent-based modeling; Neural networks; Steady-state behavior; Operant behavior; Quantitative law of effect; Matching law

Abstract

The unified theory of reinforcement has been used to develop models of behavior over the last 20 years (Donahoe et al., 1993). Previous research has focused on the theory's concordance with the respondent behavior of humans and animals. In this experiment, neural networks were developed from the theory to extend the unified theory of reinforcement to operant behavior on single-alternative variable-interval schedules. This area of operant research was selected because previously developed neural networks could be applied to it without significant alteration. Previous research with humans and animals indicates that the pattern of their steady-state behavior is hyperbolic when plotted against the obtained rate of reinforcement (Herrnstein, 1970). A genetic algorithm was used in the first part of the experiment to determine parameter values for the neural networks, because values that were used in previous research did not result in a hyperbolic pattern of behavior. After finding these parameters, hyperbolic and other similar functions were fitted to the behavior produced by the neural networks. The form of the neural networks' behavior was best described by an exponentiated hyperbola (McDowell, 1986; McLean and White, 1983; Wearden, 1981), which was derived from the generalized matching law (Baum, 1974). In post-hoc analyses the addition of a baseline rate of behavior significantly improved the fit of the exponentiated hyperbola and removed systematic residuals. The form of this function was consistent with human and animal behavior, but the estimated parameter values were not.

© 2016 Elsevier B.V. All rights reserved.

Contents

1. Introduction ... 52
2. Methods ... 54
   2.1. Subjects ... 54
   2.2. Apparatus and materials ... 54
   2.3. Procedures ... 55
        2.3.1. Phase I: finding viable UTR-neural-network parameter values ... 56
        2.3.2. Phase II: evaluation of steady-state behavior ... 57
3. Results ... 58
   3.1. Phase I: finding viable UTR-neural-network parameter values ... 58
   3.2. Phase II: evaluation of steady-state behavior ... 58
4. Discussion ... 58
References ... 60

∗ Corresponding author at: Department of Psychology, Emory University, Atlanta, GA 30322, Georgia.
E-mail address: ncalvin@emory.edu (O.L. Calvin).
1 Went by the name of Nicholas T. Calvin in previously published work.
http://dx.doi.org/10.1016/j.beproc.2016.03.016
0376-6357/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

The central assertion of the unified theory of reinforcement (UTR) is that behavior in operant and respondent experiments is a result of the same neural process (Donahoe et al., 1993). This is a general theory that describes the internal biological processes that lead to both operant and respondent behavior.
To evaluate the plausibility of the UTR as an account for behavior, biologically inspired models have been developed and evaluated (Burgos, 1996, 1997, 2003, 2005, 2007; Burgos and Murillo-Rodríguez, 2007; Burgos et al., 2008; Burns et al., 2011; Calvin and McDowell, 2015; Donahoe, 2002; Donahoe and Burgos, 1999, 2000; Donahoe et al., 1993, 1997a,b; Sánchez et al., 2010). UTR-inspired models do not explicitly distinguish between respondent and operant contingencies, but in their functioning they adapt to both contingencies. The theory states that behavior adapts by adjusting the strength of neural connections in response to positive consequences. In the absence of positive consequences, neural connections slowly weaken and behavior is less likely to be observed. Through repeated interactions with the environment, UTR-inspired models adapt their behavior to environmental events.

Research with UTR-based models has focused on respondent behavior, and very little has been done to examine operant behavior. The only exceptions to this were demonstrations of operant conditioning by UTR-based models (Donahoe et al., 1993; Calvin and McDowell, 2015). That a behavior is more frequently observed when it is followed by positive consequences was a minimum prerequisite for UTR-based models to be plausible, and it was important that this was demonstrated. Since the UTR must account for both operant and respondent behavior, more operant research would enhance its plausibility as an account for behavior.

An area of operant research to which previously explored UTR-based models can be applied without significant alteration is the quantitative law of effect (Herrnstein, 1970). The quantitative law of effect is a development of the matching law (Herrnstein, 1961) that added important theoretical underpinnings in order to understand behavior on single-alternative variable-interval (VI) schedules. The quantitative law of effect has been shown to describe single-alternative VI behavior of animals (e.g., Herrnstein, 1970; McSweeney et al., 1983; reviewed in McDowell, 2013) and humans (Beardsley and McDowell, 1992; Bradshaw et al., 1976, 1977, 1978; Fernandez and McDowell, 1995; McDowell and Wood, 1984, 1985). Herrnstein developed the quantitative law of effect by making the two important assumptions that humans and animals engage in constant rates of behavior, and that all behavior is choice. These assumptions extended the matching law, which describes behavior on concurrent VI VI schedules, to single VI schedules (Herrnstein, 1970). The quantitative law of effect is a hyperbola,

B = kR / (R + re),    (1)

where B is the observed rate of a target behavior, R is the rate of obtained reinforcement, and k and re are estimated parameters that have important theoretical interpretations. In the theory, the k parameter is the maximum rate of behavior, which is assumed to be constant. If the organism is not engaging in a targeted operant behavior then it is assumed to be engaging in other behaviors that may result in beneficial outcomes. The unmeasured extraneous behavior is assumed to occasionally result in reinforcement, which is the re parameter. Mathematically, the hyperbola asymptotes at k, and re is the point on the x-axis that predicts a rate of behavior that is half that of k (Bradshaw et al., 1976).

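These two assumptions lead directly to Eq. (1). The derivation below is a brief sketch of the standard argument, restated in the notation of Eq. (1) rather than quoted from any particular source: if all behavior is choice, the target behavior B and the aggregate of all other behavior, Be, match their reinforcement rates R and re, and if the total amount of behavior is constant (B + Be = k), then

$$\frac{B}{B+B_e}=\frac{R}{R+r_e},\qquad B+B_e=k
\;\;\Rightarrow\;\; \frac{B}{k}=\frac{R}{R+r_e}
\;\;\Rightarrow\;\; B=\frac{kR}{R+r_e}.$$
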
An improved version of the matching law was developed by Baum (1974) to account for systematic inaccuracies in the original version, and it can be used to develop a single-alternative version in the same way that Eq. (1) was derived (McDowell, 1986; McLean and White, 1983; Wearden, 1981). The simplified version of this generalized quantitative law of effect is an exponentiated hyperbola,

B = kR^a / (R^a + re).    (2)

The parameter a allows for systematic deviations from the exact matching of ratios of behavior and reinforcement, which are termed under- and over-matching. The re in Eq. (2) has a slightly different meaning than in Eq. (1) because its full theoretical expression is re^a/b, which could be interpreted as the relative value of the reinforcers obtained by unmeasured behavior. The interpretation changes because the bias parameter, b, accounts for systematic differences in the reinforcing values of measured and unmeasured reinforcers. For the purposes of fitting the equation it is simplified to a single parameter, because re and b cannot be independently estimated. The exponent parameter, a, adjusts the form of the function by bending it at the lowest rates of obtained reinforcement. If the value of a is greater than 1 then the function tends to flatten at the lowest rates of obtained reinforcement, and when it is less than 1 the function becomes steeper. While this exponentiated hyperbola is similar to the hyperbola specified by Eq. (1), it has some unique fitting characteristics, and it is based on the more strongly supported generalized matching law (McDowell, 2013).

To assess the UTR's predictions it is necessary to simulate UTR-inspired neural networks. The complex and flexible behavior of these neural networks comes from the interactions of relatively simple components. At their simplest, these networks are built from two types of components: neural processing units (NPUs) and connections. These components serve different functions within the networks, with NPUs primarily determining how the network will behave in the immediate future, and connections transmitting and regulating the importance of NPU determinations. Connections are very important because they are the components of neural networks that adapt to the environment. Connections adapt by changing their strength, which regulates how important the NPU at the beginning of the connection is to the NPU at its end point. If a positive consequence follows behavior, then the connections that previously led to that behavior are strengthened, which makes that behavior more likely to occur in the future. Detailed mathematical descriptions of the network components may be found in multiple articles (Burgos, 2003, 2007; Burgos et al., 2008; Calvin and McDowell, 2015; Donahoe et al., 1993; Sánchez et al., 2010), but their exact mathematical functioning is not a critical component of the UTR and has been omitted in this paper for the sake of concision. Of the articles that provide mathematical descriptions, Calvin and McDowell (2015) provides a particularly clear description of the networks and also includes a copy of the code that was used to conduct those experiments, which is helpful to anyone interested in replicating UTR neural networks.

While the exact functioning of connections and NPUs is not theoretically important to the UTR, their arrangement and roles within the network are especially important to the theory (Donahoe et al., 1993). The standard UTR neural network is organized into four distinct layers, as shown in Fig. 1. From left to right these layers are the input (IN), hippocampal (HIP), dopaminergic (DOP), and output (OUT) layers. Information about environmental stimuli and the consequences of behavior is given to the neural network in the input layer, which thus acts as its eyes and ears. The hippocampal and dopaminergic layers then process this information to determine how the network should behave in the environment. The output layer implements this decision by interacting with the environment. By processing information through these layers the neural network engages with and adapts to its environment.

The architecture of UTR neural networks can be subdivided into response and learning pathways that cross all four layers. The response pathway determines which, if any, behaviors are evoked or elicited by the environment, and the learning pathway adapts the response pathway to the environment by changing connection strengths. In Fig. 1, the learning pathway is shaded gray to differentiate it from the response pathway. The network is selectionist because behaviors become more likely to occur when followed by beneficial consequences, and the absence of beneficial consequences results in less behavior. The design and arrangement of these two pathways are the primary ways that the neural network models are derived from the UTR.

Fig. 1. The architecture of a standard 1₁-3₁-3₁-2 unified-theory-of-reinforcement neural network, which was the network architecture used in this project. Standard UTR neural networks have four layers, which are input (IN), hippocampal (HIP), dopaminergic (DOP), and output (OUT). The letters within the NPUs indicate special functions (S = stimulus detecting, R = response emitting, Hip = hippocampal, Dop = dopaminergic, * = unconditioned). The design of this figure is a modification of Fig. 1 in Calvin and McDowell (2015).

UTR neural networks have hippocampal (HIP in Fig. 1) and dopaminergic (DOP) interneuron layers in the response pathway, which serve different functions. The hippocampal interneuron layer simulates the brain's sensory-association area by interpreting environmental stimuli (Burgos, 2003; Burgos et al., 2008; Donahoe, 2002; Donahoe et al., 1997a,b). This layer interprets the environment by combining many simple stimuli, like a flat surface and legs, into more complex stimuli, such as a table. This complex stimulus information is then passed to the dopaminergic interneuron layer, which behaves as a motor-association area because it determines how the network will engage with the environment (Burgos, 2003; Burgos et al., 2008; Donahoe, 2002; Donahoe et al., 1997a,b). The response layer then implements the motor-association area's determinations by producing behaviors. Through this process the network selects which behaviors to engage in.

The response pathway would be uselessly inflexible if the network did not possess the learning pathway. The learning pathway adapts the response pathway's connections to the network's environment by responding to unconditioned stimuli. If a beneficial unconditioned stimulus is presented then the unconditioned stimulus NPU, S* in Fig. 1, is activated (Donahoe et al., 1993). The activation of the unconditioned stimulus NPU causes a reinforcing signal to be passed to the unconditioned/conditioned response NPU, R*, and the dopaminergic NPU, Dop. This signal causes the unconditioned/conditioned response NPU to emit an unconditioned response, and the dopaminergic NPU to strengthen the network's connections that caused the last behavior.

After reinforcement, the neural network adapts to the environment by changing the strength of all of its connections. The learning pathway affects the response pathway's layers in two fundamentally different ways. Connections that terminate in the dopaminergic and output layers adapt the most when the network's predicted consequence of a behavior and the actual consequence are very different. The gray regions in Fig. 1 that lead from the NPU labeled Dop indicate which connections are affected by this reinforcement signal. Connections that terminate in the hippocampal layer are affected when environmental stimuli change. This change is dramatically enhanced when the network is reinforced, which allows the network to identify which stimuli frequently precede reinforcement. The connections that are affected by this learning process are the gray regions that lead from the NPU labeled Hip in Fig. 1. By changing the strength of connections, the neural network adapts to its environment.

With these networks it was possible to evaluate the UTR's plausibility as an account of operant behavior. A strong quantitative evaluation of these networks' operant behavior was permitted by determining whether it was well described by the quantitative law of effect. By being well described by the quantitative law of effect, these neural networks would suggest that the UTR is a plausible account for the behavior of humans and animals on single-alternative VI schedules. An important reason to evaluate UTR-based neural networks against the quantitative law of effect was that no significant changes would be necessary. This would extend the typically used neural networks to the area of operant behavior, and it would remain consistent with previous research. This project was broken down into two phases. Phase 1 determined viable neural network parameter values using a genetic algorithm, because in pilot testing, typically used neural network parameter values did not result in patterns of behavior that were hyperbolic. Phase 2 quantitatively evaluated the behavior produced by neural networks using the parameter values found in Phase 1. Combined, these two phases address whether typically used UTR-based neural networks are a viable account for operant behavior in single VI conditions.

2. Methods

2.1. Subjects

Standard UTR neural networks (Calvin and McDowell, 2015; Donahoe and Burgos, 2000; Burgos, 2003, 2005; Burgos and Murillo-Rodríguez, 2007; Burgos et al., 2008; Sánchez et al., 2010) were used in this experiment. The architecture of these neural networks is depicted in Fig. 1. In the response pathway there was one NPU in the input layer, three in the hippocampal layer, three in the dopaminergic layer, and two in the output layer. In the learning pathway, there was one NPU in the input layer, one in the hippocampal layer, and one in the dopaminergic layer. The starting strengths of the connections that terminated in the hippocampal layer were 0.15. The starting strengths of all other connections were 0.01. These are the same starting connection strengths that have been used in recent research (Calvin and McDowell, 2015; Sánchez et al., 2010).

It was necessary to add a parameter to the neural network model to conduct this experiment. The new parameter was added because, in previous research, any response NPU activation level (i.e., whether the network will behave) greater than 0 was considered a response. This was problematic because the value was always greater than 0, which resulted in constant responding. This was solved by adding a response threshold parameter, rθ, that needed to be exceeded by the response NPU's activation level. A very small threshold of 0.0005 was sufficient to create adaptive responding that increased in the presence of more reinforcers and decreased in their absence.

2.2. Apparatus and materials

The software was written by the first author, and experiments were conducted on a computer using the Windows 7 operating system. The computer used for experimentation had a dual core 1.6 GHz processor with 6 GB of RAM. The neural network and laboratory code were written in VB.Net 2010, which is a common programming language. Calvin and McDowell (2015) includes a link to download a copy of this code. The response and reinforcement counts were recorded and stored in standard databases (i.e., text files and Microsoft Excel). Data were analyzed using standard software (i.e., Microsoft Excel).

Fig. 2. Rate of behavior as a function of obtained reinforcement rate for the five evolved neural network solutions.

Table 1
Fitted equations.

Hyperbola: B = kR/(R + re)
Exponentiated Hyperbola: B = kR^a/(R^a + re)
Asymptotic Exponential: B = k(1 − e^(−re·R))
Asymptotic Power: B = kR^(−re)
Logarithmic: B = log_re(R)
Ramp: B = re·R for 0 ≤ R ≤ k/re; B = k for R > k/re

With an Operant Baseline (b)
Hyperbola + b: B = kR/(R + re) + b
Exponentiated Hyperbola + b: B = kR^a/(R^a + re) + b
Asymptotic Exponential + b: B = k(1 − e^(−re·R)) + b
Asymptotic Power + b: B = kR^(−re) + b
Logarithmic + b: B = log_re(R) + b
Ramp + b: B = re·R + b for 0 ≤ R ≤ k/re; B = k + b for R > k/re

2.3. Procedures

This project consisted of two phases that were designed to assess whether the operant behavior of UTR neural networks was well described by the quantitative law of effect. The goal of the first phase of the experiment was to determine parameter values for UTR neural networks that efficiently collect reinforcers by maximizing the number of collected reinforcers while simultaneously minimizing effort. The second phase of the experiment evaluated how well the operant behavior of these neural networks was described by the quantitative law of effect. Combined, these two phases evaluated whether the operant behavior of UTR neural networks resembled that of animals and humans working on VI schedules.

Both phases of the experiment utilized the same simulated operant chamber. The operant chamber consisted of a single operandum that delivered unconditioned reinforcers according to random-interval (RI) schedules. The RI schedules used in this experiment drew random intervals from an exponential distribution and are, thus, idealized Fleshler-Hoffman VI schedules (Fleshler and Hoffman, 1962; McDowell et al., 2008). Eleven RI schedules were presented in a random order, and the transitions between schedules were signaled by a blackout period (i.e., the activation levels of the NPUs in the neural networks were set to 0 between schedule presentations). The RI means of the 11 schedules were 2, 3, 5, 8, 12, 17, 25, 45, 85, 145, and 225 time steps, and each schedule was presented for 10,500 time steps in Phase 1 and 20,500 time steps in Phase 2. Time steps are arbitrary units of time that approximate the amount of time it takes for the neural networks to engage in a single operant response. Unconditioned reinforcers were delivered one time step after the response that earned them because food delivery is not instantaneous (Donahoe et al., 1993).

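To make this arrangement concrete, the sketch below simulates one such RI schedule in discrete time steps. It is a minimal illustration written for this description, not the authors' VB.Net laboratory code: the function name is ours, the fixed-probability response rule is a stand-in for the neural network, and details such as whether the interval clock pauses while a reinforcer is armed are assumptions.

```python
import random

def run_schedule(mean_interval, n_steps, p_respond=0.3, seed=0):
    """Simulate one random-interval (RI) schedule in discrete time steps.

    Intervals are drawn from an exponential distribution (an idealized
    Fleshler-Hoffman VI schedule). A reinforcer is 'armed' when the interval
    elapses and is delivered one time step after the response that earns it.
    """
    rng = random.Random(seed)
    next_arm = rng.expovariate(1.0 / mean_interval)
    armed = False
    deliver_next_step = False
    responses = 0
    reinforcers = 0

    for t in range(n_steps):
        if deliver_next_step:                    # delivery lags the earning response by one step
            reinforcers += 1
            deliver_next_step = False
            next_arm = t + rng.expovariate(1.0 / mean_interval)
        if not armed and t >= next_arm:
            armed = True
        responded = rng.random() < p_respond     # placeholder for the network's response rule
        if responded:
            responses += 1
            if armed:
                armed = False
                deliver_next_step = True
    return responses, reinforcers

# Example: the richest and leanest RI schedules used in Phase 2.
for mean in (2, 225):
    print(mean, run_schedule(mean, n_steps=20500))
```
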
Prior to engaging with the set of single RI schedules, the neural networks were trained to engage in operant behavior. This step served two purposes: (1) to prepare the neural network for the experiment and (2) to ensure that the neural network could engage in meaningful behavior. To assess whether each neural network was capable of behaving operantly, it was required to demonstrate operant conditioning within 500 trials. For each trial, a conditioned stimulus was presented for 5 time steps (i.e., the stimulus NPU was maximally activated). If the response NPU's activation level exceeded the response threshold parameter, rθ, on the 4th time step then an unconditioned reinforcer was delivered on the 5th time step (i.e., the unconditioned stimulus NPU was maximally activated). If the activation level of the response NPU was greater than 0.8 on the 4th time step during any of the 500 trials, the network was considered to have acquired operant responding, because this indicated a strongly determined behavior. If the activation level failed to exceed this value on the 4th time step of any trial, then the neural network was deemed incapable of operant responding and no further testing was performed with that neural network.

2.3.1. Phase I: finding viable UTR-neural-network parameter values

A genetic algorithm was used to find viable solutions that might be well described by the quantitative law of effect. This phase was necessary because in pilot studies, typically used parameter values [i.e., Logistic Function (δ = 0.5, γ = 0.1), τ = 0.1, κ = 0.1, Gaussian Threshold (μ = 0.20, σ = 0.15), rθ = 0.0005, α = 0.5, β = 0.1, & dθ = 0.001; see Calvin and McDowell (2015) for mathematical details regarding these parameters] did not generate a pattern of behavior that was a hyperbola. Genetic algorithms are an engineering method that recursively determines a point of optimality by approximating the process of evolution (Holland, 1975). Donahoe and Burgos have previously used this method to develop neural networks that could discriminate between overlapping and distinct discriminative stimuli (Burgos, 1996, 1997; Donahoe and Burgos, 1999; Donahoe, 2002). This technique is especially useful in finding a point of optimality within an unstudied and complex parameter space. It would be extremely difficult, if not impossible, to predict a priori what combination of parameter values would result in a desired pattern of behavior, because there has not been a parametric study of UTR neural networks, and the parameters have complex nonlinear effects on the network's behavior. This made the genetic algorithm methodology ideal for finding UTR-neural-network parameter values that might be well described by the quantitative law of effect.

Genetic algorithms find a solution to a problem by evolving a population of potential solutions. The population of potential solutions used by the genetic algorithm is conceptually similar to a species. As with every member of a species, each potential solution has a genotype and expresses a phenotype. The solution's genotype is a binary representation of the parameter values that are used to regulate the behavior of UTR neural networks. The solution's phenotype is the behavior of a UTR neural network built from the parameter values that are encoded in its binary genotype. The population of possible solutions evolves over multiple generations by selecting fitter solutions to reproduce and letting less fit solutions disappear from the population.

Algorithmically, this evolutionary process was broken down into three steps: selection, reproduction, and mutation. In the selection step, possible solutions compete to reproduce. Selected solutions then reproduce to create a new population of solutions. Random mutations are then added to the new population to create novel variation. By repeating these three steps over many generations, the population of potential solutions adapts to the single RI environment in the same way that a species can adapt to its environment over many generations.

To begin the genetic algorithm, an initial population of 100 potential solutions was created. The initial population was generated from 100 copies of a seed solution that were then heavily mutated at a rate of 5% per genotypic bit. Each bit of the seed solution's binary genotype had a 5% chance of switching from 0 to 1 or from 1 to 0. The seed solution's parameters were found by exploratory sampling, and produced a pattern of behavior that was closer to being hyperbolic than typically used parameters. The seed parameters for NPUs were: Logistic Function (δ = 0.5, γ = 0.1), τ = 0.1, κ = 0.1, Gaussian Threshold (μ = 0.20, σ = 0.15), & rθ = 0.0005. The seed parameters for the connections were: α = 0.7, β = 0.15, & dθ = 0.005.

After initializing the genetic algorithm, each solution was used to create a neural network. Each solution in the population provided 10 parameter values that regulated a UTR neural network that interacted with a simulated operant chamber. Decimal values for the UTR neural networks were translated from each solution's binary genotype. For 8 of the 10 parameters the decimal values ranged from 0 to 1. To translate the binary genotype of each of these parameters to a decimal value, the solution's binary genotype was turned into its integer equivalent and then divided by 1023. For example, the genotype of 0101010101 would describe the decimal value of 0.333 (341/1023). This method gave the parameter values a precision of approximately three decimal places. The decimal value range for the other 2 parameters, the response activation threshold (rθ) and the reinforcement threshold (dθ), extended from 0 to 0.1 because these values needed to be small and precise. To translate the binary genotype of these 2 parameters to a decimal value, the integer equivalent of the binary genotype was divided by 10230, which gave them precision to the fourth decimal place.

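As an illustration of this decoding scheme, the sketch below converts a 100-bit genotype into 10 decimal parameter values. Only the 10-bits-per-parameter layout and the divisors (1023 and 10230) come from the description above; the parameter ordering and names are assumptions made for the example, not taken from the published code.

```python
def decode_genotype(bits):
    """Decode a 100-bit genotype (a string of '0'/'1') into 10 parameter values.

    Eight parameters are scaled to [0, 1] by dividing the 10-bit integer by 1023;
    the two threshold parameters (r_theta and d_theta) are scaled to [0, 0.1]
    by dividing by 10230. The ordering below is illustrative only.
    """
    assert len(bits) == 100
    names = ["delta", "gamma", "tau", "kappa", "mu", "sigma", "alpha", "beta",
             "r_theta", "d_theta"]
    params = {}
    for i, name in enumerate(names):
        chunk = bits[10 * i: 10 * (i + 1)]
        value = int(chunk, 2)
        divisor = 10230 if name in ("r_theta", "d_theta") else 1023
        params[name] = value / divisor
    return params

# The worked example from the text: 0101010101 -> 341/1023 ≈ 0.333.
print(int("0101010101", 2) / 1023)
```
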
After building the neural networks from the population solutions, each network then interacted with the simulated operant chamber described in Section 2.3. The number of times the neural networks engaged in the operant behavior and the number of reinforcers acquired were recorded. After all of the neural networks had engaged with all 11 schedules of reinforcement for 10,500 time steps, the genetic algorithm selected which solutions would be used to create the next generation of solutions.

2.3.1.1. Step 1: selection. Solutions in the population were assigned fitness values, and those that were fitter were selected to create the next generation. Designating which solutions are more fit than others is challenging, but the most straightforward and obvious fitness criterion would be how well Eq. (1) fitted the pattern of behavior produced by a potential neural-network solution. This is, however, a problematic fitness criterion, which was not discovered until it was attempted. This direct and simple criterion resulted in the evolution of a neural network that constantly responded regardless of the rate of reinforcement (i.e., a horizontal line). This occurred because Eq. (1) perfectly fits this pattern of behavior when re is 0. This initial failure led to the development of a more theoretically relevant definition of fitness. The quantitative law of effect states that humans and animals efficiently distribute behavior based on the relative rates of reinforcement that behavior produces. A fitness criterion that captures this idea is that behavior is allocated to maximize the number of reinforcers acquired while minimizing opportunity costs. The fitness can thus be expressed as

Fitness = vR − B,    (3)

where B is the number of operant behaviors, R is the number of acquired reinforcers, and v is a multiplier that adjusts the value of the reinforcers relative to opportunity costs. B is subtracted from vR to simulate an opportunity cost; every time there is an operant behavior there are many other behaviors that may have resulted in beneficial consequences. In this phase, 5 values of v were used in separate conditions to obtain a sample of viable solutions. The v values were set such that reinforcers were 5, 10, 15, 20, or 25 times more valuable than the cost of each behavior. The only exception to Eq. (3) was that if the neural network did not learn to respond when introduced to the operant chamber, its fitness was set to the lowest possible value.

Selection occurred after all the solutions within a population were assigned fitness values. Parent solutions were selected by holding tournaments among small subgroups of the population. Each tournament compared 5 solutions that were randomly selected from the population of solutions. The solution with the highest fitness among the subgroup of competitors was selected to be a parent of the next generation. In the event of a tie among the competitors, no solution was selected and another set of competitors was drawn. This process was repeated until 100 parent solutions were selected. Although unlikely, the same solution could be selected to be a parent all 100 times; if this occurred, the entire set of 100 parents was discarded and the process was repeated. At the end of this step, 100 parent solutions had been chosen from the population to create a new population.

Table 2
Evolved parameter values for the five fitness criteria (v) conditions.

v      α      β      dθ       τ      κ      rθ       μ      σ      δ      γ
5      0.70   0.15   0.0034   0.07   0.35   0.0005   0.20   0.15   0.50   0.10
10     0.70   0.15   0.0050   0.60   0.12   0.0036   0.20   0.15   0.50   0.10
15     0.70   0.15   0.0019   0.60   0.10   0.0038   0.20   0.15   0.50   0.10
20     0.95   0.15   0.0050   0.07   0.11   0.0005   0.20   0.15   0.50   0.10
25     0.95   0.15   0.0112   0.07   0.11   0.0005   0.20   0.15   0.50   0.10

Note: α, β, and dθ are connection parameters; τ, κ, and rθ are activation-level parameters; μ and σ are the Gaussian threshold parameters; δ and γ are the logistic-function parameters.

2.3.1.2. Step 2: reproduction. After the parent solutions were selected, they were used to create a new population of solutions. For each member of the new population, 2 parents were randomly chosen with replacement from among the 100 parent solutions. The genotypes of these two parent solutions were combined to create the new solution by taking the beginning part of one parent's genotype and attaching it to the end part of the other parent's genotype. The new solution's genotype was represented by a binary string that was 100 bits long. The exact point in the binary string where the new solution's genotype switches from the first to the second parent was randomly determined. Because this point was random, the new solution's genotype did not equally represent both parent solutions. For example, the new solution's initial 10 bits could come from the first parent and the remaining 90 bits from the second parent. While the majority of the new solution's genotype was allowed to come from a single parent, at least 1 bit had to come from each parent. This process was repeated 100 times to create a new population of solutions that was the same size as the previous generation.

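The crossover operation described above can be written in a few lines (a sketch under the stated 100-bit representation; genotypes are handled here as strings of '0' and '1'):

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: the head of one genotype joined to the tail of the other.

    The cut point is random but drawn from 1..99 so that at least one bit
    comes from each parent.
    """
    assert len(parent_a) == len(parent_b) == 100
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]

# Example: a cut point of 10 gives the first 10 bits from one parent and the
# remaining 90 bits from the other, as in the text above.
child = crossover("0" * 100, "1" * 100, random.Random(0))
```
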
2.3.1.3. Step 3: mutation. Each solution of the new generation of possible solutions was then mutated. Every bit of each solution's genotype was susceptible to being randomly flipped from 0 to 1 or from 1 to 0. The chance of each bit mutating was 1 in 100. With this rate of mutation, the probability of a solution's genotype having at least one bit flip was 64%, and the probability was 10% for any specific parameter. This also gives an expected value of 1 parameter value changing within each solution. After mutation, the new population of solutions was used to create neural networks that engaged with the simulated operant chamber. After these neural networks had engaged with all 11 schedules of reinforcement, the next generation was produced by returning to Step 1. The genetic algorithm was stopped after 30 generations of potential solutions had been created.

R + re
2.3.2. Phase II: evaluation of steady-state behavior The parameter b that was added to each equation can be
The 5 solutions that were evolved using the different v values in interpreted as a baseline frequency of operant behavior that the
Eq. (3) were then each used to animate neural networks. Ten iden- networks engage in when no reinforcement is available. It seems
tical neural networks were used to evaluate each of the 5 solutions. intuitively necessary that the baseline frequency of operant behav-
These neural networks were tested in the same operant chamber ior must be greater than 0 because there must be occasional
that was used in the genetic algorithm. The 11 schedules of rein- behavior that could encounter the scheduled reinforcers. If no
forcement were presented for a longer duration, namely 20,500 behavior ever occurred then a human or animal would never
time steps. The number of obtained reinforcers and operant behav- encounter the scheduled reinforcers regardless of how rich the
iors on each of the 11 schedules were recorded. To restrict the schedule might be. Equations with operant baselines have been
assessment of each solution to just stable behavior, data from the previously fitted to single-alternative VI behavior, and the addition
first 500 time steps of each schedule were not included in anal- of the baseline parameter improved those fits (Navakatikyan et al.,
yses. The remaining counts of obtained reinforcers and operant 2013). In experiments with humans and animals, the operant base-
behaviors were divided by 40 to give average rates of behavior and line may be close to 0 but in this experiment it was a useful addition
reinforcement per 500 time steps. to help describe the behavior of the neural networks.
To determine which functions best characterized the behavior of the neural networks, the percentage of variance accounted for, the residuals, and the corrected Akaike Information Criterion (AICc; Sugiura, 1978) of each fit were compared. The AICc is a method of comparing equations that balances the costs and benefits of additional parameters. A function that fits the data well should not have a visible trend in the residuals, and a cubic polynomial fitted to the standardized residuals should account for little of the variance. The function that best described neural network behavior should account for a large percentage of the variance, have no visible trend in the residuals, and have the lowest AICc value.

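The two quantitative checks described above can be sketched as follows (a generic illustration, not the authors' code: the AICc is written in its common least-squares form with the error variance counted as an estimated parameter, and the residual check fits a cubic polynomial to standardized residuals; the SSE values in the example are placeholders).

```python
import numpy as np

def aicc_from_sse(sse, n, p):
    """Corrected AIC for a least-squares fit with n data points and p fitted parameters."""
    k = p + 1                                        # +1 for the estimated error variance
    aic = n * np.log(sse / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)     # Sugiura's small-sample correction

def cubic_trend_r2(R, residuals):
    """Variance in the standardized residuals accounted for by a cubic polynomial."""
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    pred = np.polyval(np.polyfit(R, z, deg=3), R)
    return 1 - np.sum((z - pred) ** 2) / np.sum((z - z.mean()) ** 2)

# Report AICc values as differences from the best model, as in Table 4.
fits = {"Hyperbola": (1.2e4, 2), "Exponentiated Hyperbola + b": (4.0e3, 4)}
aiccs = {name: aicc_from_sse(sse, n=110, p=p) for name, (sse, p) in fits.items()}
best = min(aiccs.values())
print({name: round(a - best, 1) for name, a in aiccs.items()})
```
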
3. Results

3.1. Phase I: finding viable UTR-neural-network parameter values

The genetic algorithm converged on a set of parameters within the first 20 generations for all 5 of the evolution conditions. The parameter values that were evolved under each of the 5 fitness conditions are listed in Table 2. The v = 5, 10, and 15 solutions primarily evolved by changing the NPUs' activation-level parameters (i.e., τ, κ, & rθ). The v = 20 and 25 solutions evolved by changing the connection parameters (i.e., α & β). The dθ parameter, which controls whether connection weights change, evolved to have slightly different values across all of the evolutions, and there was no identifiable pattern to its values. To ensure that this experiment built upon previous research, as was advocated by Calvin and McDowell (2015), the same experiments conducted in Calvin and McDowell were replicated and the same results were obtained.

3.2. Phase II: evaluation of steady-state behavior

All 5 of the evolved solutions exhibited patterns of behavior that were roughly hyperbolic in form, as is shown in Fig. 2. There were slight differences in the patterns of behavior that reflect the groups identified in Phase I. The rate of behavior of the v = 5, 10, and 15 solutions slightly decreased at the highest rate of obtained reinforcement. This decrease was most pronounced for the v = 5 solution, which exhibited a very low overall rate of behavior. The v = 20 and 25 solutions both reached their asymptotes more rapidly than the v = 10 and 15 solutions, and exhibited higher rates of behavior. The v = 20 and 25 solutions also remained at their asymptotes and did not decrease at the highest rate of reinforcement.

While all of the solutions were roughly hyperbolic, the planned quantitative analysis determined which functions best characterized the solutions. Eq. (2), the exponentiated hyperbola, accounted for the largest percentage of variance when fitted to the v = 5, 10, and 15 solutions, as shown in the top half of Table 3. The AICc values also indicated that the exponentiated-hyperbola fits best accounted for the v = 5, 10, and 15 solutions, which is shown in Table 4. The v = 20 and 25 solutions were best accounted for by the asymptotic exponential function, as is indicated by both the percentage of variance accounted for and the AICc values. The v = 20 and 25 solutions were thus ruled out as viable solutions for simulating human and animal behavior because the best account for their patterns of behavior was not a hyperbola.

The obtained parameter values for the exponentiated hyperbola, which are shown in Table 5, were both promising and problematic. The exponent, a, values for the v = 5, 10, and 15 solutions were all within the range that is typically seen in humans and animals (e.g., Baum, 1974, 1979; McDowell, 1989; Myers and Myers, 1977; Wearden and Burgess, 1982) when fitting the generalized matching law. The problem with these parameter values is that the k parameter values, which govern the asymptote of the functions, are too large to adhere to theory. For both the hyperbola and the exponentiated hyperbola, fits to the v = 10 and 15 solutions generated k parameter values that were greater than the maximum possible rate of behavior. The maximum number of behaviors that can be exhibited by the networks is 500 behaviors per 500 time steps. This violates the theoretical meaning of k for these equations and also fails to accurately represent the observed patterns of behavior. Visual examination of the patterns of behavior shown in Fig. 2 suggested that the asymptote for the v = 10 and 15 solutions should not be greater than 500, because the rate of behavior slightly decreased at the highest rate of reinforcement and never approached 500.

Visual examination of the standardized residuals of the fits of the hyperbola and exponentiated hyperbola suggested that the addition of an operant level adjustment to the equations might be beneficial. As shown in the left side of Fig. 3, the standardized residuals of the hyperbola and exponentiated-hyperbola fits were systematic and well described by cubic polynomials (hyperbola R² = 0.66; exponentiated hyperbola R² = 0.67). The residuals indicate that the lowest predicted rates of behavior were much less than the observed rates. Additionally, in Fig. 2, if the lines for the rate of behavior were extended to a y-intercept they would not intercept at 0. Interestingly, this also appears to be the case with pigeons (Baum and Davison, 2014). Thus, the planned equations were adjusted by adding a y-intercept parameter, b.

The post-hoc analyses that included the y-intercept parameter provided a better understanding of the behavior of the UTR-neural-network solutions. The exponentiated hyperbola with the operant level adjustment accounted for the largest percentage of variance for all solutions (Table 3). The AICc analysis also suggested that the exponentiated hyperbola with the y-intercept was the best account across the 5 solutions (Table 4). Furthermore, the pattern of residuals shown in the bottom right panel of Fig. 3 was not well described by a cubic polynomial (R² = 0.001). Overall, these data suggest that the behavior of the UTR-neural-network solutions was best described by the exponentiated hyperbola with the operant level adjustment.

4. Discussion

The genetic algorithm successfully evolved sets of parameter values for UTR neural networks that qualitatively behaved like humans and animals. All 5 solutions were best characterized by the exponentiated hyperbola with the operant level adjustment, which is the best supported equation for steady-state behavior on single VI schedules. This function fit the data very well, had random residuals, and had the best AICc values. Importantly, the asymptote plus the y-intercept, k + b, was approximately equal to the maximum rate of behavior that was permitted by the simulation. This means that the fit did not violate the underlying theory. Unfortunately, the a parameter values for those fits were problematic. The range of values for the a parameter (1.59–1.82) did not match those commonly seen with humans and animals in concurrent-schedule experiments (Baum, 1974, 1979; McDowell, 1989; Myers and Myers, 1977; Wearden and Burgess, 1982). Overall, it is encouraging that the behavior of the UTR neural networks was best characterized by the exponentiated hyperbola with the y-intercept, but the a parameter results did not match those of humans and animals.

Some of the other equations also described the data well. The asymptotic exponential with the baseline adjustment produced fits that were very similar to those of the exponentiated hyperbola with the baseline across all of the solutions. As shown in Table 3, the asymptotic exponential with the baseline adjustment accounted for only slightly less variance than the exponentiated hyperbola with the baseline adjustment. The AICcs were also similar, as shown in Table 4, with the asymptotic exponential equally good at characterizing the v = 15 solution's behavior and only slightly worse for the other conditions.

Table 3
Percentage of variance accounted for by different functions.

Function                         Reinforcer value (v)
                                 5       10      15      20      25
Hyperbola                        96.5    97.2    98.4    98.0    97.9
Exponentiated Hyperbola          96.6    97.7    98.4    99.3    99.3
Asymptotic Exponential           96.5    96.6    98.0    99.5a   99.7a
Asymptotic Power                 95.6    95.0    94.0    84.3    83.3
Logarithmic                      83.5    91.8    93.2    93.8    93.5
Ramp                             93.6    89.6    92.3    96.9    96.5

With the Baseline Parameter
Hyperbola + b                    96.9    98.5    98.8    98.6    98.5
Exponentiated Hyperbola + b      97.6a   99.3a   99.4a   99.6a   99.7a
Asymptotic Exponential + b       97.2a   99.2a   99.4a   99.5a   99.7a
Asymptotic Power + b             95.9    96.3    96.7    94.5    93.7
Logarithmic + b                  89.4    93.8    95.6    94.6    93.8
Ramp + b                         96.9    98.4    98.1    98.1    98.3

Note: The functions that accounted for the largest percentage of variance for the planned and post-hoc analyses are bolded.
a Random residuals.

Table 4
AICc difference from the function with the smallest AICc.

Function                         Parameters   Reinforcer value (v)
                                              5      10     15     20     25
Hyperbola                        2            40     139    115    172    213
Exponentiated Hyperbola          3            38     120    111    53     92
Asymptotic Exponential           2            39     163    138    12     11
Asymptotic Power                 2            64     205    257    396    440
Logarithmic                      1            207    258    268    291    334
Ramp                             2            105    286    286    217    268

With the Baseline Parameter
Hyperbola + b                    3            27     76     83     133    176
Exponentiated Hyperbola + b      4            0      0      1a     0      0
Asymptotic Exponential + b       3            17     10     0      11     13
Asymptotic Power + b             3            59     174    195    283    334
Logarithmic + b                  2            160    229    223    280    331
Ramp + b                         3            26     79     132    168    188

Note: The functions that accounted for the largest percentage of variance for the planned and post-hoc analyses are bolded.
a Not significantly different from the function with the smallest AICc.

Table 5
Parameter values of the quantitative laws of effect fits.

Function                         v     k      re     a       b
Hyperbola                        5     315    45
Hyperbola                        10    556    30
Hyperbola                        15    560    27
Hyperbola                        20    570    16
Hyperbola                        25    567    14
Exponentiated Hyperbola          5     371    39     0.88
Exponentiated Hyperbola          10    647    21     0.80
Exponentiated Hyperbola          15    589    23     0.91
Exponentiated Hyperbola          20    520    35     1.39
Exponentiated Hyperbola          25    520    31     1.39

With the Baseline Parameter (b)
Hyperbola + b                    5     331    58             11
Hyperbola + b                    10    548    43             44
Hyperbola + b                    15    550    34             29
Hyperbola + b                    20    612    12             −56
Hyperbola + b                    25    612    11             −59
Exponentiated Hyperbola + b      5     207    497    1.80    28
Exponentiated Hyperbola + b      10    419    305    1.63    80
Exponentiated Hyperbola + b      15    432    212    1.59    70
Exponentiated Hyperbola + b      20    464    104    1.73    43
Exponentiated Hyperbola + b      25    451    124    1.82    54

If fewer than 11 data points had been used to compare the models, it seems probable that there would not have been enough information to strongly favor the exponentiated hyperbola over the asymptotic exponential. This is important, because previous comparisons that examined the same models in experiments with live organisms favored the simple hyperbola, Eq. (1), over the asymptotic exponential (de Villiers and Herrnstein, 1976). It seems unlikely that the findings in that review would have been as clear if the behavior of humans and animals were more similar to that of the neural network solutions in this experiment.

Fig. 3. Standardized residuals of the four quantitative-law-of-effect-based equations for the reinforcer value (v) 15 solution.

The operant level baseline adjustments helped the equations describe the behavior of the neural network solutions. The functions with the operant baseline adjustments were favored by the AICc for nearly all fits to the data. The addition of the operant baseline also eliminated the residual trends for the exponentiated hyperbola and asymptotic exponential fits. It may be important to incorporate the operant baseline adjustment into the hyperbolic models when fitting these functions to human and animal behavior. For example, Navakatikyan et al. (2013) compared hyperbolic models that lacked the operant baseline adjustment with some atheoretic models that included the adjustment. The atheoretic models that included the adjustment were generally better than the hyperbolic models that lacked it. It is possible that the inclusion of the operant level adjustment may have been the main factor that led to the atheoretic models outperforming the hyperbolic models, and in the future it may be best to incorporate the intercept in all quantitative law of effect fits.

A weakness of this study's design is that it did not directly simulate extraneous reinforcement, which is an important theoretical component of the quantitative law of effect. This experiment approached this problem via the fitness criterion by incorporating an opportunity cost to behaving. The problem with this approach is that an opportunity cost only exists because there are presumably other operant behaviors that the neural network could engage in that would be reinforced. It is possible that the fitness selection criterion used in this experiment may not properly simulate environmental dynamics. A more ecologically valid approach would be to directly simulate extraneous reinforcement and the behaviors that can acquire those reinforcers.

Single VI schedules with extraneous reinforcement and behavior would be the same as directly simulating the network's behavior on concurrent VI VI schedules. It is very likely that a UTR-based neural network could be built that generates behavior that would be well described by the generalized matching law, and with parameter values that are similar to those of humans and animals. Other, more general neural network models have already demonstrated that capacity (Seth, 2001). To run these simulations a new version of the UTR-based neural networks would have to be created, however. The networks would have to have a third response NPU added, and would have to engage in only a single response at each time step. These would be interesting advances to make with UTR-based neural networks.

In summary, this experiment led to a number of solutions that were similar to human and animal behavior in single VI experiments. Unfortunately, the parameter values of the best fitting equations did not match those of humans and animals. The results of this experiment do not entirely support the UTR's plausibility as an account for behavior on single VI schedules, but the results were close enough that it seems likely that slight modifications to the equations of the model, the fitness criterion, or the experimental design would provide a better account. Further exploration of the operant behavior of UTR-based neural networks will enhance the theory and further develop it.

References

Baum, W.M., 1974. On two types of deviation from the matching law: bias and undermatching. J. Exp. Anal. Behav. 22, 231–242, http://dx.doi.org/10.1901/jeab.1974.22-231.
Baum, W.M., 1979. Matching, undermatching, and overmatching in studies of choice. J. Exp. Anal. Behav. 32 (2), 269–281, http://dx.doi.org/10.1901/jeab.1979.32-269.
Baum, W.M., Davison, M., 2014. Choice with frequently changing food rates and food ratios. J. Exp. Anal. Behav. 101, 246–274, http://dx.doi.org/10.1002/jeab.70.
Beardsley, S.D., McDowell, J.J., 1992. Application of Herrnstein's hyperbola to time allocation of naturalistic human behavior maintained by naturalistic social reinforcement. J. Exp. Anal. Behav. 57, 177–185, http://dx.doi.org/10.1901/jeab.1992.57-177.
Bradshaw, C.M., Szabadi, E., Bevan, P., 1976. Behavior of humans in variable-interval schedules of reinforcement. J. Exp. Anal. Behav. 26 (2), 135–141, http://dx.doi.org/10.1901/jeab.1976.26-135.
Bradshaw, C.M., Szabadi, E., Bevan, P., 1977. Effect of punishment on human variable-interval performance. J. Exp. Anal. Behav. 27, 275–279, http://dx.doi.org/10.1901/jeab.1977.27-275.
Bradshaw, C.M., Szabadi, E., Bevan, P., 1978. Effect of variable interval punishment on the behavior of humans in variable-interval schedules of monetary reinforcement. J. Exp. Anal. Behav. 29, 161–166, http://dx.doi.org/10.1901/jeab.1978.29-161.
Burgos, J.E., 1996. Computational explorations of the evolution of artificial neural networks in Pavlovian environments. Unpublished Doctoral Dissertation, University of Massachusetts, MA.
Burgos, J.E., 1997. Evolving artificial neural networks in Pavlovian environments. In: Donahoe, J.W., Dorsel, V.P. (Eds.), Neural-Network Approaches to Cognition: Biobehavioral Foundations. Elsevier, Amsterdam, pp. 58–79.
Burgos, J.E., 2003. Theoretical note: simulating latent inhibition with selection neural networks. Behav. Process. 62, 183–192, http://dx.doi.org/10.1016/S0376-6357(03)00025-1.
Burgos, J.E., 2005. Theoretical note: the C/T ratio in artificial neural networks. Behav. Process. 69, 249–256, http://dx.doi.org/10.1016/j.beproc.2005.02.008.
Burgos, J.E., 2007. Autoshaping and automaintenance: a neural-network approach. J. Exp. Anal. Behav. 88, 115–130, http://dx.doi.org/10.1901/jeab.2007.75-04.
Burgos, J.E., Murillo-Rodríguez, E., 2007. Neural-network simulations of two context-dependence phenomena. Behav. Process. 75, 242–249, http://dx.doi.org/10.1016/j.beproc.2007.02.003.
Burgos, J.E., Flores, C., García, Ó., Díaz, C., Cruz, Y., 2008. A simultaneous procedure facilitates acquisition under an optimal interstimulus interval in artificial neural networks and rats. Behav. Process. 78, 302–309, http://dx.doi.org/10.1016/j.beproc.2008.02.018.
Burns, R., Burgos, J.E., Donahoe, J.W., 2011. Pavlovian conditioning: pigeon nictitating membrane. Behav. Process. 86, 102–108, http://dx.doi.org/10.1016/j.beproc.2010.10.004.
Calvin, N.T., McDowell, J.J., 2015. Unified-theory-of-reinforcement neural networks do not simulate the blocking effect. Behav. Process. 120, 54–63, http://dx.doi.org/10.1016/j.beproc.2015.08.008.
de Villiers, P.A., Herrnstein, R.J., 1976. Toward a law of response strength. Psychol. Bull. 83 (6), 1131–1153, http://dx.doi.org/10.1037/0033-2909.83.6.1131.
Donahoe, J.W., 2002. Behavior analysis and neuroscience. Behav. Process. 57, 241–259, http://dx.doi.org/10.1016/S0376-6357(02)00017-7.
Donahoe, J.W., Burgos, J.E., 1999. Timing without a timer. J. Exp. Anal. Behav. 71, 257–301, http://dx.doi.org/10.1901/jeab.1999.71-257.
Donahoe, J.W., Burgos, J.E., 2000. Behavior analysis and revaluation. J. Exp. Anal. Behav. 74, 332–346, http://dx.doi.org/10.1901/jeab.2000.74-331.
Donahoe, J.W., Burgos, J.E., Palmer, D.C., 1993. A selectionist approach to reinforcement. J. Exp. Anal. Behav. 60, 17–40, http://dx.doi.org/10.1901/jeab.1993.60-17.
Donahoe, J.W., Palmer, D.C., Burgos, J.E., 1997a. The S-R issue: its status in behavior analysis and in Donahoe and Palmer's Learning and Complex Behavior. J. Exp. Anal. Behav. 67, 193–211, http://dx.doi.org/10.1901/jeab.1997.68-46.
Donahoe, J.W., Palmer, D.C., Burgos, J.E., 1997b. The unit of selection: what do reinforcers reinforce? J. Exp. Anal. Behav. 67, 259–273, http://dx.doi.org/10.1901/jeab.1997.67-259.
Fernandez, E., McDowell, J., 1995. Response-reinforcement relationships in chronic pain syndrome: applicability of Herrnstein's law. Behav. Res. Ther. 33, 855–863, http://dx.doi.org/10.1016/0005-7967(95)00005-1.
Fleshler, M., Hoffman, H.S., 1962. A progression for generating variable-interval schedules. J. Exp. Anal. Behav. 5 (4), 529–530, http://dx.doi.org/10.1901/jeab.1962.5-529.
Herrnstein, R.J., 1961. Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 4 (3), 267–272, http://dx.doi.org/10.1901/jeab.1961.4-267.
Herrnstein, R.J., 1970. On the law of effect. J. Exp. Anal. Behav. 13 (2), 243–266, http://dx.doi.org/10.1901/jeab.1970.13-243.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
McDowell, J.J., 1986. On the falsifiability of matching theory. J. Exp. Anal. Behav. 45, 63–74, http://dx.doi.org/10.1901/jeab.1986.45-63.
McDowell, J.J., 1989. Two modern developments in matching theory. Behav. Anal. 12, 153–166.
McDowell, J.J., 2004. A computational model of selection by consequences. J. Exp. Anal. Behav. 81, 297–317, http://dx.doi.org/10.1901/jeab.2004.81-297.
McDowell, J.J., 2013. On the theoretical and empirical status of the matching law and matching theory. Psychol. Bull. 139 (5), 1000–1028, http://dx.doi.org/10.1037/a0029924.
McDowell, J.J., Wood, H.M., 1984. Confirmation of linear system theory prediction: changes in Herrnstein's k as a function of changes in reinforcer magnitude. J. Exp. Anal. Behav. 41, 183–192, http://dx.doi.org/10.1901/jeab.1984.41-183.
McDowell, J.J., Wood, H.M., 1985. Confirmation of linear system theory prediction: rate of change of Herrnstein's k as a function of response-force requirement. J. Exp. Anal. Behav. 43, 61–73, http://dx.doi.org/10.1901/jeab.1985.43-61.
McDowell, J.J., Caron, M.L., Kulubekova, S., Berg, J.P., 2008. A computational theory of selection by consequences applied to concurrent schedules. J. Exp. Anal. Behav. 90, 387–403, http://dx.doi.org/10.1901/jeab.2008.90.387.
McLean, A.P., White, K.G., 1983. Temporal constraint on choice: sensitivity and bias in multiple schedules. J. Exp. Anal. Behav. 39, 405–426, http://dx.doi.org/10.1901/jeab.1983.39-405.
McSweeney, F.K., Melville, C.L., Whipple, J.E., 1983. Herrnstein's equation for the rates of responding during concurrent schedules. Anim. Learn. Behav. 11, 275–289, http://dx.doi.org/10.3758/BF03199777.
Myers, D.L., Myers, L.E., 1977. Undermatching: a reappraisal of performance on concurrent variable-interval schedules of reinforcement. J. Exp. Anal. Behav. 27, 203–214, http://dx.doi.org/10.1901/jeab.1977.27-203.
Navakatikyan, M.A., Murrell, P., Benseman, J., Davison, M., Elliffe, D., 2013. Law of effect models and choice between many alternatives. J. Exp. Anal. Behav. 100, 222–256, http://dx.doi.org/10.1002/jeab.37.
Sánchez, J.M., Galeazzi, J.M., Burgos, J.E., 2010. Some structural determinants of Pavlovian conditioning in artificial neural networks. Behav. Process. 84, 526–535, http://dx.doi.org/10.1016/j.beproc.2010.01.018.
Seth, A.K., 2001. Modeling group foraging: individual suboptimality, interference, and a kind of matching. Adapt. Behav. 9 (2), 67–90, http://dx.doi.org/10.1177/105971230200900204.
Sugiura, N., 1978. Further analysis of the data by Akaike's information criterion and the finite corrections. Commun. Stat. A7, 13–26, http://dx.doi.org/10.1080/03610927808827599.
Wearden, J.H., 1981. Bias and undermatching: implications for Herrnstein's equation. Behav. Anal. Lett. 1 (3), 177–185.
Wearden, J.H., Burgess, I.S., 1982. Matching since Baum (1979). J. Exp. Anal. Behav. 38 (3), 339–348, http://dx.doi.org/10.1901/jeab.1982.38-339.
