
Learning classifier system

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component (typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning).[2] Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions (e.g. behavior modeling,[3] classification,[4][5] data mining,[5][6][7] regression,[8] function approximation,[9] or game strategy). This approach allows complex solution spaces to be broken up into smaller, simpler parts.

The founding concepts behind learning classifier systems came from attempts to model complex adaptive systems, using rule-based agents to form an artificial cognitive system (i.e. artificial intelligence).

[Figure: 2D visualization of LCS rules learning to approximate a 3D function. Each blue ellipse represents an individual rule covering part of the solution space. (Adapted from images taken from XCSF[1] with permission from Martin Butz)]

Methodology

The architecture and components of a given learning classifier system can be quite variable. It is useful to think of an LCS as a machine consisting of several interacting components. Components may be added or removed, or existing components modified/exchanged to suit the demands of a given problem domain (like algorithmic building blocks) or to make the algorithm
flexible enough to function in many different problem domains. As a result, the LCS paradigm can be
flexibly applied to many problem domains that call for machine learning. The major divisions among LCS
implementations are as follows: (1) Michigan-style architecture vs. Pittsburgh-style architecture,[10] (2)
reinforcement learning vs. supervised learning, (3) incremental learning vs. batch learning, (4) online
learning vs. offline learning, (5) strength-based fitness vs. accuracy-based fitness, and (6) complete action
mapping vs. best action mapping. These divisions are not necessarily mutually exclusive. For example,
XCS,[11] the best known and best studied LCS algorithm, is Michigan-style, was designed for
reinforcement learning but can also perform supervised learning, applies incremental learning that can be
either online or offline, applies accuracy-based fitness, and seeks to generate a complete action mapping.

Elements of a generic LCS algorithm

Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method,
the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm. For simplicity, let us focus on Michigan-style architecture with supervised learning. See the schematic figure in the Environment section, which lays out the sequential steps involved in this type of generic LCS.

Environment
The environment is the source of data upon which an LCS learns.
It can be an offline, finite training dataset (characteristic of a data
mining, classification, or regression problem), or an online
sequential stream of live training instances. Each training instance is
assumed to include some number of features (also referred to as
attributes, or independent variables), and a single endpoint of
interest (also referred to as the class, action, phenotype, prediction,
or dependent variable). Part of LCS learning can involve feature selection; therefore, not all of the features in the training data need to be informative. The set of feature values of an instance is
commonly referred to as the state. For simplicity, let's assume an example problem domain with Boolean/binary features and a Boolean/binary class. For Michigan-style systems, one instance from the environment is trained on each learning cycle (i.e. incremental learning). Pittsburgh-style systems perform batch learning, where rule sets are evaluated in each iteration over much or all of the training data.

[Figure: A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning]

Rule/classifier/population

A rule is a context dependent relationship between state values and some prediction. Rules typically take
the form of an {IF:THEN} expression, (e.g. {IF 'condition' THEN 'action'}, or as a more specific example,
{IF 'red' AND 'octagon' THEN 'stop-sign'}). A critical concept in LCS and rule-based machine learning alike is that an individual rule is not in itself a model, since the rule is only applicable when its condition is
satisfied. Think of a rule as a "local-model" of the solution space.

Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued,
ordinal, continuous-valued). Given binary data, LCS traditionally applies a ternary rule representation (i.e. rules can include either a 0, 1, or '#' for each feature in the data). The 'don't care' symbol (i.e. '#') serves as a wild card within a rule's condition, allowing rules, and the system as a whole, to generalize relationships
between features and the target endpoint to be predicted. Consider the following rule (#1###0 ~ 1) (i.e.
condition ~ action). This rule can be interpreted as: IF the second feature = 1 AND the sixth feature = 0
THEN the class prediction = 1. We would say that the second and sixth features were specified in this rule,
while the others were generalized. This rule, and the corresponding prediction, are only applicable to an
instance when the condition of the rule is satisfied by the instance. This is more commonly referred to as
matching. In Michigan-style LCS, each rule has its own fitness, as well as a number of other rule-
parameters associated with it that can describe the number of copies of that rule that exist (i.e. the
numerosity), the age of the rule, its accuracy, or the accuracy of its reward predictions, and other descriptive
or experiential statistics. A rule along with its parameters is often referred to as a classifier. In Michigan-
style systems, classifiers are contained within a population [P] that has a user defined maximum number of
classifiers. Unlike most stochastic search algorithms (e.g. evolutionary algorithms), LCS populations start
out empty (i.e. there is no need to randomly initialize a rule population). Classifiers will instead be initially
introduced to the population with a covering mechanism.
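To make the ternary representation concrete, here is a minimal sketch in Python of a classifier and its matching test (the Classifier class and its field names are illustrative assumptions, not taken from any particular LCS implementation):

from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str            # e.g. "#1###0"; '#' is the don't-care symbol
    action: int               # the predicted class/action, e.g. 1
    numerosity: int = 1       # number of copies this classifier represents
    match_count: int = 0      # times this rule appeared in a match set [M]
    correct_count: int = 0    # times this rule appeared in a correct set [C]
    fitness: float = 0.0

    def matches(self, state: str) -> bool:
        # A rule matches when every specified (non-'#') bit equals
        # the corresponding bit of the training instance.
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

# The rule (#1###0 ~ 1) discussed above matches the state "010010":
assert Classifier("#1###0", 1).matches("010010")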

In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier. In Michigan-
style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.

Matching
One of the most critical and often time-consuming elements of an LCS is the matching process. The first
step in an LCS learning cycle takes a single training instance from the environment and passes it to [P]
where matching takes place. In step two, every rule in [P] is now compared to the training instance to see
which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules
are moved to a match set [M]. A rule matches a training instance if all feature values specified in the rule
condition are equivalent to the corresponding feature value in the training instance. For example, assuming
the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not: (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the
endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain
classifiers that propose conflicting actions. In the fourth step, since we are performing supervised learning,
[M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it
proposes the correct action (based on the known action of the training instance), otherwise it goes into [I].
In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is
not known.
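Building on the hypothetical Classifier sketch above, steps two through four of the supervised learning cycle might be sketched as follows:

def form_sets(population, state, true_action):
    # Steps 2-3: compare every rule in [P] to the instance; matches form [M].
    M = [cl for cl in population if cl.matches(state)]
    # Step 4: split [M] by whether each rule proposes the known correct action.
    C = [cl for cl in M if cl.action == true_action]
    I = [cl for cl in M if cl.action != true_action]
    return M, C, I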

Covering

At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when
the population starts off empty), the covering mechanism is applied (fifth step). Covering is a form of online
smart population initialization. Covering randomly generates a rule that matches the current training instance (and, in the case of supervised learning, that rule is also generated with the correct action). Assuming the training instance is (001001 ~ 0), covering might generate any of the following rules:
(#0#0## ~ 0), (001001 ~ 0), (#010## ~ 0). Covering not only ensures that in each learning cycle there is at least one correct, matching rule in [C], but also that any rule initialized into the population will match at least
one training instance. This prevents LCS from exploring the search space of rules that do not match any
training instances.
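A covering mechanism along these lines could be sketched as below; the wildcard probability is an illustrative parameter (real systems expose something similar as a run parameter, e.g. P# in XCS):

import random

def cover(state, true_action, p_wildcard=0.5):
    # Each feature is either generalized to '#' or fixed to the instance's
    # value, so the new rule is guaranteed to match this instance; under
    # supervised learning the known correct action is copied in directly.
    condition = ''.join('#' if random.random() < p_wildcard else bit
                        for bit in state)
    return Classifier(condition, true_action)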

Parameter updates/credit assignment/learning

In the sixth step, the rule parameters of any rule in [M] are updated to reflect the new experience gained
from the current training instance. Depending on the LCS algorithm, a number of updates can take place at
this step. For supervised learning, we can simply update the accuracy/error of a rule. Rule accuracy/error is different from model accuracy/error, since it is not calculated over the entire training data, but only over the instances that the rule matched. Rule accuracy is calculated by dividing the number of times the rule was in a
correct set [C] by the number of times it was in a match set [M]. Rule accuracy can be thought of as a 'local
accuracy'. Rule fitness is also updated here, and is commonly calculated as a function of rule accuracy. The
concept of fitness is taken directly from classic genetic algorithms. Be aware that there are many variations
on how LCS updates parameters in order to perform credit assignment and learning.
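One possible version of this supervised update is sketched below. Raising the local accuracy to a power nu to obtain fitness follows a UCS-style convention; both the formula and the parameter value are illustrative assumptions:

def update_parameters(M, C, nu=10):
    # Every rule in [M] gains experience; rules also in [C] gain credit.
    for cl in M:
        cl.match_count += 1
    for cl in C:
        cl.correct_count += 1
    for cl in M:
        # 'Local accuracy': correct-set appearances over match-set appearances.
        acc = cl.correct_count / cl.match_count
        cl.fitness = acc ** nu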

Subsumption

In the seventh step, a subsumption mechanism is typically applied. Subsumption is an explicit generalization mechanism that merges classifiers that cover redundant parts of the problem space. The
subsuming classifier effectively absorbs the subsumed classifier (and has its numerosity increased). This can
only happen when the subsuming classifier is more general, just as accurate, and covers all of the problem
space of the classifier it subsumes.
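For the ternary representation, the subsumption test can be sketched as follows (a simplified version; many systems additionally require the subsumer to be sufficiently experienced):

def accuracy(cl):
    return cl.correct_count / cl.match_count if cl.match_count else 0.0

def try_subsume(subsumer, candidate):
    # The subsumer must advocate the same action, cover everything the
    # candidate covers, be strictly more general, and be at least as accurate.
    covers = all(g == '#' or g == c
                 for g, c in zip(subsumer.condition, candidate.condition))
    more_general = (subsumer.condition.count('#') >
                    candidate.condition.count('#'))
    if (subsumer.action == candidate.action and covers and more_general
            and accuracy(subsumer) >= accuracy(candidate)):
        subsumer.numerosity += candidate.numerosity
        return True   # the caller then removes 'candidate' from [P]
    return False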

Rule discovery/genetic algorithm


In the eighth step, LCS adopts a highly elitist genetic algorithm (GA) which will select two parent classifiers based on fitness (survival of the fittest). Parents are typically selected from [C] using tournament selection. Some systems have applied roulette wheel selection or deterministic selection, and some instead select parent rules from the whole population [P] (panmictic selection) or from [M]. Crossover and mutation
operators are now applied to generate two new offspring rules. At this point, both the parent and offspring
rules are returned to [P]. The LCS genetic algorithm is highly elitist since, in each learning iteration, the vast majority of the population is preserved. Rule discovery may alternatively be performed by some other
method, such as an estimation of distribution algorithm, but a GA is by far the most common approach.
Evolutionary algorithms like the GA employ a stochastic search, which makes LCS a stochastic algorithm.
LCS seeks to cleverly explore the search space, but does not perform an exhaustive search of rule
combinations, and is not guaranteed to converge on an optimal solution.
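A sketch of this step, with tournament selection, one-point crossover, and a mutation operator restricted so that offspring still match the current instance (the operator details and rates are illustrative assumptions):

def tournament_select(C, k=3):
    # Return the fittest of k rules sampled at random from [C].
    return max(random.sample(C, min(k, len(C))), key=lambda cl: cl.fitness)

def ga_step(C, state, mu=0.04):
    p1, p2 = tournament_select(C), tournament_select(C)
    # One-point crossover on the parent conditions.
    cut = random.randrange(1, len(p1.condition))
    c1 = p1.condition[:cut] + p2.condition[cut:]
    c2 = p2.condition[:cut] + p1.condition[cut:]
    def mutate(cond):
        # Toggle positions between '#' and the instance's bit value, so the
        # offspring rule still matches the current training instance.
        return ''.join((s if c == '#' else '#') if random.random() < mu else c
                       for c, s in zip(cond, state))
    return (Classifier(mutate(c1), p1.action),
            Classifier(mutate(c2), p2.action))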

Deletion

The last step in a generic LCS learning cycle is to maintain the maximum population size. The deletion
mechanism will select classifiers for deletion (commonly using roulette wheel selection). The probability of
a classifier being selected for deletion is inversely proportional to its fitness. When a classifier is selected for
deletion, its numerosity parameter is reduced by one. When the numerosity of a classifier is reduced to zero,
it is removed entirely from the population.
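Deletion might be sketched like this (the small epsilon guarding against zero fitness is an implementation assumption):

def delete_from(population, max_size):
    # Enforce the maximum population size via roulette-wheel deletion,
    # with selection probability inversely proportional to fitness.
    while sum(cl.numerosity for cl in population) > max_size:
        weights = [cl.numerosity / (cl.fitness + 1e-6) for cl in population]
        victim = random.choices(population, weights=weights)[0]
        victim.numerosity -= 1
        if victim.numerosity == 0:
            population.remove(victim)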

Training

LCS will cycle through these steps repeatedly for some user-defined number of training iterations, or until some user-defined termination criteria have been met. For online learning, LCS will obtain a completely
new training instance each iteration from the environment. For offline learning, LCS will iterate through a
finite training dataset. Once it reaches the last instance in the dataset, it will go back to the first instance and
cycle through the dataset again.
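Tying the previous sketches together, one offline training loop might look like the following (dataset entries are assumed to be (state, action) pairs; the subsumption step is omitted for brevity):

def train(dataset, max_size=200, iterations=10000):
    population = []                       # LCS populations start out empty
    for i in range(iterations):
        state, true_action = dataset[i % len(dataset)]   # cycle the dataset
        M, C, I = form_sets(population, state, true_action)
        if not C:                                    # covering (step 5)
            population.append(cover(state, true_action))
            M, C, I = form_sets(population, state, true_action)
        update_parameters(M, C)                      # credit assignment (step 6)
        if len(C) >= 2:                              # rule discovery (step 8)
            population.extend(ga_step(C, state))
        delete_from(population, max_size)            # deletion (last step)
    return population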

Rule compaction

Once training is complete, the rule population will inevitably contain some poor, redundant, and inexperienced rules. It is common to apply a rule compaction (or condensation) heuristic as a post-processing step. The resulting compacted rule population is ready to be applied as a prediction model (e.g. make predictions on testing instances), and/or to be interpreted for knowledge discovery.

Prediction

Whether or not rule compaction has been applied, the output of an LCS algorithm is a population of
classifiers which can be applied to making predictions on previously unseen instances. The prediction
mechanism is not part of the supervised LCS learning cycle itself; however, it plays an important role in a reinforcement learning LCS learning cycle. For now we consider how the prediction mechanism can be applied to make predictions on test data. When making predictions, the LCS learning components are deactivated so that the population does not continue to learn from incoming testing data. A test instance is passed to [P] where a match set [M] is formed as usual. At this point the match set is instead passed to a
prediction array. Rules in the match set can predict different actions; therefore, a voting scheme is applied. In a simple voting scheme, the action with the strongest supporting 'votes' from matching rules wins and becomes the selected prediction. Not all rules get an equal vote; rather, the strength of the vote for a single rule is commonly proportional to its numerosity and fitness. This voting scheme, and the nature of how LCSs store knowledge, suggest that LCS algorithms are implicitly ensemble learners.
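The voting scheme described above might be sketched as follows (weighting each vote by fitness times numerosity follows the description in the text; returning None when nothing matches is an assumption):

def predict(population, state):
    # Learning components are deactivated: form [M] and tally weighted votes.
    votes = {}
    for cl in population:
        if cl.matches(state):
            w = cl.fitness * cl.numerosity
            votes[cl.action] = votes.get(cl.action, 0.0) + w
    # Return the action with the strongest support (None if no rule matched).
    return max(votes, key=votes.get) if votes else None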

Interpretation

Individual LCS rules are typically human-readable IF:THEN expressions. Rules that constitute the LCS prediction model can be ranked by different rule parameters and manually inspected. Global strategies to guide knowledge discovery using statistical and graphical methods have also been proposed.[12][13] Compared with other advanced machine learning approaches, such as artificial neural networks, random forests, or genetic programming, learning classifier systems are particularly well suited to problems that require interpretable solutions.

History

Early years

John Henry Holland was best known for his work popularizing genetic algorithms (GA) through his ground-breaking book "Adaptation in Natural and Artificial Systems"[14] in 1975, and his formalization of
Holland's schema theorem. In 1976, Holland conceptualized an extension of the GA concept to what he
called a "cognitive system",[15] and provided the first detailed description of what would become known as
the first learning classifier system in the paper "Cognitive Systems based on Adaptive Algorithms".[16] This
first system, named Cognitive System One (CS-1), was conceived as a modeling tool, designed to model a real system (i.e. environment) with unknown underlying dynamics using a population of human-readable
rules. The goal was for a set of rules to perform online machine learning to adapt to the environment based
on infrequent payoff/reward (i.e. reinforcement learning) and apply these rules to generate a behavior that
matched the real system. This early, ambitious implementation was later regarded as overly complex,
yielding inconsistent results.[2][17]

Beginning in 1980, Kenneth de Jong and his student Stephen Smith took a different approach to rule-based machine learning with LS-1, where learning was viewed as an offline optimization process rather than an
online adaptation process.[18][19][20] This new approach was more similar to a standard genetic algorithm
but evolved independent sets of rules. Since that time LCS methods inspired by the online learning
framework introduced by Holland at the University of Michigan have been referred to as Michigan-style
LCS, and those inspired by Smith and De Jong at the University of Pittsburgh have been referred to as
Pittsburgh-style LCS.[2][17] In 1986, Holland developed what would be considered the standard
Michigan-style LCS for the next decade.[21]

Other important concepts that emerged in the early days of LCS research included (1) the formalization of a
bucket brigade algorithm (BBA) for credit assignment/learning,[22] (2) selection of parent rules from a
common 'environmental niche' (i.e. the match set [M]) rather than from the whole population [P],[23] (3)
covering, first introduced as a create operator,[24] (4) the formalization of an action set [A],[24] (5) a
simplified algorithm architecture,[24] (6) strength-based fitness,[21] (7) consideration of single-step, or
supervised learning problems[25] and the introduction of the correct set [C],[26] (8) accuracy-based
fitness[27] (9) the combination of fuzzy logic with LCS[28] (which later spawned a lineage of fuzzy LCS
algorithms), (10) encouraging long action chains and default hierarchies for improving performance on
multi-step problems,[29][30][31] (11) examining latent learning (which later inspired a new branch of
anticipatory classifier systems (ACS)[32]), and (12) the introduction of the first Q-learning-like credit
assignment technique.[33] While not all of these concepts are applied in modern LCS algorithms, each was a landmark in the development of the LCS paradigm.

The revolution

Interest in learning classifier systems was reinvigorated in the mid-1990s, largely due to two events: the development of the Q-Learning algorithm[34] for reinforcement learning, and the introduction of significantly simplified Michigan-style LCS architectures by Stewart Wilson.[11][35] Wilson's Zeroth-level
Classifier System (ZCS)[35] focused on increasing algorithmic understandability based on Holland's standard LCS implementation.[21] This was done, in part, by removing rule-bidding and the internal
message list, essential to the original BBA credit assignment, and replacing it with a hybrid BBA/Q-
Learning strategy. ZCS demonstrated that a much simpler LCS architecture could perform as well as the
original, more complex implementations. However, ZCS still suffered from performance drawbacks
including the proliferation of over-general classifiers.

In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy" in which he
introduced the classifier system XCS.[11] XCS took the simplified architecture of ZCS and added an
accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism
called subsumption, and an adaptation of the Q-Learning credit assignment. XCS was popularized by its
ability to reach optimal performance while evolving accurate and maximally general classifiers as well as its
impressive problem flexibility (able to perform both reinforcement learning and supervised learning). XCS
later became the best known and most studied LCS algorithm and defined a new family of accuracy-based
LCS. ZCS alternatively became synonymous with strength-based LCS. XCS is also important because it
successfully bridged the gap between LCS and the field of reinforcement learning. Following the success
of XCS, LCS were later described as reinforcement learning systems endowed with a generalization
capability.[36] Reinforcement learning typically seeks to learn a value function that maps out a complete
representation of the state/action space. Similarly, the design of XCS drives it to form an all-inclusive and
accurate representation of the problem space (i.e. a complete map) rather than focusing on high payoff
niches in the environment (as was the case with strength-based LCS). Conceptually, complete maps capture not only what you should do (what is correct), but also what you shouldn't do (what is incorrect). In contrast, most strength-based LCSs, or exclusively supervised learning LCSs, seek a rule set of efficient generalizations in the form of a best action map (or a partial map). Comparisons between strength- vs.
accuracy-based fitness and complete vs. best action maps have since been examined in greater detail.[37][38]

In the wake of XCS

XCS inspired the development of a whole new generation of LCS algorithms and applications. In 1995,
Congdon was the first to apply LCS to real-world epidemiological investigations of disease,[39] followed closely by Holmes who developed BOOLE++,[40] EpiCS,[41] and later EpiXCS[42] for
epidemiological classification. These early works inspired later interest in applying LCS algorithms to
complex and large-scale data mining tasks epitomized by bioinformatics applications. In 1998, Stolzmann
introduced anticipatory classifier systems (ACS), which included rules in the form of 'condition-action-effect', rather than the classic 'condition-action' representation.[32] ACS was designed to predict the perceptual consequences of an action in all possible situations in an environment. In other words, the system evolves a model that specifies not only what to do in a given situation, but also provides information about what will happen after a specific action is executed. This family of LCS algorithms is best suited to
multi-step problems, planning, speeding up learning, or disambiguating perceptual aliasing (i.e. where the
same observation is obtained in distinct states but requires different actions). Butz later pursued this
anticipatory family of LCS developing a number of improvements to the original method.[43] In 2002,
Wilson introduced XCSF, adding a computed action in order to perform function approximation.[44] In
2003, Bernado-Mansilla introduced a sUpervised Classifier System (UCS), which specialized the XCS
algorithm to the task of supervised learning, single-step problems, and forming a best action set. UCS removed the reinforcement learning strategy in favor of a simple, accuracy-based rule fitness, and also removed the explore/exploit learning phases characteristic of many reinforcement learners. Bull introduced a simple
accuracy-based LCS (YCS)[45] and a simple strength-based LCS, the Minimal Classifier System (MCS),[46]
in order to develop a better theoretical understanding of the LCS framework. Bacardit introduced
GAssist[47] and BioHEL,[48] Pittsburgh-style LCSs designed for data mining and scalability to large
datasets in bioinformatics applications. In 2008, Drugowitsch published the book titled "Design and
Analysis of Learning Classifier Systems" including some theoretical examination of LCS algorithms.[49]
Butz introduced the first online visualization of rule learning within a GUI for XCSF[1] (see the image at the top of this page). Urbanowicz extended the UCS framework and introduced ExSTraCS, explicitly
designed for supervised learning in noisy problem domains (e.g. epidemiology and bioinformatics).[50]
ExSTraCS integrated (1) expert knowledge to drive covering and the genetic algorithm towards important
features in the data,[51] (2) a form of long-term memory referred to as attribute tracking,[52] allowing for
more efficient learning and the characterization of heterogeneous data patterns, and (3) a flexible rule
representation similar to Bacardit's mixed discrete-continuous attribute list representation.[53] Both Bacardit
and Urbanowicz explored statistical and visualization strategies to interpret LCS rules and perform
knowledge discovery for data mining.[12][13] Browne and Iqbal explored the concept of reusing building
blocks in the form of code fragments and were the first to solve the 135-bit multiplexer benchmark problem
by first learning useful building blocks from simpler multiplexer problems.[54] ExSTraCS 2.0 was later
introduced to improve Michigan-style LCS scalability, successfully solving the 135-bit multiplexer
benchmark problem for the first time directly.[5] The n-bit multiplexer problem is highly epistatic and
heterogeneous, making it a very challenging machine learning task.

Variants

Michigan-Style Learning Classifier System

Michigan-Style LCSs are characterized by a population of rules where the genetic algorithm operates at the
level of individual rules and the solution is represented by the entire rule population. Michigan style systems
also learn incrementally which allows them to perform both reinforcement learning and supervised learning,
as well as both online and offline learning. Michigan-style systems have the advantage of being applicable
to a greater number of problem domains, and the unique benefits of incremental learning.

Pittsburgh-Style Learning Classifier System

Pittsburgh-Style LCSs are characterized by a population of variable length rule-sets where each rule-set is a
potential solution. The genetic algorithm typically operates at the level of an entire rule-set. Pittsburgh-style
systems can also uniquely evolve ordered rule lists, as well as employ a default rule. These systems have
the natural advantage of identifying smaller rule sets, making these systems more interpretable with regard
to manual rule inspection.

Hybrid systems
Systems that seek to combine key strengths of both systems have also been proposed.

Advantages
Adaptive: They can acclimate to a changing environment in the case of online learning.
Model free: They make limited assumptions about the environment, or the patterns of
association within the data.
They can model complex, epistatic, heterogeneous, or distributed underlying patterns
without relying on prior knowledge.
They make no assumptions about the number of predictive vs. non-predictive features in
the data.
Ensemble Learner: No single model is applied to a given instance that universally provides
a prediction. Instead, a relevant and often conflicting set of rules contributes a 'vote' which
can be interpreted as a fuzzy prediction.
Stochastic Learner: Non-deterministic learning is advantageous in large-scale or high
complexity problems where deterministic or exhaustive learning becomes intractable.
Implicitly Multi-objective: Rules evolve towards accuracy with implicit and explicit pressures
encouraging maximal generality/simplicity. This implicit generalization pressure is unique to
LCS. Effectively, more general rules will appear more often in match sets. In turn, they have
a more frequent opportunity to be selected as parents, and pass on their more general
genomes to offspring rules.
Interpretable: In the interest of data mining and knowledge discovery, individual LCS rules
are logical and can be made to be human-interpretable IF:THEN statements. Effective
strategies have also been introduced to allow for global knowledge discovery, identifying
significant features and patterns of association from the rule population as a whole.[12]
Flexible application
Single or multi-step problems
Supervised, Reinforcement or Unsupervised Learning
Binary Class and Multi-Class Classification
Regression
Discrete or continuous features (or some mix of both types)
Clean or noisy problem domains
Balanced or imbalanced datasets.
Accommodates missing data (i.e. missing feature values in training instances)

Disadvantages
Limited Software Availability: There are a limited number of open source, accessible LCS
implementations, and even fewer that are designed to be user friendly or accessible to
machine learning practitioners.
Interpretation: While LCS algorithms are certainly more interpretable than some advanced
machine learners, users must interpret a set of rules (sometimes a large set of rules) to
comprehend the LCS model. Methods for rule compaction and interpretation strategies
remain an area of active research.
Theory/Convergence Proofs: There is a relatively small body of theoretical work behind LCS
algorithms. This is likely due to their relative algorithmic complexity (applying a number of
interacting components) as well as their stochastic nature.
Overfitting: Like any machine learner, LCS can suffer from overfitting despite implicit and
explicit generalization pressures.
Run Parameters: LCSs often have many run parameters to consider/optimize. Typically,
most parameters can be left to the community-determined defaults with the exception of two
critical parameters: maximum rule population size, and the maximum number of learning
iterations. Optimizing these parameters is likely to be very problem dependent.
Notoriety: Despite their age, LCS algorithms are still not widely known even in machine
learning communities. As a result, LCS algorithms are rarely considered in comparison to
other established machine learning approaches. This is likely due to the following factors:
(1) LCS is a relatively complicated algorithmic approach, (2) LCS rule-based modeling is a
different paradigm of modeling than almost all other machine learning approaches, and (3)
LCS software implementations are not as common.
Computationally Expensive: While certainly more feasible than some exhaustive
approaches, LCS algorithms can be computationally expensive. For simple, linear learning
problems there is no need to apply an LCS. LCS algorithms are best suited to complex
problem spaces, or problem spaces in which little prior knowledge exists.

Problem domains
Adaptive-control
Data Mining
Engineering Design
Feature Selection
Function Approximation
Game-Play
Image Classification
Knowledge Handling
Medical Diagnosis
Modeling
Navigation
Optimization
Prediction
Querying
Robotics
Routing
Rule-Induction
Scheduling
Strategy

Terminology
The name, "Learning Classifier System (LCS)", is a bit misleading since there are many machine learning
algorithms that 'learn to classify' (e.g. decision trees, artificial neural networks), but are not LCSs. The term
'rule-based machine learning (RBML)' is useful, as it more clearly captures the essential 'rule-based'
component of these systems, but it also generalizes to methods that are not considered to be LCSs (e.g.
association rule learning, or artificial immune systems). More general terms such as 'genetics-based machine learning', and even 'genetic algorithm',[39] have also been applied to refer to what would be more characteristically defined as a learning classifier system. Due to their similarity to genetic algorithms,
Pittsburgh-style learning classifier systems are sometimes generically referred to as 'genetic algorithms'.
Beyond this, some LCS algorithms, or closely related methods, have been referred to as 'cognitive
systems',[16] 'adaptive agents', 'production systems', or generically as a 'classifier system'.[55][56] This
variation in terminology contributes to some confusion in the field.

Up until the 2000s nearly all learning classifier system methods were developed with reinforcement learning problems in mind. As a result, the term 'learning classifier system' was commonly defined as the combination of 'trial-and-error' reinforcement learning with the global search of a genetic algorithm. Interest in supervised learning applications, and even unsupervised learning, has since broadened the use and definition of this term.

See also
Rule-based machine learning
Production system
Expert system
Genetic algorithm
Association rule learning
Artificial immune system
Population-based Incremental Learning
Machine learning

References
1. Stalph, Patrick O.; Butz, Martin V. (2010-02-01). "JavaXCSF: The XCSF Learning Classifier
System in Java". SIGEVOlution. 4 (3): 16–19. doi:10.1145/1731888.1731890 (https://doi.org/
10.1145%2F1731888.1731890). ISSN 1931-8499 (https://www.worldcat.org/issn/1931-849
9). S2CID 16861908 (https://api.semanticscholar.org/CorpusID:16861908).
2. Urbanowicz, Ryan J.; Moore, Jason H. (2009-09-22). "Learning Classifier Systems: A
Complete Introduction, Review, and Roadmap" (https://doi.org/10.1155%2F2009%2F73639
8). Journal of Artificial Evolution and Applications. 2009: 1–25. doi:10.1155/2009/736398 (htt
ps://doi.org/10.1155%2F2009%2F736398). ISSN 1687-6229 (https://www.worldcat.org/issn/
1687-6229).
3. Dorigo, Marco (1995). "Alecsys and the AutonoMouse: Learning to control a real robot by
distributed classifier systems" (https://doi.org/10.1007%2FBF00996270). Machine Learning.
19 (3): 209–240. doi:10.1007/BF00996270 (https://doi.org/10.1007%2FBF00996270).
ISSN 0885-6125 (https://www.worldcat.org/issn/0885-6125).
4. Bernadó-Mansilla, Ester; Garrell-Guiu, Josep M. (2003-09-01). "Accuracy-Based Learning
Classifier Systems: Models, Analysis and Applications to Classification Tasks". Evolutionary
Computation. 11 (3): 209–238. doi:10.1162/106365603322365289 (https://doi.org/10.1162%
2F106365603322365289). ISSN 1063-6560 (https://www.worldcat.org/issn/1063-6560).
PMID 14558911 (https://pubmed.ncbi.nlm.nih.gov/14558911). S2CID 9086149 (https://api.se
manticscholar.org/CorpusID:9086149).
5. Urbanowicz, Ryan J.; Moore, Jason H. (2015-04-03). "ExSTraCS 2.0: description and
evaluation of a scalable learning classifier system" (https://www.ncbi.nlm.nih.gov/pmc/article
s/PMC4583133). Evolutionary Intelligence. 8 (2–3): 89–116. doi:10.1007/s12065-015-0128-
8 (https://doi.org/10.1007%2Fs12065-015-0128-8). ISSN 1864-5909 (https://www.worldcat.o
rg/issn/1864-5909). PMC 4583133 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC458313
3). PMID 26417393 (https://pubmed.ncbi.nlm.nih.gov/26417393).
6. Bernadó, Ester; Llorà, Xavier; Garrell, Josep M. (2001-07-07). Lanzi, Pier Luca; Stolzmann,
Wolfgang; Wilson, Stewart W. (eds.). Advances in Learning Classifier Systems (https://archiv
e.org/details/advanceslearning00lanz). Lecture Notes in Computer Science. Springer Berlin
Heidelberg. pp. 115 (https://archive.org/details/advanceslearning00lanz/page/n120)–132.
doi:10.1007/3-540-48104-4_8 (https://doi.org/10.1007%2F3-540-48104-4_8).
ISBN 9783540437932.
7. Bacardit, Jaume; Butz, Martin V. (2007-01-01). Kovacs, Tim; Llorà, Xavier; Takadama, Keiki;
Lanzi, Pier Luca; Stolzmann, Wolfgang; Wilson, Stewart W. (eds.). Learning Classifier
Systems (https://archive.org/details/learningclassifi00kova_690). Lecture Notes in Computer
Science. Springer Berlin Heidelberg. pp. 282 (https://archive.org/details/learningclassifi00ko
va_690/page/n291)–290. CiteSeerX 10.1.1.553.4679 (https://citeseerx.ist.psu.edu/viewdoc/s
ummary?doi=10.1.1.553.4679). doi:10.1007/978-3-540-71231-2_19 (https://doi.org/10.100
7%2F978-3-540-71231-2_19). ISBN 9783540712305.
8. Urbanowicz, Ryan; Ramanand, Niranjan; Moore, Jason (2015-01-01). Continuous Endpoint
Data Mining with ExSTraCS: A Supervised Learning Classifier System. Proceedings of the
Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary
Computation. GECCO Companion '15. New York, NY, USA: ACM. pp. 1029–1036.
doi:10.1145/2739482.2768453 (https://doi.org/10.1145%2F2739482.2768453).
ISBN 9781450334884. S2CID 11908241 (https://api.semanticscholar.org/CorpusID:119082
41).
9. Butz, M. V.; Lanzi, P. L.; Wilson, S. W. (2008-06-01). "Function Approximation With XCS:
Hyperellipsoidal Conditions, Recursive Least Squares, and Compaction". IEEE
Transactions on Evolutionary Computation. 12 (3): 355–376.
doi:10.1109/TEVC.2007.903551 (https://doi.org/10.1109%2FTEVC.2007.903551).
ISSN 1089-778X (https://www.worldcat.org/issn/1089-778X). S2CID 8861046 (https://api.se
manticscholar.org/CorpusID:8861046).
10. Introducing Rule-Based Machine Learning: A Practical Guide (http://ryanurbanowicz.com/wp
-content/uploads/2016/09/Urbanowicz_Browne_2015_Introducing-Rule-Based-Machine-Le
arning-A-Practical-Guide-GECCO15-CRC-Copy.pdf), Ryan J. Urbanowicz and Will Browne,
see pp. 72-73 for Michigan-style architecture vs. Pittsburgh-style architecture.
11. Wilson, Stewart W. (1995-06-01). "Classifier Fitness Based on Accuracy". Evol. Comput. 3
(2): 149–175. CiteSeerX 10.1.1.363.2210 (https://citeseerx.ist.psu.edu/viewdoc/summary?do
i=10.1.1.363.2210). doi:10.1162/evco.1995.3.2.149 (https://doi.org/10.1162%2Fevco.1995.3.
2.149). ISSN 1063-6560 (https://www.worldcat.org/issn/1063-6560). S2CID 18341635 (http
s://api.semanticscholar.org/CorpusID:18341635).
12. Urbanowicz, R. J.; Granizo-Mackenzie, A.; Moore, J. H. (2012-11-01). "An analysis pipeline
with statistical and visualization-guided knowledge discovery for Michigan-style learning
classifier systems" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4244006). IEEE
Computational Intelligence Magazine. 7 (4): 35–45. doi:10.1109/MCI.2012.2215124 (https://
doi.org/10.1109%2FMCI.2012.2215124). ISSN 1556-603X (https://www.worldcat.org/issn/15
56-603X). PMC 4244006 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4244006).
PMID 25431544 (https://pubmed.ncbi.nlm.nih.gov/25431544).
13. Bacardit, Jaume; Llorà, Xavier (2013). "Large‐scale data mining using genetics‐based
machine learning". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
3 (1): 37–61. doi:10.1002/widm.1078 (https://doi.org/10.1002%2Fwidm.1078).
S2CID 43062613 (https://api.semanticscholar.org/CorpusID:43062613).
14. Holland, John (1975). Adaptation in natural and artificial systems: an introductory analysis
with applications to biology, control, and artificial intelligence (https://books.google.com/book
s?id=5EgGaBkwvWcC). Michigan Press. ISBN 9780262581110.
15. Holland JH (1976) Adaptation. In: Rosen R, Snell F (eds) Progress in theoretical biology, vol
4. Academic Press, New York, pp 263–293
16. Holland JH, Reitman JS (1978) Cognitive systems based on adaptive algorithms Reprinted
in: Evolutionary computation. The fossil record. In: David BF (ed) IEEE Press, New York
1998. ISBN 0-7803-3481-7
17. Lanzi, Pier Luca (2008-02-08). "Learning classifier systems: then and now". Evolutionary
Intelligence. 1 (1): 63–82. doi:10.1007/s12065-007-0003-3 (https://doi.org/10.1007%2Fs120
65-007-0003-3). ISSN 1864-5909 (https://www.worldcat.org/issn/1864-5909).
S2CID 27153843 (https://api.semanticscholar.org/CorpusID:27153843).
18. Smith S (1980) A learning system based on genetic adaptive algorithms. Ph.D. thesis,
Department of Computer Science, University of Pittsburgh
19. Smith S (1983) Flexible learning of problem solving heuristics through adaptive search (http
s://www.researchgate.net/profile/Stephen_Smith14/publication/220815785_Flexible_Learni
ng_of_Problem_Solving_Heuristics_Through_Adaptive_Search/links/0deec52c18dbd0dd5
3000000.pdf). In: Eighth international joint conference on artificial intelligence. Morgan
Kaufmann, Los Altos, pp 421–425
20. De Jong KA (1988) Learning with genetic algorithms: an overview. Mach Learn 3:121–138
21. Holland, John H. "Escaping brittleness: the possibilities of general purpose learning
algorithms applied to parallel rule-based systems." Machine Learning (1986): 593-623. (http://d
l.acm.org/citation.cfm?id=216016)
22. Holland, John H. (1985-01-01). Properties of the Bucket Brigade (http://dl.acm.org/citation.cf
m?id=645511.657087). Proceedings of the 1st International Conference on Genetic
Algorithms. Hillsdale, NJ, USA: L. Erlbaum Associates Inc. pp. 1–7. ISBN 978-0805804263.
23. Booker, L (1982-01-01). Intelligent Behavior as an Adaptation to the Task Environment (htt
p://www.citeulike.org/group/664/article/431772) (Thesis). University of Michigan.
24. Wilson, S. W. "Knowledge growth in an artificial animal (http://www.cs.sfu.ca/~vaughan/teac
hing/415/papers/wilson_animat.pdf). Proceedings of the First International Conference on
Genetic Algorithms and their Applications." (1985).
25. Wilson, Stewart W. (1987). "Classifier systems and the animat problem" (https://doi.org/10.10
07%2FBF00058679). Machine Learning. 2 (3): 199–228. doi:10.1007/BF00058679 (https://d
oi.org/10.1007%2FBF00058679). ISSN 0885-6125 (https://www.worldcat.org/issn/0885-612
5).
26. Bonelli, Pierre; Parodi, Alexandre; Sen, Sandip; Wilson, Stewart (1990-01-01).
NEWBOOLE: A Fast GBML System (https://archive.org/details/machinelearningp0000inte/p
age/153). Proceedings of the Seventh International Conference (1990) on Machine
Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. pp. 153–159 (https://
archive.org/details/machinelearningp0000inte/page/153). ISBN 978-1558601413.
27. Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive
classifiers" (https://doi.org/10.1007%2FBF00114162). Machine Learning. 6 (2): 161–182.
doi:10.1007/BF00114162 (https://doi.org/10.1007%2FBF00114162). ISSN 0885-6125 (http
s://www.worldcat.org/issn/0885-6125).
28. Valenzuela-Rendón, Manuel. "The Fuzzy Classifier System: A Classifier System for
Continuously Varying Variables (http://sci2s.ugr.es/sites/default/files/files/TematicWebSites/
GeneticFuzzySystems/(1991)_Valenzuela-Rendon.pdf)." In ICGA, pp. 346-353. 1991.
29. Riolo, Rick L. (1988-01-01). Empirical Studies of Default Hierarchies and Sequences of
Rules in Learning Classifier Systems (http://dl.acm.org/citation.cfm?id=914945) (Thesis).
Ann Arbor, MI, USA: University of Michigan.
30. R.L., Riolo (1987-01-01). "Bucket brigade performance. I. Long sequences of classifiers" (htt
p://agris.fao.org/agris-search/search.do?recordID=US201301782174). Genetic Algorithms
and Their Applications: Proceedings of the Second International Conference on Genetic
Algorithms: July 28–31, 1987 at the Massachusetts Institute of Technology, Cambridge, MA.
31. R.L., Riolo (1987-01-01). "Bucket brigade performance. II. Default hierarchies" (http://agris.fa
o.org/agris-search/search.do?recordID=US201301782175). Genetic Algorithms and Their
Applications: Proceedings of the Second International Conference on Genetic Algorithms:
July 28–31, 1987 at the Massachusetts Institute of Technology, Cambridge, MA.
32. W. Stolzmann, "Anticipatory classifier systems," in Proceedings of the 3rd Annual Genetic
Programming Conference, pp. 658–664, 1998.
33. Riolo, Rick L. (1990-01-01). Lookahead Planning and Latent Learning in a Classifier System
(http://dl.acm.org/citation.cfm?id=116517.116553). Proceedings of the First International
Conference on Simulation of Adaptive Behavior on from Animals to Animats. Cambridge,
MA, USA: MIT Press. pp. 316–326. ISBN 978-0262631389.
34. Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." PhD diss.,
University of Cambridge, 1989.
35. Wilson, Stewart W. (1994-03-01). "ZCS: A Zeroth Level Classifier System". Evolutionary
Computation. 2 (1): 1–18. CiteSeerX 10.1.1.363.798 (https://citeseerx.ist.psu.edu/viewdoc/su
mmary?doi=10.1.1.363.798). doi:10.1162/evco.1994.2.1.1 (https://doi.org/10.1162%2Fevco.
1994.2.1.1). ISSN 1063-6560 (https://www.worldcat.org/issn/1063-6560). S2CID 17680778
(https://api.semanticscholar.org/CorpusID:17680778).
36. Lanzi, P. L. (2002). "Learning classifier systems from a reinforcement learning perspective".
Soft Computing. 6 (3–4): 162–170. doi:10.1007/s005000100113 (https://doi.org/10.1007%2F
s005000100113). ISSN 1432-7643 (https://www.worldcat.org/issn/1432-7643).
S2CID 39103390 (https://api.semanticscholar.org/CorpusID:39103390).
37. Kovacs, Timothy Michael Douglas. A Comparison of Strength and Accuracy-based Fitness
in Learning and Classifier Systems. 2002.
38. Kovacs, Tim. "Two views of classifier systems." In International Workshop on Learning
Classifier Systems, pp. 74-87. Springer Berlin Heidelberg, 2001 (https://link.springer.com/ch
apter/10.1007/3-540-48104-4_6)
39. Congdon, Clare Bates. "A comparison of genetic algorithms and other machine learning
systems on a complex classification task from common disease research." PhD diss., The
University of Michigan, 1995.
40. Holmes, John H. (1996-01-01). "A Genetics-Based Machine Learning Approach to
Knowledge Discovery in Clinical Data" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2233
061). Proceedings of the AMIA Annual Fall Symposium: 883. ISSN 1091-8280 (https://www.
worldcat.org/issn/1091-8280). PMC 2233061 (https://www.ncbi.nlm.nih.gov/pmc/articles/PM
C2233061).
41. Holmes, John H. "Discovering Risk of Disease with a Learning Classifier System (https://we
b.archive.org/web/20180820234915/https://pdfs.semanticscholar.org/71e4/eb6c630dee4b76
2e74b2970f6dc638a351ab.pdf)." In ICGA, pp. 426-433. 1997.
42. Holmes, John H., and Jennifer A. Sager. "Rule discovery in epidemiologic surveillance data
using EpiXCS: an evolutionary computation approach (https://link.springer.com/10.1007%2F
11527770_60)." InConference on Artificial Intelligence in Medicine in Europe, pp. 444-452.
Springer Berlin Heidelberg, 2005.
43. Butz, Martin V. "Biasing exploration in an anticipatory learning classifier system (https://web.
archive.org/web/20180820234943/https://pdfs.semanticscholar.org/3572/7a56fcce7a73ccc4
3e5bfa19389780e6d436.pdf)." In International Workshop on Learning Classifier Systems,
pp. 3-22. Springer Berlin Heidelberg, 2001.
44. Wilson, Stewart W. (2002). "Classifiers that approximate functions". Natural Computing. 1
(2–3): 211–234. doi:10.1023/A:1016535925043 (https://doi.org/10.1023%2FA%3A10165359
25043). ISSN 1567-7818 (https://www.worldcat.org/issn/1567-7818). S2CID 23032802 (http
s://api.semanticscholar.org/CorpusID:23032802).
45. Bull, Larry. "A simple accuracy-based learning classifier system (https://web.archive.org/we
b/20180820234941/https://pdfs.semanticscholar.org/120c/8f5057995c36ee60ec320c2263b
20af05444.pdf)." Learning Classifier Systems Group Technical Report UWELCSG03-005,
University of the West of England, Bristol, UK (2003).
46. Bull, Larry. "A simple payoff-based learning classifier system (https://link.springer.com/chapt
er/10.1007/978-3-540-30217-9_104)." In International Conference on Parallel Problem
Solving from Nature, pp. 1032-1041. Springer Berlin Heidelberg, 2004.
47. Peñarroya, Jaume Bacardit. "Pittsburgh genetic-based machine learning in the data mining
era: representations, generalization, and run-time." PhD diss., Universitat Ramon Llull,
2004.
48. Bacardit, Jaume; Burke, Edmund K.; Krasnogor, Natalio (2008-12-12). "Improving the
scalability of rule-based evolutionary learning". Memetic Computing. 1 (1): 55–67.
doi:10.1007/s12293-008-0005-4 (https://doi.org/10.1007%2Fs12293-008-0005-4).
ISSN 1865-9284 (https://www.worldcat.org/issn/1865-9284). S2CID 775199 (https://api.sem
anticscholar.org/CorpusID:775199).
49. Drugowitsch, Jan (2008). Design and Analysis of Learning Classifier Systems - Springer.
Studies in Computational Intelligence. Vol. 139. doi:10.1007/978-3-540-79866-8 (https://doi.
org/10.1007%2F978-3-540-79866-8). ISBN 978-3-540-79865-1.
50. Urbanowicz, Ryan J., Gediminas Bertasius, and Jason H. Moore. "An extended michigan-
style learning classifier system for flexible supervised learning, classification, and data
mining (http://www.seas.upenn.edu/~gberta/uploads/3/1/4/8/31486883/urbanowicz_2014_e
xstracs_algorithm.pdf)." In International Conference on Parallel Problem Solving from
Nature, pp. 211-221. Springer International Publishing, 2014.
51. Urbanowicz, Ryan J., Delaney Granizo-Mackenzie, and Jason H. Moore. "Using expert
knowledge to guide covering and mutation in a michigan style learning classifier system to
detect epistasis and heterogeneity (https://web.archive.org/web/20180820234834/https://pdf
s.semanticscholar.org/b407/8f8bb6aa9e39e84b0b20874662a6ed8b7df1.pdf)."
In International Conference on Parallel Problem Solving from Nature, pp. 266-275. Springer
Berlin Heidelberg, 2012.
52. Urbanowicz, Ryan; Granizo-Mackenzie, Ambrose; Moore, Jason (2012-01-01). Instance-
linked Attribute Tracking and Feedback for Michigan-style Supervised Learning Classifier
Systems. Proceedings of the 14th Annual Conference on Genetic and Evolutionary
Computation. GECCO '12. New York, NY, USA: ACM. pp. 927–934.
doi:10.1145/2330163.2330291 (https://doi.org/10.1145%2F2330163.2330291).
ISBN 9781450311779. S2CID 142534 (https://api.semanticscholar.org/CorpusID:142534).
53. Bacardit, Jaume; Krasnogor, Natalio (2009-01-01). A Mixed Discrete-continuous Attribute
List Representation for Large Scale Classification Domains. Proceedings of the 11th Annual
Conference on Genetic and Evolutionary Computation. GECCO '09. New York, NY, USA:
ACM. pp. 1155–1162. CiteSeerX 10.1.1.158.7314 (https://citeseerx.ist.psu.edu/viewdoc/sum
mary?doi=10.1.1.158.7314). doi:10.1145/1569901.1570057 (https://doi.org/10.1145%2F156
9901.1570057). ISBN 9781605583259. S2CID 10906515 (https://api.semanticscholar.org/C
orpusID:10906515).
54. Iqbal, Muhammad; Browne, Will N.; Zhang, Mengjie (2014-08-01). "Reusing Building Blocks
of Extracted Knowledge to Solve Complex, Large-Scale Boolean Problems". IEEE
Transactions on Evolutionary Computation. 18 (4): 465–480.
doi:10.1109/tevc.2013.2281537 (https://doi.org/10.1109%2Ftevc.2013.2281537).
S2CID 525358 (https://api.semanticscholar.org/CorpusID:525358).
55. Booker, L. B.; Goldberg, D. E.; Holland, J. H. (1989-09-01). "Classifier systems and genetic
algorithms" (https://deepblue.lib.umich.edu/bitstream/2027.42/27777/1/0000171.pdf) (PDF).
Artificial Intelligence. 40 (1): 235–282. doi:10.1016/0004-3702(89)90050-7 (https://doi.org/1
0.1016%2F0004-3702%2889%2990050-7). hdl:2027.42/27777 (https://hdl.handle.net/2027.
42%2F27777).
56. Wilson, Stewart W., and David E. Goldberg. "A critical review of classifier systems." In
Proceedings of the third international conference on Genetic algorithms, pp. 244-255.
Morgan Kaufmann Publishers Inc., 1989.

External links

Video tutorial
Learning Classifier Systems in a Nutshell (https://www.youtube.com/watch?v=CRge_cZ2cJ
c) - (2016) Go inside a basic LCS algorithm to learn their components and how they work.

Webpages
LCS & GBML Central (http://gbml.org/)
UWE Learning Classifier Research Group (http://www.cems.uwe.ac.uk/lcsg/)
Prediction Dynamics (http://prediction-dynamics.com/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Learning_classifier_system&oldid=1155243119"
