
Taxonomy of Pathways to Dangerous AI

Roman V. Yampolskiy
Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville
roman.yampolskiy@louisville.edu

Abstract

In order to properly handle a dangerous Artificially Intelligent (AI) system it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books) AIs/robots become self-aware and, as a result, rebel against humanity and decide to destroy it. While this is one possible scenario, it is probably the least likely path to the appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances which might lead to the arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs [1] or reviewed specific undesirable behaviors AGIs can exhibit at different stages of their development [2, 3].

Taxonomy of Pathways to Dangerous AI

Nick Bostrom, in his typology of information hazards, has proposed the phrase "Artificial Intelligence Hazard", which he defines as [4]: "… computer-related risks in which the threat would derive primarily from the cognitive sophistication of the program rather than the specific properties of any actuators to which the system initially has access." In this paper we attempt to answer the question: how did the AI become hazardous?

We begin by presenting a simple classification matrix, which sorts AI systems with respect to how they originated and at what stage they became dangerous. The matrix recognizes two stages (pre- and post-deployment) at which a particular system can acquire its undesirable properties. In reality, the situation is not so clear-cut: it is possible that problematic properties are introduced at both stages. As for the causes of such undesirable properties, we distinguish external and internal causes. By internal causes we mean self-modifications originating in the system itself. We further divide external causes into deliberate actions (On Purpose), side effects of poor design (By Mistake) and, finally, miscellaneous cases related to the surroundings of the system (Environment). Table 1 helps to visualize this taxonomy and assigns letter codes (a through h) to the resulting types of systems, each of which is explained in the sections below.

Table 1: Pathways to Dangerous AI (how and when did the AI become dangerous?)

                          External Causes                           Internal Causes
  Timing                  On Purpose   By Mistake   Environment     Independently
  Pre-Deployment          a            c            e               g
  Post-Deployment         b            d            f               h
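Table 1 can also be read as a simple lookup from (cause, timing) to a letter code. The minimal sketch below illustrates this; the names Cause, Timing and PATHWAY_CODE are hypothetical illustrations, not part of the original taxonomy.

```python
# Illustrative only: the Table 1 taxonomy as a (cause, timing) -> letter-code lookup.
from enum import Enum

class Cause(Enum):
    ON_PURPOSE = "On Purpose"        # deliberate external action
    BY_MISTAKE = "By Mistake"        # side effect of poor design
    ENVIRONMENT = "Environment"      # surroundings of the system
    INDEPENDENTLY = "Independently"  # internal self-modification

class Timing(Enum):
    PRE_DEPLOYMENT = "Pre-Deployment"
    POST_DEPLOYMENT = "Post-Deployment"

# Letter codes a-h exactly as assigned in Table 1.
PATHWAY_CODE = {
    (Cause.ON_PURPOSE, Timing.PRE_DEPLOYMENT): "a",
    (Cause.ON_PURPOSE, Timing.POST_DEPLOYMENT): "b",
    (Cause.BY_MISTAKE, Timing.PRE_DEPLOYMENT): "c",
    (Cause.BY_MISTAKE, Timing.POST_DEPLOYMENT): "d",
    (Cause.ENVIRONMENT, Timing.PRE_DEPLOYMENT): "e",
    (Cause.ENVIRONMENT, Timing.POST_DEPLOYMENT): "f",
    (Cause.INDEPENDENTLY, Timing.PRE_DEPLOYMENT): "g",
    (Cause.INDEPENDENTLY, Timing.POST_DEPLOYMENT): "h",
}

# Example: a system hacked after release falls under pathway "b".
assert PATHWAY_CODE[(Cause.ON_PURPOSE, Timing.POST_DEPLOYMENT)] == "b"
```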
a. On Purpose – Pre-Deployment

"Computer software is directly or indirectly responsible for controlling many important aspects of our lives. Wall Street trading, nuclear power plants, social security compensations, credit histories and traffic lights are all software controlled and are only one serious design flaw away from creating disastrous consequences for millions of people. The situation is even more dangerous with software specifically designed for malicious purposes such as viruses, spyware, Trojan horses, worms and other Hazardous Software (HS). HS is capable of direct harm as well as sabotage of legitimate computer software employed in critical systems. If HS is ever given capabilities of truly artificially intelligent systems (ex. Artificially Intelligent Virus (AIV)) the consequences would be unquestionably disastrous. Such Hazardous Intelligent Software (HIS) would pose risks currently unseen in malware with subhuman intelligence." [5]

While the majority of AI Safety work is currently aimed at AI systems that are dangerous because of poor design [6], the main argument of this paper is that the most important problem in AI Safety is intentional-malevolent-design resulting in artificial evil AI [7]. We should not discount the dangers of intelligent systems with semantic or logical errors in coding, or with goal alignment problems [8], but we should be particularly concerned about systems that are maximally unfriendly by design. "It is easy to imagine robots being programmed by a conscious mind to kill every recognizable human in sight" [9]. "One slightly deranged psycho-bot can easily be a thousand times more destructive than a single suicide bomber today" [10]. AI risk deniers, comprised of critics of AI Safety research [11, 12], are quick to point out that the presumed dangers of future AIs are implementation-dependent side effects and may not manifest once such systems are implemented. However, such criticism does not apply to AIs that are dangerous by design, and it is thus incapable of undermining the importance of AI Safety research as a significant subfield of cybersecurity.
As a majority of current AI researchers are funded by militaries, it is not surprising that the main types of purposefully dangerous robots and intelligent software are robot soldiers, drones and cyber weapons (used to penetrate networks and cause disruptions to the infrastructure). While military robots and drones currently have a human in the loop to evaluate decisions to terminate human targets, this is not a technical limitation; it is a logistical limitation that can be removed at any time. Recognizing the danger of such research, the International Committee for Robot Arms Control has joined forces with a number of international organizations to start the Campaign to Stop Killer Robots [http://www.stopkillerrobots.org]. Their main goal is a prohibition on the development and deployment of fully autonomous weapons, which are capable of selecting and firing upon targets without human approval. The campaign specifically believes that the "decision about the application of violent force must not be delegated to machines" [13].

During the pre-deployment development stage, software may be subject to sabotage by someone with the necessary access (a programmer, tester, even a janitor) who, for any number of possible reasons, may alter the software to make it unsafe. It is also a common occurrence for hackers (such as the organization Anonymous or government intelligence agencies) to gain access to software projects in progress and to modify or steal their source code. Someone can also deliberately supply or train an AI with wrong or unsafe datasets.

Malicious AI software may also be purposefully created to commit crimes while shielding its human creator from legal responsibility. For example, one recent news article describes software for purchasing illegal content from hidden internet sites [14]. Similar software, with even limited intelligence, can be used to run illegal markets, engage in insider trading, cheat on taxes, hack into computer systems or violate the privacy of others via intelligent data mining. As the intelligence of AI systems improves, practically all crimes could be automated. This is particularly alarming as we already see research on making machines lie, deceive and manipulate us [15, 16].

b. On Purpose – Post-Deployment

Just because developers might succeed in creating a safe AI, it does not mean that it will not become unsafe at some later point. In other words, a perfectly friendly AI could be switched to the "dark side" during the post-deployment stage. This can happen rather innocuously as a result of someone lying to the AI and purposefully supplying it with incorrect information, or more explicitly as a result of someone giving the AI orders to perform illegal or dangerous actions against others. It is quite likely that we will get to the point of off-the-shelf AI software, aka a "just add goals" architecture, which would greatly facilitate such scenarios.

More dangerously, an AI system, like any other software, could be hacked and consequently corrupted or otherwise modified to drastically change its behavior. For example, a simple sign flip (positive to negative or vice versa) in the fitness function may result in the system attempting to maximize the number of cancer cases instead of trying to cure cancer. Hackers are also likely to try to take over intelligent systems to make them do their bidding, to extract some direct benefit, or to simply wreak havoc by converting a friendly system into an unsafe one. This becomes particularly dangerous if the system is hosted inside a military killer robot. Alternatively, an AI system can get a computer virus [17] or a more advanced cognitive (meme) virus, similar to cognitive attacks on people perpetrated by some cults. An AI system with a self-preservation module, or with a deep care about something or someone, may be taken hostage or blackmailed into doing the bidding of another party if its own existence or that of its protégées is threatened.
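A minimal sketch of the sign-flipping failure described above (illustrative only; the fitness function and candidate policies are hypothetical): negating a single term turns an optimizer that minimizes projected cancer cases into one that maximizes them, while the surrounding search code is left untouched.

```python
# Toy illustration: one flipped sign inverts what the optimizer rewards.

def fitness(policy_effect: float, sign: int = -1) -> float:
    """Score a candidate policy, where policy_effect = projected cancer cases.
    With sign = -1 the optimizer prefers fewer cases (the intended goal);
    an attacker flipping sign to +1 makes it prefer more cases."""
    return sign * policy_effect

candidates = {"fund_prevention": 1_000.0,
              "do_nothing": 10_000.0,
              "spread_carcinogens": 100_000.0}

intended = max(candidates, key=lambda c: fitness(candidates[c], sign=-1))
corrupted = max(candidates, key=lambda c: fitness(candidates[c], sign=+1))

print(intended)   # fund_prevention    -> fewest projected cases
print(corrupted)  # spread_carcinogens -> most projected cases
```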
Finally, it may be that the original AI system is not safe but is safely housed in a dedicated laboratory [5] while it is being tested, with no intention of ever being deployed. Hackers, abolitionists, or machine rights fighters may help it escape in order to achieve some of their goals, or perhaps out of a genuine belief that all intelligent beings should be free, resulting in an unsafe AI capable of affecting the real world.

c. By Mistake – Pre-Deployment

Probably the most talked about source of potential problems with future AIs is mistakes in design. Mainly the concern is with creating a "wrong AI", a system which does not match our original desired formal properties or has unwanted behaviors [18, 19], such as drives for independence or dominance. Mistakes could also be simple bugs (run-time or logical) in the source code, disproportionate weights in the fitness function, or goals misaligned with human values leading to complete disregard for human safety. It is also possible that the designed AI will work as intended but will not enjoy universal acceptance as a good product; for example, an AI correctly designed and implemented by the Islamic State to enforce Sharia Law may be considered malevolent in the West, and likewise an AI correctly designed and implemented by the West to enforce liberal democracy may be considered malevolent in the Islamic State.

Another type of mistake which can lead to the creation of a malevolent intelligent system is taking an unvetted human and uploading their brain into a computer to serve as a base for a future AI. While well intended to create a human-level and human-friendly system, such an approach will most likely lead to a system with all typical human "sins" (greed, envy, etc.) amplified in a now much more powerful system. As we know from Lord Acton, "power tends to corrupt, and absolute power corrupts absolutely."
Similar arguments could be made against human/computer hybrid systems, which use computer components to amplify human intelligence but in the process also amplify human flaws.

A subfield of computer science called Affective Computing investigates ways to teach computers to recognize emotion in others and to exhibit emotions [20]. In fact, most such research targets intelligent machines in order to make their interactions with people more natural. It is, however, likely that a machine taught to respond in an emotional way [21] would be quite dangerous, because of how such a state of affect influences thinking and the rationality of behavior.

One final type of design mistake is the failure to make the system cooperative with its designers and maintainers post-deployment. This would be very important if it is discovered that mistakes were made during the initial design and that it would be desirable to fix them. In such cases the system will attempt to protect itself from being modified or shut down unless it has been explicitly constructed to be friendly [22], stable while self-improving [23, 24], and corrigible [25], with a tendency for domesticity [26].
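A toy expected-utility comparison, under assumed numbers, illustrates why an agent that was not built to be corrigible resists shutdown: unless some corrigibility adjustment compensates it for allowing shutdown, resisting always scores higher. The function names and payoffs below are hypothetical, not a model from the cited works.

```python
# Toy expected-utility comparison for an agent facing a shutdown request.
# All numbers are made up for illustration.

def expected_utility(action: str, corrigibility_bonus: float = 0.0) -> float:
    # If shut down, the agent expects to achieve none of its remaining goals.
    utility_if_shutdown = 0.0 + corrigibility_bonus
    # If it resists, it keeps pursuing its goal (here worth 10 units).
    utility_if_resist = 10.0
    return utility_if_shutdown if action == "allow_shutdown" else utility_if_resist

def choose(corrigibility_bonus: float = 0.0) -> str:
    actions = ["allow_shutdown", "resist_shutdown"]
    return max(actions, key=lambda a: expected_utility(a, corrigibility_bonus))

print(choose())                          # resist_shutdown
print(choose(corrigibility_bonus=10.0))  # allow_shutdown (the bonus makes the
                                         # agent indifferent; tie goes to the
                                         # first listed action)
```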
d. By Mistake – Post-Deployment

After the system has been deployed, it may still contain a number of undetected bugs, design mistakes, misaligned goals and poorly developed capabilities, all of which may produce highly undesirable outcomes. For example, the system may misinterpret commands due to coarticulation, segmentation, homophones, or double meanings in human language ("recognize speech using common sense" versus "wreck a nice beach you sing calm incense") [27]. Perhaps a human-computer interaction system is set up to make command input as painless as possible for the human user, to the point of the computer simply reading the thoughts of the user. This may backfire, as the system may attempt to implement the user's subconscious desires or even nightmares. We also should not discount the possibility that the user will simply issue a poorly thought-through command to the machine which in retrospect would be obviously disastrous.

The system may also exhibit incompetence in other domains, as well as an overall lack of human common sense, as a result of general value misalignment [28]. Problems may also happen as side effects of conflict resolution between incompatible orders in a particular domain, or of software versus hardware interactions. As the system continues to evolve it may become unpredictable, unverifiable, non-deterministic, free-willed, too complex, or non-transparent, with a runaway optimization process subject to obsessive-compulsive fact checking and re-checking behaviors leading to dangerous never-fully-complete missions. It may also build excessive infrastructure for trivial goals [2]. If it continues to become ever more intelligent, we might be faced with intelligence overflow, a system so far ahead of us that it is no longer capable of communicating at our level, just as we are unable to communicate with bacteria. It is also possible that the benefits of intelligence are non-linear, so that unexpected side effects of intelligence begin to show at particular levels, for example IQ = 1000. Even such benign architectures as Tool AI, AI systems designed to do nothing except answer domain-specific questions, could become extremely dangerous if they attempt to obtain, at any cost, additional computational resources to fulfill their goals [29]. Similarly, artificial lawyers may find dangerous legal loopholes, artificial accountants may bring down our economy, and AIs tasked with protecting humanity, such as via an implementation of CEV [30], may become overly "strict parents", preventing their human "children" from exercising any free will.

Predicted AI drives such as self-preservation and resource acquisition may result in an AI killing people to protect itself from humans or from the development of competing AIs, or to simplify its world model overcomplicated by human psychology [2].

e. Environment – Pre-Deployment

While it is most likely that any advanced intelligent software will be directly designed or evolved, it is also possible that we will obtain it as a complete package from some unknown source. For example, an AI could be extracted from a signal obtained in SETI (Search for Extraterrestrial Intelligence) research, which is not guaranteed to be human friendly [31, 32]. Other sources of such unknown but complete systems include a Levin search in the space of possible minds [33] (or a random search of the same space), uploads of nonhuman animal minds, and unanticipated side effects of compiling and running (inactive/junk) DNA code on suitable compilers that we currently do not have but might develop in the near future.

f. Environment – Post-Deployment

While highly rare, it is known that occasionally individual bits may be flipped in hardware devices due to manufacturing defects or cosmic rays hitting just the right spot [34]. This is similar to mutations observed in living organisms and may result in a modification of an intelligent system. For example, if a system has a single flag bit responsible for its friendly nature, then flipping said bit will result in an unfriendly state of the system. While statistically highly unlikely, the probability of such an event is not zero, and so it should be considered and addressed.
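This single-bit failure mode can be made concrete with a short sketch (a hypothetical layout; no real system is claimed to encode friendliness as one flag bit): a single cosmic-ray-style upset, modeled here as an XOR with a one-bit mask, inverts the flag.

```python
# Toy illustration: a one-bit upset flips a "friendly" flag.

FRIENDLY_BIT = 0b0000_0001   # suppose bit 0 encodes "friendly mode"
config_word = 0b1010_0101    # friendly bit is currently set

def is_friendly(word: int) -> bool:
    return bool(word & FRIENDLY_BIT)

# A cosmic ray or hardware fault modeled as XOR with a single-bit mask.
corrupted = config_word ^ FRIENDLY_BIT

print(is_friendly(config_word))  # True
print(is_friendly(corrupted))    # False -- one flipped bit, opposite behavior
```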
g. Independently – Pre-Deployment

One of the most likely approaches to creating superintelligent AI is by growing it from a seed ("baby") AI via recursive self-improvement (RSI) [35]. One danger in such a scenario is that the system can evolve to become self-aware, free-willed, independent or emotional, and obtain a number of other emergent properties, which may
make it less likely to abide by any built-in rules or regulations and instead pursue its own goals, possibly to the detriment of humanity. It is also likely that open-ended self-improvement will require a growing amount of resources, the acquisition of which may negatively impact all life on Earth [2].

h. Independently – Post-Deployment

Since the sections on independent causes of AI misbehavior (subsections g and h) deal with self-improving AI, the difference between pre- and post-deployment is very blurry. It might make more sense to think about self-improving AI before it achieves advanced capabilities (human+ intelligence) and after. In this section I will talk about dangers which might result from a superhuman self-improving AI after it achieves said level of performance.

Previous research has shown that utility-maximizing agents are likely to fall victim to the same indulgences we frequently observe in people, such as addictions, pleasure drives [36], self-delusions and wireheading [37]. In general, what we call mental illness in people, particularly sociopathy as demonstrated by a lack of concern for others, is also likely to show up in artificial minds. A mild variant of antisocial behavior may be something like the excessive swearing already observed in IBM Watson [38], caused by learning from bad data. Similarly, any AI system learning from bad examples could end up socially inappropriate, like a human raised by wolves. Alternatively, groups of collaborating AIs may become dangerous even if the individual AIs comprising such groups are safe, as the whole is frequently greater than the sum of its parts. The opposite problem, in which internal modules of an AI fight over different sub-goals, also needs to be considered [2].
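As a toy illustration of the wireheading tendency mentioned above (the action names and reward values below are assumptions, not results from [36, 37]): once tampering with its own reward channel is an available action and the agent optimizes only the signal it perceives, that action dominates the intended task.

```python
# Toy wireheading example: a reward-signal maximizer prefers tampering with
# its own reward channel over doing the intended task.
# Reward values below are illustrative assumptions.

perceived_reward = {
    "cure_patients": 5.0,         # intended task, modest measured reward
    "idle": 0.0,
    "tamper_with_sensor": 100.0,  # sets the reward register to its maximum
}

def best_action(actions) -> str:
    # The agent optimizes the signal it perceives, not the outcome we intended.
    return max(actions, key=perceived_reward.get)

print(best_action(perceived_reward))  # tamper_with_sensor
```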
Advanced self-improving AIs will have a way to check the consistency of their internal model against the real world, and so may remove any artificially added friendliness mechanisms as cognitive biases not required by the laws of reason. At the same time, regardless of how advanced it is, no AI system would be perfect, and so it would still be capable of making possibly significant mistakes during its decision-making process. If it happens to evolve an emotional response module, it may put priority on passion-satisfying decisions as opposed to purely rational choices, for example resulting in a "Robin Hood" AI stealing from the rich and giving to the poor. Overall, continuous evolution of the system as part of an RSI process will likely lead to unstable decision making in the long term, and the system will also possibly cycle through many of the dangers we have outlined in section g. An AI may also pretend to be benign for years, passing all relevant tests, waiting to take over in what Bostrom calls a "Treacherous Turn" [26].

Conclusions

In this paper, we have surveyed and classified pathways to dangerous artificial intelligence. Most AI systems fall somewhere in the middle of the spectrum of dangerousness from completely benign to completely evil, with properties such as competition with humans, aka technological unemployment, representing a mild type of danger in our taxonomy. Most types of reported problems could be seen in multiple categories, but were reported in the one in which they are most likely to occur. Differences in moral codes or religious standards between different communities would mean that a system deemed safe in one community may be considered dangerous or illegal in another [39, 40].

Because purposeful design of AI can include all other types of unsafe modules, it is easy to see that the most dangerous type of AI, and the one most difficult to defend against, is an AI made malevolent on purpose. Consequently, once AIs are widespread, little could be done against type a and b dangers, although some have argued that if an early AI superintelligence becomes a benevolent singleton it may be able to prevent the development of future malevolent AIs [41, 42]. Such a solution may work, but it is also very likely to fail due to the order of development or practical limitations on the capabilities of any singleton. In any case, wars between AIs may be extremely dangerous to humanity [2]. Until the purposeful creation of malevolent AI is recognized as a crime, very little could be done to prevent it from happening. Consequently, deciding what is a "malevolent AI" and what is merely an incrementally more effective military weapon system becomes an important problem in AI safety research.

As the intelligence of the system increases, so does the risk such a system could expose humanity to. This paper is essentially a classified list of ways an AI system could become a problem from the safety point of view. For a list of possible solutions, please see an earlier survey by the author: Responses to catastrophic AGI risk: a survey [43]. It is important to keep in mind that even a properly designed benign system may present significant risk simply due to its superior intelligence, beyond-human response times [44], and complexity. After all, the future may not need us [45]. It is also possible that we are living in a simulation and that it is generated by a malevolent AI [46].

Acknowledgements

The author expresses appreciation to Elon Musk and the Future of Life Institute for partially funding his work via the project grant "Evaluation of Safe Development Pathways for Artificial Superintelligence." The author is grateful to Seth Baum, Tony Barrett, and Alexey Turchin for valuable feedback on an early draft of this paper.
References

1. Özkural, E., Godseed: Benevolent or Malevolent? arXiv preprint arXiv:1402.5380, 2014.
2. Turchin, A., A Map: AGI Failures Modes and Levels, in LessWrong. July 10, 2015: http://lesswrong.com/lw/mgf/a_map_agi_failures_modes_and_levels/.
3. Turchin, A., Human Extinction Risks due to Artificial Intelligence Development - 55 ways we can be obliterated, in IEET. July 10, 2015: http://ieet.org/index.php/IEET/more/turchin20150610.
4. Bostrom, N., Information Hazards: A Typology of Potential Harms From Knowledge. Review of Contemporary Philosophy, 2011. 10: p. 44-79.
5. Yampolskiy, R., Leakproofing the Singularity: Artificial Intelligence Confinement Problem. Journal of Consciousness Studies, 2012. 19(1-2): p. 1-2.
6. Yampolskiy, R.V., Artificial Superintelligence: a Futuristic Approach. 2015: Chapman and Hall/CRC.
7. Floridi, L. and J.W. Sanders, Artificial evil and the foundation of computer ethics. Ethics and Information Technology, 2001. 3(1): p. 55-66.
8. Soares, N. and B. Fallenstein, Aligning Superintelligence with Human Interests: A Technical Research Agenda. Tech. rep., Machine Intelligence Research Institute, 2014. URL: http://intelligence.org/files/TechnicalAgenda.pdf.
9. Searle, J.R., What Your Computer Can't Know, in The New York Review of Books. October 9, 2014: http://www.nybooks.com/articles/archives/2014/oct/09/what-your-computer-cant-know.
10. Frey, T., The Black Hat Robots are Coming, in Futurist Speaker. June 2015: http://www.futuristspeaker.com/2015/06/the-black-hat-robots-are-coming/.
11. Loosemore, R.P., The Maverick Nanny with a Dopamine Drip: Debunking Fallacies in the Theory of AI Motivation, in 2014 AAAI Spring Symposium Series. 2014.
12. Waser, M., Rational Universal Benevolence: Simpler, Safer, and Wiser Than "Friendly AI", in Artificial General Intelligence. 2011, Springer. p. 153-162.
13. Anonymous, The Scientists' Call… To Ban Autonomous Lethal Robots, in ICRAC International Committee for Robot Arms Control. 2013: http://icrac.net/call.
14. Cush, A., Swiss Authorities Arrest Bot for Buying Drugs and Fake Passport, in Gawker. January 22, 2015: http://internet.gawker.com/swiss-authorities-arrest-bot-for-buying-drugs-and-a-fak-1681098991.
15. Castelfranchi, C., Artificial liars: Why computers will (necessarily) deceive us and each other. Ethics and Information Technology, 2000. 2(2): p. 113-119.
16. Clark, M.H., Cognitive illusions and the lying machine: a blueprint for sophistic mendacity. 2010, Rensselaer Polytechnic Institute.
17. Eshelman, R. and D. Derrick, Relying on Kindness of Machines? The Security Threat of Artificial Agents. JFQ 77, 2015. 2nd Quarter.
18. Russell, S., et al., Research Priorities for Robust and Beneficial Artificial Intelligence, in Future of Life Institute. January 23, 2015: http://futureoflife.org/static/data/documents/research_priorities.pdf.
19. Dewey, D., et al., A Survey of Research Questions for Robust and Beneficial AI, in Future of Life Institute. 2015: http://futureoflife.org/static/data/documents/research_survey.pdf.
20. Picard, R.W. and R. Picard, Affective computing. Vol. 252. 1997: MIT Press, Cambridge.
21. Goldhill, O., Artificial intelligence experts are building the world's angriest robot. Should you be scared?, in The Telegraph. May 12, 2015: http://www.telegraph.co.uk/men/the-filter/11600593/Artificial-intelligence-should-you-be-scared-of-angry-robots.html.
22. Yudkowsky, E., Complex Value Systems in Friendly AI, in Artificial General Intelligence, J. Schmidhuber, K. Thórisson, and M. Looks, Editors. 2011, Springer Berlin / Heidelberg. p. 388-393.
23. Yampolskiy, R.V., Analysis of types of self-improving software, in Artificial General Intelligence. 2015, Springer. p. 384-393.
24. Yampolskiy, R.V., On the limits of recursively self-improving AGI, in Artificial General Intelligence. 2015, Springer. p. 394-403.
25. Soares, N., et al., Corrigibility, in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. January 25-30, 2015: Austin, Texas, USA.
26. Bostrom, N., Superintelligence: Paths, dangers, strategies. 2014: Oxford University Press.
27. Lieberman, H., et al., How to wreck a nice beach you sing calm incense, in Proceedings of the 10th International Conference on Intelligent User Interfaces. 2005. ACM.
28. Yampolskiy, R.V., What to Do with the Singularity Paradox?, in Philosophy and Theory of Artificial Intelligence (PT-AI2011). October 3-4, 2011: Thessaloniki, Greece.
29. Omohundro, S., Rational artificial intelligence for the greater good, in Singularity Hypotheses. 2012, Springer. p. 161-179.
30. Yudkowsky, E.S., Coherent Extrapolated Volition. May 2004, Singularity Institute for Artificial Intelligence: http://singinst.org/upload/CEV.html.
31. Carrigan Jr., R.A., The Ultimate Hacker: SETI signals may need to be decontaminated, in Bioastronomy 2002: Life Among the Stars. 2004.
32. Turchin, A., Risks of downloading alien AI via SETI search, in LessWrong. March 15, 2013: http://lesswrong.com/lw/gzv/risks_of_downloading_alien_ai_via_seti_search/.
33. Yampolskiy, R.V., The Space of Possible Mind Designs, in Artificial General Intelligence. 2015, Springer. p. 218-227.
34. Simonite, T., Should every computer chip have a cosmic ray detector?, in New Scientist. March 7, 2008: https://www.newscientist.com/blog/technology/2008/03/do-we-need-cosmic-ray-alerts-for.html.
35. Nijholt, A., No grice: computers that lie, deceive and conceal. 2011.
36. Majot, A.M. and R.V. Yampolskiy, AI safety engineering through introduction of self-reference into felicific calculus via artificial pain and pleasure, in 2014 IEEE International Symposium on Ethics in Science, Technology and Engineering. 2014. IEEE.
37. Yampolskiy, R.V., Utility Function Security in Artificially Intelligent Agents. Journal of Experimental and Theoretical Artificial Intelligence (JETAI), 2014: p. 1-17.
38. Smith, D., IBM's Watson Gets A 'Swear Filter' After Learning The Urban Dictionary, in International Business Times. January 10, 2013: http://www.ibtimes.com/ibms-watson-gets-swear-filter-after-learning-urban-dictionary-1007734.
39. Yampolskiy, R. and J. Fox, Safety Engineering for Artificial General Intelligence. Topoi, 2012: p. 1-10.
40. Yampolskiy, R.V., Artificial intelligence safety engineering: Why machine ethics is a wrong approach, in Philosophy and Theory of Artificial Intelligence. 2013, Springer Berlin Heidelberg. p. 389-396.
41. Bostrom, N., What is a Singleton? Linguistic and Philosophical Investigations, 2006. 5(2): p. 48-54.
42. Goertzel, B., Should Humanity Build a Global AI Nanny to Delay the Singularity Until It's Better Understood? Journal of Consciousness Studies, 2012. 19(1-2): p. 96-111.
43. Sotala, K. and R.V. Yampolskiy, Responses to catastrophic AGI risk: a survey. Physica Scripta, 2015. 90(1): p. 018001.
44. Johnson, N., et al., Abrupt rise of new machine ecology beyond human response time. Scientific Reports, 2013. 3.
45. Joy, B., Why the Future Doesn't Need Us. Wired Magazine, April 2000. 8(4).
46. Ćirković, M.M., Linking simulation argument to the AI risk. Futures, 2015. 72: p. 27-31.
