
WEEK 9: PERFORMING DATA ANALYSIS AND EMPLOYING MACHINE LEARNING IN AI

OBJECTIVES:
WEEK 9: PERFORMING DATA ANALYSIS FOR AI AND EMPLOYING MACHINE LEARNING IN AI

Defining Data Analysis

The current era is called the Information Age not simply because we have
become so data rich but also because society has reached a certain maturity
in analyzing data and extracting information from it. Companies such as
Alphabet (Google), Amazon, Apple, Facebook, and Microsoft, which have
built their businesses on data, rank among the most valuable
companies in the world.

You can categorize the transformations that data undergoes into four large,
general families that provide an idea of what happens during data analysis:

» Transforming: Changes the data’s appearance. The term transforming refers to
different processes, though the most common is putting data into ordered rows and
columns in a matrix format (also called flat-file transformation).

» Cleansing: Fixes imperfect data. Depending on the means of acquiring the data, you
may find different problems, such as missing information, extremes in range, or simply
wrong values.

» Inspecting: Validates the data. Data analysis is a mainly human job,
though software plays a big role. Humans can easily recognize patterns and
spot strange data elements.

» Modeling: Grasps the relationship between the elements present in data.
To perform this task, you need tools taken from statistics, such as
correlations, t-tests, linear regression, and many others that can determine
whether a value truly is different from another or just related.
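
The four families above can be sketched in a few lines of Python. This is only a minimal illustration: the records, the field names (hours, score), and the valid score range of 0 to 100 are all invented for the example.

```python
import statistics

# Hypothetical raw records, invented for this illustration.
raw = [
    {"hours": "1", "score": "52"},
    {"hours": "2", "score": "58"},
    {"hours": "3", "score": None},    # missing information
    {"hours": "4", "score": "71"},
    {"hours": "5", "score": "640"},   # wrong value (assumed valid range: 0-100)
    {"hours": "6", "score": "83"},
]

# Transforming: put the records into ordered rows and columns.
rows = [(r["hours"], r["score"]) for r in raw]

# Cleansing: drop rows with missing or out-of-range values.
clean = []
for hours, score in rows:
    if score is None:
        continue
    hours, score = float(hours), float(score)
    if 0 <= score <= 100:
        clean.append((hours, score))

# Inspecting: summarize the data so that a human can check it looks plausible.
scores = [s for _, s in clean]
summary = {
    "n": len(clean),
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.mean(scores),
}

# Modeling: measure the relationship between the columns (Pearson correlation).
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson([h for h, _ in clean], scores)
print(summary, round(r, 3))
```

In practice you would probably use a library such as pandas for these steps. The standard library is used here only to keep the sketch self-contained.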

Understanding why analysis is important

Data analysis is essential to AI. In fact, no modern AI is possible
without visualizing, cleaning, transforming, and modeling data
before advanced algorithms enter the process and turn it into
information of even higher value than before.

Reconsidering the value of data

With the explosion of data availability on digital devices (as
discussed in Chapter 2), data assumes new nuances of value and
usefulness beyond its initial scope of instructing (teaching) and
transmitting knowledge (transferring data).

The abundance of data, when provided to data analysis, acquires new
functions that distinguish it from the informative ones:

» Data describes the world better by presenting a wide variety of facts, and
in more detail by providing nuances for each fact. It has become so
abundant that it covers every aspect of reality.

» Data shows how facts associate with events. You can derive general rules
and learn how the world will change or transform, given certain premises.

Defining Machine Learning

The pinnacle of data analysis is machine learning. You can successfully
apply machine learning only after data analysis provides correct input.
However, only machine learning can associate a series of outputs and
inputs, as well as determine the working rules behind the output in an
effective way.

Understanding how machine learning works

Many people are used to the idea that applications start with a function,
accept data as input, and then provide a result. For example, a programmer
might create a function called Add() that accepts two values as input, such
as 1 and 2. The result of Add() is 3. The output of this process is a value. In
the past, writing a program meant understanding the function used to
manipulate data to create a given result with certain inputs.
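
To make the contrast concrete, the sketch below first writes Add() by hand and then tries to recover the same rule from example inputs and outputs alone. The example pairs, the linear form w1*a + w2*b, and the learn_weights helper are assumptions made for illustration, not a method prescribed by the slides.

```python
def add(a, b):
    """Traditional programming: the programmer writes the function."""
    return a + b

# Machine learning direction: only inputs and their outputs are known.
# These example pairs are invented for the illustration.
examples = [((1, 2), 3), ((2, 5), 7), ((4, 1), 5), ((3, 3), 6), ((0, 6), 6)]

def learn_weights(examples, rate=0.01, epochs=2000):
    """Fit output = w1*a + w2*b by repeatedly nudging the weights
    in the direction that reduces the prediction error."""
    w1 = w2 = 0.0
    for _ in range(epochs):
        for (a, b), target in examples:
            error = (w1 * a + w2 * b) - target
            w1 -= rate * error * a
            w2 -= rate * error * b
    return w1, w2

w1, w2 = learn_weights(examples)
print(add(1, 2))                    # the hand-written rule
print(round(w1, 3), round(w2, 3))   # both weights approach 1, that is, a + b
```

The learned weights describe the same rule as add(), but nobody typed that rule in: it was inferred from the examples.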

Understanding the benefits of machine learning

You find AI and machine learning used in a great many
applications today. The only problem is that the technology works
so well that you don’t know that it even exists. In fact, you might
be surprised to find that many devices in your home already
make use of both technologies.

Here are just a few of the ways in which you might see AI used:

» Fraud detection: You get a call from your credit card company asking
whether you made a particular purchase.

» Resource scheduling: Many organizations need to schedule the use of
resources efficiently.

» Complex analysis: Humans often need help with complex analysis
because there are literally too many factors to consider.

» Automation: Any form of automation can benefit from the addition of AI to handle
unexpected changes or events.

» Customer service: The customer service line you call today may not even have a
human behind it.

» Safety systems: Many of the safety systems found in machines of various sorts today
rely on AI to take over the vehicle in a time of crisis.

» Machine efficiency: AI can help control a machine in such a manner as to obtain
maximum efficiency.

Here are a few uses for machine learning that you might not associate
with an AI:

» Access control: In many cases, access control is a yes or no proposition.

» Animal protection: The ocean might seem large enough to allow animals
and ships to cohabitate without problem.

» Predicting wait times: Most people don’t like waiting when they have no
idea how long the wait will be.

Being useful; being mundane

Even though the movies suggest that AI is sure to make a huge
splash, and you do occasionally see incredible uses for AI in real
life, most uses for AI are mundane and even boring.

Specifying the limits of machine learning

Machine learning relies on algorithms to analyze huge datasets.
Currently, machine learning can’t provide the sort of AI that the
movies present. Even the best algorithms can’t think, feel, display
any form of self-awareness, or exercise free will. What machine
learning can do is perform predictive analytics far faster than any
human can.

A true AI will eventually occur when computers can finally emulate the
clever combination used by nature:

» Genetics: Slow learning from one generation to the next

» Teaching: Fast learning from organized sources

» Exploration: Spontaneous learning through media and interactions with
others

You need to consider three important limits:

» Representation: Representing some problems using mathematical functions isn’t
easy, especially with complex problems like mimicking a human brain.

» Overfitting: Machine learning algorithms can seem to learn what you care about, but
they actually don’t.

» Lack of effective generalization because of limited data: The algorithm learns
what you teach it. If you provide the algorithm with bad or weird data, it behaves in an
unexpected way.
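
The overfitting limit in the list above can be illustrated with a small, deliberately artificial experiment. Everything in it is an assumption made for the sketch: the true rule (label 1 when x is at least 5), the training set with one wrongly labeled point, and the two toy models. The first model memorizes every training example, including the bad one. The second learns a single threshold.

```python
def true_label(x):
    """Assumed ground truth for this toy problem."""
    return 1 if x >= 5 else 0

# Training data; the pair (2, 1) is a deliberately wrong (noisy) label.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
# Unseen data, labeled with the true rule.
test = [(x, true_label(x)) for x in (1.8, 2.3, 3.6, 5.6, 8.4)]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def memorizer(x):
    """Copies the label of the closest training point (memorizes noise too)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_threshold(data):
    """Picks the single cut point that best separates the training labels."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), data))

threshold = fit_threshold(train)

def simple_model(x):
    return int(x >= threshold)

print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold:", accuracy(simple_model, train), accuracy(simple_model, test))
```

The memorizer is perfect on the data it has seen and worse on new data, while the simpler model misses the noisy point but generalizes. That gap between training and test performance is the usual symptom of overfitting.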

Considering How to Learn from Data

Everything in machine learning revolves around algorithms. An
algorithm is a procedure or formula used to solve a problem. The
problem domain affects the kind of algorithm needed, but the
basic premise is always the same: to solve some sort of problem,
such as driving a car or playing Dominoes.

You can divide machine learning algorithms into three main
groups, based on their purpose:

» Supervised learning

» Unsupervised learning

» Reinforcement learning

Supervised learning

Supervised learning occurs when an algorithm learns from
example data and associated target responses that can consist of
numeric values or string labels, such as classes or tags, in order to
later predict the correct response when given new examples. The
supervised approach is similar to human learning under the
supervision of a teacher.
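
As a minimal sketch of this idea, the code below learns from labeled examples and then predicts a string label for new examples. The measurements, the labels, and the nearest-centroid method are choices made for illustration. Many other supervised algorithms exist.

```python
from collections import defaultdict

# Hypothetical labeled examples: (height_cm, weight_kg) with a string label.
training = [
    ((20, 4.0), "cat"), ((25, 5.0), "cat"), ((23, 4.5), "cat"),
    ((60, 30.0), "dog"), ((55, 25.0), "dog"), ((65, 32.0), "dog"),
]

def fit(examples):
    """Learn one average point (centroid) per label from the examples."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in groups.items()
    }

def predict(centroids, features):
    """Predict the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit(training)
print(predict(centroids, (22, 5.0)))
print(predict(centroids, (58, 28.0)))
```

The labels play the role of the teacher: the algorithm is told the right answer for every training example and is judged on examples it has not seen.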

Unsupervised learning

Unsupervised learning occurs when an algorithm learns from plain
examples without any associated response, leaving the algorithm
to determine the data patterns on its own. This type of algorithm
tends to restructure the data into something else, such as new
features that may represent a class or a new series of
uncorrelated values.
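
One common way to find patterns without labels is clustering. The sketch below is a simplified one-dimensional k-means, assuming k is at least 2 and using invented values. It illustrates the idea and is not a production implementation.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group plain numbers into k clusters without using any labels."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled measurements.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(values)
print(centers)
print(clusters)
```

The cluster index that comes out can serve as a new feature that may represent a class, even though no class was ever supplied.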

Reinforcement learning

Reinforcement learning occurs when you present the algorithm
with examples that lack labels, as in unsupervised learning.
However, you can accompany an example with positive or
negative feedback according to the solution the algorithm
proposes.
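
A very small sketch of learning from feedback follows. The three actions, their rewards, and the epsilon-greedy strategy are assumptions chosen for illustration. The agent is never told which action is correct and only receives positive or negative feedback after it acts.

```python
import random

# Hypothetical environment: feedback returned for each possible action.
REWARDS = {"left": -1.0, "stay": -1.0, "right": 1.0}

def train(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}   # estimated worth of each action
    counts = {action: 0 for action in REWARDS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(REWARDS))      # explore: try something at random
        else:
            action = max(values, key=values.get)    # exploit: use the best guess so far
        reward = REWARDS[action]                    # positive or negative feedback
        counts[action] += 1
        # Running average of the feedback received for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values

values = train()
best = max(values, key=values.get)
print(values, best)
```

Because negative feedback lowers an action's estimated value, the agent drifts toward the action that earns positive feedback.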

Taking Many Different Roads to Learning

Just as human beings have different ways to learn from the world,
so the scientists who approached the problem of AI learning took
different routes. Each one believed in a particular recipe to mimic
intelligence. Up to now, no single model has proven superior to
any other. The no free lunch theorem, the idea that you have to pay
for each benefit, is in full effect.

Discovering five main approaches to AI learning

An algorithm is a kind of container. It provides a box for storing a
method to solve a particular kind of a problem. Algorithms process
data through a series of well-defined states. The states need not be
deterministic, but the states are defined nonetheless. The goal is to
create an output that solves a problem.

Symbolic reasoning

One of the earliest tribes, the symbolists, believed that knowledge could be
obtained by operating on symbols (signs that stand for a certain meaning or
event) and deriving rules from them. By putting together complex systems
of rules, you could attain a logical deduction of the result you wanted to
know. The symbolists therefore shaped their algorithms to produce rules
from data. In symbolic reasoning, deduction expands the realm of human
knowledge, while induction raises the level of human knowledge.
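
The deduction half of this approach can be sketched as forward chaining: starting from known facts, rules are applied until nothing new can be concluded. The rules and facts below are invented and written by hand. Producing such rules from data (induction) is the harder part and is not shown.

```python
# Hypothetical rules: (set of premises, conclusion).
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "can_migrate"),
    ({"has_fur"}, "is_mammal"),
]

def forward_chain(facts, rules):
    """Repeatedly apply rules whose premises are all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

derived = forward_chain({"has_feathers", "can_fly"}, RULES)
print(sorted(derived))
```

The conclusion "can_migrate" is reached only because "is_bird" was deduced first, which shows how a system of rules chains small deductions into a result.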

Connections modelled on the brain’s neurons

The connectionists are perhaps the most famous of the five tribes. This
tribe strives to reproduce the brain’s functions by using silicon instead
of neurons. Essentially, each of the neurons (created as an algorithm
that models the real-world counterpart) solves a small piece of the
problem, and using many neurons in parallel solves the problem as a
whole.
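
As a rough sketch of neurons each solving a small piece of a problem, the tiny network below computes exclusive-or (XOR), which no single neuron of this kind can compute alone. The weights and biases are hand-picked for the illustration. A real connectionist system would learn them from data.

```python
def neuron(inputs, weights, bias):
    """A simple artificial neuron: weighted sum followed by a step function."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def xor_network(a, b):
    # Each hidden neuron solves a small piece of the problem.
    h_or = neuron((a, b), (1, 1), -0.5)       # fires if at least one input is 1
    h_nand = neuron((a, b), (-1, -1), 1.5)    # fires unless both inputs are 1
    # The output neuron combines the pieces: fires only if both hidden neurons fire.
    return neuron((h_or, h_nand), (1, 1), -1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```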

Evolutionary algorithms that test variation

The evolutionaries rely on the principles of evolution to solve problems. In
other words, this strategy is based on the survival of the fittest (removing
any solutions that don’t match the desired output). A fitness function
determines the viability of each function in solving a problem. Using a tree
structure, the solution method looks for the best solution based on function
output.
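
The survival-of-the-fittest loop can be sketched as follows. To stay short, this example evolves fixed-length bit strings rather than the tree structures mentioned above. The target, population size, mutation rate, and seed are arbitrary assumptions.

```python
import random

TARGET = [1] * 12   # hypothetical desired output

def fitness(candidate):
    """Counts how many positions match the desired output."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def evolve(generations=40, size=20, mutation=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(size)]
    history = []
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: size // 2]
        # Refill the population with mutated offspring of the survivors.
        children = []
        while len(survivors) + len(children) < size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(TARGET))
            child = mom[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation else g for g in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
print(history[0], fitness(best))
```

Because the best candidates survive each generation unchanged, the best fitness recorded in history can never decrease from one generation to the next.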

Bayesian inference

A group of scientists, called Bayesians, perceived that uncertainty
was the key aspect to keep an eye on and that learning wasn’t
assured but rather took place as a continuous updating of previous
beliefs that grew more and more accurate.
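
The continuous updating of beliefs can be sketched with two competing hypotheses about a coin. The hypotheses, their assumed chances of heads, and the sequence of flips are all invented for the illustration.

```python
def update(beliefs, likelihoods):
    """Reweight each belief by how well it explains the new observation."""
    unnormalized = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Hypothetical hypotheses and the chance of heads under each one.
P_HEADS = {"fair": 0.5, "biased": 0.8}
beliefs = {"fair": 0.5, "biased": 0.5}   # prior belief: no preference

for flip in "HHTHHHHH":                  # invented observations
    likelihoods = {
        h: P_HEADS[h] if flip == "H" else 1 - P_HEADS[h]
        for h in beliefs
    }
    beliefs = update(beliefs, likelihoods)
    print(flip, {h: round(p, 3) for h, p in beliefs.items()})
```

No single flip settles the question, but each one shifts the beliefs a little, and after enough evidence the belief in the biased coin dominates.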

Systems that learn by analogy

The analogyzers use kernel machines to recognize patterns in data. By
recognizing the pattern of one set of inputs and comparing it to the pattern
of a known output, you can create a problem solution. The goal is to use
similarity to determine the best solution to a problem. It’s the kind of
reasoning that determines that using a particular solution worked in a
given circumstance at some previous time; therefore, using that solution for
a similar set of circumstances should also work.
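
Similarity-based reasoning can be sketched with a kernel function that scores how alike two situations are. The known cases, the "plan" labels, and the choice of a Gaussian (RBF) kernel with gamma set to 0.5 are assumptions for this example. Real kernel machines such as support vector machines are considerably more involved.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Similarity between two points: 1.0 when identical, near 0 when far apart."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical past circumstances and the solution that worked in each.
known = [
    ((1.0, 1.0), "plan A"), ((1.2, 0.9), "plan A"),
    ((4.0, 4.2), "plan B"), ((3.8, 4.0), "plan B"),
]

def predict(known, query):
    """Let each known case vote for its solution, weighted by similarity."""
    scores = {}
    for features, solution in known:
        scores[solution] = scores.get(solution, 0.0) + rbf_kernel(features, query)
    return max(scores, key=scores.get)

print(predict(known, (1.1, 1.2)))
print(predict(known, (3.9, 4.1)))
```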

Delving into the three most promising AI learning approaches

Later sections in this chapter explore the nuts and bolts of the core
algorithms chosen by the Bayesians, symbolists, and connectionists. These
tribes represent the present and future frontier of learning from data
because any progress toward a human-like AI derives from them, at least
until a new breakthrough with new and more incredible and powerful
learning algorithms occurs.

Awaiting the next breakthrough

In the 1980s, as expert systems ruled the AI scenery, most
scientists and practitioners deemed machine learning to be a
minor branch of AI that was focused on learning how to best
answer simple predictions from the environment (represented by
data) using optimization.

Exploring the Truth in Probabilities

Some websites would have you believe that statistics and machine
learning are two completely different technologies. In fact, statistics
often uses probabilities, which are a way to express uncertainty
regarding world events, and so do machine learning and AI (to
a larger extent than pure statistics).

Determining what probabilities can do

Probability tells you the likelihood of an event, and you express it
as a number. For instance, if you throw a coin in the air, you don’t
know whether it will land as heads or tails, but you can tell the
probability of both outcomes. The probability of an event is
measured in the range from 0 (no probability that an event
occurs) to 1 (certainty that an event occurs).
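
For equally likely outcomes, such as the two sides of a fair coin, the probability of an event is the number of favorable outcomes divided by the number of possible outcomes. A minimal sketch, in which the probability helper is a name made up for the example:

```python
from fractions import Fraction

def probability(event, outcomes):
    """Share of equally likely outcomes for which the event is true."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

coin = ["heads", "tails"]

p_heads = probability(lambda o: o == "heads", coin)
p_edge = probability(lambda o: o == "edge", coin)               # cannot happen
p_any = probability(lambda o: o in ("heads", "tails"), coin)    # always happens

print(p_heads, p_edge, p_any)
```

The three results cover the whole scale: an impossible event sits at 0, a certain one at 1, and heads lands halfway between.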

Considering prior knowledge

Probability makes sense in terms of time and space, but some other
conditions also influence the probability you measure. The context is
important. When you estimate the probability of an event, you may
(sometimes wrongly) tend to believe that you can apply the probability that
you calculated to each possible situation. The term to express this belief is a
priori probability, meaning the general probability of an event.

Conditional probability and Naïve Bayes

You can view cases such as the gender-related ones mentioned in
the previous section as conditional probability, and express it as
p(y|x), which you read as the probability of event y happening
given that x has happened. Conditional probabilities are a very
powerful tool for machine learning and AI.
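
You can estimate p(y|x) from data by looking only at the records where x happened and counting how often y also happened. The email records below are invented for this sketch. They are not the example from the previous section.

```python
# Hypothetical records: does the email contain the word "offer", and is it spam?
records = (
    [{"offer": True, "spam": True}] * 3
    + [{"offer": True, "spam": False}] * 1
    + [{"offer": False, "spam": True}] * 1
    + [{"offer": False, "spam": False}] * 5
)

def probability(records, event):
    """p(event): share of records in which the event is true."""
    return sum(1 for r in records if event(r)) / len(records)

def conditional(records, y, x):
    """p(y|x): probability of y among only the records where x happened."""
    given = [r for r in records if x(r)]
    return probability(given, y)

p_spam = probability(records, lambda r: r["spam"])
p_spam_given_offer = conditional(records, lambda r: r["spam"], lambda r: r["offer"])

print(p_spam, p_spam_given_offer)
```

Knowing that x happened changes the estimate: in this made-up data, spam is more likely among emails that contain the word than among emails in general.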

Considering Bayes’ theorem

Apart from being a Presbyterian minister, the Reverend Thomas
Bayes was also a statistician and philosopher who formulated his
theorem during the first half of the eighteenth century. The
theorem was never published while he was alive.

Envisioning the world as a graph

Bayes’ theorem can help you deduce how likely something is to
happen in a certain context, based on the general probabilities of
the fact itself and the evidence you examine, and combined with
the probability of the evidence given the fact. Seldom will a single
piece of evidence diminish doubts and provide enough certainty in
a prediction to ensure that it will happen.
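
In symbols, Bayes' theorem reads P(fact | evidence) = P(evidence | fact) × P(fact) / P(evidence). The sketch below applies it with invented numbers and then applies it again for a second piece of evidence, assuming the two pieces are independent of each other (the simplifying assumption that Naïve Bayes also makes).

```python
def posterior(prior, p_evidence_given_fact, p_evidence_given_not_fact):
    """Bayes' theorem: P(fact | evidence)."""
    # General probability of the evidence, whether or not the fact is true.
    p_evidence = (
        p_evidence_given_fact * prior
        + p_evidence_given_not_fact * (1 - prior)
    )
    return p_evidence_given_fact * prior / p_evidence

# Hypothetical numbers: the fact is rare (1%); the evidence shows up 90% of
# the time when the fact is true and 5% of the time when it is not.
after_one = posterior(0.01, 0.9, 0.05)

# A second, independent piece of evidence: yesterday's posterior is today's prior.
after_two = posterior(after_one, 0.9, 0.05)

print(round(after_one, 3), round(after_two, 3))
```

With these numbers, one piece of evidence still leaves the fact unlikely, and a second one raises the probability considerably. This is why several pieces of evidence are usually combined.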

Growing Trees that Can Classify

A decision tree is another type of key algorithm in machine
learning that influences AI implementation and learning. Decision
tree algorithms aren’t new; they have a long history. The first
algorithm of their kind dates back to the 1970s (with many
ensuing variants).

Predicting outcomes by splitting data

If you have a group of measures and want to describe them using a single
number, you use an arithmetic mean (summing all the measures and
dividing by the number of measures). In a similar fashion, if you have a group
of classes or qualities (for instance, you have a dataset containing records
of many breeds of dogs or types of products), you can use the most frequent
class in the group to represent them all, which is called the mode.
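
Both summaries are available in Python's standard statistics module. The measures and the list of breeds below are made up for the example.

```python
import statistics

# Hypothetical data.
measures = [4, 8, 6, 10, 12]
breeds = ["beagle", "poodle", "beagle", "collie", "beagle", "poodle"]

mean_value = statistics.mean(measures)    # sum of the measures / number of measures
most_common = statistics.mode(breeds)     # most frequent class in the group

print(mean_value, most_common)
```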

Making decisions based on trees

As an example of decision tree use, this section uses the same Ross Quinlan
dataset discussed in the “Envisioning the world as a graph” section, earlier in the
chapter. Using this dataset lets us present and describe the ID3 algorithm, a special
kind of decision tree found in the paper “Induction of Decision Trees,” mentioned
previously in this chapter. The dataset is quite simple, consisting of only 14
observations relative to the weather conditions, with results that say whether
playing tennis is appropriate.
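
ID3 chooses each split by measuring how much a feature reduces uncertainty (entropy) about the outcome, a quantity called information gain. The sketch below computes only that first choice. The 14 rows are the commonly reproduced version of the play tennis data and the column names are this example's own, so treat both as assumptions rather than as the exact table used in the chapter.

```python
import math
from collections import Counter

COLUMNS = ("outlook", "temperature", "humidity", "wind")
# (outlook, temperature, humidity, wind, play) as commonly reproduced.
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def entropy(rows):
    """Uncertainty about the outcome (last column), in bits."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def information_gain(rows, column):
    """How much splitting on a column reduces the entropy."""
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

gains = {name: information_gain(DATA, i) for i, name in enumerate(COLUMNS)}
root = max(gains, key=gains.get)
print({name: round(g, 3) for name, g in gains.items()}, root)
```

A full ID3 implementation would repeat the same calculation inside each branch until every branch holds only one outcome.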

Pruning overgrown trees

Even though the play tennis dataset in the previous section illustrates
the nuts and bolts of a decision tree, it has little probabilistic appeal
because it proposes a set of deterministic actions (it has no conflicting
instructions). Training with real data usually doesn’t feature such sharp
rules, thereby providing room for ambiguity and the likelihood of the
hoped-for outcome.
