MODIBBO ADAMA UNIVERSITY OF TECHNOLOGY YOLA

MASTER OF COMPUTER SCIENCE

SEMESTER:

COURSE NAME:

MODULE CODE:

ASSIGNMENT TYPE: INDIVIDUAL

SUBMISSION DATE: 17-05-2019


Table of Contents

Introduction
Themes of Artificial Intelligence
Statistical Analysis Package
Examples of Statistical Analysis Packages
Justification on Why a Statistical Analysis Package Is Not an Artificial Intelligence Program
Artificial Intelligence Research in Statistical Analysis Software
The RX Project
REX
Other Projects
RSS Meeting on Expert Systems in Statistics
Challenges for AI in Statistical Analysis Packages
Future Directions
AI Programs, Machine Learning and Statistical Analysis Techniques
Relationship between Applied Statistics and Machine Learning
AI Programs vs. Statistics
Key Differences between Machine Learning and Statistics
Comparison Table between Machine Learning and Statistics
Both Machine Learning and Statistics Have the Same Objective
They're Related, Sure. But Their Parents Are Different.
Data Science, Big Data and Data Analytics in Relation to Statistical Analysis Packages and AI Programs
References


ABSTRACT

Artificial intelligence is a theory whose base object is the agent, the "actor"; it is realized in software, and machine learning is a subset of AI. Many machine learning tools build on statistical methods that are familiar to most researchers. These include extending linear regression models to deal with potentially millions of inputs, or using statistical techniques to summarize a large dataset for easy visualization.

A statistical analysis package is therefore not an AI program, even though statistics as a domain may be of particular interest to AI researchers, for it offers both tasks well suited to current AI capabilities and tasks requiring the development of new AI techniques. The existence of extensive statistical packages suggests that AI contributions to data analysis will be intelligent interfaces, an area of little development. These interfaces could provide guidance, interpretation, and instruction, which are needed by novice users and are not available in current packages.

In addition, like AI, statistics is a study of tools for use in other domains. This attribute implies that knowledge from three domains impinges on intelligent statistical analysis systems: AI, statistics, and the "ground" domain. While current systems suggest it is possible to rely on a user to supply ground-domain knowledge, more intelligent treatment of the ground domain will require development of knowledge representation and acquisition techniques. This is a key challenge to AI techniques offered by the statistics domain.
Introduction

Researchers in computer science and statistics have developed advanced techniques to obtain insights from large, disparate data sets. Data may be of different types, from different sources, and of different quality (structured and unstructured data). These techniques can leverage
the ability of computers to perform tasks, such as recognizing images and processing natural
languages, by learning from experience. The application of computational tools to address tasks
traditionally requiring human sophistication is broadly termed ‘artificial intelligence’ (AI). As a
field, AI has existed for many years. However, recent increases in computing power coupled with
increases in the availability and quantity of data have resulted in a resurgence of interest in
potential applications of artificial intelligence (EBA, 2016). These applications are already being
used to diagnose diseases, translate languages, and drive cars; and they are increasingly being used
in the financial sector as well.

There are many terms used to describe this field, so some definitions are needed before proceeding. 'Big data' is a term for which there is no single, consistent definition, but it is used broadly to describe the storage and analysis of large and/or complicated data sets using a variety of techniques, including AI (Ward & Barker, 2013). Analysis of such large and complicated datasets is often called 'big data analytics.' A key feature of the complexity relevant to big data analytics is the amount of unstructured or semi-structured data contained in the datasets.

Many machine learning tools build on statistical methods that are familiar to most researchers. These include extending linear regression models to deal with potentially millions of inputs, or using statistical techniques to summarize a large dataset for easy visualization. Yet machine learning frameworks are inherently more flexible: patterns detected by machine learning algorithms are not constrained to the linear relationships that tend to dominate economic and financial analysis. In general, machine learning deals with (automated) optimization, prediction, and categorization, not with causal inference. In other words, classifying whether the debt of a company will be investment grade or high yield one year from now could be done with machine learning; determining what factors have driven the level of bond yields would likely not be.

In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Computer science defines AI research as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals (Poole et al., 1998). Colloquially, the term "artificial intelligence" is used to describe machines that mimic "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" (Russell & Norvig, 2009).
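To make the prediction half of that contrast concrete, here is a minimal sketch of such a classifier. Everything in it is invented for illustration: the three features (leverage, interest coverage, revenue growth), the synthetic data, the labeling rule, and the assumption that scikit-learn is available.

# Hypothetical sketch: predict whether a company's debt will be
# investment grade (1) or high yield (0) one year from now.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # invented features: leverage, coverage, growth
# Invented labeling rule plus noise, so there is a pattern to learn.
y = (X[:, 1] - X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)          # "learn" the pattern
new_issuer = np.array([[0.2, 1.1, -0.3]])       # one new company's features
print(model.predict(new_issuer))                # predicted rating bucket
print(model.predict_proba(new_issuer))          # class probabilities

Note that the fitted model predicts the label but says nothing about which factors causally drive bond yields, which is exactly the distinction drawn above.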
As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says "AI is whatever hasn't been done yet" (Maloof, 2017). For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology (Schank, 1991). Modern machine capabilities generally classified as AI include successfully understanding human speech (Russell & Norvig, 2009), competing at the highest level in strategic game systems (such as chess and Go), autonomously operating cars, intelligent routing in content delivery networks, and military simulations.

Figure 1: Optical character recognition. Source: https://biovolttechblog.com/2017/06/07/ocr-optical-character-recognition/

Themes of Artificial Intelligence

The main advances over the past sixty years have been advances in search algorithms, machine learning algorithms, and the integration of statistical analysis into understanding the world at large. However, most of the breakthroughs in AI are not noticeable to most people. Rather than talking machines piloting spaceships to Jupiter, AI is used in more subtle ways, such as examining purchase histories and influencing marketing decisions.

According to Musa (2017), artificial intelligence (AI) is the science of using machines or software programs to make intelligent decisions, in order to make human lives more comfortable and safer. The word "artificial" signifies a code or a programmed machine, and the word "intelligence" refers to the ability to make decisions.

The AI field encompasses machine learning, deep learning, Robotic Process Automation (RPA), the Internet of Things (IoT), big data analytics, text and data analytics,
robotics, cognitive knowledge, and many other tools and solutions. An AI example would be a
robot/machine detecting a chemical spill, recognizing the type of chemical, realizing the
hazardous nature of the substance, and acting by either contacting the authorized official to
mitigate the risk or cleaning the spill and the hazardous material, as programmed. The public
opinion about AI is that robots will take our jobs and they will surpass our capabilities. Indeed,
robots are going to play essential roles in our lives, and they will become widely spread in our
lives; however, they are not going to take over our jobs or make us obsolete. AI and robots are
tools and solutions to automate processes, simplify tasks, and increase efficiency.

During the 1980s, people were afraid of computers, and the widely shared perception was
that computers would take over our jobs. About 40 years later, computers are the essential
elements/tools in organizations - without computers, organizations cannot compete. Computers
are creating jobs, and now the IT industry represents about 18% of the job industry. Similarly,
robots and AI will create jobs. I argue that for every job a robot takes away, four additional jobs
get created – an engineer to manufacture the robots’ parts, a computer scientist to program them,
a technician to assemble them, and an engineer to maintain them.

AI and robots are coming and will play a significant role in our lives. Robots are not
going to take over our jobs or surpass our capabilities. Robots and AI techniques are merely
used to ensure our safety, where vehicle crashes or diseases can be predicted with high accuracy.
Robots will be used to assist us in getting our homes cleaned, buying groceries, mowing grass,
washing cars, helping with medication, and even as our bodyguards. AI will undoubtedly ignite
a smart government revolution – a government that harnesses data science to make intelligent
decisions. A critical goal of using AI solutions is to help organizations to learn more with fewer
data and deliver better results in less time. AI is merely a critical element of a smart community,
and its goal is to simplify tasks, increase efficiency, and save lives.
Statistical Analysis Package

Statistical analysis packages, also called statistical analysis software, comprise tools designed to facilitate analysis of data, including qualitative data such as texts, graphics, audio or video. Such packages (sometimes referred to as CAQDAS: Computer Assisted/Aided Qualitative Data Analysis) may also enable the incorporation of quantitative (numeric) data and/or include tools for taking quantitative approaches to qualitative data.
Statistical software comprises programs used for the collection, organization, analysis, interpretation and presentation of data. In a world full of uncertainty, business statistics play a significant role and help managers make informed decisions in disciplines such as quality assurance, production and operations, financial analysis, auditing and econometrics, among others. Business managers need to collect, analyze and make inferences from vast amounts of data. Business statistics help them discover patterns and trends among customers, and other useful information that supports decision making. Business statistics also help managers measure the performance of workers as well as improve the products and services produced. Statistics also helps managers forecast and make correct predictions about what could happen to the industry in the future. Business statistics likewise plays a massive role in measuring the financial position of the company.

Examples of Statistical Analysis Packages

The section below describes the most widely used statistical tools:

i. SPSS
A general-purpose statistical package widely used in academic research for editing, analyzing
and presenting numerical data. It is compatible with all file formats that are commonly used for
structured data such as Excel, plain text files and relational (SQL) databases. The package is
available to use in the Social Science Library Data area, or from the IT Services Shop.

ii. STATA
A powerful and flexible general-purpose statistical software package used in research, among
others in the fields of economics, sociology and political science. Its capabilities include data
management, statistical analysis, graphics, simulations, regression, and custom programming.
STATA is available to eligible students and staff in departments and centres in the Manor Road
Building (MRB); to be eligible you must be nominated by your department/centre. Students can
also purchase STATA at a reduced cost for their own devices from the supplier Timberlake.

iii. R
A free software environment for statistical computing and graphics. It compiles and runs on a
wide variety of UNIX platforms, Windows and MacOS. R provides a wide variety of statistical
(linear and nonlinear modelling, classical statistical tests, time-series analysis, classification,
clustering, etc.) and graphical techniques, and is highly extensible. R is freely available online.

iv. NVivo
A qualitative data analysis (QDA) computer software package produced by QSR International.
It has been designed for qualitative researchers working with very rich text-based and/or
multimedia information, where deep levels of analysis on small or large volumes of data are
required. NVivo is installed on PCs in the SSL Data Area; also available from IT services shop.

v. MAXQDA
An alternative to NVivo that handles a similar range of data types, allowing organization, color
coding and retrieval of data. Text, audio or video may equally be dealt with by this software
package. A range of data visualization tools are also included. Trial licenses available
from MAXQDA

vi. Atlas.ti
Software for the qualitative analysis of large bodies of textual, graphical, audio and video data.
It offers a variety of tools for accomplishing the tasks associated with any systematic approach to
"soft" data, i.e. material which cannot be analyzed by formal, statistical approaches in meaningful
ways.
Justification on Why a Statistical Analysis Package Is Not an Artificial Intelligence Program

Statistical analysis packages like SPSS, MATLAB, SAS, etc. are not AI programs because statistics is a traditional field, broadly defined as a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. According to Apps (2017), AI refers to the simulation of human brain function by machines, achieved by creating artificial neural networks that can exhibit human-like intelligence. The primary human functions that an AI machine performs include logical reasoning, learning and self-correction. AI is a wide field with many applications, but it is also one of the most complicated technologies to work on: machines are not inherently smart, and making them so requires a lot of computing power and data to empower them to simulate human thinking.

This need gave rise to machine learning, a field of AI that uses statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) from data without being explicitly programmed. Because machine learning uses statistical techniques, it can easily be construed as rebranded statistics, but the way statistics is used by statisticians differs from the way it is used by the machine learning community.
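A minimal sketch can make that distinction concrete: below, the same relationship is captured once by an explicitly programmed rule and once by a coefficient estimated from data with a statistical technique. The numbers are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.5 * x + rng.normal(scale=1.0, size=200)   # noisy observations

# Explicit programming: a human hard-codes the coefficient 2.5.
def programmed(x_new):
    return 2.5 * x_new

# "Learning": least squares estimates the coefficient from the data,
# without it ever being written into the program.
slope, intercept = np.polyfit(x, y, deg=1)
def learned(x_new):
    return slope * x_new + intercept

print(programmed(4.0), learned(4.0))   # close; only the second was learned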

Artificial Intelligence Research in Statistical Analysis Software

According to Gale and Pregibon (1984), the initial results from a few AI research projects in statistics have been quite interesting to statisticians: feasibility demonstration systems have been built at Stanford University, AT&T Bell Laboratories, and the University of Edinburgh; several more design studies have been completed; and a conference devoted to expert systems in statistics was sponsored by the Royal Statistical Society. On the other hand, statistics as a domain may be of particular interest to AI researchers, for it offers both tasks well suited to current AI capabilities and tasks requiring the development of new AI techniques.

Statisticians do a variety of tasks, some of which can now be assisted by expert system techniques. One common task is data analysis: the application of statistical tools to a particular set of data to reach some conclusion. Another is experiment design: planning the collection of data so that it can be easily analyzed to reach a conclusion. These tasks are frequently done in a consulting environment. Expert system techniques have recently been applied to both.

The statistics discipline has been developing for the past 100 years, most dramatically since the widespread availability of modern computers. Some of the knowledge developed by research statisticians is regularly taught, for example the lore of the normal distribution, Student's t-test, analysis of variance methods, and linear regression analysis. But some statistical knowledge is not yet so formalized, specifically how these and other methods are chosen and applied to analyze data in practice (a process we call strategy).

The formalized knowledge indicates depth in the domain, while the informal knowledge
indicates an opportunity for AI techniques. Specifically, it appears that expert system techniques
can provide a framework for formalizing strategies of data analysis, thereby opening the subject
to research by statisticians. This prospect has excited many statisticians.

The existence of extensive statistical packages suggests that AI contributions to data analysis will be intelligent interfaces, an area of little development. These interfaces could provide guidance, interpretation, and instruction, which are needed by novice users and are not available in current packages.

Like AI, statistics is a study of tools for use in other domains. This attribute implies that knowledge from three domains impinges on intelligent statistical analysis systems: AI, statistics, and the "ground" domain. While current systems suggest it is possible to rely on a user to supply ground-domain knowledge, more intelligent treatment of the ground domain will require development of knowledge representation and acquisition techniques. This is a key challenge to AI techniques offered by the statistics domain.

The next three sections review specific projects applying AI methods in statistics. A section
reviews the talks at the RSS meeting, and the last two sections focus on the challenges to AI
methods posed by this domain.

The RX Project.
Robert L. Blum (1982, 1983) has been leading the RX project at the Stanford Heuristic Programming Project. RX aims to design and perform statistical analyses in medicine to establish causal relationships. The project requires:
i. interfacing AI software to a large database as well as to a statistics package
ii. representing knowledge in medicine and statistics
iii. developing techniques for:
- automatic hypothesis generation
- study design
- data analysis

A feasibility demonstration system has been constructed using a subset of a database collected by the American Rheumatism Association and an interface to IDL (Kaplan, 1978). The study design module is the most elaborated, using stored knowledge of confounding relationships to derive a confirmatory test for a proposed causal hypothesis. The discovery module and data analysis module are relatively weak. A key feature of the system is its ability to incorporate newly confirmed causal relationships into the knowledge base.

REX.
Gale and Pregibon (1982) built a Regression Expert [REX] system at AT&T Bell Laboratories, continuing preliminary work (Chambers, 1981; Chambers, Pregibon, and Zayas, 1981). The goal of REX is to allow novices to perform regression analyses safely by using advanced tools to detect and correct violations of assumptions made by standard techniques. The project requires:
i. interacting with a novice statistician
ii. interfacing to a statistics package
iii. representing knowledge in statistics and developing techniques in:
- statistical strategy
- interpretation of results
- tutoring
A feasibility demonstration system has been constructed using the S system (Becker and Chambers, 1984) as the underlying statistical package. Statistical knowledge is represented in a frame-based system modeled after Centaur (Aikins, 1980). A strategy has been implemented for regression analysis, an area of great interest to statisticians. The strategy implemented focuses on checking the assumptions implicit in an initial model, and on suggesting changes to the data, model, or fitting method as may be appropriate when an assumption fails. The interpretation of results and tutoring are relatively weak.
The areas covered by RX and REX are complementary at this point. REX does no design;
RX runs its tests blindly.
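To make REX's flavor of strategy concrete, here is a loose modern analogue, not REX itself (which was built on the S system in a frame-based architecture): fit a regression, test one assumption, and suggest a remedy when it fails. The data, the test threshold, and the suggested fix are all illustrative, and statsmodels and SciPy are assumed to be available.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=100)
y = np.exp(0.4 * x + rng.normal(scale=0.3, size=100))   # skewed response

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Check the normality-of-residuals assumption (Shapiro-Wilk test).
stat, p = stats.shapiro(fit.resid)
if p < 0.05:
    print(f"Residuals look non-normal (p = {p:.3g}); "
          "consider transforming the response, e.g. log(y).")
    refit = sm.OLS(np.log(y), X).fit()
    print(f"After a log transform, R^2 = {refit.rsquared:.2f}")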
Other Projects.

Richard O’Keefe (1982) built the Automated Statistical Analyst [ASA] system at
Edinburgh University as part of his thesis work. ASA is designed to help a client analyze
an experiment he has already designed but has not yet performed. It assumes a particularly
unhelpful (ignorant) user and attempts to guide him to an appropriate nonparametric analysis.
Neil C. Rowe (1982), Stanford Department of Computer Science, considered how one might abstract a statistical database. His goal was to provide approximate answers to statistical queries of a large database without actually accessing the database. He developed means of forming and updating an abstract of a database, and ways to reason with the abstract to provide answers with a stable accuracy.
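A rough sketch of that idea as described here: maintain a small abstract (count, sum, sum of squares per group) that is updated as records arrive, and answer mean-and-spread queries from the abstract without scanning the base table. The schema and values are invented; Rowe's actual mechanisms were more elaborate.

import math
from collections import defaultdict

abstract = defaultdict(lambda: {"n": 0, "sum": 0.0, "sumsq": 0.0})

def update(group, value):
    """Fold one new record into the abstract; no raw data is stored."""
    a = abstract[group]
    a["n"] += 1
    a["sum"] += value
    a["sumsq"] += value * value

def query_mean_sd(group):
    """Answer a statistical query from the abstract alone."""
    a = abstract[group]
    mean = a["sum"] / a["n"]
    var = a["sumsq"] / a["n"] - mean * mean
    return mean, math.sqrt(max(var, 0.0))

for v in [12.0, 15.5, 11.2, 14.8]:
    update("region_A", v)
print(query_mean_sd("region_A"))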

George Lawton (1982), U.S. Army Research Institute, has designed an intelligent interface to SAS (Helwig, 1979) as an initial step in a projected Computer Assisted Research Design System [CARDS]. CARDS is a long-term project; the initial step is a tool to assist users in quantitative research integration: standardization and comparison of statistical results from different studies.
A system built by Jean Louis Roos (1983), University of Aix-en-Provence, shows heavy
use of ground domain knowledge. The system guides the economist user in the construction of an
econometric model with consistent economic assumptions for all equations. The model is then
estimated by a standard statistical package without feedback to the expert system.

Petr Hajek and Tomas Havranek (1982), Czechoslovak Academy of Sciences, have described requirements for GUHA-80. Their aim is to develop interesting views of empirical data. While they expect to have a statistical package available to the program for hypothesis testing, the emphasis is on hypothesis formation. The intention is to try automatic hypothesis formation ideas, as in Lenat's AM program (Davis and Lenat, 1982).

Gerald J. Hahn (1983), General Electric, prepared a review of statistical consultation systems for the 1983 meeting of the American Statistical Association. The review included an example strategy for product life data, including flow charts.

David J. Hand (1983), London Institute of Psychiatry, has written on the requirements of
an expert system in statistical consultation. He relates observations of actual statistical consultation
processes, discusses differences between medical and statistical requirements, and gives a useful
list of attributes for a statistical consultation system.
RSS Meeting on Expert Systems in Statistics

The most overt sign of interest among statisticians has been a meeting convened by the
Royal Statistical Society on Expert Systems in Statistics, held in London on Saturday, October 22,
1983.
Two of the talks dealt with using standard programming methods to make much more helpful interfaces to existing software packages. A. Baines, University of Leeds, has begun work along these lines, and G. B. Wetherill, Department Chairman, University of Kent, has made progress along them as well. Their work shows the perceived need for assistance in managing today's statistical packages and will provide a comparison point for the value of using AI techniques in rendering that assistance.
The remaining four talks were based on various experiences with AI techniques. Alan Bundy, University of Edinburgh, speaking first and representing AI research more than any other speaker, spent time on survey material, then described his work on Mecho and Richard O'Keefe's work on the Automated Statistical Analyst. Daryl Pregibon spoke on our experience using REX to develop a strategy for regression analysis. David Hand, London Institute of Psychiatry, presented his views on design criteria for expert systems in statistics, based on a carefully reasoned appraisal of differences between statistics and other areas where expert systems have been used. J. Fox, Imperial Cancer Research Fund, addressed inference techniques used in existing expert systems (i.e., statistics in AI). He described knowledge-based techniques used in medicine and also the use of quasi-statistical techniques in decision making. His defense of non-Bayesian techniques was roundly attacked and stoutly maintained.

Challenges for AI in Statistical Analysis Packages

Analysis of data is now done entirely with computer tools, since interactive graphical software is widely available. Many statistical packages exist, including large ones such as SAS and SPSS (Hull, 1981) and smaller ones such as Minitab (Ryan, 1981). Special mention should be made of IDL, since it is written in Interlisp and may therefore be particularly easy for some AI projects to use; IDL is, however, a relatively small package. These packages are widely used and, in the eyes of statisticians, abused. The abuse stems from ignorance of statistics and provides an opportunity for the application of expert system techniques.

For the expert statistician, current packages are very helpful. They can compute a large variety of statistics for a data set, manage the storage of all data sets, provide nice graphical displays easily, and be extended easily as new statistical methodology is developed. They require little programming expertise from the user.

But for the novice, current packages lack important features. They provide only numbers, not interpretations. They provide no guidance on what to do next, or on what should have been done before. Moreover, they provide no instruction. These areas of guidance, interpretation, and instruction provide feasible targets for the application of natural language and instructional techniques as well as expert system techniques.
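As a toy illustration of what such an interface could add, the sketch below wraps a routine any package already provides (a two-sample t-test) with guidance and a plain-language interpretation. The advice text and thresholds are invented, and SciPy is assumed.

from scipy import stats

def interpreted_t_test(sample_a, sample_b, alpha=0.05):
    # Guidance: warn about an assumption before running the test.
    if min(len(sample_a), len(sample_b)) < 20:
        print("Guidance: samples are small; the t-test assumes rough "
              "normality, so consider checking that first.")
    t, p = stats.ttest_ind(sample_a, sample_b)
    # Interpretation: words, not just numbers.
    if p < alpha:
        print(f"Interpretation: the group means differ "
              f"(t = {t:.2f}, p = {p:.3g} < {alpha}).")
    else:
        print(f"Interpretation: no evidence of a difference "
              f"(t = {t:.2f}, p = {p:.3g}).")

interpreted_t_test([5.1, 4.8, 5.6, 5.0, 4.9], [5.9, 6.1, 5.7, 6.3, 6.0])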

The existence of these large and powerful statistical packages provides an opportunity and
a challenge to AI. The addition of an interface to existing software is always a high leverage
opportunity. The AI challenge is to provide access to the software while understanding what is
being done. The problem is that an expert human can easily do things that have a purpose, but
whose purpose is obscure.

Another research direction in statistical packages will test the value of explicit knowledge
representation. This direction uses standard techniques to make the packages easy to use. The
UKCREG program, being developed under the direction of G. B. Wetherill (1982), is a good example.
This system uses menus to show what can be done and to indicate why it might be good to do
something. It responds with an interpretation of test results rather than a number.
However, it does not keep the knowledge of the tests made for its own use and it does not suggest
actions to the user. This program can certainly be called “friendly,” if not “intelligent.”
Since statistics is a study of tools for use in other domains, statistical methods can be used to greater effect when informed by domain knowledge, just as AI has found power by restricting attention to particular domains. Therefore applications of AI in statistics may involve three domains of knowledge: AI, statistics, and the "ground" domain. Approaches to the ground domain have varied. RX has taken medicine as a specific ground domain and has undertaken to represent knowledge from this domain. Since this approach requires expertise in three areas, and since it may limit the applicability of the system, most other projects have left the ground domain unspecified.
If an interactive system is provided, the user can be expected to provide the required ground-domain knowledge. The apparent success of this approach in REX and ASA suggests that the statistics knowledge is fairly well closed, self-consistent, and separable from ground-domain knowledge. This is helpful for applications of current expert system technology.
A limitation of not using ground-domain vocabulary is the necessity of using statistics vocabulary. This necessity in turn requires that the user be willing to learn statistics vocabulary (and concepts) and that the system be prepared to provide such instruction. While this may be reasonable for technical workers, it will exclude most managers. Not using ground-domain knowledge and procedures may be fatal in some cases. One possible means of including ground-domain knowledge and procedures would assume a local statistician who could specialize a statistics-knowledgeable system to local ground-domain knowledge, procedures, and vocabulary. Providing a way to do this is a challenge to knowledge acquisition techniques, including machine learning. No analysis is complete without a written report. The capability of generating a report from the trace of an analysis would be a useful area in which to apply natural language generation techniques.

Once it is possible to consult with a user on several kinds of data analysis, the problem
will arise of deciding which analysis model the user should be using. In standard consultation
practice, this crucial step is usually accomplished by a lot of give and take as the analyst and the
consultant strive to understand each other. The analyst frequently does not understand the
categories available, while the consultant may flounder in the ground-domain vocabulary in which the problem is presented.
Handling this process by computer will require substantial development of interactive
discourse techniques and substantial tolerance for erroneous inputs.

Theoretical analysis of tests seems to be a longer-range goal for AI techniques. Contributions here will require representing substantial amounts of knowledge in mathematical statistics. Some short-range progress might be made in automating studies done by Monte Carlo techniques.

Future Directions

Many sub-domains of data analysis exist, e.g., regression analysis and analysis of variance.
Developing strategies in these areas will be one future activity for statisticians. They can be
expected to require additional AI techniques as well.

Future consultation systems should include both an experiment design phase and a data analysis phase. It is necessary, however, to accommodate experiments designed without the system's aid, and since the user's understanding of the design may be poor, this task will be difficult.
No analysis is complete without a report. The reports now generated are crude and
mechanical. Data structures exist to support far more polished reports if natural language
generation capabilities are used.
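A crude sketch of that capability, generating sentences from the trace of an analysis, might look like the following; the trace format and sentence templates are invented, and a real system would use genuine natural language generation rather than string templates.

trace = [
    ("load", {"rows": 340, "source": "survey.csv"}),
    ("fit", {"model": "linear regression", "r2": 0.72}),
    ("check", {"assumption": "normal residuals", "passed": True}),
]

def report(trace):
    lines = []
    for step, info in trace:
        if step == "load":
            lines.append(f"The analysis began with {info['rows']} records "
                         f"from {info['source']}.")
        elif step == "fit":
            lines.append(f"A {info['model']} model was fitted, explaining "
                         f"{info['r2']:.0%} of the variance.")
        elif step == "check":
            verdict = "was satisfied" if info["passed"] else "failed"
            lines.append(f"The assumption of {info['assumption']} {verdict}.")
    return " ".join(lines)

print(report(trace))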

Research at AT&T Bell Laboratories will include an examination of whether a statistician can develop a strategy without involving a "knowledge engineer." Since data analysis is done entirely on a computer, it may be possible to devise a system that can learn by watching and questioning, compiling a strategy as an output. We are exploring the possibilities with a system named Student.
In summary, we should look forward to increasing activity by both statisticians and
Artificial Intelligence researchers applying AI methods in Statistics.

AI Programs, Machine Learning and Statistical Analysis Techniques

Artificial intelligence (AI) “will pervade the show,” says Gary Shapiro, chief executive of
the Consumer Technology Association. One hundred and thirty years ago today (January 8, 1889),
Herman Hollerith was granted a patent titled “Art of Compiling Statistics.” The patent
described a punched card tabulating machine which heralded the fruitful marriage of statistics and
computer engineering—called “machine learning” since the late 1950s, and reincarnated today as
“deep learning,” or more popularly as “artificial intelligence.”
(The Economist, commemorating IBM's 100th anniversary in 2011, traced the company's origins to Hollerith's tabulating machines.)
In his patent application, Hollerith explained the usefulness of his machine in the context
of a population survey and the statistical analysis of what we now call “big data”:
"The returns of a census contain the names of individuals and various data relating to such persons, as age, sex, race, nativity, nativity of father, nativity of mother, occupation, civil condition, etc. These facts or data I will for convenience call statistical items, from which items the various statistical tables are compiled. In such compilation the person is the unit, and the statistics are compiled according to single items or combinations of items. Thus it may be required to know the numbers of persons engaged in certain occupations, classified according to sex, groups of ages, and certain nativities. In such cases persons are counted according to combinations of items. A method for compiling such statistics must be capable of counting or adding units according to single statistical items or combinations of such items. The labor and expense of such tallies, especially when counting combinations of items made by the usual methods, are very great."
In 1959, Arthur Samuel experimented with teaching computers how to beat humans in checkers, calling his approach "machine learning." Later applied successfully to modern challenges such as spam filtering and fraud detection, the machine learning approach relied on statistical procedures that found patterns in the data or classified the data into different buckets, allowing the computer to "learn" (e.g., optimize the performance or accuracy of a certain task) and "predict" (e.g., classify, or put in different buckets) the type of new data fed to it. Entrepreneurs such as Norman Nie (SPSS) and Jim Goodnight (SAS) accelerated the practical application of computational statistics by developing software programs that enabled the widespread use of machine learning and other sophisticated statistical analysis techniques.
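The statistical character of this early machine learning can be sketched with a toy word-frequency spam filter in the naive Bayes style; the training messages are invented and the smoothing is the simplest possible, so this is an illustration of the idea, not a production filter.

import math
from collections import Counter

spam = ["win cash now", "cheap cash offer", "win a prize now"]
ham = ["meeting at noon", "project status report", "lunch at noon"]

def word_counts(msgs):
    c = Counter()
    for m in msgs:
        c.update(m.split())
    return c

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())

def score(message):
    """Log-odds of spam vs. ham with add-one smoothing."""
    s = 0.0
    for w in message.split():
        p_spam = (spam_counts[w] + 1) / (spam_total + 2)
        p_ham = (ham_counts[w] + 1) / (ham_total + 2)
        s += math.log(p_spam / p_ham)
    return s

print(score("cheap cash prize"))   # positive: leans spam
print(score("status meeting"))     # negative: leans ham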
Over the years, the popularity of "neural networks" has gone up and down through a number of hype cycles, starting with the Perceptron, a two-layer neural network that was considered by the US Navy to be "the embryo of an electronic computer that will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." In addition to failing to meet these lofty expectations (similar in tone to today's perceived threat of "super-intelligence"), neural networks suffered from fierce competition from the academics who coined the term "artificial intelligence" in 1955 and preferred the manipulation of symbols, rather than computational statistics, as the sure path to creating a human-like machine.

It didn't work and "AI winter" set in. With the invention and successful application of "backpropagation" as a way to overcome the limitations of simple neural networks, statistical analysis was again in the ascendance, now cleverly labeled "deep learning." In Neural Networks and Statistical Models (1994), Warren Sarle explained to his worried and confused fellow statisticians that the ominous-sounding artificial neural networks "are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software… Like many statistical methods, [artificial neural networks] are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate; this does not make them 'intelligent' in the usual sense of the word. Artificial neural networks 'learn' in much the same way that many statistical algorithms do estimation, but usually much more slowly than statistical algorithms. If artificial neural networks are intelligent, then many statistical methods must also be considered intelligent."
Sarle provided his colleagues with a handy dictionary translating the terms used by "neural engineers" into the language of statisticians (e.g., "features" are "variables"). In anticipation of today's "data science" and of predictions that algorithms will replace statisticians (and even scientists), he added: "Neural engineers want their networks to be black boxes requiring no human intervention: data in, predictions out. The marketing hype claims that neural networks can be used with no experience and automatically learn whatever is required; this, of course, is nonsense. Doing a simple linear regression requires a nontrivial amount of statistical expertise."
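Sarle's point can be made concrete with a small sketch on synthetic data: a one-neuron "network" with a sigmoid activation, trained by gradient descent on the log loss, is exactly the logistic regression model a statistics package would fit.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)

w, b = np.zeros(2), 0.0
for _ in range(2000):                         # "training the network"
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid "neuron"
    w -= 0.5 * X.T @ (p - y) / len(y)         # gradient of the log loss
    b -= 0.5 * np.mean(p - y)

# w and b play exactly the role of logistic regression coefficients.
print(w, b)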

Relationship between Applied Statistics and Machine Learning


The machine learning practitioner has a tradition of algorithms and a pragmatic focus on
results and model skill above other concerns such as model interpretability.

Statisticians work on much the same type of modeling problems under the names of applied statistics and statistical learning. Coming from a mathematical background, they put more focus on the behavior of models and the explainability of predictions.

The very close relationship between the two approaches to the same problem means that both fields have a lot to learn from each other. The need for statisticians to consider algorithmic methods was called out in Breiman's classic "two cultures" paper. Machine learning practitioners must likewise take heed, keep an open mind, and learn both the terminology and the relevant methods from applied statistics.

AI Programs vs. Statistics

Statistics is just about the numbers and quantifying the data. There are many tools for finding relevant properties of the data, but this is pretty close to pure mathematics. Data mining is about using statistics as well as other programming methods to find patterns hidden in the data, so that you can explain some phenomenon.

Data mining builds intuition about what is really happening in some data and still sits a little closer to math than to programming, but it uses both. Machine learning uses data mining techniques and other learning algorithms to build models of what is happening behind some data, so that it can predict future outcomes; math is the basis for many of the algorithms, but this leans more towards programming. Artificial intelligence uses models built by machine learning and other methods to reason about the world and give rise to intelligent behavior, whether this is playing a game or driving a robot or car. Artificial intelligence has some goal to achieve: it predicts how actions will affect its model of the world and chooses the actions that will best achieve that goal. It is very much programming based.
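As a toy sketch of that division of labor, the snippet below separates a stand-in "learned" reward model from the AI layer that uses it to choose an action in pursuit of a goal. The actions and reward numbers are invented for illustration.

def predicted_reward(action, state):
    # Stand-in for a model produced by machine learning.
    model = {"brake": 0.9, "steer_left": 0.4, "accelerate": 0.1}
    return model[action]

def choose_action(state, actions):
    # The "AI" layer: pick the action whose predicted outcome
    # best achieves the goal (here, maximizing predicted reward).
    return max(actions, key=lambda a: predicted_reward(a, state))

print(choose_action(state={"obstacle_ahead": True},
                    actions=["brake", "steer_left", "accelerate"]))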

Key Differences between Machine Learning and Statistics

Below are the points that describe the key differences between machine learning and statistics.
I. Machine learning is a branch of artificial intelligence that deals with achieving outcomes without human intervention. Statistics is a subfield of mathematics concerned with derivatives and probabilities inferred from data.
II. Machine learning is one of the fields in data science, and statistics is the base for any machine learning model. To build a model, one has to do EDA (exploratory data analysis), where statistics plays a major role.
III. The initial stage of building a model is feature engineering: deciding which attributes to use and which attributes give results that maximize the likelihood. To derive the right features, it is important to identify the correlation between the independent variables or data points.
IV. Machine learning and statistics are not two widely separated concepts; they are associated with one another. Without statistics one cannot build a model, and there is little point in statistical analysis of the data for its own sake; it leads to building the model.
V. Even after building the model, statistics comes in and plays a vital role in measuring performance and evaluating results. Many evaluation metrics have been built in data science; one is the confusion matrix, from which true positives, false negatives, true negatives and false positives are derived (see the sketch after this list).
VI. In terms of applications, machine learning and statistics are coupled in such a way that one leads to the other.
VII. Statistical analysis and machine learning collaborate in applying data science to a data problem, or in getting from the data insights that lead to a higher impact on sales, business and marketing.
VIII. Machine learning is a branch of data science or analytics that leads to automation and artificial intelligence. Statistics is a branch of mathematics whose solutions, applied to data, lead to predictive modeling and the like.
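As flagged in points III and V above, here is a minimal sketch of two of those statistical steps: a correlation check between candidate features, and a confusion matrix for evaluating a classifier. All data are invented.

import numpy as np

rng = np.random.default_rng(4)
f1 = rng.normal(size=100)
f2 = 0.9 * f1 + rng.normal(scale=0.1, size=100)   # nearly redundant feature
print("feature correlation:", np.corrcoef(f1, f2)[0, 1])

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model's predictions
tp = np.sum((y_true == 1) & (y_pred == 1))    # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))    # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))    # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))    # true negatives
print("confusion matrix:", [[tp, fn], [fp, tn]])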

Comparison Table between Machine Learning and Statistics

Table 1 below shows the comparison between machine learning and statistics.

No. | Basis of Comparison | Machine Learning | Statistics
1 | Definition | A set of steps or rules fed by the user, from which the machine understands and trains by itself | A mathematical concept for finding patterns in the data
2 | Usage | To predict future events or classify existing material | To describe the relationship between the data points
3 | Types | Supervised learning and unsupervised learning | Forecasting continuous variables, regression, classification
4 | Input/output | Features and labels | Data points
5 | Use cases | For hypotheses | Correlation between the data points; univariate and multivariate mathematical knowledge
6 | Ease of use of applications | Mathematics, algorithms, weather forecasting, topic modeling and predictive modeling | Descriptive statistics, finding patterns and outliers in the data
7 | Field | Data analytics, artificial intelligence research labs | Artificial intelligence, data science
8 | Stands out | Predominant algorithms and concepts like neural networks | Derivatives, probabilities
9 | Keywords | Linear regression, random forest, support vector machine, neural networks | Covariance, univariate, multivariate, estimators, p-values, RMSE

Table 1: Comparison table between machine learning and statistics

Many people wonder: what is the difference between statistics and machine learning? Is there really such a thing as machine learning vs. statistics?
From a traditional data analytics standpoint, the answer to the above question is simple:
i. Machine learning is an algorithm that can learn from data without relying on rules-based programming, while statistical modeling is a formalization of relationships between variables in the data in the form of mathematical equations.
ii. Machine learning is all about prediction, supervised learning, unsupervised learning, etc., while statistics is about samples, populations, hypotheses, etc.
Two different critters, right? Well, let's see if they are actually that different!

Both Machine Learning and Statistics Have the Same Objective

According to Larry Wasserman, they are both concerned with the same question: how do we learn from data?

Figure 2: Learning from data.

Robert Tibshirani, a statistician and machine learning expert at Stanford, calls machine learning "glorified statistics."

Nowadays, both machine learning and statistics techniques are used in pattern recognition, knowledge discovery and data mining. The two fields are converging more and more, even though the figure below may show them as almost exclusive.

Figure 3: Venn diagram showing that machine learning and statistics are related. Source: https://www.kdnuggets.com/2016/11/machine-learning-vs-statistics.html

Machine learning and statistics share the same goal: learning from data. Both focus on drawing knowledge or insights from the data, but their methods are affected by their inherent cultural differences.

They’re related, sure. But their parents are different.

i. Machine learning is a subfield of computer science and artificial intelligence. It deals with building systems that can learn from data, instead of from explicitly programmed instructions.
ii. A statistical model, on the other hand, belongs to a subfield of mathematics.
iii. Machine learning is a comparatively new field.
iv. Cheap computing power and the availability of large amounts of data allowed data scientists to train computers to learn by analyzing data. But statistical modeling existed long before computers were invented.
Moreover, machine learning requires no prior assumptions about the underlying relationships between the variables. You just throw in all the data you have, and the algorithm processes the data and discovers patterns, which you can use to make predictions on a new data set. Machine learning treats an algorithm like a black box, as long as it works. It is generally applied to high-dimensional data sets; the more data you have, the more accurate your predictions are.

In contrast, statisticians must understand how the data were collected, the statistical properties of the estimator (p-values, unbiased estimators), the underlying distribution of the population they are studying, and the kinds of properties one would expect if the experiment were repeated many times. You need to know precisely what you are doing and come up with parameters that will provide the predictive power. Statistical modeling techniques are usually applied to low-dimensional data sets.
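That cultural difference can be sketched on the same synthetic data: the statistician's fit reports standard errors and p-values for inference about the coefficients, while the machine-learning fit is judged purely by its held-out predictions. The library choices (statsmodels and scikit-learn) are illustrative.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] + rng.normal(size=200)

# Statistical modeling: coefficients with standard errors and p-values.
stat_fit = sm.OLS(y, sm.add_constant(X)).fit()
print(stat_fit.pvalues)

# Machine learning: fit, predict, measure accuracy; internals stay a black box.
ml_fit = LinearRegression().fit(X[:150], y[:150])
preds = ml_fit.predict(X[150:])
print("held-out MSE:", np.mean((preds - y[150:]) ** 2))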

Data Science, Big Data and Data Analytics in Relation to Statistical Analysis Packages and AI Programs

According to Rouse (2014), statistical analysis is a component of data analytics. In the context of business intelligence (BI), statistical analysis involves collecting and scrutinizing data samples drawn from a set of items, using a statistical analysis package. A sample, in statistics, is a representative selection drawn from a total population.

The goal of statistical analysis is to identify trends. A retail business, for example, might use statistical analysis to find patterns in unstructured and semi-structured customer data that can be used to create a more positive customer experience and increase sales. The following are some related trending fields of computer science:
i. Data Science
ii. Big Data
iii. Data Analytics
References

Apps, N. (2017). Artificial Intelligence vs Machine Learning vs Data Science.

EBA (2016). European Joint Committee Discussion Paper on the Use of Big Data.

Gale, W. A., & Pregibon, D. (1984). Artificial Intelligence Research in Statistics. AI Magazine, 5(4), 1-4.

Maloof, M. (2017). Artificial Intelligence: An Introduction. Washington, DC: Georgetown University.

Musa, D. S. (2017, December). Artificial Intelligence. pp. 1-2.

Poole, D., Mackworth, A., & Goebel, R. (1998). Computational Intelligence: A Logical Approach. New York: Oxford University Press.

Rouse, M. (2014). Business Intelligence - Business Analytics, 1-2.

Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Schank, R. C. (1991). Where's the AI? AI Magazine, 12(4), 38.

Ward, J. S., & Barker, A. (2013). Undefined by Data: A Survey of Big Data Definitions. Cornell University.
