
Big Data, Business Analytics, and Smart Technology Collection
Mark Ferguson, Editor

BUSINESS ANALYTICS, VOLUME II
A Data-Driven Decision-Making Approach for Business

Amar Sahay, PhD

This business analytics (BA) text discusses models based on fact-based data to measure past business performance and to guide an organization in visualizing and predicting future business performance and outcomes. It provides a comprehensive overview of analytics in general, with an emphasis on predictive analytics. Given the booming interest in analytics and data science, this book is timely and informative. It brings together many terms, tools, and methods of analytics.

The first three chapters provide an introduction to BA; the importance of analytics; and the types of BA (descriptive, predictive, and prescriptive) along with their tools and models. Business intelligence (BI) and a case on descriptive analytics are discussed. Additionally, the book discusses the most widely used predictive models, including regression analysis, forecasting, data mining, and an introduction to recent applications of predictive analytics: machine learning, neural networks, and artificial intelligence. The concluding chapter discusses the current state, job outlook, and certifications in analytics.

Dr. Amar Sahay is a professor of decision sciences engaged in teaching, research, consulting, and training. He holds a BS in production engineering (BIT, India), and an MS in industrial engineering and a PhD in mechanical engineering, both from the University of Utah, USA. He has taught and is teaching at several institutions in Utah, including the University of Utah (School of Engineering and Management), SLCC, Westminster College, and others. Amar is a certified Six Sigma Master Black Belt and is also certified in lean manufacturing/lean management. He has contributed a number of research papers to national and international journals and proceedings. Amar has authored around 10 books in the areas of data visualization, business analytics, Six Sigma, statistics and data analysis, modeling, and applied regression. He is also associated with QMS Global LLC, a company engaged in data visualization, analytics, quality, lean Six Sigma, manufacturing, and systems analysis services. Amar is a senior member of the Institute of Industrial & Systems Engineers (IISE) and the American Society for Quality (ASQ), and a member of data science professional organizations.


Praise for Business Analytics
“In this second volume on business analytics, Dr. Sahay provides a
useful overview of analytics in general with an emphasis on predictive
analytics. Given the booming interest in analytics and data science, his
book is timely and informative. It brings many terms, tools, and methods
together in a meaningful way. It is common for practitioners and even
scholars to conflate terms such as business intelligence, data analytics,
and data mining. Dr. Sahay clarifies such terms and helps differentiate
their meanings. I found the glossaries at the end of the early chapters to
be especially useful in making sense of all of the terms that have emerged
recently and are often used interchangeably.

Being an expert on quality management and Six Sigma, Dr. Sahay also
incorporated quality tools into the analytics process, something that is
rare, but in my opinion extremely important and helpful. Moreover, his
treatment of the tools for predictive analytics not only explains the tools,
but goes a step further in clarifying when each should be used and how
the tools fit together. Such clarification is often presented in tabular form,
which makes it easy to refer back to whenever the information is needed.

Finally, I found that the incorporation of practical examples makes the concepts much more concrete. The implementation of the regression and other tools in both Excel and Minitab makes them accessible to almost all practitioners, which makes it much more likely that the reader will be able to grasp the concepts and apply the tools in her or his own work.
I think that Business Analytics: A Data-Driven Decision-Making Approach for Business, Volume II (Predictive Analytics Model) will serve as a nice introduction to anyone who wants an introduction to predictive business analytics.”
—Don G. Wardell, Professor of Operations Management, David Eccles
School of Business, University of Utah, Salt Lake City, Utah
Business Analytics
A Data-Driven Decision-Making
Approach for Business

Volume II (Predictive Analytics)

Amar Sahay, PhD


Business Analytics: A Data-Driven Decision-Making Approach for Business,
Volume II (Predictive Analytics)
Copyright © Business Expert Press, LLC, 2020.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations, not to exceed 250 words, without the prior permission of the publisher.

First published in 2020 by
Business Expert Press, LLC
222 East 46th Street, New York, NY 10017
www.businessexpertpress.com

ISBN-13: 978-1-63157-479-5 (paperback)
ISBN-13: 978-1-63157-480-1 (e-book)

Business Expert Press Big Data, Business Analytics, and Smart Technology Collection

Collection ISSN: 2333-6749 (print)
Collection ISSN: 2333-6757 (electronic)

Cover and interior design by S4Carlisle Publishing Services Private Ltd., Chennai, India
Cover image licensed by Ingram Image, StockPhotoSecrets.com

First edition: 2020

10 9 8 7 6 5 4 3 2 1

Printed in the United States of America.


Dedication
This book is dedicated to
Priyanka Nicole
Our Love and Joy
Abstract
This business analytics (BA) text discusses models based on fact-based data to measure past business performance and to guide an organization in visualizing and predicting future business performance and outcomes. It provides a comprehensive overview of analytics in general, with an emphasis on predictive analytics. Given the booming interest in analytics and data science, this book is timely and informative. It brings together many terms, tools, and methods of analytics. The first three chapters provide an introduction to BA; the importance of analytics; and the types of BA (descriptive, predictive, and prescriptive) along with their tools and models. Business intelligence (BI) and a case on descriptive analytics are discussed. Additionally, the book discusses the most widely used predictive models, including regression analysis, forecasting, data mining, and an introduction to recent applications of predictive analytics: machine learning, neural networks, and artificial intelligence. The concluding chapter discusses the current state, job outlook, and certifications in analytics.

Keywords
analytics; business analytics; business intelligence; data analysis; decision
making; descriptive analytics; predictive analytics; prescriptive analytics;
statistical analysis; quantitative techniques; data mining; predictive modeling; regression analysis; modeling; time series forecasting; optimization;
simulation; machine learning; neural networks; artificial intelligence
Contents
Preface...................................................................................................xi
Acknowledgments.................................................................................xvii
Chapter 1 Business Analytics at a Glance............................................1
Chapter 2 Business Analytics and Business Intelligence.....................23
Chapter 3 Analytics, Business Analytics, Data Analytics, and
How They Fit into the Broad Umbrella of Business
Intelligence......................................................................33
Chapter 4 Descriptive Analytics—Overview, Applications,
and a Case........................................................................57
Chapter 5 Descriptive versus Predictive Analytics.............................71
Chapter 6 Key Predictive Analytics Models (Predicting Future
Business Outcomes Using Analytic Models).....................83
Chapter 7 Regression Analysis and Modeling..................................103
Chapter 8 Time Series Analysis and Forecasting..............................195
Chapter 9 Data Mining: Tools and Applications in Predictive
Analytics........................................................................239
Chapter 10 Wrap-Up, Overview, Notes on Implementation, and
Current State of Business Analytics................................263
Appendices..........................................................................................281
Additional Readings............................................................................ 373
About the Author.................................................................................377
Index..................................................................................................379
Preface
This book deals with business analytics (BA)—an emerging area in modern business decision making.

BA is a data-driven decision-making approach that uses statistical and quantitative analysis, information technology, management science (mathematical modeling and simulation), along with data mining and fact-based data to measure past business performance to guide an organization in business planning, predicting future outcomes, and effective decision making.

BA tools are also used to visualize and explore the patterns and trends in the data to predict future business outcomes with the help of forecasting and predictive modeling.
In this age of technology, companies collect massive amounts of data. Successful companies view their data as an asset and use them to gain a competitive advantage. These companies use BA tools as an organizational commitment to data-driven decision making. BA helps businesses in making informed business decisions. It is also critical in automating and optimizing business processes.
BA makes extensive use of data, statistical analysis, mathematical and
statistical modeling, and data mining to explore, investigate, and understand
the business performance. Through data, BA helps to gain insight and drive
business planning and decisions. The tools of BA focus on understanding
business performance based on the data. It uses a number of models derived
from statistics, management science, and operations research areas.
The BA area can be divided into different categories depending upon
the types of analytics and tools being used. The major categories of BA are:

• Descriptive analytics
• Predictive analytics
• Prescriptive analytics

Each of the above categories uses different tools, and the use of these analytics depends on the type of business and the operations a company is involved in. For example, an organization may use only descriptive analytics tools, whereas another company may use a combination of descriptive and predictive modeling and analytics to predict future business performance and drive business decisions.
The different types of analytics and the tools used in these analytics
are described below:

1. Descriptive analytics involves the use of descriptive statistics, including graphical and numerical methods, to describe the data. Successful use and implementation of descriptive analytics requires an understanding of the types of data and of visual/graphical techniques using computers. The other aspect of descriptive analytics is an understanding of numerical methods, including the measures of central tendency, measures of position, measures of variation, and measures of shape. It also requires knowledge of different statistical measures and of how statistics are used to summarize and draw conclusions from the data. Some other topics of interest are the empirical rule and the relationship between two variables—the covariance and the correlation coefficient. The tools of descriptive analytics are helpful in understanding the data, identifying trends or patterns in the data, and making sense of the data contained in the databases of companies. An understanding of databases, data warehouses, web search and query, and Big Data concepts is important in extracting data and applying descriptive analytics tools.

Besides the descriptive statistics tools, an understanding of a number of other analytics tools is critical in describing and drawing meaningful conclusions from the data. These include: (a) probability theory and its role in decision making, (b) sampling and inference procedures, (c) estimation and confidence intervals, (d) hypothesis testing/inference procedures for one and two population parameters, and (e) chi-square and nonparametric tests. An understanding of these tools is critical in understanding and applying inferential statistics tools—a critical part of data analysis, decision making, and predictive analytics.

Highlight of This Book: Business Analytics—Volume II

Unlike the first volume, Volume II mainly focuses on predictive analytics—a critical part of BA that focuses on predictive models to predict future business trends. A brief explanation of predictive analytics and its associated models is given below.

1. Predictive Analytics: As the name suggests, predictive analytics is the application of predictive models to predict future trends. The most widely used models are regression, forecasting, data mining, and machine learning–based models. Variations of regression models include: (a) simple regression models, (b) multiple regression models, (c) nonlinear regression models, including the quadratic or second-order models and polynomial regression models, (d) regression models with indicator or qualitative independent variables, and (e) regression models with interaction terms, or interaction models. Regression models are among the most widely used models in various types of applications. These models are used to explain the relationship between a response variable and one or more independent variables. The relationship may be linear or curvilinear. The objective of these regression models is to predict the response variable using one or more independent variables or predictors.

The predictive models also include a class of time series analysis, forecasting, data mining, and machine learning models. The commonly used forecasting models are regression-based models that use regression analysis to forecast future trends. Many regression and time series models are discussed in subsequent chapters. Most predictive models are used to forecast future trends.

Other Models and Tools Used in Predictive Modeling and Analytics

Data mining, machine learning, and neural network applications are also an integral part of predictive analytics. The following topics are introduced in this text:

Data Mining and Advanced Data Analysis
Introduction to Machine Learning, Neural Networks, Artificial Intelligence
Business Intelligence (BI) and Online Analytical Processing Tools
Data Visualization and Applications
Different Regression Models
Introduction to Classification and Clustering Techniques

2. Prescriptive Analytics: Prescriptive analytics is concerned with the optimal allocation of resources in an organization. A number of operations research and management science tools have been applied to allocate limited resources in the most effective way. The operations management tools derived from management science and industrial engineering, including simulation tools, are also used to study different types of manufacturing and service organizations. These are proven tools and techniques for studying and understanding the operations and processes of organizations. The tools of operations management can be divided mainly into three areas: (a) planning, (b) analysis, and (c) control tools. The analysis part is the prescriptive analytics part, which uses operations research, management science, and simulation tools. The control part is used to monitor and control product and service quality. There are a number of prescriptive analytics tools. These include:
1. Linear Optimization Models, including maximization and minimization of different resources, computer analysis, and sensitivity analysis
2. Integer Linear Optimization Models
3. Nonlinear Optimization Models
4. Simulation Modeling and Applications
5. Monte Carlo Simulation

The analytics tools come under the broad area of Business Intelligence (BI) that incorporates Business Analytics (BA), data analytics, and advanced analytics. All these areas come under the umbrella of BI and use a number of visual and mathematical models.
Modeling is one of the most important parts of BA. Models are of different types, and an understanding of the different types of models is critical in selecting and applying the right model or models to solve business problems. The widely used models are: (a) graphical models, (b) quantitative models, (c) algebraic models, (d) spreadsheet models, and (e) other analytic tools.
Most of the tools in descriptive, predictive, and prescriptive analytics are described using one type of model or another, usually graphical, mathematical, or computer models. Besides these models, simulation and a number of other mathematical models are used in analytics.
BA is a vast area. It is not possible to provide a complete and in-depth
treatment of all the BA topics in one concise book; therefore, the book is
divided into two parts:

• Business Analytics: A Data-Driven Decision-Making Approach for Business—Volume I
• Business Analytics: A Data-Driven Decision-Making Approach for Business—Volume II

The first volume is available through Amazon (www.amazon.com). It provides an overview of BA, BI, and data analytics and of the role and importance of these in modern business decision making. It introduces the different areas of BA: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics. The tools and topics covered under each of these areas, along with their applications in the decision-making process, are discussed in the first volume. The main focus of the first volume is descriptive analytics and its applications.
The focus of this second volume is predictive analytics. The introductory
chapters of this volume outline the broad view of BI that constitutes not only
BA but also data analytics and advanced analytics. An overview of all these
areas is presented in the first two chapters followed by predictive analytics
topics, which are the focus of this text. The specific topics and the chapters contained in this second volume are outlined below:

Chapter 1: Business Analytics (BA) at a Glance
Chapter 2: Business Analytics and Business Intelligence
Chapter 3: Analytics, Business Analytics (BA), Data Analytics, and How They Fit into the Broad Umbrella of Business Intelligence (BI)
Chapter 4: Descriptive Analytics: Overview, Applications, and a Case
Chapter 5: Descriptive versus Predictive Analytics
Chapter 6: Key Predictive Analytics Models (Predicting Future Business Outcomes Using Analytic Models): Regression, Forecasting, Data Mining Techniques, and Simulation
Chapter 7: Regression Analysis and Modeling
Chapter 8: Time Series Analysis and Forecasting
Chapter 9: Data Mining: Tools and Applications in Predictive Analytics
Chapter 10: Wrap-Up, Overview, Notes on Implementation, and Current State of Analytics

Salt Lake City, Utah, U.S.A.


amar@xmission.com
amar@realleansixsigmaquality.com
Acknowledgments
I would like to thank the reviewers who took the time to provide excellent
insights, which helped shape this book.
I would especially like to thank Mr. Karun Mehta, a friend and engineer. I greatly appreciate the numerous hours he spent correcting, formatting, and supplying distinctive comments. The book would not have been possible without his tireless effort.
I would like to express my gratitude to Prof. Susumu Kasai, Professor of
CSIS, for reviewing and providing invaluable suggestions.
I am very thankful to Prof. Edward Engh for his thoughtful advice
and counsel. Ed has been a wonderful friend and colleague.
Special thanks to Dr. Don Wardell, Professor of Operations Management at the University of Utah. His comments and suggestions greatly helped shape this book.
Special thanks are due to Mr. Anand Kumar, Domain Transformation Leader at Tata Consultancy Services (TCS), for reviewing and providing invaluable suggestions.
Thanks to all of my students for their input in making this book possible. They have helped me pursue a dream filled with lifelong learning. This book couldn’t have been a reality without them.
I am indebted to the senior acquisitions editor, Scott Isenberg; Charlene Kronstedt, director of production; Sheri Dean, director of marketing; all the reviewers; and the publishing team at Business Expert Press for their counsel and support during the preparation of this book. I also thank Mark Ferguson, editor, for reviewing the manuscript and providing helpful suggestions for improvement.
I acknowledge the help and support of Chithra Amaravel, project
manager at S4Carlisle Publishing Services, Chennai, India. I thank her
entire team for their help with editing and publishing.

I thank my parents, who always emphasized the importance of what education brings to the world. Lastly, I have a special word of appreciation for my lovely wife Nilima, my daughter Neha and her husband Dave, my daughter Smita, and my son Rajeev—both engineers—for their creative comments and suggestions. I am grateful for their love, support, and encouragement.
CHAPTER 1

Business Analytics
at a Glance

Chapter Highlights
• Introduction to Business Analytics—What Is It?
• Analytics and Business Analytics
• Business Analytics and Its Importance in Modern Business Decisions
• Types of Business Analytics
  ○ Tools of Business Analytics
• Descriptive Analytics: Graphical and Numerical Methods and Tools of BA
  ○ Tools of Descriptive Analytics
• Predictive Analytics
  ○ Most Widely Used Predictive Analytics Models
    ▪ Data Mining, Regression Models, and Time Series Forecasting
• Background and Prerequisites to Predictive Analytics Tools
• Other Areas Associated with Predictive Analytics
• Recent Applications and Tools of Predictive Modeling
  ○ Machine Learning, Data Mining, Artificial Neural Network, and Deep Learning
• Prescriptive Analytics and Tools of Prescriptive Analytics
• Analytical Models and Decision Making Using Models
• Types of Models
• Applications and Implementation of Analytics
• Summary and Application of Business Analytics (BA) Tools
• Summary
• Glossary of Terms Related to Analytics

Introduction to Business Analytics—What Is It?


This book deals with business analytics (BA)—an emerging area in modern business decision making. This chapter provides an overview of analytics and of BA as decision-making tools in businesses today. These terms are used interchangeably, but there are slight differences in the tools and methods they use. BA uses a number of tools and algorithms, ranging from statistics and data analysis to management science, information systems, and computer science, that are used in data-driven decision making in companies. This chapter also discusses the broad meaning of the terms analytics and BA, the different types of analytics, the tools of analytics, and how they are used in business decision making. Today, companies collect and analyze massive amounts of data. Because of the huge volume, these data are referred to as big data. Data mining is a way to extract information from big data. We discuss data mining and the techniques it uses to extract useful information from massive amounts of data in subsequent sections and chapters. Currently, the emerging field of analytics uses machine learning, artificial intelligence, neural networks, and deep learning techniques. These areas are becoming an essential part of analytics and are extensively used in developing algorithms and models to draw conclusions from big data.

Analytics and Business Analytics


Analytics [18] is the science of analysis—the processes by which we analyze data, draw conclusions, and make decisions. Business analytics (BA) goes well beyond simply presenting data, creating visuals, crunching numbers, and computing statistics. The essence of analytics lies in the application—making sense of the data using prescribed methods of statistical analysis, mathematical and statistical models, and logic to draw meaningful conclusions from the data. It uses methods, logic, intelligence, algorithms, and models that enable us to reason, plan, organize, analyze, solve problems, understand, innovate, and make data-driven decisions, including decisions from dynamic real-time data.
BA covers a vast area. It is a complex field that encompasses visualization, statistics and modeling, optimization, simulation-based modeling, and statistical analysis. It uses descriptive, predictive, and prescriptive analytics, including text and speech analytics, web analytics, other application-based analytics, and much more. The following explanation of business analytics shows the vast area it covers.

Business analytics is a data-driven decision-making approach that uses statistical and quantitative analysis, information technology, management science (mathematical modeling, simulation), along with data mining and fact-based data to measure past business performance to guide an organization in business planning and effective decision making.

BA has three broad categories—descriptive, predictive, and prescriptive analytics. Each type of analytics uses a number of tools that may overlap, depending on the applications and problems being solved. The descriptive analytics tools are used to visualize and explore the patterns and trends in the data. Predictive analytics uses the information from descriptive analytics to model and predict future business outcomes with the help of regression, forecasting, and predictive modeling.
In this age of technology, companies collect massive amounts of data. Successful companies use their data as an asset and use them for competitive advantage. Most businesses collect and analyze massive amounts of data, referred to as big data, using specially designed big data software and data analytics. Big data analysis is now becoming an integral part of BA. Organizations use BA as an organizational commitment to data-driven decision making. BA helps businesses in making informed business decisions and in automating and optimizing business processes.
To understand business performance, BA makes extensive use of data and descriptive statistics, statistical analysis, mathematical and statistical modeling, and data mining to explore, investigate, draw conclusions, and predict and optimize business outcomes. Through data, BA helps to gain insight and drive business planning and decisions. The tools of BA focus on understanding business performance using data. BA uses a number of models derived from statistics, management science, and operations research areas, and also uses statistical, mathematical, optimization, and quantitative tools for explanatory and predictive modeling [15].

Predictive modeling uses different types of regression models to predict outcomes [1] and is synonymous with the fields of data mining and machine learning. It is also referred to as predictive analytics. We will provide more details on the tools of predictive analytics in subsequent sections.

Business Analytics and Its Importance in Modern Business Decisions

Business analytics (BA) helps to address, explore, and answer a number of questions that are critical in driving business decisions. It tries to answer the following questions:

What is happening, and why did something happen?
Will it happen again?
What will happen if we make changes to some of the inputs?
What is the data telling us that we were not able to see before?

BA uses statistical analysis and predictive modeling to establish trends, figure out why things are happening, and make predictions about how things will turn out in the future. It combines advanced statistical analysis and predictive modeling to give us an idea of what to expect, so that one can anticipate developments or make changes now to improve outcomes.
BA is about anticipating future trends of the key performance indicators; it uses past data and models to learn from the existing data (descriptive analytics) and make predictions. It is different from reporting in business intelligence (BI). Analytics models use the data with a view to drawing out new, useful insights to improve business planning and boost future performance. BA helps a company adapt to changes and take advantage of future developments.
One of the major tools of analytics is data mining, which is a part of predictive analytics. In business, data mining is used to analyze huge amounts of business data. Business transaction data, along with other customer- and product-related data, are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and other aspects of customer behavior. Businesses use data mining to perform market analysis to identify and develop new products, analyze their supply chains, find the root causes of manufacturing problems, study customer behavior for product promotion, improve sales by understanding the needs and requirements of their customers, prevent customer attrition, and acquire new customers. For example, Wal-Mart collects and processes over 20 million point-of-sale transactions every day. These data are stored in a centralized database and are analyzed using data mining software to understand and determine customer behavior, needs, and requirements, and to determine sales trends and forecasts, develop marketing strategies, and predict customer-buying habits (http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/).
A large amount of data and information about products, companies, and individuals is available through Google, Facebook, Amazon, and several other sources. Data mining and analytics tools are used to extract meaningful information and patterns in order to learn customer behavior. Financial institutions analyze the data of millions of customers to assess risk and customer behavior. Data mining techniques are also widely used in the areas of science and engineering, such as bioinformatics, genetics, medicine, education, and electrical power engineering.
BA, data analytics, and advanced analytics are growing areas. They all come under the broad umbrella of business intelligence (BI). There is going to be an increasing demand for professionals trained in these areas. Many of the tools of data analysis and statistics discussed here are prerequisites to understanding data mining and BA. We will describe the analytics tools, including data analytics and advanced analytics, later in this chapter.

Types of Business Analytics


The BA area is divided into different categories depending upon the types
of analytics and tools being used. The major categories of BA are:

• Descriptive analytics
• Predictive analytics
• Prescriptive analytics

Each of the above-mentioned categories uses different tools, and the use of these analytics depends on the type of business and the operations a company is involved in. For example, one organization may use only descriptive analytics tools or a combination of descriptive and predictive modeling and analytics to predict future business performance and drive business decisions. Other companies may use prescriptive analytics to optimize business processes.

Tools of Business Analytics


The different types of analytics and the tools used in each type of analytics
are detailed below.

Descriptive Analytics: Graphical, Numerical Methods, and Tools of Business Analytics

Descriptive analytics involves the use of descriptive statistics, including graphical and numerical methods, to describe the data. Descriptive analytics tools are used to understand the occurrence of certain business phenomena or outcomes and to explain these outcomes through graphical, quantitative, and numerical analysis. Through visual and simple analysis of the collected data, we can visualize and explore what has been happening and the possible reasons for the occurrence of certain phenomena. Many hidden patterns and features that are not apparent through mere examination of the data can be exposed through graphical and numerical analyses. Descriptive analytics uses simple tools to uncover many problems quickly and easily. The results enable us to question many of the outcomes so that corrective actions can be taken.
Successful use and implementation of descriptive analytics requires an understanding of the different types of data (structured vs. unstructured), graphical/visual representation of data, and graphical techniques using specialized computer software capable of handling big data. Big data analysis is an integral part of BA. Businesses now collect and analyze massive amounts of data referred to as big data. Recently, interconnections of devices in the IoT (Internet of Things) generate huge amounts of data, providing opportunities for big data applications. An overview of graphical and visual techniques is discussed in Chapter 3. The descriptive analytics tools include the commonly used graphs and charts, along with some newly developed graphical tools such as bullet graphs, tree maps, and data dashboards. Dashboards are now becoming very popular with big data. They are used to display multiple views of the business data graphically.
The other aspect of descriptive analytics is an understanding of numerical methods, including the measures of central tendency, measures of position, measures of variation, and measures of shape, and of how different measures and statistics are used to draw conclusions and make decisions from the data. Some other topics of interest are the empirical rule and the relationship between two variables—the covariance and the correlation coefficient. The tools of descriptive analytics are helpful in understanding the data, identifying trends or patterns in the data, and making sense of the data contained in the databases of companies. An understanding of databases, data warehouses, web search and query, and big data concepts is important in extracting data and applying descriptive analytics tools. A number of statistical software packages are used for statistical analysis; widely used ones are SAS, MINITAB, and R, a programming language for statistical computing. Volume I of this book is about descriptive analytics and deals with a number of applications and a detailed case to explain and implement them.
Tools of descriptive analytics: Figure 1.1 outlines the tools and methods used in descriptive analytics. These tools are explained in subsequent chapters.

Figure 1.1  Tools of descriptive analytics



A detailed treatment of the topics in Figure 1.1 is provided in Volume I of this book.

Predictive Analytics

As the name suggests, predictive analytics is the application of predictive models to predict future business outcomes and trends.

Most Widely Used Predictive Analytics Models


The most widely used predictive analytics models are regression, forecasting, and data mining techniques. These are briefly explained below.

Data mining techniques are used to extract useful information from huge amounts of data using predictive analytics, computer algorithms, software, and mathematical and statistical tools. Regression models are used for predicting future outcomes. Variations of regression models include (a) simple regression models; (b) multiple regression models; (c) nonlinear regression models, including the quadratic or second-order models and polynomial regression models; (d) regression models with indicator or qualitative independent variables; and (e) regression models with interaction terms, or interaction models. Regression models are among the most widely used models in various types of applications. These models explain the relationship between a response variable and one or more independent variables. The relationship may be linear or curvilinear. The objective of these regression models is to predict the response variable using one or more independent variables or predictors.
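As a minimal illustration of the simplest case, the sketch below fits a simple (one-predictor) linear regression to hypothetical data and uses it to predict the response. It assumes Python with numpy installed; the least-squares fit via np.polyfit stands in for the fuller regression output developed in Chapter 7.

```python
import numpy as np

# Hypothetical data: advertising spend (x) and sales (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Least-squares fit of the simple linear model y = b0 + b1*x
b1, b0 = np.polyfit(x, y, deg=1)   # returns slope first, intercept second
print(f"Estimated model: y = {b0:.2f} + {b1:.2f}x")

# Predict the response for a new value of the predictor
x_new = 7.0
print("Predicted y at x = 7.0:", round(b0 + b1 * x_new, 2))
```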
Forecasting techniques are widely used predictive models that involve a class of time series analysis and forecasting models. The commonly used forecasting models are regression-based models that use regression analysis to forecast future trends. Other time series forecasting models are the simple moving average, moving average with trend, exponential smoothing, exponential smoothing with trend, and models for forecasting seasonal data. All these predictive models are used to forecast future trends. Figure 1.2 shows the widely used tools of predictive analytics.
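As one small example of these time series methods, the sketch below applies simple exponential smoothing to a short, hypothetical demand series. It assumes plain Python; the smoothing constant alpha = 0.3 is an arbitrary illustrative choice, and Chapter 8 treats these forecasting models in detail.

```python
# Simple exponential smoothing: F(t+1) = alpha * A(t) + (1 - alpha) * F(t)
demand = [120, 132, 128, 141, 150, 147, 155]   # hypothetical actuals A(t)
alpha = 0.3                                    # smoothing constant

forecast = demand[0]   # initialize the first forecast with the first actual
for actual in demand:
    forecast = alpha * actual + (1 - alpha) * forecast

print("Forecast for the next period:", round(forecast, 1))
```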

Figure 1.2  Tools of predictive analytics

Background and Prerequisites to Predictive Analytics Tools
Besides the tools described in Figure 1.2, an understanding of a number of other analytics tools is critical in describing and drawing meaningful conclusions from the data. These include (a) probability theory, probability distributions, and their role in decision making; (b) sampling and inference procedures; (c) estimation and confidence intervals; (d) hypothesis testing/inference procedures for one and two population parameters; and (e) analysis of variance (ANOVA) and experimental designs. An understanding of these tools is critical in understanding and applying inferential statistics tools in BA. They play an important role in data analysis and decision making. These tools are outlined in Figure 1.3.
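As a small illustration of two of these inference tools, the sketch below computes a 95 percent confidence interval for a population mean and runs a one-sample t-test. It is a minimal sketch assuming Python with scipy installed; the sample data are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of delivery times (days)
sample = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 4.7, 5.3, 6.0, 5.1, 5.6])

n = len(sample)
mean = sample.mean()
se = stats.sem(sample)   # standard error of the mean

# 95% confidence interval for the population mean (t distribution)
ci = stats.t.interval(0.95, n - 1, loc=mean, scale=se)
print("95% CI for the mean:", ci)

# Hypothesis test of H0: mu = 5.0 versus Ha: mu != 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print("t =", round(t_stat, 3), " p-value =", round(p_value, 4))
```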

Figure 1.3  Prerequisites to predictive analytics

Other Areas Associated with Predictive Analytics


Figure 1.4 outlines recent applications and tools of predictive analytics and modeling; these tools are briefly explained below. Extensive applications have emerged in recent years using these methods, which are hot topics of research. A number of applications in business, engineering, manufacturing, medicine, signal processing, and computer engineering using machine learning, neural networks, and deep learning [25] are being reported.

Figure 1.4  Recent applications and tools of predictive modeling

Recent Applications and Tools of Predictive Modeling


Machine Learning, Data Mining, and Neural Networks [5]

In the broad area of data and predictive analytics, machine learning is a method used to develop complex models and algorithms that are used to make predictions. The analytical models in machine learning allow analysts to make predictions by learning from the trends, patterns, and relationships in the historical data. Machine learning automates model building. The algorithms in machine learning are designed to learn iteratively from data without being explicitly programmed.
According to Arthur Samuel, machine learning gives “computers the ability to learn without being explicitly programmed” [2, 3]. Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term “machine learning” in 1959 while at IBM.
Machine learning algorithms are extensively used for data-driven predictions and decision making. Some applications where machine learning has been used are e-mail filtering, detection of network intruders or data breaches, optical character recognition (OCR), learning to rank, computer vision, and a wide range of engineering and business applications. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms that are reproducible and repeatable with good performance is difficult or infeasible.

Machine Learning and Data Mining

Machine learning and data mining are similar in some ways and often overlap in applications. Machine learning is used for prediction based on known properties learned from the training data, whereas data mining algorithms are used for the discovery of (previously) unknown patterns. Data mining is concerned with knowledge discovery in databases (KDD).
Data mining uses many machine learning methods. On the other hand, machine learning also employs data mining methods, as “unsupervised learning” or as a preprocessing step to improve learner accuracy.
The goals are somewhat different. The performance of machine learning is usually evaluated with respect to its ability to reproduce known knowledge, while in data mining, or knowledge discovery from data (KDD), the key task is the discovery of previously unknown knowledge. Unlike machine learning, which is evaluated with respect to known knowledge, data mining uses uninformed or unsupervised methods that often outperform supervised methods. In a typical KDD task, supervised methods cannot be used because training data are unavailable.

Machine Learning Tasks

Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning “signal” or “feedback” available to the learning system [20]; a small supervised-learning sketch follows the list:

• Supervised learning: The computer is presented with example inputs and their desired outputs, given by the analyst, and the goal is to learn a general rule that maps inputs to outputs.
• Unsupervised learning: As the name suggests, in unsupervised learning no labels are given to the program. The learning algorithm is expected to find the structure in its input. The goal of unsupervised learning may be to find hidden patterns in large data sets. Thus, unsupervised learning is not based on a general rule for training the algorithm.
• Reinforcement learning: In this type of learning, the designed computer program interacts with a dynamic environment in which it has a specific goal to perform. This differs from standard supervised learning in that no input/output pairs are provided; it involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) [6]. An example of reinforcement learning is playing a game against an opponent. In this type of learning, the computer program is provided feedback in terms of rewards and punishments as it navigates its problem space.
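The sketch below illustrates the supervised case from the list above: a classifier is trained on a handful of labeled example inputs and then asked to map unseen inputs to outputs. It is a minimal sketch assuming Python with scikit-learn installed; the tiny data set and its features are hypothetical.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled examples: [hours_online, pages_viewed] -> purchased (1) or not (0)
X_train = [[0.5, 2], [1.0, 3], [3.5, 9], [4.0, 12], [2.8, 8], [0.7, 1]]
y_train = [0, 0, 1, 1, 1, 0]   # desired outputs supplied by the analyst

# Supervised learning: learn a general rule mapping inputs to outputs
model = LogisticRegression()
model.fit(X_train, y_train)

# Apply the learned rule to unseen inputs
X_new = [[3.0, 10], [0.6, 2]]
print("Predicted labels:", model.predict(X_new))
```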

Another application of machine learning is in the area of deep learning, which is based on artificial neural networks. In these applications, the learning tasks may involve more than one hidden layer (deep learning) or a single hidden layer (known as shallow learning).

Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system [5]. Some of these outputs are classification, clustering, and regression.

• In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one (or, in multilabel classification, more) of these classes. Classification is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are e-mail (or other) messages and the classes are “spam” and “not spam.”
• In regression, also a supervised problem, the outputs are continuous rather than discrete. Various types of regression models are used based on the objectives.
• In clustering, a set of inputs is to be divided into groups. Unlike classification, the groups are not known beforehand, making this typically an unsupervised task (see the sketch after this list).
• Machine learning and statistics are closely related fields.
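To illustrate the clustering case, the minimal sketch below groups a few hypothetical customer records without any labels, using the k-means algorithm. It assumes Python with scikit-learn installed, and the choice of two clusters is purely illustrative.

```python
from sklearn.cluster import KMeans

# Hypothetical customers: [annual spend ($1,000s), visits per month]
X = [[12, 2], [14, 3], [13, 2], [45, 10], [50, 12], [47, 11]]

# Unsupervised task: the groups are not known beforehand
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster assignments:", labels)
print("Cluster centers:\n", kmeans.cluster_centers_)
```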

Artificial Neural Networks [7]

An artificial neural network (ANN), usually called a “neural network” (NN), is a learning algorithm inspired by the structure and functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern NNs are nonlinear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
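As a purely hypothetical sketch of this structure, the code below computes the forward pass of a tiny network: two inputs feed a hidden layer of three artificial neurons, which feed one output neuron. The weights here are fixed by hand only to show the interconnected computation; in a real NN they would be learned from data.

```python
import numpy as np

def sigmoid(z):
    # Nonlinear activation applied at each artificial neuron
    return 1.0 / (1.0 + np.exp(-z))

# Hand-set weights and biases (in practice these are learned from data)
W1 = np.array([[0.2, -0.4],
               [0.7,  0.1],
               [-0.5, 0.3]])        # hidden-layer weights (3 neurons x 2 inputs)
b1 = np.array([0.1, -0.2, 0.05])    # hidden-layer biases
W2 = np.array([0.6, -0.3, 0.8])     # output-neuron weights
b2 = 0.1                            # output-neuron bias

x = np.array([1.5, -0.7])           # example input vector
hidden = sigmoid(W1 @ x + b1)       # activations of the hidden neurons
output = sigmoid(W2 @ hidden + b2)  # network output
print("Network output:", output)
```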

Deep Learning [7]

Falling hardware prices and the development of graphics processing units (GPUs) for personal use in the last few years have contributed to the development of the concept of deep learning, which consists of multiple hidden layers in an ANN. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
Note: Neural networks use machine learning algorithms extensively, whereas machine learning is an application of artificial intelligence that automates analytical model building by using algorithms that iteratively learn from data without being explicitly programmed [1].

Prescriptive Analytics and Tools of Prescriptive Analytics
Prescriptive analytics tools are used to optimize certain business processes and comprise a number of different tools that depend on the specific application area. Some of these tools are explained here.
Prescriptive analytics is concerned with the optimal allocation of resources in an organization. A number of operations research and management science tools have been applied to allocate limited resources in the most effective way. The operations management tools derived from management science and industrial engineering, including simulation tools, have also been used to study different types of manufacturing and service organizations. These are proven tools and techniques for studying and understanding the operations and processes of organizations. In addition, operations management has wide applications in analytics. The tools of operations management can be divided into mainly three areas: (a) planning, (b) analysis, and (c) control tools. The analysis part is the prescriptive analytics part, which uses operations research, management science, and simulation. The control part is used to monitor and control product and service quality. The prescriptive analytics models are shown in Figure 1.5.

Figure 1.5  Prescriptive analytics tools

Figure 1.6 outlines the tools of descriptive, predictive, and prescriptive analytics together. This flow chart is helpful in outlining the differences among and details of the tools for each type of analytics. The flow chart in Figure 1.6 shows the vast areas of business analytics (BA) that come under the umbrella of business intelligence (BI).

Figure 1.6  Descriptive, predictive, and prescriptive analytics tools
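As a small illustration of a linear optimization model of the kind used in prescriptive analytics, the sketch below maximizes profit for a hypothetical two-product mix subject to labor and material constraints. It assumes Python with scipy installed; because scipy's linprog minimizes, the profit coefficients are negated.

```python
from scipy.optimize import linprog

# Hypothetical product mix: maximize profit 40*x1 + 30*x2
c = [-40, -30]   # negated, since linprog minimizes

# Constraints: labor    2*x1 + 1*x2 <= 100 hours
#              material 1*x1 + 2*x2 <= 80 units
A_ub = [[2, 1], [1, 2]]
b_ub = [100, 80]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Optimal production plan (x1, x2):", res.x)
print("Maximum profit:", -res.fun)
```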

Analytical Models and Decision Making Using Models

A major part of analytics is about solving problems using different types of models. The most commonly used models are parts of descriptive, predictive, or prescriptive analytics; some of these models are listed below and will be discussed later.

Types of Models
(i) Graphical models, (ii) quantitative models, (iii) algebraic models,
(iv) spreadsheet models, (v) simulation models, (vi) process optimization
models, and (vii) other—predictive and prescriptive models.

Applications and Implementation of Analytics

Business analytics (BA) practice deals with the extraction, exploration, and analysis of a company’s information in order to make effective and timely decisions. The information needed to make decisions is contained in the data. Companies collect enormous amounts of data that must be processed and analyzed using appropriate means to draw meaningful conclusions. Much of the analysis of data and information can be attributed to statistical analysis. In addition to the statistical tools, BA uses predictive modeling tools. Predictive modeling uses data mining techniques, including anomaly or outlier detection, techniques of classification and clustering, and different types of regression and forecasting models, to predict future business outcomes. Another set of powerful tools in analytics is prescriptive modeling tools. These include optimization and simulation tools to optimize business processes.
Although the major objective of BA is to empower companies to make
data-driven decisions, it also helps companies to automate and optimize
business processes and operations.

Summary and Application of Business Analytics Tools

• Descriptive analytics tools use statistical, graphical, and numerical methods to understand the occurrence of certain business phenomena. These simple tools of descriptive analytics are very helpful in explaining the vast amount of data collected by businesses. The quantitative, graphical, and visual tools, along with simple numerical methods, provide insights that are very helpful in data-driven, fact-based decisions.
• Predictive modeling or predictive analytics tools are used to predict future business phenomena. Predictive models have many applications in business. Some examples include spam detection in messages and fraud detection; predictive models have been used for outlier detection in data, which can point toward fraud. Other areas where predictive modeling tools have been or are being used are customer relationship management (CRM) and predicting customer behavior and buying patterns. Further applications are in the areas of engineering, management, capacity planning, change management, disaster recovery, digital security management, and city planning. One of the major applications of predictive modeling is data mining. Data mining involves exploring new patterns and relationships in the collected data.
• Data mining is a part of predictive analytics. It involves analyzing massive amounts of data. In this age of technology, businesses collect and store massive amounts of data at enormous speed every day, so it has become increasingly important to process and analyze these data to extract the useful information and patterns hidden in them. The overall goal of data mining is knowledge discovery from the data. Data mining involves (i) extracting previously unknown and potentially useful knowledge or patterns from the massive amounts of data collected and stored and (ii) exploring and analyzing these large quantities of data to discover meaningful patterns and transform the data into an understandable structure for further use. The field of data mining is rapidly growing, and statistics plays a major role in it. Data mining is also known as knowledge discovery in databases (KDD), pattern analysis, information harvesting, business intelligence, business analytics, and so on. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition.
• Prescriptive analytics tools have applications in optimizing and automating business processes. Prescriptive analytics is concerned with the optimal allocation of resources in an organization. A number of operations research and management science tools are used for allocating limited resources in the most effective way. The common prescriptive analytics tools are linear and nonlinear optimization models, including linear programming, integer programming, transportation, assignment, scheduling problems, 0-1 programming, simulation problems, and many others. Many of the operations management tools derived from management science and industrial engineering, including the simulation tools, are also part of prescriptive analytics.

Descriptive, Predictive, and Prescriptive Modeling

The first volume of this book provided the details of descriptive analytics and outlined the tools of predictive and prescriptive analytics. Predictive analytics is about predicting future business outcomes; prescriptive analytics is about optimizing certain business activities. We have explained the distinction between descriptive, predictive, and prescriptive analytics. This second volume is about predictive modeling; it provides the background and the models used in predictive modeling, with applications and cases. A complete treatment of the topics used in predictive and prescriptive analytics is not possible in one concise volume; therefore, this Volume II focuses on predictive modeling.

Summary
Business analytics (BA) uses data, statistical analysis, mathematical and statistical modeling, data mining, and advanced analytics tools, including forecasting and simulation, to explore, investigate, and understand business performance. Through data, BA helps to gain insight and drive business planning and decisions. The tools of BA focus on understanding business performance based on the data and on a number of models derived from statistics, management science, and different types of analytics tools.
BA helps companies to make informed business decisions and can be used to automate and optimize business processes. Data-driven companies treat their data as a corporate asset and leverage it for competitive advantage. Successful business analytics depends on data quality and on skilled analysts who understand the technologies. BA is an organizational commitment to data-driven decision making.
This chapter provided an overview of the field of BA. The tools of BA, including descriptive, predictive, and prescriptive analytics along with advanced analytics tools, were discussed. This chapter also introduced a number of terms related to and used in conjunction with BA. Flow diagrams outlining the tools of each of the descriptive, predictive, and prescriptive analytics were presented. This second volume of the business analytics book is a continuation of the first volume. A preview of this second volume, entitled Business Analytics: A Data-Driven Decision-Making Approach for Business, Volume II, is provided in this chapter.

Glossary of Terms Related to Analytics


Big Data A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications [Wikipedia]. Most businesses collect and analyze massive amounts of data, referred to as big data, using specially designed big data software and data analytics. Big data analysis is an integral part of business analytics.
Big Data Definition (per O’Reilly Media): Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or does not fit the structures of your database architectures. To gain value from these data, one must choose an alternative way to process them.

Gartner is credited with the three “Vs” of big data. Gartner’s definition of big data is as follows: high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Gartner is referring to the size of data (large volume), the speed with which the data is being generated (velocity), and the different types of data (variety), and this seems to align with the combined definitions of Wikipedia and O’Reilly Media.
Mike Gualtieri of Forrester said that the three “Vs” mentioned by Gartner are just measures of data. He insisted that the following definition is more actionable:
Big data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.
Algorithm A mathematical formula or statistical process used to analyze data.
Analytics Involves drawing insights from data, including big data. Analytics uses simple to advanced tools depending upon the objectives. Analytics may involve visual display of data (charts and graphs), descriptive statistics, making predictions, forecasting future outcomes, or optimizing business processes. A more recent term is big data analytics, which involves making inferences using very large sets of data. Thus, analytics can take different forms depending on the objectives and the decisions to be made: descriptive, predictive, or prescriptive analytics. These are briefly described here.
Descriptive Analytics If you are using charts and graphs or time series plots to study demand or sales patterns, or the trend of the stock market, you are using descriptive analytics. Calculating statistics from the data, such as the mean, variance, median, or percentiles, is also an example of descriptive analytics. Some recent software packages are designed to create dashboards that are useful in analyzing business outcomes; dashboards are examples of descriptive analytics. Of course, a lot more detail can be extracted from the data by plotting and performing simple analyses.
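As a minimal illustration of the descriptive measures named above, the following Python sketch computes them for a small, made-up series of monthly sales figures (all numbers are hypothetical):

```python
import statistics

# Hypothetical monthly sales figures (made-up data for illustration)
sales = [42, 55, 48, 61, 53, 57, 66, 59, 62, 70, 68, 74]

print("Mean:    ", statistics.mean(sales))
print("Median:  ", statistics.median(sales))
print("Variance:", statistics.variance(sales))   # sample variance
# Quartiles: statistics.quantiles splits the data into n equal intervals
q1, q2, q3 = statistics.quantiles(sales, n=4)
print("Quartiles:", q1, q2, q3)
```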
Predictive Analytics As the name suggests, predictive analytics is about predicting future business outcomes. It also involves forecasting demand, sales, and profits for a company. The commonly used techniques for predictive analytics are different types of regression and forecasting models. Some advanced techniques are data mining, machine learning, neural networks, and advanced statistical models. We will discuss the regression and forecasting techniques, as well as the related terms, later in this book.
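To make the idea concrete, here is a minimal sketch of a simple (one-variable) regression model, the most basic of the prediction models listed above; the advertising-spend and sales numbers are made up for illustration:

```python
import numpy as np

# Hypothetical data: advertising spend (x) vs. sales (y), made-up numbers
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit a simple linear regression y = b0 + b1*x by least squares
b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
print(f"Fitted model: y = {b0:.2f} + {b1:.2f}x")

# Predict sales for a new advertising spend of 7 (same made-up units)
print("Prediction at x = 7:", b0 + b1 * 7)
```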
Prescriptive Analytics Prescriptive analytics involves analyzing the results of predictive analytics and "prescribing" the best option to target in order to minimize or maximize the objective(s). It builds on predictive analytics and often suggests the best course of action, leading to the best possible solution. It is about optimizing (maximizing or minimizing) an objective function. The tools of prescriptive analytics are now used with big data to make data-driven decisions by selecting the best course of action involving multicriteria decision variables. Some examples of prescriptive analytics models are linear and nonlinear optimization models and different types of simulations.
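As a minimal sketch of a linear optimization model of the kind listed above, consider a hypothetical two-product production planning problem (all profit and resource figures are made-up assumptions), solved here with SciPy's linprog:

```python
from scipy.optimize import linprog

# Maximize profit 30*x1 + 20*x2 subject to resource limits (made-up problem).
# linprog minimizes, so the objective coefficients are negated.
c = [-30, -20]
A_ub = [[2, 1],    # machine hours: 2*x1 + 1*x2 <= 100
        [1, 3]]    # labor hours:   1*x1 + 3*x2 <= 90
b_ub = [100, 90]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Optimal production plan:", res.x)
print("Maximum profit:", -res.fun)
```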
Data Mining Data mining involves finding meaningful patterns and deriving
insights from large data sets. It is closely related to analytics. Data mining uses
statistics, machine learning, and artificial intelligence techniques to derive mean-
ingful patterns.
Analytical Models The most commonly used models that are part of descriptive, predictive, or prescriptive analytics are graphical models, quantitative models, algebraic models, spreadsheet models, simulation models, process models, and other analytical models, both predictive and prescriptive.
IoT Stands for the Internet of Things. It means the interconnection of computing devices embedded in everyday objects (sensors, cars, fridges, etc.) via the Internet, with capabilities of sending or receiving data. The devices in the IoT generate huge amounts of data, providing opportunities for big data applications and data analytics.
Machine Learning Machine learning is a method of designing systems that can learn, adjust, and improve based on the data fed to them. Machine learning works using predictive and statistical algorithms provided to these machines. The algorithms are designed to learn and improve as more data flow through the system. Fraud detection, e-mail spam filtering, and GPS systems are some examples of machine learning applications.
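As a minimal sketch of the idea, using the e-mail spam example above: the snippet below trains a classifier on a few hand-labeled examples. The two features and all numbers are hypothetical, and a real spam filter would use far richer features:

```python
from sklearn.linear_model import LogisticRegression

# Toy training data (made-up): [number of links, number of "free" mentions]
X = [[0, 0], [1, 0], [0, 1], [5, 3], [7, 2], [6, 4]]
y = [0, 0, 0, 1, 1, 1]   # 0 = legitimate e-mail, 1 = spam

model = LogisticRegression()
model.fit(X, y)   # "learning" = fitting the algorithm to labeled data

# Retraining on new labeled data is how such a system improves over time.
print(model.predict([[4, 3], [0, 1]]))   # likely [1, 0]
```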
R “R” is a programming language for statistical computing. It is one of the popu-
lar languages in data science.
Structured vs. Unstructured Data These terms relate to the "volume" and "variety"—the "Vs"—of big data. Structured data is data that can be stored in relational databases; this type of data can be analyzed and organized in such a way that it can be related to other data via tables. Unstructured data cannot be directly put into databases or analyzed or organized directly. Some examples are e-mail/text messages, social media posts, and recorded human speech.
CHAPTER 2

Business Analytics and Business Intelligence

Chapter Highlights
• Business Analytics and Business Intelligence—Overview
• Types of Business Analytics and Their Objectives
• Input to Business Analytics, Types of Business Analytics, and
Their Purpose
• Business Intelligence and Business Analytics: Differences
• Business Intelligence and Business Analytics: A Comparison
• Summary

Business Analytics and Business Intelligence—Overview
The terms analytics, business analytics (BA), and business intelligence (BI)
are used interchangeably in the literature and are related to each other.
Analytics is a more general term and is about analyzing the data using
data visualization and statistical modeling to help companies make ef-
fective business decisions. The tools used in analytics, BA, and BI often
overlap. The overall analytics process includes descriptive analytics, in-
volving processing and analyzing big data, applying statistical techniques
(numerical methods of describing data, such as measures of central ten-
dency, measures of variation, etc.), and statistical modeling to describe the
data. Analytics also uses predictive analytics methods, such as regression,
forecasting, data mining, and prescriptive analytics tools of management
science and operations research. All these tools help businesses in making
informed business decisions. The analytics tools are also critical in auto-
mating and optimizing business processes.
The types of analytics are divided into different categories. According to the Institute for Operations Research and the Management Sciences (INFORMS)—www.informs.org—the field of analytics is divided into three broad categories: descriptive, predictive, and prescriptive. We discussed each of the three categories along with the tools used in each one.
The tools used in analytics may overlap and the use of one or the other
type of analytics depends on the applications. A firm may use only the
descriptive analytics tools or a combination of descriptive and predictive
analytics depending upon the types of applications, analyses, and deci-
sions they encounter.

Types of Business Analytics and Their Objectives


The term business analytics (BA) involves the modeling and analysis of business data. BA is a powerful and complex field that incorporates wide application areas: descriptive analytics, including data visualization and statistical analysis and modeling; predictive analytics, including text and speech analytics and web analytics; prescriptive analytics, including optimization models and simulation; decision processes; and much more. Table 2.1 briefly describes the objectives of each type of analytics.

Table 2.1 Objective of each type of analytics

Descriptive: Uses graphical and numerical methods to describe the data. The tools of descriptive analytics are helpful in understanding the data, identifying trends or patterns in the data, and making sense of the data contained in the databases of companies.

Predictive: Predictive analytics is the application of predictive models that are used to predict future trends.

Prescriptive: Prescriptive analytics is concerned with the optimal allocation of resources in an organization using a number of operations research, management science, and simulation tools.

Input to Business Analytics, Types of Business Analytics, and Their Purpose
The flow chart in Figure 2.1 shows the overall business analytics (BA)
process. It shows the inputs to the process that mainly consist of business
intelligence (BI) reports, business database, and cloud data repository.

Figure 2.1 Input to the business analytics process, types of analytics, and description of tools in each type of analytics

Figure 2.1 lists the purpose of each type of analytics—descriptive, predictive, and prescriptive—and the problems they attempt to address, outlined below the top input row. For each type of BA, the analyses performed and a brief description of the tools are also presented.

Tools of Each Type of Analytics and Their Objectives


A summary of the tools used in each type of analytics and their objectives is given in Tables 2.2, 2.3, and 2.4. The tables also outline the questions each type of analytics tries to answer.
The three types of analytics are interdependent and overlap in their applications; the tools are sometimes used in combination. Figure 2.2 shows the interdependence of the tools used in analytics.

Table 2.2 Descriptive analytics, questions it attempts to answer, and its tools

Descriptive analytics attempts to answer: How can we understand the occurrence of certain business phenomena or outcomes and explain:
• Why did something happen?
• Will it happen again?
• What will happen if we make changes to some of the inputs?
• What is the data telling us that we were not able to see before?
• Using data, how can we visualize and explore what has been happening and the possible reasons for the occurrence of certain phenomena?

Tools of descriptive analytics:
• Concepts of data, types of data, data quality, and measurement scales for data.
• Data visualization tools—graphs and charts, along with some newly developed graphical tools such as bullet graphs, tree maps, and data dashboards (used to display multiple views of the business data graphically); big data visualization and analysis.
• Descriptive statistics, including measures of central tendency, measures of position, measures of variation, and measures of shape.
• The relationship between two variables—the covariance and the correlation coefficient.
• Other tools of descriptive analytics helpful in understanding the data, identifying trends or patterns in the data, and making sense of the data contained in the databases of companies: databases, data warehouses, web search and query, and big data applications.

Table 2.3 Predictive analytics, questions it attempts to answer, and its tools

Predictive analytics attempts to answer:
• How can the trends and patterns identified in the data be used to predict future business outcome(s)?
• How can we identify appropriate prediction models?
• How can the models be used to make predictions about how things will turn out in the future—what will happen in the future?
• How can we predict the future trends of the key performance indicators using past data and models?

Tools of predictive analytics:
• Regression models, including: (a) simple regression models; (b) multiple regression models; (c) nonlinear regression models, including quadratic or second-order models and polynomial regression models; (d) regression models with indicator or qualitative independent variables; and (e) regression models with interaction terms, or interaction models.
• Forecasting techniques. Widely used predictive models involve a class of time series analysis and forecasting models. Commonly used forecasting models are regression-based models that use regression analysis to forecast future trends. Other time series forecasting models are the simple moving average, moving average with trend, exponential smoothing, exponential smoothing with trend, and forecasting of seasonal data.
• Analysis of variance (ANOVA) and design of experiments techniques.
• Data mining techniques—used to extract useful information from huge amounts of data, known as knowledge discovery from databases (KDD), using predictive data mining algorithms, software, and mathematical and statistical tools.
• Prerequisites for predictive modeling: (a) probability and probability distributions and their role in decision making; (b) sampling and inference procedures; (c) estimation and confidence intervals; (d) hypothesis testing/inference procedures for one and two population parameters; and (e) chi-square and nonparametric tests.
• Other tools of predictive analytics: machine learning, artificial intelligence, neural networks, and deep learning (discussed later).
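Two of the forecasting models named in the table can be sketched in a few lines. The following Python example, on made-up demand data, computes a simple moving-average forecast and a simple exponential-smoothing forecast, F(t+1) = alpha*A(t) + (1 - alpha)*F(t); the smoothing constant alpha = 0.3 is an arbitrary choice for illustration:

```python
# Made-up weekly demand history
demand = [120, 132, 128, 141, 150, 147, 160]

# Simple moving average: forecast = mean of the last k observations
k = 3
sma_forecast = sum(demand[-k:]) / k

# Simple exponential smoothing: F(t+1) = alpha*A(t) + (1 - alpha)*F(t)
alpha = 0.3
forecast = demand[0]                 # initialize with the first actual value
for actual in demand[1:]:
    forecast = alpha * actual + (1 - alpha) * forecast

print(f"Moving-average forecast:        {sma_forecast:.1f}")
print(f"Exponential-smoothing forecast: {forecast:.1f}")
```
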
Table 2.4 Prescriptive analytics, questions it attempts to answer, and its tools

Prescriptive analytics attempts to answer:
• How can we optimally allocate resources in an organization?
• How can linear and nonlinear optimization and simulation tools be used to optimize business processes and allocate resources optimally?

Tools of prescriptive analytics (a number of operations research and management science tools):
• Operations management tools derived from management science and industrial engineering, including simulation tools.
• Linear and nonlinear optimization models.
• Linear programming, integer linear programming, simulation models, decision analysis models, and spreadsheet models.
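As a small illustration of the simulation models listed above, the following Monte Carlo sketch estimates the profit distribution of a hypothetical product under uncertain demand and unit cost; every figure (price, demand distribution, cost range, fixed cost) is a made-up assumption:

```python
import random

# What-if simulation: profit = (price - unit cost) * demand - fixed cost
random.seed(42)
profits = []
for _ in range(10_000):
    demand = random.gauss(1000, 150)       # uncertain demand (made-up)
    unit_cost = random.uniform(6.0, 8.0)   # uncertain unit cost (made-up)
    profit = (12.0 - unit_cost) * demand - 3000
    profits.append(profit)

avg = sum(profits) / len(profits)
prob_loss = sum(p < 0 for p in profits) / len(profits)
print(f"Expected profit: {avg:,.0f}")
print(f"Probability of a loss: {prob_loss:.1%}")
```
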

Figure 2.2 Interconnection between the tools of different types of analytics

Business Intelligence and Business Analytics: Differences


Business intelligence (BI) and business analytics (BA) are sometimes used
interchangeably, but there are alternate definitions.[14] One definition
contrasts the two, stating that the term business intelligence refers to col-
lecting business data to find information primarily through asking ques-
tions, reporting, and online analytical processes (OLAPs). BA, on the
other hand, uses statistical and quantitative tools and models for explana-
tory, predictive, and prescriptive modeling.[15]
BI programs can also incorporate forms of analytics, such as data min-
ing, advanced predictive analytics, text mining, statistical analysis, and big
data analytics. In many cases, advanced analytics projects are conducted
and managed by separate teams of data scientists, statisticians, predic-
tive modelers, and other skilled analytics professionals, whereas BI teams
oversee more straightforward querying and analysis of business data.
Thus, it can be argued that BI is the "descriptive" part of data analysis, whereas BA means BI plus the predictive and prescriptive elements, and all the visualization tools and extra bits and pieces that make up the way we handle, interpret, visualize, and analyze data. Figure 2.3 shows the broad area of BI, which comprises BA, advanced analytics, and data analytics.

Figure 2.3  The broad area of business intelligence (BI)

Business Intelligence and Business Analytics: A Comparison
The flow chart in Figure 2.4 compares business intelligence (BI) with business analytics (BA). The overall objectives and functions of a BI program are outlined. BI originated from reporting but later emerged as an overall business improvement process that provides the current state of the business. The information about what went wrong or what is happening in the business provides opportunities for improvement.

Figure 2.4 Comparing business intelligence (BI) and business analytics (BA)

BI may be seen as the descriptive part of data analysis, but when combined with other areas of analytics—predictive, advanced, and data analytics—it provides a powerful combination of tools. These tools enable analysts and data scientists to look into the business data and the current state of the business, and to make use of predictive, prescriptive, and data analytics tools, as well as the powerful tools of data mining, to guide an organization in business planning, predicting future outcomes, and making effective data-driven decisions.
The flow chart in Figure 2.4 also outlines the purpose of a BA program and briefly mentions the tools and objectives of BA. The different types of analytics and their tools were discussed earlier and are shown in Tables 2.2 through 2.4.
The terms business analytics (BA) and business intelligence (BI) are used interchangeably, and often the tools are combined and referred to as a business analytics or business intelligence program. Figure 2.5 shows the tools of BI and BA. Note that the tools overlap in the two areas; some of these tools are common to both.

Figure 2.5 Business intelligence (BI) and business analytics (BA) tools

Summary
This chapter provided an overview of business analytics (BA) and business intelligence (BI) and outlined the similarities and differences between them. BA, the different types of analytics—descriptive, predictive, and prescriptive—and the overall analytics process were explained using a flow diagram. The input to the analytics process and the types of questions each type of analytics attempts to answer, along with their tools, were discussed in detail. The chapter also discussed BI and a comparison between BA and BI. The different tools used in each type of analytics—descriptive, predictive, and prescriptive—and their relationships were described. The tools of analytics overlap in applications, and in many cases a combination of these tools is used. The interconnection between the different types of analytics tools was explained. Finally, a comparison between BI and BA was presented. BA, data analytics, and advanced analytics fall under the broad area of BI. The broad scope of BI and the distinction between the BI and BA tools were outlined.
CHAPTER 3

Analytics, Business Analytics, Data Analytics, and How They Fit into the Broad Umbrella of Business Intelligence

Chapter Highlights
• Introduction: Analytics, Business Analytics, and Data Analytics
  ◦ Analytics
  ◦ Business Analytics
  ◦ Data Analysis and Analytics
  ◦ Requirements of Data Analytics
  ◦ Data and Data Quality
  ◦ Tools and Applications of Data Analytics
• Business Intelligence—Defined
• Origin of Business Intelligence
• How Does Business Intelligence Fit into Overall Analytics?
• Business Intelligence and Support Systems
• Applications of Business Intelligence
• Tools of Business Intelligence
• BI Functions and Applications Explained
  ◦ Reporting
  ◦ Online Analytical Processing (OLAP)
  ◦ Business Process Management
• More Application Areas of Analytics
  ◦ Data Mining
  ◦ Process Mining
  ◦ Business Performance Management
  ◦ Text Mining/Text Analytics
• Application Areas of Analytics
• Analytics as Applied to Different Areas
  ◦ Supply Chain Analytics
  ◦ Web Analytics
  ◦ Marketing Analytics
  ◦ Human Resource Analytics
  ◦ Financial Analytics
• Advanced Analytics
• BI Programs in Companies
• Specific Areas of BI Applications in an Enterprise
• Success Factors for BI Applications
• Comparing BI with BA
• Difference between BA and BI
• Glossary of Terms Related to Business Intelligence
• Summary

Introduction: Analytics, Business Analytics, and Data Analytics

In this chapter, we discuss analytics, business analytics (BA), and data analytics (DA) as decision-making tools in business today. Although these terms are used interchangeably, there are slight differences in the tools they use. BA tools, combined with applications in computer science and information technology, are critical for the success of BA in a company. This chapter discusses the broad meaning of the terms analytics, BA, DA, and business intelligence (BI) and how they are used in business decision making.

Analytics

Analytics is the science of analysis—the processes by which we analyze and interpret data, draw conclusions, and make decisions. Business analytics (BA) goes beyond simply presenting data, creating visuals, crunching numbers, and computing statistics. The essence of analytics lies in the application—making sense of the data using prescribed statistical methods, tools, and logic to draw meaningful conclusions from the data. It uses logic, learning, intelligence, and mental models that enable us to reason, organize, analyze, solve problems, understand the data, learn, and make data-driven decisions.

Business Analytics

Business analytics (BA) covers a vast area. It is a complex field that en-
compasses visualization, statistics, statistical analysis, and modeling. It
uses descriptive, predictive, and prescriptive analytics, including text and
speech analytics, web analytics, decision processes, and much more.

Data Analysis and Data Analytics

Data analysis is the process of systematically applying statistical techniques to collect, describe, condense, illustrate, analyze, interpret, and evaluate data. It is a process of summarizing, organizing, and communicating information using a number of graphical tools, including histograms, stem-and-leaf plots, box plots, and distribution charts, and statistical inferential tools. The methods of displaying data using charts and graphs vary widely, including those for displaying big data, which often use dashboards.
Data analytics (DA) is about drawing conclusions by examining and analyzing data sets. It uses specialized systems and software. DA techniques are widely used in industry to enable organizations to make more-informed, data-driven business decisions.
DA is about extracting meaning from raw data using specialized computer systems and software that organize, transform, and model the data to draw conclusions and identify patterns. It is all about running the business in a better way, making informed data-driven decisions (rather than decisions based on assumptions), and improving market share and profitability.
Today, DA is often associated with the analysis of large volumes of data and/or high-velocity data, which presents unique data preparation, handling, and computational challenges. DA professionals have expertise in statistics and statistical modeling and in using data analysis and big data software. Skilled DA professionals are called data scientists.
DA techniques have wide applications in research, medicine, and the other areas listed below. The techniques are used to draw inferences and to prove or disprove theories and hypotheses. Some of the areas where analytics techniques are being used are marketing (gaining customer insights), retail solutions, digital marketing, Internet security, manufacturing and supply chain analytics, science and medicine, engineering, risk management, and financial analysis.

Requirements of Data Analytics

Data analytics (DA) also involves cleansing, organizing, presenting, transforming, and modeling data to gain insight and discover useful information. One of the most important requirements and criteria of DA is data quality.

Prerequisites to Data Analytics: Data Preparation for Data Analytics

Before the data can be used effectively for analysis, the following data preparation steps are essential:

1. Data cleansing
2. Scripting
3. Data transformation
4. Data warehousing
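As a minimal, hypothetical sketch of steps 1 and 3 (data cleansing and transformation) using the pandas library, assuming a small made-up customer table:

```python
import pandas as pd

# Raw customer records with typical quality problems (made-up data)
raw = pd.DataFrame({
    "customer": ["Alice", "Bob", "Bob", "Carol"],
    "region":   ["west", "EAST", "EAST", None],
    "sales":    ["1200", "950", "950", "1100"],
})

clean = (
    raw.drop_duplicates()                                   # remove duplicates
       .dropna(subset=["region"])                           # drop missing region
       .assign(region=lambda d: d["region"].str.lower(),    # standardize text
               sales=lambda d: pd.to_numeric(d["sales"]))   # cast to numbers
)
print(clean)
```
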

Data and Data Quality

In data analysis and analytics, data can be viewed as information. Data are also measurements. The purpose of data analysis is to make sense of data. Data in its collected, raw form is known as raw data—data that has not yet been processed.

In data analysis, data need to be converted into a form suitable for reporting and analytics [http://searchdatamanagement.techtarget.com/definition/data]. It is acceptable for data to be used as a singular subject or a plural subject. Raw data is a term used to describe data in its most basic digital format.
Data quality is affected by the way data are collected, entered into the system, stored, and managed. Efficient and accurate storage (data warehousing), cleansing, and data transformation are critical for assuring data quality. Aspects of data quality include [http://searchdatamanagement.techtarget.com/definition/data-quality]:

• Accuracy
• Completeness
• Update status
• Relevance
• Consistency across data sources
• Reliability
• Appropriate presentation
• Accessibility

Within an organization, acceptable data quality is crucial to operational and transactional processes. The abovementioned aspects are prerequisites to data analytics.

Tools and Applications of Data Analytics

Data analytics (DA) predominantly refers to an assortment of applications, including basic business intelligence (BI), reporting and online analytical processing (OLAP), and various forms of advanced analytics. It is similar in nature to BA—which is about analyzing data to make informed data-driven decisions relating to business applications—but DA has a broader focus. This expansive view of the term isn't universal, though. In some cases, people use DA specifically to mean analytics and advanced analytics, treating BI as a separate category. We will provide more details of BI in Chapter 4.
Overall, DA applications include extracting data from databases and data warehouses; querying; reporting; online analytical processing (OLAP); advanced analytics; process mining; business performance management; benchmarking; text mining; mobile and real-time BI; cloud-based applications; and tools for reporting and displaying business metrics and key performance indicators (KPIs). In addition, the applications of BI include the tools of predictive and prescriptive analytics. These terms are explained below.

Business Intelligence—Defined
According to David Loshin, business intelligence (BI) is "…the processes, technologies and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business actions."
According to Larissa Moss, BI is "… an architecture and a collection of integrated operational as well as decision-support applications and databases that provide the business community easy access to business data."
BI is a technology-driven process for processing and analyzing data to
make sense from huge quantities of data that businesses collect and obtain
from various sources. In a broad sense, BI is both visualization and ana-
lytics. The purpose of visualization or graphic presentation of data is to
obtain meaningful and useful information to help management, business
managers, and other end-users make more-informed business decisions.
BI uses a wide variety of tools, applications, and methodologies that en-
able organizations to collect data from internal systems and processes as
well as external sources. The collected data may be both structured and
unstructured. The first challenge is to prepare the data to run queries,
perform analysis, and create reports.
One of the major tasks is to create dashboards and other forms of data visualizations and make the analysis results available to corporate decision makers as well as the managers and others involved in the decision-making process [http://searchbusinessanalytics.techtarget.com/definition/business-intelligence-BI] [9,10].

Origin of Business Intelligence


Business intelligence (BI) has evolved from business reporting, which involves the reporting of operational and financial data by a business enterprise. Reports are generated using powerful and easy-to-use data analysis (DA) tools that range from simple to complex. With the advancement of technology and computing power, visuals and data dashboards are commonly used in business reporting.
The BI tools, technologies, and technical architectures are used in the collection, analysis, presentation, and dissemination of business information. The analysis of business data provides historical as well as current and future views of business performance. Specialized data analysis software is now available that is capable of processing and analyzing big data. It can create multiple views of the business performance in the form of dashboards, which are extremely helpful in displaying current business performance. Big data software is now being used to analyze vast amounts of data and is extremely helpful in the decision-making process. Besides data visualization, a number of the models described earlier are used to predict and optimize future business outcomes.

How Does Business Intelligence Fit into Overall Analytics?
Business intelligence (BI) combines a broad set of data analysis applica-
tions. It includes applications such as querying, enterprise reporting,
online analytical processing (OLAP), mobile BI, real-time BI, oper-
ational BI, and cloud applications. BI technology also includes data
visualization software for designing charts and other infographics, as
well as tools for building BI dashboards and performance scorecards
that display visualized data on business metrics and key performance
indicators in an easy-to-grasp way. BI applications can be bought sep-
arately from different vendors or as part of a unified BI platform from
a single vendor.
BI programs can also incorporate forms of advanced analytics such
as predictive analytics, data mining, text mining, statistical analysis, and
big data analytics. In many cases, advanced analytics projects are con-
ducted and managed by separate teams of data scientists, statisticians,
predictive modelers, and other skilled analytics professionals, whereas
BI teams oversee more straightforward querying and analysis of busi-
ness data.

Business Intelligence and Support Systems


Business intelligence (BI) uses a set of techniques, algorithms, and tools
to analyze raw data from multiple sources in a business. BI tools and
methods are helpful in getting insights using data analysis that enables
managers to make fact-based decisions.
The overall objectives of BI and BA programs are similar. Companies are adopting these programs because they are critical in driving business decisions based on current data. In a broad sense, BI incorporates the tools of BA, including descriptive, predictive, and prescriptive analytics, that help companies make data-driven business decisions.
BI comprises the processes, technologies, strategies, and analytical methods that transform data or information into knowledge that is critical to driving business decisions. These tools, methods, and technologies help companies run business operations and make fact-based, data-driven decisions.
In 1989, Howard Dresner (later a Gartner analyst) proposed “business
intelligence” as an umbrella term to describe “concepts and methods to
improve business decision making by using fact-based support systems”
[10–12]. It was not until the late 1990s that this usage was widespread.
Figure 3.1 shows the broad areas the BI comprises.

Figure 3.1  Business intelligence and support systems

Applications of Business Intelligence


Business intelligence (BI) applications use software for data visualization, dashboards, infographics, and big data analysis. These tools fall under descriptive analytics and statistical modeling.

BI also uses a broad range of applications, including statistical analysis tools (descriptive statistics, inferential statistics, and advanced statistical applications).

Tools of Business Intelligence


In addition to the above applications and reporting, BI uses the tools
of predictive and prescriptive analytics as well as advanced analytics tools
that include data mining applications. Data mining applications and the
growing applications of machine learning and artificial intelligence also
come under the broad category of BI. Big data analytics and a number of
optimization and advanced modeling that come under prescriptive ana-
lytics are also application areas of BI.
BI technologies are capable of handling large amounts of structured and sometimes unstructured data. The tools of BI are designed for the analysis and interpretation of massive amounts of data (big data) that help businesses identify potential problems and opportunities for improvement and develop strategic business plans. These are critical for businesses to maintain a competitive edge and long-term stability and to improve market share and profitability. BI technologies provide historical, current, and predictive views of business operations. Common functions and applications of BI technologies, along with the different types of analytics in use, are shown in Figure 3.2. These applications are helpful in developing and creating new strategic business opportunities.

Figure 3.2  Functions and application areas of business intelligence (BI)


Business Intelligence Functions and Applications Explained

The major areas and functions of BI in Figure 3.2 are explained below.

Reporting

Business reporting or enterprise reporting is "the reporting of operational and financial data by a business enterprise." It is about providing information to decision makers within an organization to aid in business decisions.

Online Analytical Processing

Online analytical processing (OLAP) is a reporting tool and is part of the broader category of BI. Applications of OLAP include multidimensional analytical (MDA) queries in computing and reporting based on relational databases and data mining.
The applications of OLAP also include business reporting for sales, marketing, management reporting, business process management (BPM) [3], budgeting, forecasting, financial reporting, and similar areas. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP).

Business Process Management

Business process management (BPM) uses operations management tools to manage the internal operations of an organization. It is about the management of the production and service systems of a company, using various tools and techniques to plan, model, analyze, improve, optimize, control, and automate business processes.
The objective of operations management in a company is to convert inputs (raw materials, labor, and energy) into outputs (useful products and/or services) using different types of transformation processes.
The tools of operations management are used to allocate limited resources in the most effective way. In services, operations management is used to manage and run banking systems and hospitals, to name a few. Operations management has wide applications in strategy, management, supply chains, marketing, finance, human resources, and production. It has three broad categories—planning, analysis, and control of business operations—and uses a number of tools in each of these phases. It uses forecasting, capacity planning, aggregate planning, materials requirement planning (MRP), product and process designs, strategy, and a number of analysis tools, including scheduling, resource allocation, project management, quality and Lean Six Sigma, and others.
The major objective of BPM is improving business performance by managing processes. A number of software tools and technologies are used in BPM. As an approach, BPM views processes as important assets of an organization that must be capable of delivering value-added products and services to clients or customers. This approach closely resembles the Lean Six Sigma and total quality management methodologies, the objective of which is the removal of waste and defects from any process, service, or manufacturing operation [13].

More Application Areas of Analytics


Data Mining

Data mining involves exploring new patterns and relationships in collected data. Data mining is a part of predictive analytics. It involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the huge amounts of data businesses collect. Data mining techniques are used in (i) extracting previously unknown and potentially useful knowledge or patterns from massive amounts of collected and stored data and (ii) exploring and analyzing these large quantities of data to discover meaningful patterns and transforming the data into an understandable structure for further use. The field of data mining is rapidly growing, and statistics plays a major role in it. Data mining is also known as knowledge discovery in databases (KDD), pattern analysis, information harvesting, BI, BA, and so on. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition.
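As one small, hypothetical illustration of pattern discovery, the following sketch uses k-means clustering (one of many data mining techniques) to group customers into segments; the customer figures are made up:

```python
from sklearn.cluster import KMeans
import numpy as np

# Made-up customer data: [annual spend, visits per month]
customers = np.array([[200, 2], [220, 3], [210, 2],
                      [950, 12], [1000, 15], [980, 14]])

# Group customers into two segments based on similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("Segment labels:", kmeans.labels_)
print("Segment centers:\n", kmeans.cluster_centers_)
```
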

Process Mining

Process mining is a process management [14] technique used to analyze business processes. The purpose is to create a better understanding of the processes so that improvement efforts can be directed at improving process efficiency. Process mining is also known as automated business process discovery (ABPD).

Business Performance Management [14]

Business performance management is the management of an organization's performance and comes under the broad category of business process management (BPM). It consists of the activities that ensure the set goals of the company are consistently met in an effective and efficient manner. The term is used synonymously with corporate performance management (CPM) and enterprise performance management.
The major activities of business performance management are: (1) identification and selection of an organization's goals, (2) a measurement process to measure the organization's progress against those goals, and (3) comparison of actual performance to the set goals and taking corrective actions to improve future performance.
Business performance management activities involve the collection and reporting of large volumes of data, which require the use of BI software to assist in the process. It is important to note that business performance management does not necessarily rely on software systems; it is a common misconception that it is a software-dependent system and that BI software is a definitive approach to business performance management.

Text Mining [16]

Text mining is also referred to as text data mining. It is somewhat similar to text analytics, which is the process of deriving high-quality information from text. This high-quality information is typically derived by identifying patterns and trends using means such as statistical pattern learning.
Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. "High quality" in text mining usually refers to some combination of relevance—how well a retrieved document or set of documents meets the information need of the user.
Typical text mining tasks include text categorization, text clustering [1],
concept/entity extraction, production of granular taxonomies, sentiment
analysis, document summarization, and entity relation modeling (i.e.,
learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study
word frequency distributions, pattern recognition, information extraction,
data mining techniques including link and association analysis, visualiza-
tion, and predictive analytics. The overall goal is to transform text into data
for analysis using natural language processing (NLP) [2] and analytical
methods.
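As a minimal illustration of the lexical analysis step mentioned above (studying word frequency distributions), using a made-up snippet of unstructured text:

```python
from collections import Counter
import re

# Made-up snippet of unstructured text
text = ("The customer praised the product. The product shipped late, "
        "but the customer still rated the product highly.")

# Tokenize into lowercase words and count frequencies
words = re.findall(r"[a-z]+", text.lower())
print(Counter(words).most_common(5))
```
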
A typical application is to scan a set of documents written in a natural language. A natural language—also known as an ordinary language—is any language that has evolved naturally in humans through use and repetition, without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing (sign language). They are distinguished from constructed and formal languages, such as those used to program computers or to study logic [17].

Text Analytics

The term text analytics describes a set of linguistic applications (linguistics being the scientific study of language [1], involving the analysis of language). Text analytics uses statistical and machine learning techniques to model and structure the information content of textual sources. The term is synonymous with text mining. Ronen Feldman modified a 2000 description of "text mining" [4] in 2004 to describe "text analytics" [5]. The latter term is now used more frequently in business settings.
The term text analytics also describes the application of text analytics to respond to business problems, whether independently or in conjunction with the query and analysis of fielded, numerical data. In general, approximately 80 percent of business-relevant information originates in unstructured form, primarily text [7]. The techniques of text analytics process, discover, and present knowledge—facts, business rules, and relationships—that is otherwise locked in textual form [16].

Purpose of Analytics [18]

Analytics, as discussed in Chapter 2, is the use of mathematical and statistical techniques, including modeling, computer programming, software applications, and operations research, to gain insight from data. Organizations may apply analytics to business data to describe, predict, and improve business performance.

Analytics as Applied to Different Areas

Figure 3.3 shows different types of analytics as applied to different applications. Analytics is known by different names, and it takes different forms when applied to different areas. The types of analytics based on applications may be supply chain analytics, retail analytics, optimization-based analytics (prescriptive analytics), marketing analytics involving marketing optimization and marketing mix modeling, web analytics, and sales analytics for sales force sizing and optimization. Analytics requires extensive computation using software applications, and the algorithms and software used for analytics draw on methods in computer science, statistics, and mathematics [1].

Figure 3.3 Types of analytics

Web analytics is one of the most commonly used forms of analytics and relates to web data. It frequently appears in the analytics literature and applications. We describe it briefly here.
Web analytics is the collection, analysis, and reporting of web data for purposes of understanding and optimizing web usage [1]. One of the major applications of web analytics is measuring web traffic. It has applications in almost every field now, including marketing, sales, finance, and advertising, and is a major tool for business and market research. It is used to estimate how traffic to a website changes when a company's advertising campaign changes, or how the web traffic compares with that of other businesses in the same class. One of the common and very useful applications of web analytics is keeping track of the number of visitors to a website, the number of pages viewed, and the time spent by visitors. It also helps monitor and manage traffic flow, popularity, customer behavior, and trends, which are useful for making future predictions. It is now one of the major tools for market research.
Web analytics is also used to manage, control, analyze, and improve websites using a wide variety of applications and analytical tools. A number of web and business applications using machine learning and artificial intelligence are now being reported, and the recent research areas in web analytics include artificial neural networks and deep learning. These tools are applied extensively to study, improve, and predict future business outcomes. Web analytics applications are also used to help companies measure the results of traditional print or broadcast advertising campaigns and to estimate how traffic to a website changes after the launch of a new product. Web analytics is not limited to the applications mentioned above; new applications and tools are emerging as a result of research and development, and the newer applications of machine learning, artificial intelligence, and neural networks are active areas of research.

Steps of Web Analytics: Most web analytics processes can be divided into four essential stages or steps [2]:

• Collection of data: This stage is the collection of the basic, elementary data. The objective is to gather the data, which can be complex and take many forms depending on the types of data collected.
• Processing of data into information: Before the data can be of any use, it must be processed.
• Developing key performance indicators: This stage focuses on using the ratios (and counts) and infusing them with business strategies, referred to as key performance indicators (KPIs). Many times, KPIs deal with conversion aspects, but not always; it depends on the organization (a small example follows this list).
• Formulating an online strategy: This stage is concerned with the online goals, objectives, and standards for the organization or business. These strategies are usually related to making money, saving money, or increasing market share.
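As a tiny illustration of the third stage, the following sketch turns raw counts into a conversion-rate KPI; the traffic numbers are made up:

```python
# Turn raw counts into a KPI (made-up traffic numbers)
visitors = 48_000   # unique visitors this month
signups = 1_920     # visitors who converted (e.g., created an account)

conversion_rate = signups / visitors
print(f"Conversion rate KPI: {conversion_rate:.2%}")   # -> 4.00%
```
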

Advanced Analytics [19]

Advanced analytics is a broad category of inquiry used to forecast future trends, often using simulation to conduct what-if analyses (simulation or risk analysis) that show the effects of potential changes in business strategies. Simulation and risk analysis are very useful decision-making tools under risk and uncertainty, and they help drive changes and improvements in business practices.
Predictive analytics, data mining, big data analytics, and machine learning applications are some of the analytical categories that fall under the heading of advanced analytics. These technologies are widely used in industries including marketing, health care, risk management, and economics.

Business Intelligence Programs in Companies


BI is most effective when it combines data derived from the market in which a company operates (external data) with data from sources internal to the business, such as financial and operations data (internal data). When combined, external and internal data provide a more complete picture, which, in effect, creates an "intelligence" that cannot be derived from any single set of data [3].
BI along with BA empowers organizations to gain a better understanding of existing markets and customer behavior. The tools of BI are being used to study markets, analyze massive amounts of data to learn about customer behavior, conduct risk analysis, assess demand and the suitability of products and services for different market segments, and predict and optimize business processes, to name a few applications [10–12].

Specific Areas of Business Intelligence Applications in an Enterprise

BI can be applied to the following business purposes in order to drive business decisions and value. BI applications are applied to:

1. Performance measurement or measurement of performance metrics—a program that creates a hierarchy of performance metrics and benchmarking that informs business leaders about progress toward business goals (business process management).
Metrics are the variables whose measured values are related to the performance of the organization. They are also known as performance metrics because they are performance indicators.
Metrics may be finance-based, focusing on the performance of the organization. Some metrics are designed to measure requirements and value (customer wants and needs, what the customer wants, satisfaction level, etc.). In manufacturing, metrics are designed to measure the quality level and other key performance indicators. In quality and Six Sigma projects, the major metrics are defects per million opportunities (DPMO), percent yield, first-pass yield, and sigma level, which are indicators of quality level and parts-per-million defects (a worked DPMO calculation is sketched at the end of this section). In project management, including Six Sigma projects, a number of metrics are measured that help plan, analyze, and control the projects. These metrics may be indicators of time, cost, resources, overall quality, and performance. In technical call centers, metrics must be designed to measure internal as well as external performance. Some of the metrics to consider may be call waiting time, dropped calls, call routing time, quality of service, service level, average time to resolve calls, and so on.
Defining the critical performance metrics is important. This must include establishing the customer requirements, identifying quantifiable process outputs and their measurement plan, and establishing targets against which the measured metrics will be compared.
2. Analytics—the types of analytics used are the core of a BI program. These are qualitative, visualization, and quantitative models used to drive business decisions and to find optimal decisions through knowledge discovery and modeling. We explained the process, purpose, and tools of analytics earlier. They involve visualization, data mining, process mining, statistical analysis, predictive analytics, predictive modeling (including machine learning and artificial intelligence applications), business process modeling, and prescriptive analytics.
3. Enterprise reporting/business reporting—a critical part of any organization. These are the means and programs that provide the infrastructure for strategic reporting to the management of the business. These reporting tools include data visualization, management information systems, OLAP, and others.
4. Collaboration/collaboration platform—a program that connects people both inside and outside the company to exchange and use data through data sharing and electronic data interchange.
5. Knowledge management—a program to make the company data-driven using strategies and practices that identify, create, distribute, and manage knowledge through learning management.

In addition to the above, BI can be designed to provide an alert system with the capability to immediately warn the end-user if certain requirements are not met. For example, if a critical business metric exceeds a predefined threshold, a warning may be issued via e-mail or another monitoring service to alert the responsible person so that corrective action may be taken. This is similar to automating the system to better manage and take timely actions.
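As the worked illustration of the DPMO metric promised under performance measurement above, the following sketch applies the standard formula, DPMO = defects / (units × opportunities) × 1,000,000, to made-up inspection data:

```python
# DPMO from made-up inspection data
defects = 57            # defects found
units = 1_500           # units inspected
opportunities = 8       # defect opportunities per unit

dpmo = defects / (units * opportunities) * 1_000_000
defect_free_rate = 1 - defects / (units * opportunities)
print(f"DPMO: {dpmo:,.0f}")                           # -> 4,750
print(f"Defect-free rate per opportunity: {defect_free_rate:.4%}")
```
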
Success Factors for Business Intelligence Implementation

According to Kimball et al., there are three critical areas that organizations should assess before getting ready to do a BI project [21]:

1. The level of commitment and sponsorship of the project from senior management
2. The level of business need for creating a BI implementation
3. The amount and quality of business data available

Comparing Business Intelligence with Business Analytics
BI and BA are sometimes used interchangeably, but there are alternate
definitions [14]. One definition contrasts the two, stating that the term
business intelligence refers to collecting business data to find information
primarily through asking questions, reporting, and online analytical pro-
cesses. BA, on the other hand, uses statistical and quantitative tools for
explanatory and predictive modeling [15].
Viewed in this context, BA is a powerful and complex field that in-
corporates wide application areas including statistical analysis, predictive
analytics, text and speech analytics, web analytics, visualization, causal
analysis, decision processes, and much more.

Where Does Business Analytics Fit in the Scope of Business Intelligence?
So what distinguishes BA from BI? Analytics is the science of analysis—
the processes by which we interpret data, draw conclusions, and make
decisions. BA goes well beyond simply presenting data and creating visu-
als, crunching numbers, and computing statistics. The essence of analyt-
ics lies in the application—making sense from the data using prescribed
methods, tools, and logic to draw meaningful conclusion from the data.
It uses logic, intelligence and mental processes that enable us to reason,
organize, plan, analyze, solve problems, understand, innovate, learn, and
make decisions.

BA covers a vast area. It is a complex field that encompasses visualization, statistics and statistical modeling, statistical analysis, predictive analytics, text and speech analytics, web analytics, decision processes, and much more.
BI combines a broad set of data analysis applications, as mentioned above. It includes applications such as querying, enterprise reporting, online analytical processing (OLAP), mobile BI, real-time BI, operational BI, and cloud applications. BI technology also includes data visualization software for designing charts and other infographics, as well as tools for building BI dashboards and performance scorecards that display visualized data on business metrics and key performance indicators in an easy-to-grasp way. BI applications can be bought separately from different vendors or as part of a unified BI platform from a single vendor.
BI programs can also incorporate forms of advanced analytics, such as data mining, predictive analytics, text mining, statistical analysis, and big data analytics. In many cases, advanced analytics projects are conducted and managed by separate teams of data scientists, statisticians, predictive modelers, and other skilled analytics professionals, whereas BI teams oversee more straightforward querying and analysis of business data.

Difference between Business Analytics and Business Intelligence [21]
Business intelligence (BI) and business analytics (BA) have somewhat
similar goals. Both of these programs are about increasing the efficiency
of the business by utilizing data analysis. The purpose of both BA and BI
is to drive the business decisions based on data. They exist to increase the
efficiency and viability of a business and are data-driven decision-making
approaches. Most businesses use the terms business intelligence and busi-
ness analytics interchangeably. If you really want to understand where
people draw the line, there are different opinions.
There is no real consensus on exactly what constitutes BI and what
constitutes BA, or where the lines are drawn. However, it seems logical to
differentiate the two in the following way. Table 3.1 outlines the differ-
ence between BI and BA.
Table 3.1 Difference between business intelligence (BI) and business analytics (BA)

BI: BI is about accessing and analyzing big data using specialized big data software and infrastructure, with powerful BI software specially designed to handle big data.
BA: BA is about predicting future trends using data analysis, statistics, and statistical and quantitative modeling and analysis.

BI: BI is more concerned with the whats and the hows than the whys. Through data exploration, it tries to answer: What happened? When? Who? How?
BA: BA tries to answer: Why did something happen? Will it happen again? What will happen if we make changes to some of the inputs? What is the data telling us that we were not able to see before?

BI: BI looks into past or historical data to better understand the business performance. It is about improving performance and creating new strategic opportunities for growth.
BA: BA uses statistical analysis and predictive modeling to establish trends, figure out why things are happening, and make an educated guess about how things will turn out in the future.

BI: Through data analysis, BI determines what has already occurred. This insight is very helpful in process improvement efforts. For example, through this analysis, you clearly see what is going well, but you also learn to recover from what went wrong.
BA: BA primarily predicts what will happen in the future. It combines advanced statistical analysis and predictive modeling to give you an idea of what to expect, so that you can anticipate developments or make changes now to improve outcomes.

BI: BI tells you what happened, or what is happening right now, in your business—it describes the situation to you, and a good BI platform describes this in real time in as much granular, forensic detail as you need. BI is the "descriptive" part of data analysis, whereas BA means BI plus the predictive element and all the extra bits and pieces that make up the way you handle, interpret, and visualize data.
BA: BA is more about the anticipated future trends of the key performance indicators—using past data and models to make predictions. This is different from reporting in BI: analytics models use the data with a view to drawing out new, useful insights to improve business planning and boost future performance. BA helps the company meet the changes and take advantage of coming developments. BA can be seen as a part of BI.

Tools of BI: BI lets you apply chosen metrics to potentially huge, structured and unstructured datasets, and covers querying, reporting, online analytical processing (OLAP), analytics, data mining, process mining, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.
Tools of BA: statistics/statistical modeling, quantitative analysis, data mining, predictive modeling/analytics, text analytics, prescriptive analytics, and other types of analytics and tools.


Is there really a difference between business analytics and business intelligence? The answer is Yes and No.
Both BI and BA have similar objectives. They are data-driven
decision-making programs. BI uses masses of raw data to learn about
what is happening to the business. If you want to learn about the cur-
rent state of the business through data analysis, reporting, application of
big data software, descriptive statistics, and data dashboards, and you are
drawing conclusions and interpretations without extensively using predic-
tive modeling tools, you will likely fall under BI rather than BA.
Both approaches are valuable and critical to decision making in their
own ways. It is important to know and understand whether your objective
is descriptive analytics or whether you want to predict and optimize business
outcomes. In the latter case, you will need predictive and prescriptive
analytics. This understanding is critical before investing in such programs.
To summarize, business intelligence (BI) is the “descriptive” part of data
analysis, whereas business analytics (BA) means BI plus the predictive and
prescriptive elements, plus all the visualization tools and extra bits and pieces
that make up the way you handle, interpret, visualize, and analyze data.
And lastly, there are those who say that there is hardly any distinction
between the two: that it does not matter what you call them as long as you
reach your goals, and that any difference, if it exists, is not worth paying
attention to. In practice, however, the two do differ in the tools they use
and in how you use those tools to meet your objectives.
The tools of BA and BI depend upon whether your data requirements are
geared more toward descriptive or predictive analytics to direct your business
in the right direction—regardless of the terminology behind the tool [22].
The potential benefits of BI and BA programs include accelerating and
improving decision making, optimizing internal business processes, increas-
ing operational efficiency, driving new revenues, and gaining competitive
advantages over business rivals. BI systems can also help companies identify
market trends and spot business problems that need to be addressed.

Summary
This chapter discussed analytics, business analytics (BA), data analyt-
ics (DA), and business intelligence (BI) as decision-making tools in
businesses today. The connection between BI and BA tools, combined
with applications in computer science and information technology, is
critical for the success of BA in a company. The chapter discussed the
concepts of BI, BA, DA, and computer and information technology
and how they are used in business decision making. It also distinguished
between BI and BA and the specific tools used in each case.
BI comprises the processes, technologies, strategies, and analytical
methods that transform data or information into knowledge that is
critical to driving business decisions. These tools, methods, and tech-
nologies help companies run business operations and make fact-based,
data-driven decisions.
The overall objectives of the BI and BA programs are similar. Compa-
nies are adopting these programs because these are critical in driving busi-
ness decisions based on the current data. In a broad sense, BI incorporates
the tools of BA, including the descriptive, predictive, and prescriptive
analytics that help companies in data-driven business decisions.
BI looks into past or historical data to better understand the busi-
ness performance. It is about improving performance and creating new
strategic opportunities for growth. BA is more about anticipated future
trends of the key performance indicators. This is about using the past
data and models to describe the data and make predictions. This is differ-
ent from the mere reporting in BI. Analytics models use the data with a
view to drawing out new, useful insights to improve business planning
and boost future performance. BA primarily predicts what will happen
in the future. It combines advanced statistical analysis and predictive and
prescriptive modeling to give you an idea of what to expect so that you
can anticipate developments or make changes now to improve outcomes.
The specific tools of BA are all types of regression models, data mining,
machine learning, and, more recently, neural networks and deep learning.

Glossary of Terms Related to Business Intelligence
Business Intelligence Dashboard. A business intelligence dashboard is a data
visualization tool that displays the current status of metrics and key performance
indicators (KPIs) for an enterprise. Dashboards consolidate and arrange
numbers, metrics, and several views of key business activities on one display;
they can also show performance scorecards on a single screen. The essential features of
a BI dashboard product include a customizable interface and the ability to pull
real-time data from multiple sources.
Metric. A metric is the measurement of a particular characteristic of a company's
performance or efficiency. Metrics are the variables whose measured values are
tied to the performance of the organization; they are also known as performance
metrics because they are performance indicators.
Metrics may be finance-based, focusing on the performance of the organization.
Other metrics are designed to measure requirements and value (customer
wants and needs, satisfaction level, etc.).
Key Performance Indicators [23]. Key performance indicators (KPIs) are busi-
ness metrics (measured key variables which are indicative of the performance of a
company). These measurements are used by corporate executives and managers
to track, analyze, and understand the factors that are critical to the success of an
organization. Effective KPIs focus on the business processes and functions that
senior management sees as most important for measuring progress toward meet-
ing strategic goals and performance targets. The metrics provide opportunities for
future improvement.
Data Warehouse. A data warehouse is a repository for all the data that an enter-
prise collects from internal and external sources. It may contain data of different
types. The data are readily used for creating analytical and visual reports through-
out the enterprise. Besides creating the reports, the stored data are used to model
and perform analytics for the different operations in an enterprise, including
sales, finance, marketing, engineering, and others. Before performing analyses on
the data, cleansing, transformation, and data quality are critical issues. Typically,
a data warehouse is housed on an enterprise mainframe server or increasingly
in the cloud. The term data warehouse was coined by William H. Inmon, who
is known as the father of data warehousing. He described a data warehouse as
being a subject-oriented, integrated, time-variant, and nonvolatile collection of
data that supports management’s decision-making process.
Data Cleansing. Data cleansing or data cleaning is the process of detecting and
correcting (or removing) corrupt or inaccurate records from a record set, table, or
database. It refers to identifying incomplete, incorrect, inaccurate, or irrelevant
data and then replacing, modifying, or deleting the corrupt data [1].
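As a small, hedged illustration of a cleansing step (the file name and column names below are hypothetical, not taken from the case data):

import pandas as pd

# Minimal data-cleansing sketch: drop duplicates, coerce types,
# and remove obviously invalid records.
df = pd.read_csv("orders.csv")                      # hypothetical file name
df = df.drop_duplicates()                           # remove duplicate records
df["order_total"] = pd.to_numeric(df["order_total"], errors="coerce")
df = df.dropna(subset=["order_total"])              # drop rows that failed conversion
df = df[df["order_total"] >= 0]                     # an order total cannot be negative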
Data Transformation. In computing, data transformation is the process of con-
verting data from one format or structure into another format or structure. It is a
fundamental aspect of most data integration and data management tasks, such as
data warehousing, data integration, and application integration. Data trans-
formation can be simple or complex based on the required changes to the data
between the source (initial) data and the target (final) data.
Data Quality. Data quality is about assuring and making the data ready for
analysis. The data must meet certain criteria before they can be used for analysis.
Business Intelligence (BI). Business intelligence looks into past or historical
data to better understand the business performance. It is about improving per-
formance and creating new strategic opportunities for growth.
Business Analytics (BA). Business analytics is about predicting future trends
using data analysis, statistical, and quantitative modeling and analysis.
CHAPTER 4

Descriptive Analytics—
Overview, Applications,
and a Case

Chapter Highlights
• Overview: Descriptive Analytics
• Descriptive Analytics—Applications—A Business
Analytics Case
• Case Study: Buying Pattern of Online Customers in a Large
Department Store
• Summary

Overview: Descriptive Analytics


Descriptive analytics tools are used to understand the occurrence of cer-
tain business phenomena or outcomes and to explain these outcomes
through graphical, quantitative, and numerical analyses. Through visual
and simple analysis, descriptive analytics explores the current per-
formance of the business and the possible reasons for the occurrence of
certain phenomena. Many of the hidden patterns and features not ap-
parent through mere examination of data can be exposed through graphi-
cal and numerical analyses. Descriptive analytics uses simple tools to
uncover many of the problems quickly and easily. The results enable us
to question many of the outcomes so that corrective actions can be taken.
Successful use and implementation of descriptive analytics require
the understanding of data, types and sources of data, data preparation
for analysis (data cleansing, transformation, and modeling), difference
between unstructured and structured data, and data quality. Graphical/
visual representation of data and graphical techniques using computers are
basic requirements of descriptive analytics. These concepts related to data,
data types, and the graphical and visual techniques are explained in detail
in this chapter. The visual techniques of descriptive analytics tools include
the commonly used graphs and charts along with some newly developed
graphical tools such as bullet graphs, tree maps, and data dashboards.
Dashboards are now becoming very popular with big data. They are used
to display the multiple views of business data graphically.
The other aspect of descriptive analytics is an understanding of simple
numerical methods, including the measures of central tendency, measures
of position, measures of variation, and measures of shape, and how
different measures and statistics are used to draw conclusions and make
decisions from the data. Some other topics of interest are the empirical
rule and the relationship between two variables: the covariance and the
correlation coefficient. The tools of descriptive analytics are helpful in
understanding the data, identifying trends or patterns in the data, and
making sense of the data contained in the databases of companies. An
understanding of databases, data warehouses, web search and query,
and big data concepts is important in extracting and applying descriptive
analytics tools. The flow chart in Figure 4.1 outlines the tools and
methods used in descriptive analytics.

Figure 4.1  Tools and methods of descriptive analytics

Descriptive Analytics Applications: A Business Analytics Case
A case analysis showing different aspects of descriptive analytics is presented
here. The case demonstrates the graphical and numerical analyses per-
formed on an online order database of a retail store and is described below.

Case Study: Buying Pattern of Online Customers in a Large Department Store
The data file “Case-Online Orders.xlsx” contains data on 500 customer
orders. The data were collected over a period of several days from the
customers placing orders online. As the orders are placed, customer infor-
mation is recorded in the database. Data on several categorical and num-
erical values are recorded. The categorical variables shown in the data file
are day of the week, time (morning, midday), payment type (credit, debit
cards, etc.), region of the country order was placed from, order volume,
sale or promotion item, free shipping offer, gender, and customer survey
rating. The quantitative variables include the order quantity and the dollar
value of the order placed, or "Total Order." Table 4.1 shows part of
the data.
The operations manager of the store wants to understand the buying
pattern of the customers by summarizing and displaying the data visu-
ally and numerically. He believes that descriptive analytics tools, including
data visualization tools, numerical methods, graphical displays, dashboards,
and tables of the collected data, can be used to gain more insight into the
online order process. They will also provide opportunities for improving
the process.
The manager hired an intern and gave her the responsibility to pre-
pare a descriptive analytics summary of the customer data using graphical
and numerical tools that can help understand the buying pattern of the
customers and help improve the online order process to attract more on-
line customers to the store.
The intern was familiar with one of the tools available in EXCEL,
the Pivot Table/Pivot Chart, which she thought could be used to extract
information from a large database. In this case, pivot tables can help
break the data down by categories so that useful insight can be obtained.
For example, this tool can create a table of orders received by geo-
graphical region or summarize the orders by the day or time of the week.
She performed analyses on the data to answer the questions and concerns
the manager expressed in the meeting. As part of the analysis, the follow-
ing graphs, tables, and numerical analyses were performed.
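For readers who prefer scripting, the same pivot-table summaries can be sketched in Python with pandas; the file and column names below follow the case description and should be treated as assumptions rather than a prescribed solution.

import pandas as pd

# Read the case data (assumes the workbook is available locally)
orders = pd.read_excel("Case-Online Orders.xlsx")

# Count of orders received on each day of the week (pivot table)
orders_by_day = orders.pivot_table(index="Day", values="Total Order",
                                   aggfunc="count")

# Count of orders by time of day (morning, midday, etc.)
orders_by_time = orders["Time"].value_counts()

# Customer survey rating broken down by gender (cross-tabulation)
ratings_by_gender = pd.crosstab(orders["Gender"],
                                orders["Customer Survey Rating"])

print(orders_by_day)
print(orders_by_time)
print(ratings_by_gender)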

1. A pivot table, a bar chart, and a pie chart of the pivot table providing
a summary of number of orders received on each day of the week were
created to visually see the orders received by the online department
on each day (Figures 4.2 and 4.3). The table and graphs show that the
maximum number of orders were received on Saturday and Sunday.
Table 4.1  Partial data: online orders (buying pattern of online customers in a large department store)
Day | Time | Payment Type | Region | Order Volume | Order Quantity | Sale/Promotion | Free Shipping | Total Order ($) | Gender | Customer Survey Rating
Mon Morning Visa North High 6 1 Yes 194.12 Male Good
Mon Morning Visa North Low 2 1 No 40.38 Male Good
Mon Morning Visa North High 7 1 Yes 270.87 Female Fair
Mon Morning Visa North Medium 4 0 No 186.88 Male Excellent
Tues Morning Visa North High 6 0 Yes 279.52 Female Good
Tues Morning Visa North High 2 1 Yes 220.30 Female Fair
Tues Morning Visa North High 7 1 No 279.57 Female Excellent
Tues Morning Visa North Medium 5 0 Yes 160.70 Male Poor
Tues Midday Visa North Medium 4 1 Yes 184.96 Male Good
Tues Midday Visa North High 8 1 Yes 205.39 Male Good
Wed Midday MasterCard North High 7 1 Yes 272.88 Male Excellent
Mon Midday Store Card North Medium 5 1 Yes 191.83 Male Excellent
Mon Midday MasterCard North High 7 1 Yes 288.94 Male Excellent
Tues Midday Store Card North High 3 0 Yes 270.75 Male Fair
Tues Midday MasterCard North High 8 0 Yes 275.27 Male Poor
Wed Afternoon Store Card North Medium 4 1 Yes 174.58 Male Good
Wed Afternoon MasterCard South Medium 4 0 Yes 152.30 Male Good
Thurs Afternoon Store Card South Medium 5 1 No 172.39 Male Fair
Wed Afternoon MasterCard South High 7 1 Yes 215.69 Male Excellent
Wed Afternoon Store Card South Low 3 0 No 80.89 Male Excellent
Thurs Afternoon MasterCard South High 8 0 Yes 184.19 Male Good
Fri Afternoon Store Card South Medium 4 1 Yes 181.28 Male Good
Fri Afternoon MasterCard South Medium 4 1 Yes 158.96 Male Poor

Fri Afternoon Store Card South Medium 4 1 Yes 198.28 Male Poor



Figure 4.2  Number of orders by day

Figure 4.3  Number and percent of orders

2. Table 4.2 and Figure 4.4 show the count of number of orders by the
time of the day (morning, midday, etc.). A bar chart and a pie chart of
the pivot table were created to visually see the orders received online
by the time of day. The pie chart shows both the numbers and the
percent for each category. The table and the pie chart indicate that
more orders are placed during night hours.
3. Orders by the region: The bar chart and the pie chart (Figures 4.5 and
4.6) summarize the number of orders by the region. These plots show
that the maximum orders were received from the North and South
regions. Marketing efforts are needed to target the other regions.

Table 4.2  Number of orders by time


Row Labels Count of Time
Afternoon 112
Evening 65
Late afternoon 20
Midday 92
Morning 33
Night 178
Grand total 500

Figure 4.4  Plot of number of orders by time

Figure 4.5  Number of orders by region



Figure 4.6  Percent of orders by region

4. A pivot table (Table 4.3) and a bar graph (Figure 4.7) were created
to summarize the customer rating by gender where the row labels
show “Gender” and the column labels show the count of “Customer
Survey Ratings” (excellent, good, fair, poor). A bar chart of the count
of “Customer Survey Ratings” (excellent, good, fair, poor) on the
y-axis and gender on the x-axis is shown below the table (Figure 4.7).
This information captured customer opinion and was important for
reviewing and improving the process.

Table 4.3  Customer ratings (count of customer survey rating)

Row Labels | Excellent | Fair | Good | Poor | Grand Total
Female | 25 | 48 | 45 | 38 | 156
Male | 89 | 62 | 110 | 83 | 344
Grand total | 114 | 110 | 155 | 121 | 500

5. The descriptive statistics of the "Total Order ($)" variable were calculated
and displayed in Table 4.4 and the plot below (Figure 4.8). The statistics show the
measures of central tendency and the measures of variation along
with other useful statistics of the total orders.

Figure 4.7  Customer ratings by gender

Table 4.4  Descriptive statistics of total orders

Descriptive Statistics: Total Order ($)
Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum
Total Order ($) | 500 | 0 | 223.87 | 3.77 | 84.23 | 30.09 | 167.95 | 252.62 | 287.54 | 371.40

6. From the calculated statistics in part (5), it seems appropriate to con-
clude that the total orders data are left-skewed, so Chebyshev's rule
can be applied. This rule applies to any distribution, symmetrical or
skewed, and relates the mean and standard deviation to provide more in-
sight. However, the rule is very general and does not provide definite
conclusions. More definite conclusions can be drawn using the other
widely used rule, known as the empirical rule, which applies to symmetrical
or normal distributions. This rule also provides a relationship between the
mean and standard deviation of the data and leads to a more definite conclusion.
7. If the total orders data can be assumed to be approximately symmetri-
cal, what conclusions can we draw about the “total orders” (Figure 4.8)
received? Use the mean and standard deviation calculated in part (5).
Figure 4.8  Graphical summary of the total orders data

Data that are symmetrical or bell-shaped, characterized by a normal
distribution, allow conclusions to be drawn by combining the
mean and standard deviation. If the data can be approximated by
a normal distribution, the empirical rule applies. For our case data,
if we can assume that the "total orders" data are approximately sym-
metrical, we can draw the following conclusions relating the mean
and standard deviation of the "total orders" data that were calculated
in part (5).
Conclusions using Empirical Rule are shown in Table 4.5.
The mean and standard deviation of total orders are:

Variable: Total Order ($): mean x̄ = 223.87, standard deviation s = 84.23

Table 4.5  Conclusions using the empirical rule

Approximately 68 percent of the orders lie within one standard deviation of the mean: x̄ ± 1s = 223.87 ± 84.23, or between $139.64 and $308.10.
Approximately 95 percent of the orders lie within two standard deviations of the mean: x̄ ± 2s = 223.87 ± 2(84.23), or between $55.41 and $392.33.
Approximately 99.7 percent of the orders lie within three standard deviations of the mean: x̄ ± 3s = 223.87 ± 3(84.23), or between −$28.82 and $476.56; in practice, between $0 and $476.56, since an order amount cannot be negative.
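The intervals in Table 4.5 are easy to reproduce; here is a minimal Python sketch using the mean and standard deviation from Table 4.4:

# Empirical-rule intervals for the "Total Order ($)" data
mean, std = 223.87, 84.23

for k, coverage in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = mean - k * std, mean + k * std
    low = max(low, 0.0)  # an order amount cannot be negative
    print(f"About {coverage} of orders fall between ${low:.2f} and ${high:.2f}")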

8. A dashboard (shown in Figure 4.9) provides several views of the on-
line orders data in one place. The dashboard shows several plots,
including an order map showing the business activities in different
regions of the country, sales by month, percent of orders by time,
and total orders by region. The plots are self-explanatory and provide
useful information that creates opportunities for improvement.
The graphical and numerical analyses performed on the online order
data provide meaning and insight that are not apparent just by look-
ing at the data. The analyses performed here are some examples of
descriptive analytics.

Figure 4.9  A dashboard of online orders data

The type of analytics that goes beyond descriptive analytics is predictive
analytics. Applying the tools of descriptive analytics enables one to

gain insight and learn from the data. These tools help us understand what
has happened in the past and are very helpful in predicting future business
outcomes. Predictive analytics tools help answer questions about the future.
The rest of the book explores predictive analytics tools and applications.

Summary
In this chapter, we provided a brief description of descriptive analytics
and a case to illustrate the tools and applications of visual techniques used
in descriptive analytics. Descriptive analytics is critical in studying the
current state of the business and learning what has happened in the past
using the company's data. The knowledge from descriptive analytics
lays a foundation for further analysis and leads to predictive analytics. As
mentioned, the knowledge obtained through descriptive analytics helps us
learn what has happened in the past. This information is used to create
predictive analytics models.
The subsequent chapters discuss predictive analytics and the back-
ground information needed for it, along with the analytical tools.
Specific predictive analytics models and their applications are the topics
of the chapters that follow. The rest of this book covers mostly
predictive analytics.
CHAPTER 5

Descriptive versus Predictive Analytics

Chapter Highlights
• What Is Predictive Analytics and How Is It Different from
Descriptive Analytics?
• Exploring the Relationships between the Variables—Qualitative
Tools
• An Example of Logic-Driven Model—Cause-and-Effect
Diagram
• Data-Driven Predictive Models and Their Applications—
Quantitative Models
• Prerequisites and Background for Predictive Analytics
• Summary

What Is Predictive Analytics and How Is It Different from Descriptive Analytics?
The case on descriptive analytics discussed in Chapter 4 explained how
the data could be explored to reveal information. One of the best ways
to learn from the data is to use visual techniques to create charts, graphs,
and numerical summaries. In the case of big data, dashboards are excellent
tools to obtain relevant information from the data. Data exploration provides
invaluable information for future planning and setting goals for advance-
ment of companies. The descriptive analytics is the process of exploring
and explaining data to reveal trends and patterns and to obtain information
not apparent otherwise. The objective is to obtain useful information that can
help organizations achieve their goals. Predictive analytics is about identifying
future business trends and creating predictive models to explore these trends
and relationships. While the descriptive analytics tools are useful in visu-
alizing some of the trends and relationships among the variables, predictive
analytics provides information on what types of predictive models can be used
to predict future business outcomes.

Exploring the Relationships between the Variables—Qualitative Tools
Before developing the data-driven models for prediction, it is import-
ant to establish and understand the relationships between variables. A
number of qualitative or graphical tools—commonly known as the qual-
ity tools—can be used to study the relationships between the factors or
independent variables and the response or dependent variable(s). Some
of the tools used for this purpose are cause-and-effect diagrams, influence
diagrams, and others. These are visual tools and are an easy way to gain a
first look at the data. These are known as logic-driven models and are very
helpful before a formal data-driven predictive model can be developed.
Here we describe some of these tools.

An Example of Logic-Driven Model—Cause-and-Effect Diagram
Figure 5.1 shows a cause-and-effect diagram related to the production of
a product. The probable reasons under each cause category are shown in
the figure. This diagram is an excellent brainstorming tool for studying the
possible causes of a problem and finding their solutions.

Figure 5.1  Logic-driven model of predictive analytics

Data-Driven Predictive Models and Their Applications—Quantitative Models
The other types of models in predictive analytics are data-driven models.
Predictive analytics uses advanced statistical tools, including a number
of regression and forecasting models, data mining and machine learning,
as well as information technology and operations research methods, to
identify key independent variables and factors and to build predictive
models. These models are used to predict future business behavior using
the independent variables, also known as predictors. Many of the trends
and relationships among key factors obtained from descriptive analytics
can be used to build predictive models to forecast future trends.

Prerequisites and Background for Predictive Analytics


The data-driven models in predictive analytics use a number of statis-
tical and analytical models. Here we provide a list of commonly used
predictive models and their applications. These models are prerequisites
to applying predictive modeling. Figure 5.2 briefly outlines these
predictive analytics models and tools. We will introduce each of these
topics and their applications in detail in subsequent chapters. It is
important to note that each of the topics is typically studied as a separate
chapter in an analytics course. A brief description of the topics, along
with possible applications, is given in Table 5.1.

Figure 5.2  Prerequisite and models for predictive analytics

Table 5.1 outlines the statistical tools, their brief description, and ap-
plication areas of predictive analytics models.
The next chapter discusses the details of the above data-driven predic-
tive models with applications.
Table 5.1  Statistical models and prerequisites for predictive modeling

Probability concepts
Brief description: One of the main reasons for applying statistics in analytics is that statistics allows us to draw conclusions using limited data; that is, we can draw conclusions about the population using sample data. The process of making inferences about the population using the sample involves uncertainty. Probability is used in situations where uncertainty exists; it is the study of random phenomena or events. A random event is an event in which the outcome cannot be predicted in advance. Probability can tell us the likelihood of the random event(s). In the decision-making process, uncertainty almost always exists. One question of usual concern when making decisions under uncertainty is the probability of success of the outcome.
Application areas: Probability is used to answer questions in situations such as the following: What is the probability that the Reserve Bank will start raising the interest rate soon? What is the probability that the Dow Jones stock index will go up by 15% by the end of this year? What is the probability of winning the Powerball lottery? What is the probability that a customer will default on a loan and is a potential risk?

Probability distributions (discrete and continuous)
Brief description: Most processes produce random outcomes that can be described using a random variable. A random variable can be discrete or continuous. A random variable that can assume only a countable number of values is called a discrete random variable (e.g., the number of defective products or the number of voters). Random variables that are not countable but correspond to the points in an interval are known as continuous random variables (e.g., delivery time, length, diameter, volume); these are infinite and uncountable. Such random variables are described using either a discrete or a continuous probability distribution depending on the nature of the variable, and distributions have wide applications in data analysis, decision making, and computer simulation. Although probabilities and the rules of probability give us a way of dealing with uncertainty, the concept and understanding of probability distributions is critical in modeling and decision making in analytics. A probability distribution assigns probabilities to each value of a random variable, where a random variable is a variable that takes numerical values associated with the random outcomes of an experiment or a process under study. These distributions are used to determine the probabilities of outcomes and draw conclusions from the data; they are applied based on the trend or pattern in the data or when certain conditions are met. For example, a bell-shaped pattern in the data is usually described using a normal distribution, whereas customer arrivals or calls coming to a call center are random and can be modeled using a Poisson distribution. The normal distribution is a continuous distribution, whereas the Poisson distribution falls in the category of discrete distributions. There are a number of distributions used in data analytics.
Application areas: Computer simulation is often used to study the behavior of a call center or the drive-through of a fast-food restaurant. In such applications, the arrival of calls in a call center and customer arrivals in a drive-through are random phenomena that can be modeled using a discrete distribution known as the Poisson distribution, while the customer waiting time and service time can be described using a continuous distribution known as the exponential distribution. A very widely used continuous distribution in the real world is the normal or Gaussian distribution. Some examples where this distribution can be applied are the length of time to assemble an electronic appliance, the life span of a satellite power source, the fuel consumption in miles per gallon of a new car model, the inside diameter of a manufactured cylinder, and the waiting time of patients at an outpatient clinic.

Sampling and sampling distribution
Brief description: Sampling is a systematic way of selecting a few items from the population; samples are analyzed to draw conclusion(s) about the entire population. In most cases, the parameters of the population are unknown, and we estimate a population parameter using a sample statistic. A sampling distribution is the probability distribution of a sample statistic. Different sampling techniques are used to collect sample data, and sample statistics are then calculated to draw conclusions about the population. A sample statistic may be a sample mean x̄, sample variance s², sample standard deviation s, or a sample proportion p. The central limit theorem has major applications in sampling and other areas of statistics and data analysis: it tells us that if we take a large sample, that is, a sample size of 30 or more (n ≥ 30), we can use the normal distribution to calculate probabilities and draw conclusions about the population parameter. In data analysis, we almost always rely on sample data to draw conclusions about the population from which the data are collected. A sample is a part of a population; one obvious reason for using samples in statistics and data analysis is that in most cases the population is huge, and it is not practical to study it entirely. Samples are used to make inferences about the population, and this is done through a sampling distribution. In sampling theory, we need to consider several factors and answer questions such as: Why do we use samples? Why do we need a homogeneous sample? What are the different ways of taking samples? What is a sampling distribution, and what is its purpose? A population is described by its parameters (population parameters) and a sample by its statistics (sample statistics). It is important to note that a population parameter is always a constant, whereas a sample statistic is a random variable; similar to other random variables, each sample statistic can be described using a probability distribution.
Application areas: A manufacturer of computers and laser printers has determined that the assembly time of one of its computers is normally distributed with mean μ = 18 minutes and standard deviation σ = 4 minutes. To improve this process, a random sample is used, and the analysis of the sample data can answer the probability that the mean assembly time will take longer than 19 minutes. Another example involves predicting a presidential election using poll results. Polls are conducted to gather sample data, and a great deal of planning goes into this. Refer to the Gallup polls at http://www.gallup.com, which conducts and reports numerous poll results; they use sample data to predict the polls (the percent or proportion of voters who favor different candidates). This is an example of using a sample of voters to predict the population of voters who favor certain candidates. The first example studies a mean, whereas the poll example studies a proportion.

Inference procedure: estimation and confidence intervals
Brief description: The objective of statistical inference is to draw conclusions or make decisions about a population based on the samples selected from the population. Drawing conclusions about population parameters such as the population mean (μ), population variance (σ²), and population proportion (p) comes under estimation, where these parameters are estimated using the corresponding sample statistics: the sample mean (x̄), sample variance (s²), and sample proportion (p). The reason for estimating these parameters is that their values are unknown, and therefore we must use the sample data to estimate them. Estimation is the simplest form of inferential statistics, in which a sample statistic is used to draw conclusions regarding the unknown population parameter. Two types of estimates are used in parameter estimation: the point estimate and the interval estimate, or confidence interval. The parameters of a process are generally unknown; they change over time and must be estimated using inference procedures. Statistical inference is an extremely important area of statistics and data analysis. There are two major tools of inferential statistics, estimation and hypothesis testing, and these techniques are the basis for many of the methods of data analysis and statistical quality control. Here we explain the concept of estimation.
Application areas: A recent poll report entitled "Link between exercising regularly and feeling good about appearance" came to the following conclusion: 56% of Americans who exercise two days per week feel good about their looks; this jumps to the 60% range among those who exercise three to six times per week. The results are based on telephone interviews conducted as part of the Gallup-Healthways Well-Being Index survey from January 1 to June 23, 2014, with a random sample of 85,143 adults aged 18 and older, living in all 50 U.S. states. For the results based on the total sample of national adults, the margin of sampling error is ±5.0 percentage points at the 95% confidence level. Parts of the claims made here may not make sense at first, and perhaps you are wondering about some of the statements. For example, what do the margin of sampling error of ±5.0 percentage points and a 95% confidence level mean? Also, how can a sample of only a few thousand allow a conclusion to be drawn about the entire population? Estimation and confidence intervals answer these questions: they enable us to draw conclusion(s) about the entire population using a sample from the population.

Inference procedure: hypothesis testing
Brief description: Hypothesis testing is a major tool of inferential statistics that uses the information in the sample data to make a decision about a hypothesis. The hypothesis may be about a mean, proportion, variance, and so on. Suppose the Department of Labor claims that the average salary of graduates with a Data Analytics degree is $80,000; this can be written as a hypothesis, and to verify the claim, a sample of recent graduates may be evaluated and a conclusion reached about its validity. A hypothesis may test a claim, a design specification, a belief, or a theory, and sample data are used to verify these. Hypothesis tests extend the concept of inferential statistics: they enable us to draw conclusions about a population by analyzing the information obtained from a sample. A hypothesis test involves a statement about a population parameter (such as a population mean or a population proportion); the test specifies a value for the parameter that lies in a region, and using the sample data, we must decide whether the hypothesis is consistent with or supported by the sample data.
Application areas: Let us look at a real-world example. In recent years, there has been a great deal of interest in hybrid cars. Consumers are attracted to hybrids because of the high miles per gallon (mpg) these cars claim to provide. If you are interested in purchasing a hybrid, there are many makes and models from different manufacturers to choose from; it seems that just about every manufacturer offers a hybrid to compete in this growing market. The following are the claims made by some manufacturers of hybrid cars. Toyota Prius claims to provide about 50 mpg in the city and 48 mpg on the highway; it tops the list of fuel-efficient hybrids, with an estimated annual fuel cost of less than $800. Ford Fusion Hybrid claims to provide 41 mpg in the city and 36 mpg on the highway; the average annual fuel cost of less than $1,000 makes it attractive to customers. Honda Civic Hybrid claims to provide 40 mpg in the city and 45 mpg on the highway; estimated annual fuel costs are less than the Ford Fusion. These days we find several claims like these in consumer magazines and television commercials. Should consumers believe them? Hypothesis testing may provide the answer: it enables us to make inferences about a population parameter by analyzing the difference between the stated population parameter value and the results obtained from the sample data. It is a very widely used technique in the real world.

Correlation analysis
Brief description: Correlation is a numerical measure of the linear association between two variables. It provides the degree of association between two variables of interest and tells us how weak or strong the relationship between the variables is. The coefficient of correlation is denoted by rxy, and its value lies between −1 and +1; the rxy value tells us the degree of association between the two variables and how strong or weak the correlation is.
Application areas: Suppose we are interested in investigating the strength of the relationship between sales and profit, or between sales and advertisement expenditures, for a company. Similarly, we can study the relationship between the home-heating cost and the average temperature using correlation analysis. Usually, the first step in correlation analysis is constructing a scatter plot. These plots are very useful in visualizing whether the relationship between the variables is positive or negative and linear or nonlinear; the next step is calculating a numerical measure. For example, a calculated coefficient of correlation of r = +0.902 shows a very strong positive correlation between sales and advertisement. Scatter plots are very helpful in describing bivariate relationships, that is, the relationship between two quantitative variables; they can be easily created using computer packages and are very helpful in data analysis and model building.
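The correlation analysis described in the last row of Table 5.1 lends itself to a quick illustration. The following minimal Python sketch uses made-up home-heating figures; the numbers are illustrative only:

import numpy as np

# Average monthly temperature (°F) and monthly heating cost ($), made up
avg_temp = np.array([20, 28, 35, 45, 55, 62, 70])
heating_cost = np.array([250, 220, 190, 140, 100, 80, 60])

# Pearson coefficient of correlation; its value lies between -1 and +1
r = np.corrcoef(avg_temp, heating_cost)[0, 1]
print(f"r = {r:.3f}")  # strongly negative: cost falls as temperature rises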


Summary
Predictive analytics is about predicting future business outcomes.
This phase of analytics uses a number of models that can be divided into
logic-driven models and data-driven models. We discussed both types of
models and the difference between the two. The key aim of this chapter
was to introduce the reader to a number of tools and statistical models
whose understanding is critical to understanding and applying the
predictive analytics models. This is background information, and we call
these tools prerequisites to predictive analytics. The chapter provided a
brief description and application areas of the prerequisite tools: probability
concepts, probability distributions, sampling and sampling distributions,
correlation analysis, estimation and confidence intervals, and hypothesis
testing. These topics are investigated in detail in the Appendix that
accompanies this book. The appendix is available as a free download.

Appendix A–D
The appendix contains the topics that are prerequisites to the data-driven
predictive analytics models. The concepts discussed there are essential for
applying predictive models. Appendixes A–D discuss the following sta-
tistical tools and models of business analytics: the concept of probability,
the role of probability distributions in decision making, sampling and
sampling distributions, inference procedures (estimation and confidence
intervals), and inference procedures for one- and two-population
parameters (hypothesis testing).
Note: The following chapters discuss predictive analytics models—regression
analysis, modeling, time series forecasting, and data mining.
CHAPTER 6

Key Predictive Analytics Models (Predicting Future Business Outcomes Using Analytic Models)

Chapter Highlights
• Key Predictive Analytics Models and Their Brief Description
and Applications
• Regression Models
• Forecasting Models
• Analysis of Variance (ANOVA)
• Data Mining
• Simple Regression, Multiple Regression, Nonlinear Regression
• Summary

Key Predictive Analytics Models and Their Description and Applications

This chapter explains the key models used in predictive modeling, along
with their applications. Figure 6.1 outlines the tools, and Table 6.1
provides a brief explanation and application areas.

Figure 6.1 Predictive modeling tools

Table 6.1 outlines key predictive analytics tools, the types of questions
they try to answer, and briefly explains the applications of the tools.
The descriptions and application areas of the statistical tools in predic-
tive analytics are outlined in Table 6.2.
Table 6.1  Predictive analytics, questions they attempt to answer, and their tools

Regression models
Attempts to answer: How can the trends and patterns identified in the data be used to predict the future business outcome(s)? How can we identify appropriate prediction models? How can the models be used to make predictions about how things will turn out in the future, that is, what will happen in the future? How can we predict the future trends of the key performance indicators using the past data and models?
Tools and applications: Regression models: (a) simple regression models; (b) multiple regression models; (c) nonlinear regression models, including the quadratic or second-order models and polynomial regression models; (d) regression models with indicator or qualitative independent variables; (e) regression models with interaction terms, or interaction models; and (f) logistic regression models.

Forecasting models
Attempts to answer: How can the future behavior of key business outcomes or variables be predicted using different forecasting techniques suited to future business phenomena? How can different forecasting models, using both qualitative and quantitative forecasting techniques, be applied to predict a number of future business phenomena? How can some of the key variables, including sales, revenue, number of customers, demand, inventory, customer behavior, number of visits to the business website, and many others, be predicted using a number of proven techniques? The prediction and forecasting methods use a number of time series models as well as data mining techniques. How can the forecast be used in short-term and long-term business planning?
Tools and applications: Forecasting techniques: widely used predictive models involve a class of time series analysis and forecasting models. The commonly used forecasting models fall into the following categories:
• Techniques using averages: simple moving average, weighted moving average, exponential smoothing
• Techniques for trend: linear trend equation (similar to simple regression), double moving average or moving average with trend, exponential smoothing with trend or trend-adjusted exponential smoothing
• Techniques for seasonality: forecasting data with seasonal patterns
• Associative forecasting techniques: simple regression, multiple regression analysis, nonlinear regression, regression involving categorical or indicator variables, and other regression models
• Regression-based models that use regression analysis to forecast future trends. Other time series forecasting models are simple moving average, moving average with trend, exponential smoothing, exponential smoothing with trend, and forecasting seasonal data.

ANOVA (analysis of variance)
Attempts to answer: ANOVA in its simplest form is a way to study multiple means. Single-factor, two-factor, and multiple-factor ANOVA, along with design of experiments (DOE) techniques, are powerful tools used in data analysis to study and identify key variables and build prediction equations. These models are used in modeling and predictive analytics to predict future outcomes.
Tools and applications: ANOVA and DOE techniques include single-factor ANOVA, two-factor ANOVA, and multiple-factor ANOVA. Factorial designs and DOE tools are used to create models involving multiple factors.

Data mining
Attempts to answer: Determining meaningful patterns and deriving insights from large data sets. It is closely related to analytics. Data mining uses statistics, machine learning, and artificial intelligence techniques to derive meaningful patterns and make predictions.
Tools and applications: Data mining techniques are used to extract useful information from huge amounts of data using predictive analytics, computer algorithms, software, and mathematical and statistical tools.

Other tools of predictive analytics: machine learning, artificial intelligence, neural networks, and deep learning
Attempts to answer: Machine learning is a method used to design systems that can learn, adjust, and improve based on the data fed to them. Machine learning works based on predictive and statistical algorithms that are provided to these machines. The algorithms are designed to learn and improve as more data flow through the system.
Tools and applications: Machine learning, artificial intelligence, neural networks, and deep learning have been used successfully in fraud detection, e-mail spam filtering, GPS systems, medicine, medical diagnosis, and predicting and treating a number of medical conditions. There are other applications of machine learning as well.
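To make the averaging techniques in Table 6.1 concrete, here is a minimal Python sketch of a simple moving average and exponential smoothing on a made-up demand series; the data and smoothing constant are illustrative only:

# Monthly demand, made up for illustration
demand = [42, 40, 43, 41, 38, 39, 45, 47, 44, 46]

# 3-period simple moving average forecast for the next period
window = 3
sma_forecast = sum(demand[-window:]) / window

# Exponential smoothing: F(t+1) = alpha * A(t) + (1 - alpha) * F(t)
alpha = 0.3
forecast = demand[0]            # initialize with the first observation
for actual in demand[1:]:
    forecast = alpha * actual + (1 - alpha) * forecast

print(f"3-period moving average forecast: {sma_forecast:.2f}")
print(f"Exponential smoothing forecast (alpha={alpha}): {forecast:.2f}")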
Table 6.2  Statistical tools and application areas
Simple regression model
Brief description: Regression analysis is used to investigate the relationship between two or more variables. Often we are interested in predicting a variable using one or more independent variables. In general, we have one dependent or response variable y and one or more independent variables x1, x2, …, xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, we have a case of multiple regression.
Application areas: The purpose of simple regression analysis is to develop a statistical model that can be used to predict the value of a response or dependent variable using an independent variable. For example, we might be interested in predicting the profit using the number of customers, or the time required to produce a certain number of products in a production situation. In these cases, the variable profit or time that we are trying to predict is known as the dependent or response variable, and the other variable, the number of customers or the number of products, is referred to as the independent variable or predictor. In simple linear regression, we study the linear relationship between two variables: the dependent or response variable (y) and the independent variable or predictor (x). An example is relating the advertising expenditure and sales of a company, where the relationship is linear and the objective is to predict sales, the response variable (y), using advertisement, the independent variable or predictor (x). A scatter plot, as shown in Figure 6.2, is one of the first steps in studying the relationship (linear or nonlinear) between the variables.

Figure 6.2  Scatter plot of sales versus advertising
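As a concrete illustration of fitting a simple linear regression of sales on advertising, here is a minimal numpy sketch; the data are made up for illustration:

import numpy as np

advertising = np.array([10, 12, 15, 18, 20, 24, 27, 30])    # e.g., $1,000s
sales = np.array([110, 119, 135, 150, 158, 172, 185, 199])  # e.g., units

# Least-squares fit of y = b0 + b1*x (polyfit returns highest power first)
b1, b0 = np.polyfit(advertising, sales, deg=1)
print(f"sales = {b0:.2f} + {b1:.2f} * advertising")

# Predict sales for a new advertising budget
print("predicted sales at x = 22:", b0 + b1 * 22)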
Multiple regression models
Brief description: In regression analysis, we have one dependent or response variable y and one or more independent variables x1, x2, …, xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, then we have a case of multiple regression. The relationship between the dependent and independent variables is described by a mathematical model known as a regression equation, which is obtained using the least squares method. In the case of a multiple linear regression, the equation is of the form
y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
where b0, b1, b2, …, bn are the regression coefficients and x1, x2, …, xn are the independent variables.
Application areas: A pharmaceutical company is concerned about declining sales of one of its drugs. The drug was introduced in the market approximately two-and-a-half years ago. In recent months the sales of this product have been in constant decline, and the company is concerned about losing its market share, as it is one of the major drugs the company markets. The head of the sales and marketing department wants to investigate the possible causes and evaluate some strategies to boost the sales. He would like to build a regression model of the sales volume and several independent variables believed to be strongly related to the sales. A multiple regression model will help the company determine the important variables and also predict the future sales volume. The marketing director believes that the sales volume is directly related to three major factors: dollars spent on advertisement, commission paid to the salespersons, and the number of salespersons deployed for marketing this drug. A multiple regression model can be built to study this problem. In multiple regression, the least squares method determines the best-fitting plane or hyperplane through the data points, ensuring that the sum of the squares of the vertical distances or deviations from the given points to the plane is a minimum. Figure 6.3 shows a multiple regression model with two independent variables: the response y with the two independent variables x1 and x2 forms a regression plane.

Figure 6.3  A multiple regression model
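A corresponding sketch for the multiple regression case, solving the least squares problem with an intercept column; the three predictors echo the drug-sales example above, but the numbers are entirely made up:

import numpy as np

# Columns: advertising dollars, sales commission, number of salespersons
X = np.array([
    [50, 10, 5],
    [60, 12, 6],
    [55, 11, 5],
    [70, 15, 8],
    [65, 14, 7],
    [80, 18, 9],
])
y = np.array([120, 140, 130, 165, 155, 190])   # sales volume

# Add an intercept column and solve for the regression coefficients
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2, b3 = coef
print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2 + {b3:.2f}*x3")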
Nonlinear regression (quadratic and polynomial) models

Brief description: The models above, simple and multiple regression, are based on the assumption of linearity; that is, the relationship between the independent variable(s) and the response variable can be well approximated by a linear model. However, in certain situations the relationship between the variables is not linear and may be described by a quadratic or second-order model. Sometimes the relationship can be described using a polynomial model.

Application areas: A nonlinear (second-order) regression model is described here. The life of an electronic component is believed to be related to the temperature in the operating environment. The scatter plot in Figure 6.4 shows the life of the components (in hours) and the corresponding operating temperature (in °F). From the scatter plot, it is clear that the relationship between the variables is not linear. An appropriate model in this case would be a second-order or quadratic model that predicts the life of the component. Here the life of the component is the dependent variable (y) and the operating temperature is the independent variable (x).

Figure 6.4  Scatter plot of life (y) versus operating temp. (x)
Figure 6.5 shows a second-order model with the regression equation that can be used to predict the life of
the components using temperature.

Figure 6.5 A second-order regression model
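The quadratic model in Figure 6.5 can be fit with a few lines of code; a minimal sketch follows, with hypothetical temperature and life values standing in for the data behind the figure.

```python
# A minimal sketch of a second-order (quadratic) regression fit (hypothetical data).
import numpy as np

temp = np.array([50, 60, 70, 80, 90, 100, 110, 120])       # operating temp (deg F)
life = np.array([620, 590, 565, 530, 510, 480, 465, 455])  # component life (hours)

# polyfit with deg=2 fits life = b0 + b1*x + b2*x^2 by least squares;
# coefficients are returned highest power first.
b2, b1, b0 = np.polyfit(temp, life, deg=2)
print(f"life-hat = {b0:.2f} + {b1:.3f}x + {b2:.5f}x^2")

# Predict the component life at an operating temperature of 95 deg F.
print("predicted life at 95F:", round(float(np.polyval([b2, b1, b0], 95)), 1))
```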


Multiple regression using dummy or indicator variables

Brief description: In regression we often encounter qualitative or indicator variables that need to be included as independent variables in the model. To include such qualitative variables, we use dummy or indicator variables. For example, to include the sex of employees as an independent variable in a regression model, we define the variable

x1 = 1 if male, 0 if female

In this formulation, a "1" indicates that the employee is male and a "0" means the employee is female. Which of the two is assigned the value 1 is arbitrary. In general, the number of dummy or indicator variables needed is one less than the number of levels of the qualitative variable to be included in the model.

Application areas: We would like to write a model relating the mean profit of a grocery chain to store location. It is believed that the profit depends to a large extent on the location of the stores. Suppose that the management is interested in three specific locations, which we will call A, B, and C. In this case, the store location is a single qualitative variable at three levels corresponding to the three locations A, B, and C. The prediction equation relating the mean profit E(y) to the three locations can be written as

y = b0 + b1x1 + b2x2

where

x1 = 1 if location B, 0 if not
x2 = 1 if location C, 0 if not

The variables x1 and x2 are the dummy variables; because location has three levels, only two dummy variables are needed in the model.
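A minimal sketch of the dummy-variable coding just described, assuming hypothetical profit figures for stores in the three locations:

```python
# A minimal sketch of dummy (indicator) variables for a three-level factor
# (hypothetical profit data).
import numpy as np

locations = np.array(["A", "B", "C", "A", "B", "C", "A", "C"])
profit = np.array([30.0, 37.0, 25.0, 32.0, 39.0, 24.0, 29.0, 26.0])

# Three levels need only two dummy variables; location A is the baseline.
x1 = (locations == "B").astype(float)  # 1 if location B, 0 if not
x2 = (locations == "C").astype(float)  # 1 if location C, 0 if not

X = np.column_stack([np.ones(len(profit)), x1, x2])
b, *_ = np.linalg.lstsq(X, profit, rcond=None)
# b0 is the mean profit at A; b1 and b2 are the B and C differences from A.
print("b0, b1, b2 =", b.round(3))
```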

All subset and stepwise regression

Brief description: Finding the best set of predictor variables to be included in the model.
Some other regression models:
Reciprocal transformation of the x variable: This transformation can produce a linear relationship and is of the form

y = β0 + β1(1/x) + ε

This model is appropriate when x and y have an inverse relationship. Note that the inverse relationship is not linear.
Log transformation of the x variable: The logarithmic transformation is of the form

y = β0 + β1 ln(x) + ε

This is a useful curvilinear form, where ln(x) is the natural logarithm of x and x > 0.

Log transformation of the x and y variables: The transformation is of the form

ln(y) = β0 + β1 ln(x) + ε

The purpose of this transformation is to achieve a linear relationship. The model is valid for positive values of x and y. This transformation is more involved, and it is difficult to compare the resulting model with other models that have y as the dependent variable.
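As an illustration of fitting a transformed model, the sketch below fits y = β0 + β1 ln(x) by applying ordinary least squares to the transformed predictor; the data values are hypothetical.

```python
# A minimal sketch of the log transformation of x (hypothetical data).
import numpy as np

x = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
y = np.array([2.1, 3.4, 4.8, 6.2, 7.5, 8.9, 10.2])

# Transform the predictor, then fit a straight line to (ln x, y).
b1, b0 = np.polyfit(np.log(x), y, deg=1)
print(f"y-hat = {b0:.3f} + {b1:.3f} ln(x)")
```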
Logistic regression: This model is used when the response variable is categorical. In all the regression models we developed in this book, the response variable was a quantitative variable. In cases where the response is categorical or qualitative, the simple and multiple least-squares regression models violate the normality assumption. The correct model in this case is logistic regression.
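A minimal sketch of a logistic regression fit for a binary response is shown below. It uses scikit-learn, one common implementation, and the predictor and outcome values are hypothetical.

```python
# A minimal sketch of logistic regression for a categorical (binary) response.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical predictor (e.g., promotional spend) and outcome (1 = purchase).
X = np.array([[10], [15], [20], [25], [30], [35], [40], [45]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
# Estimated probability that y = 1 at a new x value.
print(round(model.predict_proba([[28]])[0, 1], 3))
```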
Forecasting models

Brief description: A forecast is a statement about the future value of a variable of interest, such as demand. Forecasting is used to make informed decisions and may be long-range or short-range. Forecasts affect decisions and activities throughout an organization: produce-to-order companies depend on demand forecasts to plan their production, and inventory planning and decisions are affected by the forecast.

Forecasting methods are classified as qualitative or quantitative. Qualitative forecasting methods use expert judgment to develop forecasts. These methods are used in the absence of historical data on the variable being forecast, for example, when a new product is to be launched for which no information is available. They are also known as judgmental methods because they use subjective inputs: consumer or customer surveys, executive opinions, opinions of sales and marketing, surveys of similar competitive products, market sensing, expert knowledge, and the Delphi method, which uses the opinions of managers to achieve a consensus on the forecast. The objective of forecasting is to predict the future outcome based on past patterns or data; qualitative methods forecast the future outcome based on opinion, judgment, or experience.

Quantitative forecasting is based on historical data. The most common methods are time series and associative forecasting methods, which are discussed in detail in the subsequent sections. The forecasting methods and models can be divided into the following categories:

Techniques using averages: simple moving average, weighted moving average, exponential smoothing.

Techniques for trend: linear trend equation (similar to simple regression); double moving average or moving average with trend; exponential smoothing with trend or trend-adjusted exponential smoothing.

Techniques for seasonality: forecasting data with a seasonal pattern.

Associative forecasting techniques: simple regression, multiple regression analysis, nonlinear regression, regression involving categorical or indicator variables, and other regression models.

Application areas: Usually the first step in forecasting is to plot the historical data. This is critical in identifying the pattern in the time series and applying the correct forecasting method. Data plotted over time form a time series plot, with time on the horizontal axis and the variable of interest on the vertical axis; the data may be weekly, monthly, quarterly, or annual. Some of the common time series patterns are discussed here.

Figure 6.6 shows demand data fluctuating around an average. Averaging techniques such as the simple moving average or simple exponential smoothing can be used to forecast such patterns. The actual data and the forecast are shown in Figure 6.7.

Figure 6.6  Plot of demand over time

Figure 6.7  Demand and forecast

Figures 6.8 and 6.9 show the sales data for a company over a period of 65 weeks. Clearly, the data fluctuate around an average and show an increasing trend. Forecasting techniques such as the double moving average or exponential smoothing with a trend can be used to forecast such patterns; the plot in Figure 6.9 shows the sales and the forecast for the data.

Figure 6.8  Sales over time

Figure 6.9  Sales and forecast for the data in Figure 6.8

The other forecasting techniques involve a number of regression models and methods for forecasting seasonal patterns. Figure 6.10 shows a seasonal pattern and forecast.

Figure 6.10  A seasonal pattern
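The sketch below illustrates two of the averaging techniques listed above, the simple moving average and simple exponential smoothing, on a short hypothetical demand series.

```python
# A minimal sketch of two averaging forecasts (hypothetical demand data).
import numpy as np

demand = np.array([42, 40, 43, 41, 39, 42, 44, 40, 41, 43], dtype=float)

# Simple moving average: next-period forecast = mean of the last n actuals.
n = 3
sma_forecast = demand[-n:].mean()

# Simple exponential smoothing: F(t+1) = alpha*A(t) + (1 - alpha)*F(t).
alpha = 0.2
f = demand[0]                 # initialize the forecast with the first actual
for actual in demand[1:]:
    f = alpha * actual + (1 - alpha) * f

print(f"moving average forecast: {sma_forecast:.2f}")
print(f"exponential smoothing forecast: {f:.2f}")
```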

ANOVA (analysis of variance)

Brief description: A single-factor completely randomized design is the simplest experimental design. It involves one factor at different levels and can be handled with a single-factor factorial experiment. The analysis method for such problems is known as ANOVA. In ANOVA, the procedure uses variances to determine whether the means of multiple groups are different. The process works by comparing the variance between groups with the variance within groups and determines whether the groups are all part of a single population or are separate populations. Design of experiments (DOE) is a powerful tool. Many variations of design involve a two-factor factorial design using a two-factor ANOVA; more than two factors can be studied using specially designed experiments.

Application areas: Consider an example in which the marketing manager of a franchise wants to know whether there is a difference in the average profit among four of their stores. He randomly selected four stores and recorded the profit for each. The data would look like Table 6.1. In this case, the single factor of interest is the store. Since there are four stores (store 1, store 2, store 3, and store 4), we have four levels of the same factor. Recall that the levels of a factor are also known as treatments or groups; therefore, there are four treatments or groups. The manager wants to study the profit for the selected stores; therefore, profit is the response variable, the variable that is measured in an experiment. This is an example of a one-factor ANOVA where the single factor, store, is at four levels and the response variable is profit.

Table 6.1
Store 1   Store 2   Store 3   Store 4
30        37        25        23
34        33        21        26
26        39        24        29
30        42        25        28
25        37        18        25
29        40        25        25

The null and alternate hypotheses for a one-factor ANOVA involving k treatments or groups test whether the k treatment means are equal.
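The one-factor ANOVA for Table 6.1 can be reproduced in a few lines; the sketch below applies scipy's one-way ANOVA F test to the profit values from the table.

```python
# A minimal sketch of the one-factor ANOVA for the Table 6.1 profit data.
from scipy.stats import f_oneway

store1 = [30, 34, 26, 30, 25, 29]
store2 = [37, 33, 39, 42, 37, 40]
store3 = [25, 21, 24, 25, 18, 25]
store4 = [23, 26, 29, 28, 25, 25]

# H0: all four store means are equal; H1: at least one mean differs.
f_stat, p_value = f_oneway(store1, store2, store3, store4)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")  # a small p-value rejects H0
```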
Data mining

Brief description: Data mining involves exploring new patterns and relationships in the collected data. It is a part of predictive analytics that involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the data. Data mining techniques are used to (i) extract previously unknown and potentially useful knowledge or patterns from massive amounts of collected and stored data and (ii) explore and analyze these large quantities of data to discover meaningful patterns, transforming the data into an understandable structure for further use. The field of data mining is rapidly growing, and statistics plays a major role in it. Data mining is also known as knowledge discovery in databases (KDD), pattern analysis, information harvesting, business intelligence, analytics, etc. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition.

In this age of technology, companies collect massive amounts of data automatically using different means. A large quantity of data is also collected using remote sensors and satellites. With the huge quantities of data collected today, usually referred to as big data, traditional techniques of data analysis are infeasible for processing the raw data. Data in raw form have no meaning unless processed and analyzed. Among the several tools and techniques available and currently emerging with the advancement of technology and computers, it is now possible to analyze big data using data mining, machine learning, and artificial intelligence (AI) techniques.

Application areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data, along with other customer- and product-related data, are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and other aspects of customer behavior. Businesses use data mining to perform market analysis to identify and develop new products, analyze their supply chains, find the root causes of manufacturing problems, study customer behavior for product promotion, improve sales by understanding the needs and requirements of their customers, prevent customer attrition, and acquire new customers. For example, Wal-Mart collects and processes over 20 million point-of-sale transactions every day. These data are stored in a centralized database and are analyzed using data mining software to understand and determine customer behavior, needs, and requirements. The data are analyzed to determine sales trends and forecasts, develop marketing strategies, and predict customer buying habits (http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex).

The success with data mining and predictive modeling has encouraged many businesses to invest in data mining to achieve a competitive advantage. Data mining has been successfully applied in several areas of business and industry, including customer service, banking, credit card fraud detection, risk management, sales and advertising, sales forecasting, customer segmentation, and manufacturing. Data mining is "the process of uncovering hidden trends and patterns that lead to predictive modeling using a combination of explicit knowledge base, sophisticated analytical skills and academic domain knowledge" (Luan, Jing, 2002). Data mining has been used successfully in science, engineering, business, and finance to extract previously unknown patterns from databases containing massive amounts of data and to make predictions that are critical in decision making and in improving overall system performance.

In recent years, data mining combined with machine learning and artificial intelligence is finding larger and wider applications in analyzing business data and predicting future business outcomes. The reason for this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.
Machine learning

Brief description: Machine learning methods use complex models and algorithms to make predictions. These models allow analysts to make predictions by learning from the trends, patterns, and relationships in historical data. The algorithms are designed to learn iteratively from data without being explicitly programmed; in a way, machine learning automates model building.

Application areas: Machine-learning algorithms have extensive applications in data-driven prediction and are a major decision-making tool. Some applications where machine learning has been used are e-mail filtering, cyber security, signal processing, and fraud detection, among others. Machine learning is employed in a range of computing tasks. Although machine-learning models are used in a number of applications, designing and programming explicit algorithms that are reproducible and repeatable with good performance remains a limitation. With current research and the use of newer technology, the fields of machine learning and artificial intelligence are becoming more promising.

Summary
This chapter provided a brief description and applications of key predic-
tive analytics models. These models are the core of predictive analytics
and are used to predict future business outcomes.
CHAPTER 7

Regression Analysis and Modeling

Chapter Highlights
• Introduction to Regression and Correlation
• Linear Regression
  ○ Regression Model
• The Estimated Equation of Regression Line
• The Method of Least Squares
• Illustration of Least Squares Regression Method
• Analysis of a Simple Regression Problem
• Regression Analysis Using Computer
  ○ Simple Regression Using EXCEL
  ○ Simple Regression Using MINITAB
  ○ Analysis of Regression Output
  ○ Model Adequacy Test
  ○ Assumptions of Regression Model and Checking the Assumptions Using MINITAB Residual Plots
  ○ Checking the Assumptions of Regression Using Residual Plots
• Multiple Regression: Computer Analysis and Results
  ○ Introduction to Multiple Regression
  ○ Multiple Regression Model
• The Least Squares Multiple Regression Model
• Models with Two Quantitative Independent Variables x1 and x2
• Assumptions of Multiple Regression Model
• Computer Analysis of Multiple Regression
  ○ The Coefficient of Multiple Determination (r²)
  ○ Hypothesis Tests in Multiple Regression
  ○ Testing the Overall Significance of Regression
  ○ Hypothesis Tests on Individual Regression Coefficients
• Multicollinearity and Autocorrelation in Multiple Regression
• Summary of the Key Features of Multiple Regression Model
• Model Building and Computer Analysis
  ○ Model with a Single Quantitative Independent Variable
  ○ First-order Model / Second-order Model / Third-order Model
• A Quadratic (Second-order) Model: Second-order Model Using MINITAB
  ○ Analysis of Computer Results
• Models with Qualitative Independent (Dummy) Variables
  ○ One Qualitative Independent Variable at Two Levels
• Model with One Qualitative Independent Variable at Three Levels
• Example: Dummy Variables
• Overview of Regression Models
• Implementation Steps and Strategy for Regression Models

Introduction to Regression and Correlation


This chapter provides an introduction to regression and correlation analysis. The techniques of regression enable us to explore the relationship between variables. We will discuss how to develop regression models that can be used to predict one variable using another variable, or even multiple variables. The following features related to regression analysis are the topics of this chapter:

I. Concepts of the dependent or response variable and the independent variables or predictors,
II. The basics of the least squares method in regression analysis and its purpose in estimating the regression line,
III. Determining the best-fitting line through the data points,
IV. Calculating the slope and y-intercept of the best fitting regression line and interpreting the meaning of the regression line, and
V. Measures of association between two quantitative variables: the covariance and the coefficient of correlation.

Linear Regression
Regression analysis is used to investigate the relationship between two or more variables. Often we are interested in predicting a variable y using one or more independent variables x1, x2, …, xk. For example, we might be interested in the relationship between two variables: sales and profit for a chain of stores, the number of hours required to produce a certain number of products, the number of accidents versus blood alcohol level, advertising expenditures and sales, or the height of parents compared to that of their children. In all these cases, regression analysis can be applied to investigate the relationship between the variables.
In general, we have one dependent or response variable y and one or more independent variables x1, x2, …, xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, then we have a case of multiple regression. In this section, we will discuss simple regression or, to be more specific, simple linear regression. This means that the relationship we obtain between the dependent or response variable y and the independent variable x will be linear, and there is only one predictor or independent variable (x) of interest that will be used to predict the dependent variable (y).

In regression analysis, the dependent or response variable y is a random variable, whereas the independent variable or variables x1, x2, …, xk are measured with negligible error and are controlled by the analyst. The relationship between the dependent and independent variables is described by a mathematical model known as a regression model.

The Regression Model

In the simple linear regression method, we study the linear relationship between two variables: the dependent or response variable (y) and the independent variable or predictor (x).
Suppose that the Mountain Power Utility company is interested in de-
veloping a model that will enable them to predict the home heating cost
based on the size of homes in two of the western states that they serve. This
model involves two variables: the heating cost and the size of the homes.

We will denote them by y and x, respectively. The manager in charge of developing the model believes that there is a positive relationship between x and y, meaning that larger homes (homes with greater square footage) tend to have higher heating costs. The regression model relating the two variables (home heating cost y as the dependent variable and the size of the homes as the independent variable x) can be denoted using equation (7.1), which shows the relationship between the values of x and y, or the independent and dependent variables, and an error term in a simple regression model.

y = β0 + β1 x + ε (7.1)

where
y = dependent variable
x = independent variable
β0 = y-intercept (population)
β1 = slope of the population regression line
ε = random error term (ε is the Greek letter epsilon)

The model represented by equation (7.1) can be viewed as a popula-


tion model in which β0 and β1 are the parameters of the model. The error
term ε represents the variability in y that cannot be explained by the rela-
tionship between x and y.
In our example, the population consists of all the homes in the region. This population consists of subpopulations of homes of each size x. Thus, one subpopulation may be viewed as all homes with 1,500 square feet, another as all homes with 2,100 square feet, and so on. Each subpopulation of homes of size x will have a corresponding distribution of y values with a mean or expected value E(y). The relationship between the expected value of y, or E(y), and x is the regression equation given by:

E(y) = β0 + β1x  (7.2)

where
E(y) = the mean or expected value of y for a given value of x
β0 = y-intercept of the regression line
β1 = slope of the regression line

The regression equation represented by equation (7.2) is an equation of a straight line describing the relationship between E(y) and x. This relationship, shown in Figure 7.1 (a) through (c), can be described as positive, negative, or no relationship.

Figure 7.1  Possible linear relationship between E(y) and x in simple linear regression

The positive linear relationship is identified by a positive slope. It shows that an increase in the value of x causes an increase in the mean value of y, or E(y), whereas a negative linear relationship is identified by a negative slope and indicates that an increase in the value of x causes a decrease in the mean value of y.

No relationship between x and y means that the mean value of y, or E(y), is the same for every value of x. In this case, the regression equation cannot be used to make a prediction because of the weak or nonexistent relationship between x and y.

The Estimated Equation of Regression Line


In equation (7.2), β0 and β1 are the unknown population parameters that must be estimated using the sample data. The estimates of β0 and β1 are denoted by b0 and b1, which provide the estimated regression equation given by the following equation.

ŷ = b0 + b1 x (7.3)

where
ŷ = point estimator of E(y), the mean value of y for a given value of x
b0 = y-intercept of the regression line
b1 = slope of the regression line

The regression equation above represents the estimated line of regres-


sion in the slope intercept form. The y-intercept b0 and the slope b1 in
equation (7.3) are determined using the least squares method. Before we
discuss the least squares method in detail, we will describe the process of
estimating the regression equation. Figure 7.2 explains this process.

The Method of Least Squares


The regression model is described in the form of a regression equation that is obtained using the least squares method. In simple linear regression, the regression equation has the form y = b0 + b1x. This is the equation of a straight line in slope-intercept form.
Figure 7.2  Estimating the regression equation



Figure 7.3 shows a scatter plot of the data in Table 7.1. Scatter plots are often used to investigate the relationship between two variables. An investigation of the plot shows a positive relationship between sales and advertising expenditures; therefore, the manager would like to predict the sales using the advertising expenditure with a simple regression model.

Figure 7.3  Scatterplot of sales and advertisement expenditures

Table 7.1  Sales and advertisement data


Sales ($1,000s) Advertising ($1,000s)
458 34
390 30
378 29
426 30
330 26
400 31
458 33
410 30
628 41
553 38
728 44
498 40
708 48
719 47
658 45

As outlined above, a simple regression model involves two variables


where one variable is used to predict the other variable. The variable to be
predicted is the dependent or response variable, and the other variable is
the independent variable. The dependent variable is usually denoted by y
while the independent variable is denoted by x.
In a scatter plot the dependent variable (y) is plotted on the vertical
axis and the independent variable (x) is plotted on the horizontal axis.
The scatter plot in Figure 7.3 suggests a positive linear relationship between sales (y) and advertising expenditures (x). From the figure, it can be seen that the plotted points can be well approximated by a straight line of the form y = b0 + b1x, where b0 and b1 are the y-intercept and the slope of the line. The process of estimating this regression equation uses a widely used mathematical tool known as the least squares method.
The least squares method requires fitting a line through the data points
so that the sum of the squares of errors or residuals is minimum. These errors
or residuals are the vertical distances of the points from the fitted line. Thus,
the least squares method determines the best fitting line through the data
points that ensures that the sum of the squares of the vertical distances or
deviations from the given points and the fitted line are a minimum.
Figure 7.4 shows the concept of the least squares method. The fig-
ure shows a line fitted to the scatter plot of Figure 7.3 using the least
squares method. This line is the estimated line denoted using y-hat (ŷ).
The method of estimating this line will be illustrated later. The equation
of this line is given below.

ŷ = −150.9 + 18.33x

The vertical distance of each point from the line is known as the error
or residual. Note that the residual or error of a point can be positive, nega-
tive, or zero depending upon whether the point is above, below, or on the
fitted line. If the point is above the line, the error is positive, whereas if
the point is below the fitted line, the error is negative.
Figure 7.4 shows graphically the errors for a few points. To demon-
strate how the error or residual for a point is calculated, refer to the data
in Table 7.1.

Figure 7.4  Fitting the regression line to the sales and advertising data of Table 7.1

This table shows that for an advertising expenditure of 40 (that is, x = 40), the sales value is 498 (y = 498). This is shown graphically in Figure 7.4. The estimated or predicted sales for x = 40 is given by the point on the fitted regression line directly above y = 498. This predicted value can be determined using the equation of the fitted line as

ŷ = −150.9 + 18.33x = −150.9 + 18.33(40) = 582.3

This is shown in Figure 7.4 as ŷ = 582.3. The difference between the observed sales, y = 498, and the predicted value of y is the error or residual:

(y − ŷ) = (498 − 582.3) = −84.3

Figure 7.4 shows this error value. This error is negative because the point y = 498 lies below the fitted regression line.

Now consider the advertising expenditure x = 44. The observed sales for this value is 728, or y = 728 (from Table 7.1). The predicted sales for x = 44 equals the vertical distance from y = 728 to the fitted regression line. This value is calculated as:

ŷ = −150.9 + 18.33x = −150.9 + 18.33(44) = 655.62

The value is shown in Figure 7.4. The error for this point is the difference between the observed value and the predicted or estimated value:

(y − ŷ) = (728 − 655.62) = 72.38

This value of the error is positive because the point y = 728 lies above the fitted line.
The errors for the other observed values can be calculated in a similar
way. The vertical deviation of a point from the fitted regression line rep-
resents the amount of error associated with that point. The least squares
method determines the values b0 and b1 in the fitted regression line
ŷ = b0 + b1 x that will minimize the sum of the squares of the errors.
Minimizing the sum of the squares of the errors provides a unique line
through the data points such that the distance of each point from the fit-
ted line is a minimum.
Since the least squares criteria require that the sum of the squares of
the errors be minimized, we have the following relationship:

∑(y − ŷ)² = ∑(y − b0 − b1x)²  (7.4)

where y is the observed value and ŷ is the estimated value of the depend-
ent variable given by ŷ = b0 + b1 x
Equation (7.4) involves two unknowns b0 and b1. Using differential
calculus, the following two equations can be obtained:

∑y = nb0 + b1∑x  (7.5)
∑xy = b0∑x + b1∑x²

These equations are known as the normal equations and can be solved
algebraically to obtain the unknown values of the slope and y-intercept b0
and b1. Solving these equations yields the results shown below.

b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²]  (7.6)

and

b0 = ȳ − b1x̄  (7.7)

where ȳ = ∑y/n and x̄ = ∑x/n

The values of b0 and b1 calculated using equations (7.6) and (7.7) minimize the sum of the squares of the vertical deviations or errors. These values can be calculated easily using the data points (xi, yi), which are the observed values of the independent and dependent variables (the collected data in Table 7.1).

Illustration of Least Squares Regression Method


In this section we demonstrate the least squares method, which is the basis of the regression model, and the process of finding the regression equation using the sales and advertising expenditures data in Table 7.1. Since the sales manager found a positive linear relationship between sales and advertising expenditures through an investigation of the scatter plot in Figure 7.3, he would now use the data to find the best fitting line through the points on the scatter plot. The line of best fit can be obtained by first calculating b0 and b1 using equations (7.6) and (7.7) above. These values will provide the line of the form y = b0 + b1x that can be used to predict the sales (y) using the advertising expenditures (x).

In order to evaluate b0 and b1, we need to perform the intermediate calculations shown in Table 7.2. We must first calculate ∑x, ∑y, ∑xy, ∑x², x̄, and ȳ. These values are calculated using the data points x and y. For later calculations we will also need ∑y²; therefore, an extra column for y², the squares of the dependent variable (y), is added to the table.

Table 7.2  Intermediate calculations for determining the estimated regression line

      Sales ($1,000s)   Advertising ($1,000s)
      y                 x          xy         x²       y²
1     458               34         15,572     1,156    209,764
2     390               30         11,700     900      152,100
3     378               29         10,962     841      142,884
4     426               30         12,780     900      181,476
5     330               26         8,580      676      108,900
6     400               31         12,400     961      160,000
7     458               33         15,114     1,089    209,764
8     410               30         12,300     900      168,100
9     628               41         25,748     1,681    394,384
10    553               38         21,014     1,444    305,809
11    728               44         32,032     1,936    529,984
12    498               40         19,920     1,600    248,004
13    708               48         33,984     2,304    501,264
14    719               47         33,793     2,209    516,961
15    658               45         29,610     2,025    432,964

∑y = 7,742   ∑x = 546   ∑xy = 295,509   ∑x² = 20,622   ∑y² = 4,262,358

Note: n = the number of observations = 15

x̄ = ∑x/n = 546/15 = 36.4    ȳ = ∑y/n = 7,742/15 = 516.133

Using the values in Table 7.2 and equations (7.6) and (7.7), we first calculate the value of b1:

b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²] = [15(295,509) − (546)(7,742)] / [15(20,622) − (546)²] = 18.326

Using the value of b1, we obtain the value of b0:

b0 = ȳ − b1x̄ = 516.133 − 18.326(36.4) = −150.9



This gives us the following equation for the estimated regression line:

ŷ = −150.9 + 18.33x

This equation is plotted in Figure 7.5.


The slope (b1) of the estimated regression line has a positive value of 18.33. This means that as the advertising expenditures (x) increase, the sales increase. Since the advertising expenditures (x) and the sales are both measured in $1,000s, the estimated regression equation ŷ = −150.9 + 18.33x means that each unit increase in the value of x (every $1,000 increase in advertising expenditures) will lead to an increase of $18,330 (18.33 × 1,000 = 18,330) in expected sales. We can also use the regression equation to predict the sales for a given value of x, the advertising expenditure. For instance, the predicted sales for x = 40 can be calculated as:

ŷ = −150.9 + 18.33(40) = 582.3

Thus, for the advertising expenditure of $40,000 the predicted sales


would be $582,300.

Figure 7.5  Graph of the estimated regression equation
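The estimated equation can be verified with a short script. The sketch below applies equations (7.6) and (7.7) to the Table 7.1 data using numpy and reproduces b1 = 18.33, b0 = −150.9, and the prediction at x = 40.

```python
# A minimal sketch verifying the least squares fit for the Table 7.1 data.
import numpy as np

advertising = np.array([34, 30, 29, 30, 26, 31, 33, 30, 41, 38, 44, 40, 48, 47, 45])
sales = np.array([458, 390, 378, 426, 330, 400, 458, 410, 628, 553, 728, 498, 708, 719, 658])

n = len(sales)
# Equation (7.6): slope
b1 = (n * (advertising * sales).sum() - advertising.sum() * sales.sum()) / \
     (n * (advertising**2).sum() - advertising.sum()**2)
# Equation (7.7): intercept
b0 = sales.mean() - b1 * advertising.mean()

print(f"y-hat = {b0:.1f} + {b1:.3f}x")                        # y-hat = -150.9 + 18.326x
print("predicted sales at x = 40:", round(b0 + b1 * 40, 1))   # about 582.3
```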



It is important to check the adequacy of the estimated regression


equation before using the equation to make predictions. In the sections
that follow, we will discuss several tests to check the adequacy of the re-
gression model.

Analysis of a Simple Regression Problem


The example below demonstrates the necessary computations, their interpretation, and the application of a simple regression problem using computer packages. Suppose the operations manager of a manufacturing company wants to predict the number of hours required to produce a certain number of products. The data for the number of units produced and the time in hours to produce those units are shown in Table 7.3 (data file: Hours_Units). This is a simple linear regression problem, so we have one dependent or response variable that we are trying to relate to one independent variable or predictor. Since we are trying to predict the number of hours using the number of units produced, hours is the dependent or response variable (y) and the number of units is the independent variable or predictor (x). For the data in Table 7.3, we first calculate the intermediate values shown in Table 7.4. All these values are calculated using the observed values of x and y in Table 7.3 and will be used in most of the computations related to simple regression analysis.
We will also use computer packages such as MINITAB and EXCEL to analyze the simple regression problem and provide a detailed analysis of the computer output. First, we will explain the manual calculations and interpret the results; you will find that all the formulas are written in terms of the values calculated in Table 7.4.

Table 7.3  Data for regression example


ObsNo. 1 2 3 4 5 6 7 8 9 10
Units (x) 932 951 531 766 814 914 899 535 554 445
Hours (y) 16.20 16.05 11.84 14.21 14.42 15.08 14.45 11.73 12.24 11.12
ObsNo. 11 12 13 14 15 16 17 18 19 20
Units (x) 704 897 949 632 477 754 819 869 1,035 646
Hours (y) 12.63 14.43 15.46 12.64 11.92 13.95 14.33 15.23 16.77 12.41
Obs. No. 21 22 23 24 25 26 27 28 29 30
Units (x) 1,055 875 969 1,075 655 1,125 960 815 555 925
Hours (y) 17.00 15.50 16.20 17.50 12.92 18.20 15.10 14.00 12.20 15.50

Table 7.4  Intermediate calculations for data in Table 7.3

n = 30 (number of observations)
∑x = 24,132    ∑y = 431.23    ∑xy = 357,055
∑x² = 20,467,220    ∑y² = 6,302.3
x̄ = ∑x/n = 804.40    ȳ = ∑y/n = 14.374


Constructing a Scatterplot of the Data


We can use EXCEL or MINITAB to create a scatter plot of the data. From the data in Table 7.3, enter the units (x) in the first column and the hours (y) in the second column of the EXCEL or MINITAB worksheet and construct a scatter plot. Figure 7.6 shows the scatter plot for these data.

Figure 7.6  Scatter plot of Hours (y) and Units (x)

The above plot clearly shows an increasing trend. It shows a linear re-
lationship between x and y; therefore, the data can be approximated using
a straight line with a positive slope.

Finding the Equation of the Best Fitting Line (Estimated Line)

The equation of the estimated regression line is given by:

ŷ = b0 + b1 x

where b0 = y-intercept and b1 = slope. These are determined using the least squares method. The y-intercept b0 and the slope b1 are determined using equations (7.6) and (7.7) discussed earlier. Using the values in Table 7.4, first calculate the values of b1 (the slope) and b0 (the y-intercept) as shown below.

b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²] = [30(357,055) − (24,132)(431.23)] / [30(20,467,220) − (24,132)²] = 0.00964

and

b0 = ȳ − b1x̄ = 14.374 − (0.00964)(804.40) = 6.62

Therefore, the equation of the estimated line is

ŷ = b0 + b1x = 6.62 + 0.00964x

The regression equation or the equation of the “best” fitting line can
also be written as:

Hours(y) = 6.62 + 0.00964 Units(x)

or simply, ŷ = 6.62 + 0.00964x


where, y is the hours and x is the number of units produced. The hat (^)
over y means that the line is estimated. Thus, the equation of the line,
in fact, is an estimated equation of the best fitting line. The line is also
known as the least squares line which minimizes the sum of the squares
of the errors. This means that when the line is placed over the scatter plot,
the vertical distance from each of the points to the line is minimized.
The error is the vertical distance of each point from the estimated line.

The error is also known as the residual. Figure 7.7 shows the least squares
line and the residuals for each of the points as the vertical distance from
the point to the estimated regression line.
[Note: The estimated line is denoted by ŷ and the residual for a point yi is given by (yi − ŷi).]
Recall that the error or the residual for a point is given by ( y − yˆ )
which is the vertical distance of a point from the estimated line. Figure 7.8
shows the fitted regression line over the scatter plot.

Figure 7.7  The least squares line and residuals

Figure 7.8  Fitted line regression plot



Interpretation of the Fitted Regression Line


The estimated least squares line is of the form y = b0 + b1x, where b1 is the slope and b0 is the y-intercept. The equation of the fitted line is

ŷ = 6.62 + 0.00964x

In this equation of the fitted line, 6.62 is the y-intercept and 0.00964 is the slope. This line provides the relationship between the hours and the number of units produced. The equation means that for each unit increase in x (the number of units produced), y (the number of hours) will increase by 0.00964. The value 6.62 represents the portion of the hours that is not affected by the number of units.

Making Predictions Using the Regression Line


The regression equation can be used to predict the number of hours to
produce a certain number of units. For example, suppose we want to pre-
dict the number of hours (y) required to produce 900 units (x). This can
be determined using the equation of the fitted line as:

Hours(y) = 6.62 + 0.00964 Units(x)

Hours(y) = 6.62 + 0.00964 × (900) = 15.296 hours

Thus, it will take approximately 15.3 hours to produce 900 units


of the product. Note that making a prediction outside the range of the data will introduce error in the predicted value. For example, suppose we want to predict the time for producing 2,000 units; this prediction is outside the data range (in Table 7.3, the x values range from 445 to 1,125). The value x = 2,000 is far greater than all the other x values in the data. From the scatter plot, a straight-line fit with an increasing trend is evident, but we should be cautious about assuming that this trend will continue to hold for values as large as x = 2,000. Therefore, it may not be reasonable to make predictions for values that are far beyond the range of the data.

The Standard Error of the Estimate (s)


The standard error of the estimate measures the variation or scatter of the
points around the fitted line of regression. This is measured in units of the
response or dependent variable (y). The standard error of the estimate is
analogous to the standard deviation. The standard deviation measures the
variability around the mean, whereas the standard error of the estimate (s)
measures the variability around the fitted line of regression. A large value of s
indicates larger variation of the points around the fitted line of regression.
The standard error of the estimate is calculated using the following formula:

s = √[∑(y − ŷ)²/(n − 2)]  (7.7A)

The equation can also be written in terms of b0, b1, and the values in Table 7.4, which is computationally more convenient. In this form, the standard error of the estimate is calculated as:

s = √[(∑y² − b0∑y − b1∑xy)/(n − 2)] = √[(6,302.3 − 6.62(431.23) − 0.00964(357,055))/28] = 0.4481  (7.8)

Equation (7.7A) measures the average deviation of the points from the fitted line of regression. Equation (7.8) is mathematically equivalent to equation (7.7A) and is computationally more efficient. Thus,

s = 0.4481

A small value of s indicates less scatter of the data points around the fitted line of regression (see Figure 7.8). The value s = 0.4481 indicates that the average deviation is 0.4481 hours (measured in units of the dependent variable y).
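The value of s can be reproduced from the summary values in Table 7.4; a minimal sketch (the small difference from 0.4481 comes from rounding in the coefficients used here):

```python
# A minimal sketch of the standard error of the estimate from equation (7.8),
# using the rounded summary values for the Table 7.3 data.
import math

sse = 6302.3 - 6.62 * 431.23 - 0.00964 * 357055  # sum y^2 - b0*sum y - b1*sum xy
s = math.sqrt(sse / (30 - 2))
print(round(s, 4))  # about 0.445, matching s = 0.4481 up to rounding
```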

Assessing the Fit of the Simple Regression Model: The Coefficient of Determination (r²) and Its Meaning
The coefficient of determination r² is an indication of how well the independent variable predicts the dependent variable. In other words, it is used to judge the adequacy of the regression model. The value of r² lies between 0 and 1 (0 ≤ r² ≤ 1), or 0 to 100 percent. The closer the value of r² is to 1 (100 percent), the better the model, because the r² value indicates the amount of variation in the data explained by the regression model. Figure 7.9 shows the relationship between the explained, unexplained, and total variation.
In regression, the total sum of squares is partitioned into two components, the regression sum of squares and the error sum of squares, giving the following relationship:

SST = SSR + SSE

SST = total sum of squares for y
SSR = regression sum of squares (measures the variability in y accounted for by the regression line, also known as the explained variation)
SSE = error sum of squares (measures the variation due to the residuals or errors, also known as the unexplained variation)

yi = any point i; ȳ = average of the y values

Figure 7.9  SST = SSR + SSE



From Figure 7.9, the SST and SSE are calculated as

SST = ∑(y − ȳ)² = ∑y² − (∑y)²/n  (7.9)

and

SSE = ∑(y − ŷ)² = ∑y² − b0∑y − b1∑xy  (7.10)

Note that we can calculate SSR from SST and SSE since

SST = SSR + SSE or SSR = SST − SSE

Using the SSR and SST values, the coefficient of determination r² is calculated using

r² = SSR/SST  (7.11)

The coefficient of determination r² is used to measure the goodness of fit for the regression equation. It measures the variation in y explained by the variation in the independent variable x; that is, r² is the ratio of the explained variation to the total variation. The calculation of r² is explained below. First, calculate SST and SSE using equations (7.9) and (7.10) and the values in Table 7.4.

SST = ∑(y − ȳ)² = ∑y² − (∑y)²/n = 6,302.3 − (431.23)²/30 = 103.68

SSE = ∑(y − ŷ)² = ∑y² − b0∑y − b1∑xy = 6,302.3 − 6.62(431.23) − 0.00964(357,055) = 5.623

Since

SST = SSR + SSE



Therefore,

SSR = SST − SSE = 103.680 − 5.623 = 98.057  (7.12)

and

r² = SSR/SST = 98.057/103.680 = 0.946

or r² = 94.6 percent. This means that 94.6 percent of the variation in the dependent variable y is explained by the variation in x, and 5.4 percent of the variation is due to unexplained reasons or error.
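These values can be verified in one call with scipy's linregress, which returns the slope, intercept, and correlation for the Table 7.3 data; a minimal sketch:

```python
# A minimal sketch reproducing the chapter's estimates and r-squared for Table 7.3.
import numpy as np
from scipy.stats import linregress

units = np.array([932, 951, 531, 766, 814, 914, 899, 535, 554, 445,
                  704, 897, 949, 632, 477, 754, 819, 869, 1035, 646,
                  1055, 875, 969, 1075, 655, 1125, 960, 815, 555, 925])
hours = np.array([16.20, 16.05, 11.84, 14.21, 14.42, 15.08, 14.45, 11.73, 12.24, 11.12,
                  12.63, 14.43, 15.46, 12.64, 11.92, 13.95, 14.33, 15.23, 16.77, 12.41,
                  17.00, 15.50, 16.20, 17.50, 12.92, 18.20, 15.10, 14.00, 12.20, 15.50])

result = linregress(units, hours)
print(f"b0 = {result.intercept:.2f}, b1 = {result.slope:.5f}")  # about 6.62 and 0.00964
print(f"r^2 = {result.rvalue**2:.3f}")                          # about 0.946
```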

The Coefficient of Correlation (r) and Its Meaning


The coefficient of correlation r can be calculated by taking the square root of r²:

r = √r²  (7.13)

Therefore,

r = √0.946 = 0.973

In this case, r = 0.973 (97.3 percent) indicates a strong positive correlation between x and y. Note that r is positive if the slope b1 is positive, indicating a positive correlation between x and y. The value of r is between −1 and +1:

−1 ≤ r ≤ 1  (7.14)

The value of r determines the correlation between the x and y variables. The closer the value of r is to −1 or +1, the stronger the correlation between x and y.
The value of the coefficient of correlation r can be positive or negative: r is positive if the slope b1 is positive and negative if b1 is negative. A positive r indicates a positive correlation, whereas a negative r indicates a negative correlation. The coefficient of correlation r can also be calculated using the following formula:

r = [∑xy − (∑x)(∑y)/n] / [√(∑x² − (∑x)²/n) × √(∑y² − (∑y)²/n)]  (7.15)
Using the values in Table 7.4, we can calculate r from equation (7.15).

Summary of the Main Features of the Simple Regression Model Discussed Above
The sections above illustrated the least squares method, which is the basis of the regression model. The process of finding the regression equation using the least squares method was demonstrated using the sales and advertising expenditures data. The problem involved predicting the sales, the response or dependent variable (y), using the predictor or independent variable (x), the advertising expenditures. Another example involved the number of hours (y) required to produce a number of products (x). The analysis of this simple regression problem was presented by calculating and interpreting several measures. In particular, the following analyses were performed: (a) constructing a scatter plot of the data, (b) finding the equation of the best fitting line, (c) interpreting the fitted regression line, and (d) making predictions using the fitted regression equation.
Other important measures critical to assessing the quality of the regres-
sion model were calculated and explained. These measures include: (a) the
standard error of the estimate (s) that measures the variation or scatter of
the points around the fitted line of regression, (b) the coefficient of deter-
mination (r2) that measures how well the independent variable predicts
the dependent variable or the percent of variation in the dependent vari-
able y explained by the variation in the independent variable, x, (c) the
coefficient of correlation (r) that measures the strength of relationship
between x and y.

Regression Analysis Using Computer


This section provides a step-by-step computer analysis of the regression model. In the real world, computer software is almost always used to analyze regression problems. A number of software packages are in use today, of which MINITAB, EXCEL, SAS, and SPSS are a few. Here we have used EXCEL and MINITAB to analyze the regression models. The applications of simple, multiple, and higher-order regressions using EXCEL and MINITAB software are demonstrated in this and subsequent sections. If you perform regression analysis with a substantial amount of data and need more detailed analyses, the use of a statistical package such as MINITAB, SAS, or SPSS is recommended. Besides these, a number of packages, including R, Stata, and others, are readily available and widely used in research and data analysis.

Simple Regression Using EXCEL

The instructions in Table 7.5 will produce the regression output shown in
Table 7.6. If you checked the boxes under Residuals and the Line Fit Plots,
the residuals and fitted line plot will be displayed.

Table 7.5  EXCEL instructions for regression


1. Label columns A and B of EXCEL worksheet with Units (x) and Hours (y) and
enter the data of Table 7.3 or, open the EXCEL data file: Hours_Units.xlsx
2. Click the Data tab on the main menu
3. Click Data Analysis tab (on far right)
4. Select Regression
5. Select Hours(y) for Input y range and Units(x) for Input x range (including the
labels)
6. Check the Labels box
7. Click on the circle to the left of Output Range, click on the box next to output
range and specify where you want to store the output by clicking a blank cell (or
select New Worksheet Ply)
8. Check the Line Fit Plot under residuals. Click OK
You may check the boxes under residuals and normal probability plot as desired.

Table 7.6  EXCEL regression output

Table 7.6 shows the output with the regression statistics. We calculated all of these manually, except the adjusted R-squared, in the previous sections. The regression equation can be read from the Coefficients column. The regression coefficients are b0 and b1, the y-intercept and the slope. In the Coefficients column, 6.620904991 is the y-intercept and 0.009638772 is the slope. The regression equation from this table is

ŷ = 6.62 + 0.00964x

This is the same equation we obtained earlier using manual calculations.

The Coefficient of Determination (r²) Using EXCEL


The values of SST and SSR were calculated manually in the previous sections. Recall that in regression, the total sum of squares is partitioned into two components, the regression sum of squares (SSR) and the error sum of squares (SSE), giving the relationship SST = SSR + SSE. The coefficient of determination r², which is also the measure of goodness of fit for the regression equation, can be calculated using

r² = SSR/SST

The values of SSR, SSE, and SST can be obtained from the ANOVA table, which is part of the regression analysis output of EXCEL. Table 7.7 shows the EXCEL regression output with the SSR and SST values. Using these values, the coefficient of determination is r² = SSR/SST = 0.9458. This value is reported under the regression statistics in Table 7.7.
The t-test and F-test for the significance of regression can be easily
performed using the information in the EXCEL computer output under
the ANOVA table. Table 7.8 shows the EXCEL regression output with
the ANOVA table.

(1) Conducting the t-Test Using the Regression Output in Table 7.8

The test statistic for testing the significance of regression is given by the following equation:

tn−2 = b1/sb1
Table 7.7  EXCEL regression output

Table 7.8  EXCEL regression output

The values of b1, sb1, and the test statistic value tn−2 are labeled in Table 7.8. Using the test statistic value, the hypothesis test for the significance of regression can be conducted. This test is explained here using the computer results. The appropriate hypotheses for the test are:

H0: β1 = 0
H1: β1 ≠ 0

The null hypothesis states that the slope of the regression line is zero. Thus, if the regression is significant, the null hypothesis must be rejected. A convenient way of testing the above hypotheses is the p-value approach. The test statistic value tn−2 and the corresponding p-value are reported in the regression output in Table 7.8. Note that the p-value is very close to zero (p = 2.92278E-19). If we test the hypothesis at a 5 percent level of significance (α = 0.05), then p = 0.000 is less than α = 0.05, and we reject the null hypothesis and conclude that the regression is significant overall.
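The test statistic can also be reproduced from the values computed earlier in the chapter, using the standard formula sb1 = s/√Sxx for the standard error of the slope, where Sxx = ∑x² − (∑x)²/n; a minimal sketch:

```python
# A minimal sketch of the t test for the slope, using this chapter's summary values.
import math
from scipy.stats import t as t_dist

b1 = 0.0096388                       # slope from the regression output
s = 0.4481                           # standard error of the estimate
Sxx = 20467220 - 24132**2 / 30       # sum of squares of x about its mean

sb1 = s / math.sqrt(Sxx)             # standard error of the slope
t_stat = b1 / sb1
p_value = 2 * t_dist.sf(abs(t_stat), df=30 - 2)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}")  # t about 22, p near zero
```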

Simple Regression Using MINITAB

The regression results using MINITAB are explained in this section. We created a scatter plot, a fitted line plot (a plot with the best fitting line), and the regression results for the data in Table 7.3. We have already analyzed the EXCEL results above.
[Note: Readers can download a free 30-day trial copy of MINITAB version 17 or 18 from www.minitab.com]
The scatter plot shown in Figure 7.10 shows an increasing or direct relationship between the number of units produced (x) and the number of hours (y). Therefore, the data may be approximated by a straight line of the form y = b0 + b1x, where b0 is the y-intercept and b1 is the slope. The fitted line plot with the regression equation from MINITAB is shown in Figure 7.11. Also displayed are the "Regression Analysis" and "Analysis of Variance" tables shown in Table 7.9. We will first analyze the regression and analysis of variance tables and then provide further analysis.

Figure 7.10  Scatterplot of Hours (y) and Units (x)

Figure 7.11  Fitted line and regression equation



Analysis of Regression Output in Table 7.9

Refer to the Regression Analysis part. In this table, the regression equation is printed as Hours(y) = 6.62 + 0.00964 Units(x). This is the equation of the best fitting line obtained using the least squares method. Just below the regression equation, a table is printed that describes the model in more detail. Coef stands for coefficients; the values in this column are the regression coefficients b0 and b1, where b0 is the y-intercept or constant and b1 is the slope of the regression line. Under Predictor, the value for Units (x) is 0.0096388, which is b1 (the slope of the fitted line). The Constant is 6.6209. These values form the regression equation.

Table 7.9  The regression analysis and analysis of variance tables


using MINITAB

Refer to Table 7.9 above

1. The regression equation or the equation of the “best” fitting line is:

Hours(Y) = 6.62 + 0.00964 Units(X)

or ŷ = 6.62 + 0.00964x, where y is the hours and x is the units produced.
This line minimizes the sum of the squares of the errors. This means that when the line is placed over the scatter plot, the vertical distance from each of the points to the line is minimized. The error or residual is the vertical distance of each point from the estimated line. Figure 7.12 shows the least squares line and the residuals. The residual for a point is given by (y − ŷ), which is the vertical distance of the point from the estimated line.

Figure 7.12  The least squares line and residuals


[Note: The estimated line is denoted by ŷ and the residual for a point yi is given by (yi − ŷi).]
The estimated least squares line is of the form y = b0 + b1x, where b1 is the slope and b0 is the y-intercept. In the regression equation Hours(Y) = 6.62 + 0.00964 Units(X), 6.62 is the y-intercept and 0.00964 is the slope. This line provides the relationship between the hours and the number of units produced. The equation states that for each unit increase in x (the number of units produced), y (the number of hours) will increase by 0.00964.

2. The Standard Error of the Estimate (s)


The standard error of the estimate measures the variation of the
points around the fitted line of regression. This is measured in units
of the response or dependent variable (y).
In regression analysis, the standard error of the estimate is re-
ported as s. The value of s is reported in Table 7.9 under “Regression
Analysis.” This value is
s = 0.4481

A small value of s indicates less scatter of the points around the


fitted line of regression.

3. The Coefficient of Determination (r2)


The coefficient of determination, r2 is an indication of how well
the independent variable predicts the dependent variable. In other
words, it is used to judge the adequacy of the regression model. The
value of r2 lies between 0 and 1 (0 ≤ r2 ≤ 1), or 0 to 100 percent.
The closer the value of r2 is to 1 (or 100 percent), the better the model.
The r2 value indicates the amount of variability in the data explained
by the regression model. In our example, the r2 value is 94.6 percent
(Table 7.9, Regression Analysis). The value of r2 is reported as:

R-Sq = 94.6%

This means that 94.6 percent of the variation in the dependent variable y
can be explained by the variation in x, while the remaining 5.4 percent of
the variation is due to unexplained reasons or error.
The R-Sq(adj) = 94.4 percent value next to r2 in the regression output is
the adjusted R2. This is the r2 value adjusted for the degrees of freedom.
The adjusted value is more important in multiple regression.

Model Adequacy Test

To check whether the fitted regression model is adequate, we first review


the assumptions on which regression is based followed by the residual
plots that are used to check the model assumptions.
Residuals: A residual or error for any point is the difference between the
actual y value and the corresponding estimated value ŷ. Thus, for a given
value of x, the residual is given by e = (y − ŷ).

Assumptions of Regression Model and Checking the Assumptions Using MINITAB Residual Plots

The regression analysis is based on the following assumptions:

(1) independence of errors, (2) normality of errors, (3) the assumption
regarding E(y): the expected values of y fall on the straight line described
by the model E(y) = β0 + β1x, (4) equal variance, and (5) linearity.

The assumption regarding the independence of errors can be evaluated


by plotting the errors or residuals in the order or the sequence in which
the data were collected. If the errors are not independent, a relationship
exists between consecutive residuals which is a violation of the assump-
tion of independence of errors. When the errors are not independent,
the plot of residuals versus the time (or the order) in which the data were
collected will show a cyclical pattern. Meeting this assumption is particu-
larly important when data are collected over a period of time. If the data
are collected at different time periods, the errors for a specific time period
may be correlated with those of previous time periods.
The assumption that the errors are normally distributed or the nor-
mality assumption requires that the errors have a normal or approximately
normal distribution. Note that this assumption means that the errors do
not deviate too much from normality. The assumption can be verified by
plotting the histogram or the normal probability plot of errors.
The assumption that the variance of the errors is equal (equality of
variance) is also known as homoscedasticity. It requires that the variance of
the errors be constant for all values of x, that is, that the variability of
the y values is the same for both low and high values of x. The equality of
variance assumption is of particular importance for making inferences about
b0 and b1.
The linearity assumption means that the relationship between the
variables is linear. This assumption can be verified using the residual
plots discussed in the next section.
To check the validity of the above regression assumptions, a graphical
approach known as the residual analysis is used. The residual analysis is
also used to determine whether the selected regression model is an ap-
propriate model.
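A MINITAB-style set of residual plots can also be drawn in Python. A minimal sketch, assuming the residuals and fitted values are already available (hypothetical values are generated here for illustration), is:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
fitted = np.linspace(8, 14, 25)          # hypothetical fitted values
resid = rng.normal(0, 0.45, size=25)     # hypothetical residuals

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
stats.probplot(resid, plot=ax[0, 0])                    # normal probability plot
ax[0, 1].hist(resid, bins=7)                            # histogram of residuals
ax[1, 0].scatter(fitted, resid); ax[1, 0].axhline(0)    # residuals vs. fits
ax[1, 1].plot(resid, marker="o"); ax[1, 1].axhline(0)   # residuals vs. order
plt.tight_layout(); plt.show()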

Checking the Assumptions of Regression Using MINITAB Residual Plots

Several residual plots can be created using EXCEL and MINITAB to


check the adequacy of the regression model. The plots are shown in
Figure 7.13a through 7.13d.
The plots to check the regression assumptions include the histogram of
residuals, normal plot of residuals, plot of the residuals vs. fits, and residuals

vs. order of data. The residuals can also be plotted with each of the in-
dependent variables.
Figures 7.13a and 7.13b are used to check the normality assumption.
The regression model assumes that the errors are normally distributed
with mean zero. Figure 7.13a shows the normal probability plot. This plot
is used to check for the normality assumption of regression model. In this
plot, if the plotted points lie on a straight line or close to a straight line
then the residuals or errors are normally distributed. The pattern of points
appears to fall on a straight line, indicating no violation of the normality
assumption.
Figure 7.13b shows the histogram of residuals. If the normality as-
sumption holds, the histogram of residuals should look symmetrical or
approximately symmetrical. Also, the histogram should be centered at
zero because the sum of the residuals is always zero. The histogram of
residuals is approximately symmetrical which indicates that the errors ap-
pear to be approximately normally distributed. Note that the histogram
may not be exactly symmetrical. We would like to see a pattern that is
symmetrical or approximately symmetrical.
In Figure 7.13c, the residuals are plotted against the fitted values. This
plot is used to check the linearity assumption. The points in this plot
should be scattered randomly around the horizontal line drawn through the
zero residual value for the linear model to be valid. As can be seen, the
residuals are randomly scattered about the horizontal line, indicating that
the relationship between x and y is linear.
The plot of residual vs. the order of the data shown in Figure 7.13d is
used to check the independence of errors.
The independence of errors can be checked by plotting the errors or
the residuals in the order or sequence in which the data were collected.
The plot of residuals vs. the order of data should show no pattern or ap-
parent relationship between the consecutive residuals. This plot shows
no apparent pattern indicating that the assumption of independence of
errors is not violated.
Note that checking the independence of errors is more important in
the case where the data were collected over time. Data collected over time
sometimes may show an autocorrelation effect among successive data
Figure 7.13  Plots for residual analysis



values. In these cases, there may be a relationship between consecutive


residuals that violates the assumption of independence of errors.
The equality of variance assumption requires that the variance of the
errors be constant for all values of x, or that the variability of y is the
same for both low and high values of x. This can be checked by plotting the
residuals against the order of the data points, as shown in Figure 7.13d. If
the equality of variance assumption is violated, this plot will show an
increasing trend of growing variability, demonstrating a lack of homogeneity
in the variances of the y values at each level of x. The plot shows no
violation of the equality of variance assumption.

Multiple Regression: Computer Analysis and Results


Introduction to Multiple Regression

In the previous sections we explored the relationship between two variables
using simple regression and correlation analysis. We demonstrated how the
estimated regression equation can be used to predict a dependent variable (y)
using an independent variable (x). We also discussed the correlation between
two variables, which explains the degree of association between them. Here,
we expand the concept of simple linear regression to multiple regression
analysis. A multiple linear regression involves one dependent or response
variable and two or more independent variables or predictors. The concepts of
simple regression discussed earlier also apply to multiple regression.

Multiple Regression Model

The mathematical form of the multiple linear regression model relating the
dependent variable y and two or more independent variables x1, x2, …, xk
with the associated error term is given by:

y = β0 + β1x1 + β2x2 + β3x3 + … + βkxk + ε  (7.16)

where x1, x2, …, xk are the k independent or explanatory variables,
β0, β1, β2, …, βk are the regression coefficients, and ε is the associated

error term. Equation (7.16) can be viewed as a population multiple re-


gression model in which y is a linear function of unknown parameters
β0, β1, β2, …, βk and an error term. The error ε explains the variability in
y that cannot be explained by the linear effects of the independent vari-
ables. The multiple regression model is similar to the simple regression
model except that multiple regression involves more than one independ-
ent variable.
One of the basic assumptions of the regression analysis is that the
mean or the expected value of the error is zero. This implies that the mean
or expected value of y, E(y), in the multiple regression model is given by:

E(y) = β0 + β1x1 + β2x2 + β3x3 + … + βkxk  (7.17)

The above equation relating the mean value of y and the k independ-
ent variables is known as the multiple regression equation.
It is important to note that β0, β1, β2, …, βk are the unknown population
parameters, or regression coefficients, and they must be estimated using the
sample data to obtain the estimated equation of multiple regression. The
estimated regression coefficients are denoted by b0, b1, b2, …, bk. These are
the point estimates of the parameters β0, β1, β2, …, βk. The estimated
multiple regression equation using the estimates of the unknown population
regression coefficients can be written as:

ŷ = b0 + b1x1 + b2x2 + b3x3 + … + bkxk  (7.18)

where ŷ is the point estimator of E(y), or the estimated value of the
response y, and b0, b1, b2, …, bk are the estimated regression coefficients,
that is, the estimates of β0, β1, β2, …, βk.
Equation (7.18) is the estimated multiple regression equation and can
be viewed as the sample regression model. The regression equation with
the sample regression coefficients is written as in equation (7.18). This
equation defines the regression equation for k independent variables.
In equation (7.16), β0 , β1 , β 2 ,.. βk denote the regression coefficients
for the population. The sample regression coefficients b0 , b1 , b2 ,.. bk are

the estimates of the population parameters and can be determined using


the least squares method.
In a multiple linear regression, the variation in y (the response vari-
able) may be explained using two or more independent variables or pre-
dictors. The objective is to predict the dependent variable. Compared to
simple linear regression, a more precise prediction can be made because
we use two or more independent variables. By using two or more in-
dependent variables, we are often able to make use of more information
in the model. The simplest form of a multiple linear regression model
involves two independent variables and can be written as:

y = β0 + β1x1 + β2x2 + ε  (7.19)

Equation (7.19) describes a plane. In this equation β0 is the y-intercept


of the regression plane. The parameter β1 indicates the average change in
y for each unit change in x1 when x2 is constant. Similarly, β2 indicates
the average change in y for each unit change in x2 when x1 is held con-
stant. When we have more than two independent variables, the regression
equation of the form described using equation (7.18) is the equation of a
hyperplane in an n-dimensional space.

The Least Squares Multiple Regression Model


The regression model is described in form of a regression equation that is
obtained using the least squares method. Recall that in a simple regression,
the least squares method requires fitting a line through the data points so that the
sums of the squares of errors or residuals are minimized. These errors or residuals
are the vertical distances of the points from the fitted line. The same concept
of simple regression is used to develop the multiple regression equation.
In a multiple regression, the least squares method determines the best
fitting plane or the hyperplane through the data points that ensures that
the sum of the squares of the vertical distances or deviations from the
given points and the plane are a minimum.
Figure 7.14 shows a multiple regression model with two independent
variables. The response y with two independent variables x1 and x2 forms
a regression plane. The observed data points in the figure are shown using

Figure 7.14  Scatter plot and regression plane with two independent
variables

dots. The stars on the regression plane indicate the corresponding points
that have identical values for x1 and x2. The vertical distances from the
observed points to the corresponding points on the plane are shown using
vertical lines. These vertical lines are the errors. The error for a
particular point yi is denoted by (yi − ŷi), where the estimated value ŷ is
calculated using the regression equation ŷ = b0 + b1x1 + b2x2 for given
values of x1 and x2.
The least squares criterion requires that the sum of the squares of the
errors be minimized, or

minimize Σ(y − ŷ)²

where y is the observed value and ŷ is the estimated value of the dependent
variable given by ŷ = b0 + b1x1 + b2x2.
[Note: The terms independent, or explanatory variables, and the predictors have the
same meaning and are used interchangeably in this chapter. The dependent variable
is often referred to as the response variable in multiple regression.]

Similar to the simple regression, the least squares method uses the
sample data to estimate the regression coefficients b0, b1, b2, …, bk and
hence the estimated equation of multiple regression. Figure 7.15 shows
the process of estimating the regression coefficients and the multiple re-
gression equation.
Figure 7.15  Process of estimating the multiple regression equation

Models with Two Quantitative Independent Variables x1 and x2
The model with two quantitative independent variables is the simplest
multiple regression model. It is a first order model and is written as:

y = b0 + b1x1 + b2x2  (7.20)

where b0 = y-intercept, the value of y when x1 = x2 = 0;
b1 = change in y for a 1-unit increase in x1 when x2 is held constant;
b2 = change in y for a 1-unit increase in x2 when x1 is held constant.

The graph of the first order model is shown in Figure 7.16. This graph
with two independent quantitative variables x1 and x2 plots a plane in a
three-dimensional space. The plane plots the value of y for every combin-
ation ( x1 , x 2 ). This corresponds to the points in the ( x1 , x 2 ) plane.
The first-order model with two quantitative variables x1 and x2 is based
on the assumption that there is no interaction between x1 and x2. This means
that the effect on the response y of a change in x1 (for a fixed value of x2)
is the same regardless of the value of x2, and the effect on y of a change in
x2 (for a fixed value of x1) is the same regardless of the value of x1.
For the simple regression analysis earlier in this chapter, we presented
both the manual calculations and the computer analysis of the
problem. Most of the concepts we discussed for simple regression also
apply to the multiple regression; however, the computations for multiple
regression are more involved and require the use of matrix algebra and
other mathematical concepts which are beyond the scope of this text.
Therefore, in this chapter, we have provided computer analysis of the
multiple linear regression models using EXCEL and MINITAB. This sec-
tion provides examples with computer instructions and analysis of the
computer results. The assumptions and the interpretation of the multiple
linear regression models are similar to that of the simple linear regression.
As we provide the analysis, we will point out the similarities and the dif-
ferences between the simple and multiple regression models.

Figure 7.16  A multiple regression model with two quantitative variables

Assumptions of Multiple Regression Model


As discussed earlier, the relationship of the response variable (y) to the
independent variables x1, x2, …, xk in multiple regression is assumed to be a
model of the form y = β0 + β1x1 + β2x2 + β3x3 + … + βkxk + ε, where
β0, β1, β2, …, βk are the regression coefficients and ε is the associated
error term. The multiple regression model is based on the following
assumptions about the error term ε.

1. The independence of errors assumption. Independence of errors means that
   the errors are independent of each other; that is, the error for one set
   of values of the independent variables is not related to the error for
   any other set of values. This assumption is critical when the data are
   collected over different time periods, because the errors in one time
   period may be correlated with those of another time period.
2. The normality assumption. The errors or residuals (εi), calculated using
   (yi − ŷi), are normally distributed. The normality assumption in
   regression is fairly robust against departures from normality. Unless the
   distribution of errors is extremely different from normal, the inferences
   about the regression parameters β0, β1, β2, …, βk are not affected
   seriously.
3. The error assumption. The error ε is a random variable with mean or
   expected value of zero, that is, E(ε) = 0. This implies that the mean
   value of the dependent variable y, for given values of the independent
   variables, is the expected or mean value of y, denoted by E(y), and the
   population regression model can be written as:

   E(y) = β0 + β1x1 + β2x2 + β3x3 + … + βkxk

4. Equality of variance assumption. The variance of the errors (εi), denoted
   by σ², is constant for all values of the independent variables
   x1, x2, …, xk. In case of serious departure from the equality of variance
   assumption, methods such as weighted least squares or data transformation
   may be used.

[Note: The terms error and residual have the same meaning and are used
interchangeably in this chapter.]

Computer Analysis of Multiple Regression


In this section we provide a computer analysis of multiple regression.
Due to the complexity involved in the computation, computer software is
always used to model and solve regression problems. We discuss the steps
using MINITAB and EXCEL.
Problem Description: The home heating cost is believed to be re-
lated to the average outside temperature, size of the house, and the
age of the heating furnace. A multiple regression model is to be fitted
to investigate the relationship between the heating cost and the three
predictors or independent variables. The data in Table 7.10 shows
the home heating cost (y), average temperature (x1), house size (x2)
in thousands of square feet, and the age of the furnace (x3) in years.
The home heating cost is the response variable and the other three
variables are predictors. The data for this problem (MINITAB file:
HEAT_COST.MTW; EXCEL file: HEAT_COST.xlsx) are listed in Table 7.10
below.

Table 7.10  Data for home heating cost


Row Avg Temp House Size Age of Furnace Heating Cost
1 37 3.0 6 210
2 30 4.0 9 365
3 37 2.5 4 182
4 61 1.0 3 65
5 66 2.0 5 82
6 39 3.5 4 205
7 15 4.1 6 360
8 8 3.8 9 295
9 22 2.9 10 235
10 56 2.2 4 125
11 55 2.0 3 78
12 40 3.8 4 162
13 21 4.5 12 405
14 40 5.0 6 325
15 61 1.8 5 82
16 21 4.2 7 277
17 63 2.3 2 99
18 41 3.0 10 195
19 28 4.2 7 240
20 31 3.0 4 144
21 33 3.2 4 265
22 31 4.2 11 355
23 36 2.8 3 175
24 56 1.2 4 57
25 35 2.3 8 196
26 36 3.6 6 215
27 9 4.3 8 380
28 10 4.0 11 300
29 21 3.0 9 240
30 51 2.5 7 130

Constructing Scatter Plots and Matrix Plots


We begin our analysis by constructing scatter plots and matrix plots of the
data. These plots provide useful information about the model. We first
construct scatterplots of the response (y) versus each of the independent

or predictor variables (Figure 7.17). If the scatterplots of y on the
independent variables appear to be linear enough, a multiple regression model can
be fitted. Based on the analysis of the scatter plots of y and each of the
independent variables, an appropriate model (for example, a first order
model) can be recommended to predict the home heating cost.
A first-order multiple regression model does not include any higher-order
terms (e.g., x²). An example of a first-order model with five independent
variables can be written as:

y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5  (7.21)

The multiple linear regression model is based on the assumption that


the relationship between the response and the independent variables is
linear. This relationship can be checked using a matrix plot. The mat-
rix plot is used to investigate the relationships between pairs of variables
by creating an array of scatterplots. MINITAB provides two options for
constructing the matrix plot: Matrix of Plots and Each Y versus each X. The
first of these plots is used to investigate the relationships among pairs of
variables when there are several independent variables involved. The other
plot (each y versus each x) produces separate plots of the response y and
each of the explanatory or independent variables.
Recall that in a simple regression, a scatter plot was constructed to
investigate the relationship between the response y and the predictor. A
matrix plot should be constructed when two or more independent vari-
ables are investigated. To investigate the relationships between the re-
sponse and each of the independent or explanatory variables before fitting
a multiple regression model, a matrix plot may prove to be very useful.
The plot allows graphically visualizing the possible relationship between
response and independent variables. The plot is also very helpful in inves-
tigating and verifying the linearity assumption of multiple regression and
to determine which explanatory variables are good predictors of y. For
this example, we have constructed matrix plots using MINITAB.
Figure 7.17 shows such a matrix plot (each y versus each x). In this
plot, the response variable y is plotted with each of the independent
variables. The plot shows scatterplots for heating cost (y) versus each of
the independent variables: average temperature, house size, and age of
Figure 7.17  Matrix plot of each y vs. each x

the furnace. An investigation of the plot shows an inverse relationship


between the heating cost and the average temperature (the heating cost
decreases as the temperature rises) and a positive relationship between
the heating cost and each of the other two variables: house size and age
of the furnace. The heating cost increases with the increasing house size
and also with the older furnace. None of these plots show bending (non-
linear or curvilinear) patterns between the response and the explanatory
variables. The presence of bending patterns in these plots would suggest
transformation of variables. The scatterplots in Figure 7.17 (also known
as side-by-side scatter plots) show a linear relationship between the response
and each of the explanatory variables, indicating that all three explanatory
variables could be good predictors of the home heating cost. In this case,
a multiple linear regression would be an adequate model for predicting
the heating cost.
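The plots of each y versus each x can be reproduced in Python with pandas and matplotlib. The sketch below uses the first six observations of Table 7.10 (the full data set continues the same pattern); the column names are chosen here for illustration.

import pandas as pd
import matplotlib.pyplot as plt

# First six observations from Table 7.10
df = pd.DataFrame({
    "AvgTemp":    [37, 30, 37, 61, 66, 39],
    "HouseSize":  [3.0, 4.0, 2.5, 1.0, 2.0, 3.5],
    "FurnaceAge": [6, 9, 4, 3, 5, 4],
    "HeatCost":   [210, 365, 182, 65, 82, 205],
})

fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharey=True)
for ax, col in zip(axes, ["AvgTemp", "HouseSize", "FurnaceAge"]):
    ax.scatter(df[col], df["HeatCost"])   # heating cost vs. each predictor
    ax.set_xlabel(col)
axes[0].set_ylabel("HeatCost")
plt.tight_layout(); plt.show()

# pd.plotting.scatter_matrix(df) produces the full "matrix of plots" version.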

Matrix of Plots: Simple


Another variation of the matrix plot is known as “matrix of plots” in
MINITAB and is shown in Figure 7.18. This plot provides scatterplots
that are helpful in visualizing not only the relationship of the response
variable with each of the independent variables but also provides scat-
terplots that are useful in assessing the interaction effects between the
variables. This plot can be used when a more detailed model beyond a
first-order model is of interest. Note that the first-order model is the one
that contains only the first-order terms, with no square or interaction
terms, and is written as y = b0 + b1x1 + b2x2 + … + bkxk
The matrix plot in Figure 7.18 is a table of scatterplots with each cell
showing a scatterplot of the variable that is labeled for the column versus
the variable labeled for the row. The cell in the first row and first column
displays the scatterplot of heating cost (y) versus average temperature (x1).
The plot in the second row and first column is the scatterplot of heat-
ing cost (y) and the house size (x2) and the plot in the third row and the
first column shows the scatterplot of heating cost (y) and the age of the
furnace (x3).
The second column and the second row of the matrix plot shows a
scatterplot displaying the relationship between average temperature (x1)
Figure 7.18  Matrix plot

and the house size (x2). The scatterplots showing the relationship between
the pairs of independent variables are obtained from columns 2 and 3 of
the matrix plot. The matrix plot is helpful in visualizing the interaction
relationships. For fitting the first order model, a plot of y versus each x is
adequate.
The matrix plots in Figures 7.17 and 7.18 show a negative association
or relationship between the heating cost (y) and the average temperature
(x1) and a positive association or relationship between the heating cost (y)
and the other two explanatory variables: house size (x2) and the age of the
furnace (x3). All these relationships are linear, indicating that all three
explanatory variables can be used to build a multiple regression model.
Constructing the matrix plot and investigating the relationships between
the variables can be very helpful in building a correct regression model.

Multiple Linear Regression Model


Since a first order model can be used adequately to predict the home heat-
ing cost, we will fit a multiple linear regression model of the form

y = b0 + b1x1 + b2x2 + b3x3

where,

y = Home heating cost (dollars), x1 = Average temperature (in °F)


x2 = Size of the house (in thousands of square feet), x3 = Age of the furnace
(in years)

Table 7.10 and data file HEAT_COST.MTW shows the data for
this problem. We used MINITAB to run the regression model for this
problem.
Table 7.11 shows the results of running the multiple regression problem
using MINITAB. In this table, we have marked some of the quantities (e.g.,
b0, b1, sb0, sb1) for clarity and explanation; these labels are not part of
the computer output. The regression computer output has two parts:
Regression Analysis and Analysis of Variance.
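The same model can also be fitted in Python with statsmodels. A minimal sketch follows, using the first six rows of Table 7.10; fitting all 30 rows reproduces the MINITAB output analyzed below.

import pandas as pd
import statsmodels.formula.api as smf

# First six observations from Table 7.10; extend with the remaining rows
df = pd.DataFrame({
    "AvgTemp":    [37, 30, 37, 61, 66, 39],
    "HouseSize":  [3.0, 4.0, 2.5, 1.0, 2.0, 3.5],
    "FurnaceAge": [6, 9, 4, 3, 5, 4],
    "HeatCost":   [210, 365, 182, 65, 82, 205],
})

fit = smf.ols("HeatCost ~ AvgTemp + HouseSize + FurnaceAge", data=df).fit()
print(fit.params)     # b0, b1, b2, b3
print(fit.summary())  # coefficient table and ANOVA quantities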

Table 7.11  MINITAB regression analysis results

The Regression Equation


Refer to the “Regression Analysis” part of Table 7.11 for analysis. Since
there are three independent or explanatory variables, the regression equa-
tion is of the form:

y = b0 + b1x1 + b2x2 + b3x3

The regression equation from the computer output is

Heating Cost = 44.4 − 1.65 Avg. Temp + 57.5 House Size
               + 7.91 Age of Furnace  (7.22)

or

ŷ = 44.4 − 1.65x1 + 57.5x2 + 7.91x3  (7.23)

where y is the response variable (Heating Cost); x1, x2, x3 are the
independent variables as described above; and the regression coefficients
b0, b1, b2, b3 are stored under the column Coef. In the regression equation
these coefficients appear in rounded form.
The regression equation which can be stated in the form of equation
(7.22) or (7.23) is the estimated regression equation relating the heating
cost to all the three independent variables.

Interpreting the Regression Equation


Equation (7.22) or (7.23) can be interpreted in the following way:

• b1 = −1.65 means that for each unit increase in the average tem-
perature (x1), the heating cost y (in dollars) can be predicted to go
down by 1.65 (or, $1.65) when the house size (x2), and the age of
the furnace (x3) are held constant.
• b2 = +57.5 means that for each unit increase in the house size (x2
in thousands of square feet), the heating cost, y (in dollars) can be
predicted to go up by 57.5 when the average temperature (x1) and
the age of the furnace (x3) are held constant.
• b3 = + 7.91 means that for each unit increase in the age of the furnace
(x3 in years), the heating cost y can be predicted to go up by $7.91 when
the average temperature (x1) and the house size (x2) are held constant.

Standard Error of the Estimate (s) and Its Meaning


The standard error of the estimate or the standard deviation of the model
s is a measure of scatter or the measure of variation of the points around
the regression hyperplane. A small value of s is desirable for a good regres-
sion model. The estimation of y is more accurate for smaller values of s.
The value of the standard error of estimate is reported in the regression
analysis (see Table 7.11). This value is measured in terms of the response
variable (y). For our example, the standard error of the estimate,

s = 37.32 dollars

The standard error of the estimate is used to check the utility of the
model and to provide a measure of reliability of the predictions made from
the model. One interpretation of s is that the interval ±2s provides an
approximation to the accuracy with which the regression model will predict
the future value of the response y for given values of the independent
variables. Thus, for our example, we can expect the model to provide
predictions of heating cost (y) to be within

±2s = ±2(37.32) = ±74.64 dollars.



The Coefficient of Multiple Determination (r2)


The coefficient of multiple determination is often used to check the ad-
equacy of the regression model. The value of r2 lies between 0 and 1, or
0 percent and 100 percent, that is, 0 ≤ r2 ≤ 1. It indicates the fraction
of total variation of the dependent variable y that is explained by the in-
dependent variables or predictors. Usually, the closer the value of r2 is to
1 (or 100 percent), the stronger the model. However, one should be careful in
drawing conclusions based solely on the value of r2. A large value of r2
does not necessarily mean that the model provides a good fit to the data.
In multiple regression, adding a new variable to the model always increases
the value of r2, even if the added variable is not statistically significant.
Thus, adding a new variable will increase r2, indicating a stronger model,
but may lead to poor predictions of new values. The value
of r2 can be calculated using the expression

r2 = 1 − SSE/SST = 1 − 36,207/301,985 = 0.88

or, equivalently,

r2 = SSR/SST = 265,777/301,985 = 0.88

In the above equations, SSE is the sum of square of errors (unex-


plained variation or error), SST is the total sum of squares, and SSR is
the sum of squares due to regression (explained variation). These values
can be read from the “Analysis of Variance” part of Table 7.11. From this
table, the value of r2 is calculated and reported in the “Regression
Analysis” part of Table 7.11. For our example, the coefficient of multiple
determination r2 (reported as R-Sq) is

r2 = 88.0%

This means that 88.0 percent of the variability in y is explained by the


three independent variables used in the model. Note that r2 = 0 implies
a complete lack of fit of the model to the data; whereas, r2 = 1 implies a
perfect fit.

The value of r2 = 88.0% for our example implies that using the three
independent variables; average temperature, size of the house, and the age
of the furnace in the model, 88.0 percent of the total variation in heating
cost (y) can be explained. The statistic r2 tells how well the model fits the
data, and thus, provides the overall predictive usefulness of the model.
The value of adjusted R2 is also used in comparing two regression
models that have the same response variable but different numbers of
independent variables or predictors.
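The r2 arithmetic above can be verified directly from the sums of squares reported in the ANOVA output; the sketch below repeats the calculation, including the degrees-of-freedom adjustment used for adjusted R2.

sse, sst = 36_207, 301_985    # SSE and SST from Table 7.11
n, k = 30, 3                  # observations and predictors

r2 = 1 - sse / sst                                   # about 0.88
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # adjusted for df
print(round(r2, 3), round(r2_adj, 3))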

Hypothesis Tests in Multiple Regression

In multiple regression, two types of hypothesis tests are conducted to


measure the model adequacy. These are

1. Hypothesis Test for the overall usefulness, or significance of regression


2. Hypothesis Tests on the individual regression coefficients

The test for overall significance of regression can be conducted using


the information in the “Analysis of Variance” part of Table 7.11. The in-
formation contained in the “Regression Analysis” part of this table is used
to conduct the tests on the individual regression coefficients using the
“T ” or “p” column. These tests are explained below.

Testing the Overall Significance of Regression

Recall that in simple regression analysis, we conducted the test for
significance using a t-test and an F-test. Both of these tests in simple
regression provide the same conclusion: if the null hypothesis is rejected,
we conclude that the slope is not zero, that is, β1 ≠ 0. In multiple
regression, the t-test and the F-test have somewhat different
interpretations. These tests have the following objectives:
The F-test in a multiple regression is used to test the overall signifi-
cance of the regression. This test is conducted to determine whether a
significant relationship exists between the response variable y and the set
of independent variables, or predictors x1, x2, …,xn.

If the conclusion of the F-test indicates that the regression is
significant overall, then a separate t-test is conducted for each of the
independent variables to determine whether each of them is significant.
Both the F-test and the t-test are explained below.

F-Test
The null and alternate hypotheses for the multiple regression model
y = b0 + b1x1 + b2x2 + … + bkxk are stated as

H0: β1 = β2 = … = βk = 0 (regression is not significant)
H1: at least one of the coefficients is nonzero  (7.24)

If the null hypothesis H0 is rejected, we conclude that at least one


of the independent variables: x1 , x 2 ,.., xn contributes significantly to the
prediction of the response variable y. If H0 is not rejected, then none of
the independent variables contributes to the prediction of y. The test sta-
tistic for testing this hypothesis uses an F-statistic and is given by

F = MSR/MSE  (7.25)

where MSR = mean squares due to regression, or explained variability, and


MSE = mean square error, or unexplained variability. In equation (7.25),
the larger the explained variation of the total variability, the larger is the
F-statistic. The values of MSR, MSE, and the F statistic are calculated
in the “Analysis of Variance” table of the multiple regression computer
output (see Table 7.12 below).
The critical value for the test is given by F(k, n−(k+1), α), where k is the
number of independent variables, n is the number of observations in the
model, and α is the level of significance. Note that k and n−(k+1) are the
degrees of freedom associated with MSR and MSE, respectively. The null
hypothesis is rejected if F > F(k, n−(k+1), α), where F is the calculated F
value or the test statistic value in the Analysis of Variance table.

Table 7.12  Analysis of variance table

Test the Overall Significance of Regression for the Example Problem at a
5 Percent Level of Significance
Step 1: State the Null and Alternate Hypotheses

For the overall significance of regression, the null and alternate hypoth-
eses are:

H0: β1 = β2 = … = βk = 0 (regression is not significant)
H1: at least one of the coefficients is nonzero  (7.26)

Step 2: Specify the Test Statistic to Test the Hypothesis

The test statistic is given by

F = MSR/MSE  (7.27)

The value of F statistic is obtained from the “Analysis of Variance”


(ANOVA) table of the computer output. We have reproduced the An-
alysis of Variance part of the table (Table 7.12). In this table the labels
k, [n − (k + 1)], SSR, SSE, etc. are added for explanation purposes; they
are not part of the computer results.
In the ANOVA table below, the first column refers to the sources
of variation, DF = the degrees of freedom, SS = the sum of squares,
MS = mean squares, F = the F statistic, and p is the probability or p-value
associated with the calculated F statistic.

The degrees of freedom (DF) for Regression and Error are k and n −
(k + 1) respectively where, k is the number of independent variables (k = 3
for our example) and n is the number of observations (n = 30). Also, the
total sum of squares (SST) is partitioned into two parts: sum of squares
due to regression (SSR) and the sum of squares due to error (SSE) having
the following relationship.

SST = SSR + SSE

We have labeled SST, SSR, and SSE values in Table 7.12. The mean
square due to regression (MSR) and the mean squares due to error (MSE)
are calculated using the following relationships:

MSR = SSR/k and MSE = SSE/(n – k − 1)

The F-test statistic is calculated as F = MSR/MSE.

Step 3: Determine the Value of the Test Statistic

The test statistic value or the F statistic from the ANOVA table (see
Table 7.12) is

F = 63.62

Step 4: Specify the Critical Value

The critical value is given by

F(k, n−(k+1), α) = F(3, 26, 0.05) = 2.98 (from the F-table)

Step 5: Specify the Decision Rule

Reject Ho if F-statistic > FCritical

Step 6: Reach a Decision and State Your Conclusion

The calculated F statistic value is 63.62. Since F = 63.62 > Fcritical = 2.98,
we reject the null hypothesis stated in equation (7.26) and conclude that

the regression is significant overall. This indicates that there exists a sig-
nificant relationship between the dependent and independent variables.

Alternate Method of Testing the above Hypothesis

The hypothesis stated using equation (7.26) can also be tested using the
p-value approach. The decision rule using the p-value approach is given by

If p ≥ α, do not reject H0
If p < α, reject H0

From Table 7.12, the calculated p value is 0.000 (see the P column). Since
p = 0.000 < α = 0.05, we reject the null hypothesis H0 and conclude that
the regression is significant overall.
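Both the critical value and the p-value for this F-test can be obtained with scipy; the sketch below uses the degrees of freedom and F statistic from Table 7.12.

from scipy import stats

k, n = 3, 30
f_stat = 63.62                                 # F = MSR/MSE from Table 7.12
f_crit = stats.f.ppf(1 - 0.05, k, n - k - 1)   # upper 5% point of F(3, 26)
p_val = stats.f.sf(f_stat, k, n - k - 1)       # upper-tail p-value
print(f_crit, p_val, f_stat > f_crit)          # True: reject H0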

Hypothesis Tests on Individual Regression Coefficients

t-tests

If the F-test shows that the regression is significant, a t-test on individual


regression coefficients is conducted to determine whether a particular in-
dependent variable is significant. We are often interested in determining
which of the independent variables contributes to the prediction of the y.
The hypothesis test described here can be used for this purpose.
To determine which of the independent variables contributes to the
prediction of the dependent variable y, the following hypotheses test can
be conducted:

H0: βj = 0
H1: βj ≠ 0  (7.28)

This hypothesis tests an individual regression coefficient. If the null


hypothesis H0 is rejected, it indicates that the independent variable xj is
significant and contributes to the prediction of y. On the other hand, if
the null hypothesis H0 is not rejected, then xj is not a significant variable
and can be deleted from the model or further investigated. The test is
repeated for each of the independent variables in the model.

Table 7.13  MINITAB regression analysis results

This hypothesis test also helps to determine if the model can be made
more effective by deleting certain independent variables, or by adding
extra variables. The information to conduct the hypothesis test for each of
the independent variables is contained in the “Regression Analysis” part
of the computer output which is reproduced in Table 7.13 below. The
columns labeled T and p are used to test the hypotheses. Since there are
three independent variables, we will test to determine whether each of the
three variables is a significant variable; that is, if each of the independent
variables contributes in the prediction of y. The hypothesis to be tested
and the test procedure are explained below. We will use a significance level
of α = 0.05 for testing each of the independent variables.

Test the Hypothesis That Each of the Three Independent Variables Is
Significant at a 5 Percent Level of Significance
Test for the significance of x1 or Average Temperature
Step 1: State the null and alternate hypotheses. The null and alternate
hypotheses are:

H0: β1 = 0 (x1 is not significant; x1 does not contribute to the prediction of y)

H1: β1 ≠ 0 (x1 is significant; x1 does contribute to the prediction of y)  (7.29)

Step 2: Specify the test statistic to test the hypothesis.

The test statistic is given by

t = b1/sb1  (7.30)

where b1 is the estimate of the slope β1 and sb1 is the estimated standard
deviation of b1.
Step 3: Determine the value of the test statistic
The values b1, sb1 and t are all reported in the Regression Analysis part of
Table 7.13. From this table, these values for the variable x1 or the average
temperature (Avg. Temp.) are

b1 = −1.6457, sb1 = 0.6967

and the test statistic value is

t = b1/sb1 = −1.6457/0.6967 = −2.36

This value is reported under the T column.


Step 4: Specify the critical value

The critical values for the test are given by t(α/2, n−(k+1)), the t-value
from the t-table for n−(k+1) degrees of freedom and α/2, where n is the
number of observations (n = 30), k is the number of independent variables
(k = 3), and α is the level of significance (0.05 in this case). Thus,

t(α/2, n−(k+1)) = t(0.025, 30−(3+1)) = t(0.025, 26) = 2.056 (from the t-table)

Step 5: Specify the decision rule: The decision rule for the test:

Reject H0 if t > +2.056


or, if t < −2.056

Step 6: Reach a decision and state your conclusion


The test statistic value (T value) for the variable Avg. Temp (x1) from
Table 7.13 is −2.36. Since t = −2.36 < tcritical = −2.056,

we reject the null hypothesis H0 (stated in equation 7.29) and conclude


that the variable average temperature (x1) is a significant variable and does
contribute in the prediction of y.
The significance of other independent variables can be tested in the
same way. The test statistic or the t values for all the independent vari-
ables are reported in Table 7.13 under T column. The critical values for
testing each independent variable are the same as in the test for the first
independent variable above. Thus, the critical values for testing the other
independent variables are

t(α/2, n−(k+1)) = t(0.025, 30−(3+1)) = t(0.025, 26) = ±2.056

Alternate Way of Testing the above Hypothesis


The hypothesis stated using equation (7.29) can also be tested using the
p-value approach. The decision rule for the p-value approach is given by

If p ≥ α, do not reject H0
If p < α, reject H0  (7.31)

From Table 7.14, the p-value for the variable average temperature (Avg.
Temp., x1) is 0.026. Since, p = 0.026 < α = 0.05, we reject H0 and con-
clude that the variable average temperature (x1) is a significant variable.
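The t statistic, critical value, and p-value for an individual coefficient can likewise be verified with scipy, using the Avg. Temp values reported in Table 7.13.

from scipy import stats

b1, sb1 = -1.6457, 0.6967     # coefficient and standard error (Table 7.13)
n, k = 30, 3
t = b1 / sb1                                    # about -2.36
t_crit = stats.t.ppf(1 - 0.05 / 2, n - k - 1)   # 2.056 for 26 df
p_val = 2 * stats.t.sf(abs(t), n - k - 1)       # two-sided p, about 0.026
print(t, t_crit, p_val)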

Test for the other independent variables

The other two independent variables are
x2 = Size of the house (or House Size)
x3 = Age of the furnace

Table 7.14  Summary table

Independent Variable    p-value from Table 7.13   Compare p to α   Decision    Significant?
Avg. Temp. (x1)         0.026                     p < α            Reject H0   Yes
House Size (x2)         0.000                     p < α            Reject H0   Yes
Age of Furnace (x3)     0.024                     p < α            Reject H0   Yes

It is usually more convenient to test the hypothesis using the p-value


approach. Table 7.14 provides a summary of the tests using the p-value
approach for all the three independent variables. The significance level α
is 0.05 for all the tests. The hypothesis can be stated as:

H0: βj = 0    (xj is not a significant variable)

H1: βj ≠ 0    (xj is a significant variable)

where, j = 1,2,…3 for our example.


From Table 7.14 it can be seen that all the independent variables are
significant. This means that all three independent variables contribute
to predicting the response variable y, the heating cost.
Note: The above method of conducting t-tests on each β parameter
in a model is not the best way to determine whether the overall model is
providing information for the prediction of y. In this method, we need to
conduct a t-test for each independent variable to determine whether the
variable is significant. Conducting a series of t-tests increases the likeli-
hood of making an error in deciding which variable to retain in the model
and which one to exclude. For example, suppose we are fitting a first order
model like the one in this example with 10 independent variables and
decided to conduct t-tests on all 10 of the β’s. Suppose each test is con-
ducted at α = 0.05. This means that there is a 5 percent chance of making
a wrong or incorrect decision (Type I error − probability of rejecting a true
null hypothesis) and there is a 95 percent chance of making a right deci-
sion. If 10 tests are conducted, the probability that all 10 decisions are
correct drops to approximately 60 percent [(0.95)^10 ≈ 0.599], assuming all
the tests are independent of each other. This means that even if all the β
parameters (except β0) are equal to 0, approximately 40 percent of the time

the null hypothesis will be rejected incorrectly at least once leading to the
conclusion that β differs from 0. Thus, in the multiple regression models
where a large number of independent variables are involved and a series
of t- tests are conducted, there is a chance of including a large number of
insignificant variables and excluding some useful ones from the model. In
order to assess the utility of the multiple regression models, we need to
conduct a test that will include all the β parameters simultaneously. Such
a test would test the overall significance of the multiple regression model.
The other useful measure of the utility of the model would be to find
some statistical quantity such as R2 that measures how well the model fits
the data.
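The probability arithmetic in this note is easy to check:

p_all_correct = 0.95 ** 10   # ten independent tests, each at alpha = 0.05
print(p_all_correct)         # about 0.599
print(1 - p_all_correct)     # about 0.40: chance of at least one Type I error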
A Note on Checking the Utility of a Multiple Regression Model
(Checking the Model Adequacy)

Step 1. To test the overall adequacy of a regression model, first test


the following null and alternate hypotheses,

H 0 : β1 = β 2 = … = β k = 0 (No relationship)

H1: at least one of the coefficients is nonzero

A) If the null hypothesis is rejected, there is evidence that not all the β
parameters are zero and the model is adequate. Go to step 2.
B) If the null hypothesis is not rejected then the overall regression
model is not adequate. In this case, fit another model with more
independent variables, or consider higher-order terms.
Step 2. If the overall model is adequate, conduct t-tests on the β
parameters of interest, or the parameters considered most important
in the model. Avoid conducting a series of t-tests on all the β
parameters; doing so increases the probability of a Type I error, α.

Multicollinearity and Autocorrelation in Multiple Regression
Multicollinearity is a measure of correlation among the predictors in a
regression model. Multicollinearity exists when two or more independ-
ent variables in the regression model are correlated with each other.

In practice, it is not unusual to see correlations among the independent


variables. However, if serious multicollinearity is present, it may cause
problems by increasing the variance of the regression coefficients and
making them unstable and difficult to interpret. Also, highly correlated
independent variables increase the likelihood of rounding errors in the
calculation of β estimates and standard errors. In the presence of multi-
collinearity, the regression results may be misleading.

Effects of Multicollinearity
A) Consider a regression model where the production cost (y) is related
to three independent variables: machine hours (x1), material cost (x2),
and labor hours (x3):

y = β0 + β1x1 + β2x2 + β3x3

MINITAB computer output for this model is shown in


Table 7.15. If we perform t-tests for β1, β2, and β3, we find that all
three independent variables are non-significant at α = 0.05, while the
F-test for H0: β1 = β2 = β3 = 0 is significant (see the p-value in the
Analysis of Variance results shown in Table 7.15). The results appear
contradictory, but in fact they are not. The tests on individual βi
parameters indicate that the contribution of one variable,
say x1 = machine hours is not significant after the effects of x2 = ma-
terial cost, and x3 = labor hours have been accounted for. However,
the result of the F-test indicates that at least one of the three variables
is significant, or is making a contribution to the prediction of re-
sponse y. It is also possible that at least two or all the three variables
are contributing to the prediction of y. Here, the contribution of one
variable is overlapping with that of the other variable or variables.
This is because of the multicollinearity effect.
B) Multicollinearity may also have an effect on the signs of the parameter
estimates. For example, refer to the regression equation in Table 7.15.
In this model, the production cost (y) is related to the three explana-
tory variables: machine hours (x1), material cost (x2), and labor
hours (x3). If we check the effect of the variable machine hours (x1),

Table 7.15  Regression Analysis: PROD COST vs. MACHINE HOURS, MATERIAL COST, and LABOR HOURS

the regression model indicates that for each unit increase in machine


hour, the production cost (y) decreases when the other two factors are
held constant. However, we would expect the production cost (y) to
increase as more machine hours are used. This may be due to the pres-
ence of multicollinearity. Because of the presence of multicollinearity,
the value of a β parameter may have the opposite sign from what is
expected.

One way of avoiding multicollinearity in regression is to conduct


design of experiments and select the levels of factors in a way that the
levels are uncorrelated. This may not be possible in many situations. It is
not unusual to have correlated independent variables; therefore, it is im-
portant to detect the presence of multicollinearity to make the necessary
modifications in the regression analysis.

Detecting Multicollinearity
Several methods are used to detect the presence of multicollinearity in
regression. We will discuss two of them.

1. Detecting Multicollinearity using Variance Inflation Factor (VIF):


MINITAB provides an option to calculate Variance inflation factors

(VIF) for each predictor variable. The VIF measures how much the variance
of an estimated regression coefficient is inflated compared to when the
predictor variables are not linearly related. Use the guidelines in
Table 7.16 to interpret the VIF.

Table 7.16  Detecting correlation using VIF values


Values of VIF Predictors are…
VIF = 1 Not correlated
1 < VIF < 5 Moderately correlated
VIF = 5 to 10 or greater Highly correlated

VIF values greater than 10 may indicate that multicollinearity is un-


duly influencing your regression results. In this case, you may want to
reduce multicollinearity by removing unimportant independent variables
from your model.
Refer to the table above for the values of VIF for the production cost
example. The VIF value for each predictor is greater than 10, indicating
the presence of multicollinearity. The VIF values indicate that
the predictors are highly correlated. The VIF for each of the independent
variables is calculated automatically when a multiple regression model is
run using MINITAB.
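Outside MINITAB, the VIFs can be computed with statsmodels. In the sketch below the three predictor columns are hypothetical stand-ins for machine hours, material cost, and labor hours, constructed to be nearly collinear.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical, nearly collinear predictors
X = np.column_stack([
    [10, 12, 15, 18, 20, 25],         # machine hours
    [105, 122, 150, 185, 198, 252],   # material cost
    [21, 25, 29, 37, 41, 49],         # labor hours
])
Xc = sm.add_constant(X)               # VIFs are computed with an intercept
for j in range(1, Xc.shape[1]):       # skip the constant column
    print(variance_inflation_factor(Xc, j))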

2. Detecting Multicollinearity by Calculating the Coefficient of
   Correlation, r

A simple way of determining multicollinearity is to calculate the coef-


ficient of correlation, r, between each pair of predictor or independent
variables in the model. The degree of multicollinearity depends on the
magnitude of the value of r. Use Table 7.17 as a guide to determine the
multicollinearity.
Table 7.18 shows the correlation coefficient r between each pair of
predictors for the production cost example. These values of r show that
the variables are highly correlated. The correlation coefficient matrix in
Table 7.18 was calculated using MINITAB.

Table 7.17  Determining multicollinearity using correlation coefficient, r

Correlation Coefficient, r    Multicollinearity
r ≥ 0.8                       Extreme multicollinearity
0.2 ≤ r < 0.8                 Moderate multicollinearity
r < 0.2                       Low multicollinearity

Table 7.18  Correlation coefficient between pairs of variables

Correlations: Machine Hours, Material Cost, Labor Hours

                Machine Hours   Material Cost
Material Cost       0.964
Labor Hours         0.953           0.917

Cell Contents: Pearson correlation
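The same pairwise correlation matrix can be produced in pandas; the predictor values below are the hypothetical stand-ins used in the VIF sketch above.

import pandas as pd

df = pd.DataFrame({
    "MachineHours": [10, 12, 15, 18, 20, 25],
    "MaterialCost": [105, 122, 150, 185, 198, 252],
    "LaborHours":   [21, 25, 29, 37, 41, 49],
})
print(df.corr())   # Pearson r for each pair of predictors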

Summary of the Key Features of Multiple Regression Model
The sections above extended the concept of simple linear regression and
provided an in-depth analysis of the multiple regression model—one of the
most widely used prediction techniques in data analysis and decision making.
The multiple regression model explores the relationship between a response
variable and two or more independent variables or predictors. The sections
provided computer analysis and interpretation of multiple regression models.
Several examples of matrix plots were presented; these plots are helpful in
the initial stages of model building. Using the computer results, the
following key features of the multiple regression model were explained:
(a) the multiple regression equation and its interpretation; (b) the
standard error of the estimate—a measure used to check the utility of the
model and to provide a measure of reliability of the predictions made from
the model; (c) the coefficient of multiple determination r2, which measures
the variability in the response y explained by the independent variables
used in the model. Besides these, we discussed
the hypothesis tests using the computer results. Step-wise instructions
were provided to conduct the F-test and t-tests. The overall significance
of the regression model is tested using the F-test. The t- test is conducted

on an individual predictor or independent variable to determine the
significance of that variable. The effects of multicollinearity, and its
detection using the computer, were discussed with examples.

Model Building and Computer Analysis


Introduction to Model Building

In the previous sections, we discussed simple and multiple regression


where we provided detailed analysis of these techniques including the
analysis and interpretation of computer results. In both the simple and
multiple regression models, the relationship among the variables is linear.
In this section we will provide an introduction to model building and
nonlinear regression models. By model building, we mean selecting the
model that will provide a good fit to a set of data, and the one that will
provide a good estimate of the response or the dependent variable, y that
is related to independent variables or factors x1, x2, …xn. It is important
to choose the right model for the data.
In regression analysis, the dependent or the response variable is usu-
ally quantitative. The independent variables may be either quantitative or
qualitative. The quantitative variable is one that assumes numerical values
or can be expressed as numbers. The qualitative variable may not assume
numerical values.
In experimental situations we often encounter both the quantitative
and qualitative variables. In the model building examples, we will show
later how to deal with qualitative independent variables.

Model with a Single Quantitative Independent Variable

The models relating the dependent variable y to a single quantitative in-


dependent variable x are derived from the polynomial of the form:

y = b0 + b1x + b2x² + b3x³ + … + bnxⁿ  (7.32)

In the above equation, n is an integer and b0, b1,...,bn are unknown par-
ameters that must be estimated.

A) First-order Model
The first-order model in a single variable is given by:

y = b0 + b1x

or, with n independent variables,

y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn (7.33)

where b0 = y-intercept and the bi = regression coefficients


B) Second-order Model
A second order model can be written as

y = b0 + b1x + b2x²  (7.34)

Equation (7.34) is a parabola in which:

b0 = y-intercept; b1 = shift parameter (a change in the value of b1 shifts
the parabola to the left or right; increasing b1 shifts the parabola to the
left); b2 = rate of curvature

The second order model is a parabola. If b2 > 0 the parabola opens


up; if b2 < 0, the parabola opens down. The two cases are shown in
Figure 7.19.

Figure 7.19  The second order model

C) Third-order Model
A third order model can be written as:
y = b0 + b1x + b2x² + b3x³  (7.35)

Here b0 is the y-intercept, and b3 controls the rate of reversal of the
curvature of the curve.
A second-order model has no reversal in curvature; as x increases, the
y value produces a single trough or peak. A third-order model produces
one reversal in curvature, with one peak and one trough. Reversals in
curvature are not very common but can be modeled using a third- or
higher-order polynomial. The graph of an nth-order polynomial contains
at most (n − 1) peaks and troughs. Figure 7.20 shows the graph of a
third-order polynomial. In real-world situations, the second-order model
is perhaps the most useful.

Figure 7.20  The third-order model

Example: A Quadratic (Second-Order) Model

The life of an electronic component is believed to be related to the tem-


perature in the operating environment. Table 7.19 shows 25 observations
(Data File: COMP_LIFE) that show the life of the components (in hours)
and the corresponding operating temperature (in °F). We would like to fit
a model to predict the life of the component. In this case, the life of the
component is the dependent variable (y) and the operating temperature is
the independent variable (x).
Figure 7.21 shows the scatter plot of the data in Table 7.19. From
the scatter plot, we can see that the data can be well approximated by a
quadratic model.
We used MINITAB and EXCEL to fit a second order model to the
data. The analysis of the computer results is presented below.
Table 7.19  Life of electronic components


Obs. 1 2 3 4 5 6 7 8 9 10
X(Temp.) 99 101 100 113 72 93 94 89 95 111
Y (Life) 141.0 136.7 145.7 194.3 101.5 121.4 123.5 118.4 137.0 183.2
Obs. 11 12 13 14 15 16 17 18 19 20
X(Temp.) 72 76 105 84 102 103 92 81 73 97
Y (Life) 106.6 97.5 156.9 111.2 158.2 155.1 119.7 105.9 101.3 140.1
Obs. 21 22 23 24 25

X(Temp.) 105 90 94 79 91

Y (Life) 148.6 116.4 121.5 108.9 110.1

Figure 7.21  Scatter Plot of Life (y) vs. Operating Temp. (x)

Second-Order Model Using MINITAB

A second order model was fitted using MINITAB. The regression output
of the model is shown in Table 7.20.
A quadratic model in MINITAB can also be run using the fitted line
plot option. The results of the quadratic model using this option provide
a fitted line plot (shown in Figure 7.22).
While running the quadratic model, the data values and residuals can
be stored and plots of the residuals created.
Table 7.20  Computer results of second order model

Figure 7.22  Regression Plot with Equation

Residual Plots for the above Example Using MINITAB

Figure 7.23 shows the residual plots for this quadratic model. The residual
plots are useful in checking the assumptions of the model and the model
adequacy.
The analysis of residual plots for this model is similar to that of simple
and multiple regression models. The investigation of the plots shows that
the normality assumption is met. The plot of residuals versus the fitted
values shows a random pattern, indicating that the quadratic model fitted
to the data is adequate.

Figure 7.23  Residual plots for the quadratic model example

Running a Second-Order Model Using EXCEL

Unlike MINITAB, EXCEL does not provide an option to run a quadratic


model of the form

y = b0 + b1x + b2x²

However, we can run a quadratic regression model by calculating the
x² column from the x column in the data file. The EXCEL computer
results are shown in Table 7.21.

Analysis of Computer Results of Tables 7.20 and 7.21

Refer to MINITAB output in Table 7.20 or the EXCEL computer output


in Table 7.21. The prediction equation from this table can be written
using the coefficients column. The equation is

ŷ = 433 − 8.89x + 0.0598x²

In the EXCEL output, the prediction equation can be read from the
“Coefficients” column.
The r² value is 95.9 percent, which is an indication of a strong model.
It indicates that 95.9 percent of the variation in y can be explained by the
variation in x, and 4.1 percent of the variation is unexplained or due to
error. The equation can be used to predict the life of the components at a
specified temperature.
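The second-order fit can be verified outside MINITAB or EXCEL. The
following is a minimal sketch in Python using numpy's least-squares
polynomial fit on the Table 7.19 data; the tool choice and variable names
are ours, not part of the text's software.

import numpy as np

# Table 7.19: operating temperature (deg F) and component life (hours)
temp = np.array([99, 101, 100, 113, 72, 93, 94, 89, 95, 111,
                 72, 76, 105, 84, 102, 103, 92, 81, 73, 97,
                 105, 90, 94, 79, 91], dtype=float)
life = np.array([141.0, 136.7, 145.7, 194.3, 101.5, 121.4, 123.5, 118.4,
                 137.0, 183.2, 106.6, 97.5, 156.9, 111.2, 158.2, 155.1,
                 119.7, 105.9, 101.3, 140.1, 148.6, 116.4, 121.5, 108.9,
                 110.1])

# Fit the second-order model y = b0 + b1*x + b2*x^2 by least squares
b2, b1, b0 = np.polyfit(temp, life, deg=2)   # returns highest power first
print(b0, b1, b2)   # should agree with the equation above: about 433, -8.89, 0.0598

# Predict the life of a component at a specified temperature, e.g., 95 deg F
x = 95.0
print(b0 + b1 * x + b2 * x ** 2)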
We can also test a hypothesis to determine whether the second-order term
in our model in fact contributes to the prediction of y. The null and
alternate hypotheses for this test can be expressed as

H0: β2 = 0
Ha: β2 ≠ 0  (7.36)
Table 7.21  EXCEL computer output for the quadratic model
Summary Output
Regression Statistics
Multiple R 0.97947
R Square 0.95936
Adjusted R Square 0.95567
Standard Error 5.37620
Observations 25

ANOVA
df SS MS F Significance F
Regression 2 15,011.7720 7,505.8860 259.6872 0.0000
Residual 22 635.8784 28.9036
Total 24 15,647.6504

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 433.0063 61.8367 7.0024 0.0000 304.7648 561.2478 304.7648 561.2478
Temp. (x) −8.8908 1.3743 −6.4691 0.0000 −11.7410 −6.0405 −11.7410 −6.0405
x**2 0.0598 0.0075 7.9251 0.0000 0.0442 0.0755 0.0442 0.0755
The test statistic for this test is given by

t = b2 / sb2

The test statistic value is calculated by the computer and is shown in
Table 7.21. In this table, the t value is reported in the x**2 row under the
t Stat column. This value is 7.93. Thus,

t = b2 / sb2 = 7.93

The critical value for the test is

t(n − k − 1, α/2) = t(22, 0.025) = 2.074

[Note: t(n − k − 1, α/2) is the t-value from the t-table for (n − k − 1) degrees
of freedom, where n is the number of observations and k is the number of
independent variables.]
For our example, n = 25, k = 2, and the level of significance α = 0.05.
Using these values, the critical value from the t-table for 22 degrees of
freedom and α/2 = 0.025 is 2.074. Since the calculated value of t

t = 7.93 > tcritical = 2.074

We reject the null hypothesis and conclude that the second-order term in
fact contributes to the prediction of the life of the components (y). Note: we
could also have tested the one-sided hypotheses

H0: β2 = 0
Ha: β2 > 0

which would determine whether the value of b2 = 0.0598 in the prediction
equation is large enough to conclude that the life of the components increases
at an increasing rate with temperature. This hypothesis has the same
test statistic and can be tested at α = 0.05.
Therefore, our conclusion is that the mean component life increases at an
increasing rate with temperature, and the second-order term in our model is
significant and contributes to the prediction of y.
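As a quick check of the critical value and the corresponding p-value, the
t-distribution functions in Python's scipy can be used. This is a sketch
under our own naming, assuming scipy is available; it is not part of the
text's MINITAB/EXCEL workflow.

from scipy import stats

n, k, alpha = 25, 2, 0.05
df = n - k - 1                                  # 22 degrees of freedom
t_critical = stats.t.ppf(1 - alpha / 2, df)     # about 2.074
p_value = 2 * stats.t.sf(7.93, df)              # two-tailed p-value for t = 7.93
print(t_critical, p_value)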

Another Example: Quadratic (Second-Order) Model


The fitted line plot of the temperature and yield in Figure 7.24 shows
the yield of a chemical process at different temperatures. The plot clearly
indicates a nonlinear relationship. There is an indication that the data can
be well approximated by a quadratic model.
We used MINITAB and EXCEL to fit a quadratic model to the
data. The prediction equation from the regression output is shown below.

Yield (y) = 1,459 + 277 Temperature (x) − 0.896 x² or,

ŷ = 1,459 + 277x − 0.896x²

The coefficient of determination R² is 88.2 percent. This tells us that
88.2 percent of the variation in y is explained by the regression and 11.8
percent of the variation is unexplained or due to error. The model is
appropriate, and the prediction equation can be used to predict the yield at
different temperatures.

Figure 7.24  Fitted line plot showing the yield of a chemical process
Summary of Model Building


The sections above provided an introduction to model building. The first-
order, second-order, and third-order models were discussed. Unlike the
simple and multiple regression models, where the relationship among
the variables is linear, there are situations where the relationship among
the variables under study may not be linear. We discussed situations
where higher-order and nonlinear models provide a better relationship
between the response and independent variables and provided examples
of quadratic or second-order models. Scatter plots were created to select
the model that would provide a good fit to a set of data and can be used to
obtain a good estimate of the response or dependent variable y that is
related to the independent variables or predictors. Since second-order
or quadratic models are appropriate in many applications, we provided a
detailed computer analysis of such models. The computer analysis and
interpretation of the results were explained and examined, including
the residual plots and their analysis.

Models with Qualitative Independent (Dummy)


Variables
Dummy or Indicator Variables in Multiple Regression: In regression
we often encounter qualitative variables that need to be included
as one of the independent variables in the model. For example,
suppose we are interested in building a regression model to predict the
salary of male and female employees based on their education and years
of experience; the variable male or female is a qualitative variable that must
be included as a separate independent variable in the model. To include such
qualitative variables in the model we use a dummy or indicator variable.
The use of dummy or indicator variables in a regression model allows us
to include qualitative variables in the model. For example, to include the
sex of employees in a regression model as an independent variable, we
define this variable as

x1 = 1 or 0
In the above formulation, a “1” indicates that the employee is male
and a “0” means the employee is female. Which of male or female is
assigned the value 1 is arbitrary.
In general, the number of dummy or indicator variables needed is
one less than the number of levels of the qualitative variable to be
included in the model.

One Qualitative Independent Variable at Two Levels


Suppose we want to build a model to predict the mean salary of male and
female employees. This model can be written as

y = b0 + b1x

where x is the dummy variable coded as

x1 = 1 if male; 0 if female

This coding scheme will allow us to compare the mean salary for male
and female employees by substituting the appropriate code in the
regression equation y = b0 + b1x.

Suppose µM = mean salary for the male employees
µF = mean salary for the female employees
Then the mean salary for the males: µM = y = b0 + b1(1) = b0 + b1
and the mean salary for the females: µF = y = b0 + b1(0) = b0

Thus, the mean salary for the female employees is b0. In a 0-1 coding
system, the mean response will always be b0 for the qualitative variable
that is assigned the value 0. This is also called the base level.
The difference in the mean salary for the male and female employees
can be calculated by taking the difference (µM − µF):

µM − µF = (b0 + b1) − b0 = b1

The above is the difference between the mean response for the level
that is assigned the value 1 and the level that is assigned the value 0 or the
base level. The mean salary for the male and female employees is shown
graphically in Figure 7.25. We can also see that

b0 = µF
b1 = µM − µF

Figure 7.25  Mean salary of female and male employees

Model with One Qualitative Independent Variable


at Three Levels
We would like to write a model relating the mean profit of a grocery chain
to store location. It is believed that the profit to a large extent depends on
the location of the stores. Suppose that the management is interested in
three specific locations where the stores are located. We will call these
locations A, B, and C. In this case, the store location is a single qualitative
variable at three levels corresponding to the three locations A, B, and C.
The prediction equation relating the mean profit (y) to the three locations
can be written as:

y = b0 + b1x1 + b2x2

where
x1 = 1 if location B; 0 if not
x2 = 1 if location C; 0 if not
The variables x1 and x2 are the dummy variables that make the model
work.

Explanation of the Model

Suppose, µA = mean profit for location A


µB = mean profit for location B
µC = mean profit for location C
If we set x1 = 0 and x2 = 0, we will get the mean profit for location A.
Therefore, the mean value of profit y when the store location is A

µA = y = b0 + b1(0) + b2 (0)
or, µA = b0

Thus, the mean profit for location A is b0 or, b0 = µA


Similarly, the mean profit for location B can be calculated by setting
x1 = 1 and x2 = 0. The resulting equation is

µB = y = b0 + b1 x1 + b2 x2 = b0 + b1(1) + b2(0)
or, µB = b0 + b1

Since b0 = µA, we can write

µB = µA + b1
or b1 = µB − µA

Finally, the mean profit for location C can be calculated by setting x1 = 0


and x2 = 1. The resulting equation is

µC = y = b0 + b1 x1 + b2 x2 = b0 + b1(0) + b2(1)
or, µC = b0 + b2

Since b0 = µA, we can write

µC = µA + b2
b2 = µC − µA
Thus, in the above coding system with one qualitative independent
variable at three levels,

µA = b0        b1 = µB − µA
µB = b0 + b1   b2 = µC − µA
µC = b0 + b2

where µA, µB, µC are the mean profits for locations A, B, and C.
Note that the three levels of the qualitative variable can be described with
only two dummy variables. This is because the mean of the base level (in this
case location A) is accounted for by the intercept b0. In general, for m levels
of a qualitative variable, we need (m − 1) dummy variables.
The bar graph in Figure 7.26 shows the values of mean profit (y) for
the three locations.

Figure 7.26  Bar chart showing the mean profit for three locations A,
B, C

In the above bar chart, the height of the bar corresponding to location
A is y = b0. Similarly, the heights of the bars corresponding to locations
B and C are y = b0 + b1 and y = b0 + b2 respectively. Note that either b1 or
b2, or both could be negative. In Figure 7.26, b1 and b2 are both positive.

Example: Dummy Variables


Consider the problem of the pharmaceutical company model where
the relationship between the sales volume (y) and three quantitative
independent variables was investigated: advertisement dollars spent (x1)
in hundreds of dollars, commission paid to the salespersons (x2) in
hundreds of dollars, and the number of salespersons (x3). The company is
now interested in including the different sales territories where it markets
the drug. The territory in which the company markets the drug is divided
into three zones: A, B, and C. The management wants to predict the
sales for the three zones separately. To do this, the variable “zone,” which
is a qualitative independent variable, must be included in the model. The
company identified the sales volumes for the three zones along with the
variables considered earlier. The data, including the sales volume and
the zone for each observation, are shown in Table 7.22 (Data File:
DummyVar_File1); the zone appears in the last column. The zones are
coded as

x4 = 1 if zone A; 0 otherwise
x5 = 1 if zone B; 0 otherwise

In the above coding system, the choice of 0 and 1 is arbitrary.
Note that we have defined only two dummy variables—x4 and x5—
for a total of three zones. It is not necessary to define a third dummy
variable for zone C.
From the above discussion, it follows that the regression model for the
data in Table 7.22 including the variable “zone” can be written as:

y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5

where
y: sales volume,
x1: advertisement dollars spent in hundreds of dollars,
x2: commission paid to the salespersons in hundreds of dollars,
x3: the number of salespersons, and x4 and x5 the dummy variables:

x4 = 1 if zone A; 0 otherwise
x5 = 1 if zone B; 0 otherwise
Table 7.22  Sales for different zones


Row  Sales Volume (y)  Advertisement (x1)  Commission (x2)  No. of Salespersons (x3)  Zone
1 973.62 580.17 235.48 8 A
2 903.12 414.67 240.78 7 A
3 1,067.37 420.48 276.07 10 A
4 1,193.37 454.59 295.70 14 B
5 1,429.62 524.05 286.67 16 C
6 1,557.87 623.77 325.66 18 A
7 1,590.12 641.89 298.82 17 A
8 1,081.62 403.03 210.19 12 C
9 1,088.37 415.76 202.91 13 C
10 1,132.62 506.73 275.88 11 B
11 1,314.87 490.35 337.14 15 A
12 1,562.37 624.24 266.30 19 C
13 1,050.12 459.56 240.13 10 C
14 1,055.37 447.03 254.18 12 B
15 1,112.37 493.96 237.49 14 B
16 1,235.37 543.84 276.70 16 B
17 1,518.12 618.38 271.14 18 A
18 1,574.37 690.50 281.94 15 C
19 1,644.87 591.27 316.75 20 C
20 1,169.37 530.73 297.37 10 C
21 1,212.87 541.34 272.77 13 B
22 1,304.37 492.20 344.35 11 B
23 1,477.62 546.34 295.53 15 C
24 1,593.87 590.02 293.79 19 C
25 1,134.87 505.32 277.05 11 B

Table 7.23 shows the data file for this regression model with the dummy
variables. The data can be analyzed using the MINITAB data file [Data
File: DummyVar_File(2)] or the EXCEL data file [DummyVar_File(2).xlsx].
We used both MINITAB and EXCEL to run this model. The
MINITAB and EXCEL regression outputs and results are shown in Tables
7.24 and 7.25. Refer to the computer results to answer the following
questions.
Table 7.23  Data file for the model with dummy variables
Row  Volume (y)  Advertisement (x1)  Commission (x2)  No. of Salespersons (x3)  Zone A (x4)  Zone B (x5)
1   973.62 580.17 235.48  8 1 0
2   903.12 414.67 240.78  7 1 0
3 1,067.37 420.48 276.07 10 1 0
4 1,193.37 454.59 295.70 14 0 1
5 1,429.62 524.05 286.67 16 0 0
6 1,557.87 623.77 325.66 18 1 0
7 1,590.12 641.89 298.82 17 1 0
8 1,081.62 403.03 210.19 12 0 0
9 1,088.37 415.76 202.91 13 0 0
10 1,132.62 506.73 275.88 11 0 1
11 1,314.87 490.35 337.14 15 1 0
12 1,562.37 624.24 266.30 19 0 0
13 1,050.12 459.56 240.13 10 0 0
14 1,055.37 447.03 254.18 12 0 1
15 1,112.37 493.96 237.49 14 0 1
16 1,235.37 543.84 276.70 16 0 1
17 1,518.12 618.38 271.14 18 1 0
18 1,574.37 690.50 281.94 15 0 0
19 1,644.87 591.27 316.75 20 0 0
20 1,169.37 530.73 297.37 10 0 0
21 1,212.87 541.34 272.77 13 0 1
22 1,304.37 492.20 344.35 11 0 1
23 1,477.62 546.34 295.53 15 0 0
24 1,593.87 590.02 293.79 19 0 0
25 1,134.87 505.32 277.05 11 0 1

A) Using the EXCEL data file, run a regression model. Show your regres-
sion output.
B) Using the MINITAB or EXCEL regression output, write down the
regression equation.
C) Using a 5 percent level of significance and the column “p” in the
MINITAB regression output or the “p-value” column in the EXCEL
regression output, conduct appropriate hypothesis tests to determine
whether the independent variables advertisement, commission paid, and
number of salespersons are significant, that is, whether they contribute
to predicting the sales volume.
D) Write separate regression equations to predict the sales for each of the
zones A, B, and C.
E) Refer to the given MINITAB residual plots and check that all the regres-
sion assumptions are met and the fitted regression model is adequate.

Solution:
A) The MINITAB regression output is shown in Table 7.24, and the
EXCEL regression output is shown in Table 7.25.
B) From the MINITAB or the EXCEL regression outputs in Tables 7.24
and 7.25, the regression equation is:

Sales Volume (y) = −98.2 + 0.884 Advertisement (x1) + 1.81
Commission (x2) + 33.8 No. of Salespersons (x3) − 67.2 Zone A (x4)
− 105 Zone B (x5)
or

y = −98.2 + 0.884x1 + 1.81x2 + 33.8x3 − 67.2x4 − 105x5

The regression equation from the EXCEL output in Table 7.25 can be
written using the coefficients column.
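The model with the dummy variables can also be fit with any least-squares
routine once the 0-1 columns of Table 7.23 are constructed. The sketch
below uses Python with pandas and statsmodels (our choice of tools, not
the text's) and our own column names.

import pandas as pd
import statsmodels.api as sm

# Table 7.22: sales volume, advertisement, commission, salespersons, zone
rows = [
    (973.62, 580.17, 235.48, 8, "A"), (903.12, 414.67, 240.78, 7, "A"),
    (1067.37, 420.48, 276.07, 10, "A"), (1193.37, 454.59, 295.70, 14, "B"),
    (1429.62, 524.05, 286.67, 16, "C"), (1557.87, 623.77, 325.66, 18, "A"),
    (1590.12, 641.89, 298.82, 17, "A"), (1081.62, 403.03, 210.19, 12, "C"),
    (1088.37, 415.76, 202.91, 13, "C"), (1132.62, 506.73, 275.88, 11, "B"),
    (1314.87, 490.35, 337.14, 15, "A"), (1562.37, 624.24, 266.30, 19, "C"),
    (1050.12, 459.56, 240.13, 10, "C"), (1055.37, 447.03, 254.18, 12, "B"),
    (1112.37, 493.96, 237.49, 14, "B"), (1235.37, 543.84, 276.70, 16, "B"),
    (1518.12, 618.38, 271.14, 18, "A"), (1574.37, 690.50, 281.94, 15, "C"),
    (1644.87, 591.27, 316.75, 20, "C"), (1169.37, 530.73, 297.37, 10, "C"),
    (1212.87, 541.34, 272.77, 13, "B"), (1304.37, 492.20, 344.35, 11, "B"),
    (1477.62, 546.34, 295.53, 15, "C"), (1593.87, 590.02, 293.79, 19, "C"),
    (1134.87, 505.32, 277.05, 11, "B"),
]
df = pd.DataFrame(rows, columns=["volume", "advert", "commission",
                                 "salespersons", "zone"])

# Dummy variables x4 and x5 as in Table 7.23; zone C is the base level
df["zone_A"] = (df["zone"] == "A").astype(int)
df["zone_B"] = (df["zone"] == "B").astype(int)

X = sm.add_constant(df[["advert", "commission", "salespersons",
                        "zone_A", "zone_B"]])
fit = sm.OLS(df["volume"], X).fit()
print(fit.params)    # expect about -98.2, 0.884, 1.81, 33.8, -67.2, -105
print(fit.pvalues)   # the p-values used in the hypothesis tests below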

C) The hypotheses to check the significance of each of the independent
variables can be written as:

H0: βj = 0 (xj is not a significant variable)
H1: βj ≠ 0 (xj is a significant variable)

The above hypothesis can be tested using the “p” column in either
MINITAB or the p-value column in EXCEL computer results. The deci-
sion rule for the p-value approach is given by

If p ≥ α , do not reject H0
If p < α , reject H0
Table 7.26 shows the p-value for each of the predictor variables, taken
from the MINITAB or EXCEL computer results in Table 7.24 or 7.25
(see the “p” or “p-value” columns in these tables).
From the table it can be seen that all three independent variables
are significant.
(D) As indicated, the overall regression equation is

Sales Volume (y) = −98.2 + 0.884 Advertisement(x1) + 1.81


Commission(x2) + 33.8 No. of Salespersons(x3) − 67.2 Zone A (x4)
− 105 Zone B (x5)

Separate equations for each zone can be written from this equation.
Table 7.26  Summary table


Independent Variable  |  p-value from Table 7.24 or 7.25  |  Compare p to α  |  Decision  |  Significant? Yes or No
Advertisement (X1)  |  0.000  |  p < α  |  Reject H0  |  Yes
Commissions (X2)  |  0.000  |  p < α  |  Reject H0  |  Yes
No. of salespersons (X3)  |  0.000  |  p < α  |  Reject H0  |  Yes

Zone A: x4 = 1.0, x5 = 0
Therefore, the equation for the sales volume of Zone A can be written as

Sales Volume (y) = −98.2 + 0.884 Advertisement (x1) + 1.81
Commission (x2) + 33.8 No. of Salespersons (x3) − 67.2(1) − 105(0) or,

Sales Volume (y) = −98.2 + 0.884 Advertisement (x1) + 1.81 Commission
(x2) + 33.8 No. of Salespersons (x3) − 67.2 or,

Sales Volume (y) = −165.4 + 0.884 Advertisement (x1) + 1.81
Commission (x2) + 33.8 No. of Salespersons (x3)

Similarly, the regression equations for the other two zones are shown
below.

Zone B: x4 = 0, x5 = 1.0
Substituting these values in the overall regression equation:

Sales Volume (y) = −98.2 + 0.884 Advertisement (x1) + 1.81 Commission
(x2) + 33.8 No. of Salespersons (x3) − 105 or,

Sales Volume (y) = −203.2 + 0.884 Advertisement (x1) + 1.81 Commission
(x2) + 33.8 No. of Salespersons (x3)

Zone C: x4 = 0, x5 = 0
Substituting these values in the overall regression equation:

Sales Volume (y) = −98.2 + 0.884 Advertisement (x1) + 1.81
Commission (x2) + 33.8 No. of Salespersons (x3)

Note that in all of the above equations the slopes are the same but the
intercepts are different.
(E) The MINITAB residual plots are shown in Figure 7.27.
The normal probability plot and the histogram in Figure 7.27 show
that the residuals are approximately normally distributed. The plot of
residuals versus fits does not show any pattern and is quite random,
indicating that the fitted regression model is adequate. The plot of
residuals versus the order of the data points shows no apparent pattern,
indicating that there is no violation of the independence-of-errors
assumption.

Figure 7.27  Residual plots for the dummy variable example

Overview of Regression Models


Regression is a powerful tool and is widely used in studying the relation-
ships among the variables. A number of regression models were discussed
in this book. These models are summarized here:

Simple Linear Regression: y = β0 + β1x + ε

Multiple Regression: y = β0 + β1x1 + β2x2 + … + βkxk + ε

Polynomial Regression (second-order models can be extended to
higher-order models):
Second-order polynomial: y = β0 + β1x + β2x² + ε
Higher-order polynomial: y = β0 + β1x + β2x² + … + βkxᵏ + ε
Interaction Models: An interaction model relating y and two quantitative
independent variables can be written as
y = b0 + b1x1 + b2x2 + b3x1x2

Models with Dummy Variables: General form of a model with one
qualitative (dummy) independent variable at m levels:

y = b0 + b1x1 + b2x2 + … + b(m−1)x(m−1)

where xi is the dummy variable for level (i + 1) and

xi = 1 if y is observed at level (i + 1); 0 otherwise

All Subset and Stepwise Regression: Finding the best set of predictor
variables to be included in the model.
Note: the Interaction Models and All Subset Regression are not discussed
in this chapter.

There are other regression models that are not discussed but can be de-
veloped using the concepts presented for the other models. Some of these
models are explained here.

Reciprocal Transformation of the x Variable: This transformation can
produce a linear relationship and is of the form:
y = β0 + β1(1/x) + ε
This model is appropriate when x and y have an inverse relationship.
Note that the inverse relationship is not linear.
Log Transformation of the x Variable: The logarithmic transformation
is of the form:
y = β0 + β1 ln(x) + ε
This is a useful curvilinear form, where ln(x) is the natural logarithm
of x and x > 0.

Log Transformation of the x and y Variables:
ln(y) = β0 + β1 ln(x) + ε
The purpose of this transformation is to achieve a linear relationship.
The model is valid for positive values of x and y. This transformation is
more involved, and it is difficult to compare the model to other models
with y as the dependent variable.
Logistic Regression: This model is used when the response variable is
categorical. In all the regression models we developed in this book, the
response variable was quantitative. In cases where the response is
categorical or qualitative, the simple and multiple least-squares regression
models violate the normality assumption. The correct model in this case
is logistic regression, which is not discussed in this book.
Implementation Steps and Strategy


for Regression Models
Successful implementation of regression models requires an understanding
of the different types of models. A knowledge of the least-squares method,
on which many of the regression models are based, as well as an awareness
of the assumptions of least-squares regression, is critical in evaluating and
implementing the correct regression models. Computer packages have
made model building and analysis easy. As we have demonstrated, the
scatter plots and matrix plots constructed using the computer are very
helpful in the initial stages of selecting the right model for the given data.
The residual plots for checking the assumptions of regression can also be
easily constructed using the computer. While computer packages have
removed the computational hurdle, it is important to understand the
fundamentals underlying regression to apply regression models properly.
A lack of understanding of the least-squares method and the assumptions
underlying regression may lead to drawing wrong conclusions. If the
assumptions of regression are violated, it is important to determine the
alternate course or courses of action.
CHAPTER 8

Time Series Analysis


and Forecasting

Chapter Highlights
• Introduction to Forecasting
• Forecasting Methods: An Overview
° Qualitative Forecasting

• Time Series Analysis and Forecasting


• Associative Forecasting
• Features of Forecasts
• Elements of a Good Forecast
• Steps in the Forecasting Process
• Forecasting Techniques
• Some Common Patterns of Historical Data
• Demonstration of Forecasting Errors
• Measuring Forecast Accuracy
• Forecasting Methods
° Naïve Forecasting Method

• Forecasting Models Based on Averages


• Simple Moving Average
• Illustration of Simple Moving Average Method
• Calculating and Comparing Forecast Errors
• Weighted Moving Averages
• Simple Exponential Smoothing Method
• Moving Average with a Trend: Double Moving
Average—Example
° Forecasting Data with a Trend
• Forecasting Data Using Different Methods and Comparing
Forecasts to Select the Best Forecasting Method
• Computer Applications and Implementation—Selecting the
Best Forecasting Method
• Forecasting Seasonal Time Series Data
• Associative Forecasting Techniques—Regression-Based
Forecasting Models
° Simple Regression
° Multiple Regression Analysis

Introduction to Forecasting
Forecasting and time series analysis are major tools of predictive analytics.
Forecasting involves predicting future business outcomes using a number
of qualitative and quantitative methods. In this chapter we discuss the
prediction techniques using forecasting and time series data. Many
business planning, production, operations, sales, demand, and inventory
decisions are based on forecasting. We discuss here the broad range of
forecasting applications and a number of models. A forecast is a statement
about the future value of a variable of interest, such as demand. Forecasting
is used to make informed decisions and may be divided into:

• Long range
• Short range

Forecasts affect decisions and activities throughout an organization.
Produce-to-order companies depend on demand forecasts to plan their
production, and inventory planning decisions are affected by forecasts.
Following are some of the areas where forecasting is used.

Accounting Cost/profit estimates


Finance Cash flow and funding
Human Resources Hiring/recruiting/training
Marketing Pricing, promotion, strategy


Information Systems Information technology/information systems,
services
Operations Schedules, MRP (materials requirement planning),
workloads
Product/Service Design New products and services
Production/Manufacturing Demand, sales, revenue, raw material demand forecasts
City Planning Water and utilities demands
Technology Trend, future usage
Internet of Things Future trend and demand

Forecasting Methods: An Overview


Forecasting methods are classified as qualitative or quantitative.
Qualitative forecasting methods use expert judgment to develop forecasts.
These methods are used when historical data on the variable being
forecast are not available. The methods are also known as judgmental
because they use subjective inputs. These forecasts may be based on
consumer surveys, the opinions of sales and marketing, market sensing,
and the Delphi method, which develops a consensus from the opinions
of managers.
The objective of forecasting is to predict the future outcome based on
past patterns or data. When historical data are not available, as when a
new product is launched, qualitative methods are used. These methods
forecast the future outcome based on opinion, judgment, or experience.
Qualitative forecasts can also be seen as judgmental forecasting. The
forecast may be based on:

Consumer/customer surveys Executive opinions


Sales force opinions Surveys of similar competitive products
Delphi method Expert knowledge and opinions of managers

Quantitative forecasting is based on historical data. The most common
methods are time series and associative forecasting methods, which are
discussed in detail in the subsequent sections.
Time Series Forecasting


These methods are quantitative methods and use historical time
series data, that is, a set of observations measured at successive points in
time. A time series is data collected over time. The forecasts are
based on studying the historical trends and patterns in the data and
applying appropriate models to forecast future outcomes. The approach
rests on the assumption that the future trend will continue: these methods
project the future based on past patterns.
In time series analysis and forecasting, plotting the data is the initial
and one of the most important steps. The pattern of the time series helps
the analyst to see the behavior of the data over time. The pattern is also
critical in selecting and applying the appropriate forecasting technique.
The idea in forecasting is to study the past pattern and project it into the
future if there is reason to believe that the pattern will continue.

Associative Forecasting
Associative forecasting methods use explanatory variables to predict the
future. These methods use one or several independent variables or factors
to predict the response variable. Regression methods using simple,
multiple, and nonlinear regression models, as well as indicator variables,
are some of the methods in this category. In this chapter, we will mainly
focus on quantitative forecasting methods.

Features of Forecasts

• Forecasts are not exact and are rarely perfect because of randomness.
Also, more than one forecasting method can often be used to forecast
the same data, and different methods produce different results. The
forecast accuracy differs based on the method used. Applying the correct
forecasting technique is critical to achieving good forecasts. Some
forecasting techniques are more complex than others, and applying the
correct forecasting method requires experience and a knowledge of the
process.
• Forecast accuracy depends on the randomness and noise present
in the data.
• Forecast accuracy decreases as the time horizon increases.
Elements of a Good Forecast

The generated forecasts should be:

• Timely  • Reliable  • Accurate  • Meaningful
• Written/documented  • Easy to use and implement

Steps in Forecasting Process


1. Specify the purpose and objective of the forecast.
2. Establish a time horizon of the forecast (short, medium, or long range).
3. Plot the data to examine the trend, pattern, etc. The plots may be a
time series, scatter plot, or other plot as applicable.
4. Select an appropriate forecasting method or methods.
5. Analyze data using a forecasting software.
6. Generate the forecast (use more than one method if applicable).
7. Plot the actual data and the forecast to see visually how the forecast
is responding to the actual data.
8. Calculate the accuracy of the forecast by calculating different measures.
9. Determine the best forecast.
10. Implement the best forecast.
11. Monitor the forecast.

Forecasting Models and Techniques: An Overview

The forecasting methods and models can be divided into the following
categories:

Techniques Using Average


(a) Simple moving average; (b) weighted moving average; (c) exponential
smoothing

Techniques for Trend


Linear trend equation (similar to simple regression)
Double moving average or moving average with trend
Exponential smoothing with trend or trend-adjusted exponential
smoothing
Techniques for Seasonality


Forecasting data with seasonal pattern

Associative Forecasting Techniques


Simple regression
Multiple regression analysis
Nonlinear regression
Regression involving categorical or indicator variables
Other regression models

Some Common Patterns in Forecasting


Horizontal or Constant Pattern

These patterns are also known as stable or constant processes. In this
case, the variable of interest does not show an increasing or decreasing
pattern but fluctuates around an average.

Trend

A trend in a time series is identified by gradual shifts or movements to
relatively higher or lower values over a period of time. A trend may be
increasing or decreasing and may be linear or nonlinear. Sometimes an
increasing or decreasing trend may fluctuate around an average. Examples
of trend include changes in population, a company's sales and revenue,
and increasing or decreasing demand for a particular technology or
consumer item.

Seasonal

Seasonal patterns are recognized by the same repeating pattern of
highs and lows over successive periods of time. The repetition may occur
within a day, week, month, quarter, year, or some other interval no
greater than a year. Note that a seasonal pattern does not necessarily refer
to the four seasons of the year.
Trend and Seasonal

These are the time series where the variable of interest shows a combin-
ation of a trend and seasonal pattern. Forecasting this type of pattern
requires a technique that can deal with both trend and seasonality and can
be achieved through time series decomposition to separate or decompose
a time series into trend and seasonal components. The methods to forecast
trend and seasonal patterns are usually more involved computationally.

Cyclical

A cyclical pattern is identified by a time series that shows an alternating
sequence of points plotting above and below a trend line, with each
cycle lasting more than one year. Cyclical patterns are the result of
multiyear business cycles and are difficult to forecast. An example of a
time series showing a cyclical pattern is the stock market.

Random Fluctuations

Random fluctuations are the result of chance variation and may be a com-
bination of constant fluctuations followed by trends. An example would
be the demand for electricity in summer. These patterns require special
forecasting techniques and are often complex in nature.
Usually the first step in forecasting is to plot the historical data. This
is critical in identifying the pattern in the time series and applying the
correct forecasting method. If the data are plotted over time, such plots
are known as time series plots. This plot involves plotting the time on the
horizontal axis and the variable of interest on the vertical axis. The time
series plot is a graphical representation of data over time where the data
may be weekly, monthly, quarterly, or annually. Some of the common
time series patterns are shown in Figures 8.1 through 8.7.
Figure 8.1 shows that the demand data fluctuate around an average.
Averaging techniques, such as simple moving average or simple
exponential smoothing, can be used to forecast such patterns. Figure 8.2
shows the actual data and the forecast for Figure 8.1.
Figure 8.1  A constant (stable process)

Figure 8.2  Forecast for the demand data in Figure 8.1 (forecasts are
dotted lines)

Figure 8.3 shows the sales data for a company over a period of
65 weeks. Clearly, the data are fluctuating around an average and showing
an increasing trend. Forecasting techniques such as double moving
average or exponential smoothing with a trend can be used to forecast
such patterns. Figure 8.4 shows the sales and forecast for the data in
Figure 8.3. Figure 8.5 shows a seasonal pattern.
Figure 8.3  A Linear Trend Process

Figure 8.4  Forecast for the sales data in Figure 8.3 using double
moving average

The other class of models is based on regression. Figure 8.6 shows the
relationship between two variables—summer temperature and electricity
used. There is a clear indication that there exists a linear relationship be-
tween the two variables. Such a relationship between the variables enables
us to use regression models where one variable can be predicted using the
other variable. We have explained the regression models in the previous
chapter. Figure 8.7 shows a nonlinear relationship (quadratic model). A
nonlinear or quadratic model as explained in the previous chapter can be
used in such cases to predict the response variable (yield in this case) using
the independent variable (temperature).
Figure 8.5  Data showing seasonal pattern

Figure 8.6  Linear trend model



Figure 8.7  Nonlinear relationship (quadratic model)

Measuring Forecast Accuracy


The accuracy of the forecasting method is critical in selecting, applying,
and implementing a forecasting method. The accuracy measures tell us
how the forecast is behaving and responding to the actual data and the
pattern. Usually, more than one forecasting method can be applied to
forecast the same data.
A number of measures are available to determine the accuracy of a
forecasting method. We will describe these measures here. These measures
are used to determine how well the forecast is responding to the actual data
as well as how close the forecast values are to the actual data. A good
forecast should follow the actual data closely with a minimum of error,
where the error is the difference between the actual value and the forecast.
The accuracy measures summarize the errors in different forms, and we
will see that some accuracy measures are preferred over others. Usually,
the most accurate forecast is the one with the smallest errors. Note that
different forecasting methods can be used to forecast the same time
series data.
When different methods are used to forecast the same time series, the
forecast accuracy measures are calculated and compared for each method
to determine the best forecasting method.
• Measures of forecast accuracy are used to determine how well a


particular forecasting method is able to reproduce the time series
data as well as how the method is responding to the actual fluctua-
tion in data.
• To get a better idea of the forecast accuracy, it is helpful to plot the
actual data and the forecast on the same plot.
• Measures of forecast accuracy are important factors in comparing
different forecasting methods. This helps to select the best forecast-
ing method for the given time series data.

The forecast accuracy is related to the forecast error that is defined as:

Forecast Error = Actual Value − Forecast

The forecast error can be positive or negative. A positive error indi-


cates the forecasting method underestimated the actual value, whereas
a negative forecast error indicates that the forecasting method overesti-
mated the actual value. The forecasting error is assessed using the follow-
ing measures.

Mean Error

Mean or the average forecast error is the simplest measure of forecast ac-
curacy. Since the error can be positive or negative, the positive and nega-
tive forecast errors tend to offset one another, resulting in a small value of
the mean error. Therefore, mean forecast error is not a very useful measure.

Mean Absolute Error

The mean absolute error (MAE) is also known as mean absolute deviation
(MAD). It is the mean of the absolute values of the forecast errors. This
avoids the problem of offsetting the positive and negative mean errors.
The MAD can be calculated as:

MAD = Σ|Actual − Forecast| / n
MAD shows the average size of the error (or average deviation of
forecast from the actual data). Note that n is the number of forecasts
generated.

Mean Squared Error

This is another measure of forecast error that avoids the problem of
positive and negative errors offsetting. It is the average of the squared
forecast errors (mean squared error, MSE) and is calculated using:

MSE = Σ(Actual − Forecast)² / (n − 1)

Mean Absolute Percentage Error

The MAE or MAD and the MSE depend upon the scale of the data. This
makes it difficult to compare the error across different time intervals. The
mean absolute percentage error (MAPE) provides a relative or percent
error measure that makes the comparison easier. The MAPE is the average
of the absolute percentage forecast errors and is calculated using:

MAPE = [Σ(|Actual − Forecast| / Actual) × 100] / n
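These measures are straightforward to compute from the error definitions
above. The following is a minimal Python sketch with our own function
name; note that it divides the squared-error total by the number of
forecasts, matching the worked examples later in the chapter (Tables 8.2
and 8.7), while the displayed MSE formula shows n − 1.

import numpy as np

def forecast_accuracy(actual, forecast):
    """Return MAD, MSE, and MAPE for paired actual and forecast values."""
    actual = np.asarray(actual, dtype=float)
    e = actual - np.asarray(forecast, dtype=float)   # error = actual - forecast
    n = len(e)
    mad = np.mean(np.abs(e))
    mse = np.sum(e ** 2) / n
    mape = np.mean(np.abs(e) / actual) * 100
    return mad, mse, mape

# Tiny illustrative check with made-up numbers
print(forecast_accuracy([100, 120, 110], [90, 115, 120]))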

Tracking Signal

The tracking signal is the ratio of the cumulative error to the MAD:

Tracking Signal = Σ(Actual − Forecast) / MAD

Bias is the persistent tendency for the forecasts to be greater or smaller
than the actual values. It indicates whether the forecast is typically too low
or too high and by how much. Thus, the bias shows the average total error
and its direction.
The tracking signal uses both bias and MAD and can also be calculated as:

Tracking Signal = Bias / MAD

The value of this ratio ranges from −1 to +1.
A ratio approaching −1 indicates that all or most of the forecast errors
are negative (i.e., the forecasts are too high). A ratio approaching +1
indicates that all or most of the forecast errors are positive (i.e., the
forecasts are too low).
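A sketch of the bias and tracking signal calculation in Python follows; we
take the bias to be the mean error, which is what makes the ratio fall
between −1 and +1 as described above. The names are ours.

import numpy as np

def tracking_signal(actual, forecast):
    """Bias (mean error) divided by MAD."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    bias = np.mean(e)              # persistent tendency of the errors
    mad = np.mean(np.abs(e))
    return bias / mad              # between -1 and +1

# The cumulative form sum(e) / MAD of the first definition differs
# from this ratio only by the factor n (the number of forecasts).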

Demonstration of Forecasting Errors

We will demonstrate the computation of some of the measures of forecast
accuracy above using the simplest of the forecasting methods, known
as the naïve forecasting method.

Forecasting Methods
Naïve Forecasting Method

This method uses the most recent observation in the time series as the
forecast for the next time period and generates short-term forecasts.
The weekly demand (for the past 21 weeks) for a particular brand
of cell phone is shown in Table 8.1. We will use the naïve forecasting
method to forecast one week ahead and calculate the forecast accuracy
by calculating the errors. The data and the forecast along with the fore-
cast errors, absolute errors, squared errors, and absolute percent errors are
shown in Table 8.1.
Note that this method uses the most recent observation in the time
series as the forecast for the next time period. Thus, the forecast for the
next period is

X̂(t+1) = Actual Value in Period t
Table 8.1  Weekly demand and forecast for a product

Using the values from the Total row in Table 8.1, we can calculate the
forecast accuracies or errors as shown in Table 8.2.

Table 8.2  Forecast errors for naïve forecasting


MAE or MAD:  MAE = 1,059 / 20 = 52.95
MSE:  MSE = 88,373 / 20 = 4,418.65
MAPE:  MAPE = 500.56 / 20 = 25.01%

The above measures are used in selecting the forecasting method for
the data by comparing them to the measures calculated using other
methods. Usually, a smaller deviation (MAD) or MAPE is an indication
of a better forecast.
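Since the Table 8.1 values are not reproduced here, the sketch below
applies the naïve method and the error measures to the first ten demand
values of Table 8.3 instead; it is illustrative Python, not the text's
computation.

import numpy as np

demand = np.array([158, 222, 248, 216, 226, 239, 206, 178, 169, 177],
                  dtype=float)

# Naive method: the forecast for period t + 1 is the actual value in period t
forecast = demand[:-1]
actual = demand[1:]
e = actual - forecast

print(np.mean(np.abs(e)))                  # MAD
print(np.sum(e ** 2) / len(e))             # MSE, dividing by n as in Table 8.2
print(np.mean(np.abs(e) / actual) * 100)   # MAPE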

Forecasting Models Based on Averages


Here we present examples of different forecasting models that are based
on the averages. These methods are most appropriate for the time series
with horizontal or constant pattern. The methods are outlined below.
Techniques Using Average

(a) Simple moving average; (b) weighted moving average; (c) expo-
nential smoothing
The above methods are used for short-range forecasts and are also known
as smoothing methods because their objective is to smooth out the random
fluctuations in the time series. Computer software is almost always used
to study the trend and time series characteristics of the data. The
examples below show the analysis of the class of forecasting techniques
that are based on averages.

Simple Moving Average


The first of these methods is known as the moving average or simple mov-
ing average method and is a short-term forecasting method. This method
uses the average of the most recent N data values in the time series as the
forecast for the next period and is most appropriate for the data showing
a horizontal or constant pattern. This pattern exists when the data fluc-
tuate around a constant mean (see Figure 8.1). Sometimes, a time series
depicting a horizontal pattern can shift to a new level due to changes in
the business conditions. When this shift occurs, it is often difficult to
apply an appropriate mathematical model to forecast the horizontal pat-
tern and the shift. Here we deal with the time series showing a constant
or horizontal pattern without a shift.
In the moving average forecasting method, the term moving means
that every time a new observation becomes available for the time series,
the oldest value is discarded and the average is calculated using the most
recent observations in the series. This results in a move or change in the
average that keeps changing as new observations become available.

Illustration of Simple Moving Average Method

The weekly demand (for the past 65 weeks) for a particular brand of cell
phone is used to demonstrate the simple moving average method. The
partial data are shown in Table 8.3. The plot of the complete data is shown
in Figure 8.8.
Table 8.3  Demand data


Demand
Row Week (XT)
1 1 158
2 2 222
3 3 248
4 4 216
5 5 226
6 6 239
7 7 206
8 8 178
9 9 169
10 10 177
11 11 290
12 12 245
13 13 318
14 14 158
15 15 274
: 16 255
:
65

Figure 8.8  Time series plot of demand data


Since the data show a horizontal or constant pattern, the simple moving
average is an appropriate method to forecast this type of data.
We used a six-period moving average to forecast one period ahead.
Figure 8.9 shows the plot of actual demand data and one-period ahead
forecast. Note how the forecast responds to the actual demand. This ex-
ample shows a six-period moving average, which means that N = 6 and the
most recent six periods of demand are used to calculate the moving aver-
age. We can also change the period of the moving average to a lower value.

Figure 8.9  Plot of actual data and six-period moving average forecast

A smaller value of N will respond to the shifts in a time series more


quickly than a larger value of N. We have also shown a three-period
moving average forecast in Figure 8.10 for the same data. Note the effect
of lowering the moving average period on the forecasts generated: a smaller
value tracks the shifts in the time series more quickly and may generate a
better forecast. The accuracy measures for the six- and three-period moving
averages are shown in Table 8.4. The three-period moving average forecast
has less deviation (MAD) and a smaller MAPE, so it should be preferred.
Compare the forecast accuracies generated using a six-period mov-
ing average and a three-period moving average in Table 8.4. The fore-
cast using a three-period moving average has smaller deviation, and these
Figure 8.10  Plot of actual data and three-period moving average forecast

Table 8.4  Accuracy measures


Moving average length 6 Moving average length 3
MAPE 20.50 MAPE 19.45
MAD 46.23 MAD 43.89
MSD 3189.40 MSD 3134.26

forecasts are responding better to the actual data compared to the six-
period moving average. Usually, a smaller averaging period will produce
a better forecast.

Formulas and Sample Calculations

To demonstrate the calculations of moving average, consider the first


15 values of the demand data in Table 8.3. We have shown the computation
for a six-period moving average. The method can be used for any other
moving average period.
N-period simple moving average can be calculated using the following
formula:

MT = (XT + XT−1 + XT−2 + … + XT−N+1) / N  (8.1)

where T = the current time period, N = no. of periods in the moving
average, and MT = the N-period simple moving average.

General Equation:

MT = MT−1 + (XT − XT−N) / N  (8.2)

One-Period Ahead Forecast:

X̂(T+τ)(T) = MT  (8.3)

Equation (8.3) is the forecasting equation. It states that the forecast


for the next period is the moving average of the previous period. For ex-
ample, the moving average for period 6 is the forecast for period 7.

Sample Calculations
Refer to the first 15 values of demand from Table 8.5 for sample calculation.

Table 8.5  Data and forecast


Row  Week  Demand (XT)  6-Period MA (MT)  One-period Ahead Forecast  Residual (Error)
1  1  158  *  *  *
2  2  222  *  *  *
3  3  248  *  *  *
4  4  216  *  *  *
5  5  226  *  *  *
6  6  239  218.167  *  *
7  7  206  226.167  218.167  −12.167
8  8  178  218.833  226.167  −48.167
9  9  169  205.667  218.833  −49.833
10  10  177  199.167  205.667  −28.667
11  11  290  209.833  199.167  90.833
12  12  245  210.833  209.833  35.167
13  13  318  229.500  210.833  107.167
14  14  158  226.167  229.500  −71.500
15  15  274  243.667  226.167  47.833
To calculate a 6-period moving average (N = 6):

X1 = 158, X2 = 222, X3 = 248, X4 = 216, X5 = 226, X6 = 239

Use equation (8.1) to calculate the 6-period moving average. Set:


T = 6 and N = 6

MT = (XT + XT−1 + XT−2 + … + XT−N+1) / N
M6 = (X6 + X5 + X4 + X3 + X2 + X1) / 6
M6 = (239 + 226 + 216 + 248 + 222 + 158) / 6 = 218.17

In Table 8.5: Week is the time, Demand is the actual demand XT,
MA = moving average, Forecast = one-period ahead forecast, Error is the
difference between the actual and the forecast values (it is a measure of
deviation of actual and the forecast values).
Using equation (8.2), calculate the remaining moving averages. Note that
you need to use equation (8.1) only once, to start:

MT = MT−1 + (XT − XT−N) / N

Set: T = 7, N = 6

MT = MT−1 + (XT − XT−N) / N
M7 = M6 + (X7 − X1) / 6
   = 218.17 + (206 − 158) / 6 = 226.17

In the computations shown below, note that each time the most re-
cent value is included in the average and the oldest one is discarded. To
calculate the next moving average,
Set: T = 8, N = 6

MT = MT−1 + (XT − XT−N) / N
M8 = M7 + (X8 − X2) / 6
   = 226.17 + (178 − 222) / 6 = 218.83

Set: T = 9, N = 6

MT = MT−1 + (XT − XT−N) / N
M9 = M8 + (X9 − X3) / 6
   = 218.83 + (169 − 248) / 6 = 205.67

Set: T = 10, N = 6

MT = MT−1 + (XT − XT−N) / N
M10 = M9 + (X10 − X4) / 6
    = 205.67 + (177 − 216) / 6 = 199.17

The rest of the moving averages and forecasts are shown in Table 8.5.
Since we calculated a six-period moving average, the forecast for the 7th
period is simply the moving average of the first 6 periods.
The forecasts for the complete data (with 65 periods) were generated
using a computer software. Figures 8.9 and 8.10 showed the actual data
and forecasts plotted on the same graph for a six-period and three-period
moving average for all 65 periods of data. The forecast errors for these two
moving average periods were shown in Table 8.4.
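The moving averages and one-period-ahead forecasts of Table 8.5 can be
reproduced with a few lines of Python; this is a sketch with our own
names, using numpy's convolution to form the running averages of
equation (8.1).

import numpy as np

def moving_average_forecast(x, N):
    """One-period-ahead simple moving average forecasts (equations 8.1, 8.3)."""
    x = np.asarray(x, dtype=float)
    m = np.convolve(x, np.ones(N) / N, mode="valid")   # M_N, M_(N+1), ..., M_T
    return m[:-1]   # the forecast for period T + 1 is M_T

demand = [158, 222, 248, 216, 226, 239, 206, 178, 169, 177,
          290, 245, 318, 158, 274]
f = moving_average_forecast(demand, N=6)
print(f[0])   # 218.17, the forecast for week 7, matching M6 in Table 8.5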

Calculating and Comparing Forecast Errors


In Table 8.6 we have calculated the forecast errors for the demand data
using a three-period moving average; the first 21 values are used to
calculate the forecasts and errors. Table 8.7 summarizes the forecast errors.
Table 8.6  Three-period simple moving average, forecasts, and errors

Table 8.7  Forecast errors for three-period moving average forecasts


MAE or MAD:  MAE = 699.333 / 18 = 38.85
MSE:  MSE = 52,846.44 / 18 = 2,935.91
MAPE:  MAPE = 340.43 / 18 = 18.91%

Comparing the above error measures to those of the naïve forecasting
method in Table 8.2, we find that the three-period moving average
provided a much better forecast. Note: the forecast errors are used to
compare the errors from different forecasting methods. Often, more than
one method is used to forecast the same data; in such cases, the forecast
error measures identify the best forecast.

Weighted Moving Averages


Unlike the simple moving average method, where every data value is given
the same weight, the weighted moving average uses different weights for
each of the data values. In this method, we first select the number of data
values to be included in the average and then choose the weight for each of
the data values. The more recent observations are given more weight
than the older observations. The sum of the weights for the data
values included in the average is usually 1.0.
In Table 8.8, we used Excel to calculate a 4-period simple moving
average and 4-period weighted moving average forecasts for the 21 per-
iods of sales data in column B. Column C shows the 4-period simple
moving average forecasts and column D shows 4-period weighted mov-
ing average forecasts. The weights used for the four data points are 0.1,
0.2, 0.3, and 0.4 and are denoted using W(1) through W(4) shown in
columns A and B. Columns E to H show the forecast errors and absolute
errors for the simple and weighted 4-period forecasts.

Table 8.8  Four-period simple moving average and weighted moving average
forecasts and errors

The MAE (or MAD), the measure of forecast accuracy, is calculated and
shown in the worksheet. Type the values in columns A and B from row 1
to row 30, then type the formulas in the indicated cells shown in Table 8.9
to get the results.
The MAE for the two methods is shown below. The 4-period weighted
moving average has a smaller overall error and should be preferred over the
4-period simple moving average forecast. Figure 8.11 shows the 4-period
simple moving average and 4-period weighted moving average forecasts.
Table 8.9  Instructions to calculate 4-period simple and 4-period weighted moving average
Column (2): Actual sales
Column (3): Forecast using 4-period simple moving average
In Cell C14, type ’=AVERAGE(B10:B13) and copy to C24
Column (4): 4-Period weighted moving average forecast using the weights in B3:B6
In Cell D14, type ’=(B10*B$3)+(B11*B$4)+(B12*B$5)+(B13*B$6) and copy to D24

MAE or MAD: 4-period simple moving average, MAE = 45.8; 4-period weighted moving average, MAE = 43.0

Figure 8.11  4-period simple moving average and 4-period weighted moving average forecasts
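A weighted moving average is equally easy to script. The sketch below
mirrors the 4-period weights 0.1, 0.2, 0.3, 0.4 of Table 8.8 in Python;
the function name and the short illustrative series are ours.

import numpy as np

def weighted_ma_forecast(x, weights):
    """One-period-ahead weighted moving average forecasts.
    weights are ordered oldest to newest and should sum to 1.0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    N = len(w)
    # The forecast for period t uses the N observations ending at period t - 1
    return np.array([np.dot(w, x[t - N:t]) for t in range(N, len(x) + 1)])

sales = [330, 410, 408, 514, 402, 343, 438, 419, 374, 415]  # illustrative
print(weighted_ma_forecast(sales, weights=[0.1, 0.2, 0.3, 0.4]))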

Simple Exponential Smoothing Method


This method can be used in place of the simple moving average. When the
data are stable or show a horizontal pattern (see the plot for the simple
moving average in Figure 8.1), simple exponential smoothing can be used.
Exponential smoothing takes the forecast for the prior period and
adds an adjustment to obtain the forecast for the next period. A smoothing
constant, α, assigns weight to the actual data and to the prior forecast
value. The forecast is affected by changing the value of α.

The value of α is selected based on the noise or error in the data.
Higher levels of α do not always result in more accurate forecasts;
experimentation with different α levels is advised in order to obtain
forecast accuracy. Many computer programs provide an option to optimize α.
Formula for Simple Exponential Smoothing: The following formula
is used to determine the one-period-ahead forecast.

Ft = αAt−1 + (1 − α)Ft−1

where
Ft = forecast for period t (the next period)
Ft−1 = forecast for period t − 1 (the prior period)
At−1 = actual data for period t − 1 (the prior period)
α = smoothing constant, 0 ≤ α ≤ 1

Example: Develop a one-period-ahead forecast of the sales data in
Table 8.10. The initial forecast F1 is 393, calculated from past data, and
the smoothing constant is α = 0.1. The exponential smoothing method

Table 8.10  Sales and one-period-ahead forecast using simple exponential smoothing
Week Actual Sales (At) One-period Ahead Forecast, Ft
1 330 *
2 410 387
3 408 389
4 514 391
5 402 403
6 343 403
7 438 397
8 419 401
9 374 403
10 415 400
11 451 402
12 333 407
13 386 399
14 408 398
15 333 399
16 392

requires the initial forecast to generate additional forecasts. The initial


forecast is determined in different ways—it may be the first value in the
data set or the average of historical data.
Figures 8.12 and 8.13 show the plot of actual data and the forecast.
The calculations are explained below.

Figure 8.12  Plot of actual sales

Figure 8.13  Actual sales and forecast



Sample Calculations
Forecast for periods 2 through 5 using the forecasting equation:

Ft = αAt−1 + (1 − α)Ft−1

Note: the smoothing constant is α = 0.1, and the initial forecast (the
forecast for the first period) is F1 = 393.
The forecasts for periods 2,3,4,… are shown below:

F2 = αA1 + (1 − α)F1 = (0.1)(330) + (0.9)(393) = 386.7 ≈ 387

F3 = αA2 + (1 − α)F2 = (0.1)(410) + (0.9)(386.7) = 389.03 ≈ 389

F4 = αA3 + (1 − α)F3 = (0.1)(408) + (0.9)(389.03) = 390.93 ≈ 391

F5 = αA4 + (1 − α)F4 = (0.1)(514) + (0.9)(390.93) = 403.24 ≈ 403

… and so on.
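The full set of forecasts in Table 8.10 can be generated with a few lines of code. The following Python sketch is a minimal illustration (the function name is ours); it applies the smoothing equation repeatedly, starting from the initial forecast F1 = 393:

# Sketch: simple exponential smoothing, Ft = alpha*A(t-1) + (1-alpha)*F(t-1).
# 'initial' is the forecast for period 1 (393 in the example above).

def exponential_smoothing(actual, alpha, initial):
    forecasts = [initial]
    for a in actual:                       # each actual yields the next period's forecast
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts

sales = [330, 410, 408, 514, 402, 343, 438, 419, 374, 415,
         451, 333, 386, 408, 333]          # weeks 1-15 from Table 8.10
for week, f in enumerate(exponential_smoothing(sales, 0.1, 393), start=1):
    print(week, round(f))                  # reproduces the forecasts for weeks 2-16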
Another Example of Simple Exponential Smoothing: Inventory
Demand. The operations manager at a company talks to an analyst at
company headquarters about forecasting monthly demand for inventory
from her warehouse. The analyst suggests that she consider simple
exponential smoothing with a smoothing constant of 0.3. The operations
manager decides to use the most recent inventory demand data (in
thousands of dollars) shown below. From past experience, she decides
to use 99.727 as the forecast for the first period. Use simple exponential
smoothing with α = 0.3 and F1 = 99.727 to develop the forecasts for
months 2 through 11 for the data in Table 8.11. What is the MAD?
The results are shown in Table 8.11. MINITAB statistical software
was used to generate the forecast.
The inventory demand data and the forecast are plotted and shown
in Figure 8.14.
To see the effect of the smoothing constant α on the forecasts, two
sets of forecasts were generated with α = 0.3 and α = 0.1 and accuracy
measures were calculated. These are shown in Table 8.12.

Table 8.11  Actual data and forecasts of inventory demand (11 months)


Month Inventory Demand At Forecast Ft Residual (Error)
1 85 99.727 −14.7273
2 102 95.309 6.6909
3 110 97.316 12.6836
4 90 101.121 −11.1215
5 105 97.785 7.2150
6 95 99.950 −4.9495
7 115 98.465 16.5353
8 120 103.425 16.5747
9 80 108.398 −28.3977
10 95 99.878 −4.8784
11 100 98.415 1.5851

Figure 8.14  Inventory demand data and the forecast

Table 8.12  Forecast accuracy for different values of the smoothing constant α
Smoothing Constant, α = 0.3      Smoothing Constant, α = 0.1
Accuracy Measures Accuracy Measures
MAPE 11.842 MAPE 10.620
MAD 11.396 MAD 10.339
MSD 182.150 MSD 152.976

Changing α from 0.3 to 0.1 produced a better forecast with smaller error
values; both the MAD and the MAPE decreased for the smaller α. There
are ways of obtaining an optimal value of the smoothing constant. Because
the forecast using exponential smoothing depends on the value of α, an
optimal value of α is recommended.
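One simple way to approximate an optimal α is a grid search: compute an error measure over a range of α values and keep the α with the smallest error. The sketch below is our own minimal illustration of this idea (packages such as MINITAB perform a similar optimization internally):

# Sketch: approximate the optimal smoothing constant by grid search,
# choosing the alpha (0.01 to 0.99) that minimizes the MAD.

def mad_for_alpha(actual, alpha, initial):
    forecast, total_abs_error = initial, 0.0
    for a in actual:
        total_abs_error += abs(a - forecast)
        forecast = alpha * a + (1 - alpha) * forecast
    return total_abs_error / len(actual)

demand = [85, 102, 110, 90, 105, 95, 115, 120, 80, 95, 100]  # Table 8.11
best_alpha, best_mad = min(
    ((a / 100, mad_for_alpha(demand, a / 100, 99.727)) for a in range(1, 100)),
    key=lambda pair: pair[1])
print("alpha =", best_alpha, " MAD =", round(best_mad, 3))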

Example of Moving Average with a Trend or Double Moving Average

Forecasting Data with a Trend

The previous forecasting methods were applied to time series data that
did not show any trend. For data showing a trend, the simple moving
average method will not provide accurate forecasts.
A trend in a time series is identified by a gradual shift or movement
to relatively higher or lower values over a period of time. A trend may be
increasing or decreasing and may be linear or nonlinear. Sometimes an
increasing or decreasing trend fluctuates around an average. Examples
of trend include changes in populations, the sales and revenue of a
company, and the demand for a particular technology or consumer item.
Figure 8.15 shows the actual sales and the double moving average forecast

Figure 8.15  Sales and forecast using double moving average



for a company for the past 65 weeks (the dotted line represents the
forecast). Table 8.13 shows partial data. The time series clearly shows
an increasing trend. The appropriate method to forecast this pattern is
the double moving average, or the moving average with a trend. The
double moving average is the moving average of the simple moving
averages. The forecasting equation in this method is designed to
incorporate both the average and the trend component.

Demonstration of Double Moving Average Forecasting: The double


moving average technique is explained using limited data. The example
uses the first 12 values of the sales data. Suppose we want to forecast the
sales data in Table 8.13 using a 5-period double moving average.

Table 8.13  Sales data


Row Week Sales XT
1 1 35
2 2 46
3 3 51
4 4 46
5 5 48
6 6 51
7 7 46
8 8 42
9 9 41
10 10 43
11 11 61
12 12 55
13 13 67
14 14 42
15 15 61
: : 58
: : 49
65 65 74

Sample Calculations: Double Moving Average


Table 8.14 shows the simple and double moving averages and one-week
ahead forecast. The calculations are explained below.

Table 8.14  Sales data and double moving average calculations


(1) Row   (2) Week   (3) Sales   (4) Simple Moving Average MT   (5) Double Moving Average MT[2]   (6) One-Week Ahead Forecast
1 1 35 * * *
2 2 46 * * *
3 3 51 * * *
4 4 46 * * *
5 5 48 45.2 * *
6 6 51 48.4 * *
7 7 46 48.4 * *
8 8 42 46.6 * *
9 9 41 45.6 46.84 *
10 10 43 44.6 46.72 43.74
11 11 61 46.6 46.36 41.42
12 12 55 48.4 46.36 46.96

Refer to Table 8.14. The 5-period simple moving averages in column
(4) are calculated using equations (8.1) and (8.2) of the simple moving
average explained earlier. This column is labeled MT.
The 5-period double moving average in column (5), labeled MT[2] in
Table 8.14, is calculated using the formula:

MT[2] = (MT + MT−1 + MT−2 + … + MT−N+1)/N    (8.4)

where
MT[2] = N-period double moving average
N = number of periods in the moving average
T = number of observations
MT = N-period simple moving average

General Equation:

MT[2] = MT−1[2] + (MT − MT−N)/N    (8.5)

One-period-ahead forecast is calculated using

X̂T+τ(T) = 2MT − MT[2] + τ [2/(N − 1)] (MT − MT[2])    (8.6)

Note: τ is always 1, as we are generating a one-period-ahead forecast.


Sample calculations for column (5) in Table 8.14 are shown below.
For the first calculation, we use the equation

MT[2] = (MT + MT−1 + MT−2 + … + MT−N+1)/N

Set T = 9, N = 5 and use the values in Table 8.14:

M9[2] = (M9 + M8 + M7 + M6 + M5)/5

M9[2] = (45.60 + 46.60 + 48.40 + 48.40 + 45.20)/5 = 46.84

For the other calculations, use the general equation (8.5):

MT[2] = MT−1[2] + (MT − MT−N)/N

Using this equation, calculate the other double moving average values as
shown in column (5) of Table 8.14.

Set T = 10, N = 5

M10[2] = M9[2] + (M10 − M5)/5 = 46.84 + (44.60 − 45.20)/5 = 46.72

Set T = 11, N = 5

M11[2] = M10[2] + (M11 − M6)/5 = 46.72 + (46.60 − 48.40)/5 = 46.36

… and so on.

Calculating the one-period-ahead forecast shown in column (6) of Table 8.14

The forecasting equation is equation (8.6) above:

X̂T+τ(T) = 2MT − MT[2] + τ [2/(N − 1)] (MT − MT[2])

Forecast for the 10th week using the first 9 periods of data (note that τ is
always 1 because of the one-period-ahead forecast):

Set T = 9, τ = 1

X̂9+1(9) = 2M9 − M9[2] + 1 [2/(5 − 1)] (M9 − M9[2])

X̂10(9) = 2(45.60) − 46.84 + (1/2)(45.60 − 46.84) = 43.74 (shown in column (6) of Table 8.14)

Forecast for the 11th week

Set T = 10, τ = 1

X̂10+1(10) = 2M10 − M10[2] + 1 [2/(5 − 1)] (M10 − M10[2])

X̂11(10) = 2(44.60) − 46.72 + (1/2)(44.60 − 46.72) = 41.42

… and so on.
The rest of the forecasts and complete data are shown in Appendix A.
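The hand calculations above can be automated. The sketch below is our own minimal illustration (the function name is ours): it computes the N-period simple moving average, the double moving average from equation (8.5), and the one-period-ahead forecast from equation (8.6), reproducing column (6) of Table 8.14.

# Sketch: N-period double moving average forecast, equations (8.4)-(8.6).
# Positions without enough history are reported as None.

def double_ma_forecast(sales, n=5):
    t_len = len(sales)
    m1 = [None] * t_len                # simple moving average MT
    m2 = [None] * t_len                # double moving average MT[2]
    forecast = [None] * (t_len + 1)    # one-period-ahead forecasts
    for t in range(n - 1, t_len):
        m1[t] = sum(sales[t - n + 1:t + 1]) / n
    for t in range(2 * (n - 1), t_len):
        m2[t] = sum(m1[t - n + 1:t + 1]) / n
        # Equation (8.6) with tau = 1:
        # X(T+1) = 2*MT - MT[2] + (2/(N-1))*(MT - MT[2])
        forecast[t + 1] = 2 * m1[t] - m2[t] + (2 / (n - 1)) * (m1[t] - m2[t])
    return forecast

sales = [35, 46, 51, 46, 48, 51, 46, 42, 41, 43, 61, 55]   # Table 8.13, weeks 1-12
print([round(f, 2) if f is not None else None for f in double_ma_forecast(sales)])
# Weeks 10-12 give 43.74, 41.42, and 46.96, matching Table 8.14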

Forecasting Data Using Different Methods and Comparing Forecasts to Select the Best Forecasting Method

Computer Applications and Implementation—Selecting the Best Forecasting Method

An investment analyst for a financial planning business in San Diego,


California, has been asked to suggest a forecasting approach to predict the
next-day closing price of XYZ Analytics Inc. common stock. The analyst
has obtained the closing stock prices for the past 40 days (see Appendix).

A) Forecast the stock price for days 3 through 41 using a 3-period mov-
ing average and calculate the forecast errors: MAD, MAPE, and MSD.
Plot the actual data and the forecast on one plot. Use a 6-period mov-
ing average to forecast the stock price data.
B) Use the simple exponential smoothing method to forecast periods 1
through 41 of the stock price. Note that the forecast for period 1 is the
actual price of day 1 (which is 43.50). Use the smoothing constant α
of 0.4. Then increase the value of α to 0.804 and develop your fore-
cast with this α value. Calculate the MAD, bias, and tracking signal
for α = 0.4 and for α = 0.804. The forecast and the error values should
be rounded to four decimal places.
C) Compare the MAD values in parts (a) and (b) and decide which forecasting
approach to use. What do the bias and the tracking signal tell
you? Make a table as shown below and show your values.

Figures 8.16 through 8.19 show the plots of actual data and the fore-
casts using moving average and exponential smoothing methods. The fore-
cast accuracies for comparison purposes are provided below the figures.

Figure 8.16  3-Period moving average forecast of stock price



Figure 8.17  6-Period moving average forecast of stock price

A close examination of the forecasts shows that all these methods provided
good short-term forecasts of the stock values. However, the forecast
using exponential smoothing with a smoothing constant of α = 0.8
has the smallest MAD and also the smallest MAPE.

Forecast Accuracies Using Moving Average Method


Moving Average Length 6 Moving Average Length 3
Accuracy Measures Accuracy Measures
MAPE 1.84176 MAPE 1.35353
MAD 1.67794 MAD 1.22973
MSD 3.56691 MSD 1.95503

Figure 8.18  Exponential smoothing forecast of stock price (α = 0.4)



Figure 8.19  Exponential smoothing forecast of stock price (α = 0.8)

Forecast Accuracies Using Exponential Smoothing Method


Smoothing Constant, α = 0.4 Smoothing Constant, α = 0.8
Accuracy Measures Accuracy Measures
MAPE 1.39252 MAPE 1.11689
MAD 1.26455 MAD 1.01394
MSD 2.16331 MSD 1.51690

The exponential smoothing method should be implemented to forecast the
stock price data. This method is also preferred over the moving average
method because, unlike the moving average method, which requires several
past periods of data depending on the length of the moving average, the
exponential smoothing method requires only the previous period's forecast
and actual value to generate the forecast for the next period.
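Part (C) of the problem above asks for the bias and the tracking signal in addition to the MAD. A minimal sketch of those calculations follows (our own illustration with hypothetical numbers; here the bias is the mean forecast error and the tracking signal is the cumulative error divided by the MAD, a common textbook definition):

# Sketch: MAD, bias (mean error), and tracking signal
# (cumulative error divided by MAD) for a set of forecasts.

def mad_bias_tracking(actual, forecast):
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    tracking_signal = sum(errors) / mad
    return mad, bias, tracking_signal

# Hypothetical closing prices and exponential smoothing forecasts
prices    = [43.50, 44.10, 43.80, 44.60, 45.00]
forecasts = [43.50, 43.50, 43.74, 43.76, 44.10]
print(mad_bias_tracking(prices, forecasts))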

Forecasting Seasonal Time Series Data

Example: The plant manager of Computer Products Corporation (CPC)
wants to plan cash, personnel, and materials requirements for each quarter of
next year. The quarterly demand data for the past three years will be used
to forecast demand for the four quarters of the following year (year 11). If the
manager could estimate quarterly demand for next year, the cash, personnel,
and materials needs could be determined. The quarterly demand (in thousands
of units) for the past three years (years 8, 9, and 10) is shown in Table 8.15.

Table 8.15  Quarterly demand for years 8, 9, and 10


Year Q1 Q2 Q3 Q4 Annual Total
8 520 730 820 530 2,600
9 590 810 900 600 2,900
10 650 900 1,000 650 3,200
Totals 1,760 2,440 2,720 1,780 8,700

Step 1. Plot the data: Quarter vs. Sales. Figure 8.20 shows the plot of the
demand. This plot clearly shows a seasonal pattern.

Figure 8.20  Historical data of quarterly sales

2. Calculate the seasonal index for each quarter as shown in Table 8.16.
The formula to calculate the seasonal index is explained below the table.

Table 8.16  Seasonal index for each quarter


Year            Q1      Q2      Q3      Q4      Annual Total
8               520     730     820     530     2,600
9               590     810     900     600     2,900
10              650     900     1,000   650     3,200
Totals          1,760   2,440   2,720   1,780   8,700
Quarter Average 586.67  813.33  906.67  593.33  Overall quarter average = 8,700/12 = 725
Seasonal Index  0.809   1.122   1.251   0.818

Overall quarter average = sum of all quarters/total number of quarters = 8,700/12 = 725
Seasonal Index = Quarter Average/Overall Quarter Average
3. Deseasonalize the data by dividing each quarterly value by its seasonal
index, as shown in Table 8.17.

Table 8.17  Deseasonalized data


Row  Quarter  Sales  Seasonal Index  Deseasonalized Data
1 1 520 0.809 642.8
2 2 730 1.122 650.6
3 3 820 1.251 655.5
4 4 530 0.818 647.9
5 5 590 0.809 729.3
6 6 810 1.122 721.9
7 7 900 1.251 719.4
8 8 600 0.818 733.5
9 9 650 0.809 803.5
10 10 900 1.122 802.1
11 11 1000 1.251 799.4
12 12 650 0.818 794.6

4. Plot the deseasonalized data. Figure 8.21 shows the plot of the
deseasonalized data.

Figure 8.21  Plot of deseasonalized data



5. Since the data show an increasing trend (see plot above), perform a
regression analysis on the deseasonalized data (x is quarter and y is
deseasonalized data). The computer result is shown in Figure 8.22
and the regression equation is shown below.

Regression Analysis: Y (Deseasonalized) Data versus x (Quarter)


The regression equation is:

Y = 615.419 + 16.8652 x
S = 22.3799   R-Sq = 89.0%

Figure 8.22  Regression on deseasonalized data

6. Use the regression equation to forecast for quarters 13, 14, 15, and
16 of the following year or year 11. These are deseasonalized fore-
casts for the next four quarters of next year (note that quarter 13 is
the 1st quarter of the next year, quarter 14 is the 2nd quarter of the
next year, and so on).
y = 615.419 + 16.8652x
y13 = 615.419 + 16.8652(13) = 834.67
y14 = 615.419 + 16.8652(14) = 851.53
y15 = 615.419 + 16.8652(15) = 868.39
y16 = 615.419 + 16.8652(16) = 885.26

7. Multiply the deseasonalized forecast for each quarter by the seasonal
index to get the seasonalized forecast. The forecasts are shown
in Table 8.18.

Table 8.18  The seasonalized forecast


Quarter  Seasonal Index  Deseasonalized Forecast  Seasonalized Forecast (rounded)
Q1 0.809 834.67 675
Q2 1.122 851.53 955
Q3 1.251 868.39 1,086
Q4 0.818 885.26 724

8. The actual data (for the first 12 quarters) and seasonal forecast (next
4 quarters 13 to 16) are shown in Table 8.19.

Table 8.19  Forecast for the four quarters of next year (year 11)
Quarter  Seasonal Forecast
1 520
2 730
3 820
4 530
5 590
6 810
7 900
8 600
9 650
10 900
11 1,000
12 650
13 675
14 955
15 1,086
16 724

9. Plot the actual data and the forecast. Figure 8.23 shows the plot of
the actual data and the forecasts for the next four quarters. Note how
the forecast follows the seasonal trend.

Figure 8.23  Actual demand data (first 12 quarters) and the forecasts
for the next four quarters (quarters 13 through 16)
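The nine steps above can be scripted end to end. The sketch below is our own minimal illustration (an ordinary least-squares fit replaces the statistical package used in the text); because it keeps full precision in the seasonal indexes rather than the rounded values 0.809, 1.122, 1.251, and 0.818, the output may differ from the tables by a unit or so of rounding.

# Sketch: seasonal-index forecasting of the quarterly demand in Table 8.15.

demand = [520, 730, 820, 530, 590, 810, 900, 600, 650, 900, 1000, 650]

# Steps 2-3: seasonal indexes and deseasonalized data
overall_avg = sum(demand) / len(demand)                 # 8,700/12 = 725
q_avg = [sum(demand[q::4]) / 3 for q in range(4)]       # average for each quarter
index = [qa / overall_avg for qa in q_avg]              # seasonal indexes
deseason = [d / index[i % 4] for i, d in enumerate(demand)]

# Step 5: least-squares trend line y = b0 + b1*x on quarters 1..12
x = list(range(1, 13))
n, x_bar, y_bar = len(x), sum(x) / 12, sum(deseason) / 12
b1 = (sum(xi * yi for xi, yi in zip(x, deseason)) - n * x_bar * y_bar) / \
     (sum(xi ** 2 for xi in x) - n * x_bar ** 2)        # about 16.87
b0 = y_bar - b1 * x_bar                                 # about 615.4

# Steps 6-7: deseasonalized forecasts for quarters 13-16, reseasonalized
for q in range(13, 17):
    print(q, round((b0 + b1 * q) * index[(q - 1) % 4]))  # about 675, 955, 1086, 724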

Associative Forecasting Techniques

• Simple Regression
• Multiple Regression Analysis

Regression analysis is a widely used forecasting technique and a critical
part of predictive analytics. In this text, we discussed the most widely
used regression methods, including simple, multiple, and nonlinear
regression. We also discussed regression methods involving qualitative
or indicator variables in Chapter 7.

Summary
This chapter discussed forecasting techniques. Forecasting is a critical part
of predictive analytics and involves predicting future business activities
including the sales, revenue, workforce requirements, demand, and in-
ventory, to name a few. Forecasts affect decisions and activities throughout
an organization. Produce-to-order and produce-to-stock companies
depend on forecasts for production and operations planning. Inventory
planning and decisions are affected by forecasts. Companies with good
forecasting in place are able to balance demand and supply, thereby

reducing inventory carrying cost. Following are some of the forecasting


methods discussed in this chapter:

Simple moving average


Weighted moving average
Exponential smoothing
Linear trend equation (similar to simple regression)
Double moving average or moving average with trend
Exponential smoothing with trend or Trend-adjusted exponential
smoothing
Forecasting data with seasonal pattern

The associative forecasting methods were discussed in the previous


chapter (Chapter 7). These models include: simple regression, multiple
regression analysis, nonlinear regression, and regression involving cat-
egorical or indicator variables. We presented detailed examples of the
above methods along with computer applications.
CHAPTER 9

Data Mining: Tools and Applications in Predictive Analytics

Chapter Highlights
• Introduction to Data Mining
• Data Mining Defined
• Some Application Areas of Data Mining
• Machine Learning and Data Mining
• Data Mining and Its Origins and the Areas It Interacts With
• Process of Data Mining and Knowledge Discovery in Databases (KDD)
• Data Mining Methodologies and Data Mining Tasks
  ◦ Data Preparation or Data Preprocessing
    ▪ Data cleaning
    ▪ Data integration
    ▪ Data selection
    ▪ Data transformation
  ◦ Data Mining
    ▪ Pattern evaluation
    ▪ Knowledge representation
• Data Mining Tasks
  ◦ Descriptive Data Mining
  ◦ Predictive Data Mining
• Difference between Descriptive and Predictive Data Mining
• Additional Tools and Applications of Predictive Analytics: Data Mining Tasks
  ◦ Anomaly (or outlier) detection
  ◦ Association learning
  ◦ Classification
  ◦ Clustering
  ◦ Sequence
  ◦ Time series and forecasting
• Difference between Classification and Clustering
• Data Mining and Machine Learning
• Machine Learning Problems and Tasks
• Supervised and Unsupervised Machine Learning
• Artificial Neural Networks
• Deep Learning
• Summary

Introduction to Data Mining


Data mining involves exploring new patterns and relationships from the
collected data—a part of predictive analytics that involves processing and
analyzing huge amounts of data to extract useful information and pat-
terns hidden in the data. The overall goal of data mining is knowledge
discovery from the data. Data mining techniques are used to (i) extract
previously unknown and potentially useful knowledge or patterns from
massive amounts of data collected and stored, (ii) explore and analyze
these large quantities of data to discover meaningful patterns, and (iii)
transform data into an understandable structure for further use. The field
of data mining is rapidly growing, and statistics plays a major role in it.
Data mining is also known as knowledge discovery in databases (KDD),
pattern analysis, information harvesting, business intelligence, analytics,
etc. Besides statistics, data mining uses artificial intelligence (AI), ma-
chine learning, database systems, advanced statistical tools, and pattern
recognition.
Successful companies use their data as an asset and use them for
competitive advantage. These companies use business analytics and data
mining tools as an organizational commitment to data-driven decision
making. Business data mining combined with machine learning and AI

techniques helps businesses in making informed business decisions. It is


also critical in automating and optimizing business processes.

Data Mining Defined

Data mining may be defined in the following ways:


• Data mining is knowledge discovery in databases (KDD).
• Data mining is the extraction of interesting (nontrivial, implicit,
previously unknown, and potentially useful) patterns or knowl-
edge from huge amounts of data (big data).
• Data mining can also be seen as the exploration and analysis by
automatic or semi-automatic means of large quantities of data (big
data) in order to discover meaningful patterns.

Why Data Mining?

In this age of technology, companies collect massive amounts of data


automatically using different means. A large quantity of data is also col-
lected using remote sensors and satellites. With the huge quantities of
data collected today—usually referred to as big data—traditional tech-
niques of data analysis are infeasible for processing the raw data. The
data in its raw form have no meaning unless processed and analyzed.
Among several tools and techniques available and currently emerging
with the advancement of technology and computers, it is now possi-
ble to analyze big data using data mining, machine learning, and AI
techniques.
The other reason to mine data is to discover hidden patterns and
relationships in the data. There is often hidden information in the data
that is not readily apparent, and it is usually difficult to discover using
traditional statistical tools. Sometimes it may take a significant amount of
time to discover useful information using traditional methods.
Data mining automatically processes massive amounts of data using
specially designed software. A number of techniques, for example, classi-
fication and clustering, are used to analyze huge quantities of data. These
provide useful information to the analysts and are critical in analyzing
business, financial, or scientific data.

Some Application Areas of Data Mining

Data mining is one of the major tools of predictive analytics. In business,


data mining is used to analyze business data. Business transaction data along
with other customer and product-related data are continuously stored in the
databases. The data mining software is used to analyze the vast amount of
customer data to reveal hidden patterns, trends, and other customer behav-
ior. Businesses use data mining to perform market analysis to identify and
develop new products, analyze their supply chain, find the root cause of
manufacturing problems, study the customer behavior for product promo-
tion, improve sales by understanding the needs and requirements of their cus-
tomer, prevent customer attrition, and acquire new customers. For example,
Wal-Mart collects and processes over 20 million point-of-sale transactions
every day. These data are stored in a centralized database and are analyzed
using data mining software to understand and determine customer behavior,
needs, and requirements. The data are analyzed to determine sales trends and
forecasts, develop marketing strategies, and predict customer-buying habits
[http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/].
The success with data mining and predictive modeling has encour-
aged many businesses to invest in data mining to achieve a competitive
advantage. Data mining has been successfully applied in several areas of
business and industry including customer service, banking, credit card
fraud detection, risk management, sales and advertising, sales forecast,
customer segmentation, and manufacturing.
Data mining is “the process of uncovering hidden trends and pat-
terns that lead to predictive modeling using a combination of explicit
knowledge base, sophisticated analytical skills and academic domain
knowledge” (Luan, Jing, 2002). Data mining has been used successfully
in science, engineering, business, and finance to extract previously un-
known patterns in the databases containing massive amounts of data and
to make predictions that are critical in decision making and improving
the overall system performance.
In recent years, data mining combined with machine learning/AI is
finding larger and wider applications in analyzing business data, thereby
predicting future business outcomes. The reason for this is the growing
interest in knowledge management and in moving from data to informa-
tion and finally to knowledge discovery.

Machine Learning and Data Mining


Machine learning and data mining are similar in some ways and often
overlap in applications. Machine learning is used for prediction, based
on known properties learned from the training data, whereas data mining
algorithms are used for discovery of (previously) unknown patterns. Data
mining is concerned with KDD.
Data mining and its origins and areas it interacts with: Data mining
has multiple objectives, including knowledge discovery from large data
sets and predicting future business outcomes. Recently, machine learning,
deep learning, and AI are finding more applications in data mining
techniques. Figure 9.1 shows the different areas with which data mining interacts.

Figure 9.1  Data mining, its origin, and areas of interaction

Process of Data Mining and Knowledge Discovery in Databases


One of the key objectives of data mining is knowledge discovery. A data
mining process involves multiple stages. KDD involves the activities lead-
ing up to actual data analysis and evaluation and deployment of the re-
sults. The activities in KDD are shown in Figure 9.2 and described below.

• Data Collection: The goal of this phase is to extract the data rel-
evant to data mining analysis. The data should be stored in a data-
base where data analysis will be applied.
Figure 9.2  The knowledge discovery in data mining (KDD) process

• Data Cleaning and Preprocessing: This phase of KDD involves data


cleansing and preparation of data to achieve the desired results.
The purpose is to remove the noise and irrelevant information
from the data set.
• Data Transformation: This phase is aimed at converting the data
into a form suitable for processing and obtaining valid results. An
example would be transforming a Boolean column type to integer.
• Data Mining: The purpose of data mining phase is to analyze the
data using appropriate algorithm to discover meaningful patterns
and rules to produce predictive models. This is the most important
phase of the KDD cycle.
• Interpretation and Evaluation of Results: This final phase involves
selecting the valid models for making useful decisions. This phase
also involves pattern evaluation. Not all of the patterns determined
from the previous data mining phase may be meaningful. It is im-
portant to select the valid and meaningful patterns.

The KDD process depicted in Figure 9.2 involves a number of pro-


cesses. The entire process can be divided into two broad categories, each
involving a number of steps. These are:

1. Data preparation or data preprocessing


2. Data mining

Data preparation or preprocessing has several steps including:


(a)  Data cleaning (b)  Data integration
(c) Data selection (d)  Data transformation

The above steps are necessary to prepare the data for further process-
ing. The steps provide clean or processed data so that data mining tasks
can be performed. The data mining tasks involve:

A) Data mining
B) Pattern evaluation
C) Knowledge representation

Figure 9.3 shows the data mining tasks in detail.


Figure 9.3  Data mining (KDD) process: data preprocessing and data mining tasks

Obtaining useful information from a large volume of data is a complex
process and requires several tasks. As stated above, it requires data
preprocessing—the task that must be performed before data mining
techniques can be applied to obtain useful information.
The data preprocessing or preparation involves several steps including
data cleaning, data integration, data selection, and data transformation.
Once the processed or cleaned data are available, data mining techniques
are applied for pattern evaluation and obtaining knowledge and useful
information. The data preprocessing steps are briefly discussed below fol-
lowed by data mining tasks.

Data Cleaning

Data cleaning is the process of preparing and making data ready for
further processing. The data collected are raw data and are usually un-
structured, incomplete, noisy, have missing values, and are inconsistent.
The data may also be missing attributes, for example, a huge number of
customer data of a financial company may miss attributes like age and
gender. Such data are incomplete with missing values. Data may also have
outliers or extreme values. There may be recording errors, for example, a
person’s age may be wrongly recorded as 350 years.
The data available in data sources might be lacking attribute values.
For example, we may have data that do not include attributes for the
gender or age of the customers. These data are, of course, incomplete.
Sometimes the data might contain errors or outliers. An example is
an age attribute with value 200. It is obvious that the age value is
wrong in this case. The data could also be inconsistent. For example,
the name of an employee might be stored differently in different data
tables or documents. Here, the data are inconsistent. If the data are
not clean and structured, the data mining results would be neither
reliable nor accurate.
Data cleaning involves a number of techniques including filling in the
missing values manually, combined computer and human inspection, etc.
The output of data cleaning process is adequately cleaned data ready for
further processing.

Data Integration

Data integration is the process where data from different data sources are
integrated into one. Data lie in different formats in different locations
and could be stored in databases, text files, spreadsheets, documents, data
cubes, the Internet, and so on. Data integration is a complex and tricky
task because data from different sources often do not match. For
example, suppose table A contains an entity, named customer-id, whereas
table B contains an entity named “number” instead of customer-id. In
such cases, it is difficult to ensure whether both these entities refer to the
same value. Metadata can be used effectively to reduce errors in the data
integration process. Another issue faced is data redundancy where the
same data may be available in different tables in the same database or are
available in different data sources. Data integration tries to reduce redun-
dancy to the maximum possible level without affecting the reliability of
data.

Data Selection

Data mining process uses large volumes of historical data for analysis.
Sometimes, the data repository with integrated data may contain much
more data than actually required. Before applying any data mining task or
algorithm, the data of interest need to be selected and separated from
the available stored data. Data selection is the process of retrieving
the relevant data for analysis from the database.

Data Transformation

Data transformation is the process of transforming and consolidating


data into different forms that are suitable for mining. Data transforma-
tion normally involves normalization, aggregation, generalization, etc.
For example, a data set available as “−7, 57, 200, 99, 68” can be trans-
formed as “−0.07, 0.57, 2.00, 0.99, 0.68.” This may be more desirable
for data mining. After transformation, the available data are ready for
data mining. Data transformation may involve smoothing, aggregation,
generalization, or normalization of data.
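The transformation in the example above simply divides every value by a constant factor (100). Min-max normalization, which maps values onto the interval [0, 1], is another common choice. A minimal sketch of both follows (our own illustration; neither function comes from the text):

# Sketch: two simple data transformations.

def rescale(values, factor=100):
    # Divide every value by a constant, as in the example above.
    return [v / factor for v in values]

def min_max(values):
    # Min-max normalization: map values onto the interval [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [-7, 57, 200, 99, 68]
print(rescale(data))    # [-0.07, 0.57, 2.0, 0.99, 0.68]
print(min_max(data))    # [0.0, 0.309..., 1.0, 0.512..., 0.362...]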

Data Mining

Data mining is the core process that uses a number of complex methods
to extract patterns from data. The purpose of the data mining phase is to
analyze the data using appropriate algorithms to discover meaningful
patterns and rules to produce predictive models. This is the most important
phase of the KDD cycle.
Data mining process includes a number of tasks such as association,
classification, prediction, clustering, time series analysis, machine learning,
and deep learning. Table 9.1 outlines the data mining tasks.

Data Mining Methodologies: Data Mining Tasks

Data mining tasks can be broadly classified into descriptive data mining
and predictive data mining.
There are a number of data mining tasks such as classification, pre-
diction, time series analysis, association, clustering, and summarization.
All these tasks are either predictive or descriptive data mining tasks.
Figure 9.4 shows a broad view of data mining tasks.

Difference between Descriptive and Predictive Data Mining

Descriptive data mining tasks make use of collected data and data min-
ing methodologies to look into the past behavior, relationships, and
patterns to understand and explain what exactly happened in the past.
Predictive analytics employs various predictive data mining and sta-
tistical models including regression, forecasting techniques, and other
predictive models including simulation, machine learning, and AI to
understand what could happen in the future and predict future busi-
ness outcomes.
Predictive data mining builds models from the available data to predict
future values of business outcomes. An operations manager using
simulation and queuing models to predict the future behavior of a call
center to improve its performance can be considered to be performing a
predictive data mining task. Descriptive data mining tasks use graphical,
visual, and numerical methods to find patterns describing the data to learn about the
Table 9.1  Key data mining tasks

Data Mining and Tasks

Brief Description: Data mining involves exploring new patterns and relationships from the collected data, a part of predictive analytics that involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the data. Data mining techniques are used to (i) extract previously unknown and potentially useful knowledge or patterns from massive amounts of data collected and stored, (ii) explore and analyze these large quantities of data to discover meaningful patterns, and (iii) transform data into an understandable structure for further use. The field of data mining is rapidly growing, and statistics plays a major role in it. Data mining is also known as KDD, pattern analysis, information harvesting, business intelligence, analytics, etc. Besides statistics, data mining uses AI, machine learning, database systems, advanced statistical tools, and pattern recognition. In this age of technology, companies collect massive amounts of data automatically using different means. A large quantity of data is also collected using remote sensors and satellites. With the huge quantities of data collected today (usually referred to as big data), traditional techniques of data analysis are infeasible for processing the raw data. The data in their raw form have no meaning unless processed and analyzed. Among several tools and techniques available and currently emerging with the advancement of technology and computers, it is now possible to analyze big data using data mining, machine learning, and AI techniques.

Application Areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data along with other customer and product-related data are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and other customer behavior. Businesses use data mining to perform market analysis to identify and develop new products, analyze their supply chain, find the root cause of manufacturing problems, study customer behavior for product promotion, improve sales by understanding the needs and requirements of their customers, prevent customer attrition, and acquire new customers. For example, Wal-Mart collects and processes over 20 million point-of-sale transactions every day. These data are stored in a centralized database and are analyzed using data mining software to understand and determine customer behavior, needs, and requirements. The data are analyzed to determine sales trends and forecasts, develop marketing strategies, and predict customer-buying habits [http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/]. The success with data mining and predictive modeling has encouraged many businesses to invest in data mining to achieve a competitive advantage. Data mining has been successfully applied in several areas of business and industry including customer service, banking, credit card fraud detection, risk management, sales and advertising, sales forecasting, customer segmentation, and manufacturing. Data mining is "the process of uncovering hidden trends and patterns that lead to predictive modeling using a combination of explicit knowledge base, sophisticated analytical skills and academic domain knowledge" (Luan, Jing, 2002). Data mining has been used successfully in science, engineering, business, and finance to extract previously unknown patterns from databases containing massive amounts of data and to make predictions that are critical in decision making and improving overall system performance. In recent years, data mining combined with machine learning/AI is finding larger and wider applications in analyzing business data, thereby predicting future business outcomes. The reason for this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.

Figure 9.4  Data mining tasks

new information from the available data set that is not apparent other-
wise. Businesses use a number of data visualization techniques including
dashboards, heat maps, and a number of other graphical tools to study
the current behavior of their businesses. These visual tools are simple but
rather powerful tools in studying the current business behaviors and are
used in building predictive analytics models.

Additional Tools and Applications of Predictive Analytics: Data Mining Tasks

Data mining is a software-based predictive analytics tool used to gain


insights from massive amounts of data by extracting hidden patterns and
relationships and using them to predict future business behaviors or out-
comes. Data mining uses a number of methodologies including anomalies
(or outlier) detection, patterns, association learning, classification, clustering,
sequence, and forecasting to predict the probabilities and future business
outcomes. We briefly describe them here. Figure 9.5 shows the broad
categories of data mining methodologies.
Association learning is used to identify the items that may co-occur
and the possible reasons for their co-occurrence. Classification and clus-
tering techniques are used for association learning.
Figure 9.5  Data mining methodologies

Anomaly detection, also known as outlier detection, is used to identify
specific events or items that do not conform to the usual or expected
pattern in the data. A typical example is the detection of bank fraud.
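A simple statistical version of anomaly detection flags values that lie far from the mean. The sketch below is our own minimal illustration (hypothetical transaction amounts, not a production fraud detector): it flags values whose z-score exceeds a threshold.

# Sketch: z-score anomaly (outlier) detection. Flags values more than
# 'threshold' standard deviations away from the mean.

def detect_anomalies(values, threshold=3.0):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

# Hypothetical transaction amounts with one suspicious value
amounts = [42, 55, 38, 61, 47, 52, 49, 950, 44, 58, 50, 46, 53, 41]
print(detect_anomalies(amounts))    # [950]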
Classification and clustering algorithms are used to divide the data
into categories or classes. The purpose is to predict the probabilities of
future outcomes based on the classification. Clustering and classifica-
tion both divide the data into classes and, therefore, seem to be similar,
but they are two different techniques. They are learning techniques used
widely to obtain reliable information from a collection of raw data. Clas-
sification and clustering are widely used in data mining.

Classification

Classification is a process of assigning items to prespecified classes or categories.


For example, a financial institution may study the potential borrowers to pre-
dict whether a group of new borrowers may be classified as having a high de-
gree of risk. Spam filtering is another example of classification, where the inputs
are e-mail messages that are classified as "spam" or "no spam."
Classification uses the algorithms to categorize the new data according
to the observations of the training set. Classification is a supervised learning
technique where a training set is used to find similarities in classes. This
means that the input data are divided into two or more classes or catego-
ries and the learner creates a model that assigns inputs to one or more of
these classes. This is typically done in a supervised way. The objects are
classified on the basis of the training set of data.
The algorithm that implements the classification is known as the
classifier. Some of the most commonly used classification algorithms are
K-Nearest Neighbor algorithm and decision tree algorithms. These are
widely used in data mining. An example of classification would be credit
card processing: a credit card company may want to segment its customer
database based on similar buying patterns.
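The K-Nearest Neighbor idea can be shown in a few lines. The sketch below is our own minimal illustration with made-up training data (libraries such as scikit-learn provide production implementations): each new borrower is assigned the majority class among its k closest training points.

# Sketch: K-Nearest Neighbor classification (k = 3).
# Training set: (income in $1000s, debt ratio) -> risk class.

from collections import Counter

train = [((25, 0.9), "high-risk"), ((30, 0.8), "high-risk"),
         ((28, 0.7), "high-risk"), ((80, 0.2), "low-risk"),
         ((95, 0.3), "low-risk"), ((70, 0.1), "low-risk")]

def classify(point, k=3):
    # Sort training items by squared distance to the new point
    nearest = sorted(train, key=lambda item:
                     sum((a - b) ** 2 for a, b in zip(point, item[0])))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]       # majority class among the k neighbors

print(classify((32, 0.75)))                 # "high-risk"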

Clustering

Clustering technique is used to find natural groupings or clusters in a set


of data without prespecifying a set of categories. It is unlike classification
where the objects are classified based on prespecified classes or categories.


Thus, clustering is an unsupervised learning technique where a training set
is not used. It uses statistical tools and concepts to create clusters with
similar features within the data. Some examples of clustering are:

• Cluster houses in a town into neighborhoods based on similar
features, such as houses with an overall value of over a million dollars.
• Marketing analyst may define distinct groups in their customer
bases to develop targeted marketing programs.
• City planning may be interested in identifying groups of houses
according to their house value, type, and location.
• In cellular manufacturing, the clustering algorithms are used to
form clusters of similar machines and processes to form machine
component cells.
• Scientists and geologists may study earthquake epicenters to iden-
tify clusters of fault lines with high probability of possible earth-
quake occurrences.
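Clustering can be illustrated with the widely used k-means algorithm. The sketch below is our own minimal illustration (made-up house values in thousands of dollars): no class labels are supplied, which is what makes the task unsupervised; the algorithm alternates between assigning values to the nearest center and recomputing the centers.

# Sketch: one-dimensional k-means clustering of house values ($1000s).

def k_means(values, k=2, iterations=20):
    centers = values[:k]                     # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:                     # assign each value to the nearest center
            i = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            clusters[i].append(v)
        # Recompute centers; keep the old center if a cluster is empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

houses = [210, 240, 195, 1250, 1400, 225, 1100, 980, 205]
print(k_means(houses))   # moderately priced vs. million-dollar-range clusters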

Cluster Analysis

Cluster analysis is the assignment of a set of observations into subsets


(called clusters) so that observations within the same cluster are similar
according to some prespecified criterion or criteria, while observations
drawn from different clusters are dissimilar. Clustering techniques differ
in application and make different assumptions on the structure of the
data. In clustering, the clusters are commonly defined by some similarity
metric or similarity coefficient and may be evaluated by internal compactness
(similarity between members of the same cluster) and separation between
different clusters. Other clustering methods are based on estimated density
and graph connectivity. It is important to note that clustering is unsupervised
learning and a commonly used method in statistical data analysis.

Difference between Clustering and Classification

Clustering is an unsupervised learning technique used to find groups


or clusters of similar instances on the basis of features. The purpose of
clustering is to group similar objects in order to determine whether
there is any relationship between them. Classification is a supervised
there is any relationship between them. Classification is a supervised
learning technique used to find similarities in classification based on a
training set. It uses algorithms to categorize the new data according to the
observations in the training set. Figure 9.6 distinguishes between super-
vised and unsupervised learning techniques. Table 9.2 outlines the differ-
ences between classification and clustering.

Other Applications of Data Mining

Prediction

Predictive modeling is about predicting future business outcomes. Pre-


diction techniques use a number of models based on the available data.
These models can be simple to complex and may include simple and
other regression models, simulation techniques, analysis of variance
and design of experiments, and many others. For example, businesses
are often interested in creating models to predict the success of their
new products, or predict the sales and revenue from the advertisement
expenditures. A number of examples and cases were cited in the previ-
ous chapters. When it is difficult to create a single model involving a
number of variables, companies create simulation models to study and
predict future business outcomes. Simulation models have been used
successfully in studying the behavior of call centers, hospital emergency
services, fast-food drive-throughs, and many others. Health care systems
have used prediction techniques in the diagnosis and prediction
of patient health. Prediction techniques also have applications in fraud
detection.

Time Series Analysis

Time series analysis involves data collected over time. A time series is a
sequence of historical observations over time; time series analysis studies
past performance to forecast future events, where the next event is
influenced by one or more of the preceding events.
A number of models are used to analyze time series data. The forecast-
ing chapter in this book discussed a number of time series patterns and
Figure 9.6  Supervised and Unsupervised Learning Techniques

Table 9.2  Difference between classification and clustering

Classification: Classification is a supervised learning technique where items are assigned to prespecified classes or categories. For example, a bank may study potential borrowers to predict whether a group of new borrowers may be classified as having a high degree of risk.
Clustering: Clustering is an unsupervised technique used to find natural groupings or clusters in a set of data without prespecifying a set of categories. It is unlike classification, where the objects are classified based on prespecified classes or categories.

Classification: Classification algorithms require training data.
Clustering: Clustering does not require training data.

Classification: Classification uses predefined instances.
Clustering: Clustering does not assign a predefined label to each and every group.

Classification: In classification, the groups (or classes) are prespecified, with each training data instance belonging to a particular class.
Clustering: In clustering, the groups (or clusters) are based on the similarities of data instances to each other.

Classification: Classification algorithms are supposed to learn the association between the features of an instance and the class it belongs to.
Clustering: Unlike in classification, the groups are not known beforehand, making this an unsupervised task.

Classification example: An insurance company trying to assign customers into high-risk and low-risk categories.
Clustering example: Dropbox, a movie rental company, may recommend a movie to customers because others who made similar movie choices in the past have favorably rated that movie.

their analysis, along with several forecasting techniques. These
forecasting techniques range from simple to complex.
The future product, process, and workforce requirement planning start
with forecasts. Forecasting is a critical component of produce-to-stock
firms and impacts future sales, revenue, demand, inventory, and work-
force requirements. Time series analysis and forecasting techniques look
into the data collected over time to extract useful patterns, trends, rules,
and statistical models. The chapter on forecasting in this book outlines
a number of time series analysis and forecasting models. Stock market
prediction is an important application of time series analysis. These mod-
els have a number of applications in businesses including the stock mar-
ket and predicting power need requirements during peak summer hours
when the electricity requirement is highly variable and fluctuates rapidly
in a short span of time.

Summarization

Data summarization is the process of reducing the vast amount of data


into smaller sets that can provide a compact description of the data. Sev-
eral techniques are used to summarize the data that help to facilitate the
analysis of large amounts of data. One of the major objectives of data
mining is the knowledge discovery from the vast amount of data (KDD).
Data summarization techniques are applied to both structured and un-
structured data. These techniques expedite the KDD tasks by reducing
the size of the data. In this digital age, the collection and transfer of data
is a fast process. Businesses now work with big data collected from various
sources including media, networks, cloud storage, etc. The data may be
collected from audio and video sources and are both structured and un-
structured. Because of the volume and nature of data, it becomes necessary
to summarize the data for further analysis. The summarization techniques
result in smaller data sets that facilitate further analysis. A number of tools
and techniques are discussed in the literature for data summarization.

Data Mining and Machine Learning

A number of data mining and machine learning methods overlap in ap-


plications. As indicated earlier, data mining is concerned with knowledge
discovery from the data (KDD) where the key task is the discovery of
previously unknown knowledge. Machine learning, on the other hand, is
evaluated with respect to known knowledge. According to Arthur Sam-
uel, machine learning gives “computers the ability to learn without being
explicitly programmed” [2, 3].
Machine learning methods use complex models and algorithms that
are used to make predictions. The machine learning models allow the
analysts to make predictions by learning from the trends, patterns, and
relationships in the historical data. The algorithms are designed to learn
iteratively from data without being programmed. In a way, machine
learning automates model building.
Recently, machine learning algorithms are finding extensive applica-
tions in data-driven predictions and are a major decision-making tool.
Some applications where machine learning has been used are e-mail fil-
tering, cyber security, signal processing, and fraud detection. Machine

learning is employed in a range of computing tasks. Although machine
learning models are being used in a number of applications, they have
limitations in designing and programming explicit algorithms that are
reproducible and repeatable with good performance. With current
research and the use of newer technology, the fields of machine learning
and AI are becoming more promising.
It is important to note that data mining uses unsupervised methods
that usually outperform the supervised methods used in machine learn-
ing. Data mining is the application of knowledge discovery from the data
(KDD) where supervised methods cannot be used due to the unavail-
ability of training data. Machine learning may also employ data min-
ing methods as “unsupervised learning” to improve learner accuracy. The
performance of the machine learning algorithms depends on its ability to
reproduce known knowledge.

Machine Learning Problems and Tasks

Machine learning tasks fall into the following broad categories: supervised
learning, unsupervised learning, and reinforcement learning. The
difference between supervised and unsupervised learning is explained in
Table 9.3.

Other Applications of Machine Learning

Another application of machine learning is in the area of deep learning,
which is based on artificial neural networks. In this application, the
learning models may contain more than one hidden layer; a network with
a single hidden layer is known as shallow learning.

Artificial Neural Networks

An artificial neural network learning algorithm, usually called “neural


network,” is a learning algorithm that is inspired by the structure and
functional aspects of biological neural networks. Computations are
structured in terms of an interconnected group of artificial neurons,
processing information using a connectionist approach to computation.

Table 9.3  Difference between supervised and unsupervised learning

Supervised Learning: Supervised learning uses a set of input variables (x1, x2, ..., xn) and an output variable, y(x). An algorithm of the form y = f(x) is used to learn the mapping function relating the input to the output. This mapping function, or the model relating the input and output variables, is used to predict the output variable. The goal is to obtain a mapping function so accurate that it can be used even with a new set of data; that is, the model can predict the output variable as new data become available. The name supervised learning means that the algorithm is trained to learn from a training data set, where the learning process is supervised. In the supervised learning process, the expected output or answer is known. The algorithm is designed to make predictions iteratively from the training data and is corrected by the analyst as needed. The learning process stops when the algorithm provides the desired level of performance and accuracy. The most commonly used supervised problems are regression and classification problems. We discussed regression problems earlier. Time series prediction problems, random forests for classification and regression problems, and support vector machines for classification problems also fall into this category.

Unsupervised Learning: Unsupervised learning uses a set of input variables but no output variable. No labels are given to the learning algorithm. The algorithm is expected to find the structure in its input. The goals of unsupervised learning may be finding hidden patterns in large data sets or feature learning. Thus, unsupervised learning can be a goal in itself or a means toward an end that is not based on a general rule of teaching and training the algorithms. Unlike supervised learning algorithms, unsupervised algorithms are designed to devise and find the interesting structure in the data. The most commonly used unsupervised learning problems are clustering and association problems. In clustering, a set of inputs is to be divided into groups. Unlike classification, the groups are not known beforehand, making this typically an unsupervised task. Association problems are used to discover rules that describe associations, such as "people that buy X also tend to buy Y."

Modern  neural networks are nonlinear statistical data modeling tools.


They are usually used to model complex relationships between inputs and
outputs, to find patterns in data, or to capture the statistical structure in
an unknown joint probability distribution between observed variables.

Deep Learning

Falling hardware prices and the development of graphics processing units


for personal use in the last few years have contributed to the development
of the concept of deep learning, which consists of multiple hidden layers
in an artificial neural network. This approach tries to model the way the
human brain processes light and sound into vision and hearing. Some
successful applications of deep learning are computer vision and speech
recognition.

Summary
This chapter introduced and provided an overview of the field of data
mining. Today, vast amounts of data are collected by businesses. Data
mining is an essential tool for extracting knowledge from massive
amounts of data. The tools of data mining are used in extracting knowl-
edge from the data—the process is known as KDD. The extracted in-
formation and knowledge are used in different models to predict future
business outcomes. Besides the process of data mining and KDD, the
chapter explained a number of data mining methodologies and tasks. We
outlined and discussed several areas where data mining finds application.
The essential tasks of data mining including data preparation or data pre-
processing, knowledge representation, pattern evaluation, and descriptive
and predictive data mining were discussed. The two broad areas of data
mining are descriptive and predictive data mining. We discussed both of
these areas and outlined the tools in each case with their objectives.
Data mining techniques are also classified as supervised and unsu-
pervised learning. We discussed the tasks of data mining that fall under
supervised and unsupervised learning. The key methodologies of data
mining including anomalies (or outlier) detection, association learning,
classification, clustering, sequence, prediction, and time series and fore-
casting along with their objectives were discussed. We also introduced
the current and growing application areas of data mining. Data mining
has wide applications in machine learning. The chapter introduced the
relationship between data mining and machine learning. Different types
of machine learning problems and tasks—supervised and unsupervised
machine learning, applications of data mining in using artificial neural
networks, and deep learning—were introduced.
CHAPTER 10

Wrap-Up, Overview, Notes on Implementation, and Current State of Business Analytics

Overview
This book provided an overview of the field of business analytics (BA).
BA uses a set of methodologies to extract, explore, and analyze big data. It
is about extracting information and making decisions from big data. BA
is a data-driven decision-making process.
The field of BA can be broken down into two broad areas: (1) business
intelligence (BI) and (2) statistical analysis. The flow diagram in
Figure 10.1 outlines the broad area of analytics.
Chapters 1, 2, and 3 provided an explanation on BI and BA. This
book mainly focuses on predictive analytics involving predictive analytics
models. Several chapters in the book are devoted to these models.

Broad Areas of Business Analytics

The broad area of BA can be broken down into: (1) BI and (2) statistical
analysis.

Business Intelligence
BA comes under the broad umbrella of BI discussed in Chapter 3. BI
has evolved from business data reporting that involves examining histori-
cal data to gain an insight into the performance of a company over time.
Figure 10.1  Broad area of analytics

It involves a number of reporting tools, applications, and methodologies


that are used to collect a company’s data (both from internal and external
sources) for further analysis, develop queries to get useful information, and
create dashboards to aid in data visualization. All this information is used
by company executives for data-driven decisions. Figure 10.2 shows the
functions of BI and different forms of analytics as applied to different areas.
Howard Dresner is credited with first proposing the term BI in 1989,
which evolved as the application of data analysis techniques to support
business decision making. The tools of BI come from earlier decision sup-
port systems (DSSs). A DSS is a collection of applications, algorithms, and
computer programs or software designed to analyze and solve specific
problems. They use an iterative process that can automate problem solving.
These computer-based models are important decision-making tools
and can provide different decision alternatives to aid in the decision-making process.

Statistical Analysis
The field of analytics is about driving business decisions using data.
Therefore, statistical analysis is at the core of BA. A number of statistical
techniques and models—from descriptive and data visualization tools to
analytics models—are applied for drawing meaningful conclusions from
the data. Statistical analysis involves performing data analysis and creating
statistical models and can be broken down into the following categories:

I. Collect and describe the type of the data to be analyzed.
II. Explore the relation of the data to the underlying population.
III. Establish and understand how the collected sample data will be
used to draw conclusions about the population.
IV. Perform data analysis and create descriptive and predictive analytics
models that can be used to predict future outcomes.
V. Validate the models.

In its basic form, statistical analysis comprises descriptive statistics and


inferential statistics.
Descriptive statistics is the process of describing data using charts
and graphs (simple to more advanced). Since companies collect massive
Figure 10.2  Functions of BI and analytics in different areas

amounts of data, the conventional methods of graphical techniques are


replaced by software specially designed for big data. Big data software,
such as Tableau, is capable of handling massive amounts of data and creat-
ing visuals and dashboards that can display the multiple views of business
data in one graph.
Inferential statistics is another form of statistics. It is the process
of drawing conclusions or making inferences about a population using
sample data. A number of inferential statistics tools are used in analytics.
Estimation theory, confidence intervals, hypothesis testing, and analy-
sis of variance are a few of the many inferential statistics tools used in
BA. Besides the tools of descriptive and inferential statistics, analytics re-
quires an understanding of probability theory and sampling and sampling
distributions.
Within the framework of BA and BI, statistical analysis involves inferential
statistics, which uses sample data to draw conclusions about the population.
In statistical analysis, a sample is part of a population, and sampling is a
systematic way of selecting a few items from the population. The population
constitutes the entire set of data or measurements possible. It is often not
possible to study the entire population, so much of statistical analysis
depends on drawing samples from the population. Statistics and statistical
analysis, in general, allow us to study the variation in data and
to draw inferences (conclusions) about a population using sample data.
Statistics is the science and mathematics of variation. Statistics is about
making decisions from the data.
Statistical analysis also involves identifying trends and patterns in the
data to create models that can be used in predictive analytics. These predic-
tive analytics models are used for predicting future business outcomes.
The other component of statistical analysis is data analytics. Statistical
analysis and data analytics take somewhat similar approaches, except that
data analytics goes beyond statistical analysis to include more elaborate
and extensive applications. Data analytics is explained below.

Data Analytics
Data analytics is the process of exploring and investigating a company’s
data to find patterns and relationships in data and applying specialized

software. Data analytics makes use of statistical techniques including pre-


dictive modeling algorithms to predict business outcomes. These techniques
involve testing hypotheses to determine whether the data
are consistent with a stated hypothesis. Data analytics can
be thought of as exploratory data analysis (EDA) and confirmatory data
analysis that involve the process of data investigation and drawing valid
conclusion(s). The term EDA was coined by statistician John W. Tukey in
his 1977 book Exploratory Data Analysis.
The terms data analytics, BA, and BI are used interchangeably as the
tools used in all these overlap. In a broad sense, data analytics is another
approach of exploring and analyzing data using BI, reporting, and online
analytical processing (OLAP) as well as BA and advanced analytics
tools. Thus, data analytics is also used as an umbrella term that refers to
analyzing data and big data with a much broader scope. In the analytics
literature, the distinction between data analytics, BI, BA, and advanced
analytics is not clear. In some cases, data analytics specifically means data
analysis using BA and advanced analytics tools while BI is treated as a
separate category. In the earlier chapters of this book, we have tried to
explain the tools and applications of each of these terms and tried to out-
line the differences and similarities among the three—data analytics, BA,
and BI. It is important to note that the overall objective of all these tools
is to be able to manage, understand, and analyze the massive amounts of
data (referred to as big data) using the tools and technologies in making
fact-based data-driven business decisions. The tools in all these areas are
critical in visualizing data, extracting business trends, studying the cur-
rent trends, predicting the future business outcomes, optimizing business
processes, and using the resources in the most effective way.
All the data analytics and BA initiatives can help businesses increase
revenues and profitability, improve operational efficiency, optimize mar-
keting campaigns and respond quickly to changing market trends, im-
prove customer service efforts, address the needs and requirements of
customers, gain a competitive edge over the competition, and increase
market share. The ultimate goal is to improve business performance by
making data-driven decisions. With the advancement in technology, new
software and computer applications are being devised to collect and ana-
lyze real-time business data, thus enabling real-time analytics.

Types of Data Analytics Applications


Data analytics applications can be categorized into quantitative data
analysis and qualitative data analysis. As the name suggests, quantitative
analysis involves numerical data that are quantifiable
and can be compared statistically. Qualitative or categorical data
analysis is more interpretive—it focuses on understanding non-numerical
data and may involve text, audio and video, images, and interpreting
common phrases. A number of applications including text mining and
text analytics are now in use to analyze and interpret qualitative data.
Another widely used application of data analytics is BI reporting and
OLAP, which provide business executives, corporate workers, and
stakeholders with data on key performance indicators, business operations,
and customers, all of which are valuable for decision making and
business process management.
One of the major applications of data analytics is the use of data que-
ries to generate reports by BI developers. More recent application is the
use of self-service BI tools that enable managers, business analysts, and
operations managers to run their own ad hoc queries and build reports
themselves.

Business Analytics Models

BA models are divided into:


• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
• Advanced analytics models
These models are discussed in detail in Chapters 1 and 2. Figure 10.3
outlines the tools and models of BA.
The different types of analytics are briefly explained here.

I. A number of graphical and visual techniques, big data applications,


and dashboards are used to visualize a company’s data to learn
about the current state of business. This phase is the data visualiza-
tion part of BA and is known as descriptive analytics.
Figure 10.3  Descriptive, predictive, and prescriptive analytics models

II. Information from data visualization is used to model and predict
future business outcomes by applying a number of prediction techniques,
including regression and modeling, time series analysis and
forecasting, and data mining techniques that extract useful information
from huge databases (a process known as knowledge discovery in
databases). More recently, machine learning (ML) and artificial
intelligence (AI) techniques have become a big part of
analytics. This part of analytics is known as predictive analytics.

Most of the descriptive and predictive analytics techniques discussed


above use statistical analysis and statistical methods. The predictive ana-
lytics techniques mostly use statistical models and algorithms to predict
future business trends. These statistical techniques include regression,
time series analysis and forecasting, data mining, ML and AI techniques,
and also advanced analytics techniques like cluster and classification al-
gorithms in different applications such as marketing analytics. Specific
chapters in the book are devoted to these models.
Figures 10.4 and 10.5 summarize models of predictive analytics. An
important class of predictive analytics models now includes ML, neural
networks, deep learning, and AI. These are shown in Figure 10.5.

Figure 10.4  Predictive analytics models


Figure 10.5  Predictive analytics models including ML, neural networks, deep learning, and AI

The applications of predictive modeling as outlined in Figure 10.5


use DSSs that include expert systems, AI, ML, and deep learning. These are
briefly described here.

Artificial Intelligence, Machine Learning, and Deep Learning
AI can be described as the theory and development of computer systems
able to perform tasks normally requiring human intelligence, for exam-
ple, visual perception, speech recognition, language processing, decision
making, and translation between languages. First coined in 1956 by John
McCarthy, AI involves machines that can perform tasks that are char-
acteristic of human intelligence.
AI systems are classified as (i) weak (or narrow) AI and (ii) strong AI.
Weak AI systems are designed to perform a narrow task; such machines
are not truly intelligent on their own but work from the information
and rules fed into them. Apple's Siri is an example. Google's AI system
is also an example of narrow AI, one that is stronger than Siri. Another
example in this category is a poker-playing machine that can beat
humans based on the rules and moves fed into it. Strong AI systems are
machines that can actually think and perform tasks on their own, like
human beings. No existing application can be described as true strong
AI, but this is an active area of research in which systems are evolving
rapidly.

Machine Learning
AI and ML are sometimes used synonymously, but there is a difference
between the two. ML is simply a way of achieving AI.
AI can be achieved without using ML, but the AI system would then
require a specific program with millions of lines of code encoding complex
rules and decision trees. Alternatively, ML algorithms can be developed.
These are a way of "training" an algorithm so that it can learn how to
perform a task. The "training" requires feeding huge amounts of data
to the algorithm and allowing it to adjust, learn, and improve.
cations of ML is in the area of computer vision—the ability of a machine
to recognize an object in an image or video.

Deep Learning
Deep learning is a class of ML algorithms and one of many approaches
to ML. Most deep learning models are based on an artificial neural
network and are inspired by the structure and function of the brain
or neurons in the brain. The deep learning applications are commonly
referred to as artificial neural networks (ANNs). The term deep refers
to the number of layers through which the data are transformed. The
reported applications of deep learning include computer vision, speech
recognition, natural language processing, social network filtering, bioin-
formatics, drug design, medical image processing, material inspection,
and more. The research in this area is promising, and the results pro-
duced in different applications are comparable to and, in some cases,
superior to human experts.9

Background and Prerequisites to Predictive Analytics


The application and implementation of predictive analytics models re-
quire the necessary background and understanding of several of the sta-
tistical concepts. These include probability and probability distributions,
estimation theory and confidence intervals, sampling theory, hypothesis
testing, and correlation analysis. These are shown in Figure 10.6 and are
explained in the Appendix. The readers may refer to Appendix A through
D for the explanation of these topics.

Optimization Models for Business Analytics: Prescriptive Analytics

BA also involves using a number of optimization, simulation, operations


management, and business process optimization techniques to optimize
business performances. This is the prescriptive analytics phase of BA. The
models under prescriptive analytics are also known as advanced analyt-
ics models. Figure 10.7 shows the most common optimization and other
models. This text mainly focuses on predictive analytics; therefore, pre-
scriptive models are not discussed in detail.
In this chapter, we have provided an overview. Separate chapters in
the book discuss these concepts in detail. The first three chapters of the
book are devoted to the basic concepts and models used in analytics,
BA, and BI. In Chapter 4, we discussed the descriptive analytics along
with applications and a case. We explained the objectives of descriptive
analytics and how it leads to predictive analytics. Chapter 5 draws a dis-
tinction between descriptive and predictive analytics and the background
and prerequisites needed to apply predictive analytics models. These in-
clude probability and probability distributions, sampling and sampling
distribution, tools of inferential statistics—estimation and confidence in-
tervals, hypothesis testing, and correlation analysis. The applications, ex-
amples, and importance of all these concepts are presented in this chapter.
A detailed treatment of all these concepts is provided in Appendixes A–D.
The appendixes are available for a free download to the readers.
Chapter 6 provides an overview of the most widely used predictive
analytics models. Each model is discussed along with their purpose, tools,
and applications. The brief explanation in this chapter of each model pro-
vides the reader the importance and purpose behind each model to follow.
Figure 10.6  Background and prerequisites to predictive analytics

Figure 10.7  Prescriptive analytics models



In Chapters 7 and 8, we discussed the most widely used predictive ana-


lytics model in detail. These include regression analysis and different regres-
sion models with applications and examples. Chapter 8 provides another
class of predictive models—time series analysis and forecasting. Chapters 7
and 8 contain a number of different models with examples. These are very
widely used models in predictive analytics. Chapter 9 provides an introduc-
tion to data mining—an important part of data analytics, data analysis,
and predictive modeling. The purpose and importance of data mining in
predictive modeling are explained. We also introduce the recent applica-
tions of ML, AI, deep learning, and neural networks models. These are
now an integral part of predictive analytics and are finding applications in a
number of areas. Finally, prescriptive analytics is introduced along with
the models it uses and its purpose in the overall BA.

Future of Data Analytics and Business Analytics


Job Outlook
• Demand for skilled data scientists continues to be sky-high, with
IBM recently predicting that there will be a 28% increase in the
number of data scientists employed in the next 2 years.
• According to the U.S. Bureau of Labor Statistics (BLS) data, em-
ployment of management analysts—including business analysts—
is expected to grow 14 percent from 2014 to 2024, which is much
faster than the average for all occupations.
• The BLS reports for May 2016 showed that the average annual
income for all management analysts, including business analysts,
was $91,910. The middle 50 percent earned between $60,950 and
$109,170. Salaries for the lowest 10 percent were around $46,560,
while the highest 10 percent brought in upward of $149,720.

Here are some other facts:

• The amount of data doubles every 3 years as various digital sources


continue to make information available (Source: McKinsey &
Company).

• A significant shortage of managers and analysts who can effectively


use big data analytics and analytical concepts to make decisions is
predicted for 2018 (Source: McKinsey & Company).
• Three-quarters of companies are missing the skills and technology
to make the best use of the data they collect (Source: PWC).
• Businesses in all industries are beginning to capitalize on the vast
increase in data and the new big data technologies becoming avail-
able for analyzing and gaining value from it. This makes it a great
prospect for anyone looking for a well-paid career in an exciting
and cutting-edge field.
• Analytics, ML, AI, and expert systems applications and research
are not just for those following a traditional academic path. A
number of companies, including Google, Amazon, IBM, and others,
are heavily invested in big data analytics, AI, ML, and deep learning
research and applications.

Certification and Online Courses in Business Analytics


There are also a large number of free online courses and tutorials which
a motivated individual could use as a springboard into a rewarding and
lucrative career (please follow the link below).

https://www.forbes.com/sites/bernardmarr/2017/06/06/the-9-best-
free-online-big-data-and-data-science-courses/#6403190343cd
Foundations in Business Analytics — University of Maryland
Business Analytics Certificate — Cornell University
Master Certificate in Business Analytics — Michigan State University
The above are listed as the Best Online Business Analytics Certificates &
Courses [Updated 2018].

Summary
In this chapter, we provided an overview of the field of analytics. The
broad area of analytics can be divided into two broad categories: BI and
statistical analysis.

BI evolved as the application of data analysis techniques to support


business decision-making processes. The tools of BI come from ear-
lier  DSSs. These computer-based models are important decision-making
tools and can provide different alternatives to aid in the decision-making
process. The second broad area of analytics is statistical analysis. The field
of analytics is about driving business decisions using data. Therefore, sta-
tistical analysis is at the core of BA. A number of statistical techniques
and models—from descriptive and data visualization tools to analytics
models—are applied for drawing meaningful conclusions from the data.
Statistical analysis involves performing data analysis and creating sta-
tistical models and can be broken down into (i) data analytics, (ii) BA,
and (iii) advanced analytics.
Data analytics is the process of exploring and investigating a compa-
ny’s data to find patterns and relationships in data and applying specialized
software to learn the current state of business through data visualization,
predict future business outcomes, and optimize business processes. Thus,
data analytics is exploring and analyzing data using BI, reporting, and
OLAP as well as BA and advanced analytics tools. Data analytics is also
used as an umbrella term that refers to analyzing data and big data with
a much broader scope.
The BA area is divided into: (i) descriptive analytics, (ii) predictive
analytics, (iii) prescriptive analytics, and (iv) advanced analytics models.
These involve a number of models and tools that were briefly described
throughout the book.
Most of the descriptive and predictive analytics techniques use statis-
tical analysis and statistical methods. The focus of this book is predictive
analytics. Predictive analytics techniques use mostly statistical models and
algorithms to predict future business trends. These statistical techniques
include regression, time series analysis and forecasting, data mining, ML,
and AI techniques and also advanced analytics techniques like cluster and
classification algorithms in different applications such as marketing ana-
lytics. Specific chapters in the book are devoted to these models. These are
described in different chapters of the book.
BA initiatives can help businesses increase revenues and profitabil-
ity, improve operational efficiency, optimize marketing campaigns and

respond quickly to changing market trends, improve customer service ef-


forts, address the needs and requirements of customers, gain a competitive
edge over the competition, and increase market share. The ultimate goal
is to improve the business performance by making data-driven decisions.
Finally, we explored the future and job outlook of this emerging field
of BA. This is one of the fastest growing areas that is predicted to have
immense job growth and opportunities. According to the BLS, employ-
ment of management analysts—including business analysts—is expected
to grow 14 percent from 2014 to 2024, which is much faster than the
average for all occupations. A number of universities and agencies are
now offering courses, graduate degrees, and certifications in BA. The op-
portunities were listed in this chapter.
APPENDICES

Background and Prerequisites for Predictive Analytics

APPENDIX A: Probability Concepts: Role of Probability and Probability Distributions in Decision Making

APPENDIX B: Sampling, Sampling Distribution, and Inference Procedure

APPENDIX C: Review of Estimation, Confidence Intervals, and Hypothesis Testing

APPENDIX D: Hypothesis Testing for One and Two Population Parameters
APPENDIX A

Probability Concepts: Role of Probability in Decision Making

Review of Probability: Theory and Formulas


Some important terms of probability theory are defined below.

Event: An event is one or more possible outcomes of an experiment.

Experiment: An experiment is any process that produces an outcome or


observation. For example, throwing a die is a simple experiment and get-
ting a number (1, 2, ..., or 6) on the top face is an event. Similarly, toss-
ing a coin is an experiment and getting a head (H) or a tail (T) is an event.

In probability theory, we use the term experiment in a very broad sense.


We are interested in an experiment whose outcome cannot be predicted
in advance.

Sample Space: The set of all possible outcomes of an experiment is called


the sample space and is denoted by the letter S.
The probability of an event A is denoted as P(A), which means the
“probability that event A occurs.” That probability is between 0 and 1. In
other words,

0 ≤ P ( A) ≤ 1

Mutually Exclusive Events: When the occurrence of one event excludes


the possibility of the occurrence of another event, the events are mutually
exclusive. In other words, one and only one event can take place at a time.

Exhaustive Events: The total number of possible outcomes in any trial


is known as exhaustive events. For example, in a roll of two dice, the ex-
haustive number of events or the total number of outcomes is 36. If three
coins are tossed at the same time, the total number of outcomes is 8 (try
to list these outcomes).
Equally Likely Events: All the events have an equal chance of occurrence
or there is no reason to expect one in preference to the other.

Counting Rules in Probability

1. Multiple-Step Experiment or Filling Slots


Suppose an experiment can be described as a sequence of k steps in which

n_1 = the number of possible outcomes in the first step
n_2 = the number of possible outcomes in the second step
. . .
n_k = the number of possible outcomes in the kth step,

then the total number of possible outcomes is given by

$(n_1)(n_2)(n_3)\cdots(n_k)$

2. Permutations
The number of ways of selecting n distinct objects from a group of N
objects—where the order of selection is important—is known as the
number of permutations of N objects taken n at a time and is written as

$P_n^N = \frac{N!}{(N-n)!} = N(N-1)\cdots(N-n+1)$

3. Combinations
Combination is selecting n objects from a total of N objects. The
order of selection is not important in combination. This disregard
of arrangement makes the combination different from the permuta-
tion. In general, an experiment will have more permutations than
combinations.

The number of combinations of N objects taken n at a time is given by

N N!
C nN =   = Note 0! = 1 by definition.
n  n !( N − n )!

Assigning Probabilities

There are two basic rules for probability assignment.

1. The probability of an event A is written as P(A) and it must be


­between 0 and 1. That is,

0 ≤ P ( A ) ≤ 1.0

2. If an experiment results in n number of outcomes A1, A2..., An, then


the sum of the probabilities for all the experimental outcomes must
equal 1. That is,

$P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_n) = 1$

Methods of Assigning Probabilities

There are three methods for assigning probabilities

1. Classical Method
2. Relative Frequency Approach
3. Subjective Approach

1. Classical Method
The classical approach of probability is defined as the favorable number
of outcomes divided by the total number of possible outcomes. Suppose
an experiment has n number of possible outcomes and the event A occurs
in m of the n outcomes, then the probability that event A will occur is

$P(A) = \frac{m}{n}$

Note that P(A) denotes the probability of occurrence for event A.


The probability that event A will not occur is given by $P(\bar{A})$,
which is read as P(not A) or "A complement." Thus,

$P(A) + P(\bar{A}) = 1$

which means that the probability that event A will occur plus the
probability that event A will not occur must be equal to 1.
2. Relative Frequency Approach
Probabilities are also calculated using relative frequency. In many
problems, we define probability by relative frequency.
3. Subjective Probability
Subjective probability is used when the events occur only once or
very few times and when little or no relevant data are available. In as-
signing subjective probability, we may use any information available,
such as our experience, intuition, or expert opinion. In this case the
experimental outcomes may not be clear and relative frequency of
occurrence may not be available. Subjective probability is a measure
of our belief that a particular event will occur. This belief is based on
any information that is available to determine the probability.

Addition Law for Mutually Exclusive Events

If we have two events A and B that are mutually exclusive, then the
probability that A or B will occur is given by

P ( A ∪ B ) = P ( A ) + P (B )

Note that the “union” sign is used for “or” probability, that is, P ( A ∪ B ) .
This is the same as P (A or B). This rule can be extended to three or more
mutually exclusive events. If three events A, B, and C are mutually exclu-
sive, then the probability that A or B or C will happen can be given by

P ( A ∪ B ∪ C ) = P ( A ) + P (B ) + P (C )

Addition Law for Non-Mutually Exclusive Events

The occurrence of two events that are non-mutually exclusive means that
they can occur together. If the events A and B are non-mutually exclusive,
the probability that A or B will occur is given by

P ( A ∪ B ) = P ( A ) + P (B ) − P ( A and B )
or P ( A ∪ B ) = P ( A ) + P (B ) − P ( A ∩ B )

If events A, B, and C are non-mutually exclusive, then the probability that


A or B or C will occur is given by

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C)$

or

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$
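The addition law for non-mutually exclusive events can be verified by enumeration. The sketch below uses a hypothetical 52-card deck example: event A is drawing a heart and event B is drawing a face card; the direct count of P(A or B) agrees with the formula above.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))          # sample space S, |S| = 52

A = {c for c in deck if c[1] == "hearts"}           # event A: a heart
B = {c for c in deck if c[0] in {"J", "Q", "K"}}    # event B: a face card

def prob(event):
    return Fraction(len(event), len(deck))

print(prob(A | B))                          # 11/26, counted directly
print(prob(A) + prob(B) - prob(A & B))      # 11/26, by the addition law
```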

Probabilities of Equally Likely Events

Equally Likely Events are those that have an equal chance of occurrence
or those where there is no reason to expect one in preference to the other.
In many experiments it is natural to assume that each outcome in the
sample space is equally likely. Suppose that the sample space S consists
of k outcomes, where each outcome is equally likely to occur. The k out-
comes of the sample space can be denoted by S = {1,2,3,...,k} and

P(1) = P(2) = ... = P(k)

That is, each outcome has an equal probability of occurrence.


The above implies that the probability of any event B is

$P(B) = \dfrac{\text{number of outcomes in } S \text{ that are in } B}{\text{total number of outcomes } (k)}$

Probabilities under Conditions of Statistical Independence

When two or more events occur, the occurrence of one event has no ef-
fect on the probability of occurrence of any other event. In this case, the
events are considered independent. There are three types of probabil-
ities under statistical independence:

1. Simple (marginal or unconditional) probability
2. Joint probability
3. Conditional probability

1. Simple probability is also known as marginal or unconditional and


is the probability of occurrence of a single event, say A, and is de-
noted by P (A).
P(A) = marginal probability of event A
P(B) = marginal probability of event B
2. Joint Probability under Statistical Independence
Joint probability is the probability of occurrence of two or more
events together or in succession. It is also known as “and” probabil-
ity. Suppose we have two events, A and B, which are independent.
Then the joint probability, P(AB), which is the probability of occur-
rence of both A “and” B, is given by

P ( AB ) = P ( A )P (B ) or

P ( A ∩ B ) = P ( A )P ( B )

The probability of two independent events occurring together or in


succession is the product of their marginal or simple probabilities.
Note that P(AB) = probability of event A and B occurring together,
which is known as joint probability. P(AB) is the same as P(A and B)
or P ( A ∩ B ) .

Events A and B are independent and can be extended to more


than two events. For example,

P ( ABC ) = P ( A ∩ B ∩ C ) = P ( A ).P (B ).P (C )

is the probability of three independent events, A, B, and C, which


is calculated by taking the product of their marginal or simple
probabilities.
3. Conditional Probability under Statistical Independence
The conditional probability is written as

$P(A \mid B)$

and is read as the probability of event A, given that B has occurred,


or the probability of A, given B. If the two events A and B are in-
dependent, then

$P(A \mid B) = P(A)$

This means that if the events are independent, the probabilities are
not affected by the occurrence of each other. The probability of oc-
currence of B has no effect on the occurrence of A. That is, the condi-
tion has no meaning if the events are independent.

Statistical dependence

When two or more events occur, the occurrence of one event has an effect
on the probability of the occurrence of any other event. In this case, the
events are considered to be dependent.
There are three types of probabilities under statistical dependence.
1. Conditional probability
2. Joint probability
3. Marginal probability



1. Conditional Probability under Statistical Dependence

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{P(A \text{ and } B)}{P(B)}$

2. Joint Probability under Statistical Dependence

$P(A \cap B) = P(A \mid B)\,P(B)$ or $P(A \text{ and } B) = P(A \mid B)\,P(B)$

Similarly,

$P(B \cap A) = P(B \mid A)\,P(A)$ or $P(B \text{ and } A) = P(B \mid A)\,P(A)$

3. Marginal Probability under Statistical Dependence


The marginal probability under statistical dependence is explained
using the joint probability table below.

Dotted (D) Striped (S) Total


Red (R) 0.30 0.10 0.40
Green (G) 0.20 0.40 0.60
Total 0.50 0.50 1.00

From the above table,

$P(R) = P(D \text{ and } R) + P(S \text{ and } R)$, or

$P(R) = P(D \mid R)\,P(R) + P(S \mid R)\,P(R)$

Bayes’ Theorem

$$P(A_i \mid D) = \frac{P(A_i)\,P(D \mid A_i)}{P(A_1)\,P(D \mid A_1) + P(A_2)\,P(D \mid A_2) + \cdots + P(A_n)\,P(D \mid A_n)}$$

The above equation can be used to compute any posterior probability $P(A_i \mid D)$ when the prior probabilities $P(A_1), P(A_2), \ldots, P(A_n)$ and the conditional probabilities $P(D \mid A_1), P(D \mid A_2), \ldots, P(D \mid A_n)$ are known.
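Bayes' theorem translates directly into code. The sketch below computes the posterior probabilities from given priors and conditionals; the two-supplier numbers are hypothetical, chosen only for illustration.

```python
def posteriors(priors, likelihoods):
    """Return P(Ai | D) for each i, given P(Ai) and P(D | Ai)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai) P(D|Ai)
    total = sum(joint)            # the denominator: total probability of D
    return [j / total for j in joint]

# Hypothetical example: supplier A1 ships 60% of parts with a 2% defect
# rate; supplier A2 ships 40% with a 5% defect rate. Given a defective
# part D, how likely is each supplier?
print(posteriors([0.60, 0.40], [0.02, 0.05]))   # [0.375, 0.625]
```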

Role of Probability Distributions in Decision Making


Probability Distributions

The graphical and numerical techniques discussed and used in descriptive


analytics (the first volume of this book) are very helpful in getting insight
and describing the sample data. These methods help us draw conclusions
about the process from which data are collected.
In this section, we will study several of the discrete and continuous prob-
ability distributions and their properties, which are a critical part of data
analysis and predictive modeling. A good knowledge and understanding of
probability distributions helps an analyst to apply these distributions in data
analysis. Probability distributions are an essential part of the decision-making
process. Key predictive analytics models, including all types
of regression and nonlinear modeling, forecasting, data mining,
and computer simulation, use probability and probability distributions.
Here we discuss some of the important probability distributions that
are used in analyzing and assessing the regression and predictive models.
The knowledge and understanding of two important distributions—the
normal and t-distributions—are critical in analyzing and checking the ade-
quacy of the regression models. Before we discuss these distributions in de-
tail, we will provide some background information about the distributions.

Probability Distribution and Random Variables

The probability distribution is a model that relates the value of a random


variable with the probability of occurrence of that value.

A random variable is a numerical value that is unknown and may re-


sult from a random experiment. The numerical value is a variable and the
value achieved is subject to chance and is, therefore, determined randomly.
Thus, a random variable is a numerical quantity whose value is determined by
chance. Note that a random variable must be a numerical quantity.
Types of random variable: Two basic types of random variables are
discrete and continuous variables, which can be described by discrete and
continuous probability distributions.

Discrete Random Variable

A random variable that can assume only integer values or whole numbers
is known as discrete. An example would be the number of customers ar-
riving at a bank. Another example of a discrete random variable would be
rolling two dice and observing the sum of the numbers on the top faces.
In this case, the results are 2 through 12. Also, note that each outcome
is a whole number or a discrete quantity. The random variable can be
described by a discrete probability distribution.
Table A.1 shows the discrete probability distribution (in a table form)
of rolling two dice and observing the sum of the numbers. In rolling two
dice and observing the sum of the numbers on the top faces, the outcome
is denoted by x which is the random variable that denotes the sum of
the numbers.

Table A.1
X 2 3 4 5 6 7 8 9 10 11 12
P(x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

The outcome X (which is the sum of the numbers on the top faces)
takes on different values each time the pair of dice is rolled. On each trial,
the sum of the numbers is going to be a number between 2 and 12 but
we cannot predict the sum with certainty in advance. In other words,
the outcomes or the occurrence of these numbers is a chance factor. The
probability distribution consists of the outcomes Xi and the probabilities
of these outcomes, P(Xi).
can be found by listing the sample space of all 36 outcomes. These can
be shown both in a tabular and in a graphical form. Figure A.1 shows the

probability distribution graphically. The following are the two require-


ments for the probability distribution of a discrete random variable:

1. P(x) is between 0 and 1 (both inclusive) for each x, and


2. ∑ P ( x ) = 1.0

Figure A.1 Probability Distribution of Rolling Two Dice
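Table A.1 can be reproduced by enumerating the 36 equally likely outcomes, as in the short Python sketch below (an added illustration, not part of the original example).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count each possible sum over the 36 outcomes of rolling two dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for x in range(2, 13):
    print(x, Fraction(counts[x], 36))   # e.g. 7 -> 1/6 (6/36), as in Table A.1
```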

In summary, the relationship between the values of a random variable and


their probabilities is summarized by a probability distribution. A probability
distribution of a random variable is described by the set of possible ran-
dom variables’ values and their probabilities. The probability distribution
provides the probability for each possible value or outcome of a random vari-
able. A probability distribution may also be viewed as the shape of the
distribution. The foundation of probability distributions is the laws of
probability. Note that most phenomena in real-world situations are
random in nature. In a production setting, the number of defective
products might be seen as a random variable because it takes on
different values according to some random mechanism.

Expected Value, Variance, and Standard Deviation of a Discrete


Distribution

The mean or the expected value of a discrete random variable, denoted by
µ or E(x), is the average value observed over a long period. The variance
and standard deviation, σ² and σ respectively, are measures of variation
of the random variable.
In this section, we will demonstrate how to calculate the expected
value, variance, and standard deviation for a discrete probability distribu-
tion. We will use the concept of the mean or expected value and variance
in the next section.

Background: The mean for a discrete random variable is defined mathematically as the expected value and is written as:

$\mu_x = E(X) = \sum x_i P(x_i)$

The variance of a discrete random variable is defined as:

$\sigma^2 = \sum (x_i - \mu)^2 P(x_i)$   (A)

$\sigma^2 = \sum x^2 P(x) - \mu^2$   (B)

The standard deviation, $\sigma = \sqrt{\sigma^2}$

Example A.1

Table A.2 shows the number of cars sold over the past 500 days for a par-
ticular car dealership in a certain city.

[a] Calculate the relative frequency


The relative frequencies are shown in column (3) of Table A.2. Note that
the relative frequency distribution is calculated by dividing the frequency
of the class by the total frequency, which is also the probability, P(x).

Table A.2: Number of Cars Sold

No. of Cars Sold (xi)   Frequency (fi)   Relative Frequency, P(xi)
0       40      40/500 = 0.08
1       100     0.200
2       142     0.284
3       66      0.132
4       36      0.072
5       30      0.060
6       26      0.052
7       20      0.040
8       16      0.032
9       14      0.028
10      8       0.016
11      2       0.004
Total   500     1.00

[b] Calculate the expected value or the mean number of cars sold
The expected value is given by:

µx = E (x ) = ∑ xi P ( xi )
or

E ( x ) = (0)(0.08) + (1)(0.200) + (2)(0.284) + (3)(0.132) + (4)(0.072)


+ (5)(0.060) + (6)(0.052) + (7 )(0.040) + (8)(0.032) + (9)(0.028) + (10)(0.016)
+ (11)(0.004) = 3.056
or E ( x ) = 3.056

[c] Calculate the variance and the standard deviation


The variance for this discrete distribution is given by

σ2 = ∑ ( x − µ )2 P ( x )
σ 2 = (0 − 3.056)2 (0.08) + (1 − 3.056)2 (0.200) + (2 − 3.056)2 (0.284)
+ (3 − 3.056)2 (0.132) + (4 − 3.056)2 (0.072) + (5 − 3.056)2 (0.060)
+ (6 − 3.056)2 (0.052) + (7 − 3.056)2 (0.040) + (8 − 3.056)2 (0.032)
+ (9 − 3.056)2 (0.028) + (10 − 3.056)2 (0.016) + (11 − 3.056)2 (0.004)
= 6.071296

The variance can be more easily calculated using formula (B).
The standard deviation for this discrete distribution is

$\sigma = \sqrt{\sigma^2} = \sqrt{6.071296} = 2.46$

[d] Find the probability of selling less than four cars

P ( x < 4) = P ( x = 0) + P ( x = 1) + P ( x = 2) + P ( x = 3)
= 0.08 + 0.200 + 0.284 + 0.132
= 0.696

These probability values are obtained from Table A.2 column (3).

[e] Find the probability of selling at most four cars

P ( x ≤ 4 ) = P ( x = 0 ) + P ( x = 1) + P ( x = 2 ) + P ( x = 3) + P ( x = 4 )
= 0.08 + 0.200 + 0.284 + 0.132 + 0.072
= 0.768
[f] What is the probability of selling at least four cars?
This probability can be calculated using the complement:

P ( x ≥ 4) = 1 − P ( X < 4)
= 1 − [ P ( X = 0) + P ( X = 1) + P ( X = 2) + P ( X = 3)]
= 1 − [0.08 + 0.200 + 0.284 + 0.132]
= 0.304
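The computations in parts [b] through [f] can be verified with a few lines of Python, reproducing the values above from the probabilities in Table A.2.

```python
x = list(range(12))                       # 0..11 cars sold
p = [0.08, 0.200, 0.284, 0.132, 0.072, 0.060,
     0.052, 0.040, 0.032, 0.028, 0.016, 0.004]

mean = sum(xi * pi for xi, pi in zip(x, p))               # E(x)
var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))  # sigma^2
print(mean)                  # 3.056
print(var, var ** 0.5)       # 6.071296, 2.46...
print(sum(p[:4]))            # P(x < 4)  = 0.696
print(sum(p[:5]))            # P(x <= 4) = 0.768
print(1 - sum(p[:4]))        # P(x >= 4) = 0.304
```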

Continuous Random Variables

A random variable that can assume any value over a continuous
range of possibilities is known as a continuous random variable. Some
examples of continuous variables are physical measurements of length,
volume, temperature, or time. These variables can be described using con-
tinuous distributions.
The continuous probability distribution is usually described using
a probability density function. The probability density function, f(x), de-
scribes the behavior of a random variable.  It may be viewed as the shape
of the data. Figure A.2 shows the histogram of the diameters of machined
parts with a fitted curve. It is clear that the diameters follow a pattern
that can be described by a probability distribution.
The shape of the curve in Figure A.2 can be described by a mathemat-
ical function, f ( x ) , or a probability density function. The area below the
probability density function to the left of a given value, x, is equal to the
probability of the random variable (the diameter in this case) shown on
the x-axis. The probability density function represents the entire sample
space; therefore, the area under the probability density function must
equal one.

Figure A.2 Diameter of Machined Parts

The probability density function, f(x), must be positive for all values of x
(as negative probabilities are impossible). Stating these two requirements
mathematically,

$\int_{-\infty}^{\infty} f(x)\,dx = 1$

and f ( x ) > 0 for continuous distributions. For discrete distributions,


the two conditions can be written as

$\sum_{i=1}^{n} f(x_i) = 1.0 \quad \text{and} \quad f(x_i) > 0.$

To demonstrate how the probability density function is used to compute


probabilities, consider Figure A.3. The shape in the figure can be well
approximated by a normal distribution. Assuming a normal distribution,
we would like to find the probability of a diameter below 40 mm. The
area of the shaded region represents the probability of a diameter, drawn
randomly from the population having a diameter less than 40 mm.
This probability is 0.307 or 30.7 percent using a normal probability
density function. Figure A.4 shows the probability of the diameter
of one randomly selected machined part having a diameter ≥ 50 mm
but ≤ 55 mm. Here we discuss some of the important continuous and
discrete distributions.

Some Important Continuous Distributions


In this section we will discuss the continuous probability distributions.
When the values of random variables are not countable but involve con-
tinuous measurement, the variables are known as continuous random
variables. Continuous random variables can assume any value over a
specified range. Some examples of continuous random variables are:

• The length of time to assemble an electronic appliance


• The life span of a satellite power source
• Fuel consumption in miles per gallon of a new model of a car
• The inside diameter of a manufactured cylinder
• The amount of beverage in a 16-ounce can
• The waiting time of patients at an outpatient clinic

Figure A.3: An Example of Calculating Probability Using Probability Density

Figure A.4: Another Example of Calculating Probability Using Probability Density

In all the above cases, each phenomenon can be described by a random


variable. The variable could be any value within a certain range and is not
a discrete whole number. The graph of a continuous random variable x is
a smooth curve. This curve is a function of x, denoted by f(x), and is com-
monly known as a probability density function. The probability density
function is a mathematical expression that defines the distribution of the
values of the continuous random variable. Figure A.5 shows examples of
three continuous distributions.
One of the most widely used and important distributions of our inter-
est is the Normal Distribution. The other distributions of importance are
the t-distribution and F-distribution. We discuss all of these here.

Figure A.5 Examples of Three Continuous Distributions

The Normal Distribution


Background: A continuous random variable X is said to follow a normal
distribution with parameters µ and σ and the probability density func-
tion of X is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$$

where f (x) is the probability density function, µ the mean, σ the standard
deviation, and e = 2.71828, which denotes the base of the natural loga-
rithm. The distribution has the following properties:
1. The normal curve is a bell-shaped curve. It is symmetrical about
the line x = µ. The mean, median, and mode of the distribution have the
same value.
2. The parameters of normal distribution are the mean µ and stan-
dard deviation σ. The interpretation of how the mean and standard devi-
ation are related in a normal curve is shown in Figure A.6.

Figure A.6 Areas under the Normal Curve

Figure A.6 states the area property of the normal curve. For a normal
curve, approximately 68 percent of the observations lie between the mean
and ±1σ (one standard deviation), approximately 95 percent of all obser-
vations lie between the mean and ±2σ (two standard deviations), and ap-
proximately 99.73 percent of all observations fall between the mean and
±3σ (three standard deviations). This is also known as the empirical rule.

The shape of the curve depends on the mean (µ) and standard deviation (σ).
The mean µ determines the location of the distribution, whereas
the standard deviation σ determines the spread of the distribution. Note
that the larger the standard deviation σ, the more spread out the curve (see
Figure A.7).

Figure A.7 Normal Curve with Different Values of Mean and Standard Deviation

The Standard Normal Distribution


To calculate the normal probability $P(x_1 \le X \le x_2)$, where X is a
normal variate with parameters µ and σ, we need to evaluate:

$$\int_{x_1}^{x_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\, dx \qquad (A)$$

To evaluate the above expression in (A), none of the standard integra-


tion techniques can be used. However, the expression can be numerically
evaluated for µ = 0 and σ = 1 . When the values of the mean µ and
standard deviation σ are 0 and 1, respectively, the normal distribution is
known as the standard normal distribution.
The normal distribution with µ = 0 and σ = 1 is called a stan-
dard normal distribution. Also, a random variable with standard normal

distribution is called a standard normal random variable and is usually


denoted by Z.
The probability density function of Z is given by

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$

The cumulative distribution function of Z is given by:

$$P(Z \le z) = \int_{-\infty}^{z} f(y)\,dy$$

which is usually denoted by Φ (z).


When the random variable X is normally distributed with mean µ and
variance σ², that is, $X \sim N(\mu, \sigma^2)$, we can calculate the probabilities
involving x by standardizing. The standardized value is known as the
standard (or standardized) normal variable and is given by expression (B):
if x is normally distributed with mean µ and standard deviation σ, then

$$z = \frac{x - \mu}{\sigma} \qquad (B)$$

is a standard normal random variable where,


z = distance from the mean to the point of interest (x) in terms of
standard deviation units
x = point of interest
µ = the mean of the distribution, and
σ = the standard deviation of the distribution.

Finding Normal Probability by Calculating Z-Values and using the


Standard Normal Table

Equation (B) above is a simple equation that can be used to evaluate the
probabilities involving the normal distribution.

Example A.2

The inside diameter of a piston ring is normally distributed with a mean


of 5.07 cm and a standard deviation of 0.07 cm. What is the probability
of obtaining a diameter exceeding 5.15 cm?
The required probability is the shaded area shown in Figure A.8. To
determine the shaded area, we first find the area between 5.07 and 5.15
using the z-score formula and then subtract the area from 0.5. See the
calculations below.

$$z = \frac{x - \mu}{\sigma} = \frac{5.15 - 5.07}{0.07} = 1.14 \rightarrow 0.3729$$

Note: 0.3729 is the area corresponding to z = 1.14. This can be read from
the table of Normal Distribution provided in the Appendix. There are
many variations of this table. The normal table used here provides the
probabilities on the right side of the mean.

Figure A.8 Area Exceeding 5.15



The required probability is

$P(x \ge 5.15) = 0.5 - 0.3729 = 0.1271$

That is, there is a 12.71 percent chance that the piston ring diameter will exceed
5.15 cm.
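The same probability can be computed without a table, for example with scipy (an assumed library choice). The small difference from 0.1271 comes from rounding z to 1.14 before the table lookup.

```python
from scipy.stats import norm

# Upper-tail probability for X ~ N(5.07, 0.07^2): sf(x) = 1 - cdf(x).
p = norm.sf(5.15, loc=5.07, scale=0.07)
print(round(p, 4))    # ~0.1265
```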

Example A.3

The measurements on certain types of PVC pipes are normally distrib-


uted with a mean of 5.01 cm and a standard deviation of 0.03 cm. The
specification limits on the pipes are 5.0±0.05 cm. What percentage of the
pipes is not acceptable?

Figure A.9 Percent Acceptable (Shaded Area)

The percentage of acceptable pipes is the shaded area shown in Figure A.9.
The required area or the percentage of acceptable pipes is explained below.

$$z_1 = \frac{x - \mu}{\sigma} = \frac{4.95 - 5.01}{0.03} = -2.0 \Rightarrow 0.4772$$

$$z_2 = \frac{x - \mu}{\sigma} = \frac{5.05 - 5.01}{0.03} = 1.33 \Rightarrow 0.4082$$

The area 0.4772 is the area between the mean 5.01 and 4.95 (see Fig-
ure A.9). The area left of 4.95 is 0.5 – 0.4772 = 0.0228.
The area 0.4082 is the area between the mean 5.01 and 5.05. The area
right of 5.05 is 0.5 – 0.4082 = 0.0918.
Therefore, the percentage of pipes not acceptable = 0.0228 + 0.0918
= 0.1146 or 11.46 percent. These probabilities can also be calculated
using a statistical software.

Probability Plots
Probability plots are used to determine if a particular distribution fits
sample data. The plot allows us to determine whether a distribution is
appropriate and also to estimate the parameters of fitted distribution. The
probability plots are a good way of determining whether the given data
follow a normal or any other assumed distribution. In regression analysis,
this plot is of great value because of its usefulness in verifying one of the
major assumptions of regression—the normality assumption.
MINITAB and other statistical software provide options for creating
individual probability plots for the selected distribution for one or more
variables. The steps to probability plotting procedure are:

1. Hypothesize the distribution: select the assumed distribution that is


likely to fit the data
2. Order the observed data from smallest to largest. Call the observed
data x1, x2, x3, ..., xn
3. Calculate the cumulative percentage points or the plotting positions
(PP) for the sample of size n (i = 1, 2, 3, ..., n) using:

$PP = \dfrac{(i - 0.5)\,100}{n}$

4. Tabulate the xi values and the cumulative percentage (probability


values or PP). Depending on the distribution and the layout of the
paper, several variations of cumulative scale are used.
5. Plot the data using the graph paper for the selected distribution.
Draw the best fitting line through these points.
6. Draw your conclusion about the distribution.

MINITAB provides the plot based on the above steps. To test the hypoth-
esis, an Anderson-Darling (AD) goodness-of-fit statistic and associated
p-value can be used. These values are calculated and displayed on the plot.
If the assumed distribution fits the data:

• the plotted points will form a straight line (or approximate a


straight line)
• the plotted points will be close to the straight line
• the Anderson-Darling (AD) statistic will be small, and the p-value
will be larger than the selected significance level, α (commonly
used values are 0.05 and 0.10).
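If MINITAB is not available, the same test can be run in Python. A minimal sketch (the 15 measurements shown are hypothetical placeholders, not the book's data):

import numpy as np
from scipy import stats

# hypothetical sample of 15 pipe-length measurements (cm)
lengths = np.array([4.98, 5.02, 5.00, 5.03, 4.99, 5.01, 5.00, 4.97,
                    5.02, 5.04, 4.99, 5.01, 5.00, 4.98, 5.03])

ad = stats.anderson(lengths, dist='norm')   # Anderson-Darling test for normality
print(ad.statistic, ad.critical_values)     # reject normality if statistic > critical value

# scipy's anderson() reports critical values rather than a p-value;
# the Shapiro-Wilk test is a common alternative that does return a p-value
print(stats.shapiro(lengths).pvalue)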

Example A.4 Probability Plot (1)

To demonstrate the probability plot, consider the length measurements of 15 PVC pipes from a manufacturing process. We want to use the probability plot to check whether the data follow a normal distribution. The probability plot is shown in Figure A.10. From the plot we can see that the cumulative percentage points approximately form a straight line and the points are close to the straight line. The calculated p-value is 0.508. At a 5 percent level of significance (α = 0.05), the p-value is greater than α, so we cannot reject the null hypothesis that the data follow a normal distribution. We conclude that the data follow a normal distribution.

Example A.5 Probability Plot (2)

A probability plot can be used in place of a histogram to determine the process capability and can also be used to determine the distribution and shape of the data. If the probability plot indicates that the distribution is normal, the mean and standard deviation can be estimated from the plot. Figure A.11 shows the histogram and probability plot for the length measurements and failure time data. As can be noted, the histogram of the length data is clearly normally distributed, whereas the histogram of the failure time is not symmetrical and might follow an exponential distribution. The probability plots of both the length and the failure time are plotted next to the histograms.

Figure A.10 Probability Plot of Length Data

From the probability plot of the length data (Figure A.11), we can see that the cumulative percentage points approximately form a straight line and the points are close to the straight line. The calculated p-value is 0.543. At a 5 percent level of significance (α = 0.05), the p-value is greater than α, so we cannot reject the null hypothesis that the data follow a normal distribution; we conclude that the data follow a normal distribution. The probability plot of the failure time data shows that the cumulative percentage points do not form a straight line; the plotted points show a curvilinear pattern. The calculated p-value is less than 0.005. At a 5 percent level of significance (α = 0.05), the p-value is less than α, so we reject the null hypothesis that the data follow a normal distribution. The deviation of the plotted points from a straight line indicates that the failure time data do not follow a normal distribution. This is also evident from the histogram of the failure data.

Figure A.11 Histograms and Probability Plots of Length and Failure Time Data

Checking Whether the Data Follow a Normal Distribution: Assessing Normality

Statistics and data analysis cases involve making inferences about the population based on the sample data. Several of these inference procedures are discussed in the chapters that follow. Many of these inference procedures are based on the assumption of normality; that is, the population from which the sample is taken follows a normal distribution. Before we draw conclusions based on the assumption of normality, it is important to determine whether the sample data come from a population that is normally distributed. Below we present several descriptive methods that can be used to check whether the data follow a normal distribution. The methods most commonly used to assess normality are described in Table A.3.

Table A.3: Checking for Normal Data


1. Construct a histogram or stem-and-leaf plot of the data. The shape of the histogram and stem-and-leaf plot will resemble a normal curve if the data are normal or approximately normal.
2. Calculate the mean, median, and mode (if appropriate) of the data. If these measures are approximately equal, the data are symmetrical or approximately normal.
3. Calculate the mean and standard deviation and the intervals x̄ ± 1s, x̄ ± 2s, and x̄ ± 3s. If the data are normal, the percentages of observations falling in these intervals will be approximately 68%, 95%, and 99.7%, respectively.
4. Construct a box plot of the data using the five-number summary: minimum, Q1, Q2, Q3, and maximum. If the data are normal, the distance from the minimum data value to Q1 and from Q3 to the maximum data value will be approximately equal. In addition, the median (Q2) will divide the box into approximately two equal halves.
5. Calculate the interquartile range (IQR) and the standard deviation, s, of the sample data. If the ratio IQR/s ≈ 1.3, the data are normal or approximately normal.
6. Construct a normal probability plot of the data. If the data are normal or approximately normal, the points on the probability plot will fall on a straight line.
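Several of these checks are easy to script. A minimal Python sketch (the function name and the use of numpy are ours, not part of the text) applying checks 2, 3, and 5 to any data sample:

import numpy as np

def normality_checks(x):
    # checks 2, 3, and 5 from Table A.3
    x = np.asarray(x, dtype=float)
    mean, median, s = x.mean(), np.median(x), x.std(ddof=1)
    print("mean:", mean, " median:", median)          # close together if symmetric
    for k in (1, 2, 3):
        pct = np.mean(np.abs(x - mean) <= k * s)      # empirical rule: ~68/95/99.7%
        print(f"within {k} sd: {pct:.1%}")
    q1, q3 = np.percentile(x, [25, 75])
    print("IQR / s:", (q3 - q1) / s)                  # about 1.3 for normal data

Calling normality_checks() with the waiting time data of Example A.6 below reproduces checks #2, #3, and #5 of that example.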

Example A.6: Checking for Normal Data


A consultant hired by hospital management to study the waiting time of patients at the hospital's emergency service collected the data shown in Table A.4. The table shows the waiting times (in minutes) for 150 patients.
The distribution of the waiting time data is of interest. To check whether the waiting time data follow a normal distribution, numerical and graphical analyses were conducted; the methods outlined in Table A.3 were applied to the data. The results are shown below.

Table A.4: Waiting Time (in minutes)


15.9 18.0 16.8 11.7 13.9 19.0 17.6 19.6 11.9 9.8 7.6
14.7 13.8 13.0 15.7 15.6 11.1 7.8 17.2 14.4 14.5 16.4
7.2 15.7 14.3 17.2 17.6 13.6 15.2 13.7 16.5 11.3 10.7
12.9 15.9 12.4 9.7 17.8 14.9 14.8 15.6 13.3 15.2 11.6
13.4 14.2 13.6 18.2 13.1 5.5 15.5 11.9 14.2 9.0 14.0
15.6 15.7 9.3 12.1 13.7 17.2 13.5 16.8 16.3 12.9 18.0
15.0 17.8 11.4 15.0 10.8 17.4 12.7 12.6 21.1 12.3 13.5
9.2 18.3 13.1 16.4 12.0 19.1 16.9 18.8 9.5 12.1 14.4
14.2 13.5 13.5 10.9 11.8 12.4 11.2 14.6 14.4 13.9 14.9
15.0 10.8 18.0 13.6 17.1 15.3 12.1 17.0 11.4 15.3 10.0
18.5 18.6 16.2 16.5 18.3 10.3 13.8 12.8 13.2 11.9 13.7
9.5 16.8 10.1 15.7 15.2 18.2 4.5 13.5 10.5 13.4 10.5
16.7 11.8 15.3 14.8 15.5 15.2 9.0 14.2 13.4 16.0 16.7
14.1 16.7 13.8 15.9 12.8 21.5 16.4

Graphical and numerical analyses: Figure A.12 shows the histogram with the normal curve and a box plot of the data, along with descriptive statistics calculated from the data.

Check #1

The histogram of the data in Figure A.12 indicates that the shape very
closely resembles a bell shape or normal distribution. The bell curve su-
perimposed over the histogram shows that the data have a symmetric or
normal distribution centered around the mean. Thus, we can conclude
that the data follow a normal distribution.

Check #2

The values of the mean and the median in Figure A.12 are 14.124 and 14.200, respectively. If the data are symmetrical or normal, the values of the mean and median are very close. Since the mean and median for the waiting time data are very close, the distribution is symmetrical or approximately normal.
Figure A.12: Graphical and Numerical Summary of Waiting Time

Check #3

Check #3 requires that we calculate the percentages of the observations falling within one, two, and three standard deviations of the mean. If the data are symmetrical or normal, approximately 68 percent of all observations will fall within one standard deviation of the mean, approximately 95 percent within two standard deviations, and approximately 99.7 percent within three standard deviations. For the waiting time data, the mean x̄ = 14.12 and the standard deviation s = 2.97. Table A.5 shows the percentages of the waiting time data within one, two, and three standard deviations of the mean.

Table A.5: Percentages within One, Two, and Three Standard Deviations

Interval        Percentage in Interval
x̄ ± s           69.3
x̄ ± 2s          95.3
x̄ ± 3s          99.3

The percentages for the example data (Table A.4) agree with the empirical rule for the normal distribution.

Check #4

The box plot of the data in Figure A.12 shows that the waiting time data
very closely follow a normal distribution.

Check #5

The ratio of the IQR to the standard deviation is calculated below. The
values are obtained from Figure A.12.

IQR/s = (Q3 − Q1)/s = (16.325 − 12.100)/2.974 = 1.42

The value is close to 1.3, indicating that the data are approximately normal.

Check #6

This check involves constructing a normal probability plot of the data. This plot is shown in Figure A.13. In a probability plot, the ranked data values are plotted on the x-axis and their corresponding z-scores from a standard normal distribution are plotted on the y-axis. If the data are normal or approximately normal, the points will fall on an approximate straight line. The normal probability plot of the waiting time data shows that the data are very close to a normal distribution.

Figure A.13 Probability Plot of Waiting Time

All of the above checks confirm that the waiting time data very closely
follow a normal distribution.

Student t-Distribution
This is one of the useful sampling distributions related to the normal distribution; it is used to check the adequacy of regression models. Suppose x is a normally distributed random variable with mean 0 and variance 1, and suppose χ²n is an independent chi-square random variable with n degrees of freedom. Then the random variable tn is given by:

tn = x / √(χ²n / n)

which follows a t-distribution with n degrees of freedom. Like the normal distribution, this distribution is symmetrical about its mean, µ = 0, and its range extends from −∞ to +∞. As the degrees of freedom increase, the t-distribution approaches the normal distribution.
As an example of a random variable that is t-distributed, consider the sampling distribution of the sample mean. We have seen that the normal distribution is based on the assumption that the mean µ and the standard deviation σ of the population are known. In calculating the normal probabilities using the formula z = (x − µ)/σ, the statistic z is calculated on the basis of a known σ. However, in most cases σ is not known and is estimated using the sample standard deviation s, whose distribution is not normal when the sample size n is small (< 30).
In sampling from a population that is normally distributed, the quantity (x̄ − µ)/(σ/√n) follows a normal distribution, but if σ is not known and n is small, (x̄ − µ)/(s/√n) will not be normally distributed. It was shown by Gosset that the random variable t = (x̄ − µ)/(s/√n) follows the distribution known as the t-distribution. The statistic t has
a mean of 0 and a variance greater than 1 (unlike the standard normal distribution, whose mean is 0 and variance is 1). Since the variance is greater than 1, the t-distribution is less peaked at the center and higher in the tails than the normal distribution. As the sample size n becomes larger, the t-distribution comes closer and closer to the normal distribution. In the next section, we perform an experiment to gain further insight into the t-distribution.

Comparing the Normal and t-Distribution


Objective: Compare the shapes of the normal distribution and the t-distribution and show that as the degrees of freedom for the t-distribution increase (i.e., as the sample size n increases), the t-distribution approaches a normal distribution. We will also see that the t-distribution is less peaked at the center and higher in the tails than the normal distribution.

Problem Statement: In this experiment, we will graph the normal probabilities with mean µ = 0 and standard deviation σ = 1 and compare them to the probability density functions of the t-distribution with 1, 3, 5, and 10 degrees of freedom. We will plot the normal and t-distributions on the same plot and compare the shapes of the t-distributions for different degrees of freedom to that of the normal distribution. The steps are outlined below.

• Using MINITAB, calculate the normal probability densities for x = −4 to 4 with mean µ = 0 and standard deviation σ = 1.0.
• Calculate the probability density function for the t-distribution with 1, 3, 5, and 10 degrees of freedom for the same values of x as above and store your results.
• Graph the normal density function and the density functions for the t-distribution with 1, 3, 5, and 10 degrees of freedom on one plot. This will help us compare the shape of the normal distribution and the various t-distributions. The plot will be similar to the one shown in Figure A.14.
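The same experiment can be reproduced outside MINITAB. A minimal Python sketch (scipy and matplotlib are our choices, not part of the text) plotting the normal density against the t-densities for 1, 3, 5, and 10 degrees of freedom:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t

x = np.linspace(-4, 4, 400)
plt.plot(x, norm.pdf(x), "k-", label="normal")
for df in (1, 3, 5, 10):
    plt.plot(x, t.pdf(x, df), label=f"t, df = {df}")
plt.legend()
plt.show()    # the t curves rise toward the normal curve as df increases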

From Figure A.14, the innermost curve is the probability density of the t-distribution with one degree of freedom, and the outermost curve is the density function of the normal distribution. You can see that as we increase the number of degrees of freedom for the t-distribution, the shape approaches a normal distribution. Also note that the t-distribution is less peaked at the center and higher in the tails compared to the normal distribution.

Figure A.14: Comparison between Normal and t-Distributions (df = degrees of freedom)

F-distribution

The F-distribution discussed in this section is also related to the normal distribution. To get an understanding of the F-distribution, suppose that χ²u and χ²v are two independent χ² random variables with u and v degrees of freedom, respectively. Then the ratio

Fu,v = (χ²u / u) / (χ²v / v)

follows an F-distribution with u numerator degrees of freedom and v denominator degrees of freedom. The range of values of F is from 0 to +∞ since the values of χ²u and χ²v are all non-negative. As an application of this distribution, consider the following example where we sample from the F-distribution.
Suppose we have two independent normal variables x1 and x2 with x1 ∼ N(µ1, σ1²) and x2 ∼ N(µ2, σ2²), and we draw samples of size n1 and n2 from the first and second populations. If the sample variances are s1² and s2², then the ratio

(s1²/σ1²) / (s2²/σ2²)

follows an F-distribution with (n1 − 1) and (n2 − 1) degrees of freedom. The shape of the F-distribution depends on the numerator and denominator degrees of freedom. As the degrees of freedom increase, the distribution approaches the normal distribution. In Figure A.15 we have graphed several F-distributions with different degrees of freedom.
The F-distribution is the appropriate distribution for comparing the ratio of two variances. In regression analysis, the distribution is used in conducting the F-test to check the overall significance of the regression model.

Figure A.15 F-Distribution for Various Degrees of Freedom

Summary
In this section, we provided an overview of statistical methods used in analytics. A number of statistical techniques, both graphical and numerical, were presented. These descriptive statistical tools are used in modeling, studying, and solving various problems, and in describing variation in process data. The graphical tools of descriptive statistics include the frequency distribution, histogram, stem-and-leaf plot, and box plot. These are simple but effective tools, and a knowledge of them is essential in studying analytics. The numerical measures include measures of central tendency, such as the mean and the median, as well as the variance and standard deviation, which are a critical part of data analysis. The standard deviation is a measure of variation; combined with the mean, it provides useful information. In the second part of this section we introduced the concepts of random variables and probability distributions. A number of probability distributions, both discrete and continuous, were discussed along with their properties and applications. We discussed the normal, t-, and F-distributions, all of which find applications in analytics: these distributions are used in assessing the validity of models and checking model assumptions.
APPENDIX B

Sampling, Sampling
Distribution, and Inference
Procedure

Sampling and Sampling Distribution


Introduction

In the previous section, we discussed probability distributions of discrete and continuous random variables and described several of these distributions. The understanding and knowledge of these distributions are critical in the study of analytics. This section extends the concept of probability distribution to that of sample statistics. Sample statistics are measures calculated from the sample data to describe a data set. The commonly used sample statistics are the sample size (n), sample mean (x̄), sample variance (s²), sample standard deviation (s), and sample proportion (p̄). Note that some quality characteristics are expressed as a proportion or percent, for example, the percent of defects produced. The proportion is perhaps the most widely used statistic after the mean; examples are the proportion of defective products, poll results, etc.
If the above measures are calculated from the population data, they are
called population parameters. These parameters are the population size (N ),
population mean (µ), population variance (σ 2), population standard
deviation (σ), and population proportion (p). In most cases, the sample
statistics are used to estimate the population parameters. The reason for
this estimation is that the parameters of the population are unknown and
they must be estimated. In estimating these parameters, we take samples
and use the sample statistics to estimate the unknown population par-
ameters. For example, suppose we want to know the average height of

women in a country. To do this we would take a reasonable sample of women, measure their heights, and calculate the average. This average will serve as an estimate. To know the average height of the population (or the population mean), we would need to measure the height of every woman in the country, which is not practical. In most cases, we don't know the true value of a population parameter; we estimate it using the sample statistics.
In this section we answer questions related to samples and sampling distributions. In sampling theory, we need to consider several factors and answer questions such as: Why do we use samples? What is a sampling distribution, and what is its purpose?
In practice, it is not possible to study every item in the population.
Doing so may be too time consuming or expensive. Therefore, a few items
are selected from the population. These items are selected randomly from
the population and are called samples. Selecting a few items from the
population in a random fashion is known as sampling. Many random samples are possible from a population of interest, but in practice usually one such sample, large or small, is selected and studied. An exception is control chart applications in quality control, in which a number of repeated samples are used.
Samples are used to make inferences about the population. The par-
ameters of the population ( µ , σ , p ) are usually not known and must be
estimated using sample data. Suppose the characteristic of interest is the
unknown population mean, µ. To estimate this, we collect sample data
and use the statistic sample mean x to estimate this. Many of the prod-
ucts we buy, for example, a set of tires for our car, have a label indicating
the average life of 60,000 miles. A box of bulbs usually has a label indi-
cating the average life of 10,000 hours. These are usually the estimated
values. We don’t know the true mean, µ.
As indicated, a number of samples are possible from a population of interest. When we take such samples of size n and calculate the sample mean x̄, each possible random sample has an associated value of x̄. Thus, the sample mean x̄ is a random variable whose value varies from sample to sample. Recall that a random variable is a variable that takes on different values as a result of an experiment. Since the samples

are chosen randomly, each sample has an equal probability of being selected, and the sample means calculated from these samples vary around the true population mean.
Because the sample mean x is a random variable, it can be described
using a probability distribution. The probability distribution of a sample
statistic is called its sampling distribution and the probability distribu-
tion of the sample mean x is known as the sampling distribution of the
sample mean. The sampling distribution of the sample mean has certain properties that are used in making inferences about the population. The central limit theorem plays an important role in the study of sampling distributions. We will also study the central limit theorem and see how the amazing results it produces are applied in analyzing and solving many problems.
The concepts of sampling distribution form the basis for the inference
procedures. It is important to note that a population parameter is always
a constant, whereas a sample statistic is a random variable. Similar to the
other random variables, each sample statistic can be described using a
probability distribution.
Besides sampling and sampling distribution, other key topics in this
section include point and confidence interval estimates of means and
proportions. We also discuss the concepts of hypothesis testing. These
concepts are important in the study of analytics.

Statistical Inference and Sampling Techniques


Statistical Inference

The objective of statistical inference is to draw conclusions or make decisions about a population based on samples selected from the population. To be able to draw conclusions from a sample, the distribution of the sample statistic must be known. Knowledge of the sampling distribution is very important in drawing conclusions about the population of interest.

Sampling Distribution

Sampling distribution is the probability distribution of a sample statistic (the sample statistic may be a sample mean x̄, a sample variance s², a sample standard deviation s, or a sample proportion p̄).

As indicated earlier, in most cases the true value of the population par-
ameters is not known. We must draw a sample or samples and calculate
the sample statistic to estimate the population parameter. The sampling
error of the sample mean is given by

Sampling error = x̄ − µ

Suppose we want to draw a conclusion about the mean of a certain population. We would collect samples from this population, calculate the means of the samples, and determine the probability distribution (shape) of the sample means. This distribution of sample means may follow a normal distribution, a t-distribution, or some other distribution. The distribution is then used to draw conclusions about the population mean.

• The sampling distribution of the sample mean (x̄) is the probability distribution of all possible values of the sample mean x̄.
• The sampling distribution of the sample proportion p̄ is the probability distribution of all possible values of the sample proportion p̄.

The process of sampling distribution is illustrated in Figure B.1.

Figure B.1: Process of Sampling Distribution


Note: 50 samples each of size n = 30 means that 50 different samples are drawn, where each sample contains 30 items. A probability distribution is similar to a frequency distribution; using the probability distribution, the shape of the sample means is determined.

Example B.1: Examining the Distribution of the Sample Mean x̄
The assembly time of a particular electrical appliance is assumed to have a
mean µ = 25 minutes, and a standard deviation σ = 5 minutes.

1. Draw 50 samples each of size 5 (n = 5) from this population using MINITAB or any other statistical package.
2. Determine the average or the mean of each of the samples drawn.
3. Draw a histogram of the sample means and interpret your findings.
4. Determine the average and standard deviation of the 50 sample
means. Interpret the meaning of these.
5. What conclusions can you draw from your answers to (3) and (4)?

Solution to (1): Table B.1 shows 50 samples, each of size 5, generated using MINITAB.

Solution to (2): The last column shows the mean of each sample drawn.
Note that each row represents a sample of size 5.

Table B.1: Fifty Samples of Size 5 (n = 5)


Sample    Observations (n = 5)    Mean, x̄
1 30.50 24.27 33.33 21.54 22.63 26.45
2 25.87 27.84 23.17 25.52 27.09 25.90
3 18.03 24.11 26.94 26.63 26.81 24.51
4 28.44 28.87 20.90 27.51 24.34 26.01
5 24.45 23.14 28.04 21.47 21.84 23.79
6 23.73 25.32 24.84 21.87 23.89 23.93
7 25.84 24.04 30.87 20.64 26.11 25.50
8 26.63 22.50 26.85 31.51 25.49 26.60
9 26.02 28.94 25.19 24.24 21.99 25.28
10 24.91 27.00 25.47 26.34 24.21 25.59
:
48 19.24 23.47 22.69 22.85 26.59 22.97
49 18.66 28.18 21.92 20.98 22.54 22.45
50 27.89 21.28 21.27 31.06 25.72 25.44

Solution to (3): Figure B.2 shows the histogram of the sample means
shown in the last column of Table B.1. The histogram shows that the

sample means are normally distributed. Figure B.2 is an example of the sampling distribution of the sample means x̄.
In a similar way, we can study the sampling distribution of other statistics, such as the sample variance or the sample standard deviation. As we will see later, the sampling distribution provides the distribution, or shape, of the sample statistic of interest. This distribution is useful in drawing conclusions.

Figure B.2 Sampling Distribution of the Sample Means

Solution to (4): The mean and standard deviation of the sample means
shown in the last column of Table B.1 were calculated using a computer
package. These values are shown in Table B.2.

Table B.2: Mean and Standard Deviation of Sample Means


Descriptive Statistics: Sample Mean
Mean StDev
25.0942 1.1035

The mean of the sample means is 25.0942, which indicates that the x̄ values are centered at approximately the population mean of µ = 25. However, the standard deviation of the 50 sample means is 1.1035, which is much smaller than the population standard deviation σ = 5. Thus, we

conclude that the x̄ values, the sample means, have much less variation than the individual observations.

Solution to (5): Based on parts (3) and (4), we conclude that the sample mean x̄ follows a normal distribution, and this distribution is much narrower than the distribution of individual observations. This is apparent from the standard deviation of x̄, which is 1.1035 (see Table B.2).
In general, the mean and standard deviation of the random variable x̄ are given as follows. The mean of the sample mean x̄ is

µx̄ = µ or E(x̄) = µ (i)

The standard deviation of the sample mean x̄ is

σx̄ = σ/√n (ii)

For our example, µ = 25, σ = 5, and n = 5. Using these values,

µx̄ = µ = 25 and σx̄ = σ/√n = 5/√5 = 2.236

From Table B.2, the mean and the standard deviation of the 50 sample means were 25.0942 and 1.1035, respectively. These values will get closer to 25 and 2.236 if we take more and more samples of size 5.
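The experiment in Example B.1 is easy to replicate in software other than MINITAB. A minimal Python sketch (the random seed is arbitrary, so the numbers will differ from Table B.1):

import numpy as np

rng = np.random.default_rng(12)                       # arbitrary seed
samples = rng.normal(loc=25, scale=5, size=(50, 5))   # 50 samples of size n = 5
means = samples.mean(axis=1)                          # one mean per sample

print(means.mean())          # close to mu = 25
print(means.std(ddof=1))     # close to sigma / sqrt(n) = 5 / sqrt(5) = 2.236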

Standard Deviation of the Sample Mean or the Standard Error

Both equations (i) and (ii) are of considerable importance. Equation (ii) shows that the standard deviation of the sample mean x̄ (or of the sampling distribution of the random variable x̄) varies inversely as the square root of the sample size. Since the standard deviation of the mean is a measure of the scatter of the sample means, it indicates the precision that we can expect of the mean of one or more samples. The standard deviation of the sample mean, σx̄, is often called the standard error of the mean. Using equation (ii), it can be shown that a sample of 16 observations (n = 16) is twice as precise as a sample of 4 (n = 4). It may be argued that the gain in precision in this case is small relative to the effort of taking 12 additional observations. However, doubling the sample size in other cases may be desirable.
Figure B.3 shows a comparison between the probability distribution
of individual observations and the probability distributions of means of
samples of various sizes drawn from the underlying population.
Note that as the sample size increases, the standard error becomes smaller and hence the distribution becomes more peaked. It is obvious from Figure B.3 that a sample of one does not tell us anything about the precision of the estimated mean. As larger samples are taken, the standard error decreases, providing greater precision.

Figure B.3 Probability Distribution of Sample Means (n = 4, 9, 16, and 36) Compared to Individual Observations

Central Limit Theorem


The other important concept in statistics and sampling is the central limit
theorem. The theorem states that as the sample size (n) increases, the dis-
tribution of the sample mean ( x ) approaches a normal distribution.

This means that if samples of large size (n ≥ 30) are selected from a
population, then the sampling distribution of the sample means is ap-
proximately normal. This approximation improves with larger samples.
The Central Limit Theorem has major applications in sampling and
other areas of statistics. It tells us that if we take a large sample (n ≥ 30) ,
we can use the normal distribution to calculate the probability and draw
conclusion about the population parameter.

• The central limit theorem has been proclaimed as "the most important theorem in statistics"1 and "perhaps the most important result of statistical theory."
• The central limit theorem can be proven to show the "amazing result" that the mean values of the sum of a large number of independent random variables are normally distributed.
• The probability distribution resulting from "a large number of individual effects . . . would tend to be Gaussian1 or Normal."

The above are useful results in drawing conclusions from the data. For
a  sample size of n = 30 or more (large sample), we can always use the
normal distribution to draw conclusions from the sample data.

• For a large sample, the sampling distribution of the sample mean (x̄) follows a normal distribution, and the probability that the sample mean (x̄) is within a specified value of the population mean (µ) can be calculated using the following formulas:

z = (x̄ − µ) / (σ/√n)   (for an infinite population) (iii)

or

z = (x̄ − µ) / [(σ/√n)√((N − n)/(N − 1))]   (for a finite population) (iv)

1. Ostle, Bernard and Mensing, Richard W., Statistics in Research, Third Edition, The Iowa State University Press, Ames, Iowa, 1979, p. 76.

In the above equations, n is the sample size and N is the population size. In a finite population, the population size N is known, whereas in an infinite population, the population size is infinitely large. Equation (iii) is for an infinite population, and equation (iv) is for a finite population.
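A short Python sketch illustrating equations (iii) and (iv); the numbers µ = 25, σ = 5, n = 36, and N = 200 are hypothetical, chosen only for illustration:

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 25, 5, 36
se = sigma / sqrt(n)                                  # standard error, equation (iii)
print(norm.cdf(26, mu, se) - norm.cdf(24, mu, se))    # P(24 <= xbar <= 26), about 0.77

# finite population of N = 200: apply the correction factor from equation (iv)
N = 200
se_finite = se * sqrt((N - n) / (N - 1))
print(norm.cdf(26, mu, se_finite) - norm.cdf(24, mu, se_finite))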
APPENDIX C

Review of Estimation,
Confidence Intervals, and
Hypothesis Testing

Estimation and hypothesis testing come under inferential statistics. Inferential statistics is the process of using sample statistics to draw conclusions about population parameters. Inference problems are those that involve inductive generalizations. For example, we use the statistics of a sample to draw conclusions about the parameters of the population from which the sample was taken. An example would be to use the average grade achieved by one class to estimate the average grade achieved in all ten sections of the same course. The process of estimating this average grade would be a problem of inferential statistics. In this case, any conclusion made about the ten sections would be a generalization, which may not be completely valid, so it must be stated how likely it is to be true.
Statistical inference involves generalization and a statement about the
probability of its validity. For example, an engineer or a scientist can make
inferences about a population by analyzing the samples. Decisions can
then be made based on the sample results. Making decisions or drawing
conclusions using sample data raises question about the likelihood of the
decisions being correct. This helps us understand why probability theory
is used in statistical analysis.

Tools of Inferential Statistics

Inferential tools allow a decision maker to draw conclusions about the population using the information in the sample data. There are two major tools of inferential statistics: estimation and hypothesis testing. Figure C.1 shows the tools of inferential statistics.

• Estimation is the simplest form of inferential statistics, in which a sample statistic is used to draw a conclusion about an unknown population parameter.
• An estimate is a numerical value assigned to the unknown population parameter. In statistical analysis, the calculated value of a sample statistic serves as the estimate. This statistic is known as the estimator of the unknown parameter.
• Estimation, or parameter estimation, comes under the broad topic of statistical inference.
• The objective of parameter estimation is to estimate the unknown population parameter using the sample statistic. Two types of estimates are used in parameter estimation: point estimates and interval estimates.

The parameters of a process are generally unknown; they change over time and must be estimated. The parameters are estimated using the techniques of estimation theory. Hypothesis testing involves making a decision about a population parameter using the information in the sample data. These techniques are the basis for most statistical methods.

Figure C.1: Tools of Inferential Statistics



Estimation and Confidence Intervals


Estimation

There are two types of estimates: (a) point estimates, which are single-value estimates of the population parameter, and (b) interval estimates, or confidence intervals, which are a range of numbers expected to contain the parameter with a specified degree of confidence known as the confidence level. The confidence level is a probability attached to a confidence interval that indicates the reliability of the estimate. In the discussion of estimation, we will also consider the standard error of the estimates, the margin of error, and the sample size requirement.

Point Estimate

As indicated, the purpose of a point estimate is to estimate the value of a population parameter using a sample statistic. The population parameters are µ, σ, p, etc.

A) The point estimate of the population mean (µ) is the sample mean (x̄):

x̄ = Σx / n

B) The point estimate of the population standard deviation (σ) is the sample standard deviation (s):

s = √[Σ(xi − x̄)² / (n − 1)]   or   s = √[(Σxi² − (Σxi)²/n) / (n − 1)]

C) The point estimate of the population proportion (p) is the sample proportion (p̄):

p̄ = x / n, where x = number of successes and n = sample size

Interval Estimate

An interval estimate provides an interval or range of values that is used to estimate a population parameter. To construct an interval estimate, we find an interval about the point estimate such that we can be highly confident that it contains the parameter to be estimated. An interval with high confidence means that it has a high probability of containing the unknown population parameter being estimated.
An interval estimate acknowledges that the sampling procedure is subject
to error, and therefore, any computed statistic may fall above or below its
population parameter target.
The interval estimate is represented by an interval or range of possible
values so it implies the presence of uncertainty. An interval estimate is
represented in one of the following ways:

16.8 ≤ µ ≤ 18.6

or

(16.8 to 18.6)

or

(16.8–18.6)

A formal way of writing an interval estimate is L ≤ µ ≤ U, where L is the lower limit and U is the upper limit of the interval. The symbol µ indicates that the population mean is being estimated. The interval estimate involves a certain probability known as the confidence level.

Confidence Interval Estimate


In many situations, a point estimate does not provide enough informa-
tion about the parameter of interest. For example, if we are estimating the
mean or the average salary for the students graduating with a bachelor’s
degree in business, a single estimate that would be a point estimate may
not provide the information we need. The point estimate would be the
sample average and will just provide a single estimate that may not be
meaningful. In such cases, an interval estimate of the following form is
more useful:

L≤µ≤U

It also acknowledges sampling error. The end points of this interval are random variables since they are a function of the sample data.
To construct an interval estimate of an unknown parameter β, we must find two statistics L and U such that

P{L ≤ β ≤ U} = 1 − α (v)

The resulting interval L ≤ β ≤ U is called a 100(1 − α) percent confidence interval for the unknown parameter β. L and U are known as the lower and upper confidence limits, respectively, and (1 − α) is known as the confidence level. A confidence level is the probability attached to a confidence interval. A 95 percent confidence interval means that the interval is estimated with a 95 percent confidence level or probability; that is, there is a 95 percent chance that the estimated interval includes the unknown population parameter being estimated.

Interpretation of Confidence Interval

The confidence interval means that if many random samples are collected and a 100(1 − α) percent confidence interval is computed from each sample for β, then 100(1 − α) percent of these intervals will contain the true value of β.
In practice, we usually take one sample and calculate the confidence interval. This interval may or may not contain the true value, and it is not reasonable to attach a probability level to this specific event. The appropriate statement is that β lies in the observed interval [L, U] with confidence 100(1 − α). That is, we don't know if the statement is true for this specific sample, but the method used to obtain the interval [L, U] yields a correct statement 100(1 − α) percent of the time. The interval L ≤ β ≤ U is known as a two-sided or two-tailed interval; we can also build one-sided intervals. The length of the observed confidence interval is an important measure of the quality of information obtained from the sample. The half interval (β − L) or (U − β) is called the accuracy of the estimator. A two-sided interval can be interpreted in the following way: The wider the confidence interval, the more confident we are that the interval actually contains the unknown population parameter being estimated.

On the other hand, the wider the interval, the less information we have about
the true value of β. In an ideal situation, we would like to obtain a relatively
short interval with high confidence.

Confidence interval for the mean, known variance σ² (or σ known)

The confidence interval estimate for the population mean is centered around the computed sample mean (x̄). The confidence interval for the mean is constructed based on the following factors:

A) the size of the sample (n),
B) the population variance (known or unknown), and
C) the level of confidence.

Let X be a random variable with an unknown mean µ and known variance σ². A random sample of size n (x1, x2, ..., xn) is taken from the population. A 100(1 − α) percent confidence interval on µ can be obtained by considering the sampling distribution of the sample mean x̄. We know that the sample mean x̄ approaches a normal distribution as the sample size n increases; for a large sample, the sampling distribution of the sample mean is almost always normal. The standardized statistic is given by:

z = (x̄ − µ) / (σ/√n) (vi)

The distribution of this statistic is normal and is shown in Figure C.2.


To develop the confidence interval for the population mean µ, refer to
Figure C.2.

Figure C.2: Distribution of the Sample Mean

From Figure C.2 we see that:

P{−zα/2 ≤ z ≤ zα/2} = 1 − α

or

P{−zα/2 ≤ (x̄ − µ)/(σ/√n) ≤ zα/2} = 1 − α

This can be rearranged to give:

P{x̄ − zα/2(σ/√n) ≤ µ ≤ x̄ + zα/2(σ/√n)} = 1 − α

This leads to:

x̄ − zα/2(σ/√n) ≤ µ ≤ x̄ + zα/2(σ/√n) (vii)

Equation (vii) is a 100(1 − α) percent confidence interval for the population mean µ.

The confidence interval formulas to estimate the population mean µ for known and unknown population variance or standard deviation

The confidence interval is constructed using a normal distribution. The following two formulas are used when the sample size is large:

A) Known population variance (σ²) or known standard deviation (σ):

x̄ − zα/2(σ/√n) ≤ µ ≤ x̄ + zα/2(σ/√n) (viii)

Note that the margin of error is given by

E = zα/2(σ/√n) (ix)

B) Unknown population variance (σ²) or unknown standard deviation (σ):

x̄ − tn−1,α/2(s/√n) ≤ µ ≤ x̄ + tn−1,α/2(s/√n) (x)

C) Unknown population variance (σ²) or unknown standard deviation (σ), large sample:

If the population variance is unknown and the sample size is large, the confidence interval for the mean can also be calculated using a normal distribution with the following formula:

x̄ ± zα/2(s/√n) (xi)

In this confidence interval formula, s is the sample standard deviation.

Confidence interval for the mean when the sample size is small and the population standard deviation σ is unknown

When σ is unknown and the sample size is small, use the t-distribution for the confidence interval. The t-distribution is characterized by a single parameter, the number of degrees of freedom (df), and its density function provides a bell-shaped curve similar to a normal distribution.
The confidence interval using the t-distribution is given by

x̄ − tn−1,α/2(s/√n) ≤ µ ≤ x̄ + tn−1,α/2(s/√n) (xii)

where tn−1,α/2 is the t-value from the t-table for (n − 1) degrees of freedom, and α is determined by the confidence level (1 − α).

Confidence interval for estimating the population proportion p

In this section, we discuss the confidence interval estimate for proportions. A proportion is a ratio, fraction, or percentage that indicates the part of the population or sample having a particular trait of interest. The following are examples of proportions: (1) a software company claiming that its manufacturing simulation software has 12 percent of the market share, (2) a public policy department of a large university studying the difference in proportion between male and female unemployment rates, and (3) a manufacturing company determining the proportion of defective items produced by its assembly line. In all these cases, it may be desirable to construct confidence intervals for the proportions of interest. The population proportion is denoted by p, whereas the sample proportion is denoted by p̄.
In constructing the confidence interval for the proportion:

1. The underlying assumptions of the binomial distribution hold.
2. The sample data collected are the results of counts (e.g., in a sample of 100 products tested for defects, 6 were found to be defective).
3. The outcome of the experiment (testing products from a randomly selected sample of 100 is an experiment) has two possible outcomes—"success" or "failure" (a product is found defective or not).
4. The probability of success (p) remains constant for each trial.

We consider the sample size (n) to be large. If the sample size is large and

np ≥ 5 and n(1 − p) ≥ 5

[where n = sample size, p = population proportion], the binomial distribution can be approximated by a normal distribution. In constructing the confidence interval for the proportion, we use a large sample size so that the normal distribution can be used.
The confidence interval is based on:

A) a large sample, so that the sampling distribution of the sample proportion (p̄) follows a normal distribution;
B) the value of the sample proportion; and
C) the level of confidence, which determines the z-value used.

The confidence interval formula is given by:

p̄ − zα/2 √(p̄(1 − p̄)/n) ≤ p ≤ p̄ + zα/2 √(p̄(1 − p̄)/n) (xiii)

In the above formula, p is the population proportion and p̄ is the sample proportion given by p̄ = x/n.

Sample Size Determination

Sample size (n) to estimate µ

Determining the sample size is an important issue in statistical analysis. To determine the appropriate sample size (n), the following factors are taken into account:

A) The margin of error E (also known as the tolerable error level or the accuracy requirement). For example, suppose we want to estimate the population mean salary within $500 or within $200. In the first case, the error E = 500; in the second case, E = 200. A smaller value of E means more precision is required, which in turn requires a larger sample. In general, the smaller the error, the larger the sample size.
B) The desired reliability or confidence level.
C) A good guess for σ.

Both the margin of error E and the reliability are choices that have an impact on the cost of sampling and the risks involved. The following formula is used to determine the sample size:

n = (zα/2)² σ² / E² (xiv)

where E = margin of error or accuracy (maximum allowable error) and n = sample size.

Sample size (n) to estimate p

The sample size formula to estimate the population proportion p is derived similarly to the sample size for the mean. The sample size is given by

n = (zα/2)² p(1 − p) / E² (xv)

where p = population proportion (if p is not known or given, use p = 0.5).

Example C.1

A quality control engineer is concerned about the bursting strength of a glass bottle used for soft drinks. A sample of size 25 (n = 25) is randomly obtained, and the bursting strength in pounds per square inch (psi) is recorded. The strength is considered to be normally distributed. Find a 95 percent confidence interval for the mean strength using both the t-distribution and the normal distribution. Compare and comment on your results.

Data: Bursting Strength (×10 psi)


26, 27, 18, 23, 24, 20, 21, 24, 19, 27, 25, 20, 24, 21, 26, 19, 21, 20, 25,
20, 23, 25, 21, 20, 21

Solution:

First, calculate the mean and standard deviation of the 25 values in the data (use a calculator or computer):

x̄ = 22.40   s = 2.723

The confidence interval using the t-distribution is calculated with the following formula:

x̄ ± tn−1,α/2 (s/√n)

A 95 percent confidence interval using this formula:

22.40 ± (2.064)(2.723/√25)
21.28 ≤ µ ≤ 23.52

The value 2.064 is the t-value from the t-table for n−1 = 24 degrees of
freedom and α/2 = 0.025.
The confidence interval using the normal distribution can be calculated with the formula below (using s in place of σ):

x̄ ± zα/2 (s/√n)
22.40 ± 1.96 (2.723/√25)

This interval is

21.33 ≤ µ ≤ 23.47

The confidence interval using the t-distribution is wider. This happens because, with a small sample size and an estimated standard deviation, there is more uncertainty involved.
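A minimal Python sketch (scipy is our choice, not part of the text) reproducing both intervals of Example C.1:

import numpy as np
from scipy import stats

strength = np.array([26, 27, 18, 23, 24, 20, 21, 24, 19, 27, 25, 20, 24,
                     21, 26, 19, 21, 20, 25, 20, 23, 25, 21, 20, 21])
n = len(strength)
xbar, s = strength.mean(), strength.std(ddof=1)
se = s / np.sqrt(n)

t_crit = stats.t.ppf(0.975, df=n - 1)                 # 2.064
z_crit = stats.norm.ppf(0.975)                        # 1.96
print(xbar - t_crit * se, xbar + t_crit * se)         # about 21.28 to 23.52
print(xbar - z_crit * se, xbar + z_crit * se)         # about 21.33 to 23.47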

Example C.2

The average life of a sample of 36 tires of a particular brand is 38,000 miles. If it is known that the lifetime of the tires is approximately normally distributed with a standard deviation of 3,600 miles, construct 80 percent, 90 percent, 95 percent, and 99 percent confidence intervals for the average tire life. Compare and comment on the confidence interval estimates.

Solution: Note the following data:

n = 36, x̄ = 38,000, σ = 3,600

Since the sample size is large (n ≥ 30) and the population standard deviation σ is known, the appropriate confidence interval formula is

x̄ ± zα/2 (σ/√n)

The confidence intervals using the above formula are shown below.

A) 80 percent confidence interval:
38,000 ± 1.28 (3,600/√36)
37,232 ≤ µ ≤ 38,768

B) 90 percent confidence interval:
38,000 ± 1.645 (3,600/√36)
37,013 ≤ µ ≤ 38,987

C) 95 percent confidence interval:
38,000 ± 1.96 (3,600/√36)
36,824 ≤ µ ≤ 39,176

D) 99 percent confidence interval:
38,000 ± 2.58 (3,600/√36)
36,452 ≤ µ ≤ 39,548

Note that the z-values in the above confidence interval calculations are
obtained from the normal table. Refer to the normal table for the values
of z. Figure C.3 shows the confidence intervals graphically.

Figure C.3: Effect of Increasing the Confidence Level on the Confidence Interval

Figure C.3 shows that the larger the confidence level, the wider the interval. With a wider interval we gain confidence: there is a higher chance that the true value of the parameter being estimated is contained in the interval, but at the same time we lose accuracy.
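A minimal Python sketch (using scipy, our choice) reproducing the four intervals of Example C.2; small differences from the values above come from using exact rather than rounded z-values:

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 38000, 3600, 36
for level in (0.80, 0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)                 # z for the given level
    e = z * sigma / sqrt(n)
    print(f"{level:.0%}: {xbar - e:,.0f} to {xbar + e:,.0f}")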

Example C.3

During an election year, the ABC news network reported that, according to its poll, 48 percent of voters were in favor of the Democratic presidential candidate, with a margin of error of ±3 percent. What does this mean? From this information, determine the sample size that was used in this study.

Solution: Polls conducted by the news media use a 95 percent confidence interval unless specified otherwise. Using a 95 percent confidence level, the confidence interval for the proportion is given by

p̄ ± 1.96 √(p̄(1 − p̄)/n)

The sample proportion is p̄ = 0.48. Thus, the confidence interval can be written as

0.48 ± 1.96 √(0.48(1 − 0.48)/n)

Since the margin of error is ±3 percent, it follows that

1.96 √(0.48(1 − 0.48)/n) = 0.03

Squaring both sides and solving for n gives

n = 1066

Thus, 1066 voters were polled.

Example C.4

A pressure seal used in an assembly must be able to withstand a maximum load of 6,000 psi before bursting. If the average maximum load of a sample of seals taken from a shipment is less than 6,000 psi, then quality control must reject the entire shipment. How large a sample is required if the quality engineer wishes to be 95 percent confident that the error in estimating this quantity is no more than 15 psi? That is, the probability that the sample mean differs from the population mean by no more than 15 psi should be 0.95. From past experience, it is known that the standard deviation of the bursting pressures of this seal is 150 psi.

Solution: The sample size n can be calculated using

n = (zα/2 σ / E)²

Since zα/2 = z0.025 = 1.96, σ = 150, and the error E = 15, the required sample size is:

n = ((1.96)(150)/15)² ≈ 385

Confidence Interval on the Difference between Two Means

The confidence intervals discussed in the previous sections involved a single population. In many cases, we are interested in comparing two populations. Suppose the average hourly wage of union and nonunion workers is of interest, and we want to find out whether there is a difference in wages between the two groups. In this case, we would construct a confidence interval on the difference between the two means. These intervals depend on what information is available. The different cases are explained below.

Case 1

Assumptions:

• the two populations are independent;
• the population variances are equal;
• the samples n1 and n2 are large and the population variances σ1² and σ2² are known, so that the normal distribution applies.

If the above assumptions hold, then the confidence interval for the difference between the two population means is given by

(x̄1 − x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2) (xvi)

or

(x̄1 − x̄2) − zα/2 √(σ1²/n1 + σ2²/n2) ≤ µ1 − µ2 ≤ (x̄1 − x̄2) + zα/2 √(σ1²/n1 + σ2²/n2) (xvii)

where x̄1, x̄2 = sample means from populations 1 and 2
n1, n2 = sizes of samples 1 and 2
σ1², σ2² = variances of the 1st and 2nd populations, respectively (known in this case)

Case 2

Assumptions:

• the two populations are independent;
• the population variances are equal;
• the samples n1 and n2 are large and the population variances σ1² and σ2² are unknown, which means the normal distribution can be applied using the sample variances.

If the above assumptions hold, then the confidence interval for the difference between the two population means is given by

(x̄1 − x̄2) ± zα/2 √(s1²/n1 + s2²/n2) (xviii)

or

(x̄1 − x̄2) − zα/2 √(s1²/n1 + s2²/n2) ≤ µ1 − µ2 ≤ (x̄1 − x̄2) + zα/2 √(s1²/n1 + s2²/n2) (xix)

where s1², s2² = sample variances of the 1st and 2nd samples, respectively
x̄1, x̄2 = sample means from populations 1 and 2
n1, n2 = sizes of samples 1 and 2

Case 3

Assumptions:

• the two populations are independent;
• the population variances are equal;
• the samples n1 and n2 are small (< 30) and the population variances σ1² and σ2² are unknown, so that the t-distribution applies.

If the above assumptions hold, then the confidence interval for the difference between the two population means is given by

(x̄1 − x̄2) ± tn1+n2−2,α/2 √(sp²(1/n1 + 1/n2)) (xx)

or

(x̄1 − x̄2) − tn1+n2−2,α/2 √(sp²(1/n1 + 1/n2)) ≤ µ1 − µ2 ≤ (x̄1 − x̄2) + tn1+n2−2,α/2 √(sp²(1/n1 + 1/n2)) (xxi)

where x̄1, x̄2 = sample means from populations 1 and 2, respectively
n1, n2 = sizes of samples 1 and 2
s1², s2² = sample variances of the 1st and 2nd samples, respectively
n1 + n2 − 2 = degrees of freedom
sp² is the "pooled" or combined variance, given by

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Note: In this problem, first calculate sp² and then substitute it into the equation to calculate the confidence interval.

Example C.5

The average hourly wage of union and nonunion workers is of interest. Assume that (i) the populations are independent and (ii) the variances of the populations are equal (σ1² = σ2²). Samples from both groups were taken, with the following results:

Union workers: n1 = 15, x̄1 = 14.54, s1 = 2.24
Nonunion workers: n2 = 20, x̄2 = 15.36, s2 = 1.99

A) Find the "pooled" estimate of the variance:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
    = [(14)(2.24)² + (19)(1.99)²] / 33 = 4.41

B) Find a 95 percent confidence interval for the difference in the mean hourly wage.

The confidence interval formula for this case:

(x̄1 − x̄2) ± tn1+n2−2,α/2 √(sp²(1/n1 + 1/n2))

t33,0.025 = 2.035

(14.54 − 15.36) ± (2.035)(0.72)
−0.82 ± 1.47

or −2.29 ≤ µ1 − µ2 ≤ 0.65

Note that this interval contains zero (−2.29 to 0.65). This means that zero is a plausible value for the difference; that is, we cannot conclude that there is a difference in the average wage of union and nonunion workers.
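A minimal Python sketch (scipy is our choice) reproducing the pooled-variance interval of this example:

from math import sqrt
from scipy.stats import t

n1, x1, s1 = 15, 14.54, 2.24     # union workers
n2, x2, s2 = 20, 15.36, 1.99     # nonunion workers

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df            # pooled variance, about 4.41
half = t.ppf(0.975, df) * sqrt(sp2 * (1 / n1 + 1 / n2))     # (2.035)(0.72)
print(x1 - x2 - half, x1 - x2 + half)                       # about -2.28 to 0.64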

Point and Interval Estimates for the Difference between Two Population Proportions

Consider two large populations. Let p1 and p2 denote the proportions of these two populations that have a certain characteristic of interest. Suppose we want to find the difference between the two population proportions. To do this, we take two random samples of size n1 and n2 from the two populations. If x1 is the number having the characteristic of interest in the first sample and x2 the number having the characteristic of interest in the second sample, then p̄1 and p̄2 are the sample proportions of the two samples. Note that

p̄1 = x1/n1 and p̄2 = x2/n2

The point estimate for the difference between the population proportions is given by

Point estimate = p̄1 − p̄2 (xxii)

The confidence interval for the difference between the population proportions is given by

(p̄1 − p̄2) ± zα/2 √(p̄(1 − p̄)(1/n1 + 1/n2)) (xxiii)

where p̄ is the "pooled" or combined proportion given by

p̄ = (x1 + x2)/(n1 + n2)   or   p̄ = (n1 p̄1 + n2 p̄2)/(n1 + n2)

Example C.6

A manufacturer of computer chips has two plants in two different cities. An improved method of producing the chips, suggested by the research and development department, is expected to reduce the proportion of defective chips. After implementing the new method in one of the plants, the manufacturer wanted to test whether the improved method actually reduced the proportion of defects. To do this, they collected a sample of 400 chips produced with the improved method and a sample of 450 chips produced with the old method. The numbers of defective chips were found to be 80 and 108, respectively. At 95 percent confidence, can we conclude that there is a difference in the proportion of defective chips?

Proportion defective using the improved method: p̄1 = x1/n1 = 80/400 = 0.20

Proportion defective using the old method: p̄2 = x2/n2 = 108/450 = 0.24

Combined or "pooled" proportion: p̄ = (x1 + x2)/(n1 + n2) = (80 + 108)/(400 + 450) = 0.221

The confidence interval:

(p̄1 − p̄2) ± zα/2 √(p̄(1 − p̄)(1/n1 + 1/n2))
(0.20 − 0.24) ± (1.96) √((0.221)(1 − 0.221)(1/400 + 1/450))
−0.04 ± 0.06

Thus, a 95 percent confidence interval is

−0.10 ≤ p1 − p2 ≤ 0.02

This interval contains zero, which means we cannot conclude, at the 95 percent confidence level, that the improved method differs from the old method in the proportion of defective chips produced.
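The two-proportion interval is equally quick to verify. A minimal sketch in Python, assuming scipy; the counts come from Example C.5, and the name pooled_p is ours.

```python
# Confidence interval for p1 - p2 (Example C.5 data)
from math import sqrt
from scipy import stats

x1, n1 = 80, 400    # defectives, improved method
x2, n2 = 108, 450   # defectives, old method
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
pooled_p = (x1 + x2) / (n1 + n2)            # combined proportion, ~0.221
z = stats.norm.ppf(1 - alpha / 2)           # 1.96
margin = z * sqrt(pooled_p * (1 - pooled_p) * (1 / n1 + 1 / n2))

diff = p1 - p2
print(f"95% CI: {diff - margin:.2f} to {diff + margin:.2f}")   # -0.10 to 0.02
```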
APPENDIX D

Hypothesis Testing for


One and Two Population
Parameters

Hypothesis Testing

A hypothesis is a statement about a population parameter. This statement


may come from a claim made by a manufacturer, a mathematical model,
a theory, design specifications, etc. For example, an automobile manufac-
turer may claim that they have come up with a new fuel injection system
design that provides an improved average mileage of 50 miles a gallon.
In such a case, we may want to test the claim by taking sample data. The
claim can formally be written as a hypothesis testing problem and can
be tested using a hypothesis testing procedure. Several population par-
ameters may be of interest. Sometimes we may be interested in testing
the average or the mean. In other cases, we may be interested in testing a
variance, a standard deviation, or a population proportion.
Many problems require us to decide whether a statement about some population parameter is true or false. This statement about the population parameter is called a hypothesis.
Hypothesis testing is the decision-making procedure about a state-
ment being true or false. The statement is about a population parameter
of interest, such as a population mean, population variance, or a popula-
tion proportion. It involves making a decision about a population param-
eter based on the information contained in the sample data.
Hypothesis testing is one of the most useful aspects of statistical in-
ference because many types of decision problems can be formulated as
hypothesis testing problems.

The control charts used in statistical process control are closely related to hypothesis testing. Hypothesis tests are also used in several quality control problems and form the basis of many of the statistical process control techniques.

Example D.1: Testing a Single Population Mean—An Example

An automobile manufacturer claims that their new hybrid model will


provide 60 miles per gallon on average because of an improved design. A
consumer group wants to test whether this claim is correct. They would
test a hypothesis stated formally as follows:

H0: μ = 60 mpg
H1: μ ≠ 60 mpg

Here, H0 is known as the null hypothesis and H1 (also written Ha) is


called the alternate hypothesis. The hypothesis written with an “equal to”
sign under the null hypothesis and a “not equal to” sign under the alter-
nate hypothesis is known as a two-sided or two-tailed test. A hypothesis
can also be one-sided or one-tailed. The alternate hypothesis is the opposite of
the null hypothesis. To test the validity of the above claim,

• The consumer group would gather the sample data and calculate
the sample mean, x .
• Compare the difference between the hypothesized value (µ) and
the value of the sample mean ( x ).
• If the difference is small, there is a greater likelihood that the hy-
pothesized value of the population mean is correct. If the difference
is large, there is less likelihood that the claim about the population
mean is correct.

In most cases, the difference between the hypothesized population


parameter and the actual sample statistic is neither so large that we reject
our hypothesis nor is it so small that we accept it. Thus, in hypothesis
testing, clear-cut solutions are not the rule.

Note that in hypothesis testing, the decision to reject or not to reject the
hypothesis is based on a single sample and therefore, there is always a chance
of not rejecting a hypothesis that is false, or rejecting a hypothesis that is
true. In fact, we always encounter two types of errors in hypothesis test-
ing. These are:

Type I error = α = P{Reject H0 | H0 is true}

Type II error = β = P{Fail to reject H0 | H0 is false}   (i)

We also use another term, known as the power of the test, defined as

Power = 1 − β = P{Reject H0 | H0 is false}

Thus, type I error, denoted by a Greek letter α (“alpha”), is the prob-


ability of rejecting a true null hypothesis and type II error, denoted by a
Greek letter β (“beta”), is the probability of not rejecting a null hypothesis
when it is false. In a hypothesis testing situation, there is always a possibil-
ity of making one of these errors.
The power of the test is the probability that a false null hypothesis is correctly rejected. The type I error α is selected by the analyst. Increasing the type I error α will decrease the type II error β, and decreasing α will increase β. Thus, the type I error is controlled by the analyst, and the type II error follows from that choice. The type II error is also a function of the sample size: the larger the sample size n, the smaller the β.
The type I error α is also known as the level of significance. In hypothesis testing, we specify a value of the type I error, or level of significance, and then design a test that will provide a small value of the type II error. Recall that the type I error is the probability of rejecting a true null hypothesis. Since we don't want this to happen, α is typically set to a low value such as 1 percent, 5 percent, or 10 percent. If you set α = 5 percent, there is a 5 percent chance of making an incorrect decision and a 95 percent chance of making the right decision. In hypothesis testing, there is never a 100 percent chance of making the right decision because the test is based on a single sample (large or small).
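The trade-off among α, β, and power can be made concrete with a short computation. The sketch below, in Python and assuming scipy, evaluates β and power for a right-tailed z-test; the values of mu0, true_mu, sigma, and n are hypothetical, chosen only to make the relationship visible.

```python
# How beta and power respond to the choice of alpha (right-tailed z-test)
from scipy import stats

mu0, true_mu, sigma, n = 24.0, 24.5, 1.5, 36   # hypothetical values
se = sigma / n**0.5

for alpha in (0.01, 0.05, 0.10):
    cutoff = mu0 + stats.norm.ppf(1 - alpha) * se         # reject H0 above this
    beta = stats.norm.cdf(cutoff, loc=true_mu, scale=se)  # P(fail to reject | H0 false)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Running the loop shows β falling (and power rising) as α increases, the inverse relationship described above; increasing n shrinks the standard error and therefore β as well.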

What does it mean to test a hypothesis at a 5 percent level of significance?

Suppose we want to test a hypothesis about a population mean and the


sample size is large (n ≥ 30) so that the sample mean follows a normal
distribution. If the level of significance α is set at 5 percent, it means that
we will reject the null hypothesis if the difference between the sample
statistic (in this case, x ) and the hypothesized population mean µ is so
large that it would occur on the average only 5 or fewer times in every 100
samples. See Figure D.1. This figure shows that if the sample statistic falls
in the “do not reject” area, we will not reject the null hypothesis. On the
other hand, if the sample value falls in the rejection areas, we will reject
the null hypothesis. Rejecting the null hypothesis means we conclude that the alternate hypothesis is true.
It is evident from the figure that by selecting a significance level α, the
areas of rejection and nonrejection are determined. In other words, we set
the boundaries that determine when to reject and when not to reject the
null hypothesis.
Note that the value of α is selected by the analyst and must be determined before the test is conducted. Without a significance level α, we have no rule for deciding when to reject and when not to reject a hypothesis.

Figure D.1 Rejection and Nonrejection Areas



In a hypothesis testing situation, there is always a possibility of mak-


ing one of the two types of errors (i.e., there is always a chance of reject-
ing a true null hypothesis; similarly, there is always a chance of accepting
a false null hypothesis). In hypothesis testing, we must consider the cost
and the risk involved in making a wrong decision. Also, we would like
to minimize the chance of making either a type I or type II error; there-
fore, it is desirable to have the probabilities of both type I and type II error
to be low.
In quality control, α is also known as the producer's risk because it denotes the probability of rejecting a good lot. The type II error β is referred to as the consumer's risk, as it indicates the probability of accepting a bad lot.

Testing a single population mean


Testing a population mean involves either a one-sided or a two-sided test. The hypothesis is stated in one of the following forms:

H0: μ = μ0                      H0: μ ≥ μ0                      H0: μ ≤ μ0
H1: μ ≠ μ0                      H1: μ < μ0                      H1: μ > μ0
Two-tailed or two-sided test    Left-tailed or left-sided test    Right-tailed or right-sided test

Note that μ0 is the hypothesized value. There are three possible cases
for testing the population mean. The test statistic or the formulas used to
test the hypothesis are given below.

Case (1): Testing a single mean with known variance or known population standard deviation σ and a large sample. In this case, the sample mean x̄ follows a normal distribution and the test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{(ii)}$$

Case (2): Testing a single mean with unknown variance or unknown population standard deviation σ and a large sample. In this case, the sample mean x̄ follows a normal distribution and the test statistic is given by

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{(iii)}$$

Case (3): Testing a single mean with unknown variance or unknown population standard deviation σ and a small sample (n < 30). In this case, the sample mean x̄ follows a t-distribution and the test statistic is given by

$$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{(iv)}$$

Note that s is the sample standard deviation and n is the sample size.
There are different ways of testing a hypothesis. These will be illus-
trated with examples.
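The case logic above is easy to capture in code. Below is a minimal sketch in Python; the function name one_sample_statistic and its structure are our own illustration of Cases (1) through (3), not a library routine.

```python
# Choosing between z and t for a one-sample test of the mean (Cases 1-3)
from math import sqrt

def one_sample_statistic(xbar, mu0, n, sigma=None, s=None):
    """Return the reference distribution and the test statistic value."""
    if sigma is not None:                       # Case (1): sigma known
        return "z", (xbar - mu0) / (sigma / sqrt(n))
    if n >= 30:                                 # Case (2): sigma unknown, large n
        return "z", (xbar - mu0) / (s / sqrt(n))
    return "t", (xbar - mu0) / (s / sqrt(n))    # Case (3): small n, t with n-1 df

print(one_sample_statistic(16.32, 16.0, 30, sigma=0.8))   # ('z', 2.19...)
```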

Example D.2 Formulating the Correct Hypothesis

A hypothesis test can be formulated as a one-sided or a two-sided test. If


we have a one-sided test, it can be a left-sided or a right-sided test.

Example of a left-sided test

Suppose a tire manufacturer claims that the average mileage provided by


a certain type of tire is at least 60,000 miles. A product research group
has received some complaints in the past and wants to check the claim.
They are interested in testing whether the average mileage is below 60,000
miles. They would formulate their hypothesis as follows:

H0: μ ≥ 60,000
H1: μ < 60,000

The hypothesis is formulated as shown above because the claim of at


least 60,000 miles is µ ≥ 60,000 and the opposite of this statement is µ <
60,000 miles. Since the null hypothesis is written with an “equal to” sign,
and this is the claim made by the manufacturer, the statement µ ≥ 60,000

is written under the null hypothesis, and the statement µ < 60,000 is
written under the alternate hypothesis.
Note that the alternate hypothesis is the opposite of the null hypothesis. This is an example of a left-sided test. A left-sided test rejects the null hypothesis (H0) when the sample evidence falls sufficiently below the hypothesized value of μ.
The alternate hypothesis is also known as the research hypothesis. If you
are trying to establish a certain hypothesis, then it should be written as
the alternate hypothesis.
The statement about the null hypothesis contains the claim or the
theory. Therefore, rejecting a null hypothesis is a strong statement. This is
the reason that the conclusion of a hypothesis test is stated as “reject the
null hypothesis” or “do not reject the null hypothesis.”

Example of a right-sided test:

A car manufacturer has made a significant improvement in the fuel injection system that is expected to provide improved gas mileage. The average mileage before the modification was 24 miles per gallon or less. The research group expects that the modified system will provide a significant improvement in gas mileage and would like to test the following hypothesis to show the improvement:

H0: μ ≤ 24
H1: μ > 24

Example of a two-sided test

A robot welder in an assembly line takes on average 1.5 minutes to finish a


welding job. If the average time taken to finish the job is higher or lower than
1.5 min, it will disrupt other activities along the production line. Since there
has been too much variation in the time it takes to perform the welding job
by the robot, the line supervisor wants to take a sample to check if the aver-
age time taken by the robot is significantly higher or lower than the average
of 1.5 minutes. The supervisor would be testing the following hypothesis:

H0: μ = 1.5
H1: μ ≠ 1.5

Example D.3

A new production method will be implemented if a hypothesis test sup-


ports the conclusion that the new method results in reduced production
cost per hour. If the current average operating cost per hour is $600 or
more, write the appropriate hypothesis. Also, state and explain the type I
and type II errors in this situation.

A) Write the appropriate null and alternate hypotheses.


The appropriate hypotheses to test the claim:

H0: μ ≥ 600
H1: μ < 600

B) State and explain the type I and type II errors in this situation.
Type I error: rejecting H0: μ ≥ 600 and concluding that the average production cost is less than $600 (μ < $600) when, in fact, it is $600 or more. Type II error: failing to reject H0 and concluding that the average operating cost is at least $600 when, in fact, it is less than $600.

Example D.4

In a production line, automated machines are used to fill beverage cans with an average fill volume of 16 ounces. If the mean fill volume falls above or below this figure, the production line must be stopped and remedial action taken. A quality control inspector samples 30 cans every hour, opens and measures the contents, tests the appropriate hypothesis, and decides whether to shut down the line to make adjustments. Write the appropriate hypothesis to be tested in this situation and perform the hypothesis test. A significance level of α = 0.05 is selected for the test. The sample results indicate a sample mean of 16.32 oz, and the population standard deviation is assumed to be 0.8 oz.

Solution: For this problem, the given data are:

n = 30, α = 0.05
σ = 0.8, x̄ = 16.32

1. State the null and alternate hypothesis

H0: μ = 16
H1: μ ≠ 16

Note that this is a two-sided test.


2. Determine the sample size or use the given sample size

The given sample size is n = 30 (large sample)

3. Determine the appropriate level of significance (α) or use the given value
of significance, α

The given level of significance α = 0.05

4. Select the appropriate distribution and test statistic to perform the test
The sample size is large and the population standard deviation is
known; therefore, use normal distribution with the following test
statistic:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

5. Based on step 3, find the critical value or values and the area or areas of rejection. Show the critical value(s) and the areas of rejection and nonrejection using a sketch

Figure D.2 Critical Values for a Two-Sided Test

This is a two-sided test. The level of significance α = 0.05 must be


split into two halves for a two-tailed test, with each tail area 0.025.
The critical value (z-value) for an area of 0.475 is 1.96 from the nor-
mal or z-table. The sketch is shown in Figure D.2.
6. Write the decision rule

Reject H0 if z > 1.96


or, if z < –1.96

7. Use the test data (sample data) and find the value of the test statistic

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{16.32 - 16}{0.8/\sqrt{30}} = 2.19$$

8. Determine whether the value of the test statistic falls in the rejection or nonrejection region; make the appropriate decision and state your conclusion in terms of the problem

z = 2.19 > z_critical = 1.96; therefore, reject H0

There is evidence of overfilling or underfilling. The line should be shut down.
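The eight steps above compress to a few lines of code. A minimal sketch in Python, assuming scipy and the summary statistics of Example D.4:

```python
# Two-sided z-test for Example D.4 (can fill volume)
from scipy import stats

n, xbar, sigma, mu0, alpha = 30, 16.32, 0.8, 16.0, 0.05

z = (xbar - mu0) / (sigma / n**0.5)        # test statistic, equation (ii)
z_crit = stats.norm.ppf(1 - alpha / 2)     # 1.96 for alpha = 0.05

print(f"z = {z:.2f}, critical value = +/-{z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")   # z = 2.19: reject
```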

Example D.5: Testing the hypothesis using the p-value approach

In the z-value approach to testing a hypothesis, we compare the test statistic value to the critical value of z to decide whether or not to reject the hypothesis. For example, in the previous example, the critical values were ±1.96 and the test statistic value was 2.19. Since the test statistic value of 2.19 is greater than the critical value of +1.96, we rejected the null hypothesis. In this method we compare like quantities (the test statistic value of z to the critical z-value). Stating the conclusion this way requires a predefined level of significance and may not tell us whether the computed test statistic value is barely in the rejection region or far into it. In other words, the information may sometimes be inadequate.

To overcome this problem, another approach, widely used in practice, is the p-value (probability value) approach. This method compares a probability computed from the data to a given probability.
You may recall that to test a hypothesis you must have the level of
significance α, which is decided by the analyst before conducting the
test. This α is the same as the type I probability and is often called the
given level of significance. In a hypothesis testing situation, the type I
probability is given or known. We then calculate the p-value or the prob-
ability based on the sample data. This is also the observed level of signifi-
cance. We then compare the given level of significance α to the p-value
(observed level of significance) to test and draw the conclusion about the
hypothesis.
The p-value is the probability (assuming that the null hypothesis is true)
of getting the value of the test statistic at least as extreme as or more ex-
treme than the value actually observed. The p-value is the smallest level of
significance at which the null hypothesis can be rejected. A small p-value, for example 0.05 or less, is a strong indicator that the null hypothesis is not true. The smaller the value of p, the stronger the evidence that the null hypothesis is false.
If the computed p-value is smaller than the given level of significance α, the null hypothesis H0 is rejected. If the p-value is greater than α, H0 is not rejected. For example, a p-value of 0.002 provides strong evidence against H0, while a p-value of 0.2356 provides little evidence against it. The p-value also gives insight into the strength of the decision: it tells us how much confidence we can place in rejecting the null hypothesis.

Example D.6: Testing the hypothesis using the p-value approach

We will test the hypothesis using p-value for the following two-sided test:

H0: μ = 15
H1: μ ≠ 15

The data for the problem: n = 50, x̄ = 14.2, s = 5, α = 0.02

Decision rule for the p-value approach:

If p ≥ α, do not reject H0
If p < α, reject H0

First, using the appropriate test statistic formula, calculate the test statistic
value.

Figure D.3 p-Values for the Example Problem



$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{14.2 - 15}{5/\sqrt{50}} = -1.13$$

This test statistic value of z = –1.13 will be converted into a probabil-


ity that we call p-value. This is shown in Figure D.3.
Area corresponding to z = 1.13 is 0.3708 (from z-table).

Probability of z > 1.13 = 0.5 – 0.3708 = 0.1292


Probability of z < –1.13 = 0.5 – 0.3708 = 0.1292

For a two-sided test, the p-value is the sum of the above two values, that is,
0.1292+0.1292 = 0.2584. Since p = 0.2584 > α = 0.02, do not reject H0.
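The same p-value can be obtained directly in code. A minimal sketch in Python, assuming scipy; the survival function sf gives the upper-tail probability, so doubling it yields the two-sided p-value.

```python
# p-value approach for the two-sided test of Example D.6
from scipy import stats

n, xbar, s, mu0, alpha = 50, 14.2, 5.0, 15.0, 0.02

z = (xbar - mu0) / (s / n**0.5)          # -1.13
p_value = 2 * stats.norm.sf(abs(z))      # area in both tails

print(f"z = {z:.2f}, p = {p_value:.4f}")              # p ~ 0.2585
print("Reject H0" if p_value < alpha else "Do not reject H0")
```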

Hypothesis Testing Involving Two Population Means

This section extends the concepts of hypothesis testing to two popula-


tions. Here we are interested in testing two population means. For ex-
ample, we may be interested in comparing the average salaries of male
and female employees for the same job or we may be interested in the dif-
ference between the average starting salaries of business and engineering
majors. In these cases, we would like to test whether the two population
means are equal or there is no difference between the two populations
(two-sided test). In other cases, we may want to test whether one population mean is larger or smaller than the other (one-sided test). The hypothesis testing procedure and steps are very similar to those for testing a single mean, but the data structure and the test statistics or formulas are different. In testing hypotheses involving two populations, samples are drawn from both populations. The hypotheses tested are explained below.

Hypothesis testing for the equality of two means or the differences


between two population means

Basic Assumptions:

1. The populations are independent
2. The population variances are equal (σ1² = σ2²)

The hypothesis for testing the two means can be a two-sided test or
a one-sided test. The hypothesis is written in one of the following ways:

A) Test if the two population means are equal or there is no difference


between the two means: a two-sided test

H0: μ1 = μ2  or  H0: μ1 − μ2 = 0
H1: μ1 ≠ μ2  or  H1: μ1 − μ2 ≠ 0   (v)

B) Test if one population mean is greater than the other: a right-sided


test

H0: μ1 ≤ μ2  or  H0: μ1 − μ2 ≤ 0
H1: μ1 > μ2  or  H1: μ1 − μ2 > 0   (vi)

C) Test if one population mean is smaller than the other: a left-sided test

H0: μ1 ≥ μ2  or  H0: μ1 − μ2 ≥ 0
H1: μ1 < μ2  or  H1: μ1 − μ2 < 0   (vii)

Since we want to study two population means, the sampling distribu-


tion of interest is the sampling distribution of the difference between the
sample means ( x1 − x 2 ) and the test statistic is based on the information
(data) we have. The following three cases and test statistics are used to test

the means. To test two means, the test statistics are selected based on the
following cases:

Case 1: Sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are known

If the sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are known, then the sampling distribution of the difference between the sample means follows a normal distribution and the test statistic is given by

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{(viii)}$$

where x̄1, x̄2 = sample means from populations 1 and 2, respectively
n1, n2 = sample sizes of samples 1 and 2
σ1², σ2² = variances of the 1st and 2nd populations, respectively (known in this case)

Case 2: Sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are unknown

If the sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are unknown, then the sampling distribution of the difference between the sample means follows a normal distribution and the test statistic is given by

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad \text{(ix)}$$

where x̄1, x̄2 = sample means from populations 1 and 2, respectively
n1, n2 = sample sizes of samples 1 and 2
s1², s2² = sample variances of the 1st and 2nd samples, respectively

Case 3: Sample sizes n1 and n2 are small (< 30) and the population variances σ1² and σ2² are unknown

If the sample sizes n1 and n2 are small (< 30) and the population variances σ1² and σ2² are unknown, then the sampling distribution of the difference between the sample means follows a t-distribution and the test statistic is given by

$$t_{n_1+n_2-2} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{(x)}$$

where x̄1, x̄2 = sample means from populations 1 and 2, respectively
n1, n2 = sample sizes of samples 1 and 2
s1², s2² = sample variances of the 1st and 2nd samples, respectively
n1 + n2 − 2 = degrees of freedom (df)
s_p² is the "pooled" or combined variance, given by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \quad \text{(xi)}$$

Important Note:
In equations (viii), (ix), and (x), the difference (μ1 − μ2) is zero in most cases. Also, these equations are valid under the following assumptions:

• The populations are independent and normally distributed
• The variances of the two populations are equal; that is, σ1² = σ2²

The assumption that the two population variances are equal may not be correct. In cases where the variances are not equal, the test statistic formula for testing the difference between the two means is different.
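With raw data, Case 3 is the classic pooled two-sample t-test, and it is available directly in statistical libraries. Below is a minimal sketch in Python, assuming scipy; the two samples are hypothetical values invented for illustration.

```python
# Pooled (equal-variance) two-sample t-test, equation (x), from raw data
from scipy import stats

sample1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]   # hypothetical measurements
sample2 = [11.6, 11.9, 11.5, 11.8, 11.7, 11.4]

# equal_var=True applies the pooled-variance statistic with n1 + n2 - 2 df
t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

When the equal-variance assumption is doubtful, passing equal_var=False gives Welch's test instead, which addresses the caution in the note above.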

Example D.7

Suppose that two independent random samples are taken from two pro-
cesses with equal variances and we would like to test the null hypothesis
that there is no difference between the means of two processes or the
means of the two processes are equal; that is,

H0: μ1 − μ2 = 0  or  H0: μ1 = μ2
H1: μ1 − μ2 ≠ 0  or  H1: μ1 ≠ μ2

The data from the two processes are given below:


Sample 1           Sample 2
n1 = 80            n2 = 70
x̄1 = 104           x̄2 = 106
s1 = 8.4           s2 = 7.6

α = 0.05

Note that n1 and n2 are large and σ1, σ2 are unknown; therefore, we use the normal distribution (Case 2). The test statistic for this problem is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Solution: The test can be carried out using the methods explained below.

Method (1): z-Value Approach


Critical values: The critical values and the decision areas based on
α = 0.05 are shown in Figure D.4. Decision Rule:

Reject H0 if z > 1.96


or if z < –1.96

Figure D.4 Critical Values and Decision Areas

Test Statistic Value:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(104 - 106) - 0}{\sqrt{\dfrac{(8.4)^2}{80} + \dfrac{(7.6)^2}{70}}} = -1.53$$

The test statistic value z = −1.53 > z_critical = −1.96; do not reject H0.

Method (2): p-value Approach

Calculate the p-value and compare it to α. Note that in this method we


compare a probability to a probability; that is, we compare the given level
of significance α or the type I probability to the probability obtained
from the data (or the observed level of significance). The decision rule and
the procedure are explained below.

Decision Rule: If p ≥ α, do not reject H0.
If p < α, reject H0.

Calculating p-value: The p-value is calculated by converting the


test-statistic value into a probability. In method (1), we calculated the test

Figure D.5 p-Values

statistic value z = −1.53. This test statistic value is converted to a probability (see Figure D.5). In the figure, z = 1.53 is the magnitude of the test statistic value from method (1) above. From the standard normal table, z = 1.53 corresponds to an area of 0.4370. The p-value is calculated as shown below.

Probability of z > 1.53 = 0.5 – 0.4370 = 0.0630


Probability of z < –1.53 = 0.5 – 0.4370 = 0.0630

In a two-sided test such as this one, the p-value is obtained by adding


the probabilities in both tails of the distribution. Thus,

p = 0.0630 + 0.0630 = 0.1260

Since p (= 0.1260) > α (= 0.05), do not reject H0.
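The two methods above condense to a few lines of code. A minimal sketch in Python, assuming scipy and using the summary statistics of this example:

```python
# Two-sample z-test for large samples with unknown variances, equation (ix)
from scipy import stats

n1, xbar1, s1 = 80, 104.0, 8.4
n2, xbar2, s2 = 70, 106.0, 7.6
alpha = 0.05

se = (s1**2 / n1 + s2**2 / n2) ** 0.5
z = (xbar1 - xbar2) / se                 # (mu1 - mu2) = 0 under H0
p_value = 2 * stats.norm.sf(abs(z))      # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")            # z = -1.53, p ~ 0.126
print("Reject H0" if p_value < alpha else "Do not reject H0")
```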

Testing Two Means for Dependent (Related) Populations or


Hypothesis Testing for Paired Samples (Paired t-Test)

In some situations, there may be a relationship between the data values of the samples taken from two populations; that is, the sample values from one population may not be independent of the sample values from the other population. The two populations may be considered dependent in such cases.
In cases where the populations are considered related, the observa-
tions are paired to prevent other factors from inflating the estimate of the
variance. This method is used to improve the precision of comparisons
between means. The method of testing the difference between the two
means when the populations are related is also known as the matched-sample test or the paired t-test.
We are interested in testing a two-sided or a one-sided hypothesis for
the difference between the two population means. The hypotheses can be
written as

H0: μd = 0                      H0: μd ≤ 0                      H0: μd ≥ 0
H1: μd ≠ 0                      H1: μd > 0                      H1: μd < 0
Two-tailed or two-sided test    Right-tailed or right-sided test    Left-tailed or left-sided test

Note: the difference d can be taken in either order, (sample 1 − sample 2) or (sample 2 − sample 1).

Test Statistic: If the paired data values x1i and x2i are related and are not independent, the average of the differences (d̄) follows a t-distribution and the test statistic is given by

$$t_{n-1} = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \quad \text{(xii)}$$

where

$\bar{d}$ = average of the differences = $\dfrac{\sum d_i}{n}$

$s_d$ = standard deviation of the differences, given by

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d_i^2 - \dfrac{(\sum d_i)^2}{n}}{n - 1}}$$

n = number of observations (sample size)

$t_{n-1,\,\alpha/2}$ = critical t-value from the t-table for (n − 1) degrees of freedom and the appropriate α

The confidence interval given below can also be used to test the hypothesis:

$$\bar{d} \pm t_{n-1,\,\alpha/2} \frac{s_d}{\sqrt{n}} \quad \text{(xiii)}$$
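A minimal sketch of the paired test in Python, assuming scipy; the before/after measurements below are hypothetical values for the same experimental units.

```python
# Paired t-test, equation (xii): differences taken on the same units
from scipy import stats

before = [7.2, 6.8, 7.5, 7.0, 6.9, 7.3, 7.1]   # hypothetical measurements
after  = [6.9, 6.5, 7.4, 6.7, 6.8, 7.0, 6.9]

# ttest_rel computes t = d-bar / (s_d / sqrt(n)) with n - 1 df (two-sided)
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```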

Summary
This section discussed three important topics that are critical to analyt-
ics. In particular, we studied sampling and sampling distribution, estima-
tion and confidence intervals, and hypothesis testing. Samples are used
to make inferences about the population and this can be done through
sampling distribution. The probability distribution of a sample statistic is
called its sampling distribution. We explained the central limit theorem
and its role in sampling, sampling distribution, and sample size deter-
mination. Besides sampling and sampling distribution, other key topics
covered included point and confidence interval estimates of means and
proportions.
Two types of estimates used in inferential statistics were discussed: (a) point estimates, which are single-value estimates of the population parameter, and (b) interval estimates, or confidence intervals, which are a range of values that contains the parameter with a specified degree of confidence known as the confidence level. The confidence level is a probability attached to a confidence interval that conveys the reliability of the estimate. In the discussion of estimation, we also covered the standard error of the estimates, the margin of error, and sample size determination.
We also discussed the concepts of hypothesis testing, which are directly related to the analysis methods used in analytics. Hypothesis testing is one of the most useful aspects of statistical inference. We provided several examples of formulating and testing hypotheses about population means and proportions. Hypothesis tests are used in assessing the validity of regression methods, and they form the basis of many of the assumptions underlying the analytical methods discussed in this book.
About the Author
Dr. Amar Sahay is a professor of decision
sciences engaged in teaching, research,
consulting, and training. He holds a BS in
production engineering (BIT, India), MS
in industrial engineering, and a PhD in
mechanical engineering --both from the
University of Utah, USA. He has taught
and is teaching at several institutions in
Utah, including the University of Utah
(School of Engineering and Manage-
ment), SLCC, Westminster College, and
others. Amar is a certified Six Sigma Master Black Belt and is also lean
manufacturing/lean management certified. He has contributed a number
of research papers in national and international journals/proceedings to
his credit. Amar has authored around 10 books in the areas of data visual-
ization, business analytics, Six Sigma, statistics and data ­analysis, model-
ing, and applied regression. He is also associated with QMS Global LLC,
a company engaged in data visualization, analytics, quality, lean six sigma,
manufacturing, and systems analysis services. Amar is a senior member
of the Industrial & Systems Engineers, the American Society for Quality
(ASQ), and Data Science.
Index
Advanced analytics, 39, 48 in modern business decision, 4–5
Analysis of variance (ANOVA), 86, 98 overall process, 25
Analytics. See also specific analytics overview of, 263
applications of, 43–46 statistical analysis, 265, 267
and business analytics, 2–4 tools of, 6–16, 31
defined, 2, 34–35 types of, xi–xii, 3, 5–6, 24
purpose of, 46 Business intelligence (BI), 5, 263,
types of, 46–48 265–266
ANN. See Artificial neural networks with analytics, 39
Anomaly detection, 253 applications of, 40–43, 49–50
ANOVA. See Analysis of variance versus business analytics, 29–32,
Artificial intelligence, 86 51–54
Artificial neural networks (ANN), 13, in companies, 48–49
86, 259–260 defined, 29, 38
Association learning, 251 origin of, 38–39
Associative forecasting techniques, overview of, 23–24
198–199, 236 success factors for
Autocorrelation, 166–167 implementation, 51
Average forecast error. See Mean error and support systems, 40
tools of, 31, 41
BA. See Business analytics Business process management (BPM),
Bayes’ theorem, 291 42–44
BI. See Business intelligence Business reporting, 38–39, 42
Bias, 207–208
Big data, 3, 6, 39 Central limit theorem, 326–328
BPM. See Business process Classical approach of probability,
management 285–286
Business analytics (BA). See also Classification technique, 253
specific analytics versus clustering technique,
analytics and, 2–4 254–257
applications and implementation Clustering technique, 253–254
of, 16–18 versus classification technique,
broad areas of, 263–264 254–257
versus business intelligence, 29–32, Coefficient of correlation (r),
51–54 125–126
certification and online courses detecting multicollinearity by
in, 277 calculating, 169–170
defined, xi, 3, 35 Coefficient of determination (r2)
future of, 276–277 multiple regression model, 156–157
introduction to, 2 simple regression model, 122–125
models of, 269–272 using EXCEL, 129–132

Combinations, 284–285 Data summarization, 258


Confidence interval estimate, 78–79, Data transformation, 36, 37, 248
331–338 Data warehouse, 37
on difference between two means, Decision making, 283–318
344–349 Decision tree algorithm, 253
Confirmatory data analysis, 268 Deep learning, 13, 86, 260–261
Continuous random variables, Descriptive analytics, xii, 3,
296–300 6–7, 269
Correlation analysis, 80–81, 104 applications of, 16–18
CRM. See Customer relationship buying pattern of online customers
management in large department store, case
Customer relationship management study, 58–69
(CRM), 17 objective of, 24
Cyclical pattern, 201 overview of, 57–58
versus predictive analytics, 71–72,
DA. See Data analytics 249, 251
Dashboards, 38, 67–68 tools of, 7–8, 26
Data, 36–37 Descriptive statistics, 265, 267
scatter plot of, 118 Discrete random variable, 292–296
Data analysis, 35 Double moving average technique,
Data analytics (DA), 3, 5, 267–268 202–203, 224–228
applications of, 37–38, 269 Dummy or indicator variables, 91
defined, 35–36 example for, 185–192
future of, 276–277 in multiple regression, 181–182
prerequisites to, 36–37 at three levels, 183–185
requirements of, 36 at two levels, 182–183
Data cleaning. See Data cleansing
Data cleansing, 36, 37, 247 EDA. See Exploratory data analysis
Data-driven models, predictive Enterprise reporting. See Business
analytics, 72, 74 reporting
Data integration, 248 Equally likely events, 284
Data mining, 4–5, 8, 17–18, 43, 249 probabilities of, 287
application areas of, 242, 255 Estimation, 78–79, 330, 331
defined, 241 confidence interval, 331–333
descriptive versus predictive, 249, point, 331
251 Event, 283
introduction to, 240–241 EXCEL, 118
and knowledge discovery in coefficient of determination using,
databases, 243–249 129–132
and machine learning, 11–12, 243, multiple regression model using,
258–259 147–148
origin and areas of interaction, 243 second-order model using, 177
in predictive analytics, 86, 99–100 simple regression model using,
reasons for, 241 127–129
tasks, 249–251 Exhaustive events, 284
time series analysis, 255, 257 Expected value. See Mean
Data quality, 36–37 Experiment, 283
Data selection, 248 Exploratory data analysis (EDA), 268

F-distribution, 317–318 estimation, 331–332


F-test, 157–161 sample size determination, 338–344
Filling slots. See Multiple-step tools of, 330
experiment INFORMS. See Institute of
First-order model, 172 Operations Research and
Fitted regression line, interpretation Management Science
of, 121 Institute of Operations Research
Forecast error, defined, 206 and Management Science
Forecasting (INFORMS), 24
accuracy measurement, 205–208 Internet of Things (IOT), 6–7
associative, 198 Interval estimate, 331–332
based on averages, 209–210 IOT. See Internet of Things
classification of, 197
common patterns in, 200–205 Judgmental forecasting. See
comparison for best method Qualitative forecasting
selection, 228–236
double moving average technique, K-Nearest Neighbor algorithm, 253
224–228 Key performance indicators (KPIs),
elements, 199 48
features of, 198 Knowledge discovery in databases
introduction to, 196–197 (KDD). See Data mining
models and techniques, 8–9, 85, KPIs. See Key performance indicators
93–97, 199–200
Naïve method, 208–209 Least squares method, 108, 110–114
process steps, 199 illustration of, 114–117
simple exponential smoothing Least squares multiple regression
method, 219–224 model, 142–144
simple moving average method, Linear regression, 105
210–217 regression model, 105–108
time series, 198 Linear trend model, 199, 204
weighted moving average method, Log transformation, 92
217–219 Logic-driven models, predictive
analytics, 72–73
Horizontal or constant pattern, Logistic regression model, 92
200, 202
Hypothesis, defined, 351 Machine learning, 11, 86, 100
Hypothesis testing, 79–80, 351–355 applications of, 259–261
each of three independent variables and data mining, 11–12, 243,
is significant, 162–164 258–259
on individual regression coefficients, tasks, 12–13, 259
161–162 MAE. See Mean absolute error
p-value approach, 161, 164–166 MAPE. See Mean absolute percentage
single population mean, 355–363 error
two population means, 363–371 Matrix plots, 148–153
Mean, 294
Inferential statistics, 267, 329 standard deviation of, 325–326
confidence intervals. See Confidence Mean absolute deviation (MAD). See
interval estimate Mean absolute error (MAE)

Mean absolute error (MAE), 206–207 Naïve forecasting method, 208–209


Mean absolute percentage error Natural language, 45
(MAPE), 207 Neural network (NN). See Artificial
Mean error, 206 neural networks (ANN)
Mean squared error (MSE), 207 Non-mutually exclusive events,
Metrics, 49–50 addition law for, 287
MINITAB, 7, 117 Nonlinear regression model, 89–90
multiple regression model using, Normal distribution, 300–302
147–148 assessing normality, 308, 310–314
second-order model using, 174–177 versus t-distribution, 315–317
simple regression model using,
132–133 OLAP. See Online analytical
Model building processing
key features of, 181 OLTP. See Online transaction
overview of, 171 processing
quadratic model. See Second order Online analytical processing (OLAP),
model 29, 37, 39, 42
with qualitative independent Online transaction processing
variables. See Dummy or (OLTP), 42
indicator variables Operations management, 42–43
with single quantitative Optimization models, 274, 274–275
independent variable, Ordinary language. See Natural
171–173 language
Modeling, xv Outlier detection. See Anomaly
MSE. See Mean squared error detection
Multicollinearity, 166–167
detecting, 168–170 P-value approach, 161
effects of, 167–168 hypothesis testing using, 361–363,
Multiple regression model, 88, 368–369
140–142, 153–154 Permutations, 284
assumptions of, 146–147 Point estimate, 331
autocorrelation, 166–167 Predictive analytics, xiii, 3–4, 8,
coefficient of determination, 271–272
156–157 applications of, 10–13, 17,
computer analysis of, 147–148 251–254
F-test, 157–161 artificial intelligence, 272–273
hypothesis tests in, 157 background and prerequisites to,
hypothesis tests. See Hypothesis 274–276
testing data-driven models, 72, 74
key features of, 170–171 deep learning, 273
least squares, 142–144 versus descriptive analytics, 71–72,
multicollinearity. See 249, 251
Multicollinearity logic-driven models, 72–73
standard error of estimate, 155 machine learning, 273
with two quantitative variables, objective of, 24
145–146 prerequisites and background for,
Multiple-step experiment, 284 9–10, 74–81
Mutually exclusive events, 283 tools of, xiv, 8–9, 11–14, 27,
addition law for, 286 83–100, 251–254

Predictive modeling. See Predictive Regression line


analytics estimated equation of, 108–109,
Prescriptive analytics, xiv–xv, 14, 274, 119–120
274–276 fitted, interpretation of, 121
applications of, 18 making predictions using, 121
objective of, 24 Regression models, 8, 85, 92
tools of, 14–16, 28 implementation steps and strategy
Probability, 75 for, 194
Probability density function, 300 overview of, 192–193
Probability distributions, 76–77, 291 Reinforcement learning, 12
Probability plots, 306–310 Relative frequency approach, 286
Probability theory, 283–291 Reporting. See Business reporting
Process mining, 44 Risk analysis, 48

Quadratic model. See Second order Sample size determination, 338–344


model Sample space, 283
Qualitative forecasting, 197 Sampling, 77, 319–321
Qualitative independent variables. See Sampling distribution, 77–78,
Dummy or indicator variables 319–321, 321–325
Quantitative forecasting, 197 SAS, 7
Scatter plots, 148–149, 151
R, 7, 127 Seasonal patterns, 200, 204
Random fluctuations, 201 trend and, 201
Random variable, 292 Second order model, 172, 203, 205
continuous, 296–300 analysis of computer results,
discrete, 292–296 177–180
Raw data, 36–37 examples for, 173–174, 180
Reciprocal transformation, 92 residual plots for, 175–177
Regression analysis. See Regression using EXCEL, 177
models using MINITAB, 174–175
assumptions of, 136–140 Simple exponential smoothing
coefficient of correlation, 125–126 method, 219–224
introduction to, 104 Simple moving average method,
least squares method, 108, 110–117 210–217
linear regression, 105–108 Simple regression model, 87
model adequacy test, 136 coefficient of determination,
model building. See Model building 122–125, 129–132
multiple regression. See Multiple main features of, 126
regression model problem analysis of, 117–118
output analysis, 134–136 standard error of estimate, 122
regression line. See Regression line using EXCEL, 127–129
scatter plot of data, 118 using MINITAB, 132–133
simple regression. See Simple Simulation models, 48, 249, 255
regression model Single quantitative independent
standard error of estimate, 122 variable, 171–173
using EXCEL, 127–129 Stable or constant process. See
using MINITAB, 132–133 Horizontal or constant pattern
Regression equation, 154 Standard deviation, 294
interpreting, 155 of mean, 325–326

Standard error of estimate, 122, 155 Time series analysis, 255, 257
Standard normal distribution, 302–306 Time series forecasting, 198
Stata, 127 Tracking signal, 207–208
Statistical analysis, 265 Trend, 200, 202–203
data analytics, 267–268 forecasting data with, 224–228
descriptive statistics, 265, 267 and seasonal patterns, 201
inferential statistics, 267
Statistical dependence, 289–290 Unsupervised learning, 12
Statistical independence, 288–289
Statistical inference, 321. See also Variables, exploring relationships, 72
Inferential statistics Variance, 294
Subjective probability, 286 Variance inflation factor (VIF),
Supervised learning, 12 detecting multicollinearity
using, 168–169
t-distribution, 314–315 VIF. See Variance inflation factor
versus normal distribution,
315–317 Web analytics, 47–48
t-test, 129–132, 161–162 Weighted moving average method,
Text analytics, 45–46 217–219
Text data mining. See Text mining
Text mining, 44–45 z-value approach, hypothesis testing,
Third order model, 172–173 367–368