Umesh R. Hodeghatta and Umesha Nayak

Practical Business Analytics Using R and Python
Solve Business Problems Using a Data-driven Approach
2nd ed.
Umesh R. Hodeghatta, Ph.D
South Portland, ME, USA

Umesha Nayak
Bangalore, Karnataka, India

ISBN 978-1-4842-8753-8 e-ISBN 978-1-4842-8754-5


https://doi.org/10.1007/978-1-4842-8754-5

© Umesh R. Hodeghatta, Ph.D and Umesha Nayak 2017, 2023

Apress Standard

The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been
made.

This Apress imprint is published by the registered company APress
Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
Foreword
We live in an era where mainstream media, business literature, and
boardrooms are awash in breathless hype about the data economy, AI
revolution, Industry/Business 4.0, and similar terms that are used to
describe a meaningful social and business inflection point. Indeed,
today’s discourse describes what seems to be an almost overnight act of
creation that has generated a new data-centric paradigm of societal and
business order that, according to the current narrative, is without
precedent and requires the creation of an entirely new set of
practitioners and best practices out of thin air. In this brave new world,
goes the narrative, everyone has already been reinvented as a “data-
something” (data scientist, data analyst, data storyteller, data citizen,
etc.) with the associated understanding of what that means for
themselves and for business. These newly minted “data-somethings”
are already overhauling current practice by operationalizing data-
driven decision-making practices throughout their organizations. They
just don’t know it yet.

Look under the covers of most operating organizations, however,
and a different picture appears. There is a yawning chasm of skillsets,
common language, and run-rate activity between those operating in
data-centric and data-adjacent roles and those in many operating
functions. As such, in many organizations, data-centric activity is
centered in only a few specialized areas or is performed only when
made available as a feature in the context of commercial, off-the-shelf
software (COTS). Similarly, while there is certainly more data volume
and more variation in the said data and all of that varied data is arriving
at a much higher velocity than in the past, organizations often do not
possess either the infrastructure sophistication or the proliferation of
skillsets to validate whether the data is any good (veracity of data) and
actually use it for anything of value. And with the exception of a few
organizations for which data is their only or primary business, most
companies use data and associated analytic activities as a means to
accomplish their primary business objective more efficiently, not as an
end in and of itself.
A similar discontinuity occurred a few decades ago, with the
creation of the Internet. In the years that followed, entire job categories
were invented (who needed a “webmaster” or “e-commerce developer”
in 1990?), and yet it took almost a decade and a half before there was a
meaningful common language and understanding between
practitioners and operational business people at many organizations.
The resulting proliferation of business literature tended to focus on
“making business people more technical” so they could understand this
strange new breed of practitioners who held the keys to the new world.
And indeed, that helped to a degree. But the accompanying reality is
that the practitioners also needed to be able to understand and
communicate in the language of businesses and contextualize their
work as an enabler of business rather than the point of business.
Academia and industry both struggled to negotiate the balance
during the Internet age and once again struggle today in the nascent
data economy. Academia too often errs on the side of theoretical
courses of study that teach technical skills (mathematics, statistics,
computer science, data science) without contextualizing the
applications to business, while on the other hand, industry rushes to
apply techniques they lack the technical skill to properly understand to
business problems. In both groups, technologists become overly
wedded to a given platform, language, or technique at the expense of
leveraging the right tool for the job. In all cases, the chasm of skills and
common language among stakeholders often leads to incorrect
conclusions, under-utilized analytics, or both. Neither is a good
outcome.
There is space both for practitioners to be trained in the practical
application of techniques to business context and for business people to
not only understand more about the “black box” of data-centric activity
but be able to perform more of that activity in a self-service manner.
Indeed, the democratization of access to both high-powered computing
and low-code analytical software environments makes it possible for a
broader array of people to become practitioners, which is part of what
the hype is all about.
Enter this book, which provides readers with a stepwise walk-
through of the mathematical underpinnings of business analytics
(important to understand the proper use of various techniques) while
placing those techniques in the context of specific, real-world business
problems (important to understand the appropriate application of
those techniques). The authors (both of whom have longstanding
industry experience together, and one of whom is now bringing that
experience to the classroom in a professionally oriented academic
program) take an evenhanded approach to technology choices by
ensuring that currently fashionable platforms such as R and Python are
represented primarily as alternatives that can accomplish equivalent
tasks, rather than endpoints in and of themselves. The stack-agnostic
approach also helps readers prepare as to how they might incorporate
the next generation of available technology, whatever that may be in the
future.
I urge you, as would-be practitioners in business, to read this book
with the associated business context in mind. Just as with the dawn of
the Internet, the true value of the data economy will only begin to be
realized when all the “data-somethings” we work with act as
appropriately contextualized practitioners who use data in the service
of the business of their organizations.

Dan Koloski
Professor of the Practice and Head of Learning Programs, Roux Institute
at Northeastern University
October 2022
Dan Koloski is a professor of the practice in the analytics program
and director of professional studies at the Roux Institute at
Northeastern University.
Professor Koloski joined Northeastern after spending more than 20
years in the IT and software industry, working in both technical and
business management roles in companies large and small. This
included application development, product management and
partnerships, and helping lead a spin-out and sale from a venture-
backed company to Oracle. Most recently, Professor Koloski was vice
president of product management and business development at Oracle,
where he was responsible for worldwide direct and channel go-to-
market activities, partner integrations, product management,
marketing/branding, and mergers and acquisitions for more than $2
billion in product and cloud-services business. Before Oracle, he was
CTO and director of strategy of the web business unit at Empirix, a role
that included product management, marketing, alliances, mergers and
acquisitions, and analyst relations. He also worked as a freelance
consultant and Allaire-certified instructor, developing and deploying
database-driven web applications.
Professor Koloski earned a bachelor’s degree from Yale University
and earned his MBA from Harvard Business School in 2002.
Preface
Business analytics, data science, artificial intelligence (AI), and machine
learning (ML) are hot words right now in the business community.
Artificial intelligence and machine learning systems are enabling
organizations to make informed decisions by optimizing processes,
understanding customer behavior, maximizing customer satisfaction,
and thus accelerating overall top-line growth. AI and machine learning
help organizations by performing tasks efficiently and consistently, thus
improving the overall level of customer satisfaction.
In financial services, AI models are designed to help manage
customers’ loans, retirement plans, investment strategies, and other
financial decisions. In the automotive industry, AI models can help in
vehicle design, sales and marketing decisions, customer safety features
based on driving patterns of the customer, recommended vehicle type
for the customer, etc. This has helped automotive companies to predict
future manufacturing resources needed to build, for example, electric
and driverless vehicles. AI models also help them in making better
advertisement decisions.
AI can play a big role in customer relationship management (CRM),
too. Machine learning models can predict consumer behavior, start a
virtual agent conversation, and forecast trends, all of which can improve
efficiency and response time.
Recommendation systems (AI systems) can learn users’ content
preferences and can select customers’ choice of music, book, game, or
any items the customer is planning to buy online. Recommendation
systems can reduce return rates and help create better targeted content
management.
Sentiment analysis using machine learning techniques can predict
the opinions and feelings of users of content. This helps companies to
improve their products and services by analyzing the customers’
reviews and feedback.
These are a few sample business applications, but this can be
extended to any business problem provided you have data; for example,
an AI system can be developed for HR functions, manufacturing,
process engineering, IT infrastructure and security, software
development life cycle, and more.
There are several industries that have begun to adopt AI into their
business decision process. Investment in analytics, machine learning,
and artificial intelligence is predicted to triple in 2023, and by 2025, it
is predicted to become a $47 billion market (per International Data
Corp.). According to a recent research survey in the United States,
nearly 34 percent of businesses are currently implementing or plan to
implement AI solutions in their business decisions.
Machine learning refers to the learning algorithms that AI systems use to
make decisions based on past data (historical data). Some of the
commonly used machine learning methods include neural networks,
decision trees, k-nearest neighbors, logistic regression, cluster analysis,
association rules, deep neural networks, hidden Markov models, and
natural language processing. Availability and abundance of data, lower
storage and processing costs, and efficient algorithms have made
machine learning and AI a reality in many organizations.
AI will be the biggest disruptor to the industry in the next five years.
This will no doubt have a significant impact on the workforce. Though
many say AI can replace a significant number of jobs, it can actually
enhance productivity and improve the efficiency of workers. AI systems
can help executives make better business decisions and allow
businesses to work on resources and investments to beat the
competition. When decision-makers and business executives make
decisions based on reliable data and recommendations arrived at
through AI systems, they can make better choices for their business,
investments, and employees, thus enabling their business to stand out
from the competition.
There are currently thousands of jobs posted on job portals in
machine learning, data science, and AI, and it is one of the fastest-
growing technology areas, according to the Kiplinger report of 2017.
Many of these jobs are going unfilled because of a shortage of qualified
engineers. Apple, IBM, Google, Facebook, Microsoft, Walmart, and
Amazon are some of the top companies hiring data scientists in
addition to other companies such as Uber, Flipkart, Citibank, Fidelity
Investments, GE, and many others including manufacturing, healthcare,
agriculture, and transportation companies. Many open job positions are
in San Jose, Boston, New York, London, Hong Kong, and many other
cities. If you have the right skills, then you can be a data scientist in one
of these companies tomorrow!
A data scientist/machine learning engineer may acquire the
following skills:
• Communication skills to understand and interpret business
  requirements and present the final outcome
• Statistics, machine learning, and data mining skills
• SQL, NoSQL, and other database knowledge
• Knowledge of accessing XML data, connecting to databases, writing
  SQL queries, reading JSON, reading unstructured web data, and
  accessing big data stores such as HDFS and NoSQL databases such as
  MongoDB, Cassandra, Redis, Riak, CouchDB, and Neo4j
• Coding skills: Python, R, Java, or C++
• Tools: Microsoft Azure, IBM Watson, SAS
This book aims to cover the skills required to become a data
scientist. This book enables you to gain sufficient knowledge and skills
to process data and to develop machine learning models. We have made
an attempt to cover the most commonly used learning algorithms and
to develop models using open-source tools such as R and Python.
Practical Business Analytics Using R and Python is organized into five
parts. The first part covers the fundamental principles required to
perform analytics. It starts by defining the sometimes confusing
terminologies that exist in analytics, job skills, tools, and technologies
required for an analytical engineer, before describing the process
necessary to execute AI and analytics projects. The second and
subsequent chapters cover the basics of math, probability theory, and
statistics required for analytics, before delving into SQL, the business
analytics process, exploring data using graphical methods, and an in-
depth discussion of how to evaluate analytics model performance.
In Part II, we introduce supervised machine learning models. We
start with regression analysis and then introduce different
classification algorithms, including naïve Bayes, decision trees, logistic
regression, and neural networks.
Part III discusses time-series models. We cover the most commonly
used models including ARIMA.
Part IV covers unsupervised learning and text mining. In
unsupervised learning, we discuss clustering analysis and association
mining. We end the section by briefly introducing big data analytics.
In the final part, we discuss the open-source tools, R and Python,
and using them in programming for analytics. The focus here is on
developing sufficient programming skills to perform analytics.

Source Code
All the source code used in this book can be downloaded from
https://github.com/apress/practical-business-analytics-r-python.
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit www.apress.com/source-code.
Table of Contents
Part I: Introduction to Analytics
Chapter 1: An Overview of Business Analytics
1.1 Introduction
1.2 Objectives of This Book
1.3 Confusing Terminology
1.4 Drivers for Business Analytics
1.4.1 Growth of Computer Packages and Applications
1.4.2 Feasibility to Consolidate Data from Various Sources
1.4.3 Growth of Infinite Storage and Computing Capability
1.4.4 Survival and Growth in the Highly Competitive World
1.4.5 Business Complexity Growing Out of Globalization
1.4.6 Easy-to-Use Programming Tools and Platforms
1.5 Applications of Business Analytics
1.5.1 Marketing and Sales
1.5.2 Human Resources
1.5.3 Product Design
1.5.4 Service Design
1.5.5 Customer Service and Support Areas
1.6 Skills Required for an Analytics Job
1.7 Process of an Analytics Project
1.8 Chapter Summary
Chapter 2: The Foundations of Business Analytics
2.1 Introduction
2.2 Population and Sample
2.2.1 Population
2.2.2 Sample
2.3 Statistical Parameters of Interest
2.3.1 Mean
2.3.2 Median
2.3.3 Mode
2.3.4 Range
2.3.5 Quantiles
2.3.6 Standard Deviation
2.3.7 Variance
2.3.8 Summary Command in R
2.4 Probability
2.4.1 Rules of Probability
2.4.2 Probability Distributions
2.4.3 Conditional Probability
2.5 Computations on Data Frames
2.6 Scatter Plot
2.7 Chapter Summary
Chapter 3: Structured Query Language Analytics
3.1 Introduction
3.2 Data Used by Us
3.3 Steps for Business Analytics
3.3.1 Initial Exploration and Understanding of the Data
3.3.2 Understanding Incorrect and Missing Data, and Correcting Such Data
3.3.3 Further Exploration and Reporting on the Data
3.4 Chapter Summary
Chapter 4: Business Analytics Process
4.1 Business Analytics Life Cycle
4.1.1 Phase 1: Understand the Business Problem
4.1.2 Phase 2: Data Collection
4.1.3 Phase 3: Data Preprocessing and Preparation
4.1.4 Phase 4: Explore and Visualize the Data
4.1.5 Phase 5: Choose Modeling Techniques and Algorithms
4.1.6 Phase 6: Evaluate the Model
4.1.7 Phase 7: Report to Management and Review
4.1.8 Phase 8: Deploy the Model
4.2 Chapter Summary
Chapter 5: Exploratory Data Analysis
5.1 Exploring and Visualizing the Data
5.1.1 Tables
5.1.2 Describing Data: Summary Tables
5.1.3 Graphs
5.1.4 Scatter Plot Matrices
5.2 Plotting Categorical Data
5.3 Chapter Summary
Chapter 6: Evaluating Analytics Model Performance
6.1 Introduction
6.2 Regression Model Evaluation
6.2.1 Root-Mean-Square Error
6.2.2 Mean Absolute Percentage Error
6.2.3 Mean Absolute Error (MAE) or Mean Absolute Deviation (MAD)
6.2.4 Sum of Squared Errors (SSE)
6.2.5 R² (R-Squared)
6.2.6 Adjusted R²
6.3 Classification Model Evaluation
6.3.1 Classification Error Matrix
6.3.2 Sensitivity Analysis in Classification
6.4 ROC Chart
6.5 Overfitting and Underfitting
6.5.1 Bias and Variance
6.6 Cross-Validation
6.7 Measuring the Performance of Clustering
6.8 Chapter Summary
Part II: Supervised Learning and Predictive Analytics
Chapter 7: Simple Linear Regression
7.1 Introduction
7.2 Correlation
7.2.1 Correlation Coefficient
7.3 Hypothesis Testing
7.4 Simple Linear Regression
7.4.1 Assumptions of Regression
7.4.2 Simple Linear Regression Equation
7.4.3 Creating a Simple Regression Equation in R
7.4.4 Testing the Assumptions of Regression
7.4.5 Conclusion
7.4.6 Predicting the Response Variable
7.4.7 Additional Notes
7.5 Using Python to Generate the Model and Validating the Assumptions
7.5.1 Load Important Packages and Import the Data
7.5.2 Generate a Simple Linear Regression Model
7.5.3 Alternative Way for Generation of the Model
7.5.4 Validation of the Significance of the Generated Model
7.5.5 Validating the Assumptions of Linear Regression
7.5.6 Predict Using the Model Generated
7.6 Chapter Summary
Chapter 8: Multiple Linear Regression
8.1 Using Multiple Linear Regression
8.1.1 The Data
8.1.2 Correlation
8.1.3 Arriving at the Model
8.1.4 Validation of the Assumptions of Regression
8.1.5 Multicollinearity
8.1.6 Stepwise Multiple Linear Regression
8.1.7 All Subsets Approach to Multiple Linear Regression
8.1.8 Multiple Linear Regression Equation
8.1.9 Conclusion
8.2 Using an Alternative Method in R
8.3 Predicting the Response Variable
8.4 Training and Testing the Model
8.5 Cross Validation
8.6 Using Python to Generate the Model and Validating the Assumptions
8.6.1 Load the Necessary Packages and Import the Data
8.6.2 Generate Multiple Linear Regression Model
8.6.3 Alternative Way to Generate the Model
8.6.4 Validating the Assumptions of Linear Regression
8.6.5 Predict Using the Model Generated
8.7 Chapter Summary
Chapter 9: Classification
9.1 What Are Classification and Prediction?
9.1.1 K-Nearest Neighbor
9.1.2 KNN Algorithm
9.1.3 KNN Using R
9.1.4 KNN Using Python
9.2 Naïve Bayes Models for Classification
9.2.1 Naïve Bayes Classifier Model Example
9.2.2 Naïve Bayes Classifier Using R (Use Same Data Set as KNN)
9.2.3 Advantages and Limitations of the Naïve Bayes Classifier
9.3 Decision Trees
9.3.1 Decision Tree Algorithm
9.3.2 Building a Decision Tree
9.3.3 Classification Rules from Tree
9.4 Advantages and Disadvantages of Decision Trees
9.5 Ensemble Methods and Random Forests
9.6 Decision Tree Model Using R
9.7 Decision Tree Model Using Python
9.7.1 Creating the Decision Tree Model
9.7.2 Making Predictions
9.7.3 Measuring the Accuracy of the Model
9.7.4 Creating a Pruned Tree
9.8 Chapter Summary
Chapter 10: Neural Networks
10.1 What Is an Artificial Neural Network?
10.2 Concept and Structure of Neural Networks
10.2.1 Perceptrons
10.2.2 The Architecture of Neural Networks
10.3 Learning Algorithms
10.3.1 Predicting Attrition Using a Neural Network
10.3.2 Classification and Prediction Using a Neural Network
10.3.3 Training the Model
10.3.4 Backpropagation
10.4 Activation Functions
10.4.1 Linear Function
10.4.2 Sigmoid Activation Function
10.4.3 Tanh Function
10.4.4 ReLU Activation Function
10.4.5 Softmax Activation Function
10.4.6 Selecting an Activation Function
10.5 Practical Example of Predicting Using a Neural Network
10.5.1 Implementing a Neural Network Model Using R
10.6 Implementation of a Neural Network Model Using Python
10.7 Strengths and Weaknesses of Neural Network Models
10.8 Deep Learning and Neural Networks
10.9 Chapter Summary
Chapter 11: Logistic Regression
11.1 Logistic Regression
11.1.1 The Data
11.1.2 Creating the Model
11.1.3 Model Fit Verification
11.1.4 General Words of Caution
11.1.5 Multicollinearity
11.1.6 Dispersion
11.1.7 Conclusion for Logistic Regression
11.2 Training and Testing the Model
11.2.1 Example of Prediction
11.2.2 Validating the Logistic Regression Model on Test Data
11.3 Multinomial Logistic Regression
11.4 Regularization
11.5 Using Python to Generate Logistic Regression
11.5.1 Loading the Required Packages and Importing the Data
11.5.2 Understanding the Dataframe
11.5.3 Getting the Data Ready for the Generation of the Logistic Regression Model
11.5.4 Splitting the Data into Training Data and Test Data
11.5.5 Generating the Logistic Regression Model
11.5.6 Predicting the Test Data
11.5.7 Fine-Tuning the Logistic Regression Model
11.5.8 Logistic Regression Model Using the statsmodel() Library
11.6 Chapter Summary
Part III: Time-Series Models
Chapter 12: Time Series: Forecasting
12.1 Introduction
12.2 Characteristics of Time-Series Data
12.3 Decomposition of a Time Series
12.4 Important Forecasting Models
12.4.1 Exponential Forecasting Models
12.4.2 ARMA and ARIMA Forecasting Models
12.4.3 Assumptions for ARMA and ARIMA
12.5 Forecasting in Python
12.5.1 Loading the Base Packages
12.5.2 Reading the Time-Series Data and Creating a Dataframe
12.5.3 Trying to Understand the Data in More Detail
12.5.4 Decomposition of the Time Series
12.5.5 Test Whether the Time Series Is “Stationary”
12.5.6 The Process of “Differencing”
12.5.7 Model Generation
12.5.8 ACF and PACF Plots to Check the Model Hyperparameters and the Residuals
12.5.9 Forecasting
12.6 Chapter Summary
Part IV: Unsupervised Models and Text Mining
Chapter 13: Cluster Analysis
13.1 Overview of Clustering
13.1.1 Distance Measure
13.1.2 Euclidean Distance
13.1.3 Manhattan Distance
13.1.4 Distance Measures for Categorical Variables
13.2 Distance Between Two Clusters
13.3 Types of Clustering
13.3.1 Hierarchical Clustering
13.3.2 Dendrograms
13.3.3 Nonhierarchical Method
13.3.4 K-Means Algorithm
13.3.5 Other Clustering Methods
13.3.6 Evaluating Clustering
13.4 Limitations of Clustering
13.5 Clustering Using R
13.5.1 Hierarchical Clustering Using R
13.6 Clustering Using Python sklearn()
13.7 Chapter Summary
Chapter 14: Relationship Data Mining
14.1 Introduction
14.2 Metrics to Measure Association: Support, Confidence, and Lift
14.2.1 Support
14.2.2 Confidence
14.2.3 Lift
14.3 Generating Association Rules
14.4 Association Rule (Market Basket Analysis) Using R
14.5 Association Rule (Market Basket Analysis) Using Python
14.6 Chapter Summary
Chapter 15: Introduction to Natural Language Processing
15.1 Overview
15.2 Applications of NLP
15.2.1 Chatbots
15.2.2 Sentiment Analysis
15.2.3 Machine Translation
15.3 What Is Language?
15.3.1 Phonemes
15.3.2 Lexeme
15.3.3 Morpheme
15.3.4 Syntax
15.3.5 Context
15.4 What Is Natural Language Processing?
15.4.1 Why Is NLP Challenging?
15.5 Approaches to NLP
15.5.1 WordNet Corpus
15.5.2 Brown Corpus
15.5.3 Reuters Corpus
15.5.4 Processing Text Using Regular Expressions
15.6 Important NLP Python Libraries
15.7 Important NLP R Libraries
15.8 NLP Tasks Using Python
15.8.1 Text Normalization
15.8.2 Tokenization
15.8.3 Lemmatization
15.8.4 Stemming
15.8.5 Stop Word Removal
15.8.6 Part-of-Speech Tagging
15.8.7 Probabilistic Language Model
15.8.8 N-gram Language Model
15.9 Representing Words as Vectors
15.9.1 Bag-of-Words Modeling
15.9.2 TF-IDF Vectors
15.9.3 Term Frequency
15.9.4 Inverse Document Frequency
15.9.5 TF-IDF
15.10 Text Classifications
15.11 Word2vec Models
15.12 Text Analytics and NLP
15.13 Deep Learning and NLP
15.14 Case Study: Building a Chatbot
15.15 Chapter Summary
Chapter 16: Big Data Analytics and Future Trends
16.1 Introduction
16.2 Big Data Ecosystem
16.3 Future Trends in Big Data Analytics
16.3.1 Growth of Social Media
16.3.2 Creation of Data Lakes
16.3.3 Visualization Tools at the Hands of Business Users
16.3.4 Prescriptive Analytics
16.3.5 Internet of Things
16.3.6 Artificial Intelligence
16.3.7 Whole Data Processing
16.3.8 Vertical and Horizontal Applications
16.3.9 Real-Time Analytics
16.4 Putting the Analytics in the Hands of Business Users
16.5 Migration of Solutions from One Tool to Another
16.6 Cloud Analytics
16.7 In-Database Analytics
16.8 In-Memory Analytics
16.9 Autonomous Services for Machine Learning
16.10 Addressing Security and Compliance
16.11 Big Data Applications
16.12 Chapter Summary
Part V: Business Analytics Tools
Chapter 17: R for Analytics
17.1 Data Analytics Tools
17.2 Data Wrangling and Data Preprocessing Using R
17.2.1 Handling NAs and NULL Values in the Data Set
17.2.2 Apply() Functions in R
17.2.3 lapply()
17.2.4 sapply()
17.3 Removing Duplicate Records in the Data Set
17.4 split()
17.5 Writing Your Own Functions in R
17.6 Chapter Summary
Chapter 18: Python Programming for Analytics
18.1 Introduction
18.2 pandas for Data Analytics
18.2.1 Data Slicing Using pandas
18.2.2 Statistical Data Analysis Using pandas
18.2.3 Pandas Database Functions
18.2.4 Data Preprocessing Using pandas
18.2.5 Handling Data Types
18.2.6 Handling Dates Variables
18.2.7 Feature Engineering
18.2.8 Data Preprocessing Using the apply() Function
18.2.9 Plots Using pandas
18.3 NumPy for Data Analytics
18.3.1 Creating NumPy Arrays with Zeros and Ones
18.3.2 Random Number Generation and Statistical Analysis
18.3.3 Indexing, Slicing, and Iterating
18.3.4 Stacking Two Arrays
18.4 Chapter Summary
References
Index
About the Authors
Dr. Umesh R. Hodeghatta
is an engineer, scientist, and educator.
He is currently a faculty member at
Northeastern University, specializing in
data analytics, AI, machine learning,
deep learning, natural language
processing (NLP), and cybersecurity. He
has more than 25 years of work
experience in technical and senior
management positions at AT&T Bell
Laboratories, Cisco Systems, McAfee, and
Wipro. He was also a faculty member at
Kent State University in Kent, Ohio, and
Xavier Institute of Management in
Bhubaneswar, India. He earned a
master’s degree in electrical and computer engineering (ECE) from
Oklahoma State University and a doctorate degree from the Indian
Institute of Technology (IIT). His research interest is applying
AI/machine learning to strengthen an organization’s information
security based on his expertise in information security and machine
learning. As a chief data scientist, he is helping business leaders to
make informed decisions and recommendations linked to the
organization’s strategy and financial goals, reflecting an awareness of
external dynamics based on a data-driven approach.
He has published many journal articles in international journals and
conference proceedings. In addition, he has authored books titled
Business Analytics Using R: A Practical Approach and The InfoSec
Handbook: An Introduction to Information Security, published by
Springer Apress. Furthermore, Dr. Hodeghatta has contributed his
services to many professional organizations and regulatory bodies. He
was an executive committee member of the IEEE Computer Society
(India); academic advisory member for the Information Systems Audit
and Control Association (ISACA); IT advisor for the government of India;
technical advisory member of the International Neural Network Society
(INNS) India; and advisory member of the Task Force on Business
Intelligence & Knowledge Management. He was listed in “Who’s Who in
the World” for the years 2012, 2013, 2014, 2015, and 2016. He is also a
senior member of the IEEE (USA).

Umesha Nayak
is a director and principal consultant of
MUSA Software Engineering Pvt. Ltd.,
which focuses on
systems/process/management
consulting. He is also the chief executive
officer of N-U Sigma U-Square Analytics
Lab, which specializes in high-end
consulting on artificial intelligence and
machine learning. He has 41 years’
experience, of which 19 years are in
providing consulting to
IT/manufacturing and other
organizations from across the globe. He
has a master’s degree in software
systems and a master’s degree in
economics; he is certified as a CAIIB,
Certified Information Systems Auditor (CISA), and Certified in Risk and
Information Systems Control (CRISC) professional from ISACA, PGDFM,
certified lead auditor for many of the ISO standards, and certified coach,
among others. He has worked extensively in banking, software
development, product design and development, project management,
program management, information technology audits, information
application audits, quality assurance, coaching, product reliability,
human resource management and culture development, and
management consultancy, including consultancy in artificial
intelligence and machine learning. He was a vice president and
corporate executive council member at Polaris Software Lab, Chennai,
prior to his current assignment. He has also held various roles such as
head of quality, head of SEPG, and head of Strategic Practice Unit –
Risks & Treasury at Polaris Software Lab. He started his journey with
computers in 1981 with ICL mainframes and continued with minis and
PCs. He was one of the founding members of information systems
auditing in the banking industry in India. He has effectively guided
many organizations through successful ISO 9001/ISO 27001/CMMI and
other certifications and process/product improvements and solved
problems through artificial intelligence and machine learning. He
coauthored the book The InfoSec Handbook: An Introduction to
Information Security, published by Apress.
Part I
Introduction to Analytics
© Umesh R. Hodeghatta, Ph.D and Umesha Nayak 2023
U. R. Hodeghatta, U. Nayak, Practical Business Analytics Using R and Python
https://doi.org/10.1007/978-1-4842-8754-5_1

1. An Overview of Business Analytics


Umesh R. Hodeghatta1 and Umesha Nayak2

(1) South Portland, ME, USA


(2) Bangalore, Karnataka, India

1.1 Introduction
Today’s world is data-driven and knowledge-based. In the past,
knowledge was gained mostly through observation; now, knowledge is
secured not only through observation but also by analyzing data that is
available in abundance. In the 21st century, knowledge is acquired and
applied by analyzing data available through various applications, social
media sites, blogs, and much more. The advancement of computer
systems complements knowledge of statistics, mathematics,
algorithms, and programming. Enormous storage and extensive computing
capabilities have ensured that knowledge can be quickly derived from
huge amounts of data and be used for many other purposes. The
following examples demonstrate how seemingly obscure or
unimportant data can be used to make better business decisions:
A hotel in Switzerland welcomes you with your favorite drink and
dish; you are so delighted!
You are offered a stay at a significantly discounted rate at your
favorite hotel on your birthday or marriage anniversary when
traveling to your destination.
Based on your daily activities and/or food habits, you are warned
about the high probability of becoming a diabetic so you can take the
right steps to avoid it.
You enter a grocery store and find that your regular monthly
purchases are already selected and set aside for you. The only
decision you have to make is whether you require all of them or want
to remove some from the list. How happy you are!
There are many such scenarios that are made possible by analyzing
data about you and your activities that is collected through various
means—including mobile phones, your Google searches, visits to
various websites, your comments on social media sites, your activities
using various computer applications, and more. The use of data
analytics in these scenarios has focused on your individual perspective.
Now, let’s look at scenarios from a business perspective.
As a hotel business owner, you are able to provide competitive yet
profitable rates to your prospective customers. At the same time, you
can ensure that your hotel is completely occupied all the time by
providing additional benefits, including discounts on local travel and
local sightseeing offers tied to other local vendors.
As a taxi business owner, you are able to repeatedly attract the same
customers based on their travel history and preferences of taxi type
and driver.
As a fast-food business owner, you are able to offer discounted rates
to attract customers on slow days. These discounts enable you to
ensure full occupancy on those days also.
You are in the human resources (HR) department of an organization
and are bogged down by high attrition. But now you are able to
understand the types of people you should focus on recruiting based
on the characteristics of those who perform well and who are more
loyal and committed to the organization.
You are in the business of designing, manufacturing, and selling
medical equipment used by hospitals. You are able to understand the
possibility of equipment failure well before the equipment actually
fails, by carrying out analysis of the errors or warnings captured in
the equipment logs.
All these scenarios are made possible by analyzing data that the
businesses and others collect from various sources. There are many
such possible scenarios. The application of data analytics to the field of
business is called business analytics.
You have most likely observed the following scenarios:
You’ve been searching, for the past few days, on Google for
adventurous places to visit. You’ve also tried to find various travel
packages that might be available. You suddenly find that when you
are on Facebook, Twitter, or other websites, they show a specific
advertisement of what you are looking for, usually at a discounted
rate.
You’ve been searching for a specific item to purchase on Amazon (or
any other site). Suddenly, on other sites you visit, you find
advertisements related to what you are looking for or find
customized mail landing in your mailbox, offering discounts along
with other items you might be interested in.
You’ve also seen recommendations that Netflix, Amazon,
Walmart.com, etc., make based on your searches, your wish list, or
previous purchases or movies you have watched. Many times you’ve
also likely observed these sites offering you discounts or promoting
new products based on the huge amount of customer data these
companies have collected.
All of these possibilities are now a reality because of data analytics
specifically used by businesses.

1.2 Objectives of This Book


Many professionals are interested in learning analytics. But not all of
them have rich statistical or mathematical backgrounds. This book is
the right place for techies as well as those who are not so technical to
get started with business analytics. You’ll learn about data analytics
processes, tools, and techniques, as well as different types of analytics,
including predictive modeling and big data. This is an introductory
book in the field of business analytics.
The following are some of the advantages of this book:
It offers the right mix of theory and hands-on labs. The concepts are
explained using business scenarios or case studies where required.
It is written by industry professionals who are currently working in
the field of analytics on real-life problems for paying customers.
This book provides the following:
Practical insights into the use of data that has been collected,
collated, purchased, or available for free from government sources or
others. These insights are attained via computer programming,
statistical and mathematical knowledge, and expertise in relevant
fields that enable you to understand the data and arrive at predictive
capabilities.
Information on the effective use of various techniques related to
business analytics.
Explanations of how to effectively use the programming platform R
or Python for business analytics.
Practical cases and examples that enable you to apply what you learn
from this book.
The dos and don’ts of business analytics.
The book does not do the following:
Deliberate on the definitions of various terms related to analytics,
which can be confusing.
Elaborate on the fundamentals behind any statistical or
mathematical technique or particular algorithm beyond certain
limits.
Provide a repository of all the techniques or algorithms used in the
field of business analytics (but does explore many of them).
Cover advanced concepts in the field of neural networks and deep
learning.
Serve as a guide to programming in R or Python.

1.3 Confusing Terminology


Many terms are used in discussions of this topic—for example, data
analytics, business analytics, big data analytics, and data science. Most of
these are, in a sense, the same. However, the purpose of the analytics,
the extent of the data that’s available for analysis, and the difficulty of
the data analysis may vary from one to the other. Finally, regardless of
the differences in terminology, we need to know how to use the data
effectively for our businesses. These differences in terminology should
not get in the way of applying techniques to the data (especially in
analyzing it and using it for various purposes, including understanding
it, deriving models from it, and then using these models for predictive
purposes).
In layperson’s terms, let’s look at some of this terminology:
Data analytics is the analysis of data, whether huge or small, in order
to understand it and decide how to use the knowledge hidden within
it. An example is the analysis of data related to various classes of
travelers.
Business analytics is the application of data analytics to business. An
example of how business analytics might be applied is offering
specific discounts to different classes of travelers based on the
amount of business they offer or have the potential to offer.
Data science is an interdisciplinary field (including disciplines such
as statistics, mathematics, and computer programming) that derives
knowledge from data and applies it for predictive or other purposes.
Expertise about underlying processes, systems, and algorithms is
used. An example is the application of t-values and p-values from
statistics in identifying significant model parameters in a regression
equation.
Big data analytics is the analysis of huge amounts of data (for
example, trillions of records) or the analysis of difficult-to-crack
problems. Usually, this requires an enormous amount of storage
and/or computing power, including massive amounts of memory to
hold the data and a huge number of high-speed processors to crunch
the data and get its essence. An example is the analysis of geospatial
data captured by satellites to identify weather patterns and make
related predictions.
Data mining and machine learning
Data mining refers to the process of mining data to find meaningful
patterns hidden within the data. The term data mining brings to mind
mining earth to find useful minerals. Just as important minerals are often
buried deep within earth and rock, useful information is often hidden in
data, neither easily visible nor understandable. Data mining uses
techniques such as supervised machine learning and unsupervised
machine learning to perform mining tasks. Since advanced analytics
requires data mining, we will cover these in later chapters.
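The data science example above, using t-values to judge whether a regression parameter is significant, can be sketched in a few lines of Python. This is only a minimal illustration that uses the standard library; the ad spend and sales figures are invented, the function name slope_t_value is hypothetical, and the informal rule of thumb that a t-value beyond roughly 2 in absolute terms suggests significance stands in for a proper p-value lookup.

```python
import math
import statistics

# Hypothetical data: monthly ad spend (x) and sales (y), made up for illustration.
ad_spend = [10, 12, 15, 17, 20, 22, 25, 28]
sales = [25, 29, 34, 40, 44, 50, 54, 61]


def slope_t_value(x, y):
    """Fit y = a + b*x by least squares and return (b, t-value of b)."""
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # estimated slope
    a = mean_y - b * mean_x            # estimated intercept
    # Residual sum of squares, with n - 2 degrees of freedom
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return b, b / std_err


slope, t_value = slope_t_value(ad_spend, sales)
print(f"slope = {slope:.3f}, t-value = {t_value:.2f}")
```

A large t-value here would indicate that the slope is unlikely to be zero, that is, that ad spend is a meaningful predictor of sales in this made-up data set. Statistical packages in R and Python report the corresponding p-values directly.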
1.4 Drivers for Business Analytics
Though many data techniques, algorithms, and tools have been around
for a very long time, recent developments in technology, computation,
and algorithms have led to tremendous popularity of analytics. We
believe the following are the growth drivers for business analytics in
recent years:
Increasing numbers of relevant computer packages and applications.
One example is the Python and R programming environments, with their
various data sets, documentation on their packages, and open-source
libraries with built-in algorithms.
Integration of data from various sources and of various types,
including both structured and unstructured data, such as data from
flat files, data from relational databases, data from log files, data from
Twitter messages, and more. An example is the consolidation of
information from data files in a Microsoft SQL Server database with
data from a Twitter message stream.
Growth of seemingly infinite storage and computing capabilities by
clustering multiple computers and extending these capabilities via
the cloud. An example is the use of Apache Hadoop clusters to
distribute and analyze huge amounts of data.
Emergence of many new techniques and algorithms to effectively use
statistical and mathematical concepts for business analysis. One
example is the recent development in natural language processing
(NLP) and neural networks.
Business complexity arising from globalization. An economic or
political situation in a particular country can affect the sales in that
country. The need for business survival and growth in a highly
competitive world requires each company and business to deep dive
into data to understand customer behavior patterns and take
advantage of them.
Availability of many easy-to-use programming tools, platforms, and
frameworks.

A note of caution here: Not all business problems require
complicated analytics solutions. Some may be easy to understand
and to solve by using techniques such as the visual depiction of data.

Now let’s discuss each of these drivers for business analytics in more
detail.

1.4.1 Growth of Computer Packages and Applications
Computer packages and applications have completely flooded modern
life. This is true at both an individual and business level. This is
especially true with our extensive use of smartphones, which enable
the following:
Communication with others through email packages
Activities in social media and blogs
Business communications through email, instant messaging, and
other tools
Day-to-day searches for information and news through search
engines
Recording of individual and business financial transactions through
accounting packages
Recording of our travel details via online ticket-booking websites or
apps
Recording of our various purchases in e-commerce websites
Recording our daily exercise routines, calories burned, and diets
through various applications
We are surrounded by many computer packages and applications
that collect a lot of data about us. This data is used by businesses to
make them more competitive, attract more business, and retain and
grow their customer base. With thousands of apps on platforms such as
Android, iOS, and Windows, capturing data encompasses nearly all the
activities carried out by individuals across the globe (who are the
consumers for most of the products and services). This has been
enabled further by the reach of hardware devices such as computers,
laptops, mobile phones, and smartphones even to remote places.
1.4.2 Feasibility to Consolidate Data from Various Sources
Technology has grown by leaps and bounds over the last few years. The
growth of technology coupled with almost unlimited storage capability
has enabled us to consolidate related or relevant data from various
sources—right from flat files to database data to data in various
formats. This ability to consolidate data from various sources has
provided a great deal of momentum to effective business analysis.
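As a rough illustration of consolidating a flat file with database data, the following sketch loads a small CSV extract into the same SQLite database as an existing table and queries both together. Everything here is an assumption made for the example: the table names purchases and feedback, the column names, and the values are all hypothetical, and an in-memory SQLite database stands in for whatever relational database a business actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: customer feedback exported as CSV text.
feedback_csv = io.StringIO(
    "customer_id,rating\n"
    "1,5\n"
    "2,3\n"
    "3,4\n"
)

# Hypothetical relational source: an in-memory SQLite table of purchases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 45.5), (3, 300.0)],
)

# Load the flat file into the same database so both sources can be queried together.
conn.execute("CREATE TABLE feedback (customer_id INTEGER, rating INTEGER)")
rows = [(int(r["customer_id"]), int(r["rating"])) for r in csv.DictReader(feedback_csv)]
conn.executemany("INSERT INTO feedback VALUES (?, ?)", rows)

# Consolidated view: total spend and feedback rating per customer.
consolidated = conn.execute(
    """
    SELECT p.customer_id, SUM(p.amount) AS total_spend, f.rating
    FROM purchases AS p
    JOIN feedback AS f ON f.customer_id = p.customer_id
    GROUP BY p.customer_id, f.rating
    ORDER BY p.customer_id
    """
).fetchall()

for customer_id, total_spend, rating in consolidated:
    print(customer_id, total_spend, rating)
conn.close()
```

Once the sources share a common key (here, customer_id), a single query can relate spending to satisfaction, which neither source could show on its own.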

1.4.3 Growth of Infinite Storage and Computing Capability
The growth of server technologies, processing, storage, and memory
has changed the computing power available today. This advancement in
technology has made processing huge amounts of data with AI algorithms
simpler and faster. Similarly, the memory and storage capacity of individual
computers has increased drastically, whereas external storage devices
have provided a significant increase in storage capacity. This has been
augmented by cloud-based storage services that can provide a virtually
unlimited amount of storage. The growth of cloud platforms has also
contributed to virtually unlimited computing capability. Now you can
rent the processing power of multiple CPUs and graphical processing
units (GPUs) coupled with huge memory and large storage to carry out
any analysis—however big the data is. This has reduced the need to
rely on a sampling of data for analysis. Instead, you can take the entire
population of data available to you and analyze it by using the power
of cloud storage and computing capabilities, which was not possible 20
years ago.

1.4.4 Survival and Growth in the Highly Competitive World
Without the Internet, life can come to a standstill. Everyone is
connected on social media, and information is constantly flowing across
the world, both good and bad, reliable and unreliable. Internet users
are also potential customers who are constantly discussing their
experiences of products and services on social media. Businesses have
become highly competitive. Businesses want to take advantage of such
information by analyzing such data. Each business is targeting the same
customer and that customer’s spending capability. Using social media,
businesses are fiercely competing with each other. To survive,
businesses have to find the best ways to target other businesses that
require their products and services as well as the end consumers who
require their products and services. Data or business analytics has
enabled this effectively. Analytics provides various techniques to find
hidden patterns in the data and provides enough knowledge for
businesses to make better business decisions.

1.4.5 Business Complexity Growing Out of Globalization
Economic globalization that cuts across the boundaries of the countries
where businesses produce goods or provide services has drastically
increased the complexities of business. Businesses now have the
challenge of catering to cultures that may have been previously
unknown to them. With the large amount of data now possible to
acquire (or already at their disposal), businesses can easily gauge
differences between local and regional cultures, demands, and practices
including spending trends and preferences.

1.4.6 Easy-to-Use Programming Tools and Platforms
In addition to commercially available data analytics tools, many open-
source tools or platforms such as Python, R, Spark, PyTorch,
TensorFlow, and Hadoop are available for easy development of
analytics applications and developing models. These powerful tools are
easy to use and well documented. They usually require an
understanding of basic programming concepts, without requiring
complex programming skills. Hadoop is useful in the effective and
efficient analysis of big data.

1.5 Applications of Business Analytics


Business analytics has been applied effectively to many fields, including
retail, e-commerce, travel (including the airline business), hospitality,
logistics, and manufacturing. It also finds many applications in
marketing and sales, human resources, finance, manufacturing, product
design, service design, and customer service and support. Furthermore,
business analytics has been applied to a whole range of other
businesses, including energy, production, pharmaceutical, and other
industries, for finding patterns and predicting failures of machines,
equipment, and processes.
In this section, we discuss some of the areas in which data/business
analytics is used effectively to benefit organizations. These examples
are only illustrative and not exhaustive.

1.5.1 Marketing and Sales


Marketing and sales teams are the ones that have heavily used business
analytics to identify appropriate approaches to marketing to reach a
maximum number of potential customers at an optimized or reduced
effort. These teams use business analytics to identify which marketing
channel would be most effective (for example, emails, websites, or
direct telephone contacts). They also use business analytics to
determine which offers make sense to which types of customers (in
terms of geographical regions, for instance) and to specifically tune
their offers.
A marketing and sales team in the retail business would like to
enable retail outlets (physical or online) to promote new products
along with other products, as a bundled offer, based on the purchasing
pattern of consumers. Logistics companies want to optimize delivery
times and reliably meet their delivery commitments to customers. This
reliability is an important factor when other businesses decide to tie
up with them for their services. Similarly, an airline company would like to
promote exciting offers based on a customer’s travel history, thus
encouraging customers to travel again and again via the same airline,
thereby creating a loyal customer over a period of time. A travel agency
would like to determine whether people like adventurous vacations,
spiritual vacations, or historical vacations based on past customer data,
and analytics can provide the inputs needed to focus marketing and
sales efforts according to those specific interests of the people—thus
optimizing the time spent by the marketing and sales team.

1.5.2 Human Resources


Retention is the biggest problem faced by an HR department in any
industry, especially in the service industry. An HR department can
identify which employees have high potential for retention by
processing past employee data. Similarly, an HR department can also
analyze which competence (qualification, knowledge, skill, or training)
has the most influence on the organization’s or team’s capability to
deliver quality output within committed timelines.

1.5.3 Product Design


Product design is not easy and often involves complicated processes.
Risks factored in during product design, subsequent issues faced during
manufacturing, and any resultant issues faced by customers or field
staff can be a rich source of data that can help you understand potential
issues with a future design. This analysis may reveal issues with
materials, issues with the processes employed, issues with the design
process itself, issues with the manufacturing, or issues with the
handling of the equipment installation or later servicing. The results of
such an analysis can substantially improve the quality of future designs
by any company. Another interesting aspect is that data can help
indicate which design aspects (color, sleekness, finish, weight, size, or
material) customers like and which ones customers do not like.

1.5.4 Service Design


Like products, services are also carefully designed and priced by
organizations. Identifying what is part of the service (and what is not)
also depends on product design and cost factors compared to pricing.
The length of the warranty, coverage during warranty, and pricing for
various services can also be determined based on data from earlier
experiences and from target market characteristics. Some customer
regions may more easily accept “use and throw” products, whereas
other regions may prefer “repair and use” kinds of products. Hence, the
types of services need to be designed according to the preferences of
regions. Again, different service levels (responsiveness) may have
different price tags and may be targeted toward a specific segment of
customers (for example, big corporations, small businesses, or
individuals).

1.5.5 Customer Service and Support Areas


After-sales service and customer service are important aspects that no
business can ignore. A lack of effective customer service can lead to
negative publicity, impacting future sales of new versions of the product
or of new products from the same company. Hence, customer service is
an important area in which data analysis is applied significantly.
Customer comments on the Web or on social media (for example,
Twitter) provide a significant source of understanding about the
customer pulse as well as the reasons behind the issues faced by
customers. A service strategy can be accordingly drawn up, or
necessary changes to the support structure may be carried out, based
on the analysis of the data available to the industry.

1.6 Skills Required for an Analytics Job


So far, we have discussed the terminology of analytics, reasons
for the exponential growth in analytics, and illustrative examples of
some of the applications of business analytics. Let’s now discuss the
skills required to perform data analytics and create models. Typically,
this task requires substantial knowledge of the following topics:
1. Communication skills to understand the problem and the
requirements.

Having a clear understanding of the problem/data task is one of the
most important requirements. If the person analyzing the data does not
understand the underlying problem or the specific characteristics of
the task, then the analysis can lead to the wrong conclusions or lead
the business in the wrong direction. Also, if the analyst does not know
the specific domain in which the problem is being solved, then a domain
expert should be consulted before performing the analysis. Having only
programming skills and statistical or mathematical knowledge, without
understanding the requirements, can sometimes lead to impractical
(or even dangerous) suggestions for the business. These suggestions
also waste the time of core business personnel.
2. Tools, techniques, and algorithms that can be applied to the data.

Understanding data and data types, preprocessing data, exploring
data, choosing the right techniques, selecting supervised learning or
unsupervised learning algorithms, and creating models are an essential
part of an analytics job. It is equally important to apply the proper
analysis techniques and algorithms to suitable situations or analyses.
The depth of this knowledge may vary with job title and
experience. For example, linear regression or multiple linear regression
(supervised method) may be suitable if you know (based on business
characteristics) that there exists a strong relationship between a
response variable and various predictors. Clustering (unsupervised
method) can allow you to cluster data into various segments. Using and
applying business analytics effectively can be difficult without
understanding these techniques and algorithms.
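To make the idea of clustering data into segments concrete, here is a minimal sketch of k-means on one-dimensional data, written with only the Python standard library. The annual spend values, the function name kmeans_1d, and the starting centroids are all assumptions chosen for illustration; production libraries pick starting points and stopping rules far more carefully.

```python
import statistics

# Hypothetical annual spend per customer (made-up values).
annual_spend = [120, 135, 150, 900, 950, 1000, 160, 980]


def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data.

    `centers` is the list of starting centroids (an assumption of this
    sketch; real tools choose them more carefully).
    """
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centers = [
            statistics.fmean(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centroids stopped moving, so the segments are stable
        centers = new_centers
    return centers, clusters


centers, clusters = kmeans_1d(annual_spend, centers=[100, 1000])
print(centers)
print(clusters)
```

With this made-up data, the customers fall into a low-spend segment and a high-spend segment without any labels being supplied, which is what makes the method unsupervised.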
Having knowledge of tools is important. Though it is not possible to
learn all the tools that are available, knowing as many tools as possible
helps in landing job interviews. Computer knowledge is required for a capable data
analytics person as well so that there is no dependency on other
programmers who don’t understand the statistics or mathematics
behind the techniques or algorithms. Platforms such as R, Python, and
Hadoop have reduced the pain of learning programming, even though
at times we may have to use other complementary programming
languages.
3. Data structures and data storage or data warehousing techniques,
including how to query the data effectively.

Knowledge of data structures and data storage/data warehousing
eliminates dependence on database administrators and database
programmers. This enables you to consolidate data from varied sources
(including databases and flat files), arrange it into a proper
structure, and store it appropriately in a data repository required
for the analysis. The capability to query such a data repository is
an additional competence of value to any data analyst.
4. Statistical and mathematical concepts (probability theory, linear
algebra, matrix algebra, calculus, and cost-optimization algorithms
such as gradient descent or ascent algorithms).

Data analytics and data mining techniques use many statistical and
mathematical concepts on which various algorithms, measures, and
computations are based. Good knowledge of statistical and
mathematical concepts is essential to properly use the concepts to
depict, analyze, and present the data and the results of the analysis.
Otherwise, a wrongly applied technique or a misinterpreted result
produces wrong models and wrong theories, which can lead others in the
wrong direction.
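The gradient descent algorithm mentioned in item 4 can be sketched briefly. The following is a minimal illustration, not a production implementation: it fits a straight line by repeatedly stepping against the gradient of the mean squared error. The data, the function name gradient_descent, the learning rate, and the number of steps are all assumptions chosen so that the example converges.

```python
# Hypothetical data that follows y = 2x + 1 exactly (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Minimize the mean squared error of y = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Prediction error for each observation under the current w and b
        errors = [(w * xi + b) - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * xi for e, xi in zip(errors, x))
        grad_b = 2 / n * sum(errors)
        # Step in the direction that reduces the cost
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The calculus supplies the gradients, and the learning rate controls the step size: too large a value makes the updates diverge, while too small a value makes convergence slow. This is one reason a working knowledge of the underlying mathematics matters even when a library does the computation.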
Statistics contributes significantly to effective data analysis.
Similarly, the knowledge discovery enablers such as machine learning
have contributed significantly to the application of business analytics.
Another area that has given impetus to business analytics is the growth
of database systems, from SQL-oriented ones to NoSQL ones. All these
combined, along with easy data visualization and reporting capabilities,
have led to a clear understanding of what the data tells us and what we
understand from the data. This has led to the vast application of
business and data analytics to solve problems faced by organizations
and to drive a competitive edge in business through the application of
this understanding.
There are umpteen tools available to support each piece of the
business analytics framework. Figure 1-1 presents some of these tools,
along with details of the typical analytics framework.
Figure 1-1 Business analytics framework

1.7 Process of an Analytics Project


Earlier we discussed the various data analytics principles, tools,
algorithms, and skills required to perform data and business analytics.
In this section, we briefly touch upon the typical process of business
analytics and data mining projects, which is shown in Figure 1-2.
33. Changes in the United States and Japan[109]

The occidentalization of Japan, in methods although not in national
spirit,—which changes much more slowly,—has been fully
demonstrated to an astonished world by the war of 1894 with China.
It is one of the incidents of the closing nineteenth century. To this
achievement in the military sphere, in the practice of war which
Napoleon called the science of barbarians, must be added the
development of civil institutions that has resulted in the concession
to Japan of all international dignity and privilege; and consequently
of a control over the administration of justice among foreigners
within her borders, not heretofore obtained by any other Oriental
State. It has thus become evident that the weight of Japan in the
international balances depends not upon the quality of her
achievement, which has been shown to be excellent, but upon the
gross amount of her power. Moreover, while in wealth and
population, with the resources dependent upon them, she may be
deficient,—though rapidly growing,—her geographical position
relatively to the Eastern center of interest, and her advantage of
insularity, go far to compensate such defect. These confer upon her
as a factor in the Eastern problem an influence resembling in kind, if
not equaling in degree, that which Great Britain has held and still
holds in the international relations centering around Europe, the
Atlantic, and the Mediterranean.
Yet the change in Japan, significant as it is and influential upon
the great problem of the Pacific and Asia, is less remarkable and less
important than that which has occurred in the United States. If in the
Orient a nation may be said to have been born in a day, even so the
event is less sudden and less revolutionary than the conversion of
spirit and of ideals—the new birth—which has come over our own
country. In this are evident a rapidity and a thoroughness which
bespeak impulse from an external source rather than any conscious
set process of deliberation, of self-determination within, such as has
been that of Japan in her recognition and adoption of material
improvements forced upon her attention in other peoples. No man or
group of men can pretend to have guided and governed our people in
the adoption of a new policy, the acceptance of which has been rather
instinctive—I would prefer to say inspired—than reasoned. There is
just this difference between Japan and ourselves, the two most
changed of peoples within the last half-century. She has adopted
other methods; we have received another purpose. The one
conversion is material, the other spiritual. When we talk about
expansion we are in the realm of ideas. The material addition of
expansion—the acreage, if I may so say—is trivial compared with our
previous possessions, or with the annexations by European states
within a few years. The material profit otherwise, the national gain to
us, is at best doubtful. What the nation has gained in expansion is a
regenerating idea, an uplifting of the heart, a seed of future
beneficent activity, a going out of self into the world to communicate
the gift it has so bountifully received.
34. Our Interests in the Pacific[110]

[The preceding pages of the essay explain the dependence of the
“Open Door” policy on an international balance of power in the
Pacific, and the modification of this balance owing to the growth of
the German Navy and the increasing European tension.—Editor.]
The result is to leave the two chief Pacific nations, the United
States and Japan, whose are the only two great navies that have
coastlines on that ocean, to represent there the balance of power.
This is the best security for international peace; because it
represents, not a bargain, but a fact, readily ascertainable. Those two
navies are more easily able than any other to maintain there a
concentration of force; and it may even be questioned whether sound
military policy may not make the Pacific rather than the Atlantic the
station for the United States battle fleet. For the balance of naval
power in Europe, which compels the retention of the British and
German fleets in the North Sea, protects the Atlantic coast of the
United States,—and the Monroe Doctrine,—to a degree to which
nothing in Pacific conditions corresponds. Under existing
circumstances, neither Germany nor Great Britain can afford, even
did they desire, to infringe the external policy of the United States
represented in the Monroe Doctrine.
With Japan in the Pacific, and in her attitude towards the Open
Door, the case is very different from that of European or American
Powers. Her nearness to China, Manchuria, Korea, gives the natural
commercial advantages that short and rapid transportation always
confers. Labor with her is still cheap, another advantage in open
competition; but the very fact of these near natural markets, and her
interest in them, cannot but breed that sense of proprietorship
which, in dealing with ill-organized states, easily glides into the
attempt at political control that ultimately means control by force.
Hence the frequent reports, true or untrue, that such advantage is
sought and accomplished. Whether true or not, these illustrate what
nations continually seek, when opportunity offers or can be made.
This is in strict line with that which we call Protection; but with the
difference that Protection is exercised within the sphere commonly
recognized as legitimate, either by International Law or by the policy
of competing states. The mingled weakness and perverseness of
Chinese negotiators invite such attempt, and endanger the Open
Door; give rise to continual suspicion that undue influence resting
upon force is affecting equality of treatment, or is establishing a basis
for inequality in the future. There can be no question that the general
recent attitude of Russia and Japan, however laudably meant, does
arouse such suspicions.
Then again, the American possession, the Hawaiian Islands, are
predominantly Japanese in labor population; a condition which, as
the outcome of little more than a generation, warrants the jealousy of
Japanese immigration on the part of the Pacific coast. Finally, the
population of that coast is relatively scanty, and its communications
with the East, though rapid for express trains, are slow for the
immense traffic of men and stores which war implies and requires.
That is, the power of the country east of the Rocky Mountains has far
to go, and with poor conveyance, in order to reinforce the Western
Coast; the exact opposite of our advantage of rapid maritime access
to the Panama Canal. In the absence of the fleet, invasion may be
easy. Harm may be retrieved in measure by the arrival of the fleet
later; but under present world conditions the Pacific coast seems
incomparably the more exposed of the three great divisions of the
American shore line—the Atlantic, the Gulf, and the Pacific.
35. The German State and its Menace[111]

The prototype of modern Germany is to be found rather in the
Roman Empire, to which in a certain sense the present German
Empire may be said to be—if not heir—at least historically affiliated.
The Holy Roman Empire merged into that somewhat extenuated
figment attached to the Austrian Hapsburgs, which finally deceased
at the opening of the nineteenth century; but the idea itself survived,
and was influential in determining the form and name which the
existing powerful Germanic unity has assumed. To this unity the
national German character contributes an element not unlike that of
antiquity, in the subordination of the individual to the state. As a
matter of national characteristic, this differs radically from the more
modern conception of the freedom and rights of the individual,
exemplified chiefly in England and the United States. It is possible to
accept the latter as the superior ideal, as a higher stage of advance, as
ultimately more fruitful of political progress, yet at the same time to
recognize the great immediate advantage of the massed action which
subordinates the interests of the individual, sinks the unit in the
whole, in order to promote the interests of the community. It may be
noted incidentally, without further insistence just here, that the
Japanese Empire, which in a different field from the German is
manifesting the same restless need for self-assertion and expansion,
comes to its present with the same inheritance from its past, of the
submergence of the individual in the mass. It was equally the
characteristic of Sparta among the city states of ancient Greece, and
gave to her among them the preponderance she for a time possessed.
As an exhibition of social development, it is generally anterior and
inferior to that in which the rights of the individual are more fully
recognized; but as an element of mere force, whether in economics or
in international policies, it is superior.
The two contrasted conceptions, the claims of the individual and
the claims of the state, are familiar to all students of history. The two
undoubtedly must coexist everywhere, and have to be reconciled; but
the nature of the adjustment, in the clear predominance of the one or
the other, constitutes a difference which in effect upon the particular
community is fundamental. In international relations, between states
representing the opposing ideas, it reproduces the contrast between
the simple discipline of an army and the complicated disseminated
activities of the people, industrial, agricultural, and commercial. It
repeats the struggle of the many minor mercantile firms against a
single great combination. In either field, whatever the ultimate issue,
—and in the end the many will prevail,—the immediate result is that
preponderant concentrated force has its way for a period which may
thus be one of great and needless distress; and it not only has its way,
but it takes its way, because, whatever progress the world has made,
the stage has not been reached when men or states willingly
subordinate their own interests to even a reasonable regard for that
of others. It is not necessary to indulge in pessimistic apprehension,
or to deny that there is a real progress of the moral forces lumped
under the name of “public opinion.” This unquestionably tells for
much more than it once did; but still the old predatory instinct, that
he should take who has the power, survives, in industry and
commerce, as well as in war, and moral force is not sufficient to
determine issues unless supported by physical. Governments are
corporations, and corporations have not souls. Governments
moreover are trustees, not principals; and as such must put first the
lawful interests of their wards, their own people.
It matters little what may be the particular intentions now
cherished by the German government. The fact upon which the
contemporary world needs to fasten its attention is that it is
confronted by the simple existence of a power such as is that of the
German Empire; reinforced necessarily by that of Austria-Hungary,
because, whatever her internal troubles and external ambitions,
Austria is bound to Germany by nearness, by inferior power, and by
interests, partly common to the two states, as surely as the moon is
bound to the earth and with it constitutes a single group in the
planetary system. Over against this stands for the moment a number
of states, Russia, Italy, France, Great Britain. The recent action of
Russia has demonstrated her international weakness, the internal
causes of which are evident even to the most careless observer. Italy
still belongs to the Triple Alliance, of which Germany and Austria are
the other members; but the inclination of Italy towards England,
springing from past sympathies, and as a state necessarily naval,
because partly insular, partly peninsular, is known, as is also her
recent drawing towards France as compared with former
estrangement. Also, in the Balkan regions and in the Adriatic Sea
there is more than divergence between the interest of Italy and the
ambitions of Austria,—supported by Germany,—as shown in the late
annexations and their antecedents. An Austrian journal, which
foreshadowed the annexations with singular acumen, has written
recently,[112] “We most urgently need a fleet so strong that it can rule
the Northern Adriatic basin,”—in which lies the Italian Venice, as
well as the Austrian Trieste,—“support the operations of our land
army, protect our chief commercial ports against hostile maritime
undertakings, and prevent us from being throttled at the Strait of
Otranto. To do this, the fleet must at least attain the approximate
strength of our probable enemy. If we lag behind in developing our
naval programme, Italy will so outrun us that we can never overtake
her. Here more than elsewhere to stand still is to recede; but to
recede would be to renounce the historical mission of Austria.” The
Austrian Dreadnoughts are proceeding, and the above throws an
interesting side light upon the equipoise of the Triple Alliance. In the
Algeciras Conference, concerning the affairs of Morocco, Italy did not
sustain Germany; Austria only did so.
Analyzing thus the present international relations of Europe, we
find on the one side the recently constituted Triple Entente, France,
Great Britain, and Russia; on the other the Triple Alliance, Austria-
Hungary, Germany, and Italy, of thirty years’ standing. The
sympathies of Italy, as distinguished from the pressure of conditions
upon her, and from her formal association, are doubtful; and the
essentials of the situation seem to be summed up in the Triple
Entente opposed by the two mid-Europe military monarchies.

The Bulwark of British Sea Power[113]

[The intervening pages show that exposure on their land frontiers
would weaken the aid that could be given Great Britain by her allies
in continental Europe.—Editor.]
These conclusions, if reasonable, not only emphasize the
paramount importance in world politics of the British navy, but they
show also that there are only two naval states which can afford to
help Great Britain with naval force, because they alone have no land
frontiers which march with those of Germany. These states are Japan
and the United States. In looking to the future, it becomes for them a
question whether it will be to their interest, whether they can afford,
to exchange the naval supremacy of Great Britain for that of
Germany; for this alternative may arise. Those two states and
Germany cannot, as matters now stand, touch one another, except on
the open sea; whereas the character of the British Empire is such that
it has everywhere sea frontiers, is everywhere assailable where local
naval superiority does not exist, as for instance in Australia, and
other Eastern possessions. The United States has upon Great Britain
the further check of Canada, open to land attack.
A German navy, supreme by the fall of Great Britain, with a
supreme German army able to spare readily a large expeditionary
force for over-sea operations, is one of the possibilities of the future.
Great Britain for long periods, in the Seven Years War and
Napoleonic struggle, 1756–1815, has been able to do, and has done,
just this; not because she has had a supreme army, but because,
thanks to her insular situation, her naval supremacy covered
effectually both the home positions and the expedition. The future
ability of Germany thus to act is emphasized to the point of
probability by the budgetary difficulties of Great Britain, by the
general disorganization of Russia, and by the arrest of population in
France. Though vastly the richer nation, the people of Great Britain,
for the very reason of greater wealth long enjoyed, are not habituated
to the economical endurance of the German; nor can the habits of
individual liberty in England or America accept, unless under duress,
the heavy yoke of organization, of regulation of individual action,
which constitutes the power of Germany among modern states.
The rivalry between Germany and Great Britain to-day is the
danger point, not only of European politics, but of world politics as
well.
36. Advantages of Insular Position[114]

Great Britain and the Continental Powers

Every war has two aspects, the defensive and the offensive, to each of
which there is a corresponding factor of activity. There is something
to gain, the offensive; there is something to lose, the defensive. The
ears of men, especially of the uninstructed, are more readily and
sympathetically open to the demands of the latter. It appeals to the
conservatism which is dominant in the well-to-do, and to the
widespread timidity which hesitates to take any risk for the sake of a
probable though uncertain gain. The sentiment is entirely
respectable in itself, and more than respectable when its power is
exercised against breach of the peace for other than the gravest
motives—for any mere lucre of gain. But its limitations must be
understood. A sound defensive scheme, sustaining the bases of the
national force, is the foundation upon which war rests; but who lays
a foundation without intending a superstructure? The offensive
element in warfare is the superstructure, the end and aim for which
the defensive exists, and apart from which it is to all purposes of war
worse than useless. When war has been accepted as necessary,
success means nothing short of victory; and victory must be sought
by offensive measures, and by them only can be ensured. “Being in,
bear it, that the opposer may be ware of thee.” No mere defensive
attitude or action avails to such end. Whatever the particular mode
of offensive action adopted, whether it be direct military attack, or
the national exhaustion of the opponent by cutting off the sources of
national well-being, whatsoever method may be chosen, offense,
injury, weakening of the foe, to annihilation if need be, must be the
guiding purpose of the belligerent. Success will certainly attend him
who drives his adversary into the position of the defensive and keeps
him there.
Offense therefore dominates, but it does not exclude. The necessity
for defense remains obligatory, though subordinate. The two are
complementary. It is only in the reversal of rôles, by which priority of
importance is assigned to the defensive, that ultimate defeat is
involved. Nor is this all. Though opposed in idea and separable in
method of action, circumstances not infrequently have permitted the
union of the two in a single general plan of campaign, which protects
at the same time that it attacks. “Fitz James’s blade was sword and
shield.” Of this the system of blockades by the British Navy during
the Napoleonic wars was a marked example. Thrust up against the
ports of France, and lining her coasts, they covered—shielded—the
operations of their own commerce and cruisers in every sea; while at
the same time, crossing swords, as it were, with the fleets within,
ever on guard, ready to attack, should the enemy give an opening by
quitting the shelter of his ports, they frustrated his efforts at a
combination of his squadrons by which alone he could hope to
reverse conditions. All this was defensive; but the same operation cut
the sinews of the enemy’s power by depriving him of sea-borne
commerce, and promoted the reduction of his colonies. Both these
were measures of offense; and both, it may be added, were directed
upon the national communications, the sources of national well-
being. The means was one, the effect twofold....
[It is shown that, in the case of insular states, offense and defense
are often closely combined, home security depending on control of
the sea assured by offensive action of the national fleet.—Editor.]
An insular state, which alone can be purely maritime, therefore
contemplates war from a position of antecedent probable superiority
from the twofold concentration of its policy; defense and offense
being closely identified, and energy, if exerted judiciously, being
fixed upon the increase of naval force to the clear subordination of
that more narrowly styled military. The conditions tend to minimize
the division of effort between offensive and defensive purpose, and,
by greater comparative development of the fleet, to supply a larger
margin of disposable numbers in order to constitute a mobile
superiority at a particular point of the general field. Such a decisive
local superiority at the critical point of action is the chief end of the
military art, alike in tactics and strategy. Hence it is clear that an
insular state, if attentive to the conditions that should dictate its
policy, is inevitably led to possess a superiority in that particular kind
of force, the mobility of which enables it most readily to project its
power to the more distant quarters of the earth, and also to change
its point of application at will with unequalled rapidity.
The general considerations that have been advanced concern all
the great European nations, in so far as they look outside their own
continent, and to maritime expansion, for the extension of national
influence and power; but the effect upon the action of each differs
necessarily according to their several conditions. The problem of sea-
defense, for instance, relates primarily to the protection of the
national commerce everywhere, and specifically as it draws near the
home ports; serious attack upon the coast, or upon the ports
themselves, being a secondary consideration, because little likely to
befall a nation able to extend its power far enough to sea to protect
its merchant ships. From this point of view the position of Germany
is embarrassed at once by the fact that she has, as regards the world
at large, but one coast-line. To and from this all her sea commerce
must go; either passing the English Channel, flanked for three
hundred miles by France on the one side and England on the other,
or else going north about by the Orkneys, a most inconvenient
circuit, and obtaining but imperfect shelter from recourse to this
deflected route. Holland, in her ancient wars with England, when the
two were fairly matched in point of numbers, had dire experience of
this false position, though her navy was little inferior in numbers to
that of her opponent. This is another exemplification of the truth that
distance is a factor equivalent to a certain number of ships. Sea-
defense for Germany, in case of war with France or England, means
established naval predominance at least in the North Sea; nor can it
be considered complete unless extended through the Channel and as
far as Great Britain will have to project hers into the Atlantic. This is
Germany’s initial disadvantage of position, to be overcome only by
adequate superiority of numbers; and it receives little compensation
from the security of her Baltic trade, and the facility for closing that
sea to her enemies. In fact, Great Britain, whose North Sea trade is
but one-fourth of her total, lies to Germany as Ireland does to Great
Britain, flanking both routes to the Atlantic; but the great
development of the British sea-coast, its numerous ports and ample
internal communications, strengthen that element of sea-defense
which consists in abundant access to harbors of refuge.
For the Baltic Powers, which comprise all the maritime States east
of Germany, the commercial drawback of the Orkney route is a little
less than for Hamburg and Bremen, in that the exit from the Baltic is
nearly equidistant from the north and south extremities of England;
nevertheless the excess in distance over the Channel route remains
very considerable. The initial naval disadvantage is in no wise
diminished. For all the communities east of the Straits of Dover it
remains true that in war commerce is paralyzed, and all the resultant
consequences of impaired national strength entailed, unless decisive
control of the North Sea is established. That effected, there is
security for commerce by the northern passage; but this alone is
mere defense. Offense, exerted anywhere on the globe, requires a
surplusage of force, over that required to hold the North Sea,
sufficient to extend and maintain itself west of the British Islands. In
case of war with either of the Channel Powers, this means, as
between the two opponents, that the eastern belligerent has to guard
a long line of communications, and maintain distant positions,
against an antagonist resting on a central position, with interior
lines, able to strike at choice at either wing of the enemy’s extended
front. The relation which the English Channel, with its branch the
Irish Sea, bears to the North Sea and the Atlantic—that of an interior
position—is the same which the Mediterranean bears to the Atlantic
and the Indian Sea; nor is it merely fanciful to trace in the passage
round the north of Scotland an analogy to that by the Cape of Good
Hope. It is a reproduction in miniature. The conditions are similar,
the scale different. What the one is to a war whose scene is the north
of Europe, the other is to operations by European Powers in Eastern
Asia.
To protract such a situation is intolerable to the purse and morale
of the belligerent who has the disadvantage of position. This of
course leads us straight back to the fundamental principles of all
naval war, namely, that defense is ensured only by offense, and that
the one decisive objective of the offensive is the enemy’s organized
force, his battle fleet. Therefore, in the event of a war between one of
the Channel Powers, and one or more of those to the eastward, the
control of the North Sea must be at once decided. For the eastern
State it is a matter of obvious immediate necessity, of commercial
self-preservation. For the western State the offensive motive is
equally imperative; but for Great Britain there is defensive need as
well. Her Empire imposes such a development of naval force as
makes it economically impracticable to maintain an army as large as
those of the Continent. Security against invasion depends therefore
upon the fleet. Postponing more distant interests, she must here
concentrate an indisputable superiority. It is, however, inconceivable
that against any one Power Great Britain should not be able here to
exert from the first a preponderance which would effectually cover
all her remoter possessions. Only an economical decadence, which
would of itself destroy her position among nations, could bring her
so to forego the initial advantage she has, in the fact that for her
offense and defense meet and are fulfilled in one factor, the
command of the sea. History has conclusively demonstrated the
inability of a state with even a single continental frontier to compete
in naval development with one that is insular, although of smaller
population and resources. A coalition of Powers may indeed affect
the balance. As a rule, however, a single state against a coalition
holds the interior position, the concentrated force; and while
calculation should rightly take account of possibilities, it should
beware of permitting imagination too free sway in presenting its
pictures. Were the eastern Powers to combine they might prevent
Great Britain’s use of the North Sea for the safe passage of her
merchant shipping; but even so she would but lose commercially the
whole of a trade, the greater part of which disappears by the mere
fact of war. Invasion is not possible, unless her fleet can be wholly
disabled from appearing in that sea. From her geographical position,
she still holds her gates open to the outer world, which maintains
three fourths of her commerce in peace.
37. Bearing of Political Developments on Naval
Policy and Strategy[115]

The external activities of Europe, noted a dozen years ago and before,
have now to a certain extent been again superseded by rivalries
within Europe itself. Those rivalries, however, are the result of their
previous external activities, and in the last analysis they depend
upon German commercial development. This has stimulated the
German Empire to a prodigious naval programme, which affects the
whole of Europe and may affect the United States. In 1897 I summed
up two conspicuous European conditions as being the equilibrium
then existing between France and Germany, with their respective
allies, and the withdrawal of Great Britain from active association
with the affairs of the Continent. At that date the Triple Alliance,
Austria, Germany, Italy, stood against the Dual Alliance, France and
Russia; Great Britain apart from both, but with elements of
antagonism against Russia and France, and not against the German
monarchies or Italy. These antagonisms arose wholly from
conditions external to Europe,—in India against Russia, and in Africa
against France. Later, the paralysis of Russia, through her defeat by
Japan, and through her internal troubles, left France alone for a
time; during which Germany, thus assured against land attack, was
better able to devote much money to the fleet, as the protector of her
growing commerce. The results have been a projected huge German
navy, and a German altercation with France relative to Moroccan
affairs; incidents which have aroused Great Britain to a sense of
naval danger, and have propelled her to the understandings—
whatever they amount to—with France and Russia, which we now
know as the Triple Entente. In short, Great Britain has abandoned
the isolation of twenty years ago, stands joined to the Dual Alliance,
and it becomes a Triple Entente.
To the United States this means that Great Britain, once our chief
opponent in matters covered by the Monroe Doctrine, but later by
the logic of events drawn to recede from that opposition, so that she
practically backed us against Europe in 1898, and subsequently
conceded the Panama arrangement known as the Hay-Pauncefote
Treaty, cannot at present count for as much as she did in naval
questions throughout the world. It means to the United States and to
Japan that Great Britain has too much at stake at home to side with
the one or the other, granting she so wished, except as bound by
treaty, which implies reciprocal obligations. Between her and Japan
such specific obligations exist. They do not in the case of the United
States; and the question whether the two countries are disposed to
support one another, and, if so, to what extent, or what the attitude
of Great Britain would be in case of difficulty between Japan and the
United States, are questions directly affecting naval strategy.[116]
Great Britain does indeed for the moment hold Germany so far in
check that the German Empire also can do no more than look after
its European interests; but should a naval disaster befall Great
Britain, leaving Germany master of the naval situation, the world
would see again a predominant fleet backed by a predominant army,
and that in the hands, not of a state satiated with colonial
possessions, as Great Britain is, but of one whose late entry into
world conditions leaves her without any such possessions at all of
any great value. The habit of mind is narrow which fails to see that a
navy such as Germany is now building will be efficacious for other
ends than those immediately proposed. The existence of such a fleet
is a constant factor in contemporary politics; the part which it shall
play depending upon circumstances not always to be foreseen.
Although the colonial ambitions of Germany are held in abeyance for
the moment, the wish cannot but exist to expand her territory by
foreign acquisitions, to establish external bases for the support of
commercial or political interests, to build up such kindred
communities as now help to constitute the British Empire, homes for
emigrants, markets for industries, sources of supplies of raw
materials, needed by those industries.
All such conditions and ambitions are incidents with which
Strategy, comprehensively considered, has to deal. By the successive
enunciations of the Monroe Doctrine the United States stands
committed to the position that no particle of American soil shall pass
into the hands of a non-American State other than the present
possessor. No successful war between foreign states, no purchase, no
exchange, no merger, such as the not impossible one of Holland with
Germany, is allowed as valid cause for such transfer. This is a very
large contract; the only guarantee of which is an adequate navy,
however the term “adequate” be defined. Adequacy often depends
not only upon existing balances of power, such, for instance, as that
by which the British and German navies now affect one another,
which for the moment secures the observance of the Doctrine.
Account must be taken also of evident policies which threaten to
disturb such balances, such as the official announcement by
Germany of her purpose to create a “fleet of such strength that, even
for the mightiest naval power, a war with Germany would involve
such risks as to jeopardize its own supremacy.” This means, at least,
that Great Britain hereafter shall not venture, as in 1898, to back the
United States against European interference; nor to support France
in Morocco; nor to carry out as against Germany her alliance with
Japan. It is a matter of very distinct consequence in naval strategy
that Great Britain, after years of contention with the United States,
essentially opposed to the claims of the Monroe Doctrine, should at
last have come to substantial coincidence with the American point of
view, even though she is not committed to a formal announcement to
that effect.[117] Such relations between states are primarily the
concern of the statesman, a matter of international policies; but they
are also among the data which the strategist, naval as well as land,
has to consider, because they are among the elements which
determine the constitution and size of the national fleet.
I here quote with approval a statement of the French Captain
Darrieus:
“Among the complex problems to which the idea of strategy gives
rise there is none more important than that of the constitution of the
fleet; and every project which takes no account of the foreign
relations of a great nation, nor of the material limit fixed by its
resources, rests upon a weak and unstable base.”
I repeat also the quotation from Von der Goltz: “We must have a
national strategy, a national tactics.” I cannot too entirely repudiate
any casual word of mine, reflecting the tone which once was so
traditional in the navy that it might be called professional,—that
“political questions belong rather to the statesman than to the
military man.” I find these words in my old lectures, but I very soon
learned better, from my best military friend, Jomini; and I believe
that no printed book of mine endorses the opinion that external
politics are of no professional concern to military men.
It was in accordance with this changed opinion that in 1895, and
again in 1897, I summed up European conditions as I conceived
them to be; pointing out that the distinguishing feature at that time
was substantial equilibrium on the Continent, constituting what is
called the Balance of Power; and, in connection with the calm thus
resulting, an immense colonizing movement, in which substantially
all the great Powers were concerned. This I indicated as worthy of
the notice of naval strategists, because there were parts of the
American continents which for various reasons might attract upon
themselves this movement, in disregard of the Monroe Doctrine.
Since then the scene has shifted greatly, the distinctive feature of
the change being the growth of Germany in industrial, commercial,
and naval power,—all three; while at the same time maintaining her
military pre-eminence, although that has been somewhat qualified
by the improvement of the French army, just as the growth of the
German navy has qualified British superiority at sea. Coincident with
this German development has been the decline of Russia, owing to
causes generally understood; the stationariness of France in
population, while Germany has increased fifty per cent; and the very
close drawing together of Germany and Austria, for reasons of much
more controlling power than the mere treaty which binds them. The
result is that to-day central Europe, that is, Austria and Germany,
forms a substantially united body, extending from water to water,
from North Sea to Adriatic, wielding a military power against which,
on the land, no combination in Europe can stand. The Balance of
Power no longer exists; that is, if my estimate is correct of the
conditions and dispersion which characterize the other nations
relatively to this central mass.
This situation, coinciding with British trade jealousies of the new
German industries, and with the German naval programme, has
forced Great Britain out of the isolation which the Balance of Power
permitted her. Her ententes are an attempt to correct the
disturbance of the balance; but, while they tend in that direction,
they are not adequate to the full result desired. The balance remains
uneven; and consequently European attention is concentrated upon
European conditions, instead of upon the colonizing movements of
twenty years ago. Germany even has formally disavowed such
colonizing ambitions, by the mouth of her ambassador to the United
States, confirmed by her minister of foreign affairs, although a dozen
years ago they were conspicuous. Concerning these colonizing
movements, indeed, it might be said that they have reached a
moment of quiet, of equilibrium, while internally Europe is
essentially disquieted, as various incidents have shown.
The important point to us here is the growing power of the
German Empire, in which the efficiency of the State as an organic
body is so greatly superior to that of Great Britain, and may prove to
be to that of the United States. The two English-speaking countries
have wealth vastly superior, each separately, to that of Germany;
much more if acting together. But in neither is the efficiency of the
Government for handling the resources comparable to that of
Germany; and there is no apparent chance or recognized inducement
for them to work together, as Germany and Austria now work in
Europe. The consequence is that Germany may deal with each in
succession much more effectively than either is now willing to
consider; Europe being powerless to affect the issue so long as
Austria stands by Germany, as she thoroughly understands that she
has every motive to do.
It is this line of reasoning which shows the power of the German
navy to be a matter of prime importance to the United States. The
power to control Germany does not exist in Europe, except in the
British navy; and if social and political conditions in Great Britain
develop as they now promise, the British navy will probably decline
in relative strength, so that it will not venture to withstand the
German on any broad lines of policy, but only in the narrowest sense
of immediate British interests. Even this condition may disappear,
for it seems as if the national life of Great Britain were waning at the
same time that that of Germany is waxing. The truth is, Germany, by
traditions of two centuries, inherits now a system of state control,
not only highly developed but with a people accustomed to it,—a
great element of force; and this at the time when control of the
individual by the community—that is, by the state—is increasingly
the note of the times. Germany has in this matter a large start. Japan
has much the same.
When it is remembered that the United States, like Great Britain
and like Japan, can be approached only by sea, we can scarcely fail to
see that upon the sea primarily must be found our power to secure
our own borders and to sustain our external policy, of which at the
present moment there are two principal elements; namely, the
Monroe Doctrine and the Open Door. Of the Monroe Doctrine
President Taft, in his first message to Congress, has said that it has
advanced sensibly towards general acceptance; and that
maintenance of its positions in the future need cause less anxiety
than it has in the past. Admitting this, and disregarding the fact that
the respect conceded to it by Europe depends in part at least upon
European rivalries modifying European ability to intervene,—a
condition which may change as suddenly as has the power of Russia
within the decade,—it remains obvious that the policy of the Open
Door requires naval power quite as really and little less directly than
the Monroe Doctrine. For the scene of the Open Door contention is
the Pacific; the gateway to the Pacific for the United States is the
Isthmus; the communications to the Isthmus are by way of the Gulf
of Mexico and the Caribbean Sea. The interest of that maritime
region therefore is even greater now than it was when I first
undertook the strategic study of it, over twenty years ago. Its
importance to the Monroe Doctrine and to general commercial
interests remains, even if modified.
At the date of my first attempt to make this study of the Caribbean,
and to formulate certain principles relative to Naval Strategy, there
scarcely could be said to exist any defined public consciousness of
European and American interest in sea power, and in the methods of
its application which form the study of Strategy. The most striking
illustration of this insensibility to the sea was to be found in
Bismarck, who in a constructive sense was the greatest European
statesman of that day. After the war with France and the acquisition
of Alsace and Lorraine, he spoke of Germany as a state satiated with
territorial expansion. In the matter of external policy she had
reached the limits of his ambitions for her; and his mind thenceforth
was set on internal development, which should harmonize the body
politic and ensure Germany the unity and power which he had won
for her. His scheme of external relations did not stretch beyond
Europe. He was then too old to change to different conceptions,
although he did not neglect to follow the demand of the people as
their industry and commerce developed.
The contrast between the condition of indifference to the sea
which he illustrated and that which now exists is striking; and the
German Empire, which owes to him above all men its modern
greatness, offers the most conspicuous illustration of the change. The
new great navies of the world since 1887 are the German, the
Japanese, and the American. Every state in Europe is now awake to
the fact that the immediate coming interests of the world, which are
therefore its own national interest, must be in the other continents.
Europe in its relatively settled conditions offers really the base of
operations for enterprises and decisive events, the scene of which
will be in countries where political or economical backwardness must
give place to advances which will be almost revolutionary in kind.
This can scarcely be accomplished without unsettlements, the
composing of which will depend upon force. Such force by a
European state—with the single exception of Russia, and possibly, in
a less degree, of Austria—can be exerted only through a navy.
