
Bayesian Optimization: Theory and Practice Using Python
Peng Liu


Peng Liu

Bayesian Optimization
Theory and Practice Using Python
Peng Liu
Singapore, Singapore

ISBN 978-1-4842-9062-0 e-ISBN 978-1-4842-9063-7


https://doi.org/10.1007/978-1-4842-9063-7

© Peng Liu 2023

Apress Standard

The use of general descriptive names, registered names, trademarks,
service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress
Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
For my wife Zheng and children Jiaxin, Jiaran, and Jiayu.
Introduction
Bayesian optimization provides a unified framework that solves the
problem of sequential decision-making under uncertainty. It includes
two key components: a surrogate model approximating the unknown
black-box function with uncertainty estimates and an acquisition
function that guides the sequential search. This book reviews both
components, covering both theoretical introduction and practical
implementation in Python, building on top of popular libraries such as
GPyTorch and BoTorch. The book also provides case studies on
using Bayesian optimization to seek a simulated function's global
optimum or locate the best hyperparameters (e.g., learning rate) when
training deep neural networks. The book assumes readers have a
minimal understanding of model development and machine learning
and targets the following audiences:
• Students in the field of data science, machine learning, or
  optimization-related fields
• Practitioners such as data scientists, both early and middle in their
  careers, who build machine learning models with good-performing
  hyperparameters
• Hobbyists who are interested in Bayesian optimization as a global
  optimization technique to seek the optimal solution as fast as
  possible
All source code used in this book can be downloaded from
github.com/apress/Bayesian-optimization.
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit http://www.apress.com/source-code.
Acknowledgments
This book summarizes my learning journey in Bayesian optimization
during my (part-time) Ph.D. study. It started as a personal interest in
exploring this area and gradually grew into a book combining theory
and practice. For that, I thank my supervisors, Teo Chung Piaw and
Chen Ying, for their continued support in my academic career.
Table of Contents
Chapter 1: Bayesian Optimization Overview
Global Optimization
The Objective Function
The Observation Model
Bayesian Statistics
Bayesian Inference
Frequentist vs. Bayesian Approach
Joint, Conditional, and Marginal Probabilities
Independence
Prior and Posterior Predictive Distributions
Bayesian Inference: An Example
Bayesian Optimization Workflow
Gaussian Process
Acquisition Function
The Full Bayesian Optimization Loop
Summary
Chapter 2: Gaussian Processes
Reviewing the Gaussian Basics
Understanding the Covariance Matrix
Marginal and Conditional Distribution of Multivariate Gaussian
Sampling from a Gaussian Distribution
Gaussian Process Regression
The Kernel Function
Extending to Other Variables
Learning from Noisy Observations
Gaussian Process in Practice
Drawing from GP Prior
Obtaining GP Posterior with Noise-Free Observations
Working with Noisy Observations
Experimenting with Different Kernel Parameters
Hyperparameter Tuning
Summary
Chapter 3: Bayesian Decision Theory and Expected Improvement
Optimization via the Sequential Decision-Making
Seeking the Optimal Policy
Utility-Driven Optimization
Multi-step Lookahead Policy
Bellman’s Principle of Optimality
Expected Improvement
Deriving the Closed-Form Expression
Implementing the Expected Improvement
Using Bayesian Optimization Libraries
Summary
Chapter 4: Gaussian Process Regression with GPyTorch
Introducing GPyTorch
The Basics of PyTorch
Revisiting GP Regression
Building a GP Regression Model
Fine-Tuning the Length Scale of the Kernel Function
Fine-Tuning the Noise Variance
Delving into Kernel Functions
Combining Kernel Functions
Predicting Airline Passenger Counts
Summary
Chapter 5: Monte Carlo Acquisition Function with Sobol Sequences and Random Restart
Analytic Expected Improvement Using BoTorch
Introducing Hartmann Function
GP Surrogate with Optimized Hyperparameters
Introducing the Analytic EI
Optimization Using Analytic EI
Grokking the Inner Optimization Routine
Using MC Acquisition Function
Using Monte Carlo Expected Improvement
Summary
Chapter 6: Knowledge Gradient: Nested Optimization vs. One-Shot Learning
Introducing Knowledge Gradient
Monte Carlo Estimation
Optimizing Using Knowledge Gradient
One-Shot Knowledge Gradient
Sample Average Approximation
One-Shot Formulation of KG Using SAA
One-Shot KG in Practice
Optimizing the OKG Acquisition Function
Summary
Chapter 7: Case Study: Tuning CNN Learning Rate with BoTorch
Seeking Global Optimum of Hartmann
Generating Initial Conditions
Updating GP Posterior
Creating a Monte Carlo Acquisition Function
The Full BO Loop
Hyperparameter Optimization for Convolutional Neural Network
Using MNIST
Defining CNN Architecture
Training CNN
Optimizing the Learning Rate
Entering the Full BO Loop
Summary
Index
About the Author
Peng Liu
is an assistant professor of quantitative
finance (practice) at Singapore
Management University and an adjunct
researcher at the National University of
Singapore. He holds a Ph.D. in Statistics
from the National University of
Singapore and has ten years of working
experience as a data scientist across the
banking, technology, and hospitality
industries.
About the Technical Reviewer
Jason Whitehorn
is an experienced entrepreneur and
software developer and has helped many
companies automate and enhance their
business solutions through data
synchronization, SaaS architecture, and
machine learning. Jason obtained his
Bachelor of Science in Computer Science
from Arkansas State University, but he
traces his passion for development back
many years before then, having first
taught himself to program BASIC on his
family’s computer while in middle
school. When he’s not mentoring and
helping his team at work, writing, or
pursuing one of his many side-projects,
Jason enjoys spending time with his wife and four children and living in
the Tulsa, Oklahoma, region. More information about Jason can be
found on his website: https://jason.whitehorn.us.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2023
P. Liu, Bayesian Optimization
https://doi.org/10.1007/978-1-4842-9063-7_1

1. Bayesian Optimization Overview


Peng Liu (1)
(1) Singapore, Singapore

As the name suggests, Bayesian optimization is an area that studies
optimization problems using the Bayesian approach. Optimization aims
at locating the optimal objective value (i.e., a global maximum or
minimum) among all possible values, or the corresponding location of the
optimum in the environment (the search domain). The search process
starts at a specific initial location and follows a particular policy to
iteratively propose the next sampling locations, collect new
observations, and refresh the guiding policy.
As shown in Figure 1-1, the overall optimization process consists of
repeated interactions between the policy and the environment. The
policy is a mapping function that takes in a new input observation (plus
historical ones) and outputs the next sampling location in a
principled way. Here, we are constantly learning and improving the
policy, since a good policy guides our search toward the global
optimum more efficiently and effectively. In particular, a good policy
spends the limited sampling budget on promising candidate
locations. The environment, on the other hand, contains the unknown
objective function to be learned by the policy within a specific
boundary. When probing the functional value as requested by the
policy, the actual observation revealed by the environment to the policy
is often corrupted by noise, making learning even more challenging.
Thus, Bayesian optimization, a specific approach for global
optimization, aims to learn a policy that navigates to the global
optimum of an unknown, noise-corrupted environment as quickly as
possible.

Figure 1-1 The overall Bayesian optimization process. The policy digests the
historical observations and proposes the new sampling location. The environment
governs how the (possibly noise-corrupted) observation at the newly proposed
location is revealed to the policy. Our goal is to learn an efficient and effective policy
that could navigate toward the global optimum as quickly as possible
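The interaction between the policy and the environment can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the objective, the noise level, and the names `objective`, `environment`, `random_policy`, and `run_loop` are all hypothetical, and the placeholder policy samples uniformly at random instead of reasoning about the history as a Bayesian optimization policy would.

```python
import random


def objective(x):
    """Hypothetical black-box function; unknown to the policy in practice."""
    return -(x - 0.3) ** 2


def environment(x, rng, noise_std=0.05):
    """Reveal a noise-corrupted observation of the objective at x."""
    return objective(x) + rng.gauss(0.0, noise_std)


def random_policy(history, bounds, rng):
    """Placeholder policy: ignores the history and samples uniformly.

    A Bayesian optimization policy would instead use the history to
    decide where to sample next.
    """
    low, high = bounds
    return rng.uniform(low, high)


def run_loop(policy, budget=20, bounds=(0.0, 1.0), seed=0):
    """Alternate between policy proposals and environment observations."""
    rng = random.Random(seed)
    history = []  # list of (location, observation) pairs
    for _ in range(budget):
        x_next = policy(history, bounds, rng)  # policy proposes a location
        y_next = environment(x_next, rng)      # environment reveals y
        history.append((x_next, y_next))       # new data for the policy
    return history


history = run_loop(random_policy)
best_x, best_y = max(history, key=lambda pair: pair[1])
print(f"best observed location: {best_x:.3f}, value: {best_y:.3f}")
```

Swapping `random_policy` for a function that actually uses `history` is the only change needed to plug in a smarter policy; the loop itself stays the same.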

Global Optimization
Optimization aims to locate the optimal set of parameters of interest
across the whole domain through carefully allocating limited resources.
For example, when searching for the car key at home before leaving for
work in two minutes, we would naturally start with the most promising
place where we would usually put the key. If it is not there, we think for a
little while about the possible locations and go to the next most
promising place. This process iterates until the key is found. In this
example, the policy digests the available information on previous
searches and proposes the next promising location. The
environment is the house itself, revealing whether the key is at the
proposed location upon each sampling.
This is considered an easy example since we are familiar with the
environment in terms of its structural design. However, imagine
locating an item in a totally new environment. The policy would need to
account for the uncertainty due to unfamiliarity with the environment
while sequentially determining the next sampling location. When the
sampling budget is limited, as is often the case in real-life searches in
terms of time and resources, the policy needs to reason carefully about the
utility of each candidate sampling location.
Let us formalize the sequential global optimization using
mathematical terms. We are dealing with an unknown scalar-valued
objective function f defined on a specific domain Α. In other words, the
unknown subject of interest f is a function that maps a certain sample
in Α to a real number in ℝ, that is, f : Α → ℝ. We typically place no
specific assumption about the nature of the domain Α other than that it
should be a bounded, compact, and convex set.
Unless otherwise specified, we focus on the maximization setting
instead of minimization since maximizing the objective function is
equivalent to minimizing the negated objective, and vice versa. The
optimization procedure thus aims at locating the global maximum f∗ or
its corresponding location x∗ in a principled and systematic manner.
Mathematically, we wish to locate f∗ where

$$f^* = \max_{x \in A} f(x) = f(x^*)$$

Or equivalently, we are interested in its location x∗ where

$$x^* = \arg\max_{x \in A} f(x)$$

Figure 1-2 provides an example one-dimensional objective function
with its global maximum f∗ and its location x∗ highlighted. The goal of
global optimization is thus to systematically reason about a series of
sampling decisions within the total search space Α, so as to locate the
global maximum as fast as possible, that is, sampling as few times as
possible.
Figure 1-2 An example objective function with the global maximum and its location
marked with a star. The goal of global optimization is to systematically reason about a
series of sampling decisions so as to locate the global maximum as fast as possible
Note that this is a nonconvex function, as is often the case in real-life
functions we are optimizing. With a nonconvex function, we cannot
rely on first-order gradient-based methods to reliably find the
global optimum since they will likely converge to a local optimum. This is
also one of the advantages of Bayesian optimization compared with
gradient-based optimization procedures.
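To make the local-optimum issue concrete, the following sketch compares plain gradient ascent with an exhaustive grid search on a one-dimensional nonconvex function. The function `f`, its domain [0, 4], the step size, and the starting point are all assumptions chosen for illustration; the grid search is only feasible here because this toy function is cheap to evaluate, which is not the case for the expensive black-box objectives discussed in this chapter.

```python
import math


def f(x):
    """Hypothetical nonconvex objective with several local maxima on [0, 4]."""
    return math.sin(3.0 * x) + 0.5 * x


def grad_f(x):
    """Analytic derivative of f."""
    return 3.0 * math.cos(3.0 * x) + 0.5


def gradient_ascent(x0, step=0.05, iters=200, bounds=(0.0, 4.0)):
    """First-order ascent: follows the local slope from the starting point."""
    x = x0
    for _ in range(iters):
        x = x + step * grad_f(x)
        x = min(max(x, bounds[0]), bounds[1])  # stay inside the domain
    return x


def grid_search(num_points=4001, bounds=(0.0, 4.0)):
    """Exhaustive search over a fine grid; returns the best grid location."""
    low, high = bounds
    xs = [low + (high - low) * i / (num_points - 1) for i in range(num_points)]
    return max(xs, key=f)


x_local = gradient_ascent(x0=0.1)  # ends at the peak closest to the start
x_star = grid_search()             # approximates the global maximizer x*
f_star = f(x_star)                 # approximates the global maximum f*

print(f"gradient ascent: x = {x_local:.3f}, f(x) = {f(x_local):.3f}")
print(f"grid search:     x = {x_star:.3f}, f(x) = {f_star:.3f}")
```

Starting from 0.1, gradient ascent settles on the first peak it reaches, while the grid search finds a higher peak elsewhere in the domain.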

The Objective Function


There are different types of objective functions. For example, some
functions are wiggly shaped, while others are smooth; some are convex,
while others are nonconvex. An objective function is an unknown object
to us; the problem would be considered solved if we could access its
underlying mathematical form. Many complex functions are almost
impossible to express using an explicit expression. For Bayesian
optimization, the objective function typically bears the
following attributes:
• We do not have access to the explicit expression of the objective
  function, making it a “black-box” function. This means that we can
  only interact with the environment, that is, the objective function, to
  perform a functional evaluation by sampling at a specific location.
• The returned value by probing at a specific location is often
  corrupted by noise and does not represent the exact true value of the
  objective function at that location. Due to the indirect evaluation of
  its actual value, we need to account for such noise embedded in the
  actual observations from the environment.
• Each functional evaluation is costly, thus ruling out the option for an
  exhaustive probing. We need to have a sample-efficient method to
  minimize the number of evaluations of the environment while trying
  to locate its global optimum. In other words, the optimizer needs to
  fully utilize the existing observations and systematically reason
  about the next sampling decision so that the limited resource is well
  spent on promising locations.
• We do not have access to its gradient. When the functional evaluation
  is relatively cheap and the functional form is smooth, it would be
  very convenient to compute the gradient and optimize using a
  first-order procedure such as gradient descent. The gradient tells us
  how the function changes around a particular evaluation point, so
  with gradient evaluations, the follow-up direction of travel is easier
  to determine.
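The attributes above can be mimicked with a small wrapper that hides the function, adds noise to every evaluation, and enforces an evaluation budget. This is a hedged sketch only: the class name `BlackBox`, its parameters, and the quadratic used in the usage lines are hypothetical and not taken from any library.

```python
import random


class BlackBox:
    """Hypothetical wrapper exposing only noisy, budgeted evaluations."""

    def __init__(self, func, noise_std, budget, seed=0):
        self._func = func              # hidden from the optimizer
        self._noise_std = noise_std    # standard deviation of additive noise
        self._rng = random.Random(seed)
        self.remaining = budget        # number of evaluations still allowed

    def evaluate(self, x):
        """Return a noisy observation at x and consume one unit of budget."""
        if self.remaining <= 0:
            raise RuntimeError("evaluation budget exhausted")
        self.remaining -= 1
        return self._func(x) + self._rng.gauss(0.0, self._noise_std)


# Usage: the optimizer sees observations only, never the expression or gradient.
box = BlackBox(lambda x: -(x - 2.0) ** 2, noise_std=0.1, budget=3)
observations = [box.evaluate(x) for x in (0.0, 1.0, 2.0)]
print(observations, "remaining budget:", box.remaining)
```

A fourth call to `box.evaluate` would raise an error, which is one simple way to make the cost of each evaluation explicit in code.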
The “black-box” function is challenging to optimize for the
preceding reasons. To further elaborate on the possible functional form
of the objective, we list three examples in Figure 1-3. On the left is a
convex function with only one global minimum; this is considered easy
for global optimization. In the middle is a nonconvex function with
multiple local optima; it is difficult to ascertain if the current local
optimum is also globally optimal. On the right is a nonconvex function
with a wide flat region full of saddle points; it is difficult to tell
whether the search sits in a flat region or at a local optimum. All three
scenarios are in a minimization setting.
Figure 1-3 Three possible functional forms. On the left is a convex function whose
optimization is easy. In the middle is a nonconvex function with multiple local
minima, and on the right is also a nonconvex function with a wide flat region full of
saddle points. Optimization for the latter two cases takes a lot more work than for
the first case
Let us look at one example of hyperparameter tuning when training
machine learning models. A machine learning model is a function that
involves a set of parameters to be optimized given the input data. These
parameters are automatically tuned via a specific optimization
procedure, typically governed by a set of corresponding meta
parameters called hyperparameters, which are fixed before the model
training starts. For example, when training deep neural networks using
the gradient descent algorithm, a learning rate that determines the step
size of each parameter update needs to be manually selected in
advance. If the learning rate is too large, the model may diverge and
eventually fail to learn. If the learning rate is too small, the model may
converge very slowly as the weights are updated by only a small margin
in each iteration. See Figure 1-4 for a visual illustration.
Figure 1-4 Slow convergence due to a small learning rate on the left and divergence
due to a large learning rate on the right
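The effect of the learning rate can be reproduced on a toy problem. The sketch below minimizes the one-dimensional function f(w) = w² with plain gradient descent; the function, the starting point, and the three learning rates are assumptions picked for illustration and stand in for the much more expensive training of a neural network.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w**2 with a fixed learning rate; return all iterates."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # gradient descent update
        path.append(w)
    return path


for lr in (0.01, 0.4, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr}: |w| after 20 steps = {abs(final):.4g}")
```

Each update multiplies w by (1 - 2·lr), so a small learning rate shrinks w slowly, a moderate one shrinks it quickly, and a learning rate above 1 makes |w| grow at every step.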
Choosing a reasonable learning rate as a preset hyperparameter
thus plays a critical role in training a good machine learning model.
Locating the best learning rate and other hyperparameters is an
optimization problem that fits Bayesian optimization. In the case of
hyperparameter tuning, evaluating each learning rate is a time-
consuming exercise. The objective function would generally be the
model’s final test set loss (in a minimization setting) upon model
convergence. A model needs to be fully trained to obtain a single
evaluation, which typically involves hundreds of epochs to reach stable
convergence. Here, one epoch is a complete pass of the entire training
dataset. The book’s last chapter covers a case study on tuning the
learning rate using Bayesian optimization.
The functional form of the test set loss or accuracy may also be
highly nonconvex and multimodal with respect to the hyperparameters. Upon
convergence, it is not easy to know whether we are in a local optimum,
a saddle point, or a global optimum. Besides, some hyperparameters
may be discrete, such as the number of nodes and layers when training
a deep neural network. We cannot calculate the gradient in such a
case since it requires continuous support in the domain.
The Bayesian optimization approach is designed to tackle all these
challenges. It has been shown to deliver good performance in locating
the best hyperparameters under a limited budget (i.e., the number of
evaluations allowed). It is also widely and successfully used in other
fields, such as chemical engineering.
Next, we will delve into the various components of a typical
Bayesian optimization setup, including the observation model, the
optimization policy, and the Bayesian inference.

The Observation Model


Earlier, we mentioned that a functional evaluation would give an
observation about the true objective function, and the observation will
likely differ from the true objective value due to noise. The
observations gathered for the policy learning would thus be inexact and
corrupted by an additional noise term, which is often assumed to be
additive. The observation model formalizes the
relationship between the true objective function, the actual
observation, and the noise. It governs how the observations would be
revealed from the environment to the policy.
Figure 1-5 illustrates a list of observations of the underlying
objective function. These observations are displaced from the objective
function due to additive random noise. The additive noise manifests
as the vertical shifts between the actual observations and the
underlying objective function. Due to these noise-induced deviations
in the observations, we need to account for such uncertainty
in the observation model. When learning a policy based on the actual
observations, the policy also needs to be robust enough to focus on the
objective function’s underlying pattern and not be distracted by the
noise. The model we use to approximate the objective function, while
accounting for uncertainty due to the additive noise, is typically a
Gaussian process. We will cover it briefly in this chapter and in more
detail in the next chapter.
Figure 1-5 Illustrating the actual observations (in dots) and the underlying
objective function (in dashed line). When sampling at a specific location, the
observation would be disrupted by an additive noise. The observation model thus
determines how the observation would be revealed to the policy, which needs to
account for the uncertainty due to noise perturbation
To make our discussion more precise, let us use f(x) to denote the
(unknown) objective function value at location x. We sometimes write
f(x) as f for simplicity. We use y to denote the actual observation at
location x, which will slightly differ from f due to noise perturbation. We
can thus express the observation model, which governs how the policy
sees the observation from the environment, as a probability
distribution of y based on a specific location x and true function value f:

$$p(y \mid x, f)$$

Let us assume an additive noise term ε inflicted on f; the actual
observation y can thus be expressed as

$$y = f + \varepsilon$$

Here, the noise term ε arises from measurement error or inaccurate
statistical approximation, although it may disappear in certain
computer simulations. A common practice is to treat the error as a
random variable that follows a Gaussian distribution with a zero mean
and fixed standard deviation σ, that is, ε ~ N(0, σ²). Note that it is
unnecessary to fix σ across the whole domain A; Bayesian
optimization allows for both homoscedastic noise (i.e., fixed σ across A)
and heteroscedastic noise (i.e., different σ that depends on the specific
location in A).
Therefore, we can formulate a Gaussian observation model as
follows:

$$p(y \mid x, f, \sigma) = \mathcal{N}(y; f, \sigma^2)$$

This means that for a specific location x, the actual observation y is
treated as a random variable that follows a Gaussian/normal
distribution with mean f and variance σ². Figure 1-6 illustrates an
example probability distribution of y centered around f. Note that the
variance of the noise is often estimated by sampling a few initial
observations and is expected to be small, so that the overall observation
model still strongly depends on and stays close to f.

Figure 1-6 Assuming a normal probability distribution for the actual observation as
a random variable. The Gaussian distribution is centered around the objective
function f value evaluated at a given location x and spread by the variance of the
noise term
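A Gaussian observation model of this kind is easy to simulate. The sketch below draws noisy observations around a fixed function value, evaluates the Gaussian density of an observation, and estimates the noise level from repeated samples at the same location. The values `f_true` and `sigma_true` and the function names are hypothetical; in a real problem, f is unknown and only the observations y are available.

```python
import math
import random
import statistics


def gaussian_likelihood(y, f, sigma):
    """Density of N(y; f, sigma^2): how plausible y is given the true value f."""
    z = (y - f) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def observe(f, sigma, rng):
    """Draw y = f + eps with eps ~ N(0, sigma^2)."""
    return f + rng.gauss(0.0, sigma)


rng = random.Random(0)
f_true, sigma_true = 1.5, 0.1  # hypothetical values for illustration
ys = [observe(f_true, sigma_true, rng) for _ in range(200)]

# Repeated observations at one location give an estimate of the noise level.
sigma_hat = statistics.stdev(ys)
print(f"mean of observations: {statistics.mean(ys):.3f}")
print(f"estimated noise std:  {sigma_hat:.3f}")

# Observations close to f are more plausible than those far from it.
print(gaussian_likelihood(1.5, f_true, sigma_true) >
      gaussian_likelihood(1.8, f_true, sigma_true))
```

The same density function is what a likelihood computation would call for each observed pair (x, y) once a value of f at x is hypothesized.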
The following section introduces Bayesian statistics to lay the
theoretical foundation as we work with probability distributions along
the way.

Bayesian Statistics
Bayesian optimization is not a particular algorithm for global
optimization; it is a suite of algorithms based on the principles of
Bayesian inference. As the optimization proceeds in each iteration, the
policy needs to determine the next sampling decision or if the current
search needs to be terminated. Due to uncertainty in the objective
function and the observation model, the policy needs to account for such
uncertainty when deciding the next sampling location, which bears
both an immediate impact on follow-up decisions and a long-term
effect on all future decisions. The samples selected thus need to
reasonably contribute to the ultimate goal of global optimization and
justify the cost incurred due to sampling.
Using Bayesian statistics in optimization paves the way for us to
systematically and quantitatively reason about these uncertainties
using probabilities. For example, we would place a prior belief about
the characteristics of the objective function and quantify its
uncertainties by assigning high probability to specific ranges of values
and low probability to others. As more observations are collected, the
prior belief is gradually updated and calibrated toward the true
underlying distribution of the objective function in the form of a
posterior distribution.
We now cover the fundamental concepts and tools of Bayesian
statistics. Understanding these sections is essential to appreciate the
inner workings of Bayesian optimization.

Bayesian Inference
Bayesian inference essentially relies on the Bayesian formula (also
called Bayes’ rule) to reason about the interactions among three
components: the prior distribution p(θ) where θ represents the
parameter of interest, the likelihood p(data| θ) given a specific
parameter θ, and the posterior distribution p(θ| data). There is one
more component, the evidence of the data p(data), which is often
intractable to compute. The Bayesian formula is as follows:

$$p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})}$$

Let us look closely at this widely used and arguably most important
formula in Bayesian statistics. Remember that any Bayesian inference
procedure aims to derive the posterior distribution p(θ| data) (or
calculate its marginal expectation) for the parameter of interest θ, in
the form of a probability density function. For example, we might end
up with a continuous posterior distribution as in Figure 1-7, where θ
varies from 0 to 1, and the total probability (i.e., the area under the
curve) equals 1.

Figure 1-7 Illustrating a sample (continuous) posterior distribution for the
parameter of interest. The specific shape of the curve will change as new data are
being collected

We would need access to three components to obtain the posterior
distribution of θ. First, we need to derive the probability of seeing the
actual data given our choice of θ, that is, p(data| θ). This is also called
the likelihood term since we are assessing how likely it is to generate
the data after specifying a certain observation model for the data. The
likelihood can be calculated based on the assumed observation model
for data generation.
The second term p(θ) represents our prior belief about the
distribution of θ without observing any actual data; we encode our pre-
experimental knowledge of the parameter θ in this term. For example,
p(θ) could take the form of a uniform distribution that assigns an equal
probability to any value between 0 and 1. In other words, all values in
this range are equally likely, and this is a common prior belief we would
place on θ given that we do not have any information that suggests a
preference over specific values. However, as we collect more
observations and gather more data, the prior distribution will play a
decreasing role, and the subjective belief will gradually reduce in
support of the factual evidence in the data. As shown in Figure 1-8, the
distribution of θ will progressively approach a normal distribution
given that more data is being collected, thus forming a posterior
distribution that better approximates the true distribution of θ.

Figure 1-8 Updating the prior uniform distribution toward a posterior normal
distribution as more data is collected. The role of the prior distribution decreases as
more data is collected to support the approximation to the true underlying
distribution

The last term is the denominator p(data), also referred to as the
evidence, which represents the probability of obtaining the data over
all different choices of θ and serves as a normalizing constant
independent of θ in Bayes’ theorem. This is the most difficult part to
compute among all the components since we need to integrate over all
possible values of θ. For each given θ, the
likelihood is calculated based on the assumed observation model for
data generation, which is the same as how the likelihood term is
calculated. The difference is that the evidence considers every possible
value of θ and weights the resulting likelihood by the prior probability
of that particular θ. Since the evidence does not depend on θ, it is
often ignored when analyzing the proportionate change in the
posterior. As a result, the analysis focuses on the likelihood and the
prior alone.
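The roles of the prior, the likelihood, and the evidence can be seen in a small grid approximation. The sketch below is an illustration under stated assumptions: θ is taken to be the probability of heads of a coin, the data are hypothetical 0/1 outcomes, the prior is uniform on [0, 1], and the integral in the evidence is replaced by a sum over a finite grid of θ values.

```python
def grid_posterior(data, num_points=101):
    """Posterior over a coin's head probability theta on a uniform grid.

    data: list of 0/1 outcomes; the prior is uniform on [0, 1].
    """
    thetas = [i / (num_points - 1) for i in range(num_points)]
    prior = [1.0 / num_points] * num_points
    heads = sum(data)
    tails = len(data) - heads
    # Likelihood p(data | theta) for each candidate theta.
    likelihood = [t ** heads * (1.0 - t) ** tails for t in thetas]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    # Evidence p(data): likelihood weighted by the prior, summed over theta.
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


data = [1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical coin flips (1 = heads)
thetas, posterior = grid_posterior(data)
mode = thetas[posterior.index(max(posterior))]
print(f"posterior mode: {mode:.2f}, total probability: {sum(posterior):.3f}")
```

Dividing by the evidence only rescales the curve so that it sums to one; the location of the peak is determined by the product of the likelihood and the prior.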
A relatively simple case is when the prior p(θ) and the likelihood
p(data| θ) are conjugate, making the resulting posterior p(θ| data)
analytic and thus easy to work with due to its closed-form expression.
Bayesian inference becomes much easier if we can
write down the explicit form and generate the exact shape of the
posterior p(θ| data) without resorting to sampling methods. The
posterior will belong to the same family of distributions as the prior
when the prior is conjugate with the likelihood function. For example,
when both the prior and the likelihood functions follow a normal
distribution, the resulting posterior will also be normally distributed.
However, when the prior and the likelihood are not conjugate, we can
still gain insight into the posterior distribution via efficient sampling
techniques such as Gibbs sampling.
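For the normal example, the closed-form posterior can be written in a few lines. The sketch below assumes a setting not spelled out in the text: θ is the unknown mean of the observations, the noise variance is known, and the observations are independent. The function name `normal_posterior` and the numbers in the usage lines are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, noise_var, observations):
    """Posterior of a Gaussian mean theta with known noise variance.

    Prior: theta ~ N(prior_mean, prior_var).
    Likelihood: each y ~ N(theta, noise_var), independently.
    Returns the mean and variance of the (normal) posterior.
    """
    n = len(observations)
    # Precisions (inverse variances) of the prior and the data add up.
    precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var
                            + sum(observations) / noise_var)
    return post_mean, post_var


post_mean, post_var = normal_posterior(
    prior_mean=0.0, prior_var=1.0, noise_var=0.25,
    observations=[0.9, 1.1, 1.0],
)
print(f"posterior mean: {post_mean:.3f}, posterior variance: {post_var:.3f}")
```

The posterior mean lands between the prior mean and the sample mean, and the posterior variance is smaller than the prior variance, which matches the idea that the prior plays a decreasing role as data accumulate.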

Frequentist vs. Bayesian Approach


The Bayesian approach is a systematic way of assigning probabilities to
possible values of θ and updating these probabilities based on the
observed data. However, sometimes we are only interested in the most
probable value of θ that gives rise to the data we observe.
This can be achieved using the frequentist approach, treating the
parameter of interest (i.e., θ) as a fixed quantity instead of a random
variable. This approach is often adopted in the machine learning
community, placing a strong focus on optimizing a specific objective
function to locate the optimal set of parameters.
More generally, we use the frequentist approach to find the correct
answer about θ. For example, we can locate the value of θ by
maximizing the joint probability of the actual data via maximum
likelihood estimation (MLE), where the resulting solution is

$$\hat{\theta} = \arg\max_{\theta} p(\text{data} \mid \theta)$$

There is no distribution involved with θ since
we treat it as a fixed quantity, which makes the calculation easier as we
only need to work with the probability distribution for the data. The
final solution using the frequentist approach is a specific value of θ. And
since we are working with samples that come from the underlying data-
generating distribution, different samples would vary from each other,
and the goal is to find the optimal parameter θ that best describes the
current sample we are observing.
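As a small illustration of maximum likelihood estimation, the sketch below searches a grid of candidate θ values for the one that maximizes the log-likelihood of a sample. The setup is an assumption made for this example: the observations are modeled as independent draws from a normal distribution with unknown mean θ and known standard deviation, and the data values are hypothetical. Under this model, the maximizer coincides with the sample mean.

```python
import math


def log_likelihood(theta, data, sigma=1.0):
    """Log of p(data | theta) for i.i.d. observations y ~ N(theta, sigma^2)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - 0.5 * ((y - theta) / sigma) ** 2 for y in data)


data = [2.1, 1.9, 2.4, 2.0, 1.6]           # hypothetical sample
grid = [i / 1000 for i in range(0, 4001)]  # candidate theta values in [0, 4]

# The MLE is the single theta value with the highest (log-)likelihood.
theta_mle = max(grid, key=lambda t: log_likelihood(t, data))
sample_mean = sum(data) / len(data)
print(f"grid MLE: {theta_mle:.3f}, sample mean: {sample_mean:.3f}")
```

The result is one number rather than a distribution, which is the key contrast with the Bayesian treatment of θ.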
On the other hand, the Bayesian approach takes on the extra
complexity by treating θ as a random variable with its own probability
distribution, which gets updated as more data is collected. This
approach offers a holistic view on all possible values of θ and the
corresponding probabilities instead of the most probable value of θ
alone. This is a different approach because the data is now treated as
fixed and the parameter θ as a random variable. The optimal
probability distribution for θ is then derived, given the observed fixed
sample. There is no right or wrong in the Bayesian approach, only
probabilities. The final solution is thus a probability distribution of θ
instead of one specific value. Figure 1-9 summarizes these two different
schools of thought.
Figure 1-9 Comparing the frequentist approach and the Bayesian approach
regarding the parameter of interest. The frequentist approach treats θ as a fixed
quantity that can be estimated via MLE, while the Bayesian approach employs a
probability distribution which gets refreshed as more data is collected

Joint, Conditional, and Marginal Probabilities


We have been characterizing the random variable θ using a
(continuous) probability distribution p(θ). A probability distribution is
a function that maps a specific value of θ to a probability density, and the
density integrates to one over all values of θ, that is, ∫ p(θ)dθ = 1.
Things become more interesting when we work with more than one
variable. Suppose we have two random variables x
and y, and we are interested in two events x = X and y = Y, where both X
and Y are specific values that x and y may assume, respectively. Also, we
assume the two random variables are dependent in some way. This
would lead us to three types of probabilities commonly used in modern
machine learning and Bayesian optimization literature: joint
probability, marginal probability, and conditional probability, which we
will look at now in more detail.
The joint probability of the two events refers to the probability of
them occurring simultaneously. Collecting the joint probabilities of all
possible combinations of values of x and y gives the joint probability
distribution.
We can write the joint probability of the two events as p(X and
Y) = p(x = X ∩ y = Y) = p(X ∩ Y). Using the chain rule of probability, we
can further write p(X and Y) = p(X given Y) ∗ p(Y) = p(X| Y)p(Y), where
p(X| Y) denotes the probability that the event x = X occurs given that the
event y = Y has occurred. It is thus referred to as conditional probability,
as the probability of the first event is now conditioned on the second
event. All conditional probabilities for a (continuous) random variable x
given a specific value of another random variable (i.e., y = Y) form the
conditional probability distribution p(x| y = Y). More generally, we can
write the joint probability distribution of random variables x and y as
p(x, y) and conditional probability distribution as p(x ∣ y).
The joint probability is also symmetrical, that is, p(X and Y) = p(Y
and X), because the intersection of two events is commutative (X ∩ Y = Y ∩ X).
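The relationships among joint, marginal, and conditional probabilities can be verified on a small discrete example. The sketch below assumes two dependent binary random variables with a hand-picked joint table; the numbers and function names are hypothetical and serve only to illustrate the chain rule p(X and Y) = p(X| Y)p(Y) and the symmetry of the joint probability. Marginals are obtained by summing the joint table over the other variable.

```python
# Hypothetical joint table p(x = X, y = Y) for two dependent binary variables.
# Keys are (X, Y) pairs; the four entries sum to one.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}


def marginal_x(X):
    """p(x = X): sum the joint probabilities over every value of y."""
    return sum(p for (x, _), p in joint.items() if x == X)


def marginal_y(Y):
    """p(y = Y): sum the joint probabilities over every value of x."""
    return sum(p for (_, y), p in joint.items() if y == Y)


def x_given_y(X, Y):
    """Conditional probability p(x = X | y = Y) = p(X and Y) / p(Y)."""
    return joint[(X, Y)] / marginal_y(Y)


def y_given_x(Y, X):
    """Conditional probability p(y = Y | x = X) = p(X and Y) / p(X)."""
    return joint[(X, Y)] / marginal_x(X)


# Chain rule in both directions for the events x = 1 and y = 1.
via_y = x_given_y(1, 1) * marginal_y(1)  # p(X | Y) p(Y)
via_x = y_given_x(1, 1) * marginal_x(1)  # p(Y | X) p(X)

# The conditional distribution p(x | y = 1) is itself a valid distribution.
cond_total = x_given_y(0, 1) + x_given_y(1, 1)

print(via_y, via_x, joint[(1, 1)], cond_total)
```

Both factorizations recover the same joint probability, which is the symmetry p(X and Y) = p(Y and X) expressed through the chain rule.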
poor Galilean, and even their gravest and most puzzling attacks
upon his wisdom and prudence, turned into an absolute jest against
them,――it was quite clear that the amused and delighted multitude
would soon cease to regard the authority and opinions of their
venerable religious and legal rulers, whose subtleties were so easily
foiled by one of the common, uneducated mass. But the very
circumstances which effected and constituted the evil, were also the
grand obstacles to the removal of it. Jesus was by these means
seated firmly in the love and reverence of the people,――and of the
vast numbers of strangers then in Jerusalem at the feast, there were
very many who would have their feelings strongly excited in his
favor, by the circumstance that they, as well as he, were Galileans,
and would therefore be very apt to make common cause with him in
case of any violent attack. All these obstacles required management;
and after having been very many times foiled in their attempts to
seize him, by the resolute determination of the thousands by whom
he was always encircled, to defend him, they found that they must
contrive some way to get hold of him when he was without the
defenses of this admiring host. This could be done, of course, only
by following him to his secret haunts, and coming quietly upon him
before the multitude could assemble to his aid. But his movements
were altogether beyond their notice. No armed band could follow him
about, as he went from the city to the country in his daily and nightly
walks. They needed some spy who could watch his private
movements when unattended, save by the little band of the twelve,
and give notice of the favorable moment for a seizure, when the
time, the place, and the circumstances, would all conspire to prevent
a rescue. Thus taken, he might be safely lodged in some of the
impregnable fortresses of the temple and city, so as to defy the
momentary burst of popular rage, on finding that their idol had been
taken away. They knew too, the fickle character of the commonalty,
well enough to feel certain, that when the tide of condemnation was
once strongly set against the Nazarene, the lip-worship of
“Hosannas” could be easily turned, by a little management, into the
ferocious yell of deadly denunciation. The mass of the people are
always essentially the same in their modes of action. Mobs were
then managed by the same rules as now, and demagogues were
equally well versed in the tricks of their trade. Besides, when Jesus
had once been formally indicted and presented before the secular
tribunal of the Roman governor, as a rioter and seditious person, no
rescue from the military force could be thought of; and
however unwilling Pilate might be to minister to the wishes of the
Jews, in an act of unnecessary cruelty, he could not resist a call thus
solemnly made to him, in the character of preserver of the Roman
sway, though he would probably have rejected entirely any
proposition to seize Jesus by a military force, in open day, in the
midst of the multitude, so as to create a troublesome and bloody
tumult, by such an imprudent act. In the full consideration of all these
difficulties, the Jewish dignitaries were sitting in conclave, contriving
means to effect the settlement of their troubles, by the complete
removal of him who was unquestionably the cause of all. At once
their anxious deliberations were happily interrupted by the entrance
of the trusted steward of the company of Jesus, who changed all
their doubts and distant hopes into absolute certainty, by offering, for
a reasonable consideration, to give up Jesus into their hands, a
prisoner, without any disturbance or riot. How much delay and
debate there was about terms, it would be hard to say; but after all,
the bargain made, does not seem to have been greatly to the credit
of the liberality of the Sanhedrim, or the sharpness of Judas. Thirty
of the largest pieces of silver then coined, would make but a poor
price for such an extraordinary service, even making all allowance
for a scarcity of money in those times. And taking into account the
wealth and rank of those concerned, as well as the importance of the
object, it is fair to pronounce them a very mean set of fellows. But
Judas especially seems to forfeit almost all right to the character
given him of acuteness in money matters; and it is only by supposing
him to be quite carried out of his usual prudence, by his woful
abandonment to crime, that so poor a bargain can be made
consistent with the otherwise reasonable view of his character.

Thirty pieces of silver.――The value of these pieces is seemingly as vaguely expressed
in the original as in the translation; but a reference to Hebrew usages throws some light on
the question of definition. The common Hebrew coin thus expressed was the
shekel,――equivalent to the Greek didrachmon, and worth about sixteen cents. In Hebrew
the expression, thirty “shekels of silver,” was not always written out in full; but the name of
the coin being omitted, the expression was always equally definite, because no other coin
was ever left thus to be implied. Just so in English, the phrase, “a million of money,” is
perfectly well understood here, to mean “a million of dollars;” while in England, the current
coin of that country would make the expression mean so many pounds. In the same
manner, to say, in this country, that any thing or any man is worth “thousands,” always
conveys, with perfect definiteness, the idea of “dollars;” and in every other country the same
expression would imply a particular coin. Thirty pieces of silver, each of which was worth
sixteen cents, would amount only to four dollars and eighty cents, which are just one pound
sterling. A small price for the great Jewish Sanhedrim to pay for the ruin of their most
dangerous foe! Yet for this little sum, the Savior of the world was bought and sold!

Having thus settled this business, the cheaply-purchased traitor
returned to the unsuspecting fellowship of the apostles, mingling with
them, as he supposed, without the slightest suspicion on the part of
any one, respecting the horrible treachery which he had contrived for
the bloody ruin of his Lord. But there was an eye, whose power he
had never learned, though dwelling beneath its gaze for
years,――an eye, which saw the vainly hidden results of his
treachery, even as for years it had scanned the base motives which
governed him. Yet no word of reproach or denunciation broke forth
from the lips of the betrayed One; the progress of crime was suffered
unresistedly to bear him onward to the mournfully necessary
fulfilment of his destiny. Judas meanwhile, from day to day, waited
and watched for the most desirable opportunity of meeting his
engagements with his priestly employers. The first day of the feast of
unleavened bread having arrived, Jesus sat down at evening to eat
the Paschal lamb with his twelve disciples, alone. The whole twelve
were there without one exception,――and among those who
reclined around the table, sharing in the social delights of the
entertainment which celebrated the beginning of the grand national
festival, was the dark-souled accuser also, like Satan among the
sons of God. Even here, amid the general joyous hilarity, his great
scheme of villainy formed the grand theme of his
meditations,――and while the rest were entering fully into the natural
enjoyments of the occasion, he was brooding over the best means of
executing his plans. During the supper, after the performance of the
impressive ceremony of washing their feet, Jesus made a sudden
transition from the comments with which he was illustrating it; and, in
a tone of deep and sorrowful emotion, suddenly exclaimed, “I
solemnly assure you, that one of you will betray me.” This surprising
assertion, so emphatically made, excited the most distressful
sensations among the little assembly;――all enjoyment was at an
end; and grieved by the imputation, in which all seemed included
until the individual was pointed out, they each earnestly inquired,
“Lord, is it I?” As they sat thus looking in the most painful doubt
around their lately cheerful circle, the disciple who held the place of
honor and affection at the table, at the request of Peter, whose
position gave him less advantage for familiar and private
conversation,――plainly asked of Jesus, “Who is it, Lord?” Jesus, to
make his reply as deliberate and impressive as possible, said, “It is
he to whom I shall give a sop when I have dipped it.” The design of
all this circumlocution in pointing out the criminal, was, to mark the
enormity of the offense. “He that eateth bread with me, hath lifted up
his heel against me.” It was his familiar friend, his chosen
companion, enjoying with him at that moment the most intimate
social pleasures of the entertainment, and occupying one of the
places nearest to him, at the board. As he promised, after dipping
the sop, he gave it to Judas Iscariot, who, receiving it, was moved to
no change in his dark purpose; but with a new Satanic spirit,
resolved immediately to execute his plan, in spite of this open
exposure, which, he might think, was meant to shame him from his
baseness. Jesus, with an eye still fixed on his most secret inward
movements, said to him, “What thou doest, do quickly.” Judas, utterly
lost to repentance and to shame, coolly obeyed the direction, as if it
had been an ordinary command, in the way of his official duty, and
went out at the words of Jesus. All this, however, was perfectly
without meaning, to the wondering disciples, who, not yet recovered
from their surprise at the very extraordinary announcement which
they had just heard of the expected treachery, could not suppose
that this quiet movement could have anything to do with the
occurrence which preceded it; but concluded that Judas was going
about the business necessary for the preparation of the next day’s
festal entertainment,――or that he was following the directions of
Jesus about the charity to be administered to the poor out of the
funds in his keeping, in accordance with the commendable Hebrew
usage of remembering the poor on great occasions of
enjoyment,――a custom to which, perhaps, the previous words of
Judas, when he rebuked the waste of the ointment by Mary, had
some especial reference, since at that particular time, money was
actually needed for bestowment in alms to the poor. Judas, after
leaving the place where the declaration of Jesus had made him an
object of such suspicion and dislike, went, under the influence of that
evil spirit, to whose direction he was now abandoned, directly to the
chief priests, (who were anxiously waiting the fulfilment of his
promise,) and made known to them that the time was now come.
The band of watchmen and servants, with their swords and cudgels,
were accordingly mustered and put under the guidance of Judas,
who, well knowing the place to which Jesus would of course go from
the feast, conducted his band of low assistants across the brook
Kedron, to the garden of Gethsemane. On the way he arranged with
them the sign by which they should recognize, in spite of the
darkness and confusion, the person whose capture was the grand
object of this expedition. “The man whom I shall kiss is he: seize
him.” Entering the garden, at length, he led them straight to the spot
which his intimate familiarity with Jesus enabled him to know, as his
favorite retreat. Going up to him with the air of friendly confidence,
he saluted him, as if rejoiced to find him, even after this brief
absence,――another instance of the very close intimacy which had
existed between the traitor and the betrayed. Jesus submitted to this
hollow show, without any attempt to repulse the movement which
marked him for destruction, only saying, in mild but expressive
reproach,――“Judas! Betrayest thou the Son of Man with a kiss?”
Without more delay he announced himself in plain terms, to those
who came to seize him; thus showing how little need there was of
artful contrivance in taking one who did not seek to escape. “If ye
seek Jesus of Nazareth, I am he.” The simple majesty with which
these words were uttered, was such as to overawe even the low
officials; and it was not till he himself had again distinctly reminded
them of their object, that they could execute their errand. So vain
was the arrangement of signals, which had been studiously made by
the careful traitor.

No further mention is made of Iscariot after the scene of his
treachery, until the next morning, when Jesus had been condemned
by the high court of the Sanhedrim, and dragged away to undergo
punishment from the secular power. The sun of another day had
risen on his crime; and after a very brief interval, he now had time for
cool meditation on the nature and consequences of his act. Spite
and avarice had both now received their full gratification. The thirty
pieces of silver were his, and the Master whose instructions he had
hated for their purity and spirituality, because they had made known
to him the vileness of his own character and motives, was now in the
hands of those who were impelled, by the darkest passions, to
secure his destruction. But after all, now came the thought, and
inquiry, ‘what had the pure and holy Jesus done, to deserve this
reward at his hands?’ He had called him from the sordid pursuits of a
common life, to the high task of aiding in the regeneration of Israel.
He had taught him, labored with him, prayed for him, trusted him as
a near and worthy friend, making him the steward of all the earthly
possessions of his apostolic family, and the organ of his ministrations
of charity to the poor. All this he had done without the prospect of a
reward, surely. And why? To make him an instrument, not of the
base purposes of a low ambition;――not to acquire by this means
the sordid and bloody honors of a conqueror,――but to effect the
moral and spiritual emancipation of a people, suffering far less under
the evils of a foreign sway, than under the debasing dominion of folly
and sin. And was this an occasion to arm against him the darker
feelings of his trusted and loved companions?――to turn the
instruments of his mercy into weapons of death? Ought the mere
disappointment of a worldly-minded spirit, that was ever clinging to
the love of material things, and that would not learn the solemn truth
of the spiritual character of the Messiah’s reign, now to cause it to
vent its regrets at its own errors, in a traitorous attack upon the life of
him who had called it to a purpose whose glories and rewards it
could not appreciate? These and other mournful thoughts would
naturally rise to the repentant traitor’s mind, in the awful revulsion of
feeling which that morning brought with it. But repentance is not
atonement; nor can any change of feeling in the mind of the sinner,
after the perpetration of the sinful act, avail anything for the removal
or expiation of the evil consequences of it. So vain and unprofitable,
both to the injurer and the injured, are the tears of remorse! And
herein lay the difference between the repentance of Judas and of
Peter. The sin of Peter affected no one but himself, and was criminal
only as the manifestation of a base, selfish spirit of deceit, that fell
from truth through a vain-glorious confidence,――and the effusion of
his gushing tears might prove the means of washing away the
pollution of such an offense from his soul. But the sin of Judas had
wrought a work of crime whose evil could not be affected by any
tardy change of feeling in him. Peter’s repentance came too late
indeed, to exonerate him from guilt; because all repentance is too
late for such a purpose, when it comes after the commission of the
sin. The repentance of an evil purpose, coming in time to prevent the
execution of the act, is indeed available for good; but both Peter and
Judas came to the sense of the heinousness of sin, only after its
commission. Peter however, had no evil to repair for
others,――while Judas saw the bloody sequel of his guilt, coming
with most irrevocable certainty upon the blameless One whom he
had betrayed. Overwhelmed with vain regrets, he took the now
hateful, though once-desired price of his villainy, and seeking the
presence of his purchasers, held out to them the money, with the
useless confession of the guilt, which was too accordant with their
schemes and hopes, for them to think of redeeming him from its
consequences. The words of his confession were, “I have sinned, in
betraying innocent blood.” This late protestation was received by the
proud priests, with as much regard as might have been expected
from exulting tyranny, when in the enjoyment of the grand object of
its efforts. With a cold sneer they replied, “What is that to us? See
thou to that!” Maddened with the immovable and remorseless
determination of the haughty condemners of the just, he flung down
the price of his infamy and woe, upon the floor of the temple, and
rushed out of their presence, to seal his crimes and eternal misery
by the act that put him for ever beyond the power of redemption.
Seeking a place removed from the observation of men, he hurried
out of the city, and contriving the fatal means of death for himself,
before the bloody doom of him whom he betrayed had been fulfilled,
the wretched man saved his eyes the renewed horrors of the sight of
the crucifixion, by closing them in the sleep which earthly sights can
not disturb. But even in the mode of his death, new circumstances of
horror occurred. Swinging himself into the air, by falling from a
highth, as the cord tightened around his neck, checking his descent,
the weight of his body produced the rupture of his abdomen, and his
bowels bursting through, made him, as he swung stiffening and
convulsed in the agonies of this doubly horrid death, a disgusting
and appalling spectacle,――a monument of the vengeance of God
on the traitor, and a shocking witness of his own remorse and self-
condemnation.

A very striking difference is noticeable between the account given by Matthew of the
death of Judas, and that given by Luke in the speech of Peter, Acts i. 18, 19. The various
modes of reconciling these difficulties are found in the ordinary commentaries. In respect to
a single expression in Acts i. 18, there is an ingenious conjecture offered by Granville Penn,
in a very interesting and learned article in the first volume of the transactions of the Royal
Society of Literature, which may very properly be mentioned here, on account of its
originality and plausibility, and because it is found only in an expensive work, hardly ever
seen in this country. Mr. Penn’s view is, that “the word ελακησε (elakese,) in Acts i. 18, is
only an inflection of the Latin verb, laqueo, (to halter or strangle,) rendered insititious in the
Hellenistic Greek, under the form λακεω.” He enters into a very elaborate argument, which
can not be given here, but an extract may be transcribed, in order to enable the learned to
apprehend the nature and force of his views. (Transactions of R. S. Lit. Vol. I. P. 2, pp. 51,
52.)

“Those who have been in the southern countries of Europe know, that the operation in
question, as exercised on a criminal, is performed with a great length of cord, with which the
criminal is precipitated from a high beam, and is thus violently laqueated, or snared in a
noose, mid-way――medius or in medio; μεσος, and medius, referring to place as well as to
person; as, μεσος ὑμων ἑστηκεν. (John i. 26.) ‘Considit scopulo medius――――’ (Virgil,
Georgics, iv. 436.) ‘―――― medius prorumpit in hostes.’ (Aeneid, x. 379.)

“Erasmus distinctly perceived this sense in the words πρηνης γενομενος, although he did
not discern it in the word ελακησε, which confirms it: ‘πρηνης Graecis dicitur, qui vultu est in
terram dejecto: expressit autem gestum et habitum laqueo praefocati; alioquin, ex hoc
sane loco non poterat intelligi, quod Judas suspenderit se,’ (in loc.) And so Augustine also
had understood those words, as he shows in his Recit. in Act. Apostol. l. i. col. 474. ‘et
collum sibi alligavit, et dejectus in faciem,’ &c. Hence one MS., cited by Sabatier, for πρηνης
γενομενος, reads αποκρεμαμένος; and Jerom, in his new vulgate, has substituted suspensus
for the pronus factus of the old Latin version, which our old English version of 1542
accordingly renders, and when he was hanged.

“That which follows, and which evidently determined the vulgar interpretation of
ελακησε――εξεχυνθη παντα τα σπλαγχνα αυτου, all his bowels gushed out――states a natural
and probable effect produced, by the sudden interruption in the fall and violent capture in
the noose, in a frame of great corpulency and distension, such as Christian antiquity has
recorded that of the traitor to have been; so that a term to express rupture would have been
altogether unnecessary, and it is therefore equally unnecessary to seek for it in the verb
ελακησε. Had the historian intended to express disruption, we may justly presume that he
would have said, as he had already said in his gospel, v. 6, διερρηγνυτο, or xxiii. 45, εσχισθη
μεσος: it is difficult to conceive, that he would here have traveled into the language of
ancient Greek poetry for a word to express a common idea, when he had common terms at
hand and in practice; but he used the Roman laqueo, λακεω, to mark the infamy of the
death.

“(Πρησθεις επι τοσουτον την σαρκα, ὡστε μη δυνασθαι διελθειν. Papias, from Routh's
Reliquiæ Sacræ tom. I. p. 9. and Oecumenius, thus rendered by Zegers, Critici Sacri, Acts i.
18, in tantum enim corpore inflatus est ut progredi non posset. The tale transmitted by those
writers of the first and tenth centuries, that Judas was crushed to death by a chariot
proceeding rapidly, from which his unwieldiness rendered him unable to escape, merits no
further attention, after the authenticated descriptions of the traitor’s death which we have
here investigated, than to suggest a possibility that the place where the suicide was
committed might have overhung a public way, and that the body falling by its weight might
have been traversed, after death, by a passing chariot;――from whence might have arisen
the tales transmitted successively by those writers; the first of whom, being an inhabitant of
Asia Minor, and therefore far removed from the theater of Jerusalem, and being also (as
Eusebius witnesses, iii. 39,) a man of a very weak mind――σφοδρα μικρος τον νουν――was
liable to be deceived by false accounts.)

“The words of St. Peter, in the Hellenistic version of St. Luke, will therefore import,
praeceps in ora fusus, laqueavit (i. e. implicuit se laqueo) medius; (i. e. in medio, inter
trabem et terram;) et effusa sunt omnia viscera ejus――throwing himself headlong, he
caught mid-way in the noose, and all his bowels gushed out. And thus the two reporters of
the suicide, from whose respective relations charges of disagreement, and even of
contradiction, have been drawn in consequence of mistaking an insititious Latin word for a
genuine Greek word of corresponding elements, are found, by tracing that insititious word to
its true origin, to report identically the same fact; the one by a single term, the other by a
periphrasis.”

Such was the end of the twelfth of Jesus Christ’s chosen ones. To
such an end was the intimate friend, the trusted steward, the festal
companion of the Savior, brought by the impulse of some not very
unnatural feelings, excited by occasion, into extraordinary action.
The universal and intense horror which the relation of his crime now
invariably awakens, is by no means favorable to a just and fair
appreciation of his sin and its motives, nor to such an honest
consideration of his course from rectitude to guilt, as is most
desirable for the application of the whole story to the moral
improvement of its readers. Originally not an infamous man, he was
numbered among the twelve as a person of respectable character,
and long held among his fellow-disciples a responsible station, which
is itself a testimony of his unblemished reputation. He was sent forth
with them, as one of the heralds of salvation to the lost sheep of the
house of Israel. He shared with them the counsels, the instructions,
and the prayers of Jesus. If he was stupid in apprehending, and
unspiritual in conceiving the truths of the gospel, so were they. If he
was an unbeliever in the resurrection of Jesus, so were they; and
had he survived till the accomplishment of that prophecy, he could
not have been slower in receiving the evidence of the event, than
they. As it was, he died in his unbelief; while they lived to feel the
glorious removal of all their doubts, the purification of all their gross
conceptions, and the effusion of that spirit of truth, through which, by
the grace of God alone, they afterwards were what they were.
Without a merit, in faith, beyond Judas, they maintained their dim
and doubtful adherence to the truth, only by their nearer
approximation to moral perfection; and by their nobler freedom from
the pollution of sordid and spiteful feeling. Through passion alone he
fell, a victim, not to a want of faith merely,――for therein, the rest
could hardly claim a superiority,――but to the radical deficiency of
true love for Jesus, of that “charity which never faileth,” but “endureth
to the end.” It was their simple, devoted affection, which, through all
their ignorance, their grossness of conception, and their
faithlessness in his word, made them still cling to his name and his
grave, till the full revelations of his resurrection and ascension had
displaced their doubts by the most glorious certainties, and given
their faith an eternal assurance. The great cause of the awful ruin of
Judas Iscariot, then, was the fact, that he did not love Jesus. Herein
was his grand distinction from all the rest; for though their regard
was mingled with so much that was base, there was plainly, in all of
them, a solid foundation of true, deep affection. The most ambitious
and skeptical of them, gave the most unquestionable proofs of this.
Peter, John, both the Jameses, and others, are instances of the
mode in which these seemingly opposite feelings were combined.
But Judas was without this great refining and elevating principle,
which so redeemed the most sordid feelings of his fellows. It was not
merely for the love of money that he was led into this horrid crime.
The love of four dollars and eighty cents! Who can believe that this
was the sole motive? It was rather that his sordidness and
selfishness, and ambition, if he had any, lacked this single, purifying
emotion, which redeemed their characters. Is there not, in this
reflection, a moral which each Christian reader can improve to his
own use? For the lack of the love of Jesus alone, Judas fell from his
high estate, to an infamy as immortal as their fame. Wherever,
through all ages, the high heroic energy of Peter, the ready faith of
Andrew, the martyr-fire of James Boanerges, the soul-absorbing love
of John, the willing obedience of Philip, the guileless purity of
Nathanael, the recorded truth of Matthew, the slow but deep
devotion of Thomas, the blameless righteousness of James the Just,
the appellative zeal of Simon, and the earnest warning eloquence of
Jude, are all commemorated in honor and bright renown,――the
murderous, sordid spite of Iscariot, will insure him an equally lasting
proverbial shame. Truly, “the sin of Judas is written with a pen
of iron on a tablet of marble.”
MATTHIAS.
The events which concern this person’s connection with the
apostolic company, are briefly these. Soon after the ascension of
Jesus, the eleven disciples being assembled in their “upper room,”
with a large company of believers, making in all, together, a meeting
of one hundred and twenty, Peter arose and presented to their
consideration, the propriety and importance of filling, in the apostolic
college, the vacancy caused by the sad defection of Judas Iscariot.
Beginning with what seems to be an apt allusion to the words of
David concerning Ahithophel,――(a quotation very naturally
suggested by the striking similarity between the fate of that ancient
traitor, and that of the base Iscariot,) he referred to the peculiarly
horrid circumstances of the death of this revolted apostle, and also
applied to these occurrences the words of the same Psalmist
concerning those upon whom he invoked the wrath of God, in words
which might with remarkable emphasis be made descriptive of the
ruin of Judas. “Let his habitation be desolate,” and “let another take
his office.” Applying this last quotation more particularly to the
exigency of their circumstances, he pronounced it to be in
accordance with the will of God that they should immediately
proceed to select a person to “take the office” of Judas. He declared
it an essential requisite for this office, moreover, that the person
should be one of those who, though not numbered with the select
twelve, had been among the intimate companions of Jesus, and had
enjoyed the honors and privileges of a familiar discipleship, so that
they could always testify of his great miracles and divine instructions,
from their own personal knowledge as eye-witnesses of his actions,
from the beginning of his divine career at his baptism by John, to the
time of his ascension.
Agreeably to this counsel of the apostolic chief, the whole
company of the disciples selected two persons from those who had
been witnesses of the great actions of Christ, and nominated them to
the apostles, as equally well qualified for the vacant office. To decide
the question with perfect impartiality, it was resolved, in conformity
with the common ancient practice in such cases, to leave the point
between these two candidates to be settled by lot; and to give this
mode of decision a solemnity proportioned to the importance of the
occasion, they first invoked, in prayer, the aid of God in the
appointment of a person best qualified for his service. They then
drew the lots of the two candidates, and Matthias being thus
selected, was thenceforth enrolled with the eleven apostles.

Of his previous history nothing whatever is known, except that,
according to what is implied in the address of Peter, he must have
been, from the beginning of Christ’s career to his ascension, one of
his constant attendants and hearers. Some have conjectured that he
was one of the seventy, sent forth by Jesus as apostles, in the same
manner as the twelve had gone; and there is nothing unreasonable
in the supposition; but still it is a conjecture merely, without any fact
to support it. The New Testament is perfectly silent with respect to
both his previous and his subsequent life, and not a fact can be
recorded respecting him. Yet the productive imaginations of the
martyrologists of the Roman and Greek churches, have carried him
through a protracted series of adventures, during his alleged
preaching of the gospel, first in Judea, and then in Ethiopia. They
also pretend that he was martyred, though as to the precise mode
there is some difference in the stories,――some relating that he was
crucified, and others, that he was first stoned and then dispatched by
a blow on the head with an axe. But all these are condemned by the
discreet writers even of the Romish church, and the whole life of
Matthias must be included among those many mysteries which can
never be in any way brought to light by the most devoted and
untiring researches of the Apostolic historian; and this dim and
unsatisfactory trace of his life may well conclude the first grand
division of a work, in which the reader will expect to find so much
curious detail of matters commonly unknown, but which no research
nor learning can furnish, for the prevention of his disappointment.

II. THE HELLENIST APOSTLES.

SAUL,
AFTERWARDS NAMED PAUL.
HIS COUNTRY.

On the farthest north-eastern part of the Mediterranean sea,
where its waters are bounded by the great angle made by the
meeting of the Syrian coast with the Asian, there is a peculiarity in
the course of the mountain ranges, which deserves notice in a view
of the countries of that region, modifying as it does, all their most
prominent characteristics. The great chain of Taurus, which can be
traced far eastward in the branching ranges of Singara, Masius and
Niphates, running connectedly also into the distant peaks of mighty
Ararat, here sends off a spur to the shore of the Mediterranean,
which under the name of Mount Amanus meets its waters, just at
their great north-eastern angle in the ancient gulf of Issus, now
called the gulf of Scanderoon. Besides this connection with the
mountain chains of Mesopotamia and Armenia on the northeast,
from the south the great Syrian Lebanon, running very nearly parallel
with the eastern shore of the Mediterranean, at the Issic angle, joins
this common center of convergence, so insensibly losing its
individual character in the Asian ridge, that by many writers, Mount
Amanus itself is considered only a regular continuation of Lebanon.
These, however, are as distinct as any of the chains here uniting,
and the true Libanic mountains cease just at this grand natural
division of Syria from the northern coast of the Mediterranean. A
characteristic of the Syrian mountains is nevertheless prominent in
the northern chain. They all take a general course parallel with the
coast and very near it, occasionally sending out lateral ridges which
mark the projections of the shore with high promontories. Of these,
however, there are much fewer on the southern coast of Asia Minor;
and the western ridge of Taurus, after parting from the grand angle of
convergence, runs exactly parallel to the margin of the sea, in most
parts about seven miles distant. The country thus fenced off by
Taurus, along the southern coast of Asia Minor, is very distinctly
characterized by these circumstances connected with its orography,
and is in a very peculiar manner bounded and inclosed from the rest
of the continent, by these natural features. The great mountain
barrier of Taurus, as above described, stretches along the north,
forming a mighty wall, which is at each end met at right angles by a
lateral ridge, of which the eastern is Amanus, descending within a
few rods of the water, while the western is the true termination of
Taurus in that direction,――the mountains here making a grand
curve from west to south, and stretching out into the sea, in a bold
promontory, which definitely marks the farthest western limit of the
long, narrow section, thus remarkably enclosed. This simple natural
division, in the apostolic age, contained two principal artificial sub-
divisions. On the west, was the province of Pamphylia, occupying
about one fourth of the coast;――and on the east, the rest of the
territory constituted the province of Cilicia, far-famed as the land of
the birth of that great apostle of the Gentiles, whose life is the theme
of these pages.

Cilicia,――opening on the west into Pamphylia,――is elsewhere
inclosed in mountain barriers, impenetrable and impassable, except
in two or three points, which are the only places in which it is
accessible by land, though widely exposed, on the sea, by its long
open coast. Of these two adits, the most important, and the one
through which the vast proportion of its commercial intercourse with
the world, by land, has always been carried on, is the eastern, which
is just at the oft-mentioned great angle of the Mediterranean, where
the mountains descend almost to the waters of the gulf of Issus.
Mount Amanus, coming from the north-east, and stretching along the
eastern boundary of Cilicia an impassable barrier, here advances to
the shore; but just before its base reaches the water, it abruptly
terminates, leaving between the high rocks and the sea a narrow
space, which is capable of being completely commanded and
defended from the mountains which thus guard it; and forming the
only land passage out of Cilicia to the eastern coast of the
Mediterranean, it was thence anciently called “the gates of Syria.”
Through these “gates,” has always passed all the traveling by land
between Asia Minor and Palestine; and it is therefore an important
point in the most celebrated route in apostolic history. The other
main opening in the mountain walls of this region, is the passage
through the Taurus, made by the course of the Sarus, the largest
river of the province, which breaks through the northern ridge, in a
defile that is called “the gates of Cilicia.”

The boundaries of Cilicia are then,――on the north, mountainous
Cappadocia, perfectly cut off by the impenetrable chain of Taurus,
except the narrow pass through “the gates of Cilicia;”――on the
east, equally well guarded by Mount Amanus, Northern Syria, the
only land passages being through the famed “Syrian gates,” and
another defile north of the coast, toward the Euphrates;――on the
south, stretches the long margin of the sea, which in the western
two-thirds of the coast takes the name of “the Cilician strait,”
because it here flows between the mainland and the great island of
Cyprus, which lies off the shore, always in sight, being less than
thirty miles distant, the eastern third of the coast being bounded by
the waters of the gulf of Issus;――and on the west Cilicia ends in the
rough highlands of Pamphylia. The territory itself is distinguished by
natural features, into two divisions,――Rocky Cilicia and “Level
Cilicia,”――the former occupying the western third, and the latter the
eastern part,――each district being abundantly well described by the
term applied to it. Within the latter, lay the opening scenes of the
apostle’s life.

Thus peculiarly guarded, and shut off from the world, it might be
expected that this remarkable region would nourish, on the narrow
plains of its fertile shores, and the vast rough mountains of its
gigantic barriers, a race strongly marked in mental, as in physical
characteristics. In all parts of the world, the philosophical observer
may notice a relation borne by man to the soil on which he lives, and
to the air which he breathes,――hardly less striking than the
dependence of the inferior orders of created things, on the material
objects which surround them. Man is an animal, and his natural
history displays as many curious correspondences between his
varying peculiarities and the locality which he inhabits, as can be
observed between the physical constitution of inferior creatures, and
the similar circumstances which affect them. The inhabitants of a
wild, broken region, which rises into mighty inland mountains, or
sends its cliffs and valleys into a vast sea, are, in all ages and climes,
characterized by a peculiar energy and quickness of mind, which
often marks them in history as the prominent actors in events of the
highest importance to mankind in all the world. Even the dwellers of
the cities of such regions, share in that peculiar vivacity of their
countrymen, which is especially imbibed in the air of the mountains;
and carry through all the world, till new local influences have again
subjected them, the original characteristics of the land of their birth.
The restless activity and dauntless spirit of Saul, present a striking
instance of this relation of scenery to character. The ever-rolling
waters of the tideless sea on one side presenting a boundless view,
and on the other the blue mountains rearing a mighty barrier to the
vision,――the thousand streams thence rolling to the former,――the
white sands of the long plains, gemmed with the green of shaded
fountains, as well as the active movements of a busy population, all
living under these same inspiring influences,――would each have
their effect on the soul of the young Cilician as he grew up in the
midst of these modifying circumstances.

Along these shores, from the earliest period of Hellenic
colonization, Grecian enterprise had planted its busy centers of
civilization. On each favorable site, where agriculture or commerce
could thrive, cities grew up in the midst of prosperous colonies, in
which wealth and power in their rapid advance brought in the lights
of science, art, literature, and all the refinements and elegances
which Grecian colonization made the invariable accompaniments of
its march,――adorning its solid triumphs with the graceful polish of
all that could exalt the enjoyment of prosperity. Issus, Mopsuestia,
Anchialus, Selinus and others, were among the early seats of
Grecian refinement; and the more modern efforts of the Syro-
Macedonian sway, had blessed Cilicia with the fruits of royal
munificence, in such cities as Cragic Antioch, Seleucia the Rocky,
and Arsinoe; and in still later times, the ever-active and wide-
spreading beneficence of Roman dominion, had still farther
multiplied the peaceful triumphs and trophies of civilization, by here
raising or renewing cities, of which Baiae, Germanicia and
Pompeiopolis are only a specimen. But of all these monuments of
ancient or later refinement, there was none of higher antiquity or
fame than Tarsus, the city where was born this illustrious apostle,
whose life was so greatly instrumental in the triumphs of Christianity.

Tarsus stands north of the point of a wide indentation of the coast
of Cilicia, forming a very open bay, into which, a few miles south,
flow the waters of the classic Cydnus, a narrow stream which runs a
brief course from the barrier of Taurus, directly southward to the sea.
The river’s mouth forms a spacious and convenient harbor, to which
the light vessels of ancient commerce all easily found safe and ready
access, though most of the floating piles in which the productions of
the world are now transported, might find such a harbor altogether
inaccessible to their heavier burden.

Ammianus Marcellinus, the elegant historian of the decline of the
Roman empire, speaks in high descriptive terms, both of the
province, and the city which makes it eminent in Christian history. In
narrating important events here performed during the times whose
history he records, he alludes to the character of the region in a
preliminary description. “After surmounting the peaks of Taurus,
which towards the east rise into higher elevation, Cilicia spreads out
before the observer, in far stretching areas,――a land, rich in all
good things. To its right (that is the west, as the observer looks south
from the summits of Taurus) is joined Isauria,――in equal degree
verdant with palms and many fruits, and intersected by the navigable
river Calycadnus. This, besides many towns, has two
cities,――Seleucia, the work of Seleucus Nicator of Syria, and
Claudiopolis, a colony founded by Claudius Caesar. Isauria however,
once exceedingly powerful, has formerly been desolated for a
destructive rebellion, and therefore shows but very few traces of its
ancient splendor. But Cilicia, which rejoices in the river Cydnus, is
ennobled by Tarsus, a splendid city,――by Anazarbus, and by
Mopsuestia, the dwelling-place of that Mopsus, who accompanied
the Argonauts. These two provinces (Isauria or ‘Cilicia the Rocky,’
and Cilicia proper or ‘level’) being formerly connected with hordes of
plunderers in a piratical war, were subjugated by the proconsul
Servilius, and made tributary. And these regions, placed, as it were,
on a long tongue of land, are separated from the eastern world by
Mount Amanus.”

This account by Ammianus Marcellinus is found in book XIV. of his history, (p. 19, edited
by Vales.)

The native land of Saul was classic ground. Within the limits of
Cilicia, were laid the scenes of some of the most splendid passages
in early Grecian fable; and here too, were acted some of the
grandest events in authentic history, both Greek and Roman. The
very city of his birth, Tarsus, is said to have been founded by
Perseus, the son of Jupiter and Danae, famed for his exploit at
another place on the shore of this part of the Mediterranean. More
authentic history however, refers its earliest foundation to
Sardanapalus, king of Assyria, who built Tarsus and Anchialus in
Cilicia, nine hundred years before Christ. Its origin is by others
ascribed to Triptolemus with an Argive colony, who is represented on
some medals as the founder. These two stories may be made
consistent with each other, on the supposition that the same place
was successively the scene of the civilizing influence of each of
