You are on page 1of 53

Learning Data Mining with Python

Robert Layton
Visit to download the full and correct content document:
https://textbookfull.com/product/learning-data-mining-with-python-robert-layton/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...

Learning Data Mining with Python Layton

https://textbookfull.com/product/learning-data-mining-with-
python-layton/

Practical Python Data Visualization: A Fast Track


Approach To Learning Data Visualization With Python
Ashwin Pajankar

https://textbookfull.com/product/practical-python-data-
visualization-a-fast-track-approach-to-learning-data-
visualization-with-python-ashwin-pajankar/

Advanced Data Analytics Using Python: With Machine


Learning, Deep Learning and NLP Examples Mukhopadhyay

https://textbookfull.com/product/advanced-data-analytics-using-
python-with-machine-learning-deep-learning-and-nlp-examples-
mukhopadhyay/

Introduction to Machine Learning with Python A Guide


for Data Scientists Andreas C. Müller

https://textbookfull.com/product/introduction-to-machine-
learning-with-python-a-guide-for-data-scientists-andreas-c-
muller/
Machine Learning Pocket Reference Working with
Structured Data in Python 1st Edition Matt Harrison

https://textbookfull.com/product/machine-learning-pocket-
reference-working-with-structured-data-in-python-1st-edition-
matt-harrison/

Applied Text Analysis with Python Enabling Language


Aware Data Products with Machine Learning 1st Edition
Benjamin Bengfort

https://textbookfull.com/product/applied-text-analysis-with-
python-enabling-language-aware-data-products-with-machine-
learning-1st-edition-benjamin-bengfort/

Handbook of Statistical Analysis and Data Mining


Applications 2nd Edition Robert Nisbet

https://textbookfull.com/product/handbook-of-statistical-
analysis-and-data-mining-applications-2nd-edition-robert-nisbet/

Hands-on Scikit-Learn for machine learning


applications: data science fundamentals with Python
David Paper

https://textbookfull.com/product/hands-on-scikit-learn-for-
machine-learning-applications-data-science-fundamentals-with-
python-david-paper/

Encyclopedia of Machine Learning and Data Mining 2nd


Edition Claude Sammut

https://textbookfull.com/product/encyclopedia-of-machine-
learning-and-data-mining-2nd-edition-claude-sammut/
Learning Data Mining
with Python
Second Edition

6TF1ZUIPOUPNBOJQVMBUFEBUB
BOECVJMEQSFEJDUJWFNPEFMT

Robert Layton

BIRMINGHAM - MUMBAI
Learning Data Mining with Python
Second Edition
Copyright © 2017 Packt Publishing

First published: July 2015

Second edition: April 2017

Production reference: 1250417

1VCMJTIFECZ1BDLU1VCMJTIJOH-UE
-JWFSZ1MBDF
-JWFSZ4USFFU
#JSNJOHIBN
#1#6,

ISBN 978-1-78712-678-7
XXXQBDLUQVCDPN
Contents
Preface 1

Chapter 1: Getting Started with Data Mining 7


Introducing data mining 7
Using Python and the Jupyter Notebook 9
Installing Python 10
Installing Jupyter Notebook 11
Installing scikit-learn 13
A simple affinity analysis example 14
What is affinity analysis? 14
Product recommendations 14
Loading the dataset with NumPy 15
Downloading the example code 17
Implementing a simple ranking of rules 18
Ranking to find the best rules 21
A simple classification example 23
What is classification? 24
Loading and preparing the dataset 24
Implementing the OneR algorithm 26
Testing the algorithm 28
Summary 31
Chapter 2: Classifying with scikit-learn Estimators 32
scikit-learn estimators 32
Nearest neighbors 33
Distance metrics 34
Loading the dataset 37
Moving towards a standard workflow 39
Running the algorithm 40
Setting parameters 41
Preprocessing 43
Standard pre-processing 45
Putting it all together 46
Pipelines 46
Summary 48
Chapter 3: Predicting Sports Winners with Decision Trees 49
Loading the dataset 49
Collecting the data 50
Using pandas to load the dataset 51
Cleaning up the dataset 52
Extracting new features 54
Decision trees 56
Parameters in decision trees 57
Using decision trees 58
Sports outcome prediction 59
Putting it all together 59
Random forests 63
How do ensembles work? 64
Setting parameters in Random Forests 65
Applying random forests 65
Engineering new features 67
Summary 68
Chapter 4: Recommending Movies Using Affinity Analysis 69
Affinity analysis 70
Algorithms for affinity analysis 71
Overall methodology 72
Dealing with the movie recommendation problem 72
Obtaining the dataset 73
Loading with pandas 73
Sparse data formats 74
Understanding the Apriori algorithm and its implementation 75
Looking into the basics of the Apriori algorithm 77
Implementing the Apriori algorithm 78
Extracting association rules 81
Evaluating the association rules 84
Summary 87
Chapter 5: Features and scikit-learn Transformers 88
Feature extraction 89
Representing reality in models 89
Common feature patterns 92
Creating good features 96
Feature selection 97
Selecting the best individual features 99
Feature creation 102
Principal Component Analysis 105
Creating your own transformer 108
The transformer API 108
Implementing a Transformer 109
Unit testing 110
Putting it all together 111
Summary 112
Chapter 6: Social Media Insight using Naive Bayes 113
Disambiguation 114
Downloading data from a social network 115
Loading and classifying the dataset 117
Creating a replicable dataset from Twitter 121
Text transformers 125
Bag-of-words models 125
n-gram features 127
Other text features 128
Naive Bayes 129
Understanding Bayes' theorem 129
Naive Bayes algorithm 130
How it works 131
Applying of Naive Bayes 133
Extracting word counts 133
Converting dictionaries to a matrix 134
Putting it all together 135
Evaluation using the F1-score 136
Getting useful features from models 137
Summary 140
Chapter 7: Follow Recommendations Using Graph Mining 141
Loading the dataset 141
Classifying with an existing model 143
Getting follower information from Twitter 146
Building the network 148
Creating a graph 151
Creating a similarity graph 153
Finding subgraphs 157
Connected components 157
Optimizing criteria 161
Summary 164
Chapter 8: Beating CAPTCHAs with Neural Networks 166
Artificial neural networks 167
An introduction to neural networks 168
Creating the dataset 170
Drawing basic CAPTCHAs 171
Splitting the image into individual letters 174
Creating a training dataset 177
Training and classifying 179
Back-propagation 182
Predicting words 183
Improving accuracy using a dictionary 188
Ranking mechanisms for word similarity 188
Putting it all together 189
Summary 190
Chapter 9: Authorship Attribution 192
Attributing documents to authors 193
Applications and use cases 194
Authorship attribution 195
Getting the data 197
Using function words 200
Counting function words 201
Classifying with function words 204
Support Vector Machines 205
Classifying with SVMs 206
Kernels 207
Character n-grams 207
Extracting character n-grams 208
The Enron dataset 209
Accessing the Enron dataset 210
Creating a dataset loader 210
Putting it all together 213
Evaluation 214
Summary 216
Chapter 10: Clustering News Articles 218
Trending topic discovery 219
Using a web API to get data 219
Reddit as a data source 222
Getting the data 223
Extracting text from arbitrary websites 226
Finding the stories in arbitrary websites 226
Extracting the content 228
Grouping news articles 230
The k-means algorithm 231
Evaluating the results 234
Extracting topic information from clusters 237
Using clustering algorithms as transformers 238
Clustering ensembles 239
Evidence accumulation 239
How it works 243
Implementation 245
Online learning 246
Implementation 247
Summary 250
Chapter 11: Object Detection in Images using Deep Neural Networks 251
Object classification 252
Use cases 252
Application scenario 254
Deep neural networks 257
Intuition 257
Implementing deep neural networks 259
An Introduction to TensorFlow 260
Using Keras 264
Convolutional Neural Networks 269
GPU optimization 271
When to use GPUs for computation 272
Running our code on a GPU 273
Setting up the environment 274
Application 275
Getting the data 276
Creating the neural network 277
Putting it all together 279
Summary 280
Chapter 12: Working with Big Data 282
Big data 283
Applications of big data 284
MapReduce 286
The intuition behind MapReduce 288
A word count example 290
Hadoop MapReduce 292
Applying MapReduce 293
Getting the data 293
Naive Bayes prediction 295
The mrjob package 295
Extracting the blog posts 296
Training Naive Bayes 298
Putting it all together 302
Training on Amazon's EMR infrastructure 307
Summary 311
$SSHQGL[: Next Steps... 312
Getting Started with Data Mining 312
Scikit-learn tutorials 312
Extending the Jupyter Notebook 313
More datasets 313
Other Evaluation Metrics 313
More application ideas 313
Classifying with scikit-learn Estimators 314
Scalability with the nearest neighbor 314
More complex pipelines 314
Comparing classifiers 315
Automated Learning 315
Predicting Sports Winners with Decision Trees 316
More complex features 316
Dask 317
Research 317
Recommending Movies Using Affinity Analysis 317
New datasets 317
The Eclat algorithm 318
Collaborative Filtering 318
Extracting Features with Transformers 318
Adding noise 318
Vowpal Wabbit 319
word2vec 319
Social Media Insight Using Naive Bayes 319
Spam detection 319
Natural language processing and part-of-speech tagging 320
Discovering Accounts to Follow Using Graph Mining 320
More complex algorithms 320
NetworkX 320
Beating CAPTCHAs with Neural Networks 321
Better (worse?) CAPTCHAs 321
Deeper networks 321
Reinforcement learning 322
Authorship Attribution 322
Increasing the sample size 322
Blogs dataset 322
Local n-grams 322
Clustering News Articles 323
Clustering Evaluation 323
Temporal analysis 323
Real-time clusterings 324
Classifying Objects in Images Using Deep Learning 324
Mahotas 324
Magenta 325
Working with Big Data 325
Courses on Hadoop 325
Pydoop 325
Recommendation engine 325
W.I.L.L 326
More resources 326
Kaggle competitions 326
Coursera 326

Index 327
Preface
The second revision of Learning Data Mining with Python was written with the programmer
in mind. It aims to introduce data mining to a wide range of programmers, as I feel that this
is critically important to all those in the computer science field. Data mining is quickly
becoming the building block of the next generation of Artificial Intelligence systems. Even if
you don't find yourself building these systems, you will be using them, interfacing with
them, and being guided by them. Understand the process behind them is important and
helps you get the best out of them.
The second revision builds upon the first. Many of chapters and exercises are similar,
although new concepts are introduced and exercises are expanded in scope. Those that had
read the first revision should be able to move quickly through the book and pick up new
knowledge along the way and engage with the extra activities proposed. Those new to the
book are encouraged to take their time, do the exercises and experiment. Feel free to break
the code to understand it, and reach out if you have any questions.
As this is a book aimed at programmers, we assume that you have some knowledge of
programming and of Python itself. For this reason, there is little explanation of what
the Python code itself is doing, except in cases where it is ambiguous.

What this book covers


$IBQUFS, Getting started with data mining, introduces the technologies we will be using,
along with implementing two basic algorithms to get started.

$IBQUFS, Classifying with scikit-learn, covers classification, a key form of data mining.
You’ll also learn about some structures for making your data mining experimentation easier
to perform..

$IBQUFS, Predicting Sports Winners with Decisions Trees, introduces two new algorithms,
Decision Trees and Random Forests, and uses it to predict sports winners by creating useful
features..

$IBQUFS, Recommending Movies using Affinity Analysis, looks at the problem of


recommending products based on past experience, and introduces the Apriori algorithm.

$IBQUFS, Features and scikit-learn Transformers, introduces more types of features you can
create, and how to work with different datasets.
$IBQUFS, Social Media Insight using Naive Bayes, uses the Naïve Bayes algorithm to
automatically parse text-based information from the social media website Twitter.

$IBQUFS, Follow Recommendations Using Graph Mining, applies cluster analysis and
network analysis to find good people to follow on social media.

$IBQUFS, Beating CAPTCHAs with Neural Networks, looks at extracting information from
images, and then training neural networks to find words and letters in those images.

$IBQUFS, Authorship attribution, looks at determining who wrote a given documents, by


extracting text-based features and using Support Vector Machines.

$IBQUFS, Clustering news articles, uses the k-means clustering algorithm to group
together news articles based on their content.

$IBQUFS, Object Detection in Images using Deep Neural Networks, determines what type of
object is being shown in an image, by applying deep neural networks.

$IBQUFS, Working with Big Data, looks at workflows for applying algorithms to big data
and how to get insight from it.

"QQFOEJY, Next step, goes through each chapter, giving hints on where to go next for a
deeper understanding of the concepts introduced.

What you need for this book


It should come as no surprise that you’ll need a computer, or access to one, to complete the
book. The computer should be reasonably modern, but it doesn’t need to be overpowered.
Any modern processor (from about 2010 onwards) and 4 gigabytes of RAM will suffice, and
you can probably run almost all of the code on a slower system too.

The exception here is with the final two chapters. In these chapters, I step through using
Amazon’s web services (AWS) for running the code. This will probably cost you some
money, but the advantage is less system setup than running the code locally. If you don’t
want to pay for those services, the tools used can all be set-up on a local computer, but you
will definitely need a modern system to run it. A processor built in at least 2012, and more
than 4 GB of RAM are necessary.

I recommend the Ubuntu operating system, but the code should work well on Windows,
Macs, or any other Linux variant. You may need to consult the documentation for your
system to get some things installed though.

[2]
In this book, I use pip for installing code, which is a command line tool for installing Python
libraries. Another option is to use Anaconda, which can be found online here: IUUQDPOU
      

JOVVNJPEPXOMPBET
               

I also have tested all code using Python 3. Most of the code examples work on Python 2
with no changes. If you run into any problems, and can’t get around it, send an email and
we can offer a solution.

Who this book is for


This book is for programmers that want to get started in data mining in an application-
focused manner.

If you haven’t programmed before, I strongly recommend that you learn at least the basics
before you get started. This book doesn’t introduce programming, nor does it give too much
time to explaining the actual implementation (in-code) of how to type out the instructions.
That said, once you go through the basics, you should be able to come back to this book
fairly quickly – there is no need to be an expert programmer first!

I highly recommend that you have some Python programming experience. If you don’t, feel
free to jump in, but you might want to take a look at some Python code first, possibly
focused on tutorials using the IPython notebook. Writing programs in the IPython notebook
works a little differently than other methods, such as writing a Java program in a fully-
fledged IDE.

Conventions
In this book, you will find a number of text styles that distinguish between different kinds
of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions,
pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next
lines of code read the link and assign it to the to the EBUBTFU@GJMFOBNF function."

A block of code is set as follows:


JNQPSUOVNQZBTOQ
EBUBTFU@GJMFOBNFBGGJOJUZ@EBUBTFUUYU
9OQMPBEUYU EBUBTFU@GJMFOBNF

[3]
Any command-line input or output is written as follows:
$ conda install scikit-learn

New terms and important words are shown in bold. Words that you see on the screen, for
example, in menus or dialog boxes, appear in the text like this: "In order to download new
modules, we will go to Files | Settings | Project Name | Project Interpreter."

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.

Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this
book-what you liked or disliked. Reader feedback is important for us as it helps us develop
titles that you will really get the most out of.
To send us general feedback, simply e-mail GFFECBDL!QBDLUQVCDPN, and mention the
book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide at XXXQBDLUQVCDPNBVUIPST.

Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you
to get the most from your purchase.

Downloading the example code


You can download the example code files for this book from your account at IUUQXXXQ        

BDLUQVCDPN. If you purchased this book elsewhere, you can visit IUUQXXXQBDLUQVCD
                           

PNTVQQPSUand register to have the files e-mailed directly to you.


        

[4]
You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the
latest version of:

WinRAR / 7-Zip for Windows


Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at IUUQTHJUIVCDPN1BDLU1VCM                       

JTIJOH-FBSOJOH%BUB.JOJOHXJUI1ZUIPO4FDPOE&EJUJPO. The benefit of the github


                                                    

repository is that any issues with the code, including problems relating to software version
changes, will be kept track of and the code there will include changes from readers around
the world. We also have other code bundles from our rich catalog of books and videos
available at IUUQTHJUIVCDPN1BDLU1VCMJTIJOH. Check them out!
                             

To avoid indention issues please use the code bundle to run the codes in the IDE instead of copying
directly from the PDF.

Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do
happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-
we would be grateful if you could report this to us. By doing so, you can save other readers
from frustration and help us improve subsequent versions of this book. If you find any
errata, please report them by visiting IUUQXXXQBDLUQVCDPNTVCNJUFSSBUB, selecting                                 

your book, clicking on the Errata Submission Form link, and entering the details of your
errata. Once your errata are verified, your submission will be accepted and the errata will
be uploaded to our website or added to any list of existing errata under the Errata section of
that title.

[5]
Getting Started with Data
1
Mining
We are collecting information about our world on a scale that has never been seen before in
the history of humanity. Along with this trend, we are now placing more day-to-day
importance on the use of this information in everyday life. We now expect our computers to
translate web pages into other languages, predict the weather with high accuracy, suggest
books we would like, and to diagnose our health issues. These expectations will grow into
the future, both in application breadth and efficacy. Data Mining is a methodology that we
can employ to train computers to make decisions with data and forms the backbone of
many high-tech systems of today.

The Python programming language is fast growing in popularity, for a good reason. It
gives the programmer flexibility, it has many modules to perform different tasks, and
Python code is usually more readable and concise than in any other languages. There is a
large and an active community of researchers, practitioners, and beginners using Python for
data mining.

In this chapter, we will introduce data mining with Python. We will cover the following
topics

What is data mining and where can we use it?


Setting up a Python-based environment to perform data mining
An example of affinity analysis, recommending products based on purchasing
habits
An example of (a classic) classification problem, predicting the plant species
based on its measurement
Getting Started with Data Mining

Introducing data mining


Data mining provides a way for a computer to learn how to make decisions with data. This
decision could be predicting tomorrow's weather, blocking a spam email from entering
your inbox, detecting the language of a website, or finding a new romance on a dating site.
There are many different applications of data mining, with new applications being
discovered all the time.

Data mining is part algorithm design, statistics, engineering, optimization,


and computer science. However, combined with these base skills in the
area, we also need to apply domain knowledge (expert knowledge)of the
area we are applying the data mining. Domain knowledge is critical for
going from good results to great results. Applying data mining effectively
usually requires this domain-specific knowledge to be integrated with the
algorithms.

Most data mining applications work with the same high-level view, where a model learns
from some data and is applied to other data, although the details often change quite
considerably.

Data mining applications involve creating data sets and tuning the algorithm as explained
in the following steps

1. We start our data mining process by creating a dataset, describing an aspect of


the real world. Datasets comprise of the following two aspects:

Samples: These are objects in the real world, such as a book,


photograph, animal, person, or any other object. Samples are also
referred to as observations, records or rows, among other naming
conventions.
Features: These are descriptions or measurements of the samples in
our dataset. Features could be the length, frequency of a specific word,
the number of legs on an animal, date it was created, and so on.
Features are also referred to as variables, columns, attributes or
covariant, again among other naming conventions.

2. The next step is tuning the data mining algorithm. Each data mining algorithm
has parameters, either within the algorithm or supplied by the user. This tuning
allows the algorithm to learn how to make decisions about the data.

[8]
Getting Started with Data Mining

As a simple example, we may wish the computer to be able to categorize people as short or
tall. We start by collecting our dataset, which includes the heights of different people and
whether they are considered short or tall:

Person Height Short or tall?


1 155cm Short
2 165cm Short
3 175cm Tall
4 185cm Tall
As explained above, the next step involves tuning the parameters of our algorithm. As a
simple algorithm; if the height is more than x, the person is tall. Otherwise, they are short.
Our training algorithms will then look at the data and decide on a good value for x. For the
preceding data, a reasonable value for this threshold would be 170 cm. A person taller than
170 cm is considered tall by the algorithm. Anyone else is considered short by this measure.
This then lets our algorithm classify new data, such as a person with height 167 cm, even
though we may have never seen a person with those measurements before.

In the preceding data, we had an obvious feature type. We wanted to know if people are
short or tall, so we collected their heights. This feature engineering is a critical problem in
data mining. In later chapters, we will discuss methods for choosing good features to collect
in your dataset. Ultimately, this step often requires some expert domain knowledge or at
least some trial and error.

In this book, we will introduce data mining through Python. In some cases, we choose
clarity of code and workflows, rather than the most optimized way to perform every task.
This clarity sometimes involves skipping some details that can improve the algorithm's
speed or effectiveness.

Using Python and the Jupyter Notebook


In this section, we will cover installing Python and the environment that we will use for
most of the book, the Jupyter Notebook. Furthermore, we will install the NumPy module,
which we will use for the first set of examples.

The Jupyter Notebook was, until very recently, called the IPython
Notebook. You'll notice the term in web searches for the project. Jupyter is
the new name, representing a broadening of the project beyond using just
Python.

[9]
Getting Started with Data Mining

Installing Python
The Python programming language is a fantastic, versatile, and an easy to use language.

For this book, we will be using Python 3.5, which is available for your system from the
Python Organization's website IUUQTXXXQZUIPOPSHEPXOMPBET. However, I
recommend that you use Anaconda to install Python, which you can download from the
official website at IUUQTXXXDPOUJOVVNJPEPXOMPBET.                             

There will be two major versions to choose from, Python 3.5 and Python
2.7. Remember to download and install Python 3.5, which is the version
tested throughout this book. Follow the installation instructions on that
website for your system. If you have a strong reason to learn version 2 of
Python, then do so by downloading the Python 2.7 version. Keep in mind
that some code may not work as in the book, and some workarounds may
be needed.

In this book, I assume that you have some knowledge of programming and Python itself.
You do not need to be an expert with Python to complete this book, although a good level
of knowledge will help. I will not be explaining general code structures and syntax in this
book, except where it is different from what is considered normal python coding practice.

If you do not have any experience with programming, I recommend that you pick up the
Learning Python book from Packt Publishing, or the book Dive Into Python, available online
at XXXEJWFJOUPQZUIPOOFU

The Python organization also maintains a list of two online tutorials for those new to
Python:

For non-programmers who want to learn to program through the Python


language:

IUUQTXJLJQZUIPOPSHNPJO#FHJOOFST(VJEF/PO1SPHSBNNFST
                                                  

For programmers who already know how to program, but need to learn Python
specifically:

IUUQTXJLJQZUIPOPSHNPJO#FHJOOFST(VJEF1SPHSBNNFST
                                               

[ 10 ]
Getting Started with Data Mining

Windows users will need to set an environment variable to use Python from the command
line, where other systems will usually be immediately executable. We set it in the following
steps

1. First, find where you install Python 3 onto your computer; the default location is
$=1ZUIPO.
2. Next, enter this command into the command line (cmd program): set the
environment to 1:5)0/1"5)1:5)0/1"5)$=1ZUIPO.

Remember to change the $=1ZUIPO if your installation of Python is in


a different folder.

Once you have Python running on your system, you should be able to open a command
prompt and can run the following code to be sure it has installed correctly.
QZUIPO
1ZUIPO EFGBVMU"QS
<($$>PO-JOVY
5ZQFIFMQDPQZSJHIUDSFEJUTPSMJDFOTFGPSNPSF
JOGPSNBUJPO
 QSJOU )FMMPXPSME
Hello, world!
 FYJU

Note that we will be using the dollar sign ($) to denote that a command that you type into
the terminal (also called a shell or DNE on Windows). You do not need to type this character
(or retype anything that already appears on your screen). Just type in the rest of the line and
press Enter.

After you have the above )FMMPXPSME example running, exit the program and move
on to installing a more advanced environment to run Python code, the Jupyter Notebook.

Python 3.5 will include a program called pip, which is a package manager
that helps to install new libraries on your system. You can verify that QJQ
is working on your system by running the QJQGSFF[F command,
which tells you which packages you have installed on your system.
Anaconda also installs their package manager, DPOEB, that you can use. If
unsure, use DPOEB first, use QJQ only if that fails.

[ 11 ]
Getting Started with Data Mining

Installing Jupyter Notebook


Jupyter is a platform for Python development that contains some tools and environments
for running Python and has more features than the standard interpreter. It contains the
powerful Jupyter Notebook, which allows you to write programs in a web browser. It also
formats your code, shows output, and allows you to annotate your scripts. It is a great tool
for exploring datasets and we will be using it as our main environment for the code in this
book.

To install the Jupyter Notebook on your computer, you can type the following into a
command line prompt (not into Python):
$ conda install jupyter notebook

You will not need administrator privileges to install this, as Anaconda keeps packages in
the user's directory.

With the Jupyter Notebook installed, you can launch it with the following:
$ jupyter notebook

Running this command will do two things. First, it will create a Jupyter Notebook instance -
the backend - that will run in the command prompt you just used. Second, it will launch
your web browser and connect to this instance, allowing you to create a new notebook. It
will look something like the following screenshot (where you need to replace IPNFCPC
with your current working directory):

[ 12 ]
Getting Started with Data Mining

To stop the Jupyter Notebook from running, open the command prompt that has the
instance running (the one you used earlier to run the KVQZUFSOPUFCPPL command).
Then, press Ctrl + C and you will be prompted 4IVUEPXOUIJTOPUFCPPLTFSWFS
Z<O> . Type y and press Enter and the Jupyter Notebook will shut down.

Installing scikit-learn
The TDJLJUMFBSO package is a machine learning library, written in Python (but also
containing code in other languages). It contains numerous algorithms, datasets, utilities,
and frameworks for performing machine learning. Scikit-learnis built upon the scientific
python stack, including libraries such as the /VN1Z and 4DJ1Z for speed. Scikit-learn is fast
and scalable in many instances and useful for all skill ranges from beginners to advanced
research users. We will cover more details of scikit-learn in $IBQUFS, Classifying with scikit-
learn Estimators.

To install TDJLJUMFBSO, you can use the DPOEB utility that comes with Python 3, which
will also install the /VN1Z and 4DJ1Z libraries if you do not already have them. Open a
terminal with administrator/root privileges and enter the following command:
$ conda install scikit-learn

Users of major Linux distributions such as Ubuntu or Red Hat may wish to install the
official package from their package manager.

Not all distributions have the latest versions of scikit-learn, so check the
version before installing it. The minimum version needed for this book is
0.14. My recommendation for this book is to use Anaconda to manage this
for you, rather than installing using your system's package manager.

Those wishing to install the latest version by compiling the source, or view more detailed
installation instructions, can go to IUUQTDJLJUMFBSOPSHTUBCMFJOTUBMMIUNM and
refer the official documentation on installing scikit-learn.

[ 13 ]
Getting Started with Data Mining

A simple affinity analysis example


In this section, we jump into our first example. A common use case for data mining is to
improve sales, by asking a customer who is buying a product if he/she would like another
similar product as well. You can perform this analysis through affinity analysis, which is
the study of when things exist together, namely. correlate to each other.

To repeat the now-infamous phrase taught in statistics classes, correlation is not causation.
This phrase means that the results from affinity analysis cannot give a cause. In our next
example, we perform affinity analysis on product purchases. The results indicate that the
products are purchased together, but not that buying one product causes the purchase of
the other. The distinction is important, critically so when determining how to use the results
to affect a business process, for instance.

What is affinity analysis?


Affinity analysis is a type of data mining that gives similarity between samples (objects).
This could be the similarity between the following:

Users on a website, to provide varied services or targeted advertising


Items to sell to those users, to provide recommended movies or products
Human genes, to find people that share the same ancestors

We can measure affinity in several ways. For instance, we can record how frequently two
products are purchased together. We can also record the accuracy of the statement when a
person buys object 1 and when they buy object 2. Other ways to measure affinity include
computing the similarity between samples, which we will cover in later chapters.

Product recommendations
One of the issues with moving a traditional business online, such as commerce, is that tasks
that used to be done by humans need to be automated for the online business to scale and
compete with existing automated businesses. One example of this is up-selling, or selling an
extra item to a customer who is already buying. Automated product recommendations
through data mining are one of the driving forces behind the e-commerce revolution that is
turning billions of dollars per year into revenue.

[ 14 ]
Getting Started with Data Mining

In this example, we are going to focus on a basic product recommendation service. We


design this based on the following idea: when two items are historically purchased
together, they are more likely to be purchased together in the future. This sort of thinking is
behind many product recommendation services, in both online and offline businesses.

A very simple algorithm for this type of product recommendation algorithm is to simply
find any historical case where a user has brought an item and to recommend other items
that the historical user brought. In practice, simple algorithms such as this can do well, at
least better than choosing random items to recommend. However, they can be improved
upon significantly, which is where data mining comes in.

To simplify the coding, we will consider only two items at a time. As an example, people
may buy bread and milk at the same time at the supermarket. In this early example, we
wish to find simple rules of the form:

If a person buys product X, then they are likely to purchase product Y

More complex rules involving multiple items will not be covered such as people buying
sausages and burgers being more likely to buy tomato sauce.

Loading the dataset with NumPy


The dataset can be downloaded from the code package supplied with the book, or from the
official GitHub repository at:
IUUQTHJUIVCDPNEBUB1JQFMJOF"6-FBSOJOH%BUB.JOJOH8JUI1ZUIPO
Download this file and save it on your computer, noting the path to the dataset. It is easiest
to put it in the directory you'll run your code from, but we can load the dataset from
anywhere on your computer.

For this example, I recommend that you create a new folder on your computer to store your
dataset and code. From here, open your Jupyter Notebook, navigate to this folder, and
create a new notebook.

The dataset we are going to use for this example is a NumPy two-dimensional array, which
is a format that underlies most of the examples in the rest of the book. The array looks like a
table, with rows representing different samples and columns representing different
features.

[ 15 ]
Getting Started with Data Mining

The cells represent the value of a specific feature of a specific sample. To illustrate, we can
load the dataset with the following code:
JNQPSUOVNQZBTOQ
EBUBTFU@GJMFOBNFBGGJOJUZ@EBUBTFUUYU
9OQMPBEUYU EBUBTFU@GJMFOBNF

Enter the previous code into the first cell of your (Jupyter) Notebook. You can then run the
code by pressing Shift + Enter (which will also add a new cell for the next section of code).
After the code is run, the square brackets to the left-hand side of the first cell will be
assigned an incrementing number, letting you know that this cell has completed. The first
cell should look like the following:

For code that will take more time to run, an asterisk will be placed here to
denote that this code is either running or scheduled to run. This asterisk
will be replaced by a number when the code has completed running
(including if the code completes because it failed).

[ 16 ]
Getting Started with Data Mining

This dataset has 100 samples and five features, which we will need to know for the later
code. Let's extract those values using the following code:
O@TBNQMFTO@GFBUVSFT9TIBQF

If you choose to store the dataset somewhere other than the directory your Jupyter
Notebooks are in, you will need to change the EBUBTFU@GJMFOBNF value to the new
location.

Next, we can show some of the rows of the dataset to get an understanding of the data.
Enter the following line of code into the next cell and run it, to print the first five lines of the
dataset:
QSJOU 9<>

The result will show you which items were bought in the first five transactions listed:
<<>
<>
<>
<>
<>>

Downloading the example code


You can download the example code files from your account at IUUQXXXQBDLUQVCDPN                   

for all the Packt Publishing books you have purchased. If you purchased this book
elsewhere, you could visit IUUQXXXQBDLUQVCDPNTVQQPSUand register to have the files
                          

e-mailed directly to you. I've also setup a GitHub repository that contains a live version of
the code, along with new fixes, updates and so on. You can retrieve the code and datasets at
the repository here:
IUUQTHJUIVCDPNEBUB1JQFMJOF"6-FBSOJOH%BUB.JOJOH8JUI1ZUIPO

You can read the dataset can by looking at each row (horizontal line) at a time. The first row
 shows the items purchased in the first transaction. Each column
(vertical row) represents each of the items. They are bread, milk, cheese, apples, and
bananas, respectively. Therefore, in the first transaction, the person bought cheese, apples,
and bananas, but not bread or milk. Add the following line in a new cell to allow us to turn
these feature numbers into actual words:
GFBUVSFT<CSFBENJMLDIFFTFBQQMFTCBOBOBT>

[ 17 ]
Getting Started with Data Mining

Each of these features contains binary values, stating only whether the items were
purchased and not how many of them were purchased. A1 indicates that at least 1 item was
bought of this type, while a 0 indicates that absolutely none of that item was purchased. For
a real world dataset, using exact figures or a larger threshold would be required.

Implementing a simple ranking of rules


We wish to find rules of the type If a person buys product X, then they are likely to purchase
product Y. We can quite easily create a list of all the rules in our dataset by simply finding all
occasions when two products are purchased together. However, we then need a way to
determine good rules from bad ones allowing us to choose specific products to recommend.

We can evaluate rules of this type in many ways, on which we will focus on two: support
and confidence.

Support is the number of times that a rule occurs in a dataset, which is computed by simply
counting the number of samples for which the rule is valid. It can sometimes be normalized
by dividing by the total number of times the premise of the rule is valid, but we will simply
count the total for this implementation.

The premise is the requirements for a rule to be considered active. The


conclusion is the output of the rule. For the example if a person buys an
apple, they also buy a banana, the rule is only valid if the premise happens - a
person buys an apple. The rule's conclusion then states that the person will
buy a banana.

While the support measures how often a rule exists, confidence measures how accurate they
are when they can be used. You can compute this by determining the percentage of times
the rule applies when the premise applies. We first count how many times a rule applies to
our data and divide it by the number of samples where the premise (the JG statement)
occurs.

As an example, we will compute the support and confidence for the rule if a person buys
apples, they also buy bananas.

As the following example shows, we can tell whether someone bought apples in a
transaction, by checking the value of TBNQMF<>, where we assign a sample to a row of our
matrix:
TBNQMF9<>

[ 18 ]
Getting Started with Data Mining

Similarly, we can check if bananas were bought in a transaction by seeing if the value of
TBNQMF<> is equal to 1 (and so on). We can now compute the number of times our rule
exists in our dataset and, from that, the confidence and support.

Now we need to compute these statistics for all rules in our database. We will do this by
creating a dictionary for both valid rules and invalid rules. The key to this dictionary will be a
tuple (premise and conclusion). We will store the indices, rather than the actual feature
names. Therefore, we would store (3 and 4) to signify the previous rule If a person buys
apples, they will also buy bananas. If the premise and conclusion are given, the rule is
considered valid. While if the premise is given but the conclusion is not, the rule is
considered invalid for that sample.

The following steps will help us to compute the confidence and support for all possible
rules:

1. We first set up some dictionaries to store the results. We will use EFGBVMUEJDU
for this, which sets a default value if a key is accessed that doesn't yet exist. We
record the number of valid rules, invalid rules, and occurrences of each premise:

GSPNDPMMFDUJPOTJNQPSUEFGBVMUEJDU
WBMJE@SVMFTEFGBVMUEJDU JOU
JOWBMJE@SVMFTEFGBVMUEJDU JOU
OVN@PDDVSFODFTEFGBVMUEJDU JOU

2. Next, we compute these values in a large loop. We iterate over each sample in the
dataset and then loop over each feature as a premise. When again loop over each
feature as a possible conclusion, mapping the relationship premise to conclusion.
If the sample contains a person who bought the premise and the conclusion, we
record this in WBMJE@SVMFT. If they did not purchase the conclusion product, we
record this in JOWBMJE@SVMFT.
3. For sample in X:

GPSTBNQMFJO9
GPSQSFNJTFJOSBOHF O@GFBUVSFT 
JGTBNQMF<QSFNJTF>DPOUJOVF
3FDPSEUIBUUIFQSFNJTFXBTCPVHIUJOBOPUIFSUSBOTBDUJPO
OVN@PDDVSFODFT<QSFNJTF> 
GPSDPODMVTJPOJOSBOHF O@GFBUVSFT 
JGQSFNJTFDPODMVTJPO
*UNBLFTMJUUMFTFOTFUP
NFBTVSFJG9 9
DPOUJOVF
JGTBNQMF<DPODMVTJPO>
5IJTQFSTPOBMTPCPVHIUUIFDPODMVTJPOJUFN

[ 19 ]
Getting Started with Data Mining

WBMJE@SVMFT< QSFNJTFDPODMVTJPO > 

If the premise is valid for this sample (it has a value of 1), then we record this and check
each conclusion of our rule. We skip over any conclusion that is the same as the premise-
this would give us rules such as: if a person buys Apples, then they buy Apples, which
obviously doesn't help us much.

We have now completed computing the necessary statistics and can now compute the
support and confidence for each rule. As before, the support is simply our WBMJE@SVMFT
value:
TVQQPSUWBMJE@SVMFT

We can compute the confidence in the same way, but we must loop over each rule to
compute this:
DPOGJEFODFEFGBVMUEJDU GMPBU
GPSQSFNJTFDPODMVTJPOJOWBMJE@SVMFTLFZT 
SVMF QSFNJTFDPODMVTJPO
DPOGJEFODF<SVMF>WBMJE@SVMFT<SVMF>OVN@PDDVSFODFT<QSFNJTF>

We now have a dictionary with the support and confidence for each rule. We can create a
function that will print out the rules in a readable format. The signature of the rule takes the
premise and conclusion indices, the support and confidence dictionaries we just computed,
and the features array that tells us what the GFBUVSFT mean. Then we print out the
4VQQPSU and $POGJEFODF of this rule:

GPSQSFNJTFDPODMVTJPOJODPOGJEFODF
QSFNJTF@OBNFGFBUVSFT<QSFNJTF>
DPODMVTJPO@OBNFGFBUVSFT<DPODMVTJPO>
QSJOU 3VMF*GBQFSTPOCVZT\^UIFZXJMMBMTP
CVZ\^GPSNBU QSFNJTF@OBNFDPODMVTJPO@OBNF
QSJOU $POGJEFODF\G^GPSNBU
 DPOGJEFODF< QSFNJTFDPODMVTJPO >
QSJOU 4VQQPSU\^GPSNBU TVQQPSU
< QSFNJTF
DPODMVTJPO >
QSJOU 

We can test the code by calling it in the following way-feel free to experiment with different
premises and conclusions:
GPSQSFNJTFDPODMVTJPOJODPOGJEFODF
QSFNJTF@OBNFGFBUVSFT<QSFNJTF>
DPODMVTJPO@OBNFGFBUVSFT<DPODMVTJPO>
QSJOU 3VMF*GBQFSTPOCVZT\^UIFZXJMMBMTP
CVZ\^GPSNBU QSFNJTF@OBNFDPODMVTJPO@OBNF

[ 20 ]
Getting Started with Data Mining

QSJOU $POGJEFODF\G^GPSNBU
 DPOGJEFODF< QSFNJTFDPODMVTJPO >
QSJOU 4VQQPSU\^GPSNBU TVQQPSU
< QSFNJTF
DPODMVTJPO >
QSJOU 

Ranking to find the best rules


Now that we can compute the support and confidence of all rules, we want to be able to
find the best rules. To do this, we perform a ranking and print the ones with the highest
values. We can do this for both the support and confidence values.

To find the rules with the highest support, we first sort the support dictionary. Dictionaries
do not support ordering by default; the JUFNT function gives us a list containing the data
in the dictionary. We can sort this list using the JUFNHFUUFS class as our key, which allows
for the sorting of nested lists such as this one. Using JUFNHFUUFS  allows us to sort
based on the values. 4FUUJOHSFWFSTF5SVF gives us the highest values first:
GSPNPQFSBUPSJNQPSUJUFNHFUUFS
TPSUFE@TVQQPSUTPSUFE TVQQPSUJUFNT LFZJUFNHFUUFS  SFWFSTF5SVF

We can then print out the top five rules:


TPSUFE@DPOGJEFODFTPSUFE DPOGJEFODFJUFNT LFZJUFNHFUUFS  
SFWFSTF5SVF
GPSJOEFYJOSBOHF  
QSJOU 3VMF\^GPSNBU JOEFY 
QSFNJTFDPODMVTJPOTPSUFE@DPOGJEFODF<JOEFY><>
QSJOU@SVMF QSFNJTFDPODMVTJPOTVQQPSUDPOGJEFODFGFBUVSFT

The result will look like the following:


Rule #1
Rule: If a person buys bananas they will also buy milk
- Support: 27
- Confidence: 0.474
Rule #2
Rule: If a person buys milk they will also buy bananas
- Support: 27
- Confidence: 0.519
Rule #3
Rule: If a person buys bananas they will also buy apples
- Support: 27
- Confidence: 0.474
Rule #4

[ 21 ]
Another random document with
no related content on Scribd:
insulted her and is expected to pay damages. If a man meets his
mother-in-law coming along the road and does not recognise her,
she will fall down on the ground as a sign, when he will run away. In
the same way a father-in-law will signal to his daughter-in-law; the
whole idea being that they are unworthy to be noticed till they have
proved that they can beget children.”79.2 However, if a wife should
prove barren for three years, the rules of avoidance between the
young couple and their parents-in-law cease to be observed.79.3
Hence the custom of avoidance among these people is associated in
some way with the wife’s fertility. So among the Awemba, a Bantu
tribe of Northern Rhodesia, “if a young man sees his mother-in-law
coming along the path, he must retreat into the bush and make way
for her, or if she suddenly comes upon him he must keep his eyes
fixed on the ground, and only after a child is born may they converse
together.”79.4 Among the Angoni, another Bantu tribe of British
Central Africa, it would be a gross breach of etiquette if a man were
to enter his son-in-law’s house; he may come within ten paces of the
door, but no nearer. A woman may not even approach her son-in-
law’s house, and she is never allowed to speak to him. Should they
meet accidentally on a path, the son-in-law gives way and makes a
circuit to avoid encountering his mother-in-law face to face.79.5 Here
then we see that a man avoids his son-in-law as well as his mother-
in-law, though not so strictly.
Among the Thonga, a Bantu tribe about
The custom of Delagoa Bay, when a man meets his mother-in-
avoiding mother-in-
law and wife of law or her sister on the road, he steps out of the
wife’s brother road into the forest on the right hand side and sits
among the Thonga down. She does the same. Then they salute each
of Delagoa Bay.
other in the usual way by clapping their hands.
After that they may talk to each other. When a man is in a hut, his
mother-in-law dare not enter it, but must sit down outside without
seeing him. So seated she may salute him, “Good morning, son of
So-and-so.” But she would not dare to pronounce his name.
However, when a man has been married many years, his mother-in-
law has less fear of him, and will even enter the hut where he is and
speak to him. But among the Thonga the woman whom a man is
bound by custom to avoid most rigidly is not his wife’s mother, but
the wife of his wife’s brother. If the two meet on a path, they carefully
avoid each other; he will step out of the way and she will hurry on,
while her companions, if she has any, will stop and chat with him.
She will not enter the same boat with him, if she can help it, to cross
a river. She will not eat out of the same dish. If he speaks to her, it is
with constraint and embarrassment. He will not enter her hut, but will
crouch at the door and address her in a voice trembling with
emotion. Should there be no one else to bring him food, she will do it
reluctantly, watching his hut and putting the food inside the door
when he is absent. It is not that they dislike each other, but that they
feel a mutual, a mysterious fear.80.1 However, among the Thonga,
the rules of avoidance between connexions by marriage decrease in
severity as time passes. The strained relations between a man and
his wife’s mother in particular become easier. He begins to call her
“Mother” and she calls him “Son.” This change even goes so far that
in some cases the man may go and dwell in the village of his wife’s
parents, especially if he has children and the children are grown
up.80.2 Again, among the Ovambo, a Bantu people of German South-
West Africa, a man may not look at his future mother-in-law while he
talks with her, but is bound to keep his eyes steadily fixed on the
ground. In some cases the avoidance is even more stringent; if the
two meet unexpectedly, they separate at once. But after the
marriage has been celebrated, the social intercourse between
mother-in-law and son-in-law becomes easier on both sides.81.1
Thus far our examples of ceremonial avoidance
The custom of between mother-in-law and son-in-law have been
avoiding the
mother-in-law drawn from Bantu tribes. But in Africa the custom,
among other than though apparently most prevalent and most
the Bantu tribes of strongly marked among peoples of the great Bantu
Africa.
stock, is not confined to them. Among the Masai of
British East Africa, “mothers-in-law and their sons-in-law must avoid
one another as much as possible; and if a son-in-law enters his
mother-in-law’s hut she must retire into the inner compartment and
sit on the bed, whilst he remains in the outer compartment; they may
then talk. Own brothers-in-law and sisters-in-law must also avoid one
another, though this rule does not apply to half-brothers-in-law and
sisters-in-law.”81.2 So, too, among the Bogos, a tribe on the outskirts
of Abyssinia, a man never sees the face of his mother-in-law and
never pronounces her name; the two take care not to meet.81.3
Among the Donaglas a husband after marriage “lives in his wife’s
house for a year, without being allowed to see his mother-in-law, with
whom he enters into relations only on the birth of his first son.”81.4 In
Darfur, when a youth has been betrothed to a girl, however intimate
he may have been with her parents before, he ceases to see them
until the ceremony has taken place, and even avoids them in the
street. They, on their part, hide their faces, if they happen to meet
him unexpectedly.81.5
To pass now from Africa to other parts of the
The custom of world, among the Looboos, a primitive tribe in the
avoiding relations
by marriage in tropical forests of Sumatra, custom forbids a
Sumatra and New woman to be in her father-in-law’s company and a
Guinea. man to be in his mother-in-law’s society. For
example, if a man meets his daughter-in-law, he
should cross over to the other side of the road to let her pass as far
as possible from him; but if the way is too narrow, he takes care in
time to get out of it. But no such reserve is prescribed between a
father-in-law and his son-in-law, or between a mother-in-law and her
daughter-in-law.82.1 Among the Bukaua, a Melanesian tribe of
German New Guinea, the rules of avoidance between persons
connected by marriage are very stringent; they may not touch each
other or mention each other’s names. But contrary to the usual
practice the avoidance seems to be quite as strict between persons
of the same sex as between males and females. At least the writer
who reports the custom illustrates it chiefly by the etiquette which is
observed between a man and his daughter’s husband. When a man
eats in presence of his son-in-law, he veils his face; but if
nevertheless his son-in-law should see his open mouth, the father-in-
law is so ashamed that he runs away into the wood. If he gives his
son-in-law anything, such as betel or tobacco, he will never put it in
his hand, but pours it on a leaf, and the son-in-law fetches it away. If
father-in-law and son-in-law both take part in a wild boar hunt, the
son-in-law will abstain from seizing or binding the boar, lest he
should chance to touch his father-in-law. If, however, through any
accident their hands or backs should come into contact, the father-
in-law is extremely horrified, and a dog must be at once killed, which
he gives to his son-in-law for the purpose of wiping out the stain on
his honour. If the two should ever fall out about anything, the son-in-
law will leave the village and his wife, and will stay away in some
other place till his father-in-law, for his daughter’s sake, calls him
back. A man in like manner will never touch his sister-in-law.82.2
Among the low savages of the Californian
The custom of peninsula a man was not allowed for some time to
avoiding relations
by marriage among look into the face of his mother-in-law or of his
the Indian tribes of wife’s other near relations; when these women
America. were present he had to step aside or hide
himself.83.1 Among the Indians of the Isla del
Malhado in Florida a father-in-law and mother-in-law might not enter
the house of their son-in-law, and he on his side might not appear
before his father-in-law and his relations. If they met by accident they
had to go apart to the distance of a bowshot, holding their heads
down and their eyes turned to the earth. But a woman was free to
converse with the father and mother of her husband.83.2 Among the
Indians of Yucatan, if a betrothed man saw his future father-in-law or
mother-in-law at a distance, he turned away as quickly as possible,
believing that a meeting with them would prevent him from begetting
children.83.3 Among the Arawaks of British Guiana a man may never
see the face of his wife’s mother. If she is in the house with him, they
must be separated by a screen or partition-wall; if she travels with
him in a canoe, she steps in first, in order that she may turn her back
to him.83.4 Among the Caribs “the women never quit their father’s
house, and in that they have an advantage over their husbands in as
much as they may talk to all sorts of people, whereas the husband
dare not converse with his wife’s relations, unless he is dispensed
from this observance either by their tender age or by their
intoxication. They shun meeting them and make great circuits for
that purpose. If they are surprised in a place where they cannot help
meeting, the person addressed turns his face another way so as not
to be obliged to see the person, whose voice he is compelled to
hear.”83.5 Among the Araucanian Indians of Chili a man’s mother-in-
law refuses to speak to or even to look at him during the marriage
festivity, and “the point of honour is, in some instances, carried so
far, that for years after the marriage the mother never addresses her
son-in-law face to face; though with her back turned, or with the
interposition of a fence or a partition, she will converse with him
freely.”84.1
It would be easy to multiply examples of similar
The custom of customs of avoidance between persons closely
avoiding relations
by marriage cannot connected by marriage, but the foregoing may
be separated from serve as specimens. Now in order to determine
the similar custom the meaning of such customs it is very important
of avoiding relations
by blood; both are to observe that similar customs of avoidance are
probably practised in some tribes not merely between
precautions to
prevent improper
persons connected with each other by marriage,
relations between but also between the nearest blood relations of
the sexes. different sexes, namely, between parents and
children and between brothers and sisters;84.2 and
the customs are so alike that it seems difficult or impossible to
separate them and to offer one explanation of the avoidance of
connexions by marriage and another different explanation of the
avoidance of blood relations. Yet this is what is done by some who
attempt to explain the customs of avoidance; or rather they confine
their attention wholly to connexions by marriage, or even to mothers-
in-law alone, while they completely ignore blood relations, although
in point of fact it is the avoidance of blood relations which seems to
furnish the key to the problem of such avoidances in general. The
true explanation of all such customs of avoidance appears to be, as I
have already indicated, that they are precautions designed to
remove the temptation to sexual intercourse between persons whose
marriage union is for any reason repugnant to the moral sense of the
community. This explanation, while it has been rejected by theorists
at home, has been adopted by some of the best observers of savage
life, whose opinion is entitled to carry the greatest weight.85.1
That a fear of improper intimacy even between
Mutual avoidance of the nearest blood relations is not baseless among
mother and son, of
father and daughter, races of a lower culture seems proved by the
and of brother and testimony of a Dutch missionary in regard to the
Battas or Bataks of Sumatra, a people who have
sister among the attained to a fairly high degree of barbaric
Battas.
civilization. The Battas “observe certain rules of
avoidance in regard to near relations by blood or marriage; and we
are informed that such avoidance springs not from the strictness but
from the looseness of their moral practice. A Batta, it is said,
assumes that a solitary meeting of a man with a woman leads to an
improper intimacy between them. But at the same time he believes
that incest or the sexual intercourse of near relations excites the
anger of the gods and entails calamities of all sorts. Hence near
relations are obliged to avoid each other lest they should succumb to
temptation. A Batta, for example, would think it shocking were a
brother to escort his sister to an evening party. Even in the presence
of others a Batta brother and sister feel embarrassed. If one of them
comes into the house, the other will go away. Further, a man may
never be alone in the house with his daughter, nor a mother with her
son. A man may never speak to his mother-in-law nor a woman to
her father-in-law. The Dutch missionary who reports these customs
adds that he is sorry to say that from what he knows of the Battas he
believes the maintenance of most of these rules to be very
necessary. For the same reason, he tells us, as soon as Batta lads
have reached the age of puberty they are no longer allowed to sleep
in the family house but are sent away to pass the night in a separate
building (djambon); and similarly as soon as a man loses his wife by
death he is excluded from the house.”85.2
In like manner among the Melanesians of the
Mutual avoidance of Banks’ Islands and the New Hebrides a man must
mother and son and
of brother and sister not only avoid his mother-in-law; from the time
among the when he reaches or approaches puberty and has
Melanesians. begun to wear clothes instead of running about
naked, he must avoid his mother and sisters, and
he may no longer live in the same house with them; he takes up his
quarters in the clubhouse of the unmarried males, where he now
regularly eats and sleeps. He may go to his father’s house to ask for
food, but if his sister is within he must go away before he eats; if she
is not there, he may sit down near the door and eat. If by chance
brother and sister meet in the path, she runs away or hides. If a boy,
walking on the sands, perceives footprints which he knows to be
those of his sister, he will not follow them, nor will she follow his. This
mutual avoidance lasts through life. Not only must he avoid the
persons of his sisters, but he may not pronounce their names or
even use a common word which happens to form part of any one of
their names. In like manner his sisters eschew the use of his name
and of all words which form part of it. Strict, too, is a boy’s reserve
towards his mother from the time when he begins to wear clothes,
and the reserve increases as he grows to manhood. It is greater on
her side than on his. He may go to the house and ask for food and
his mother may bring it out for him, but she will not give it to him; she
puts it down for him to take. If she calls to him to come, she speaks
to him in the plural, in a more distant manner; “Come ye,” she says,
not “Come thou.” If they talk together she sits at a little distance and
turns away, for she is shy of her grown-up son. “The meaning of all
this,” as Dr. Codrington observes, “is obvious.”86.1
Mutual avoidance of
a man and his
When a Melanesian man of the Banks’ Islands
mother-in-law marries, he is bound in like manner to avoid his
among the mother-in-law. “The rules of avoidance are very
Melanesians.
strict and minute. As regards the avoidance of the
person, a man will not come near his wife’s mother; the avoidance is
mutual; if the two chance to meet in a path, the woman will step out
of it and stand with her back turned till he has gone by, or perhaps if
it be more convenient he will move out of the way. At Vanua Lava, in
Port Patteson, a man would not follow his mother-in-law along the
beach, nor she him, until the tide had washed out the footsteps of
the first traveller from the sand. At the same time a man and his
mother-in-law will talk at a distance.”87.1
It seems obvious that these Melanesian
It is significant that customs of avoidance are the same, and must be
mutual avoidance
between blood explained in the same way whether the woman
relations of opposite whom a man shuns is his wife’s mother or his own
sexes begins at or mother or his sister. Now it is highly significant that
near puberty.
just as among the Akamba of East Africa the
mutual avoidance of father and daughter only begins when the girl
has reached puberty, so among the Melanesians the mutual
avoidance of a boy on the one side and of his mother and sisters on
the other only begins when the boy has reached or approached
puberty. Thus in both peoples the avoidance between the nearest
blood relations only commences at the dangerous age when sexual
connexion on both sides begins to be possible. It seems difficult,
therefore, to evade the conclusion that the mutual avoidance is
adopted for no other reason than to diminish as far as possible the
chances of sexual unions which public opinion condemns as
incestuous. But if that is the reason why a young Melanesian boy, on
the verge of puberty, avoids his own mother and sisters, it is natural
and almost necessary to infer that it is the same reason which leads
him, as a full-grown and married man, to eschew the company of his
wife’s mother.
Similar customs of avoidance between mothers
Mutual avoidance of and sons, between fathers and daughters, and
mother and son, of
father and daughter, between brothers and sisters are observed by the
and of brother and natives of the Caroline Islands, and the writer who
sister in the records them assigns the fear of incest as the
Caroline Islands.
motive for their observance. “The prohibition of
marriage,” he says, “and of sexual intercourse between kinsfolk of
the same tribe is regarded by the Central Caroline natives as a
divine ordinance; its breach is therefore, in their opinion, punished by
the higher powers with sickness or death. The law influences in a
characteristic way the whole social life of the islanders, for efforts are
made to keep members of families of different sexes apart from each
other even in their youth. Unmarried men and boys, from the time
when they begin to speak, may therefore not remain by night in the
huts, but must sleep in the fel, the assembly-house. In the evening
their meal (âkot) is brought thither to them by their mothers or
sisters. Only when a son is sick may his mother receive him in the
hut and tend him there. On the other hand entrance to the assembly-
house (fel) is forbidden to women and girls except on the occasion of
the pwarik festival; whereas female members of other tribes are free
to visit it, although, so far as I could observe, they seldom make use
of the permission. Unmarried girls sleep in the huts with their
parents.
“These restrictions, which custom and tradition have instituted
within the family, find expression also in the behaviour of the
members of families toward each other. The following persons,
namely, have to be treated with respect—the daughters by their
father, the sons by their mother, the brothers by their sisters. In
presence of such relations, as in the presence of a chief, you may
not stand, but must sit down; if you are obliged on narrow paths to
pass by one of them you must first obtain permission and then do it
in a stooping or creeping posture. You allow them everywhere to go
in front; you also avoid to drink out of the vessel which they have just
used; you do not touch them, but keep always at a certain distance
from them; the head especially is deemed sacred.”88.1
In all these cases the custom of mutual
Mutual avoidance of avoidance is observed by persons of opposite sex
male and female
cousins in some who, though physically capable of sexual union,
tribes. are forbidden by tradition and public opinion to
have any such commerce with each other. Thus
far the blood relations whom a man is forbidden to marry and
compelled to avoid, are his own mother, his own daughter, and his
own sisters. But to this list some people add a man’s female cousins
or at least certain of them; for many races draw a sharp line of
distinction between cousins according as they are children of two
brothers or of two sisters or of a brother and a sister, and while they
permit or even prefer marriage with certain cousins, they absolutely
forbid marriage with certain others. Now, it is highly significant that
some tribes which forbid a man to marry certain of his cousins also
compel him to adopt towards them the same attitude of social
reserve which in the same or other tribes a man is obliged to
observe towards his wife’s mother, his own mother, and his own
sisters, all of whom in like manner he is forbidden to marry. Thus
among the tribes in the central part of New Ireland
Mutual avoidance of (New Mecklenburg) a male and a female cousin,
male and female
cousins in New the children of a brother and a sister respectively,
Ireland. are most strictly forbidden by custom to marry
each other; indeed this prohibition is described as
the most stringent of all; the usual saying in regard to such relations
is, “The cousin is holy” (i tábu ra kókup). Now, in these tribes a man
is not merely forbidden to marry his female cousin, the daughter of
his father’s sister or of his mother’s brother; he must also avoid her
socially, just as in other tribes a man must avoid his wife’s mother,
his own mother, his own daughter, and his own sisters. The cousins
may not approach each other, they may not shake hands or even
touch each other, they may not give each other presents, they may
not mention each other’s names; but they are allowed to speak to
each other at a distance of some paces. These rules of avoidance,
these social barriers erected between cousins, the children of a
brother and a sister respectively, are interpreted most naturally and
simply as precautions intended to obviate the danger of a criminal
intercourse between persons whose sexual union would be regarded
by public opinion with deep displeasure. Indeed the Catholic
missionary, to whom we are indebted for the information, assumes
this interpretation of the rules as if it were too obvious to call for
serious discussion. He says that all the customs of avoidance “are
observed as outward symbols of this prohibition of marriage”; and he
adds that “were the outward sign of the prohibition of marriage, to
which the natives cleave with genuine obstinacy, abolished or even
weakened, there would be an immediate danger of the natives
contracting such marriages.”90.1 It seems difficult for a rational man
to draw any other inference. If any confirmation were needed, it
would be furnished by the fact that among these tribes of New
Ireland brothers and sisters are obliged to observe precisely the
same rules of mutual avoidance, and that incest between brother
and sister is a crime which is punished with hanging; they may not
come near each other, they may not shake hands, they may not
touch each other, they may not give each other presents; but they
are allowed to speak to each other at a distance of some paces. And
the penalty for incest with a daughter is also death by hanging.90.2
Amongst the Baganda of Central Africa in like
Mutual avoidance of manner a man was forbidden under pain of death
certain male and
female cousins to marry or have sexual intercourse with his
among the cousin, the daughter either of his father’s sister or
Baganda; marriage of his mother’s brother; and such cousins might
or sexual
intercourse not approach each other, nor hand each other
forbidden between anything, nor enter the same house, nor eat out of
these cousins under
pain of death.
the same dish. Were cousins to break these rules
of social avoidance, in other words, if they were to
approach each other or hand each other anything, it was believed
that they would fall ill, that their hands would tremble, and that they
would be unfit for any work.90.3 Here, again, the prohibition of social
intercourse was in all probability merely a precaution against sexual
intercourse, for which the penalty was death. And the same may be
said of the similar custom of avoidance which among these same
Baganda a man had to observe towards his wife’s mother. “No man
might see his mother-in-law, or speak face to face with her; she
covered her face, if she passed her son-in-law, and he gave her the
path and made a detour, if he saw her coming. If she was in the
house, he might not enter, but he was allowed to speak to her from a
distance. This was said to be because he had seen her daughter’s
nakedness. If a son-in-law accidentally saw his mother-in-law’s
breasts, he sent her a barkcloth in compensation, to cover herself,
lest some illness, such as tremor, should come upon him. The
punishment for incest was death; no member of a clan would shield
a person guilty thereof; the offender was disowned by the clan, tried
by the chief of the district, and put to death.”91.1
The prohibition of marriage with certain cousins
Marriage between appears to be widespread among African peoples
certain cousins
forbidden among of the Bantu stock. Thus in regard to the Bantus of
some South African South Africa we read that “every man of a coast
tribes but allowed tribe regarded himself as the protector of those
among others.
females whom we would call his cousins, second
cousins, third cousins, and so forth, on the father’s side, while some
had a similar feeling towards the same relatives on the mother’s side
as well, and classified them all as sisters. Immorality with one of
them would have been considered incestuous, something horrible,
something unutterably disgraceful. Of old it was punished by the
death of the male, and even now a heavy fine is inflicted upon him,
while the guilt of the female must be atoned by a sacrifice performed
with due ceremony by the tribal priest, or it is believed a curse will
rest upon her and her issue.… In contrast to this prohibition the
native of the interior almost as a rule married the daughter of his
father’s brother, in order, as he said, to keep property from being lost
to his family. This custom more than anything else created a disgust
and contempt for them by the people of the coast, who term such
intermarriages the union of dogs, and attribute to them the insanity
and idiocy which in recent times has become prevalent among the
inland tribes.”91.2
Among the Thonga, a Bantu tribe about
Marriage between Delagoa Bay, marriages between cousins are as a
cousins allowed in
some African tribes rule prohibited, and it is believed that such unions
on condition that an are unfruitful. However, custom permits cousins to
expiatory sacrifice marry each other on condition that they perform an
is offered.
expiatory ceremony which is supposed to avert the
curse of barrenness from the wife. A goat is sacrificed, and the
couple are anointed with the green liquid extracted from the half-
digested grass in the animal’s stomach. Then a hole is cut in the
goat’s skin and through this hole the heads of the cousins are
inserted. The goat’s liver is then handed to them, quite raw, through
the hole in the skin, and they must tear it out with their teeth without
using a knife. Having torn it out, they eat it. The word for liver
(shibindji) also means “patience,” “determination.” So they say to the
couple, “You have acted with strong determination. Eat the liver now!
Eat it in the full light of the day, not in the dark! It will be an offering to
the gods.” Then the family priest prays, saying: “You, our gods, So-
and-so, look! We have done it in the daylight. It has not been done
by stealth. Bless them, give them children!” When he has done
praying, the assistants take all the half-digested grass from the
goat’s stomach and place it on the wife’s head, saying, “Go and bear
children!”92.1 Among the Wagogo of German East Africa marriage is
forbidden between cousins who are the children of two brothers or of
two sisters, but is permitted between cousins who are the children of
a brother and sister respectively. However, in this case it is usual for
the wife’s father to kill a sheep and put on a leather armlet, made
presumably from the sheep’s skin; otherwise it is supposed that the
marriage would be unfruitful.92.2 Thus the Wagogo, like the Thonga,
imagine that the marriage of cousins is doomed to infertility unless
an expiatory sacrifice is offered and a peculiar use made of the
victim’s skin. Again, the Akikuyu of British East Africa forbid the
marriage of cousins and second cousins, the children and
grandchildren of brothers and sisters. If such persons married, they
would commit a grave sin, and all their children would surely die; for
the curse or ceremonial pollution (thahu) incurred by such a crime
cannot be purged away. Nevertheless it sometimes happens that a
man unwittingly marries a first or second cousin; for instance, if a
part of the family moves away to another district, it may come about
that a man makes the acquaintance of a girl and marries her before
he discovers the relationship. In such a case, where the sin has
been committed unknowingly, the curse can be averted by the
performance of an expiatory rite. The elders take a sheep and place
it on the woman’s shoulders; there it is killed and the intestines taken
out. Then the elders solemnly sever the intestines with a sharp
splinter of wood taken from a bush of a certain sort (mukeo), “and
they announce that they are cutting the clan kutinyarurira, by which
they mean that they are severing the bond of relationship which
exists between the pair. A medicine man then comes and purifies the
couple.”93.1 In all these cases we may assume with a fair degree of
probability that the old prohibition of marriage between cousins is
breaking down, and that the expiatory sacrifice offered when such a
marriage does take place is merely a salve to the uneasy conscience
of those who commit or connive at a breach of the ancient taboo.
Thus the prohibition of marriage between
The mutual cousins, and the rules of ceremonial avoidance
avoidance of male
and female cousins observed in some tribes between persons who
is probably a stand in that relationship to each other, appear
precaution against both to spring from a belief, right or wrong, in the
a criminal intimacy
between them. injurious effects of such unions and from a desire
to avoid them. The mutual avoidance of the
cousins is merely a precaution to prevent a closer and more criminal
intimacy between them. If that is so, it furnishes a confirmation of the
view that all the customs of ceremonial avoidance between blood
relations or connexions by marriage of opposite sexes are based
simply on a fear of incest.
The theory is perhaps confirmed by the
The mutual observation that in some tribes the avoidance
avoidance between
a man and his between a man and his wife’s mother lasts only
wife’s relations until he has had a child by his wife;94.1 while in
seems to be partly
grounded on a fear others, though avoidance continues longer, it
of rendering the gradually wears away with time as the man and
wife infertile.
woman advance in years,94.2 and in others, again,
it is observed only between a man and his future mother-in-law, and
comes to an end with his marriage.94.3 These customs suggest that
in the minds of the people who practise them there is a close
connexion between the avoidance of the wife’s relations and the
dread of an infertile marriage. The Indians of Yucatan, as we saw,
believe that if a betrothed man were to meet his future mother-in-law
or father-in-law, he would thereby lose the power of begetting
children. Such a fear seems to be only an extension by false analogy
of that belief in the disastrous consequences of illicit sexual relations
which we dealt with in an earlier part of this chapter,94.4 and of which
we shall have more to say presently.94.5 From thinking, rightly or
wrongly, that sexual intercourse between certain persons is fraught
with serious dangers, the savage jumped to the conclusion that
social intercourse between them may be also perilous by virtue of a
sort of physical infection acting through simple contact or even at a
distance; or if, in many cases, he did not go so far as to suppose that
for a man merely to see or touch his mother-in-law sufficed to blast
the fertility of his wife’s womb, yet he may have thought, with much
better reason, that intimate social converse between him and her
might easily lead to something worse, and that to guard against such
a possibility it was best to raise a strong barrier of etiquette between
them. It is not, of course, to be supposed that these rules of
avoidance were the result of deliberate legislation; rather they were
the spontaneous and gradual growth of feelings and thoughts of
which the savages themselves perhaps had no clear consciousness.
In what precedes I have merely attempted to sum up in language
intelligible to civilized man the outcome of a long course of moral and
social evolution.
These considerations perhaps obviate to some extent the only
serious difficulty which lies in the way of the theory here advocated.
If the custom of avoidance was adopted in order to
The mutual guard against the danger of incest, how comes it
avoidance between
persons of the that the custom is often observed towards persons
same sex was of the same sex, for example, by a man towards
probably an his father-in-law as well as towards his mother-in-
extension by false
analogy of the law? The difficulty is undoubtedly serious: the only
mutual avoidance way of meeting it that I can suggest is the one I
between persons of
different sexes.
have already indicated. We may suppose that the
deeply rooted beliefs of the savage in the fatal
effects of marriage between certain classes of persons, whether
relations by blood or connexions by marriage, gradually spread in his
mind so as to embrace the relations between men and men as well
as between men and women; till he had worked himself into the
conviction that to see or touch his father-in-law, for example, was
nearly or quite as dangerous as to touch or have improper relations
with his mother-in-law. It is no doubt easy for us to detect the flaw in
this process of reasoning; but we should beware of casting stones at
the illogical savage, for it is possible or even probable that many of
our own cherished convictions are no better founded.
Viewed from this standpoint the customs of
The custom of ceremonial avoidance among savages assume a
mutual avoidance
between near serious aspect very different from the appearance
relations has of arbitrariness and absurdity which they are apt to
probably had the present to the civilized observer who does not look
effect of checking
the practice of below the surface of savage society. So far as
inbreeding. these customs have helped, as they probably
have done, to suppress the tendency to
inbreeding, that is, to the marriage of near relations, we must
conclude that their effect has been salutary, if, as many eminent
biologists hold, long-continued inbreeding is injurious to the stock,
whether animal or vegetable, by rendering it in the end infertile.95.1
However, men of science are as yet by no means agreed as to the
results of consanguineous marriages, and a living authority on the
subject has recently closed a review of the evidence as follows:
“When we take into account such evidence as there is from animals
and plants, and such studies as those of Huth,95.2 and the instances
and counter-instances of communities with a high degree of
consanguinity, we are led to the conclusion that the prejudices and
laws of many peoples against the marriage of near kin rest on a
basis not so much biological as social.”96.1 Whatever may be the
ultimate verdict of science on this disputed question, it will not affect
the result of the present enquiry, which merely affirms the deep and
far-reaching influence which in the long course of human history
superstition has exercised on morality. Whether the influence has on
the whole been for good or evil does not concern us. It suffices for
our purpose to shew that superstition has been a crutch to morality,
whether to support it in the fair way of virtue or to precipitate it into
the miry pit of vice. To return to the point from which we wandered
into this digression, we must leave in suspense the question whether
the Australian savages were wise or foolish who forbade a man
under pain of death to speak to his mother-in-law.
I will conclude this part of my subject with a few
Other examples of more instances of the extreme severity with which
the severe
punishment of certain races have visited what they deemed
sexual crime. improper connexions between the sexes.
Among the Indians who inhabited the coast of
The Indians of Brazil near Rio de Janeiro about the middle of the
Brazil. sixteenth century, a married woman who gave
birth to an illegitimate child was either killed or
abandoned to the caprice of the young men who could not afford to
keep a wife. Her child was buried alive; for they said that were he to
grow up he would only serve to perpetuate his mother’s disgrace; he
would not be allowed to go to war with the rest for fear of the
misfortunes and disasters he might draw down upon them, and no
one would eat any food, whether venison, fish, or what not, which
the miserable outcast had touched.96.2 In Ruanda,
The natives of
Ruanda.
a district of Central Africa, down to recent years
any unmarried woman who was got with child
used to be put to death with her baby, whether born or unborn. A
spot at the mouth of the Akanyaru river was the place of execution,
where the guilty women and their innocent offspring were hurled into
the water. As usual, this Puritanical strictness of morality has been
relaxed under European influence; illegitimate children are still killed,
but their mothers escape with the fine of a cow.97.1
The Saxons.
Among the Saxons down to the days of St.
Boniface the adulteress or the maiden who had dishonoured her
father’s house was compelled to hang herself, was burned, and her
paramour hung over the blazing pile; or she was scourged or cut to
pieces with knives by all the women of the village till she was
dead.97.2 Among the Slav peoples of the Balkan
The Southern
Slavs.
peninsula women convicted of immoral conduct
used to be stoned to death. About the year 1770 a
young betrothed couple were thus executed near Cattaro in
Dalmatia, because the girl was found to be with child. The youth
offered to marry her, and the priest begged that the sentence of
death might be commuted to perpetual banishment; but the people
declared that they would not have a bastard born among them; and
the two fathers of the luckless couple threw the first stones at them.
When Miss M. Edith Durham related this case to some Montenegrin
peasantry, they all said that in the old days stoning was the proper
punishment for unchaste women; the male paramours were shot by
the relations of the girls whom they had seduced. When “that
modern Messalina,” Queen Draga of Servia, was murdered, a
decent peasant woman remarked that “she ought to be under the
cursed stone heap” (pod prokletu gomilu). The country-folk of
Montenegro, who heard the news of the murder from Miss Durham,
“looked on it as a cleansing—a casting out of abominations—and
genuinely believed that Europe would commend the deed, and that
the removal of this sinful woman would bring prosperity to the
land.”97.3 Even down to the second half of the nineteenth century in
cases of seduction among the Southern Slavs the people proposed
to stone both the culprits to death.98.1 This happened, for example, in
Herzegovina in the year 1859, when a young man named Milutin
seduced or (to be more exact) was seduced by three unmarried girls
and got them all with child. The people sat in judgment upon the
sinners, and, though an elder proposed to stone them all, the court
passed a milder sentence. The young man was to marry one of the
girls, to rear the infants of the other two as his legitimate children,
and next time there was a fight with the Turks he was to prove his
manhood by rushing unarmed upon the enemy and wresting their
weapons from them, alive or dead. The sentence was fulfilled to the
letter, though many years passed before the culprit could carry out
the last part of it. However, his time came in 1875, when
Herzegovina revolted against the Turks. Then Milutin ran unarmed
upon a regiment of the enemy and found among the Turkish
bayonets a hero’s death.98.2 Even now the Old Catholics among the
South Slavs believe that a village in which a seducer is not
compelled to marry his victim will be punished with hail and
excessive rain. For this article of faith, however, they are ridiculed by
their enlightened Catholic neighbours, who hold the far more
probable view that thunder and lightning are caused by the village
priest to revenge himself for unreasonable delays in the payment of
his salary. A heavy hail-storm has been known to prove almost fatal
to the local incumbent, who was beaten within an inch of his life by
his enraged parishioners.98.3
It is difficult to believe that in these and similar
Inference from the cases the community would inflict such severe
severe punishments
inflicted for sexual punishment for sexual offences if it did not believe
offences. that its own safety, and not merely the interest of a
few individuals, was imperilled thereby.
If now we ask why illicit relations between the
Why should illicit sexes should be supposed to disturb the balance
relations between
the sexes be of nature and particularly to blast the fruits of the
thought to disturb earth, a partial answer may be conjecturally
the balance of suggested. It is not enough to say that such
nature?
relations are displeasing to the gods, who punish
indiscriminately the whole community for the sins of a few. For we
must always bear in mind that the gods are creations of man’s fancy;
he fashions them in human likeness, and endows them with tastes
and opinions which are merely vast cloudy projections of his own. To
affirm, therefore, that something is a sin because the gods will it so,
is only to push the enquiry one stage farther back and to raise the
further question, Why are the gods supposed to dislike and punish
these particular acts? In the case with which we
The reason why the are here concerned, the reason why so many
gods of savages
are supposed to savage gods prohibit adultery, fornication, and
punish sexual incest under pain of their severe displeasure may
crimes so severely
may perhaps be perhaps be found in the analogy which many
found in a mistaken savage men trace between the reproduction of the
belief that human species and the reproduction of animals
irregularities of the
human sexes and plants. The analogy is not purely fanciful, on
prevent the the contrary it is real and vital; but primitive
reproduction of peoples have given it a false extension in a vain
edible animals and
plants and thereby attempt to apply it practically to increasing the food
supply. They have imagined, in fact, that by
strike a fatal blow at
the food supply. performing or abstaining from certain sexual acts
they thereby directly promoted the reproduction of
animals and the multiplication of plants.99.1 All such acts and
abstinences, it is obvious, are purely superstitious and wholly fail to
effect the desired result. They are not religious but magical; that is,
they compass their end, not by an appeal to the gods, but by
manipulating natural forces in accordance with certain false ideas of
physical causation. In the present case the principle on which
savages seek to propagate animals and plants is that of magical
sympathy or imitation: they fancy that they assist the reproductive
process in nature by mimicking or performing it among themselves.
Now in the evolution of society such efforts to control the course of
nature directly by means of magical rites appear to have preceded
the efforts to control it indirectly by appealing to the vanity and
cupidity, the good-nature and pity of the gods; in short, magic seems
to be older than religion.100.1 In most races, it is true, the epoch of
unadulterated magic, of magic untinged by religion, belongs to such
a remote past that its existence, like that of our ape-like ancestors,
can be a matter of inference only; almost everywhere in history and
the world we find magic and religion side by side, at one time allies,
at another enemies, now playing into each other’s hands, now
cursing, objurgating, and vainly attempting to exterminate one
another. On the whole the lower intelligences cling closely, though
secretly, to magic, while the higher intelligences have discerned the
vanity of its pretensions and turned to religion instead. The result has
been that beliefs and rites which were purely magical in origin often
contract in course of time a religious character; they are modified in
accordance with the advance of thought, they are translated into
terms of gods and spirits, whether good and beneficent, or evil and
malignant. We may surmise, though we cannot prove, that a change
of this sort has come over the minds of many races with regard to
sexual morality. At some former time, perhaps, straining a real
analogy too far, they believed that those relations of the human
sexes which for any reason they regarded as right and natural had a
tendency to promote sympathetically the propagation of animals and
plants and thereby to ensure a supply of food for the community;
while on the contrary they may have imagined that those relations of
the human sexes which for any reason they deemed wrong and
unnatural had a tendency to thwart and impede the propagation of
animals and plants and thereby to diminish the common supply of
food.
Such a belief, it is obvious, would furnish a
Such a belief would sufficient motive for the strict prohibition of what
account both for the
horror with which were deemed improper relations between men
many savages and women; and it would explain the deep horror
regard such crimes, and detestation with which sexual irregularities are
and for the severity
with which they viewed by many, though certainly not by all,
punish them. savage tribes. For if improper relations between
the human sexes prevent animals and plants from
multiplying, they strike a fatal blow at the existence of the tribe by
cutting off its supply of food at the roots. No wonder, therefore, that
wherever such superstitions have prevailed the whole community,
believing its very existence to be put in jeopardy by sexual
immorality, should turn savagely on the culprits, and beat, burn,
drown or otherwise exterminate them in order to rid itself of so
dangerous a pollution. And when with the advance of knowledge
men began to perceive the mistake they had made in imagining that
the commerce of the human sexes could affect the propagation of
animals and plants, they would still through long habit be so inured
to the idea of the wickedness of certain sexual relations that they
could not dismiss it from their minds, even when they discerned the
fallacious nature of the reasoning by which they had arrived at it. The
old practice would therefore stand, though the old theory had fallen:
the old rules of sexual morality would continue to be observed, but if
they were to retain the respect of the community, it was necessary to

You might also like