Data Science Foundations Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics 1st Edition Fionn Murtagh
DATA SCIENCE
FOUNDATIONS
Geometry and Topology
of Complex Hierarchic Systems
and Big Data Analytics
Chapman & Hall/CRC
Computer Science and Data Analysis Series
The interface between the computer and statistical sciences is increasing, as each
discipline seeks to harness the power and resources of the other. This series aims to
foster the integration between the computer sciences and statistical, numerical, and
probabilistic methods by publishing a broad range of reference works, textbooks, and
handbooks.
SERIES EDITORS
David Blei, Princeton University
David Madigan, Rutgers University
Marina Meila, University of Washington
Fionn Murtagh, Royal Holloway, University of London
Proposals for the series should be sent directly to one of the series editors above, or submitted to:
Published Titles
Computational Statistics Handbook with MATLAB®, Third Edition
Wendy L. Martinez and Angel R. Martinez
R Graphics
Paul Murrell
Fionn Murtagh
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Bibliography
Index
Preface
This is my motto: Analysis is nothing, data are everything. Today, on the web, we
can have baskets full of data . . . baskets or bins?
Jean-Paul Benzécri, 2011
This book describes solid and supportive foundations for the data science of our times,
with many illustrative cases. Core to these foundations are mathematics and computational
science. Our thinking and decision-making in regard to data can follow the insightful ob-
servation by the physicist Paul Dirac that physical theory and physical meaning have to
follow behind the mathematics (see Section 4.7). The hierarchical nature of complex reality
is part and parcel of this mathematically well-founded way of observing and interacting
with physical, social and all realities.
Quite wide-ranging case studies are used in this book. The text, however, is written in an
accessible and easily grasped way, for a reader who is knowledgeable and engaged, without
necessarily being an expert in all matters. Ultimately this book seeks to inspire, motivate and
orientate our human thinking and acting regarding data, associated information and derived
knowledge. This book seeks to give the reader a good start towards practical and meaningful
perspectives. Also, by seeking to chart out future perspectives, this book responds to current
needs in a way that differs from other books relevant to this field, however strong those
books may be in their own specialisms.
The field of data science has come into its own, in a highly profiled way, in recent times.
Ever increasing numbers of employees are required nowadays, as data scientists, in sectors
that range from retail to regulatory, and so much besides. Many universities, having started
graduate-level courses in data science, are now also starting undergraduate courses. Data
science encompasses traditional disciplines of computational science and statistics, data
analysis, machine learning and pattern recognition. But new problem domains are arising.
Back in the 1970s and into the 1980s, one had to pay a lot of attention to available memory
storage when working with computers. The focus of attention was therefore on stored data,
directly linked to the computational processing power. By the beginning of the 1990s,
communication and networking had become the focus of attention. Against the background
of regulatory and proprietary standards, and open source communication protocols (ISO
standards, DECnet, TCP/IP protocols, and so on), data access and display protocols became
central (File Transfer Protocol, gopher, Veronica, Wide Area Information Server, and
Hypertext Transfer Protocol). So the focus back in those times was, firstly, on memory
and computer power and, secondly, on communications and networking. Now we have, thirdly,
data as the prime focus. Such waves of technology development are exciting. They motivate
the tackling of new problems, and they may well require new ways of addressing problems.
Such a requirement for new perspectives and new approaches is always due to contemporary
inadequacies, limitations and underperformance. Now we move on to our interaction with
data.
This book targets rigour, mathematics and computational thinking. Through the available
data sets and R code, the reader can reproduce the results and outcomes. Indeed,
understanding is also facilitated through “learning by doing”. The case studies and
the available data and software codes are intended to help impart the data science phi-
losophy in the book. In that sense, dialoguing with data, and “letting the data speak”
(Jean-Paul Benzécri), are the perspective and the objective. To the foregoing quotations,
the following will be added: “visualization and verbalization of data” (cf. [34]).
Our approach is influenced by how the leading social scientist, Pierre Bourdieu, used the
most effective inductive analytics developed by Jean-Paul Benzécri. This family of geomet-
ric data analysis methodologies, centrally based on correspondence analysis encompassing
hierarchical clustering, and statistical modelling, not only organizes the analysis method-
ology and domain of application but, most of all, integrates them. An inspirational set of
principles for data analytics, listed in [24] (page 6), included the following: “The model
should follow the data, and not the reverse. . . . What we need is a rigorous method that
extracts structures from data.” Closely coupled to this is the view that “data synthesis” could
be considered as important as “data analysis”, if not more so [27]. Analysis and
synthesis of data and information obviously go hand in hand.
A very minor note is the following. Analytics refers to general and generic data process-
ing, obtaining information from data, while analysis refers to specific data processing.
We have then the following. “If I make extensive use of correspondence analysis, in
preference to multivariate regression, for instance, it is because correspondence analysis is a
relational technique of data analysis whose philosophy corresponds exactly to what, in my
view, the reality of the social world is. It is a technique which ‘thinks’ in terms of relation,
as I try to do precisely in terms of field” (Bourdieu, cited in [133, p. 43]).
“In Data Analysis, numerous disciplines need to collaborate. The role of mathematics,
although essential, is modest, in the sense that one uses almost exclusively classical the-
orems or elementary demonstration techniques. But it is necessary that certain abstract
conceptions enter into the spirits of the users, the specialists who collect the data and who
should orientate the analysis according to fundamental problems that are appropriate to
their science” [27].
No method is fruitful unless the data are relevant: “analysing data is not the collecting
of disparate data and seeing what comes out of the computer” [27]. Statistics as “technical
control” of process certifies that work has been carried out in conformance with rules; there,
primacy is accorded to being statistically correct, even to asking whether a given procedure
has the right to be used. In contradistinction to that, there is relevance, which asks whether
there is interest in using a given procedure.
Another inspirational quotation is that “the construction of clouds leads to the mastery
of multidimensionality, by providing ‘a tool to make patterns emerge from data’” (this is
from Benzécri’s presentation at the 1968 Honolulu conference, published in the 1969
proceedings as “Statistical analysis as a tool to make patterns emerge from data”). John Tukey (developer of
exploratory data analysis, i.e. visualization in statistics and data analysis, the fast Fourier
transform, and many other methods) expressed this as follows: “Let the data speak for
themselves!” This can be kept in mind relative to direct, immediate, unmediated statistical
hypothesis testing that relies on a wide range of assumptions (e.g. normality, homoscedas-
ticity, etc.) that are often unrealistic and unverifiable.
The foregoing and the following are in [130]. “Data analysis, or more particularly ge-
ometric data analysis is the multivariate statistical approach, developed by J.-P. Benzécri
around correspondence analysis, in which data are represented in the form of clouds of
points and the interpretation is first and foremost on the clouds of points.”
While these are our influences, it is also worth noting how new problem areas of Big Data
concern us, as do issues of Big Data ethics. One possible ethical issue, arising entirely from
technical aspects, lies in the massification and reduction through scale effects that Big
Data brings about. From [130]: “Rehabilitation of individuals. The context
model is always formulated at the individual level, being opposed therefore to modelling at
an aggregate level for which the individuals are only an ‘error term’ of the model.”
Now let us look at the importance of homology and field, concepts that are inherent
to Bourdieu’s work. The comprehensive survey of [108] sets out new contemporary issues
of sampling and population distribution estimation. An important take-home message is
this: “There is the potential for big data to evaluate or calibrate survey findings . . . to help
to validate cohort studies”. Examples are discussed of “how data . . . tracks well with the
official”, and contextual, repository or holdings. It is well pointed out how one case study
discussed “shows the value of using ‘big data’ to conduct research on surveys (as distinct
from survey research)”. Therefore, “The new paradigm means it is now possible to digitally
capture, semantically reconcile, aggregate, and correlate data.”
Limitations, though, are clear [108]: “Although randomization in some form is very
beneficial, it is by no means a panacea. Trial participants are commonly very different
from the external . . . pool, in part because of self-selection”. This is because “One type of
selection bias is self-selection (which is our focus)”.
Important points towards addressing these contemporary issues include the following
[108]: “When informing policy, inference to identified reference populations is key”. This is
part of the bridge which is needed between data analytics technology and deployment of
outcomes. “In all situations, modelling is needed to accommodate non-response, dropouts
and other forms of missing data.”
While “Representativity should be avoided”, [108] gives an essential way to address, at a
fundamental level, what we need: “Assessment of external validity, i.e. generalization to the
population from which the study subjects originated or to other populations, will in principle
proceed via formulation of abstract laws of nature similar to physical laws”.
The bridge between the data that is analysed, and the calibrating Big Data, is well
addressed by the geometry and topology of data. Those form the link between sampled data
and the greater cosmos. Pierre Bourdieu’s concept of field is a prime exemplar. Consider, as
noted in [132], how Bourdieu’s work involves “putting his thinking in mathematical terms”,
and that it “led him to a conscious and systematic move toward a geometric frame-model”.
This is a multidimensional “structural vision”. Bourdieu’s analytics “amounted to the global
[hence Big Data] effects of a complex structure of interrelationships, which is not reducible
to the combination of the multiple [effects] of independent variables”. The concept of field,
here, uses geometric data analysis that is core to the integrated data and methodology
approach used in the correspondence analysis platform [177].
In addressing the “rehabilitation of individuals”, which can be considered as address-
ing representativity both quantitatively as well as qualitatively, there is the potential and
relevance for the many ethical issues related to Big Data, detailed in [199]. We may say
that in the detailed case study descriptions in that book, what is unethical is the arbitrary
representation of an individual by a class or group.
The term analytics platform for the science of data, which is quite central to this book,
can be associated with an interesting article by New York Times author Steve Lohr [146]
on the “platform thinking” of the founders of Microsoft, Intel and Apple. In this book
the analytics platform is paramount, over and above just analytical or software tools. In his
article [146] Lohr says: “In digital-age competition, the long goal is to establish an industry-
spanning platform rather than merely products. It is platforms that yield the lucrative
flywheel of network effects, complementary products and services and increasing returns.” In
this book we describe a data analytics platform, which has the potential to go well beyond
mere tools. Admittedly, software tools incorporating the needed algorithms can come to one’s
aid in the nick of time. That is good. But for a deep understanding of
all aspects of potential (i.e. having potential for further usage and benefit) and practice,
“platform” is the term used here for the following: potential importance and relevance, and
a really good conceptual understanding or role. The excellent data analyst does not just
come along with a software bag of tricks. The outstanding data analyst will always strive
for full integration of theory and practice, of methodology and its implementation.
An approach to drawing benefit from Big Data is precisely as described in [108]. The
observation of the need for the “formulation of abstract laws” that bridge sampled data
and calibrating Big Data can be addressed, for the data analyst and for the application
specialist, as geometric and topological.
In summary, then, this book’s key points include the following.
One major motivation for some of this book’s content, related to the fifth item here, is
to see, and draw benefit from, the remarkable simplicity of very high dimensions, and even
infinite dimensionality. With reference to the last item here, there is a very nice statement
by Immanuel Kant, in Chapter 34 of Critique of Practical Reason (1788): “Two things fill
the mind with ever newer and increasing wonder and awe, the more often and lasting that
reflection is concerned with them: the starry sky over me, and the moral law within me.”
• Chapter 1 relates to the mapping of the semantics, i.e. the inherent meaning and sig-
nificance of information, underpinning and underlying what is expressed textually and
quantitatively. Examples include script story-line analysis, using film script, national
research funding, and performance management.
• Chapter 2 relates to a case study of change over time in Twitter. Quantification, includ-
ing even statistical analysis, of style is motivated by domain-originating stylistic and
artistic expertise and insight. Also covered is narrative synthesis and generation.
• Those two chapters comprise Part I, relating to film and movie, literature and docu-
mentation, some social media such as Twitter, and the recording, in both quantitative
and qualitative ways, of some teamwork activities.
• The accompanying website has as its aim to encourage and to facilitate learning and
understanding by doing, i.e. by actively undertaking experimentation and familiarization
with all that is described in this book.
• Next comes Part II, relating to underpinning methodology and vantage points.
Paramount are geometry for the mapping of semantics, and, based on this, tree or
hierarchical topology, for lots of objectives.
• Chapter 3 relates to how hierarchy can express symmetry. Also at issue is how such
symmetries in data and information can be so revealing and informative.
• Chapter 4 is a review chapter, relating to fundamental aspects that are intriguing, and
maybe with great potential, in particular for cosmology. This chapter relates to the
theme that analytics through real-valued mathematics can be very beneficially com-
plemented by p-adic and, relatedly, m-adic number theory. There is some discussion of
relevance and importance in physics and cosmology.
• Part III relates to outcomes from somewhat more computational perspectives.
• Chapter 5 explains the operation of, and the great benefits to be derived from, linear-time
hierarchical clustering. Many associations with other techniques are also covered.
• The focus in Chapter 6 is on new application domains such as very high-dimensional
data. The chapter describes what we term informally the remarkable simplicity of very
high-dimensional data, and, quite often, very big data sets and massive data volumes.
• Part IV seeks to describe new perspectives arising out of all of the analytics here, with
relevance for various application domains.
• Chapter 7 relates to novel definitions and usage of the concept of information.
• Then Chapter 8 relates to ultrametric topology expressing or symbolically representing
human unconscious reasoning. Inspiration for this most important and insightful work
comes from the eminent psychoanalyst Ignacio Matte Blanco’s pursuit of bi-logic, the
human’s two modes of being, conscious and unconscious.
• Chapter 9 takes such analytics further, with application to very varied expressions of
narrative, embracing literature, event and experience reporting.
• Chapter 10 briefly discusses the broad and general application of the methods at issue here.
Part I
• Great masses of data, textual and otherwise, need to be exploited and decisions need
to be made. Correspondence analysis handles multivariate numerical and symbolic data
with ease.
Various aspects of how we respond to these challenges will be discussed in this chapter,
complemented by the annex to the chapter. We will look at how this works, using the
Casablanca film script. Then we return to the data mining approach used, to propose that
various issues in policy analysis can be addressed by such techniques also.
and economic development policy. We will discuss initial work on the application to policy
decision-making in Section 1.3 below.
[Figure 1.1: principal factor plane, vertical axis Factor 2 (15% of inertia), with the 12 attribute labels Strasser, Ilsa, Renault, Laszlo, Rick, NotRicks, Other, RicksCafe, Int, Ext, Day, Night.]
FIGURE 1.1: Correspondence analysis of the Casablanca data derived from the script.
The input data are presences/absences for 77 scenes crossed by 12 attributes. Just the 12
attributes are displayed. For a short review of the analysis methodology, see the annex to
this chapter.
interrelationships between characters, other attributes, and scenes, for instance closeness of
Rick’s Café with Night and Int (obviously enough).
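The book's own material uses R; purely for illustration, here is a minimal Python sketch of the computation behind a plot like Figure 1.1, assuming the standard formulation of correspondence analysis as a singular value decomposition of the standardized residuals of the table. The function name `correspondence_analysis` and the random 0/1 table are hypothetical stand-ins: the 77 × 12 Casablanca scenes-by-attributes matrix is not reproduced here, and this is not the book's code.

```python
import numpy as np

def correspondence_analysis(table):
    """Correspondence analysis of a non-negative table (e.g. scenes x attributes,
    presence/absence). Every row and column must have a non-zero total."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)
    # Standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                     # principal inertias, largest first
    rows = (U * sv) / np.sqrt(r)[:, None]      # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # column principal coordinates
    return rows, cols, inertia

# Hypothetical 0/1 table standing in for the scenes-by-attributes data
rng = np.random.default_rng(0)
X = (rng.random((20, 6)) < 0.4).astype(int)
X[0, :] = 1   # guarantee non-zero column totals
X[:, 0] = 1   # guarantee non-zero row totals
rows, cols, inertia = correspondence_analysis(X)
pct = 100 * inertia / inertia.sum()
print(np.round(pct[:2], 1))   # percentage of inertia on factors 1 and 2
```

Plotting `cols[:, 0]` against `cols[:, 1]` gives the attribute map; the same factors carry the scenes in `rows`, which is what lets scenes and attributes be read in one display.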
[Figure 1.2(a): points x, y and z plotted against Horizontal and Vertical axes.]
FIGURE 1.2: (a) Depiction of the triangle inequality. Consider a journey from location x
to location z, but via y. (b) A poetic portrayal of Huyghens.
Let us take an informal case study to see how this works. Consider the situation of
seeking documents based on titles. If the target population has at least one document that
is close to the query, then this is (let us assume) clear-cut. However, if all documents in the
target population are very unlike the query, does it make any sense to choose the closest?
Whatever the answer, here we are focusing on the inherent ambiguity, which we will note
or record in an appropriate way. Figure 1.3(a) illustrates this situation, where the query is
the point to the right. By using approximate similarity the situation can be modelled as an
isosceles triangle with small base.
As illustrated in Figure 1.3(a), we are close to having an isosceles triangle with small
base, with the red dot as apex, and with a pair of the black dots as the base. In practice,
in hierarchical clustering, we fit a hierarchy to our data. An ultrametric space has proper-
ties that are very unlike a metric space, and one such property is that the only triangles
allowed are either equilateral, or isosceles with small base. So Figure 1.3(a) can be taken as
representing a case of ultrametricity. What this means is that the query can be viewed as
having a particular sort of dominance or hierarchical relationship vis-à-vis any pair of target
documents. Hence any triplet of points here, one of which is the query (defining the apex
of the isosceles, with small base, triangle), defines local hierarchical or ultrametric struc-
ture. Further general discussion can be found in [169], including how established nearest
neighbour or best match search algorithms often employ such principles.
It is clear from Figure 1.3(a) that we should use approximate equality of the long sides
of the triangle. The further away the query is from the other data, the better is this ap-
proximation [169].
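A small numerical check of this last point, as a sketch assuming Euclidean distances: as the query moves away from the targets, the ratio of its largest to its smallest target distance approaches 1, so the two long sides of the triangle become approximately equal. The target coordinates and the helper `far_near_ratio` are hypothetical; the figure's actual values are not given in the text.

```python
import math

# Hypothetical (Property 1, Property 2) coordinates for the three targets on the left
targets = [(10.0, 5.0), (12.0, 7.0), (11.0, 4.0)]

def far_near_ratio(query, targets):
    """Largest over smallest query-to-target distance.
    Close to 1 means the nearest target is barely nearer than the others."""
    d = sorted(math.dist(query, t) for t in targets)
    return d[-1] / d[0]

for x in (20.0, 100.0, 1000.0):
    print(x, round(far_near_ratio((x, 6.0), targets), 4))
```

With these made-up coordinates the ratio falls from roughly 1.25 to roughly 1.002 as the query recedes, which is the sense in which the far-away query sits at the apex of an isosceles triangle with small base.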
What sort of explanation does this provide for our example here? It means that the
query is a novel, or anomalous, or unusual “document”. It is up to us to decide how to treat
such new, innovative cases. It raises, though, the interesting perspective that here we have
a way to model and subsequently handle the semantics of anomaly or innocuousness.

[Figure 1.3(a): axes Property 1 (horizontal) and Property 2 (vertical), annotated “Isosceles triangle: approx equal long sides”. The query is on the upper right. While we can easily determine the closest target (among the three objects represented by the dots on the left), is the closest really that much different from the alternatives?]
[Figure 1.3(b): hierarchy with a Height axis. The strong triangle inequality defines an ultrametric: every triplet of points satisfies the relationship d(x, z) ≤ max{d(x, y), d(y, z)} for distance d. Reading off the hierarchy, check that this holds for all x, y, z: d(x, z) = 3.5, d(x, y) = 3.5, d(y, z) = 1.0. In addition, the symmetry and positive definiteness conditions hold for any pair of points.]
FIGURE 1.3: (a) graphical depiction, and (b) hierarchy, or rooted tree, depiction.
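The difference between the ordinary and the strong triangle inequality can be checked mechanically. The following is a minimal sketch, using the three distances read off the hierarchy in Figure 1.3(b) and, for contrast, a hypothetical 3-4-5 triangle; the helper names (`make_dist`, `is_metric`, `is_ultrametric`, `is_isosceles_small_base`) are illustrative only.

```python
from itertools import permutations

def make_dist(pairs):
    """Symmetric distance function from a dict {frozenset({p, q}): value}."""
    def dist(p, q):
        return 0.0 if p == q else pairs[frozenset((p, q))]
    return dist

def is_metric(dist, points):
    # Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
    return all(dist(x, z) <= dist(x, y) + dist(y, z)
               for x, y, z in permutations(points, 3))

def is_ultrametric(dist, points):
    # Strong triangle inequality: d(x, z) <= max(d(x, y), d(y, z))
    return all(dist(x, z) <= max(dist(x, y), dist(y, z))
               for x, y, z in permutations(points, 3))

def is_isosceles_small_base(dist, a, b, c):
    # The two largest sides are equal (equilateral is the special case)
    s = sorted([dist(a, b), dist(b, c), dist(a, c)])
    return s[1] == s[2]

# Distances from Figure 1.3(b)
tree = make_dist({frozenset("xz"): 3.5, frozenset("xy"): 3.5, frozenset("yz"): 1.0})
print(is_metric(tree, "xyz"), is_ultrametric(tree, "xyz"),
      is_isosceles_small_base(tree, "x", "y", "z"))   # prints: True True True

# A hypothetical ordinary Euclidean triangle: a metric, but not an ultrametric
plain = make_dist({frozenset("ab"): 3.0, frozenset("bc"): 4.0, frozenset("ac"): 5.0})
print(is_metric(plain, "abc"), is_ultrametric(plain, "abc"))   # prints: True False
```

The 3-4-5 triangle fails the strong inequality because 5 > max(3, 4); an ultrametric admits only triangles whose two longest sides are equal.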
The strong triangle inequality, or ultrametric inequality, holds for tree distances: see
Figure 1.3(b). The closest common ancestor distance is such an ultrametric.
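As a sketch of why the closest common ancestor distance is an ultrametric: in a hierarchy whose node heights never decrease towards the root (as in an agglomerative dendrogram without inversions), the distance between two leaves is the height at which they first join. The nested-tuple tree encoding below is an assumption made for illustration; `TREE` is chosen to be consistent with the distances quoted for Figure 1.3(b), and `BIG` is a hypothetical larger tree.

```python
from itertools import permutations

# Dendrogram as nested tuples: internal node = (height, left, right), leaf = label.
# Consistent with Figure 1.3(b): y and z join at 1.0, then x joins them at 3.5.
TREE = (3.5, "x", (1.0, "y", "z"))
BIG = (3.5, (2.0, "a", (1.0, "b", "c")), (1.5, "d", "e"))   # hypothetical

def leaves(node):
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def cca_distances(node, out=None):
    """Map each pair of leaves to the height of their closest common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    height, left, right = node
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = height   # a and b first meet at this node
    cca_distances(left, out)
    cca_distances(right, out)
    return out

def is_ultrametric(dist, labels):
    return all(dist[frozenset((x, z))] <= max(dist[frozenset((x, y))], dist[frozenset((y, z))])
               for x, y, z in permutations(labels, 3))

for tree in (TREE, BIG):
    D = cca_distances(tree)
    print(is_ultrametric(D, sorted(leaves(tree))))
```

For any three leaves, two of them join first and the third joins both of them at one and the same higher node, so the two largest of the three distances coincide.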
[Figure 1.4: dendrogram of the 77 scenes, numbered 1 to 77 in sequence along the horizontal axis; vertical scale from 0 to 30.]
FIGURE 1.4: The 77 scenes clustered. These scenes are in sequence: a sequence-constrained
agglomerative criterion is used for this. The agglomerative criterion itself is a complete link
one. See [167] for properties of this algorithm.

Correspondence analysis is in practice a tale of three metrics [171]. The analysis is based
on embedding a cloud of points from a space governed by one metric into another. The
cloud of observables is inherently related to the cloud of attributes of those observables.
Observables are defined by their attributes, and each attribute is, de facto, specified by its
associated observables. So – in the case of film script – for any one of the metrics we can
effortlessly pass between the space of film script scenes and attribute set. The three metrics
are as follows.
• Chi-squared (χ²) metric, appropriate for profiles of frequencies of occurrence.
• Euclidean metric, for visualization, and for static context.
• Ultrametric, for hierarchic relations and for dynamic context, as we operationally have
it here, also taking the chronology into account.
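The passage from the χ² metric to the Euclidean metric can be made concrete. In its usual textbook form, the χ² distance between row profiles i and i′ is d²(i, i′) = Σ_j (1/f_j)(f_ij/f_i − f_i′j/f_i′)², where f_ij are relative frequencies and f_i, f_j are row and column masses. Rescaling each profile coordinate by 1/√f_j turns this into plain Euclidean distance. The sketch below assumes that standard definition; the function names and the small table are hypothetical.

```python
import numpy as np

def chi2_row_distances(table):
    """Chi-squared distances between row profiles of a contingency table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                 # row masses f_i
    c = P.sum(axis=0)                 # column masses f_j
    profiles = P / r[:, None]         # row profiles f_ij / f_i
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=2))

def euclidean_embedding(table):
    """Rescale row profiles by 1/sqrt(f_j); ordinary Euclidean distance between
    the rescaled rows then equals the chi-squared distance between profiles."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    return (P / r[:, None]) / np.sqrt(c)

T = np.array([[2, 1, 0], [4, 2, 0], [0, 3, 5], [1, 1, 1]])   # hypothetical counts
D = chi2_row_distances(T)
E = euclidean_embedding(T)
De = np.sqrt(((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=2))
print(np.allclose(D, De))
```

Rows 0 and 1 are proportional, so their profiles coincide and their χ² distance is zero: the metric compares shapes of frequency profiles, not raw counts.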
In the analysis of semantics, we distinguish two separate aspects.
1. Context – the collection of all interrelationships.
   • The Euclidean distance makes a lot of sense when the population is homogeneous.
   • All interrelationships together provide context, relativities – and hence meaning.
2. Hierarchy tracks anomaly.
   • Ultrametric distance makes a lot of sense when the observables are heterogeneous, discontinuous.
   • The latter is especially useful for determining anomalous, atypical, innovative cases.
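The hierarchy of Figure 1.4 respects the chronology of the scenes: only clusters adjacent in the sequence may merge, and the cluster distance is complete link. The following is a deliberately naive sketch of that idea, not necessarily the algorithm used for the figure (see [167] for that); the function name and the one-dimensional test data are hypothetical.

```python
def sequence_constrained_complete_link(D):
    """Agglomerate items 0..n-1, given in sequence, merging only adjacent clusters.
    Cluster distance is complete link (largest pairwise distance).
    D is a symmetric n x n list of lists. Returns merges as (left, right, height)."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best_k, best_h = None, float("inf")
        for k in range(len(clusters) - 1):          # only neighbours in the sequence
            h = max(D[i][j] for i in clusters[k] for j in clusters[k + 1])
            if h < best_h:
                best_k, best_h = k, h
        left, right = clusters[best_k], clusters[best_k + 1]
        merges.append((tuple(left), tuple(right), best_h))
        clusters[best_k:best_k + 2] = [left + right]  # merged cluster keeps its place
    return merges

# Hypothetical scene "positions"; distance is absolute difference
xs = [0.0, 0.1, 5.0, 5.2, 5.1, 10.0]
D = [[abs(a - b) for b in xs] for a in xs]
for left, right, h in sequence_constrained_complete_link(D):
    print(left, right, round(h, 3))
```

Because each merged cluster stays in place in the list, every cluster is a run of consecutive items, so the resulting tree can be drawn with the scenes in their original order along the axis, as in Figure 1.4.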
FOOTNOTES:
[8] What you call the leaf of a fern is, properly speaking, a frond.
LESSON X.
PLANTS AND THEIR PARTNERS.
Did I not tell you that the plants had taken partners and gone into
business? I said that their business was seed-growing, but that the
result of the business was to feed and clothe the world.
In our first lessons we showed you that we get all our food, clothes,
light, and fuel, first or last, from plants. “Stop! stop!” you say. “Some
of us burn coal. Coal is a mineral.” Yes, coal is a mineral now, but it
began by being a vegetable. All the coal-beds were once forests of
trees and ferns. Ask your teacher to tell you about that.
If all these things which we need come from plants, we may be very
glad that the plants have gone into business to make more plants.
Who are these partners which we told you plants have? They are the
birds and the insects. They might have a sign up, you see, “Plant,
Insect & Co., General Providers for Men.”
Do let us get at the truth of this matter at once! Do you remember
what you read about the stamens and pistils which stand in the
middle of the flower? You know the stamens carry little boxes full of
pollen. The bottom of the pistil is a little case, or box, full of seed
germs.
You know also that the pollen must creep down through the pistils,
and touch the seed germs before they can grow to be seeds. And
you also know, that unless there are new seeds each year the world
of plants would soon come to an end.
Now you see from all this that the stamens and pistils are the chief
parts of the flower. The flower can give up its calyx, or cup, and its
gay petals, its color, honey, and perfume. If it keeps its stamens and
pistils, it will still be a true seed-bearing flower.
It is now plain that the aim of
the flower must be to get
that pollen-dust safely
landed on the top of the
pistil.
You look at a lily, and you
say, “Oh! that is very easy.
Just let those pollen boxes
fly open, and their dust is
sure to hit the pistil, all right.”
But not so fast! Let me tell
you that many plants do not
carry the stamens and pistils
all in one flower. The
stamens, with the pollen
boxes, may be in one flower,
and the pistil, with its sticky
cushion to catch pollen, may
be in another flower.
More than that, these
flowers, some with stamens, THE THREE PARTNERS.
and some with pistils, may
not even be all on one plant! Have you ever seen a poplar-tree? The
poplar has its stamen-flowers on one tree, and its pistil-flowers on
another. The palm-tree is in the same case.
Now this affair of stamen and pistil and seed making does not seem
quite so easy, does it? And here is still another fact. Seeds are the
best and strongest, and most likely to produce good plants, if the
pollen comes to the pistil, from a flower not on the same plant.
This is true even of such plants as the lily, the tulip, and the
columbine, where stamens and pistils grow in one flower.
Now you see quite plainly that in some way the pollen should be
carried about. The flowers being rooted in one place cannot carry
their pollen where it should go. Who shall do it for them?
Here is where the insect comes in. Let us look at him. Insects vary
much in size. Think of the tiny ant and gnat. Then think of the great
bumble bee, or butterfly. You see this difference in size fits them to
visit little or big flowers.
You have seen the great bumble bee busy in a lily, or a trumpet
flower. Perhaps, too, you have seen a little ant, or gnat, come
crawling out of the tiny throat of the thyme or sage blossom. And you
have seen the wasp and bee, busy on the clover blossom or the
honeysuckle.
Insects have wings to take them quickly wherever they choose to go.
Even the ant, which has cast off its wings,[9] can crawl fast on its six
nimble legs.
Then, too, many insects have a long pipe, or tongue, for eating. You
have seen such a tongue on the bee.[10] In this book you will soon
read about the butterfly, with its long tube which coils up like a watch
spring.
With this long tube the insect can poke into all the slim cups, and
horns, and folds, of the flowers of varied shapes.
Is it not easy to see that when the insect flies into a flower to feed, it
may be covered with the pollen from the stamens? Did you ever
watch a bee feeding in a wild rose? You could see his velvet coat all
covered with the golden flower dust.
Why does the insect go to the flower? He does not know that he is
needed to carry pollen about. He never thinks of seed making. He
goes into the flower to get food. He eats pollen sometimes, but
mostly honey.
In business, you know, all the partners wish to make some profit for
themselves. The insect partner of the flower has honey for his gains.
The flower lays up a drop of honey for him.
In most flowers there is a little honey. Did you ever suck the sweet
drop out of a clover, or a honeysuckle? This honey gathers in the
flower about the time that the pollen is ripe in the boxes. Just at the
time that the flower needs the visit of the insects, the honey is set
ready for them.
Into the flower goes the insect for honey. As it moves about, eating,
its legs, its body, even its wings, get dusty with pollen. When it has
eaten the honey of one flower, off it goes to another. And it carries
with it the pollen grains.
As it creeps into the next flower, the pollen rubs off the insect upon
the pistil. The pistil is usually right in the insect’s way to the honey.
The top of the pistil is sticky, and it holds the pollen grains fast. So
here and there goes the insect, taking the pollen from one flower to
another.
But stop a minute. The pollen from a rose will not make the seed
germs of a lily grow. The tulip can do nothing with pollen from a
honeysuckle. The pollen of a buttercup is not wanted by any flower
but a buttercup. So it is with all flowers. To do the germ any good,
the pollen must come from a flower of its own kind.
What is to be done in this case? How will the insect get the pollen to
the right flower? Will it not waste the clover pollen on a daisy?
Now here comes in a very strange habit of the insect. Insects fly
“from flower to flower,” but they go from flowers of one kind to other
flowers of the same kind. Watch a bee. It goes from clover to clover,
not from clover to daisy.
Notice a butterfly. It flits here and there. But you will see it settle on a
pink, and then on another pink, and on another, and so on. If it
begins with golden rod, it keeps on with golden rod.
God has fixed this habit in insects. They feed for a long time on the
same kind of flowers. They do this, even if they have to fly far to
seek them. If I have in my garden only one petunia, the butterfly
which feeds in that will fly off over the fence to some other garden to
find another petunia. He will not stop to get honey from my sweet
peas.
Some plants have drops of honey all along up the stem to coax ants
or other creeping insects up into the flower.
But other plants have a sticky juice along the stem, to keep crawling
insects away. In certain plants the bases of the leaf-stems form little
cups, for holding water. In this water, creeping insects fall and drown.
Why is this? It is because insects that would not properly carry the
pollen to another flower would waste it. So the plant has traps, or
sticky bars, to keep out the kind of insects that would waste the
pollen, or would eat up the honey without carrying off the pollen.
I have not had time to tell you of the many shapes of flowers. You
must notice that for yourselves.
Some are like cups, some like saucers, or plates, or bottles, or bags,
or vases. Some have long horns, some have slim tubes or throats.
Some are all curled close about the stamens and pistils.
These different kinds of flowers need different kinds of insects to get
their pollen. Some need bees with thick bodies. Some need
butterflies with long, slim tubes. Some need wasps with long, slender
bodies and legs. Some need little creeping ants, or tiny gnats.
Each kind of flower has what will coax the right kind of insects, and
keep away the wrong ones. What has the plant besides honey to
coax the insect for a visit? The flower has its lovely color, not for us,
but for insects. The sweet perfume is also for insects.
Flowers that need the visits of moths, or other insects that fly by
night, are white or pale yellow. These colors show best at night.
Flowers that need the visits of day-flying insects are mostly red,
blue, orange, purple, scarlet.
There are some plants, as the grass, which have no sweet perfume
and no gay petals. I have told you of flowers which are only a small
brown scale with a bunch of stamens and pistils held upon it. And
they have no perfumes. These flowers want no insect partners. Their
partner is the summer wind! The wind blows the pollen of one plant
to another. That fashion suits these plants very well.
So, by means of insect or wind partners, the golden pollen is carried
far and wide, and seeds ripen.
But what about the bird partners? Where do they come in?
If the ripe seed fell just at the foot of the parent plant, and grew
there, you can see that plants would be too much crowded. They
would spread very little. Seeds must be carried from place to place.
Some light seeds, as those of the thistle, have a plume. The maple
seeds have wings. By these the wind blows them along.
But most seeds are too heavy to be wind driven. They must be
carried. For this work the plant takes its partner, the bird.
To please the eye of the bird, and attract it to the seed, the plant has
gay-colored seeds. Also it has often gay-colored seed cases. The
rose haws, you know, are vivid red. The juniper has a bright blue
berry. The smilax has a black berry. The berries of the mistletoe are
white, of the mulberry purple.
These colors catch the eye of the bird. Down he flies to swallow the
seed, case, and all. Also many seed cases, or covers, are nice food
to eat. They are nice for us. We like them. But first of all they were
spread out for the bird’s table.
Birds like cherries, plums, and strawberries. Did you ever watch a
bird picking blackberries? The thorns do not bother him. He swallows
the berries fast: pulp and seed.
You have been told of the hard case which covers the soft or germ
part of the seed, and its seed-leaf food. This case does not melt up
in the bird’s crop or gizzard, as the soft food does. So when it falls to
the ground the germ is safe, and can sprout and grow.
Birds carry seeds in this way from land to land, as well as from field
to field. They fly over the sea and carry seeds to lonely islands,
which, but for the birds, might be barren.
So by means of its insect partners, the plant's seed germs grow into
perfect seeds. By means of the bird partners, the seeds are carried
from place to place. Thus many plants grow, and men are clothed,
and warmed, and fed.
FOOTNOTES:
[9] See Nature Reader, No. 2, Lessons on Ants.
[10] No. 1, Lesson 18.
LESSON XI.
AIR, WATER, AND SAND PLANTS.
Most of the plants which you see about you grow in earth or soil. You
have heard your father say that the grass in some fields was scanty
because the soil was poor. You have been told that wheat and corn
would not grow in some other field, because the soil was not rich
enough.
You understand that. The plant needs good soil, made up of many
kinds of mineral matter. These minerals are the plant's food. Perhaps you
have helped your mother bring rich earth from the forest, to put
about her plants.
But besides these plants growing in good earth in the usual way, there
are plants which choose quite different places in which to grow.
There are air-plants, water-plants, sand-plants. Have you seen all
these kinds of plants?
You have, no doubt, seen plants growing in very marshy, wet places,
as the rush, the iris, and the St. John’s-wort. Then, too, you have
seen plants growing right in the water, as the water-lilies, yellow and
white; the little green duck-weed; and the water crow-foot.
If you have been to the sea-shore, you have seen green, rich-looking
plants, growing in a bank of dry sand. In the West and South, you
may find fine plants growing in what seem to be drifts, or plains of
clear sand.
Air-plants are less common. Let us look at them first. There are
some plants which grow upon other plants and yet draw no food
from the plant on which they grow. Such plants put forth roots,
leaves, stems, blossoms, but all their food is drawn from the air.
I hope you may go and see some hot-house where orchids are kept.
You will see there splendid plants growing on a dead branch, or