
Searching for Productive Causes in Big Data: The Information-Transmission

Account

Billy Wheeler

Abstract

It has been argued that the use of Big Data in scientific research casts doubt on the
need for causal knowledge in making sound predictions (Mayer-Schonberger & Cukier,
2013). In this article I argue that it is possible to search for productive causes in Big Data
if one adopts the 'information-transfer account of causation' (Illari, 2011; Illari & Russo,
2014), a version of the causal process theory. As it stands, the current formulation is
inadequate as it does not specify how information is to be measured. I consider three
concepts of information: (i) information as knowledge update, (ii) information as entropy
and (iii) information as algorithmic complexity, and argue that the last of these provides
the best way to achieve this with respect to Big Data. How this can be used to search for
causal connections among Big Data is then illustrated with respect to exposomics
research.

1. Introduction

1.1. The End of Causation?

David Hume is famous for putting forward the view that causation in the world amounts
to nothing more than constant conjunction. Event-A causes event-B, so the analysis
goes, if all A-events are routinely followed by B-events. Whether Hume actually held

this position is contentious. Less controversial is Hume's opinion about the role our idea
of causation plays in our inferential practices. According to Hume:

‘Tis evident that all reasonings concerning matters of fact are founded on the

relation of cause and effect, and that we can never infer the existence of one
object from another, unless they be connected together. (1739, p. 649)

This picture concerning the role of causation has, for a long time, seemed to be

supported by the practice of natural science. Scientists start with a hypothesis about
which events cause other events. This hypothesis is then subject to experiment and once

confirmed forms the basis for a causal law which can be used to predict and manipulate
the world around us. Despite its widespread appeal, this picture has recently been called

into question by the arrival of Big Data as a source of scientific knowledge. In his famous
article (2008), Anderson claims that features of Big Data that make it different from

traditional evidence rule out the need for top-down 'theory-driven' science. Instead, he
recommends a 'data-driven' science where the correlations speak for themselves. A

similar sentiment is presented in Viktor Mayer-Schonberger & Kenneth Cukier (2013),


although they are far more explicit about the consequences this has for causation:

The ideal of identifying causal mechanisms is a self-congratulatory illusion; big


data overturns this. Yet again we are at a historical impasse where "god is dead."

(2013, p. 18)

Turning traditional slogans on their head, enthusiasts of Big Data proclaim that
'correlation supersedes causation'. With the arrival of Big Data establishing causal

connections is no longer needed to successfully predict and intervene in the world. We


can get by just as well with correlations, and searching for causation is an expensive and

time-consuming diversion.

Enthusiasts of Big Data point to numerous success stories to support their claim that it
heralds a new age of 'doing science', free from theory and free from causal inference. An

oft-cited example is ‘Google Flu Trends’ (Ginsberg, et al. 2009). Using existing data from
influenza outbreaks in the United States, Google looked for correlations between

infected areas and common search terms. Eventually they found a correlation which was
able to predict the spread of flu much faster than official routes (such as doctor records).
Another well-known case was the surprising correlation found by Wal-Mart between

hurricane reports and the sale of Pop-Tarts: every time a hurricane was announced, sales

increased. Wal-Mart were able to exploit this by placing Pop-Tarts closer to the store
front during stormy weather, thereby increasing profits.

In these two cases causation was nowhere to be found. Correlations went in and
predictions came out; there was no need to formulate a causal hypothesis at any stage.

But to hold that this applies to the use of Big Data in science, as some Big Data
enthusiasts suggest, is to overlook one crucial fact. In the worlds of finance and health

science there is an added pressure to get predictions, especially when lives and money
are at stake. In some cases it might well be worth making predictions on far from sturdy

evidence, especially if (i) the consequences of doing so and getting it wrong are small
and (ii) the consequences of doing so and getting it right are big. The chances of getting

it wrong might be worth it if the payoff is high enough.

This cost-benefit appraisal is enhanced by Big Data, especially given the speeds at which
it can be utilised. But it does not exclude the search for causal understanding. A recent study by

Canali (2015) shows that scientists using Big Data in the field of exposomics employ
practices that are indicative of the search for causal knowledge. Even more explicitly, the

Causality Workbench Project (Guyon, et al. 2011) aims to find computer programs that
can search and discover causal links (and not merely correlations) among Big Data. What

this suggests is that the arrival of Big Data calls for a rethink of the role of
causationand how it is inferredrather than relegation.

1.2. Rethinking Causation for Data-Intensive Science

In this paper I will investigate whether there is a role to play for productive intuitions

about causation in the use of Big Data. In particular, I will focus on the question: given
the nature of Big Data, is it possible to infer productive causes as opposed to only
difference-making causes?

Metaphysical theories of causation are frequently separated into two groups (Hall,

2004). Firstly, there are those who take after Hume in thinking that causation in the
world amounts to nothing more than regularity or constant conjunction. Sophisticated

versions of this intuition might appeal to probabilities (Reichenbach, 1956),


counterfactuals (Lewis, 1973) and interventions (Woodward, 2003). The crucial point is

that the occurrence of the cause makes a difference to the occurrence of the effect, but
there is no 'tie' or 'link' between them. For this reason these theories are often clustered

under the title of 'difference-making' accounts. Secondly, there are those who think that
to say C causes E means that there is a connection, usually understood as a mechanism

or process, linking C to E. These views are called 'productive' because they suggest the
cause in some way produces or 'brings about' the effect.

There is evidence that scientists are interested in finding both difference-making and
productive causes (Russo & Williamson, 2007 & 2011; Clarke et al., 2013 & 2014) and

that neither has an exclusive hold on the practice of science. The nature of Big Data
challenges both intuitions. A number of aspects of Big Data make it interesting to study

from a methodological standpoint. Here I will focus on just two: (i) the observational

nature of the data collected and (ii) its automated analysis by technology. Whilst not all
examples of Big Data exhibit these two features (some data may be circumscribed and
some will be analysed manually by human scientists), they cover a sufficiently large
number of cases to capture what's different about it.1

These two features prove problematic for traditional ways of thinking about causation
and causal inference in science. Turning to difference-making accounts, the study of Big

Data happens after the data has been produced. This means it is not possible to intervene
at any stage and see what might have happened had the circumstances been different.

1 For example, the data-intensive studies carried out by the exposomics project (Vineis et al., 2009) and the GWAS
project (Ratti, 2015) both exhibit these features.

This rules out certain interventionist approaches as well as counterfactual approaches to

causation.

Production notions of causation seem equally problematic. The most well-known

process theory is the conserved quantities view (Salmon, 1998; Dowe, 2000) which
identifies causal interaction as the exchange of a conserved quantity, e.g. mass-energy,

charge and momentum. Yet few Big Data sets are about these kinds of properties and
so this approach has limited use in inferring causes outside the physical sciences.

Identifying evidence of complex mechanisms in Big Data appears an even more


formidable task: mechanisms require more than just the cause and effect events; they

also need activities and entities to connect the two (Machamer, Darden & Craver 2000;
Glennan, 2002). Whilst human scientists may be able to draw on past evidence to

hypothesise a mechanism connecting two correlated variables, it is hard to see how this
inference could be automated, at least given the current computational resources

available.

Wolfgang Pietsch (2015) recently took up the task of providing a difference-making


account of causation compatible with data-intensive science. He gives a sophisticated

version of the regularity view based on eliminative induction. In brief: a factor or variable
C is causally relevant for E if, and only if, in a fixed context B one can find instances of

both 'C&E' and 'not-C & not-E' (2015, p. 12). But we have seen that scientists need
evidence of both difference-making and production in order to infer the existence of

causal connections. So here I will investigate whether a notion of productive cause exists
which is suitable for use in Big Data analysis. I will do this by exploring a recent

alternative version of the process view known as the ‘information-transmission account’


(Collier, 1999, 2010; Illari, 2011; Illari & Russo, 2014).

In section 2 I will outline the information-transmission account in detail and explain how
it arose out of an attempt to overcome problems with Dowe’s conserved quantities

view. The main difficulty with the information-transmission account as it currently stands

is it does not tell us how to measure information, yet without this we cannot track its
flow and therefore identify causal connections among data. To remedy this, I consider

three well-known concepts of information: knowledge updates, entropy and algorithmic


complexity. Section 3 will be largely theoretical as each of the concepts is used to

formulate a different version of the transmission account. Out of all three, I argue that
interpreting information as algorithmic complexity provides the best concept for a

theory of causation as well as being practical in the search for causal links. Finally in
section 4 I illustrate how this concept of information could be used to search for

evidence of productive causes in data-intensive science.

2. The Information-Transmission Account

2.1. The Conserved Quantities View (CQV)

The CQV can be seen as combining two earlier notions for understanding productive

causes: those of 'transfer' and 'process'. Transfer approaches to causality had been
advocated by Jerrold Aronson (1971) and David Fair (1979). According to Aronson, two

events are causally connected if an amount of a physical quantity, such as velocity,


momentum, heat etc., is transferred from the cause to the effect. The collision of two
billiard-balls provides a standard example. When one ball strikes another there is an
exchange of momentum between the two balls. This suffices to make the connection

causal, according to Aronson.2 To rule out possible accidental exchanges, both Aronson
and Fair demand that the transferred quantity retain the same identity between the

exchanges.

Philip Dowe criticises these early transfer views over the nature of the identity that is
required. Convincingly, Dowe argues it is impossible to trace the identity of the

2 Fair's view is slightly different in that he identifies the transferred property exclusively as 'energy'.

exchanged quantities as Aronson and Fair require (2000, pp. 55-59). Energy, velocity,

momentum etc. lose their numerical identity when exchanged. If two ball bearings are
swung on a Newton's cradle, exchanging their momentum along the stationary ones, it is

not possible to say from which striking ball the gain in momentum came.
Although they might not retain their numerical identity, they do however keep their

numerical equality: the total amount of momentum is constant throughout the process.
Dowe therefore identifies the exchanged properties as physical conserved quantities,

which are just those quantities described by a conservation law. The relation
between cause and effect is then one of 'exchange' of a conserved quantity; principally

either mass-energy, momentum or charge.

Another criticism Dowe has of Aronson and Fair's transfer views is that they fail to

capture what he calls 'immanent causation' (2000, pp. 52-55). A wrench spinning in
space can have its movement explained by its own inertia, yet it is clear there is no

transfer of momentum from the wrench to anything else. To handle these cases Dowe
makes use of the idea of a 'causal process'. The wrench is a causal process which

transmits momentum from one point in space-time to another. The idea of a causal

process previously played an important role in Wesley Salmon's (1984) mark-


transmission theory. According to Salmon causal processes can be separated from
'pseudo processes' by the ability of the former, but not the latter, to transmit a mark.
Since this approach makes use of processes having 'abilities', Salmon ultimately rejected

it as depending on counterfactuals (Salmon, 1994).

Dowe was able to resurrect the 'process intuition' by demanding of causal

processesnot that they have the ability to transmit a markbut that they actually
transmit a conserved quantity. Causal interaction, what Dowe calls 'transient causation',

can then be explained as the exchange of a conserved quantity between two (or more)
causal processes.

CQ1. A causal process is a world line of an object that possesses a conserved

quantity.

CQ2. A causal interaction is an intersection of world lines that involves exchange

of a conserved quantity.

The CQV falls down as a suitable notion for data-intensive science because of its limited applicability
(Illari, 2011; Illari & Russo, 2014). By its very nature, the CQV only predicts causation

where there is the exchange of a fundamental physical property. But most data-
intensive science takes place in the biological and social sciences where the data is

unlikely to concern such properties. To remedy this, Boniolo et al. (2014) propose
replacing Dowe's conserved quantities with what they call 'extensive quantities'. An

extensive quantity is a quantity of an object such that the quantity for the total volume
of the object is equal to the sum total of its parts. This makes mass an extensive quantity

since the mass of an object is the sum of its parts, whereas it excludes velocity, since the
velocity of an object is the same as the velocity of its individual parts. Other examples of

extensive quantities include number of moles, entropy, and angular momentum. Examples
of intensive quantities include temperature, colour, and shape.

Yet even extensive quantities may not be enough to cover all the cases we are
interested in. Whilst Boniolo et al. have identified a greater range of potentially
exchanged quantities, these are still limited to properties in the physical sciences. But as

we have seen, causal inference occurs in many fields, and it is unlikely the scientist
would have access to this kind of data in all of the cases that may use Big Data as evidence.

2.2. Information as a Conserved Quantity

The previous discussion shows that neither Dowe nor Boniolo et al. have provided a
version of the CQV that is suitably general for analysing Big Data for causal connections.
The lack of applicability was one motivating factor for Illari (2011) and Illari & Russo

(2014) to propose that the transferred quantity along and between causal processes

should be understood as information. A similar view had also been advocated by John
Collier (1999) who writes 'The basic idea is that causation is the transfer of a particular

token of a quantity of information from one state of a system to another' (1999, p. 215).

Thinking of the transferred quantity as information does appear to give us what we

need. Any data we have can be considered informative, whether that data is about the
charge of an electric plate or the age of an individual. Since data is the primary carrier of

information it is a more suitable concept for automated analysis and computer scientists
have a good deal of experience of quantifying data and making comparative claims

about its 'size'.

Coming back to the CQV, we could think of the causal processes as channels which
transmit data and the intersection of two causal processes (where causal interaction

occurs) as the transfer of information between two such channels. Illari and Russo (2014)
argue that this synthesises the evidence already established that uncovering

mechanisms is important in finding productive causes. They propose that the channel
through which the information flows is the mechanism in question. Here I will not

speculate on what precisely the nature of physical channels is, as my concern will be
more with the nature of the information which flows along it. For that reason I will use

'causal process' and 'channel' interchangeably whilst noting that at some point the
advocate of information-transmission owes us an account of what channels are and how

they permit the flow of information.

Both Collier (1999 & 2010) and Illari & Russo (2014) envision that the transferred
information along and between channels must be the same information, understood in

terms of numerical identity. As we have seen, some quantities do not possess identity
over time, and information appears to work like this as well. If I give somebody the
information that 'Tomorrow is Sunday', I do not lose this information. I still retain it even

though I have shared it. Yet if transferred information retained numerical identity, surely I would lose

possession? This suggests that if we are to use 'information' as a transferred quantity


then we should adopt Dowe's position of numerical equality rather than identity.

This raises the question of whether or not information is a conserved quantity. In physics
it has been a standard assumption that in a closed system information cannot be

created or destroyed. The one exception to this is black holes (see John Preskill (1992)
for an overview of the problem), yet even the latest evidence here suggests black holes

may not have the ability to destroy information after all (Hawking, 2015). Information,
therefore, appears to be a safe proposal for a quantity that retains numerical equality

when transmitted and transferred.

We can adopt as a working theory a version of the CQV which we might call the
Informational Conserved Quantities View or i-CQV for short:

i-CQV 1: A causal process is a world line of an object that conserves information.

i-CQV 2: A causal interaction is an intersection of causal processes whose sum


total information is conserved.

These definitions are meant to preserve as much as possible of Dowe's original insight

but replace the multitude of physical conserved quantities with the single conserved
quantity of 'information'.

Thinking of causation as the transfer of information seems like a promising idea, but as

it stands, it is more of an analogy than a fully-fledged view. As Illari and Russo


themselves acknowledge:

The challenge is precisely to find a concept that covers the many diverse kinds of

causal linking in the world, that nevertheless says something substantive about
causality (2014, p. 148).

The generality of the concept of information is its strength and its weakness. As has

been remarked many times, the concept of information is multifaceted and difficult to
pin down.3 Whilst the many different applications of the term 'information' might be

useful for understanding causation in each respective domain, it will be important to


find a singular concept, if we are to have a philosophically significant and general

account of causation that covers a variety of cases. What's more, we need a concept of
information that fulfils much the same role as the conserved quantities in the CQV.

Three aspects of conserved quantities are essential for their role:

• First, whether or not a quantity is transferred between two objects is an objective
matter, not relative to an individual observer.

• Secondly, the amount of quantity transferred must be measurable. Whilst
'quantifiable' will do for causation in itself, the quantities need to be actually
measurable if it is to provide any grounding for causal inference.

• Thirdly, each of the quantities (q) must be additive such that q(A) + q(B) = q(A+B).
This is needed to ensure the amount of a quantity is conserved between causal
interactions.
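The additivity requirement can be illustrated with a toy calculation. A minimal sketch, where `mass` stands in for any extensive quantity (the function name and values are my own, purely for illustration):

```python
def mass(parts):
    """Mass is additive: the mass of a whole is the sum of the masses of its parts."""
    return sum(parts)

# q(A) + q(B) == q(A+B): combining two objects conserves the total quantity.
A, B = [2.0, 3.0], [5.0, 1.5]
assert mass(A) + mass(B) == mass(A + B)  # 5.0 + 6.5 == 11.5
```

An intensive quantity such as temperature would fail this test, which is why it cannot play the conserved-quantity role.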

Whilst this list is a good starting point it still leaves a range of possible concepts of
information to choose from. I will now turn to assess three of these concepts: (i)

information as knowledge update, (ii) information as entropy, and (iii) information as


algorithmic complexity. I have chosen these because each is relatively well known,

influential in its own domain of application, and general enough to cover all fields in
which causal inference might take place.

3. Three Concepts of Information

3 See Floridi (2010) for an extensive survey of the different kinds of information that have been recognised.

3.1. Information as 'Knowledge Update'

The first concept I will consider comes from the field of epistemic logic. This branch of
logic concerns itself with modeling the knowledge states of agents and how they

change when receiving new information. It has been influential not just in philosophy,
but has had applications in the fields of computer science, robotics and network

security. The relevant concept here is that of the 'epistemic state' of an agent and how
that state changes. The basic idea is that when an agent receives some new piece of

information, their epistemic state changes. The epistemic state of an agent is modeled
using Kripke semantics: each possible world available to the agent is a possible way the

world could be.

To illustrate suppose an agent does not know which day of the week it is. This means
there are seven possible worlds accessible to her. She is told ‘it is a weekend day’. This

reduces the number of possibilities to two. She is then told 'it is not Saturday'. She
updates her state and there is only one possibility remaining: it must be Sunday. Every

time the agent receives a new piece of information, their state changes and we can
measure 'how informative' that information is by their change of state. This means that

this particular concept of information is semantic and qualitative since it depends upon
prior assumptions about what possibilities are available to the agent.
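The elimination of possible worlds can be sketched computationally. In this minimal sketch, `update` and `informativeness` are illustrative names of my own choosing, not part of any standard epistemic-logic library:

```python
# A toy possible-worlds model of epistemic update.
DAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}

def update(worlds, message):
    """Restrict the agent's accessible worlds to those satisfying the message."""
    return {w for w in worlds if message(w)}

def informativeness(before, after):
    """Measure information received as the number of possibilities eliminated."""
    return len(before) - len(after)

state = set(DAYS)                                      # the agent knows nothing: 7 worlds
state1 = update(state, lambda d: d in {"Sat", "Sun"})  # 'it is a weekend day'
state2 = update(state1, lambda d: d != "Sat")          # 'it is not Saturday'

print(informativeness(state, state1))  # 5 worlds eliminated
print(state2)                          # only Sunday remains
```

On this picture a repeated message eliminates nothing further, which is exactly the conservation problem discussed below.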

It might be argued that the semantic and qualitative nature of this concept rules out its

usefulness in the i-CQV. But this is not obviously the case. Take our three aspects of
conserved quantities needed to model causation: objectivity, measurability and

additivity. Although how informative a piece of information is depends on the current
state of the agent, this can be modeled objectively, and once the current state is given,
the informativeness of a message is the same for everyone. It is not a matter of personal
opinion or subjective value and so cannot change from person to person. Its semantic
character does not exclude its measurability either: provided one has a model for the

agent and the possibilities, the amount of information can be calculated. Lastly, this

concept is additive: the update provided by the messages 'weekend day' and 'not-
Saturday' is equal to the single message 'weekend day and not-Saturday'.

How would the i-CQV look if information is interpreted as knowledge update? Let us
imagine that the world line of an object provides a knowledge update to an agent,

which we may take to be some kind of measurement or observation on a particular


occasion.

Causal Process: A world line is a causal process if the epistemic update received

by an agent at time t1 excludes the same possibilities as an epistemic update


received by an agent at time t2, where t1 and t2 are different points along the

world line.

Causal Interaction: There is a causal interaction between two causal processes A


and B if the total epistemic update received by an agent observing A and B at

time t1 excludes the same possibilities as the total epistemic update received by
an agent observing A and B at time t2, where t1 is a point prior to intersection

and t2 is a point after intersection.

In the case of a single causal process, what Dowe calls 'immanent causation', how much
information an agent receives by observing that object should remain the same, no

matter which time they observe it. This would be opposed to a pseudo-process which
can become more-or-less informative at different times. This seems highly plausible in

the case of knowledge updates. Let's go back to our example of the wrench spinning
alone in space. It appears that whatever the current state of the agent and no matter

what time they observe the wrench, they will receive the same knowledge update, and
therefore the same amount of information.

Unfortunately, there are a number of issues with this proposal that make it highly

problematic as an interpretation of the i-CQV in the long run.

Firstly it is clear the agent in the definitions above needs to be an 'ideal agent' in a
specific epistemic context. What an agent already knows affects the informativeness of a

given piece of information. How detrimental is the inclusion of an ideal agent to this
proposal? In terms of causal inference the use of an ideal agent is not that troubling.

This is a common device in many approaches to scientific reasoning. Scientists do not


reason 'in a vacuum' and provided we are explicit about what knowledge they have

during the reasoning process, we can lay down rules for good and bad inferential
practices. The inclusion of an ideal agent is more problematic in terms of giving a

general philosophical analysis of causation 'as it is in the world'. This is because it makes
the fact about whether 'A causes B' relative to the state of the agent, yet

intuitively we feel that whether or not causation occurs is independent of what an agent
does or does not know.

A second worry arises from the fact that an agent can only update their knowledge once

when receiving a given piece of information. Once an agent has been told it's a
weekend day, this message cannot provide any further information if they are told a
second time. Yet it is precisely this conservation of information that we need from the
second time. Yet it is precisely this conservation of information that we need from the
causal process. One way to get around this problem would be to place the conserved

quantity as an ability of the world line, so that it becomes a causal process if it


possesses the ability to update an agent's epistemic state by the same degree. This is

very similar to Salmon's original mark-transmission theory. As we have seen already


(section 2.1.) Salmon rejected this on the grounds that it involves a counterfactual in

characterising the ability: in his case 'to be marked' and in our case 'to update an
epistemic state'. This would make the resulting view a version of the difference-making

approach to causation. Since our goal is to provide a production account this solution is
not one we can appeal to.

So far I have focused on the case of a single causal process but explicating causal

interaction proves to be even more challenging with this concept of information.

Imagine a situation in which particles A and B are passing through space with different

values for momentum, energy and charge. At a point along their world lines they collide
and transfer some of these quantities. According to the CQV the sum total of quantities

remains the same through the collision, but the sum total of knowledge updates
received before and after does not. To explain this, let t1 be a time along their world

lines prior to collision and t2 a time along their world lines after collision. An agent
observing the particles at t1 knows something about the particles at that time, namely

their energy, momentum and charge at t1. Like all updates this excludes a number of
possibilities at t1 about the world and gives them a particular amount of information.

But an agent who observes the particles at t2 cannot exclude these possibilities, as they
have no access to the properties of A and B before t2. Given their current observations

there are more possibilities available for the energy, momentum and charge at t1, since
a number of different configurations are compatible with their current observation.

This demonstrates that it is possible to learn something at t1 which cannot be learnt at

t2 and so for that reason information as 'knowledge-update' is not always conserved


during interaction. What we need is a numerical measure of information so that we can

say 'the same amount of information is conserved' rather than ‘exactly the same
information’. As the concept of information coming from epistemic logic is semantic and

qualitative, it cannot play this role. However a related concept, that of 'entropy',
might give us what we need.

3.2. Information as 'Entropy'

The next concept of information has been hugely influential, especially in electrical
telecommunication, where it has provided rigorous, mathematical definitions of optimal

coding, noise and channel capacity. The basic idea is that the more likely a message is

(out of all possible messages), the less informative it is and vice versa. For any given
message or symbol produced by an information source, there is an assumed probability

distribution. The 'entropy' (H) contained within a message x is given by its probability
p(x) according to the following equation (Shannon & Weaver, 1949):

Entropy: H(x) = -log2 p(x)

How useful is this idea for thinking about productive causes and causal inference? The
first thing to say is that the origin of the concept itself nicely models causation

understood as the flow or transfer of information. The original application was for
copper wires transmitting messages via electrical wave or impulse (Pierce, 1961). If we

think of causal processes as taking place along a channel, then we can readily appreciate
the relationship. Secondly, by taking the negative log of the probability, the resulting

quantity of information is additive for independent messages: H(x) + H(y) = H(x,y). This makes the concept suitable
for use in the CQV, which requires sum totals to be conserved after interaction. Lastly, by

defining the information of a message quantitatively via its likelihood of occurring we


do not need to worry about the meaning of the message. Its semantic content or value

is irrelevant on this model. This allows us to avoid the main worry from section 3.1. that
content is not conserved during causal interaction.

At the moment I have not been precise about how we measure the entropy of a

channel, which could mean one of two things. It could mean the 'entropy rate', which is
an average of the information carried by a message per second, or it could mean the

'self-information' of a given message received by the receiver at the end of the channel.
It's not obvious how either of these relates to causal processes. The entropy rate is an

average: by definition this will remain constant. This makes it difficult to separate
genuinely causal processes from pseudo-processes on the basis of conserved entropy
rate: trivially all channels will have a conserved entropy rate. Likewise, thinking of the

entropy as the property of a channel with respect to a particular message is problematic

if a channel produces messages with different likelihoods and therefore different


entropies.

Fortunately, the nature of single causal processes suggests we can model them as
channels transmitting a solitary message through space-time. When an agent 'receives'

that message by observation, they do not wait for another. Provided the causal process
is not interacting, it will transmit just one message, with a single amount of self-information:

Causal Process: A causal process is a channel which transmits a message with


constant entropy value.

This has prima facie plausibility with respect to the wrench in space. The lost object
could only be one of the objects in the toolbox, and therefore it has a predetermined
probability distribution. The entropy it carries remains constant and would inform the

receiver equally no matter which time they intersected it. The step to causal interaction
is straightforward:

Causal Interaction: A causal interaction occurs between two channels A and B if

the sum total of entropies before interaction equals the sum total of entropies
after interaction.
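A minimal sketch of how this conservation condition could be checked numerically follows (an illustrative addition; the message probabilities for the two channels are hypothetical):

```python
import math

def entropy_total(message_probs):
    """Sum of self-information (-log2 p) over the messages carried by each channel."""
    return sum(-math.log2(p) for p in message_probs)

# Hypothetical message probabilities for channels A and B
before = [0.25, 0.5]    # totals 2 + 1 = 3 bits
after  = [0.125, 1.0]   # totals 3 + 0 = 3 bits

# The interaction counts as causal iff the total entropy is conserved:
is_causal = math.isclose(entropy_total(before), entropy_total(after))
```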

Notice that this view also says causation depends on probabilities, yet it is quite
different in nature from other difference-making probabilistic approaches such as that
of Reichenbach (1956). Here we are not defining causation in terms of 'chance-raising'
events. Instead we are saying that the chance of a message occurring remains constant
through interaction, and therefore the total information, measured as entropy, remains
constant.

Although it characterises causation differently, this version of the i-CQV inherits
concerns typically raised against probabilistic difference-making, the most important
being how we explain where the values in the probability distribution come from. How

we do this depends on which interpretation of probability we take. For obvious reasons I


am keen to avoid a protracted discussion of the pros and cons of various interpretations

of probability. I refer the reader to Gillies (2000b) for an overview of the existing
literature. I think it is worth, however, looking at three of the most relevant

interpretations in order to highlight the difficulties the entropy view faces.

(i) The relative frequency interpretation gives an objective value of probability for an

outcome based on the number of positive instances out of all possible instances. This
view faces problems, especially concerning causal inference. Firstly, there is an issue

regarding how we ascertain the values: sampling is our best option here, which is
already consistent with scientific practice, but our sampling method may fall short

through sampling bias or statistical irregularity. The case of Big Data does provide some
reprieve: if our sample contains all available data (or a high percentage of it), then these

biases and chance irregularities can be smoothed over. There is a second issue though:

as scientists are not omniscient they cannot appeal to the 'frequency in the limit' as the
number of cases approaches infinity. The probability for any outcome is then contingent
on past occurrences. As this is constantly changing, so too is the entropy. A single
causal process which transmits a solitary message cannot be expected to have constant

entropy, since presumably instances of the event-type are happening elsewhere in the
universe and thus changing its probability distribution. Interpreting probability as

relative frequency has the undesirable consequence that entropy cannot be a conserved
quantity.

(ii) The aforementioned problem can be overcome if we adopt a physical propensity


interpretation (Popper, 1959). This view is also objective but claims the probability is

given by a propensity of a mechanism or system to produce a certain outcome. For

example, flipping a fair coin has a propensity (as a real tendency or dispositional
property of the system) to produce heads 1/2 of the time. The trouble with propensity

interpretations, as has been discussed before (Gillies 2000a, p. 825), is that their value is
underdetermined by the evidence. If an event occurs once, its relative frequency is 1, but

its physical propensity may be different. Naturally this raises questions about our ability
ever to know the propensities, and therefore the entropies and causal connections. There
are also metaphysical worries: philosophers who adopt causal process theories usually
do so on empirical or Humean grounds. But to borrow an

expression from John Earman (1984), propensities fail the 'empiricist loyalty test' since
two worlds could agree on all occurrent/observable facts but differ over the chances for

physical systems.

(iii) The last option to consider equates probability with subjective degrees of belief

(Ramsey, 1926). This interpretation has the virtue of having already been extensively discussed
in Bayesian confirmation theory (Howson & Urbach, 1993). However this interpretation

is also problematic for thinking about causation as conservation of entropy. Like the

epistemic update view, this notion would depend on an agent and their background
beliefs. It is quite possible that here subjective degrees of belief are not conserved in
causal interaction at all, especially when the outcome of that interaction is surprising to
the agent. Alexander Fleming's combined degree of belief that there was penicillin
mould and bacteria in his petri dish may have been far higher than his degree of belief
that one would eradicate the other. It is hard to see how conservation could be guaranteed in such

cases.

This section has shown that the main sticking point for entropy versions of the i-CQV is

its dependence on probability. Whilst this provides a quantitative theory, it requires


some explanation of the origin of the probability distribution. Well known accounts all

seem problematic and these problems will have to be dealt with before this becomes a

viable option.

3.3. Information as Algorithmic Complexity

The final notion of information I will consider here originates from algorithmic

information theory (AIT) which was developed independently by Ray Solomonoff (1964),
Andrei Kolmogorov (1965) and Gregory Chaitin (1966). Like Shannon's entropy concept

of information, AIT also provides a quantitative measure. The basic idea is that
informativeness is connected to complexity: the more complex an object the more

information is required to describe it. The size of the information is measured formally
as the length of a program running on a universal computing device that can produce a

description of the object. An example best illustrates the idea. Compare the
following two strings:

(a) 00011010011101001011

(b) 01010101010101010101

Entropy considerations alone suggest that (a) and (b) contain the same amount of
information (assuming they are produced by an ergodic source). On closer inspection

string (b) clearly exhibits greater structure than (a), which at first glance seems random
in nature. The structure in (b) can be described by an algorithm. This makes string (b)

computationally less complex than (a). To output (a), a universal computer would need
to store and print the entire string verbatim, whereas for string (b) it need only execute
the operation 'print 01 ten times'.

AIT defines the amount of information in a message or string S as the length of the
shortest program which, when executed, outputs S and halts. This quantity is known as
algorithmic or Kolmogorov complexity (K) after one of its co-discoverers.
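Since K itself cannot be computed exactly (a point taken up in section 4), a standard compressor can only give an upper bound; still, the following Python sketch (an illustrative addition, with zlib standing in for the 'shortest program') shows how a structured string like (b) is measurably less complex than an irregular one like (a):

```python
import random
import zlib

def approx_K(s: str) -> int:
    """Upper bound on algorithmic complexity: length of the zlib-compressed string."""
    return len(zlib.compress(s.encode(), 9))

structured = "01" * 500                                        # like string (b), scaled up
random.seed(0)
irregular = "".join(random.choice("01") for _ in range(1000))  # like string (a)

# The structured string admits a much shorter description:
assert approx_K(structured) < approx_K(irregular)
```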

Algorithmic complexity looks like a suitable concept for the i-CQV. It is objective: once

the operating language of the universal computer is given, the value of K is the same for
everyone. It is measurable: the size of a string can be given simply by counting the
number of bits (in the case of binary). It is additive: K(S1) + K(S2) = K(S1+S2).4

A version of the i-CQV that adopts algorithmic complexity as a measure of information

would therefore look something like the following:

Causal Process: A causal process is the world line of an object that conserves
algorithmic complexity.

Causal Interaction: There is a causal interaction between causal processes A and B

if the sum total algorithmic complexity of A and B before intersection is the same
as the sum total algorithmic complexity after intersection.

In the case of the lone wrench in space, the first definition looks plausible. Regardless of
which time we describe the wrench, the total amount of resources required to describe

it fully will remain the same. Likewise for interaction. Two particles A and B which collide
and transfer physical quantities at time t will require the same amount of resources to

describe before t as they will after t.

Algorithmic complexity appears to be a promising concept of information for the i-CQV


then: it satisfies the three requirements on a conserved quantity and there is intuitive

reason to think that it is indeed conserved across causal interaction.

One potential worry is that the value of K is language-dependent. As we are measuring
K as the number of symbols in the shortest program, evidently its length will depend on the
vocabulary of our encoding. This could be used to show that complexity is not, after all,

4
It is true that for finite strings the additivity rule may not be met exactly: a short
structured string may not be compressible if the program generating it is large
relative to the string itself. This difference dissipates as string size increases, so
assuming S1 and S2 are relatively long strings, we can take additivity to hold.

a conserved quantity. Imagine that I use one language L1 to describe all the properties

of the wrench before some particular time t. After t, however, I describe it using a
different language, L2. Since K is language-dependent this means that its

complexity will not be conserved and that therefore the amount of information carried
by the causal process is also not conserved.

There is a solution to this problem that the advocate of complexity could appeal to.
They can exploit a result in AIT known as the 'invariance theorem' (Li & Vitányi, 1993):

Invariance Theorem: ∀S: |K_U1(S) − K_U2(S)| ≤ c

This states that for any string, the difference between its complexities as measured on
two universal computers U1 and U2 is bounded by a constant c whose value depends only
on the computational resources required to translate from one coding language to the
other. If the strings are themselves long relative to a translation program, then the
difference becomes negligible. In the limit, as the size of S tends towards infinity, it is
irrelevant.

In reality we are not dealing with strings of infinite size, and so the choice of encoding
will have some effect. This could be problematic when it comes to finding evidence of

causal connections. One way to avoid this would be to set as a requirement that all

descriptions of the world be carried out in a particular language L. Provided scientists


continue to use L to describe the world, causal processes will conserve complexity. This

raises the question of who decides L and on what basis. The worry is that our choice will
always be somewhat arbitrary. We could try appealing to a 'natural language' based on

natural kind terms, but as van Fraassen has pointed out (1989, p. 53) regardless of the
success of our laws and theories, we will never be able to know whether or not our

language is one comprised of such terms.

An altogether better solution is to use the value for 'c' to place a restriction on
conservation. Hence the definitions above of causal process and causal interaction hold

for a given computing language. When we are using different encodings to describe the

world over the course of an object's world line, then conservation will be maintained
within a range of values less than c. Provided it is clear which scenario is present, I do
not see any difficulties arising from the language-dependent character of K.

3.4. Summary of the Concepts

The above discussion shows that out of the three concepts, measuring the amount of

information in terms of algorithmic complexity seems the least problematic. To be sure,
each notion has its own problems, and I do not say here that a version of the i-CQV
modified to incorporate 'knowledge update' or 'entropy' could not be made to work.


Nevertheless, information as algorithmic complexity has fewer internal problems as a
theory of causation in its own right and therefore offers the most potential for
supporting reliable causal inferences.

4. Searching for Causes in Big Data: The Case of Exposomics

How might the i-CQV interpreted in terms of algorithmic complexity be used to find
evidence of causal connections in practice? To specifically highlight its possible role in

data-intensive science, I will attempt to answer this question against the backdrop of

exposomics research. The reasons for choosing this field are threefold: (1) exposomics
research is currently one of the largest scientific studies incorporating Big Data. As it is

contemporary and ongoing it provides fresh information about the methodology of


data studies not tainted by historical reconstructions of the process; (2) it has already
been discussed at length, particularly in Russo and Williamson (2012), that this field
utilises evidence of both difference-making and production when asserting causal

connections; and (3) scientists engaged in this project have expressed their interest in
finding processes that run from exposure conditions to the onset of a disease (Canali,
2015).

Exposomics research focuses on the search for ‘biomarkers’: these are factors both

internal and external to the agent which might be connected to the onset of a disease.
In this respect, it can be seen as a combination of traditional epidemiology (which

studies external factors) and genomics (which studies the interplay between internal
factors such as gene expression and protein activities in the cell). In exposomics, data is

collected from many different sources that might include the lifestyle of an individual,
location, age, pollution exposure, family history, genetic composition, ongoing illnesses

etc. Programs can then be used to analyse this data for correlations, which are then
investigated further by individuals, looking for intermediate biomarkers that suggest

evidence of a causal process for the disease.

This discipline provides an ideal case to articulate how productive causes might be

searched for in Big Data. Recall that the i-CQV interpreted via algorithmic complexity
says there is a causal interaction between two processes if the sum total of their values
of K before interaction equals the sum total after. This suggests the following rule of
causal inference:

K-Rule: Track values for biomarkers along the route of two (or more) causal
processes A and B. Encode the description of each process and measure its
algorithmic complexity K. For the resulting values K1 and K2, if the value of
K1+K2 before interaction equals the value of K1+K2 after interaction, then there
is evidence of a productive cause linking A and B.

The K-Rule only uses the data given and does not require any interventions. For that
reason it seems well suited to data-intensive science that uses only observational data.

Likewise comparing the size of data is a routine task for computers and so searching for
causes in this way appears to be something that could be automated.

In reality, however, the K-Rule can never be strictly followed. The reason for this is that
for a given data structure such as a string of symbols S, K is non-computable. One

cannot define a program which, when given S, outputs its value for K. Given any string

we will never know if our best compression of it actually is the best compression
possible. What this shows is that scientists using this method can only at best

approximate the K-Rule, given the algorithms they currently have for compressing their
data. This suggests the formulation of a weaker version of the K-Rule that is more

attainable in practice:

Best Compression-Rule: Track values for biomarkers along the route of two (or
more) causal processes A and B. Encode the description of each process using the
best compression algorithms available C1 and C2. If the length of the best

compression available of A and B before interaction (using C1 and C2) is the same
as the length of the compression achieved by applying C1 and C2 to A and B

after interaction, then there is evidence of a productive cause linking A and B.

Although this inference-rule has a rather cumbersome formulation, its underlying logic is
simple. It assumes that the amount of compression achieved by using a particular

algorithm is the same before as well as after interaction. The compressibility of the data
(with respect to our current algorithms) is invariant along the causal process. In the

situation where K is not known, this is the second-best option.
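The Best Compression-Rule can be sketched in Python as follows (an illustrative addition: zlib plays the role of the 'best available' compression algorithm, and the byte strings standing in for biomarker data are hypothetical):

```python
import zlib

def compressed_length(data: bytes) -> int:
    """Length of the best compression we can achieve (here: zlib at level 9)."""
    return len(zlib.compress(data, 9))

def compression_invariant(a_before, b_before, a_after, b_after, tolerance=0):
    """Evidence of a productive cause if the total compressed length of the two
    process descriptions is (approximately) conserved across interaction."""
    total_before = compressed_length(a_before) + compressed_length(b_before)
    total_after = compressed_length(a_after) + compressed_length(b_after)
    return abs(total_before - total_after) <= tolerance

# Trivial sanity check with identical (hypothetical) biomarker records:
record = b"biomarker:CRP=3.1;exposure=NO2:40ug/m3;" * 10
assert compression_invariant(record, record, record, record)
```

In practice the `tolerance` parameter would absorb both measurement noise and the constant c from the invariance theorem when different encodings are in play.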

Again, this inference-rule is compatible with our understanding of data-intensive science


as involving observational data and automated analysis. Human scientists will need to

supply the necessary algorithm as this will be a creative activity. However once known, it
is a routine computational procedure to apply that algorithm in compressing sets of

data and measuring the resulting length. If this kind of 'compression invariance' is found
between the data, what can the scientist conclude? At most that there is evidence of a

causal process. Just as with difference-making conclusions, one needs to be cautious:


the equivalence may just come down to a coincidence. Nonetheless, when this result is
added alongside difference-making evidence such as statistical dependency, it adds

further support to the claim that causality is really present.

Although this shows how, in principle, evidence of productive causes can be found in Big
Data, it would be foolish to conclude that traditional methods in science are

now obsolete. For a start, I agree with Russo and Williamson (2012) that evidence of
both kinds of productive cause, processes as well as mechanisms, is needed to

establish causal claims. Indeed, it is doubtful whether causal processes established as


the conservation of information, as envisioned by the i-CQV, could ever be explanatory.

One might find there is an informational equivalence between certain biomarkers and
the outcome of a disease, and yet this falls far short of explaining why those conditions
give rise to the disease. Rather, what it suggests is a program of further, non-automated
research by the scientist to look for complex mechanisms connecting the

two. This depends on the scientist’s background knowledge of similar systems and the
creative design of experiments to test hypotheses about which mechanisms might be

responsible.

References
Anderson, C., 2008. The End of Theory: The Data Deluge Makes the Scientific Method

Obsolete. Wired Magazine, 23 June.

Aronson, J., 1971. On the Grammar of Cause. Synthese, Volume 22, pp. 414-30.

Boniolo, G., Faraldo, R. & Saggion, A., 2011. Explicating the notion of Causation: The role

of extensive quantities. In: P. Illari, F. Russo & J. Williamson, eds. Causality in the Sciences.
Oxford: Oxford University Press, pp. 503-525.

Canali, S., 2015. Big Data, Epistemology and Causality: Knowledge in and Knowledge out

in EXPOsOMICS. Under Review.

Chaitin, G., 1966. On the Length of Programs for Computing Finite Binary sequences.

Journal of the ACM, 13(4), pp. 547-569.

Clarke, B. et al., 2013. The Evidence that Evidence-Based Medicine Omits. Preventive

Medicine, 57(6), pp. 745-747.

Clarke, B. et al., 2014. Mechanisms and the Evidence Hierarchy. Topoi, 33(2), pp. 339-360.

Collier, J., 1999. Causation is the Transfer of Information. Australasian Studies in History
and Philosophy of Science, Volume 14, pp. 215-245.

Collier, J., 2010. Information, Causation and Computation. In: G. Crnkovic & M. Burgin,
eds. Information and Computation: Essays on Scientific and Philosophical Understanding

of Foundations of Information and Computation. London: World Scientific, pp. 89-106.

Dowe, P., 2000. Physical Causation. Cambridge: Cambridge University Press.

Earman, J., 1984. Laws of Nature: The Empiricist Challenge. In: R. J. Bogdan, ed. D. M.
Armstrong. Dordrecht: D. Reidel Publishing Company, pp. 191-223.

Fair, D., 1979. Causation and the Flow of Energy. Erkenntnis, Volume 14, pp. 219-250.

Floridi, L., 2010. Information: A Very Short Introduction. Oxford: Oxford University Press.

Gillies, D., 2000. Philosophical Theories of Probability. London: Routledge.

Ginsberg, J., 2009. Detecting influenza epidemics using search engine query data.
Nature, Issue 457, pp. 1012-1014.

Glennan, S., 2002. Rethinking Mechanistic Explanation. Philosophy of Science, 69(1), pp.
342-353.

Guyon, I. et al., 2011. Causality Workbench. In: P. Illari, F. Russo & J. Williamson, eds.
Causality in the Sciences. Oxford: Oxford University Press, pp. 543-561.

Hall, N., 2004. Two Concepts of Causation. In: J. Collins, N. Hall & L. A. Paul, eds.

Causation and Counterfactuals. Cambridge: MIT Press, pp. 198-222.

Hawking, S., 2015. Stephen Hawking says he's solved a black hole mystery, but physicists

await the proof. [Online]


Available at: http://phys.org/news/2015-08-stephen-hawking-black-hole-mystery.html

[Accessed 04 10 2015].

Howson, C. & Urbach, P., 1993. Scientific Reasoning. Chicago: Open Court.

Illari, P., 2011. Why theories of causality need production: an information-transmission


account. Philosophy & Technology, 24(2), pp. 95-114.

Illari, P. & Russo, F., 2014. Causality: Philosophical Theory Meets Scientific Practice.
Oxford: Oxford University Press.

Illari, P., Russo, F. & Williamson, J., 2011. Causality in the Sciences. Oxford: Oxford

University Press.

Kitchin, R., 2014. The Data Revolution. London: Sage.

Kolmogorov, A., 1965. Three Approaches to the Definition of the Quantity of

Information. Problems of Information Transmission, 1(1), pp. 1-7.

Lewis, D., 1973. Causation. Journal of Philosophy, Volume 70, pp. 556-567.

Li, M. & Vitányi, P., 1993. An Introduction to Kolmogorov Complexity and its
Applications. New York: Springer-Verlag.

Machamer, P., Darden, L. & Craver, C., 2000. Thinking about Mechanisms. Philosophy of

Science, 67(1), pp. 1-21.

Mayer-Schonberger, V. & Cukier, K., 2013. Big Data: A Revolution that will Transform

how we Live, Work and Think. London: John Murray.

Pierce, J., 1961. An Introduction to Information Theory: Symbols, Signals and Noise. 1980

ed. New York: Dover.

Pietsch, W., 2015. The Causal Nature of Modeling with Big Data. Philosophy and
Technology, doi: 10.1007/s13347-015-2202-2, pp. 1-35.

Popper, K., 1959. The Logic of Scientific Discovery. New York: Basic Books.

Preskill, J., 1992. Do Black Holes Destroy Information? arXiv: 9209058.

Ramsey, F., 1990. Philosophical Papers. Cambridge: Cambridge University Press.

Ratti, E., 2015. Big Data Biology: Between Eliminative Inferences and Exploratory
Experiments. Philosophy of Science, 82(2), pp. 198-218.

Reichenbach, H., 1956. The Direction of Time. Chicago: University of Chicago Press.

Russo, F. & Williamson, J., 2007. Interpreting Causality in the Health Sciences.
International Studies in the Philosophy of Science, 21(2), pp. 157-170.

Russo, F. & Williamson, J., 2012. EnviroGenomarkers: The Interplay Between Mechanisms

and Difference Making in Establishing Causal Claims. Medicine Studies, 3(4), pp. 249-262.

Salmon, W., 1984. Scientific Explanation and the Causal Structure of the World. Princeton:
Princeton University Press.

Salmon, W., 1994. Causality without Counterfactuals. Philosophy of Science, 61(2), pp.
297-312.

Salmon, W., 1998. Causation and Explanation. Oxford: Oxford University Press.

Shannon, C. & Weaver, W., 1949. The Mathematical Theory of Communication. Urbana:

University of Illinois Press.

Solomonoff, R., 1964a. A Formal Theory of Inductive Inference: Part I. Information and

Control, 7(1), pp. 1-22.

Solomonoff, R., 1964b. A Formal Theory of Inductive Inference: Part II. Information and
Control, 7(2), pp. 224-254.

van Fraassen, B., 1989. Laws and Symmetry. Oxford: Clarendon Press.

Vineis, P., Khan, A., Vlaanderen, J. & Vermeulen, R., 2009. The Impact of New Research
Technologies on Our Understanding of Environmental Causes of Disease: The Concept

of Clinical Vulnerability. Environmental Health, 8(54).
