WEBVTT

1
00:00:00.630 --> 00:00:05.640
Now it's time to talk about some of
the words that people throw around

2
00:00:05.640 --> 00:00:08.800
when they do quantitative modelling and so

3
00:00:08.800 --> 00:00:13.190
being exposed to this vocabulary is
helpful because it will allow you to

4
00:00:13.190 --> 00:00:17.000
describe to other people more
accurately what you're doing and also

5
00:00:17.000 --> 00:00:21.940
when you hear about someone else's model,
you'll have a sense of what's going on.

6
00:00:21.940 --> 00:00:24.800
And so, I'm going to describe
some of these terms here.

7
00:00:25.810 --> 00:00:28.230
Okay, there's a spectrum out there and

8
00:00:28.230 --> 00:00:32.320
most models fit on the spectrum somewhere
between empirical and theoretical.

9
00:00:32.320 --> 00:00:37.650
So an example of a theoretical model
is an option pricing model and

10
00:00:37.650 --> 00:00:42.840
what I mean by a theoretical
model is that someone has laid down

11
00:00:42.840 --> 00:00:46.590
a set of assumptions, they have written
down some relationships and they really

12
00:00:46.590 --> 00:00:51.250
ask what are the logical consequences
of those assumptions and relationships.

13
00:00:51.250 --> 00:00:55.360
So there could be an assumption
that markets are efficient, and

14
00:00:55.360 --> 00:00:59.600
then given that assumption,
there are certain logical consequences and

15
00:00:59.600 --> 00:01:05.900
those logical consequences could be used
for example, to come up with a model for

16
00:01:05.900 --> 00:01:11.640
pricing a stock option, and so
that's an example of a theoretical model.

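To make the idea of a theoretical model concrete, here is a minimal Python sketch. It swaps in a textbook example the lecture does not name, a one-period binomial option-pricing model: starting only from the assumption that there is no arbitrage, the price of a call option follows by pure logic, with no data fitted. The function name and the input numbers are illustrative assumptions.

```python
def binomial_call_price(s0: float, strike: float, up: float, down: float, rate: float) -> float:
    """One-period binomial price of a European call option (illustrative sketch).

    Assumptions (not from the lecture): the stock moves from s0 to either
    s0 * up or s0 * down, cash earns `rate` risk-free, and there is no arbitrage.
    """
    # No arbitrage requires the risk-free growth factor to sit between the two moves.
    if not down < 1 + rate < up:
        raise ValueError("no-arbitrage requires down < 1 + rate < up")
    # Risk-neutral probability implied by the assumptions, not estimated from data.
    q = (1 + rate - down) / (up - down)
    payoff_up = max(s0 * up - strike, 0.0)
    payoff_down = max(s0 * down - strike, 0.0)
    # Price = expected payoff under q, discounted at the risk-free rate.
    return (q * payoff_up + (1 - q) * payoff_down) / (1 + rate)


# Hypothetical inputs: stock at 100, strike 100, a 20% move up or down, 5% risk-free rate.
print(round(binomial_call_price(100, 100, 1.2, 0.8, 0.05), 4))
```

Change an assumption (the size of the moves, the interest rate) and the price changes with it; that dependence on assumptions rather than observations is what places the model at the theoretical end of the spectrum.
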
17
00:01:11.640 --> 00:01:15.650
The other end of the spectrum is a model
that is purely based on data and

18
00:01:15.650 --> 00:01:19.830
that's when I've got a set of
observations and I'm asking myself,

19
00:01:19.830 --> 00:01:25.150
how can I approximate the underlying
process that generated those observations?

20
00:01:25.150 --> 00:01:28.950
And so I start with the data and
then I try to back out

21
00:01:28.950 --> 00:01:32.180
the model as opposed to the theoretical
one where I start with the theory and

22
00:01:32.180 --> 00:01:33.802
look at the consequences of that theory.

23
00:01:33.802 --> 00:01:40.217
So an example of a data-driven model might
be this: I have a set of customers,

24
00:01:40.217 --> 00:01:43.740
I've figured out the profitability
of each of those customers, and

25
00:01:43.740 --> 00:01:48.470
now I ask myself the question, what
are the essential characteristics that

26
00:01:48.470 --> 00:01:51.870
separate out the profitable
from unprofitable customers?

27
00:01:51.870 --> 00:01:55.820
That would be a useful thing to know,
but my starting point here

28
00:01:55.820 --> 00:02:00.700
is not some grand theory of how the world
works, my starting point is a spreadsheet

29
00:02:00.700 --> 00:02:04.880
full of data, the data being
the profitability of my customers.

30
00:02:04.880 --> 00:02:07.870
There's a set of attributes associated
with those customers, and I'm trying to

31
00:02:07.870 --> 00:02:11.653
figure out which of the attributes are
associated with profitable customers, so

32
00:02:11.653 --> 00:02:16.405
that would be an example of
a totally data-driven model.

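Here is a deliberately crude Python sketch of the customer example, reasoning from the data toward the model. The records are invented, and ranking attributes by the gap in the share of profitable customers is a stand-in for a proper technique such as a decision tree or logistic regression.

```python
from collections import defaultdict

# Hypothetical customer spreadsheet: attributes plus observed profit (invented data).
customers = [
    {"region": "north", "channel": "online", "profit": 120.0},
    {"region": "north", "channel": "store", "profit": -40.0},
    {"region": "south", "channel": "online", "profit": 95.0},
    {"region": "south", "channel": "store", "profit": -15.0},
    {"region": "east", "channel": "online", "profit": 60.0},
    {"region": "east", "channel": "store", "profit": 10.0},
]


def share_profitable_by(attribute, records):
    """Fraction of customers with profit > 0 for each value of `attribute`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [profitable, total]
    for record in records:
        tally = counts[record[attribute]]
        tally[0] += record["profit"] > 0
        tally[1] += 1
    return {value: hits / total for value, (hits, total) in counts.items()}


def rank_attributes(attributes, records):
    """Rank attributes by how far apart their groups' profitable shares are."""
    spread = {}
    for attribute in attributes:
        shares = share_profitable_by(attribute, records).values()
        spread[attribute] = max(shares) - min(shares)
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)


print(rank_attributes(["region", "channel"], customers))
```

In this toy data the sales channel separates profitable from unprofitable customers better than region does; no theory of customer behaviour went in, only the spreadsheet.
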
33
00:02:16.405 --> 00:02:19.400
So, that's essentially the spectrum
where most modellers fit,

34
00:02:19.400 --> 00:02:21.540
somewhere between empirical and
theoretical.

35
00:02:21.540 --> 00:02:24.860
You'll find that there are often
arguments between people

36
00:02:24.860 --> 00:02:26.700
because they lie at different
points on the spectrum.

37
00:02:26.700 --> 00:02:30.980
My own opinion here is that you
really want to be able to take

38
00:02:30.980 --> 00:02:34.230
a piece from both of these approaches.

39
00:02:34.230 --> 00:02:38.130
Additional terms that you will
hear thrown around by people

40
00:02:38.130 --> 00:02:41.730
who are making models
are deterministic and probabilistic.

41
00:02:41.730 --> 00:02:47.630
We're going to look at these two
types of models in other modules,

42
00:02:47.630 --> 00:02:50.690
but just to get started,
what do we mean by deterministic?

43
00:02:50.690 --> 00:02:53.720
Well, essentially given
a fixed set of inputs,

44
00:02:53.720 --> 00:02:56.970
the model's always going to give
the same output.

45
00:02:56.970 --> 00:03:00.950
So here's an example,
you've got $1000, that's the input.

46
00:03:00.950 --> 00:03:06.860
You're going to invest at a 4% annual
compound interest for two years.

47
00:03:06.860 --> 00:03:11.390
After two years, given the way
the money is growing, that $1000 is

48
00:03:11.390 --> 00:03:17.180
always going to have grown
to $1081.60, and

49
00:03:17.180 --> 00:03:20.794
it's never going to change,
it's totally deterministic.

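The compound-interest example can be written as a tiny deterministic model. This is a minimal sketch assuming annual compounding; the function name is illustrative. Nothing in it is random, so the same inputs always return the same value.

```python
def future_value(principal: float, annual_rate: float, years: int) -> float:
    """Deterministic model: annual compounding, no randomness anywhere."""
    return principal * (1 + annual_rate) ** years


# The lecture's example: $1000 at 4% compounded annually for two years.
print(round(future_value(1000, 0.04, 2), 2))  # 1081.6
```

Calling it again with the same arguments can never give a different answer, which is exactly what "deterministic" means here.
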
50
00:03:20.794 --> 00:03:25.040
The same input,
always gives the same output, but

51
00:03:25.040 --> 00:03:29.610
what happens if you took that $1,000 and
rather than putting it into

52
00:03:29.610 --> 00:03:34.520
an investment that was growing at 4%,
you bought lottery tickets with it?

53
00:03:34.520 --> 00:03:39.360
And I could ask, well, how much is
this $1,000 going to have grown to

54
00:03:40.660 --> 00:03:45.150
after two years, for example,
after the lottery has happened.

55
00:03:45.150 --> 00:03:48.840
Well, the answer now is,
it fundamentally depends on whether or

56
00:03:48.840 --> 00:03:52.690
not one of those lottery
tickets won the lottery.

57
00:03:52.690 --> 00:03:54.910
If none of the tickets won the lottery,

58
00:03:54.910 --> 00:04:00.020
then you're going to get an output
of zero, all the money disappeared.

59
00:04:00.020 --> 00:04:02.950
If one of the lottery tickets was
lucky enough to win the lottery,

60
00:04:02.950 --> 00:04:05.800
you're going to get a very,
very different output.

61
00:04:05.800 --> 00:04:10.080
And so the output of this process or

62
00:04:10.080 --> 00:04:14.800
model is probabilistic,
it's what we call a random variable.

63
00:04:14.800 --> 00:04:17.560
It all depends on whether or
not the lottery was won.

64
00:04:17.560 --> 00:04:21.240
So that's very different from
the deterministic model.

65
00:04:21.240 --> 00:04:28.100
And, the term stochastic is often used
as really a synonym for a probabilistic

66
00:04:28.100 --> 00:04:32.200
model, so you'll see both of those
terms used when there's uncertainty.

67
00:04:32.200 --> 00:04:35.010
And so those are the terms deterministic and
probabilistic.

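The lottery version of the same $1000 can be sketched as a simulation. Every number here is a hypothetical assumption (the $2 ticket price, a 1-in-10,000 win chance per ticket, the $1,000,000 jackpot, and independence between tickets); real lotteries have far worse odds. The point is only that identical inputs give different outputs from run to run.

```python
import random


def lottery_outcome(budget, ticket_price, win_prob, jackpot, rng):
    """One run of a hypothetical lottery model: total winnings after spending the budget."""
    n_tickets = int(budget // ticket_price)
    # Each ticket independently wins with probability win_prob (an assumption).
    winners = sum(1 for _ in range(n_tickets) if rng.random() < win_prob)
    return winners * jackpot


# Hypothetical parameters; the seed only makes this experiment repeatable.
rng = random.Random(42)
outcomes = [lottery_outcome(1000, 2, 1e-4, 1_000_000, rng) for _ in range(2000)]
print("distinct outcomes:", sorted(set(outcomes)))
print("share of runs with no win:", outcomes.count(0) / len(outcomes))
```

Because a single run tells you little, a probabilistic model is described by the distribution of its output (here, mostly zero with occasional large wins) rather than by one number.
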
68
00:04:37.692 --> 00:04:39.420
More terms.

69
00:04:39.420 --> 00:04:42.890
The next one is discrete
versus continuous.

70
00:04:42.890 --> 00:04:47.080
Now, the analogy that I'm going to
use here is the idea of a watch.

71
00:04:47.080 --> 00:04:50.250
Now, there are two different
sorts of watches, essentially.

72
00:04:50.250 --> 00:04:54.090
Some watches are digital, and
others are what we call analog.

73
00:04:55.110 --> 00:05:00.160
And so a digital watch can only show
you specific times because it has

74
00:05:00.160 --> 00:05:06.560
a finite number of readings it can
display on its face, and so,

75
00:05:06.560 --> 00:05:12.270
there's some inherent resolution beyond
which you can't go in telling the time.

76
00:05:13.840 --> 00:05:18.130
On the other hand,
if you have an analog watch,

77
00:05:18.130 --> 00:05:21.260
that's one where the hands
are physical and

78
00:05:21.260 --> 00:05:27.300
sweep out the time,
then it can pick up any time

79
00:05:27.300 --> 00:05:31.660
possible because the hands have to
go through every single number.

80
00:05:31.660 --> 00:05:33.830
That's the idea of something
that's continuous.

81
00:05:33.830 --> 00:05:36.910
So just as we have digital and analog,

82
00:05:36.910 --> 00:05:42.390
we're going to have in our modelling,
the same concept happening.

83
00:05:42.390 --> 00:05:46.900
In the modelling world,
we would call them discrete or continuous.

84
00:05:46.900 --> 00:05:51.420
So discrete processes are characterized
by jumps between distinct values, just like

85
00:05:51.420 --> 00:05:55.620
the digital watch, the numbers
jump from one to another whereas

86
00:05:55.620 --> 00:06:00.920
continuous processes tend to be
much smoother and, more formally,

87
00:06:00.920 --> 00:06:05.430
you can get an infinite number of
values happening in any fixed interval.

88
00:06:05.430 --> 00:06:11.260
And so going back to the watch here,
you will see every possible time presented

89
00:06:11.260 --> 00:06:16.390
on the analog watch between say,
12 o'clock and 1 o'clock,

90
00:06:16.390 --> 00:06:23.090
because the hands are going to sweep out
every single time within that period.

91
00:06:23.090 --> 00:06:28.050
And so, some models will be discrete and
others will be continuous and

92
00:06:28.050 --> 00:06:32.560
it's one of the choices that
the modeller gets to make.

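To see the same choice in a model rather than a watch, here is a sketch that reuses the lecture's $1000 at 4%. The spreadsheet-style function only produces values at whole years; the continuous version uses continuous compounding (an example the lecture does not use) and gives a value at any instant. Function names are illustrative.

```python
import math


def balance_by_year(principal, rate, years):
    """Discrete model, spreadsheet-style: one row per whole year."""
    rows = [principal]
    for _ in range(years):
        rows.append(rows[-1] * (1 + rate))  # jump from one year's value to the next
    return rows


def balance_at(principal, rate, t):
    """Continuous model: continuous compounding, defined for any real t >= 0."""
    return principal * math.exp(rate * t)


print(balance_by_year(1000, 0.04, 2))  # values exist only at t = 0, 1, 2
print(balance_at(1000, 0.04, 1.5))     # any instant in between is allowed
```

At the same nominal rate the two models also differ slightly in value: continuous compounding gives about $1083.29 after two years, against $1081.60 for annual compounding.
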
93
00:06:33.880 --> 00:06:36.080
Now when you do spreadsheets,

94
00:06:36.080 --> 00:06:39.490
you're typically taking a rather
discrete approach to the world.

95
00:06:40.640 --> 00:06:43.640
When you think of
continuous variables,

96
00:06:43.640 --> 00:06:46.240
it tends to be a little more
mathematical in nature.

97
00:06:47.965 --> 00:06:53.370
The final terms that we want to talk
about are static and dynamic models.

98
00:06:53.370 --> 00:06:58.480
So, static models are those that are
really trying to capture a single snapshot

99
00:06:58.480 --> 00:07:01.710
of a business process and so
here's an example of a static model.

100
00:07:01.710 --> 00:07:04.100
Given a website's installed software base,

101
00:07:04.100 --> 00:07:06.980
what are the chances that
it is compromised today?

102
00:07:06.980 --> 00:07:10.860
I'm just trying to make a statement
about this single period in time.

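A static model answers a one-period question. Here is a hypothetical sketch of the website example: it assumes each installed component has its own (invented) chance of being exploited today and that these events are independent, so the site stays safe only if every component survives the day.

```python
from math import prod


def p_compromised_today(daily_risk):
    """Static model: probability that at least one installed component is exploited today.

    Assumes components are compromised independently of one another.
    """
    return 1 - prod(1 - p for p in daily_risk.values())


# Hypothetical installed software base with invented per-day risks.
site = {"web_server": 0.001, "cms": 0.004, "payment_plugin": 0.002}
print(f"{p_compromised_today(site):.4%}")
```

There is no notion of time passing here: the model produces a single snapshot probability for today.
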
103
00:07:12.460 --> 00:07:17.820
By contrast, a dynamic model is much
more about an evolution of a process and

104
00:07:17.820 --> 00:07:22.980
it's the evolution that is of interest
that we are trying to understand.

105
00:07:22.980 --> 00:07:28.890
And these dynamic models typically capture
a business process moving from state

106
00:07:28.890 --> 00:07:34.560
to state, and we model
the dynamics of those transitions.

107
00:07:34.560 --> 00:07:37.690
So here's an example of what I would
think of as a dynamic model, and

108
00:07:37.690 --> 00:07:41.700
this would be the sort of question
that someone working in public policy

109
00:07:41.700 --> 00:07:44.280
would be interested in,
as would an economist.

110
00:07:44.280 --> 00:07:50.030
So given a person's participation in a job
training program, how long is it until

111
00:07:50.030 --> 00:07:56.080
he or she finds a job and then once they
find one, how long can they keep it?

112
00:07:56.080 --> 00:08:01.060
And so I'm thinking here that
a person's participation in the labor

113
00:08:01.060 --> 00:08:03.900
market goes through a set of states.

114
00:08:03.900 --> 00:08:07.703
Sometimes they're unemployed, and
sometimes they're employed, and

115
00:08:07.703 --> 00:08:11.211
they can go from employed back to
unemployed again, potentially.

116
00:08:11.211 --> 00:08:16.382
And so we'd be interested in modeling
the transition through these states,

117
00:08:16.382 --> 00:08:20.850
and that would be the idea of a dynamic
as opposed to a static model.

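The labour-market example can be sketched as a two-state, discrete-time Markov chain. The monthly time step and both transition probabilities are invented for illustration; with constant probabilities, the time to find a job and the time until losing it are geometric waiting times with mean 1/p.

```python
# Hypothetical monthly transition probabilities (invented for illustration).
P_FIND = 0.25  # chance an unemployed person finds a job in a given month
P_LOSE = 0.05  # chance an employed person loses their job in a given month


def step(dist):
    """Advance the (unemployed, employed) probabilities by one month."""
    unemployed, employed = dist
    return (
        unemployed * (1 - P_FIND) + employed * P_LOSE,
        unemployed * P_FIND + employed * (1 - P_LOSE),
    )


def distribution_after(months, start=(1.0, 0.0)):
    """State probabilities after `months` steps, starting unemployed by default."""
    dist = start
    for _ in range(months):
        dist = step(dist)
    return dist


# With constant monthly probabilities, each waiting time is geometric with mean 1/p.
expected_months_to_job = 1 / P_FIND
expected_months_in_job = 1 / P_LOSE

print(distribution_after(1))   # (0.75, 0.25)
print(distribution_after(24))  # drifts toward the long-run split
print(expected_months_to_job, expected_months_in_job)
```

Unlike the static example, the object of interest is the path through the states over time; in the long run this chain settles at P_FIND / (P_FIND + P_LOSE), about 83%, of people employed.
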
118
00:08:20.850 --> 00:08:23.750
So there's a whole set of
terminology that I've gone through

119
00:08:24.800 --> 00:08:26.480
that is associated with modelling.

120
00:08:26.480 --> 00:08:30.430
That's what I call the lexicon of models,
and it's not as if you can only have one of

121
00:08:30.430 --> 00:08:35.180
these things going on in terms of the
language, you could clearly have something

122
00:08:35.180 --> 00:08:41.750
like a dynamic, probabilistic,
discrete-time model.

123
00:08:41.750 --> 00:08:46.346
And we're going to see one of those later on;
it's termed a Markov chain.

124
00:08:46.346 --> 00:08:49.891
So, there's our lexicon.
