Mathematical Foundations For Data Science: Ian Wanless (Standing in For Nick Wormald)

Mathematical Foundations for Data Science
Ian Wanless (standing in for Nick Wormald)
MAT9004 - Week 1
Topics for week 1
Notation
What is a set?
Notation for sums and products
Functions
What is a function?
Zeroes
Inverse functions
Convexity
Some important functions

Linear functions
Polynomials
Sets
A set is an unordered collection of distinct ‘objects’.
Examples:
{1, 2, 3, 4, 5} is the set of integers 1, . . . , 5.
{a, b, c, d} is the set of letters a, . . . , d.
{{a, b}, {c, d}} is the set of sets {a, b}, {c, d}.
Some important sets:

N is the set of natural numbers {0, 1, 2, . . .}
Z is the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
R is the set of real numbers
R includes all the numbers on a ‘standard’ number line, e.g. 5, −3, 1/4, √17, π.
We write x ∈ S if x is an element of the set S.
We write x ∉ S if x is not an element of S.
Examples:
1 ∈ {1, 2, 3, 4, 5}.
−3 ∈ Z.
1/3 ∉ N.
The set {x ∈ S : P(x)} contains all elements of S for which P(x) is true.
Examples:
{x ∈ N : x is even} = {0, 2, 4, 6, 8, . . .}.
{x ∈ {1, . . . , 6} : x is a prime number} = {2, 3, 5}.
{x ∈ Z : −1 ≤ x ≤ 1} = {−1, 0, 1}.
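Set-builder notation has a close counterpart in Python's set comprehensions. The sketch below reproduces the three examples; since N and Z are infinite, it searches only a finite window of each (an assumption made purely so the code terminates), and the window names are hypothetical.

```python
# {x ∈ S : P(x)} corresponds to {x for x in S if P(x)}.
# N and Z are infinite, so we search only finite windows of them (illustrative assumption).
N_window = range(0, 10)   # 0, 1, ..., 9 (0 ∈ N, as in this unit)
Z_window = range(-5, 6)   # -5, ..., 5

evens = {x for x in N_window if x % 2 == 0}
primes = {x for x in range(1, 7) if x > 1 and all(x % d != 0 for d in range(2, x))}
small = {x for x in Z_window if -1 <= x <= 1}

print(sorted(evens))   # [0, 2, 4, 6, 8]
print(sorted(primes))  # [2, 3, 5]
print(sorted(small))   # [-1, 0, 1]
```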
Intervals
For real numbers a and b with a < b:
The set [a, b] contains all real numbers between a and
b (including a, b); [a, b] is called the closed interval
from a to b.
The set (a, b) contains all elements of [a, b] except for
a and b; (a, b) is called the open interval from a to b.
Rewritten in set-builder notation:
[a, b] = {x ∈ R : a ≤ x ≤ b}
(a, b) = {x ∈ R : a < x < b}
Similarly, we define:
[a, b) = {x ∈ R : a ≤ x < b}
(a, b] = {x ∈ R : a < x ≤ b}
[a, ∞) = {x ∈ R : x ≥ a} and (a, ∞) = {x ∈ R : x > a}
(−∞, b) = {x ∈ R : x < b} and (−∞, b] = {x ∈ R : x ≤ b}
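As a sketch, a single membership test can cover all of the interval types above. The helper `in_interval` and its flags are hypothetical names chosen for illustration; infinite endpoints are passed as `math.inf` and are never included, matching [a, ∞) and (−∞, b].

```python
import math

def in_interval(x, a, b, left_closed, right_closed):
    """Return True if the real number x lies in the interval from a to b.

    left_closed=True means '[a', False means '(a'; likewise right_closed for 'b]' vs 'b)'.
    Pass a = -math.inf or b = math.inf for unbounded intervals; an infinite
    endpoint is always treated as open.
    """
    if left_closed and math.isfinite(a):
        above = x >= a
    else:
        above = x > a
    if right_closed and math.isfinite(b):
        below = x <= b
    else:
        below = x < b
    return above and below

print(in_interval(2, 2, 5, True, False))              # True:  2 ∈ [2, 5)
print(in_interval(2, 2, 5, False, True))              # False: 2 ∉ (2, 5]
print(in_interval(3.157, 3, math.inf, False, False))  # True:  3.157 ∈ (3, ∞)
print(in_interval(-1e6, -math.inf, 0, False, True))   # True:  -10^6 ∈ (−∞, 0]
```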
Test yourself with the following exercise:
Exercise
Are the following statements true or false:
{3, 1, 2} = {1, 2, 3}? True, sets are unordered!
0 ∈ N? True [a]
3/2 ∉ Z? True, 3/2 is not an integer
3.157 ∉ (3, ∞)? False
a ∈ {{a, b}, {c, d}}? False [b]
List all elements of the following sets:
{x ∈ Z : x ∈ {−1, 0, 1.5}} = {−1, 0}
{x ∈ N : x is odd and x ≤ 5} = {1, 3, 5}
[a] This is a convention in this unit; in some books 0 ∉ N.
[b] a ∈ {a, b} and {a, b} ∈ {{a, b}, {c, d}}, but a ∉ {{a, b}, {c, d}}.
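The answers above can be checked with Python's built-in sets, which are also unordered collections of distinct elements. One assumption to note: Python only allows immutable sets inside a set, so the inner sets {a, b} and {c, d} are written as `frozenset`s. N is searched only over a finite window.

```python
# Order and repetition do not matter for Python sets either.
print({3, 1, 2} == {1, 2, 3})   # True

# Inner sets must be immutable, hence frozenset for {a, b} and {c, d}.
inner1 = frozenset({"a", "b"})
inner2 = frozenset({"c", "d"})
outer = {inner1, inner2}        # the set {{a, b}, {c, d}}

print("a" in inner1)            # True:  a ∈ {a, b}
print(inner1 in outer)          # True:  {a, b} ∈ {{a, b}, {c, d}}
print("a" in outer)             # False: a ∉ {{a, b}, {c, d}}

# {x ∈ Z : x ∈ {−1, 0, 1.5}}: keep only the candidates that are integers
ints = sorted(x for x in [-1, 0, 1.5] if float(x).is_integer())
# {x ∈ N : x is odd and x ≤ 5}, searching the finite window 0..5 of N
odds = {x for x in range(0, 6) if x % 2 == 1 and x <= 5}
print(ints, sorted(odds))       # [-1, 0] [1, 3, 5]
```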
Sigma notation
If a and b are integers with a ≤ b and f is a function, then
$$\sum_{x=a}^{b} f(x) = f(a) + f(a+1) + \cdots + f(b).$$
Examples.
$$\sum_{x=1}^{3} \frac{1}{x} = 1 + \frac{1}{2} + \frac{1}{3}, \qquad \sum_{x=2}^{4} x^2 = 4 + 9 + 16.$$
$\sum_{x \in S} f(x)$ denotes the sum of f(x) over all x ∈ S.
Example. $\sum_{x \in \{-2, 2\}} x^2 = (-2)^2 + 2^2 = 8$
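In code, Sigma notation is a loop that adds up f(x) over a, a + 1, ..., b. A minimal sketch, with a hypothetical helper `sigma`; `Fraction` keeps the first example exact instead of rounding it to a decimal.

```python
from fractions import Fraction

def sigma(f, a, b):
    """Sum f(x) over the integers x = a, a+1, ..., b (range's end is exclusive, hence b + 1)."""
    return sum(f(x) for x in range(a, b + 1))

harmonic_3 = sigma(lambda x: Fraction(1, x), 1, 3)  # 1 + 1/2 + 1/3
squares = sigma(lambda x: x**2, 2, 4)                 # 4 + 9 + 16
over_set = sum(x**2 for x in {-2, 2})                 # sum of x² over S = {−2, 2}

print(harmonic_3, squares, over_set)  # 11/6 29 8
```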
Product notation
If a and b are integers with a ≤ b and f is a function, then
$$\prod_{x=a}^{b} f(x) = f(a) \times f(a+1) \times \cdots \times f(b).$$
Examples.
$$\prod_{x=1}^{4} x = 1 \times 2 \times 3 \times 4 = 24, \qquad \prod_{x=3}^{5} (x^2 - 1) = 8 \times 15 \times 24.$$
Again, $\prod_{x \in S} f(x)$ denotes the product of f(x) over all x ∈ S.
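Product notation translates the same way, with multiplication in place of addition. The sketch below uses `math.prod` (available from Python 3.8); the helper name `product` is hypothetical, and the last line is an extra example over an arbitrary set S = {2, 3, 7}.

```python
import math

def product(f, a, b):
    """Multiply f(x) over the integers x = a, a+1, ..., b."""
    return math.prod(f(x) for x in range(a, b + 1))

print(product(lambda x: x, 1, 4))         # 24
print(product(lambda x: x**2 - 1, 3, 5))  # 2880, i.e. 8 × 15 × 24
print(math.prod(x for x in {2, 3, 7}))    # 42: product over the set S = {2, 3, 7}
```

If a > b the range is empty and `math.prod` returns 1 (the empty product), just as `sum` over an empty range returns 0.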
Exercise
Rewrite without using Sigma or product notation.
$\sum_{x=1}^{3} a^x = a + a^2 + a^3$
$\sum_{x \in \{2, 5, 7\}} (2x - 5) = (-1) + 5 + 9 = 13$
$\prod_{y=4}^{5} \frac{1}{y} = \frac{1}{4} \times \frac{1}{5} = \frac{1}{20}$
$\sum_{x=1}^{5} 1 = 1 + 1 + 1 + 1 + 1 = 5$
Exercise
Rewrite in Sigma notation
$1 + 3 + 5 + 7 = \sum_{x=1}^{4} (2x - 1)$
$1 - 2 + 3 - 4 + \cdots - 10 = \sum_{i=1}^{10} (-1)^{i+1} i$
Rewrite in product notation
$\frac{1}{4} \times \frac{1}{9} \times \frac{1}{16} = \prod_{x=2}^{4} \frac{1}{x^2}$
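One way to check rewrites like these is to evaluate both sides and compare them. The sketch below does this exactly, using `Fraction` for the product so that no rounding creeps in.

```python
from fractions import Fraction
import math

lhs1 = 1 + 3 + 5 + 7
rhs1 = sum(2 * x - 1 for x in range(1, 5))                 # Σ_{x=1}^{4} (2x − 1)

lhs2 = sum(i if i % 2 == 1 else -i for i in range(1, 11))  # 1 − 2 + 3 − ... − 10
rhs2 = sum((-1) ** (i + 1) * i for i in range(1, 11))      # Σ_{i=1}^{10} (−1)^{i+1} i

lhs3 = Fraction(1, 4) * Fraction(1, 9) * Fraction(1, 16)
rhs3 = math.prod(Fraction(1, x**2) for x in range(2, 5))   # Π_{x=2}^{4} 1/x²

print(lhs1 == rhs1, lhs2 == rhs2, lhs3 == rhs3)  # True True True
print(rhs1, rhs2, rhs3)                          # 16 -5 1/576
```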
Why do we need functions?
Functions are one of the most fundamental concepts of mathematics
Find structure in data:
Describe data with a proper function
Make predictions based on the function
Optimization:
Describing the cost/error/benefit depending on some
parameters by a function
Optimizing (minimizing/maximizing) the function for the best
outcome
Functions
Suppose that X and Y are two sets
A function f : X → Y assigns to each x ∈ X exactly one element f(x) ∈ Y.
f : X → Y means f is a function from X to Y
X is called the domain of f
Y is called the codomain of f
Examples.
The function f : R → R, f(x) = x², assigns to each real
number x the square x² of the number.
We can also define a function by listing its values:
x    | 1 | 2 | 3
f(x) | 1 | 1 | 2
This table defines a function f : {1, 2, 3} → {1, 2}.
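A function given by a table of values can be stored as a Python dictionary: each key (an element of the domain) is mapped to exactly one value, which is precisely the defining property of a function. The sketch also computes the image {f(x) : x ∈ X}, which is defined on the next slide; the variable names are illustrative.

```python
# The table x: 1, 2, 3 / f(x): 1, 1, 2 as a dictionary from domain elements to values
f = {1: 1, 2: 1, 3: 2}
domain = set(f)            # X = {1, 2, 3}
codomain = {1, 2}          # Y = {1, 2}

# Every value must lie in the codomain for f : X → Y to make sense
assert all(value in codomain for value in f.values())

image = set(f.values())    # f(X) = {f(x) : x ∈ X}
print(f[3], sorted(domain), sorted(image))   # 2 [1, 2, 3] [1, 2]
```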
Function graphs
The graph of f : X → Y is the set {(x, f (x)) : x ∈ X }.
If f : R → R, we usually plot x (horizontal) and f(x) (vertical) in a Cartesian coordinate system:
[Figure: The graph of f(x) = x]
The image f(X) of a function f is the set {f(x) : x ∈ X}.
Images in previous Examples.
The function defined by the table
x    | 1 | 2 | 3
f(x) | 1 | 1 | 2
has image f(X) = {1, 2}.
f(x) = x² has image f(X) = [0, ∞).
We sometimes just write f(x) = · · · without specifying the domain/codomain.
If we write ‘f(x) = · · · with domain X’, then f : X → f(X),
that is, the codomain is chosen to be the image.
If it is clear from the context, we do not specify the domain;
typically X = R in these cases.
Example.
f(x) = x² means f : X → Y with domain X = R and
codomain Y = [0, ∞).
Zeroes
The zeroes (or roots) of a function f are all points x in its domain with f(x) = 0.
{x ∈ X : f(x) = 0} is the set of all zeroes of f : X → Y.
The zeroes are the values of x where the graph of f meets the x-axis.
Examples.
The function in the plot has three zeroes: −3, −1, 1.
[Figure: graph of a function meeting the x-axis at −3, −1 and 1]
The function f(x) = x² − 1 has two zeroes: −1 and 1.
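When zeroes cannot be found by hand, they can be approximated numerically. The sketch below uses bisection, a standard technique that is not covered on these slides: if f is continuous and f(lo), f(hi) have opposite signs, the graph must cross the x-axis in between, and repeatedly halving the interval homes in on a zero. The helper name `bisect_zero` is hypothetical.

```python
def bisect_zero(f, lo, hi, tol=1e-12):
    """Approximate a zero of f in [lo, hi], assuming f is continuous and
    f(lo), f(hi) have opposite signs (so the graph crosses the x-axis)."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:   # sign change (or exact zero) in [lo, mid]
            hi = mid
        else:                   # sign change in [mid, hi]
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

def g(x):
    return x**2 - 1

print(round(bisect_zero(g, 0, 3), 6))    # 1.0
print(round(bisect_zero(g, -3, 0), 6))   # -1.0
```

Bisection only finds zeroes where f changes sign; a zero where the graph merely touches the x-axis (such as x = 0 for x²) is missed.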
Inverse Functions
f⁻¹ : Y → X is called the inverse function of f : X → Y if
f⁻¹(f(x)) = x and f(f⁻¹(y)) = y for all x ∈ X and y ∈ Y.
Think of f⁻¹ as a way to ‘undo’ f.
Inverse functions do not always exist.
Inverse functions are unique if they exist.
Be careful: f⁻¹(x) does not mean 1/f(x).
To find f⁻¹, solve the equation f(x) = y for x.
Example.
To find the inverse of f(x) = 2x + 1:
y = 2x + 1,
y − 1 = 2x,
x = (y − 1)/2.
Hence f⁻¹(y) = (y − 1)/2.
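A quick numerical sanity check of the answer: evaluate both defining identities, f⁻¹(f(x)) = x and f(f⁻¹(y)) = y, on a few sample points. Passing such a check does not prove the inverse is correct for every real number; it only catches mistakes. The sample points are arbitrary.

```python
def f(x):
    return 2 * x + 1

def f_inv(y):
    return (y - 1) / 2

samples = [-3, -0.5, 0, 2, 10]
print(all(f_inv(f(x)) == x for x in samples))   # True: f⁻¹ undoes f
print(all(f(f_inv(y)) == y for y in samples))   # True: f undoes f⁻¹
```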
Bijections
A function f : X → Y is called:
injective (or one-to-one) if f(x₁) ≠ f(x₂) for all distinct x₁, x₂ ∈ X;
surjective if Y = f(X), that is, if for every y ∈ Y there
is an x ∈ X with f(x) = y;
bijective if it is both injective and surjective.
Examples.
f : R → R with f(x) = 2x + 1 is bijective
f : R → R with f(x) = x² is neither injective nor surjective
f : [0, ∞) → [0, ∞) with f(x) = x² is bijective
f has an inverse function if and only if it is bijective
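For functions on finite sets, the three properties can be tested directly by listing values. The helpers below are hypothetical names for illustration; they cannot be applied to domains such as R, which cannot be listed element by element.

```python
def is_injective(f, X):
    """No two distinct elements of X have the same value."""
    values = [f(x) for x in X]
    return len(values) == len(set(values))

def is_surjective(f, X, Y):
    """Every y in Y is hit, i.e. the image f(X) equals Y."""
    return {f(x) for x in X} == set(Y)

def is_bijective(f, X, Y):
    return is_injective(f, X) and is_surjective(f, X, Y)

def square(x):
    return x**2

print(is_injective(square, range(-3, 4)))               # False: (−1)² = 1²
print(is_bijective(square, range(0, 4), {0, 1, 4, 9}))  # True
```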
Exercise
Consider the functions f1 : R → R and f2 : R → R below.
What are the zeroes of the functions?
Are the functions injective? Are they surjective?
[Figure: (a) f1(x) = 2^x; (b) f2(x) = x³ + x² − 5x + 3]
Exercise
Consider the function f : [0, ∞) → [0, ∞) with f(x) = 2x².
Find the inverse function of f.
f and f⁻¹ are plotted below. Can you see a relation
between the two graphs?
[Figure: (a) plot of f; (b) plot of f⁻¹]
Convex functions
A function is convex if, for any two points on its graph, the
straight line segment between the two points lies entirely on or
above the graph of the function.
Example 1
[Figure: (a) f(x) = 2x²; (b) f(x) = 2^x]
Example 2: A function whose graph is a straight line is convex.
Concave functions
A function is concave if, for any two points on its graph, the
straight line segment between the two points lies entirely on or
below the graph of the function.
Example
[Figure: graph of a concave function]
Side note
The formal definition for convexity is the following:
f is convex if, for all x₁, x₂ in its domain and a ∈ [0, 1],
f(ax₁ + (1 − a)x₂) ≤ af(x₁) + (1 − a)f(x₂).
f is concave if, for all x₁, x₂ in its domain and a ∈ [0, 1],
f(ax₁ + (1 − a)x₂) ≥ af(x₁) + (1 − a)f(x₂).
This side note is only meant to provide more details to people
interested in the topic. You do not have to remember the
definitions above!
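For the curious, the formal definition can be probed numerically by testing the inequality on a grid of points and values of a. This is only a sketch under that sampling assumption: passing is evidence rather than proof, while a single failure does show that f is not convex. The small tolerance `tol` absorbs floating-point rounding; the helper name is hypothetical.

```python
def looks_convex(f, xs, steps=11, tol=1e-12):
    """Check f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) for all sampled x1, x2, a."""
    alphas = [i / (steps - 1) for i in range(steps)]   # a = 0, 0.1, ..., 1
    for x1 in xs:
        for x2 in xs:
            for a in alphas:
                lhs = f(a * x1 + (1 - a) * x2)
                rhs = a * f(x1) + (1 - a) * f(x2)
                if lhs > rhs + tol:
                    return False   # counterexample found: definitely not convex
    return True

grid = [x / 2 for x in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
print(looks_convex(lambda x: 2 * x**2, grid))   # True
print(looks_convex(lambda x: 2**x, grid))       # True
print(looks_convex(lambda x: -x**2, grid))      # False (this one is concave)
```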
Predicting data with functions
[Figure: four scatter plots of data points, y against x]
Linear functions
A linear function is a function f : R → R with
f(x) = mx + b
where m and b are real numbers.
m is called the slope of the function.
Examples
f(x) = ½x − 1
f(x) = −2x + 3
f(x) = 1
Geometric interpretation of m and b
f(x) = mx + b
For each step of one unit to the right, the graph goes m units up (if m > 0).
If m is negative, say m = −q, it goes q units down instead.
b is the y-coordinate of the point where the graph of f intersects the y-axis.
[Figure: f(x) = ½x − 1]
Fitting a linear function through two data points
[Figure: scatter plot of data points, y against x]
It is often useful to find a linear function that approximates
the data ‘best’.
For today, we only discuss how to fit a linear function through
two data points.
Example
Finding a linear function passing through (1, 3) and (2, 4):
Option 1.
A linear function is of the form f (x) = mx + b.
Passing through the points means f (1) = 3 and f (2) = 4, i.e.
3 = m × 1 + b,
4 = m × 2 + b.
We will discuss later in the course how to solve such
equations. For now, just note that b = 3 − m by the first
equation and b = 4 − 2m by the second. Both can hold only if
4 − 2m = 3 − m,
and thus m = 1. Substituting m = 1 into either of the
equations above yields b = 2.
Finding a linear function passing through (1, 3) and (2, 4):
Option 2. Use the following formula for the slope:
Let (x₁, y₁) and (x₂, y₂) be two points with x₁ ≠ x₂. The
slope of the linear function passing through both points is
$$m = \frac{y_2 - y_1}{x_2 - x_1}.$$
To find m: the formula yields
$$m = \frac{4 - 3}{2 - 1} = 1.$$
To find b: plugging the first point into y = mx + b yields
3 = m × 1 + b = 1 + b
(as we found m = 1). Hence b = 2.
Our final formula for the linear function: f (x) = x + 2.
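Option 2 translates directly into code. The sketch below assumes integer (or fractional) coordinates so that `Fraction` gives exact answers; the helper name `line_through` is hypothetical. It also computes the zero −b/m, which exists whenever m ≠ 0.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Slope m and intercept b of the linear function through p1 = (x1, y1), p2 = (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x1 == x2: the slope formula is undefined")
    m = Fraction(y2 - y1, x2 - x1)  # slope (y2 − y1)/(x2 − x1)
    b = y1 - m * x1                 # from y1 = m*x1 + b
    return m, b

m, b = line_through((1, 3), (2, 4))
print(m, b)     # 1 2, i.e. f(x) = x + 2
print(-b / m)   # -2, the zero of f (valid because m != 0)
```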
Properties of linear functions
Let f be a linear function (i.e. f(x) = mx + b).
Zeroes? If m ≠ 0, f has exactly one zero:
mx + b = 0 if and only if x = −b/m.
If m = 0, we either have no zeroes (if b ≠ 0) or every real x is a
zero (if b = 0).
Bijection? If m ≠ 0, f is bijective (find an inverse function!). If
m = 0, the function is not injective and the image f(X) is {b}.
Convex/Concave? Linear functions are both convex and concave.
Example: Market of watermelons
In the n-th year: price $P_n$, supply $Q_n$.
Experience: When $Q_n$ increases, $P_n$ decreases. When $P_n$
increases, $Q_{n+1}$ increases.
Mathematical model:
$$\begin{cases} P_n = a - bQ_n, \\ Q_{n+1} = c + dP_n, \end{cases}$$
where a, b, c, d are parameters to be determined.
Fit the model with the samples:
Year | 2000  | 2001  | 2002
P    | $P_1$ | $P_2$ | $P_3$
Q    | $Q_1$ | $Q_2$ | $Q_3$
Find a, b, c, d, solve for $P_n$, and predict the behaviour of $P_n$ as
n → ∞.
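A sketch of the whole pipeline, using made-up numbers because the slide leaves P₁, P₂, P₃ and Q₁, Q₂, Q₃ unspecified. The values below are hypothetical and were chosen so that the fitted model also reproduces P₃. The price equations for 2000 and 2001 give a and b, and the supply equations Q₂ = c + dP₁ and Q₃ = c + dP₂ give c and d. Substituting the second model equation into the first gives P_{n+1} = (a − bc) − bd·P_n, whose fixed point is P* = (a − bc)/(1 + bd); the iteration approaches P* when |bd| < 1.

```python
# Hypothetical sample data (not from the slides): prices P1..P3 and supplies Q1..Q3
P = [10, 8, 9]
Q = [5, 7, 6]

# Fit P_n = a − b*Q_n from years 1 and 2, and Q_{n+1} = c + d*P_n from (P1, Q2), (P2, Q3)
b = (P[0] - P[1]) / (Q[1] - Q[0])
a = P[0] + b * Q[0]
d = (Q[2] - Q[1]) / (P[1] - P[0])
c = Q[1] - d * P[0]

def next_price(p):
    """P_{n+1} = (a − b c) − b d P_n, obtained by eliminating Q_{n+1}."""
    return (a - b * c) - b * d * p

p = P[2]
for _ in range(60):   # iterate the model forward 60 years from P3
    p = next_price(p)

equilibrium = (a - b * c) / (1 + b * d)   # fixed point; approached when |b d| < 1
print(a, b, c, d)                         # 15.0 1.0 2.0 0.5
print(round(p, 6), round(equilibrium, 6)) # 8.666667 8.666667
```

Because the multiplier −bd is negative here, successive prices overshoot above and below P* while the gap shrinks by a factor |bd| = 0.5 each year.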
Polynomial functions
A polynomial function of degree n is a function with
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
where $a_0, \dots, a_n$ are real numbers and $a_n \neq 0$.
Examples
f(x) = x³ + 2x² − x + ½ is a polynomial function of degree 3
f(x) = −0.3 is a polynomial function of degree 0
f(x) = ½(x − 2)² + 3 is a polynomial function of degree 2
Linear functions (with m ≠ 0) are polynomials of degree 1
Sigma notation. Recall that we can also write
$$f(x) = \sum_{i=0}^{n} a_i x^i.$$
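The Sigma form translates directly into code. Below, a polynomial is stored as its list of coefficients [a₀, a₁, ..., aₙ], lowest degree first (an assumption of this sketch). The second function uses Horner's rule, a standard rearrangement a₀ + x(a₁ + x(a₂ + ⋯)) that gives the same value with only n multiplications; it is not from the slides.

```python
def poly_eval(coeffs, x):
    """Evaluate f(x) = Σ_{i=0}^{n} coeffs[i] * x**i (coeffs[0] is a_0)."""
    return sum(a_i * x**i for i, a_i in enumerate(coeffs))

def poly_eval_horner(coeffs, x):
    """Same value via Horner's rule: a_0 + x(a_1 + x(a_2 + ...))."""
    result = 0
    for a_i in reversed(coeffs):
        result = result * x + a_i
    return result

# f(x) = x³ + 2x² − x + ½, stored lowest degree first
coeffs = [0.5, -1, 2, 1]
print(poly_eval(coeffs, 2), poly_eval_horner(coeffs, 2))   # 14.5 14.5
```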
Approximation with polynomials
In Theory
Suppose we are given n + 1 data points $(x_0, y_0), \dots, (x_n, y_n)$.
If the $x_i$ are distinct, we can always find a polynomial of degree at most n that
passes through all the data points.
This requires solving a system of n + 1 linear equations. We
will learn how to do that in a couple of weeks.
In Practice
For very large data sets (large n) this might be unnecessarily
complicated and inefficient
We might permit some “noise” into the model – find simple
functions that are a reasonable approximation of data
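The system of n + 1 linear equations can be previewed in code. Row i of the system says a₀ + a₁xᵢ + ⋯ + aₙxᵢⁿ = yᵢ, and the matrix of powers xᵢʲ is known as a Vandermonde matrix. The sketch assumes NumPy is installed and uses `numpy.vander` and `numpy.linalg.solve`; the helper name and the example points are made up.

```python
import numpy as np

def interpolate(xs, ys):
    """Coefficients a_0..a_n of the polynomial of degree at most n through n+1 points."""
    V = np.vander(np.asarray(xs, dtype=float), increasing=True)  # V[i, j] = x_i**j
    return np.linalg.solve(V, np.asarray(ys, dtype=float))      # raises LinAlgError if x_i repeat

coeffs = interpolate([0, 1, 2], [1, 3, 7])
print(np.round(coeffs, 10))   # [1. 1. 1.], i.e. f(x) = 1 + x + x²
```

For large n this matrix becomes badly conditioned, which is one practical reason, besides cost, to prefer simpler approximating functions.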
How about these data points?
[Figure: scatter plot of data points, y against x]
The data does not look like a linear function.
Might it be quadratic?
[Figure: two plots of the data, y against x and y^(1/2) against x]
Finding the ‘best’ approximation with quadratic functions is
possible but complicated.
For now, we only try to find a quadratic function f(x) = ax² + c
through two of the data points.
Exercise
Find real numbers a and c such that the function
f(x) = ax² + c passes through the points (2, 4) and (4, 10).
We want f(2) = 4 and f(4) = 10, i.e.
a × 2² + c = 4,
a × 4² + c = 10.
Hence c = 4 − 4a and c = 10 − 16a. The only way this works is if
4 − 4a = 10 − 16a.
Thus a = 1/2. With c = 4 − 4a we get c = 2.
The answer is: f(x) = ½x² + 2.
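The same elimination works for any two points with x₁² ≠ x₂²: subtracting the equations removes c and leaves a = (y₂ − y₁)/(x₂² − x₁²). A minimal sketch with a hypothetical helper, using `Fraction` for exact answers (so it assumes integer coordinates):

```python
from fractions import Fraction

def fit_ax2_plus_c(p1, p2):
    """Solve a*x1² + c = y1 and a*x2² + c = y2 for (a, c)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1**2 == x2**2:
        raise ValueError("x1² == x2²: the two equations do not determine a and c")
    a = Fraction(y2 - y1, x2**2 - x1**2)   # subtracting the equations eliminates c
    c = y1 - a * x1**2                     # back-substitute into the first equation
    return a, c

a, c = fit_ax2_plus_c((2, 4), (4, 10))
print(a, c)   # 1/2 2, i.e. f(x) = ½x² + 2
```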