
Mathematical Foundations for Data Science

Ian Wanless (standing in for Nick Wormald)

MAT9004 - Week 1
Topics for week 1

Notation
What is a set?
Notation for sums and products

Functions
What is a function?
Zeroes
Inverse functions
Convexity

Some important functions


Linear functions
Polynomials

Sets

A set is an unordered collection of distinct ‘objects’.

Examples:
{1, 2, 3, 4, 5} is the set of integers 1, . . . , 5.
{a, b, c, d} is the set of letters a, . . . , d.
{{a, b}, {c, d}} is the set of sets {a, b}, {c, d}.

Some important sets:


N is the set of natural numbers {0, 1, 2, . . .}
Z is the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
R is the set of real numbers

R includes all the numbers on a ‘standard’ number line, e.g. 5, −3, 1/4, √17, π.

We write x ∈ S if x is an element of the set S.
We write x ∉ S if x is not an element of S.

Examples:
1 ∈ {1, 2, 3, 4, 5}.
−3 ∈ Z.
1/3 ∉ N.

The set {x ∈ S : P(x)} contains all elements of S for which P(x) is true.

Examples:
{x ∈ N : x is even} = {0, 2, 4, 6, 8, . . .}.
{x ∈ {1, . . . , 6} : x is a prime number} = {2, 3, 5}.
{x ∈ Z : −1 ≤ x ≤ 1} = {−1, 0, 1}.
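Set-builder notation has a close analogue in Python's set comprehensions. The sketch below (Python is only our illustration language, not part of the unit) rebuilds the three examples. A computer cannot store the infinite sets N or Z, so a finite range stands in for each of them, and the helper `is_prime` is a hypothetical name introduced here.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# {x ∈ N : x is even}, truncated to the finite window 0..9
evens = {x for x in range(10) if x % 2 == 0}

# {x ∈ {1, ..., 6} : x is a prime number}
primes = {x for x in range(1, 7) if is_prime(x)}

# {x ∈ Z : -1 <= x <= 1}, searching the finite window -10..10 of Z
small = {x for x in range(-10, 11) if -1 <= x <= 1}

print(sorted(evens))   # [0, 2, 4, 6, 8]
print(sorted(primes))  # [2, 3, 5]
print(sorted(small))   # [-1, 0, 1]
```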

Intervals
For real numbers a and b with a < b:
The set [a, b] contains all real numbers between a and
b (including a, b); [a, b] is called the closed interval
from a to b.
The set (a, b) contains all elements of [a, b] except for
a and b; (a, b) is called the open interval from a to b.

In set notation:

[a, b] = {x ∈ R : a ≤ x ≤ b}
(a, b) = {x ∈ R : a < x < b}
Similarly, we define:
[a, b) = {x ∈ R : a ≤ x < b}
(a, b] = {x ∈ R : a < x ≤ b}
[a, ∞) = {x ∈ R : x ≥ a} and (a, ∞) = {x ∈ R : x > a}
(−∞, b) = {x ∈ R : x < b} and (−∞, b] = {x ∈ R : x ≤ b}
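As a rough illustration, interval membership comes down to two comparisons. The helper `in_interval` and its flags `closed_left`/`closed_right` are hypothetical names chosen for this sketch; infinite endpoints are represented with Python's `math.inf` and are always treated as open, since ∞ is not a real number.

```python
import math

def in_interval(x: float, a: float, b: float,
                closed_left: bool = True, closed_right: bool = True) -> bool:
    """Return True if x lies in the interval from a to b.

    closed_left selects '[' (True) or '(' (False); closed_right selects
    ']' (True) or ')' (False). Infinite endpoints are forced to be open.
    """
    if math.isinf(a):
        closed_left = False
    if math.isinf(b):
        closed_right = False
    left_ok = a <= x if closed_left else a < x
    right_ok = x <= b if closed_right else x < b
    return left_ok and right_ok

print(in_interval(2, 2, 5))                      # True:  2 ∈ [2, 5]
print(in_interval(2, 2, 5, closed_left=False))   # False: 2 ∉ (2, 5]
print(in_interval(3.157, 3, math.inf, False))    # True:  3.157 ∈ (3, ∞)
```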
Test yourself with the following exercise:
Exercise
Are the following statements true or false:
{3, 1, 2} = {1, 2, 3}? True, sets are unordered!
0 ∈ N? True [a]
3/2 ∉ Z? True, 3/2 is not an integer
3.157 ∉ (3, ∞)? False
a ∈ {{a, b}, {c, d}}? False [b]
List all elements of the following sets:
{x ∈ Z : x ∈ {−1, 0, 1.5}} = {−1, 0}
{x ∈ N : x is odd and x ≤ 5} = {1, 3, 5}

[a] This is a convention in this unit; 0 ∉ N in some books.
[b] a ∈ {a, b} and {a, b} ∈ {{a, b}, {c, d}}, but a ∉ {{a, b}, {c, d}}.
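Footnote [b] can be checked in Python, with one caveat: elements of a Python set must be hashable, so the inner sets are written as `frozenset`s (immutable sets). This is a sketch of the idea, not a general model of set theory.

```python
# {{a, b}, {c, d}}: the inner sets must be frozensets to be set elements
inner1 = frozenset({"a", "b"})
inner2 = frozenset({"c", "d"})
outer = {inner1, inner2}

print("a" in inner1)   # True:  a ∈ {a, b}
print(inner1 in outer) # True:  {a, b} ∈ {{a, b}, {c, d}}
print("a" in outer)    # False: a ∉ {{a, b}, {c, d}}

# Sets are unordered: {3, 1, 2} = {1, 2, 3}
print({3, 1, 2} == {1, 2, 3})   # True
```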

Sigma notation
If a and b are integers with a ≤ b and f is a function, then
$$\sum_{x=a}^{b} f(x) = f(a) + f(a+1) + \cdots + f(b).$$

Examples.
$$\sum_{x=1}^{3} \frac{1}{x} = 1 + \frac{1}{2} + \frac{1}{3}, \qquad \sum_{x=2}^{4} x^2 = 4 + 9 + 16.$$

$\sum_{x \in S} f(x)$ denotes the sum of f(x) over all x ∈ S.

Example. $\sum_{x \in \{-2, 2\}} x^2 = (-2)^2 + 2^2 = 8$
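In Python, `range(a, b + 1)` runs over a, a + 1, ..., b, so the built-in `sum` mirrors Sigma notation directly, and iterating over a set covers the $\sum_{x \in S}$ form. The sketch uses `fractions.Fraction` (an illustrative choice) so that 1 + 1/2 + 1/3 is computed exactly rather than in floating point.

```python
from fractions import Fraction

# Σ_{x=a}^{b} f(x): range(a, b + 1) includes both endpoints a and b
s1 = sum(Fraction(1, x) for x in range(1, 4))   # 1 + 1/2 + 1/3
s2 = sum(x**2 for x in range(2, 5))             # 4 + 9 + 16

# Σ_{x∈S} f(x): iterate over the set S directly
s3 = sum(x**2 for x in {-2, 2})                 # (-2)^2 + 2^2

print(s1, s2, s3)   # 11/6 29 8
```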
Product notation

If a and b are integers with a ≤ b and f is a function, then
$$\prod_{x=a}^{b} f(x) = f(a) \times f(a+1) \times \cdots \times f(b).$$

Examples.
$$\prod_{x=1}^{4} x = 1 \times 2 \times 3 \times 4 = 24, \qquad \prod_{x=3}^{5} (x^2 - 1) = 8 \times 15 \times 24.$$

Again, $\prod_{x \in S} f(x)$ denotes the product of f(x) over all x ∈ S.
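The analogous sketch for products uses `math.prod` (available from Python 3.8). By convention an empty product is 1, which is also what `math.prod` returns for an empty input.

```python
import math

p1 = math.prod(range(1, 5))                     # 1 × 2 × 3 × 4
p2 = math.prod(x**2 - 1 for x in range(3, 6))   # 8 × 15 × 24
print(p1, p2)   # 24 2880
```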

Exercise
Rewrite without using Sigma or product notation.
$\sum_{x=1}^{3} a^x = a + a^2 + a^3$
$\sum_{x \in \{2, 5, 7\}} (2x - 5) = (-1) + 5 + 9 = 13$
$\prod_{y=4}^{5} \frac{1}{y} = \frac{1}{4} \times \frac{1}{5} = \frac{1}{20}$
$\sum_{x=1}^{5} 1 = 1 + 1 + 1 + 1 + 1 = 5$

Exercise
Rewrite in Sigma notation
$1 + 3 + 5 + 7 = \sum_{x=1}^{4} (2x - 1)$
$1 - 2 + 3 - 4 + \cdots - 10 = \sum_{i=1}^{10} (-1)^{i+1}\, i$
Rewrite in product notation
$\frac{1}{4} \times \frac{1}{9} \times \frac{1}{16} = \prod_{x=2}^{4} \frac{1}{x^2}$
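One way to double-check answers like these is to evaluate both sides numerically. The assertions below are a sketch covering the exercises that do not involve the symbol a, with exact fractions via `fractions.Fraction`; a passing check on concrete numbers is evidence, not a proof.

```python
import math
from fractions import Fraction

# Evaluate both sides of each rewrite and compare them
assert sum(2 * x - 5 for x in {2, 5, 7}) == 13
assert math.prod(Fraction(1, y) for y in range(4, 6)) == Fraction(1, 20)
assert sum(1 for x in range(1, 6)) == 5
assert 1 + 3 + 5 + 7 == sum(2 * x - 1 for x in range(1, 5))
assert sum((-1) ** (i + 1) * i for i in range(1, 11)) == 1 - 2 + 3 - 4 + 5 - 6 + 7 - 8 + 9 - 10
assert Fraction(1, 4) * Fraction(1, 9) * Fraction(1, 16) == \
    math.prod(Fraction(1, x**2) for x in range(2, 5))
print("all rewrites check out")
```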

Why do we need functions?

Functions are one of the most fundamental concepts of mathematics.

Find structure in data:
Describe data with a suitable function
Make predictions based on the function
Optimization:
Describe the cost/error/benefit, depending on some parameters, by a function
Optimize (minimize/maximize) the function for the best outcome

Functions
Suppose that X and Y are two sets.

A function f : X → Y assigns to each x ∈ X exactly one f(x) ∈ Y.
f : X → Y means f is a function from X to Y
X is called the domain of f
Y is called the codomain of f

Examples.
The function f : R → R, f(x) = x², assigns to each real number x its square x².
We can also define a function by listing its values:
x      1  2  3
f(x)   1  1  2
This defines a function f : {1, 2, 3} → {1, 2}.

Function graphs
The graph of f : X → Y is the set {(x, f(x)) : x ∈ X}.

If f : R → R, we usually plot x (horizontal) against f(x) (vertical) in a Cartesian coordinate system:

Figure: The graph of f(x) = x²

The image f(X) of a function f : X → Y is the set {f(x) : x ∈ X}.

Images in the previous examples:
The function given by the table
x      1  2  3
f(x)   1  1  2
has image f(X) = {1, 2}.
f(x) = x² has image f(X) = [0, ∞).
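A function on a finite domain, like the one defined by the table, can be sketched as a Python dictionary: each key (an element of X) maps to exactly one value, matching the requirement that f assigns exactly one f(x) to each x. The image is then the set of values. The variable names are illustrative.

```python
# f : {1, 2, 3} -> {1, 2}, defined by listing its values
f = {1: 1, 2: 1, 3: 2}

domain = set(f)              # X = {1, 2, 3}
image = set(f.values())      # f(X) = {f(x) : x ∈ X}

print(sorted(domain), sorted(image))   # [1, 2, 3] [1, 2]
```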

We sometimes just write f(x) = ··· without specifying a domain or codomain:

If we write ‘f(x) = ··· with domain X’, then f : X → f(X); that is, the codomain is chosen to be the image.
If the domain is clear from the context, we do not specify it; typically X = R in these cases.
Example.
f(x) = x² means f : X → Y with domain X = R and codomain Y = [0, ∞).

Zeroes

The zeroes (or roots) of a function f are all points x in its domain with f(x) = 0.

{x ∈ X : f(x) = 0} is the set of all zeroes of f : X → Y.


The zeroes are the values of x where the graph of f meets the x-axis.

Examples.
The function in the plot has three zeroes: −3, −1 and 1.
Figure: a function with zeroes at x = −3, −1 and 1
The function f(x) = x² − 1 has two zeroes: −1 and 1.
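Zeroes can rarely be read off exactly from a plot. A standard numerical technique, not covered in these slides, is bisection: if a continuous f changes sign on [lo, hi], a zero lies in between, and repeatedly halving the interval pins it down. The function `bisect_zero` below is a hypothetical helper written for this sketch.

```python
def bisect_zero(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find a zero of a continuous f in [lo, hi] by bisection.

    Requires f(lo) and f(hi) to have opposite signs (or one of them to be 0).
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        if f_lo * f_mid < 0:   # sign change in the left half
            hi = mid
        else:                  # sign change in the right half
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

def f(x):
    return x**2 - 1

print(bisect_zero(f, -2, 0), bisect_zero(f, 0, 2))   # -1.0 1.0
```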

Inverse Functions
f⁻¹ : Y → X is called the inverse function of f : X → Y if
f⁻¹(f(x)) = x and f(f⁻¹(y)) = y for all x ∈ X and y ∈ Y.

Think of f⁻¹ as a way to ‘undo’ f.

Inverse functions do not always exist.
Inverse functions are unique if they exist.
Be careful: f⁻¹(x) does not mean 1/f(x).
To find f⁻¹, solve the equation f(x) = y for x.
Example.
To find the inverse of f(x) = 2x + 1:
$$y = 2x + 1, \qquad y - 1 = 2x, \qquad x = \frac{y - 1}{2}.$$
Hence $f^{-1}(y) = \frac{y - 1}{2}$.
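The defining property f⁻¹(f(x)) = x and f(f⁻¹(y)) = y can be spot-checked in code. This sketch tests the inverse of f(x) = 2x + 1 on a handful of integers, using exact fractions; checking finitely many values cannot prove the identity, but it does catch algebra slips. The function names are illustrative.

```python
from fractions import Fraction

def f(x):
    return 2 * x + 1

def f_inv(y):
    # Obtained by solving y = 2x + 1 for x
    return Fraction(y - 1, 2)

# Check f⁻¹(f(v)) = v and f(f⁻¹(v)) = v on sample values
for v in range(-5, 6):
    assert f_inv(f(v)) == v
    assert f(f_inv(v)) == v
print("f_inv undoes f on all samples")
```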
Bijections
A function f : X → Y is called:
injective (or one-to-one) if f(x1) ≠ f(x2) for all distinct x1, x2 ∈ X;
surjective if Y = f(X), that is, if for every y ∈ Y there is an x ∈ X with f(x) = y;
bijective if it is both injective and surjective.

Examples.
f : R → R with f(x) = 2x + 1 is bijective
f : R → R with f(x) = x² is neither injective nor surjective
f : [0, ∞) → [0, ∞) with f(x) = x² is bijective

f has an inverse function if and only if it is bijective.
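For functions on finite sets, stored here as Python dictionaries (an illustrative choice), injectivity and surjectivity reduce to comparing sets. The helpers `is_injective`, `is_surjective` and `inverse` are hypothetical names for this sketch; `inverse` only builds f⁻¹ when f is bijective, in line with the statement above.

```python
def is_injective(f: dict) -> bool:
    # Distinct inputs have distinct outputs <=> #distinct outputs = #inputs
    return len(set(f.values())) == len(f)

def is_surjective(f: dict, codomain: set) -> bool:
    # Every element of the codomain is hit, i.e. f(X) = Y
    return set(f.values()) == set(codomain)

def inverse(f: dict, codomain: set) -> dict:
    """Return f⁻¹ as a dict; raises ValueError unless f is bijective."""
    if not (is_injective(f) and is_surjective(f, codomain)):
        raise ValueError("f is not bijective, so it has no inverse")
    return {y: x for x, y in f.items()}

table = {1: 1, 2: 1, 3: 2}                  # f : {1, 2, 3} -> {1, 2}
print(is_injective(table), is_surjective(table, {1, 2}))   # False True

g = {1: "b", 2: "c", 3: "a"}               # a bijection {1, 2, 3} -> {a, b, c}
print(inverse(g, {"a", "b", "c"}))          # {'b': 1, 'c': 2, 'a': 3}
```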

Exercise
Consider the functions f1 : R → R and f2 : R → R below.
What are the zeroes of the functions?
Are the functions injective? Are they surjective?

Figure: (a) f1(x) = 2^x, (b) f2(x) = x³ + x² − 5x + 3

Exercise

Consider the function f : [0, ∞) → [0, ∞) with f(x) = 2x².

Find the inverse function of f.
f and f⁻¹ are plotted below. Can you see a relation between the two graphs?

Figure: (a) plot of f, (b) plot of f⁻¹


Convex functions

A function is convex if, for any two points on its graph, the straight line segment between both points lies entirely on or above the graph of the function.

Example 1
Figure: (a) f(x) = 2x², (b) f(x) = 2^x

Example 2. A function with a straight-line graph is convex.

Concave functions

A function is concave if, for any two points on its graph, the straight line segment between both points lies entirely on or below the graph of the function.

Example
Figure: the graph of a concave function

Side note

The formal definition of convexity is the following:

f is convex if, for all x1, x2 in its domain and a ∈ [0, 1],
$$f(a x_1 + (1 - a) x_2) \le a f(x_1) + (1 - a) f(x_2).$$

f is concave if, for all x1, x2 in its domain and a ∈ [0, 1],
$$f(a x_1 + (1 - a) x_2) \ge a f(x_1) + (1 - a) f(x_2).$$

This side note is only meant to provide more details for those interested in the topic. You do not have to remember the definitions above!
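The inequality in the side note suggests a quick numerical experiment: pick random x1, x2 and a, and see whether the inequality ever fails. The sketch below (the name `violates_convexity` and the tolerance `eps` are assumptions of this example) can only refute convexity; finding no violation is not a proof. Since f is concave exactly when −f is convex, the same helper can also test concavity.

```python
import random

def violates_convexity(f, lo: float, hi: float, trials: int = 10_000,
                       seed: int = 0, eps: float = 1e-9) -> bool:
    """Randomly search x1, x2 in [lo, hi] and a in [0, 1] for a case where
    f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2) fails.

    True means a counterexample was found; False only means none was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        a = rng.random()
        lhs = f(a * x1 + (1 - a) * x2)
        rhs = a * f(x1) + (1 - a) * f(x2)
        if lhs > rhs + eps:   # eps absorbs floating-point rounding
            return True
    return False

print(violates_convexity(lambda x: 2 * x**2, -2, 2))   # False
print(violates_convexity(lambda x: 2**x, -2, 2))       # False
print(violates_convexity(lambda x: -x**2, -2, 2))      # True (concave, not convex)
```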

Predicting data with functions

Figure: four scatter plots of data points (x, y)

Linear functions

A linear function is a function f : R → R with

f (x) = mx + b

where m and b are real numbers.


m is called the slope of the function

Examples
f(x) = ½x − 1
f (x) = −2x + 3
f (x) = 1

Geometric interpretation of m and b
f(x) = mx + b
For each step to the right, take m steps up (if m > 0).
If m is negative, say m = −q, take q steps down instead.
b is the y-coordinate of the point where the graph of f intersects the y-axis.

Figure: f(x) = ½x − 1

25 / 35
Fitting a linear function through two data points

Figure: scatter plot of data points (x, y)

It is often useful to find a linear function that approximates the data ‘best’.
For today, we only discuss how to fit a linear function through two data points.

Example
Finding a linear function passing through (1, 3) and (2, 4):

Option 1.
A linear function is of the form f (x) = mx + b.
Passing through the points means f (1) = 3 and f (2) = 4, i.e.
3 = m × 1 + b,
4 = m × 2 + b.

We will discuss later in the course how to solve such equations. For now, just note that b = 3 − m by the first equation and b = 4 − 2m by the second. Both can hold only if
4 − 2m = 3 − m,
and thus m = 1. Substituting m = 1 into either of the equations above yields b = 2.

Option 2. Use the following formula for the slope:

Let (x1, y1) and (x2, y2) be two points with x1 ≠ x2. The slope of the linear function passing through both points is
$$m = \frac{y_2 - y_1}{x_2 - x_1}.$$

To find m: the formula yields
$$m = \frac{4 - 3}{2 - 1} = 1.$$
To find b: plugging the first point into y = mx + b yields
3 = m × 1 + b = 1 + b
(as we found m = 1). Hence b = 2.
Our final formula for the linear function: f(x) = x + 2.
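Option 2 translates directly into a few lines of Python. In this sketch, `line_through` is a hypothetical helper, and the coordinates are assumed to be integers (or `Fraction`s) so that the slope is computed exactly.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2.

    Slope m = (y2 - y1) / (x2 - x1); then b = y1 - m*x1 from the first point.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x1 = x2: no function y = mx + b passes through both points")
    m = Fraction(y2 - y1, x2 - x1)
    b = y1 - m * x1
    return m, b

m, b = line_through((1, 3), (2, 4))
print(m, b)   # 1 2
```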
Properties of linear functions

Let f be a linear function (i.e. f(x) = mx + b).

Zeroes? If m ≠ 0, f has exactly one zero:
$$mx + b = 0 \text{ if and only if } x = -\frac{b}{m}.$$
If m = 0, we either have no zeroes (if b ≠ 0) or every real x is a zero (if b = 0).

Bijection? If m ≠ 0, f is bijective (find an inverse function!). If m = 0, the function is not injective and its image f(X) is {b}.

Convex/Concave? Linear functions are both convex and concave.
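The case analysis for zeroes can be written as a small function. This is a sketch with a made-up return convention ("one", "none", "all"), using exact fractions; the first call uses the earlier example f(x) = ½x − 1, whose zero is x = 2.

```python
from fractions import Fraction

def linear_zeroes(m, b):
    """Describe the zeroes of f(x) = m*x + b.

    Returns ("one", x0) if m != 0, ("none", None) if m == 0 and b != 0,
    and ("all", None) if m == 0 and b == 0 (every real x is a zero).
    """
    if m != 0:
        return "one", Fraction(-b) / m
    return ("none", None) if b != 0 else ("all", None)

print(linear_zeroes(Fraction(1, 2), -1))   # ('one', Fraction(2, 1))
print(linear_zeroes(0, 3))                 # ('none', None)
print(linear_zeroes(0, 0))                 # ('all', None)
```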

Example: Market of watermelons
In the n-th year: price Pn, supply Qn.
Experience: when Qn increases, Pn decreases; when Pn increases, Qn+1 increases.

Mathematical model:
$$\begin{cases} P_n = a - b Q_n, \\ Q_{n+1} = c + d P_n, \end{cases}$$
where a, b, c, d are parameters to be determined.

Fit the model with the samples:

Data:
Year   2000  2001  2002
P      P1    P2    P3
Q      Q1    Q2    Q3

Find a, b, c, d, solve for Pn, and predict the behaviour of Pn as n → ∞.
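One possible route, sketched under stated assumptions: the samples below are made-up numbers, not real market data. P1 = a − bQ1 and P2 = a − bQ2 determine a and b; Q2 = c + dP1 and Q3 = c + dP2 determine c and d (this needs Q1 ≠ Q2 and P1 ≠ P2). Substituting the second model equation into the first gives P_{n+1} = (a − bc) − bd·P_n, which settles to (a − bc)/(1 + bd) when |bd| < 1.

```python
from fractions import Fraction as F

# Hypothetical samples, invented for illustration (not real data)
P = [F(8), F(5), F(13, 2)]   # P1, P2, P3
Q = [F(4), F(10), F(7)]      # Q1, Q2, Q3

# P_n = a - b*Q_n for n = 1, 2: two linear equations in a and b
b = (P[0] - P[1]) / (Q[1] - Q[0])
a = P[0] + b * Q[0]
# Q_{n+1} = c + d*P_n for n = 1, 2: two linear equations in c and d
d = (Q[2] - Q[1]) / (P[1] - P[0])
c = Q[1] - d * P[0]
print(a, b, c, d)   # 10 1/2 2 1

# The unused sample P3 should agree with the fitted model
assert P[2] == a - b * Q[2]

def simulate(p_first, years):
    """Iterate the model from the first year's price for the given number of years."""
    prices = [p_first]
    for _ in range(years):
        q_next = c + d * prices[-1]      # next year's supply
        prices.append(a - b * q_next)    # next year's price
    return prices

prices = simulate(P[0], 30)
limit = (a - b * c) / (1 + b * d)        # fixed point, valid here since |b*d| < 1
print(limit)                             # 6
print(abs(prices[-1] - limit) < F(1, 10**6))   # True
```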

Polynomial functions
A polynomial function of degree n is a function with
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
where $a_0, \ldots, a_n$ are real numbers and $a_n \neq 0$.

Examples
f(x) = x³ + 2x² − x + ½ is a polynomial function of degree 3
f(x) = −0.3 is a polynomial function of degree 0
f(x) = ½(x − 2)² + 3 is a polynomial function of degree 2
Linear functions (with m ≠ 0) are polynomials of degree 1
Sigma notation. Recall that we can also write
$$f(x) = \sum_{i=0}^{n} a_i x^i.$$
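The Sigma form suggests evaluating a polynomial directly from its list of coefficients. The sketch also shows Horner's rule, a standard rearrangement (not from the slides) that gives the same value with fewer multiplications; both function names are illustrative.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate f(x) = sum_{i=0}^{n} a_i x^i, with coeffs = [a_0, a_1, ..., a_n]."""
    return sum(a_i * x**i for i, a_i in enumerate(coeffs))

def horner(coeffs, x):
    """Same value via Horner's rule: a_0 + x*(a_1 + x*(a_2 + ...))."""
    result = 0
    for a_i in reversed(coeffs):
        result = result * x + a_i
    return result

# f(x) = x^3 + 2x^2 - x + 1/2, stored as [a_0, a_1, a_2, a_3]
f = [Fraction(1, 2), -1, 2, 1]
print(poly_eval(f, 2), horner(f, 2))   # 29/2 29/2
```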

Approximation with polynomials

In Theory
Suppose we are given n + 1 data points (x0, y0), . . . , (xn, yn).
We can always try to find a polynomial of degree n that passes through all data points.
This requires solving a system of n + 1 linear equations. We will learn how to do that in a couple of weeks.

In Practice
For very large data sets (large n) this might be unnecessarily complicated and inefficient.
We might permit some “noise” in the model: find simple functions that are a reasonable approximation of the data.
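The slides obtain the polynomial by solving a linear system. As an alternative sketch, Lagrange's interpolation formula (a different, standard technique, swapped in here because linear systems come later) gives the same polynomial of degree at most n, provided the x-values are distinct. The helper name `lagrange_eval` and the three data points are hypothetical.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate at x the polynomial of degree <= n through the n + 1 points.

    Lagrange's formula: sum_j y_j * prod_{k != j} (x - x_k) / (x_j - x_k).
    The x-coordinates of the points must be distinct integers here.
    """
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        term = Fraction(yj)
        for k, (xk, _) in enumerate(points):
            if k != j:
                term *= Fraction(x - xk, xj - xk)
        total += term
    return total

pts = [(0, 1), (1, 2), (2, 5)]   # hypothetical data; these lie on y = x^2 + 1
print([lagrange_eval(pts, x) for x in range(4)])   # the values 1, 2, 5, 10
```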

How about these data points?

Figure: scatter plot of data points (x, y)

The data does not look like a linear function.
Might it be quadratic?

Figure: the data plotted as y against x (left) and as y^(1/2) against x (right)

Finding the ‘best’ approximation with quadratic functions is possible but complicated.
For now, we only try to find a quadratic function f(x) = ax² + c through two of the data points.

Exercise
Find real numbers a and c such that the function f(x) = ax² + c passes through the points (2, 4) and (4, 10).

We want f(2) = 4 and f(4) = 10, i.e.
$$a \times 2^2 + c = 4, \qquad a \times 4^2 + c = 10.$$

Hence c = 4 − 4a and c = 10 − 16a. The only way this works is if
4 − 4a = 10 − 16a.

Thus a = 1/2. With c = 4 − 4a we get c = 2.

The answer is: f(x) = ½x² + 2.
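The same elimination step can be sketched in code for any two points with different x². Here `fit_ax2_plus_c` is a hypothetical helper, and exact fractions keep a = 1/2 from turning into a float.

```python
from fractions import Fraction

def fit_ax2_plus_c(p1, p2):
    """Return (a, c) so that f(x) = a*x^2 + c passes through p1 and p2.

    Subtracting a*x1^2 + c = y1 from a*x2^2 + c = y2 eliminates c;
    this needs x1^2 != x2^2.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1**2 == x2**2:
        raise ValueError("need x1^2 != x2^2")
    a = Fraction(y2 - y1, x2**2 - x1**2)
    c = y1 - a * x1**2
    return a, c

a, c = fit_ax2_plus_c((2, 4), (4, 10))
print(a, c)   # 1/2 2
```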
