Assignment 1

4 Questions Python
Question 1: Write a function non_increasing_tst(ts, delta) that returns a Boolean value indicating
whether the time series is 'non-increasing', i.e., successive values never increase by more
than a given threshold.
The function takes two arguments,
ts: a time series as a list of integers, containing at least two values.
delta: an integer ≥ 0 such that the positive change between two successive values in the time
series must be greater than delta for the change to be considered as an 'increase'.
The function should return True if there is no pair of consecutive values ts[i] and ts[i+1] in the
time series such that (ts[i+1] - ts[i]) > delta, or False otherwise.
You can assume that the input arguments are correctly formatted.
For example, suppose you are given a time series in the form of a list of integers, such as the daily
Covid case count leckietopia = [58, 54, 54, 52, 54, 50, 47].
The time series in leckietopia has a small increase from 52 to 54. However, if we tolerate noise of
up to 2 in the differences between successive readings, then this time series could be
considered to be consistently falling or steady, i.e., 'non-increasing'.
Here are some example calls to the function:

>>> leckietopia = [58, 54, 54, 52, 54, 50, 47]
>>> print(non_increasing_tst(leckietopia, 0))
False
>>> print(non_increasing_tst(leckietopia, 1))
False
>>> print(non_increasing_tst(leckietopia, 2))
True
>>> print(non_increasing_tst([100, 90, 80, 70, 70, 60], 0))
True
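The rule above can be sketched as a single pass over successive pairs. This is a minimal illustration under the stated assumptions (a list of at least two integers and delta ≥ 0), not a reference solution:

```python
def non_increasing_tst(ts, delta):
    """Return True if no successive change in ts is greater than delta."""
    # zip(ts, ts[1:]) pairs each value with its successor.
    # A change counts as an 'increase' only when it is strictly greater than delta.
    return all(b - a <= delta for a, b in zip(ts, ts[1:]))


leckietopia = [58, 54, 54, 52, 54, 50, 47]
print(non_increasing_tst(leckietopia, 0))  # False: 52 -> 54 is an increase of 2
print(non_increasing_tst(leckietopia, 2))  # True: an increase of exactly 2 is tolerated
```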
Question 2: We might be interested in testing whether a time series contains a particular 'pattern',
such as two decreases in value in successive values (e.g., the definition of a recession), or
a sequence of alternating increases and decreases (e.g., a steady breathing rhythm in a
patient).
To do this, we first need a way to specify the pattern to search for. We will represent a pattern
as a string, which can contain the characters 'u' for up (or increase), 'd' for down (or
decrease), and 's' for steady (no change). For example, 'dd' represents two successive
decreases in value, 'ud' represents an increase in value followed by a decrease, 'dssu'
represents a decrease in value followed by two steady values followed by an increase.

Write a function contains_tst(ts, pattern, delta) that returns whether the given time series contains
the given pattern.
The parameters of this function are as follows:
ts: a time series as a list of integers, containing at least two values
pattern: a string of one or more characters, where the possible characters that can appear in
pattern are 'u', 's' and 'd'
delta: an integer ≥ 0 such that the absolute value of the change between two successive values in
the time series must be less than or equal to delta for the two successive values to be
considered steady ('s')
The function returns True if the given pattern matches a sequence of consecutive values in ts and
False otherwise. A pattern of k characters describes the k changes between k + 1 consecutive values.
Assumptions:
You can assume that the input arguments are syntactically correct given the description on this
and on the previous slides.
Here are some example calls to the function:
>>> leckietopia = [58, 54, 54, 52, 54, 50, 47]
>>> print(contains_tst(leckietopia, 'sdud', 0))
True
>>> print(contains_tst(leckietopia, 'sdud', 2))
False
>>> print(contains_tst(leckietopia, 'sss', 2))
True
>>> print(contains_tst(leckietopia, 'dd', 0))
True
>>> print(contains_tst([1, 2, 1, 8, 1, 2], 'udud', 0))
True
>>> print(contains_tst([1, 2, 1, 8, 1, 2], 'udud', 2))
False
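One possible approach, sketched here as an illustration only, is to encode every successive change as 'u', 'd' or 's' and then look for the pattern as a substring of that encoding. The helper name encode_changes is hypothetical and is not part of the assignment:

```python
def encode_changes(ts, delta):
    """Hypothetical helper: map each successive change in ts to 'u', 'd' or 's'."""
    symbols = []
    for a, b in zip(ts, ts[1:]):
        change = b - a
        if abs(change) <= delta:
            symbols.append('s')  # within the tolerance: steady
        elif change > 0:
            symbols.append('u')  # increase larger than delta
        else:
            symbols.append('d')  # decrease larger than delta
    return ''.join(symbols)


def contains_tst(ts, pattern, delta):
    """Return True if pattern occurs among the consecutive changes of ts."""
    return pattern in encode_changes(ts, delta)


leckietopia = [58, 54, 54, 52, 54, 50, 47]
print(encode_changes(leckietopia, 0))        # dsdudd
print(encode_changes(leckietopia, 2))        # dsssdd
print(contains_tst(leckietopia, 'sdud', 0))  # True
```

Encoding first keeps the matching step trivial, since Python's in operator already performs the substring search.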
NOTES: DISTANCE BETWEEN TIME SERIES.
Sometimes we want to compare two time series to see how similar they are. Or given a set of
time series, we may want to compare a given example time series to each time series in
the set, and decide which one is most similar. To do this, we need a way of calculating a
distance measure between two time series. If two time series are identical, the 'distance'
between them should be zero. If two time series are very dissimilar, the 'distance'
between them should be large.
We are given two time series ts1 and ts2, both of length n:
ts1 is ts1[0], ts1[1], …, ts1[n-1]

ts2 is ts2[0], ts2[1], …, ts2[n-1]
We can compute the 'distance' between ts1 and ts2 as follows.
First, compute a new time series (we’ll call it diff) that is the absolute value of the differences of
the corresponding values from ts1 and ts2:
diff[0] = abs(ts1[0] - ts2[0])
diff[1] = abs(ts1[1] - ts2[1])
...
diff[n-1] = abs(ts1[n-1] - ts2[n-1])
Next, compute the mean of diff, i.e., the mean of the absolute values of the differences between
the two time series:
$$\mathrm{mean}(\mathrm{diff}) = \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{diff}[i]$$
where this summation notation means
$$\sum_{i=0}^{n-1} x[i] = x[0] + x[1] + \dots + x[n-1]$$
Finally, we define the distance between ts1 and ts2 as:
[formula not recoverable from the source; the surviving fragments are 'distance', 'abs' and 'mean']
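The first two steps described above, the element-wise absolute differences and their mean, can be sketched as follows. The function name mean_abs_difference is hypothetical, and the sketch deliberately stops at the mean of diff; it does not claim to implement the final distance definition:

```python
def mean_abs_difference(ts1, ts2):
    """Hypothetical helper: mean of the absolute element-wise differences."""
    # diff[i] = abs(ts1[i] - ts2[i]) for every position i
    diff = [abs(a - b) for a, b in zip(ts1, ts2)]
    # mean(diff) = (diff[0] + ... + diff[n-1]) / n
    return sum(diff) / len(diff)


t1 = [10, 10, 10, 10]
t2 = [10, 10, 20, 20]
print(mean_abs_difference(t1, t2))  # 5.0, from diff = [0, 0, 10, 10]
print(mean_abs_difference(t1, t1))  # 0.0, identical series
```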
Question 3: Write a function closest_tst(ts, ts_list) that, given a time series ts and a list of time
series ts_list, returns the index of the time series in ts_list that has the smallest distance to
ts (i.e., is most similar to ts).

The function takes the following arguments:
ts: a time series as a list of integers, containing at least two values
ts_list: a list of one or more time series, where each time series in ts_list is a list of integers, and
has the same length as ts
The function returns an integer corresponding to the index of the time series in ts_list that has the
smallest distance to ts. If several time series in ts_list share the same smallest distance to ts,
the function returns the index of the first such time series.
Assumptions:
You can assume that the input arguments are correctly formatted and all time series have the
same length.
Here are some example calls to your function:
>>> t1 = [10, 10, 10, 10]
>>> t2 = [10, 10, 20, 20]
>>> t3 = [0, 0, 40, 40]
>>> t4 = [100, 100, 100, 100]
>>> t5 = [10, 10, 20, 10]
>>> print(closest_tst(t1, [t3, t2, t1]))
>>> print(closest_tst(t4, [t1, t2, t3]))

>>> print(closest_tst(t1, [t2, t5]))
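The selection step, picking the first index with the smallest distance, can be sketched independently of the distance formula. In the sketch below, closest_index and its distance parameter are hypothetical, and mean_abs_difference is only a stand-in distance used for illustration, so the printed results hold for that stand-in and are not necessarily the expected outputs of the calls above:

```python
def closest_index(ts, ts_list, distance):
    """Hypothetical helper: index of the first series in ts_list closest to ts."""
    best_index = 0
    best_distance = distance(ts, ts_list[0])
    for index in range(1, len(ts_list)):
        d = distance(ts, ts_list[index])
        # Strict comparison keeps the earliest index when distances tie.
        if d < best_distance:
            best_index, best_distance = index, d
    return best_index


def mean_abs_difference(ts1, ts2):
    """Stand-in distance (an assumption): mean absolute element-wise difference."""
    return sum(abs(a - b) for a, b in zip(ts1, ts2)) / len(ts1)


t1 = [10, 10, 10, 10]
t2 = [10, 10, 20, 20]
t5 = [10, 10, 20, 10]
print(closest_index(t1, [t2, t5], mean_abs_difference))  # 1 under the stand-in distance
print(closest_index(t1, [t2, t2], mean_abs_difference))  # 0: the first of two tied series
```

Passing the distance in as a function keeps the tie-breaking logic testable on its own; a closest_tst(ts, ts_list) with the assignment's signature could call such a helper with whichever distance the assignment defines.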
NOTES: FOR QUESTION 4.
The final task is to detect whether a given time series contains an unusual subsequence of values
(called an anomaly). To do this, we will examine every possible subsequence of values of
length w in the time series.
For example, the time series ts = [3, 0, 2, 40, 1] contains 4 subsequences of length w = 2,
specifically [3, 0], [0, 2], [2, 40], [40,1]. Likewise, this time series contains 3
subsequences of length w = 3, being [3, 0, 2], [0, 2, 40], [2, 40, 1].
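Enumerating the subsequences amounts to sliding a window of length w along the list. A minimal sketch, using a hypothetical helper name subsequences:

```python
def subsequences(ts, w):
    """Hypothetical helper: every run of w consecutive values in ts, in order."""
    # A list of length n has n - w + 1 windows of length w.
    return [ts[i:i + w] for i in range(len(ts) - w + 1)]


ts = [3, 0, 2, 40, 1]
print(subsequences(ts, 2))  # [[3, 0], [0, 2], [2, 40], [40, 1]]
print(subsequences(ts, 3))  # [[3, 0, 2], [0, 2, 40], [2, 40, 1]]
```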
Our aim is to find whether any of these subsequences is an anomaly, compared to the rest of the
time series. We can do this by calculating the “distance” of each subsequence to every
other subsequence. If the mean distance of a subsequence to all other subsequences
exceeds a given threshold, we consider the subsequence to be an anomaly. In this
question, we will use Euclidean distance as our distance measure between a pair of
subsequences s1 and s2:
$$\mathrm{distance}(s1, s2) = \sqrt{\sum_{i=0}^{w-1} (s1[i] - s2[i])^2}$$
For example, the distance of subsequence [3, 0] to subsequence [0, 2] is
$$\sqrt{(3 - 0)^2 + (0 - 2)^2} = \sqrt{13} \approx 3.6$$
Similarly, the distance of subsequence [3, 0] to subsequence [2, 40] is 40, and to subsequence
[40,1] is 37. Thus, the mean distance of subsequence [3, 0] to all other subsequences is
(3.6 + 40 + 37)/3 = 26.9.
Likewise, the mean distance of subsequence [0, 2] to all other subsequences is 27.2; for [2, 40] it
is 44.2; and for [40, 1] it is 43.8. In this time series, the subsequence [2, 40] has the greatest
mean distance (44.2).
If the threshold for deciding whether a subsequence is an anomaly is given as 40, then the
subsequence [2, 40] will be reported as an anomaly, i.e., it has the highest mean distance, and
that mean distance is greater than the given threshold.
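The worked figures above can be checked with a short script. The helper names euclidean and mean_distances are hypothetical, and the sketch assumes the time series yields at least two subsequences:

```python
import math


def euclidean(s1, s2):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def mean_distances(ts, w):
    """Hypothetical helper: mean distance of each subsequence to all the others."""
    subs = [ts[i:i + w] for i in range(len(ts) - w + 1)]
    means = []
    for i, s in enumerate(subs):
        # Compare against every other subsequence, skipping the subsequence itself.
        others = [euclidean(s, t) for j, t in enumerate(subs) if j != i]
        means.append(sum(others) / len(others))
    return means


print(round(euclidean([3, 0], [0, 2]), 1))                        # 3.6
print([round(m, 1) for m in mean_distances([3, 0, 2, 40, 1], 2)])  # [26.9, 27.2, 44.2, 43.8]
```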
Question 4: Write a function anomaly_tst(ts, w, threshold) that, given a time series ts, a subsequence
length w, and an anomaly cut-off threshold threshold, returns the index of the subsequence in ts that
has the highest mean distance above the threshold (or -1 if no subsequence has a mean distance above
the threshold). If more than one subsequence has the same maximum mean distance above the
threshold, return the index of the first such subsequence.
The function takes the following parameters:
w: an integer > 0 that specifies the length of subsequences
threshold: an integer > 0, where the mean distance of a subsequence must be greater than this threshold
if it is to be considered as an anomaly.
Assumptions:
You can assume that the input arguments are correctly formatted.
Here are some example calls to your function:
>>> ts = [3, 0, 2, 40, 1]
>>> print(anomaly_tst(ts, 2, 40))
2
>>> print(anomaly_tst(ts, 2, 100))
-1
>>> print(anomaly_tst([10, 10, 10, 10, 0, 0, 10], 2, 12))
>>> print(anomaly_tst([10, 10, 10, 10, 10, 10, 10], 3, 12))
-1
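Putting the pieces together, one way to sketch the whole function is to track the best mean distance seen so far, starting from the threshold itself. This is an illustration under two assumptions: the index of a subsequence is the position in ts where it starts, and a time series with only one subsequence has no anomaly:

```python
import math


def anomaly_tst(ts, w, threshold):
    """Return the index of the first subsequence with the highest mean distance
    above threshold, or -1 if no subsequence exceeds the threshold."""
    subs = [ts[i:i + w] for i in range(len(ts) - w + 1)]
    best_index, best_mean = -1, threshold
    for i, s in enumerate(subs):
        dists = [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))
            for j, t in enumerate(subs)
            if j != i
        ]
        if not dists:
            continue  # assumption: a lone subsequence has nothing to compare against
        mean = sum(dists) / len(dists)
        # Strict comparison: must beat the threshold, and the first maximum wins ties.
        if mean > best_mean:
            best_index, best_mean = i, mean
    return best_index


ts = [3, 0, 2, 40, 1]
print(anomaly_tst(ts, 2, 40))   # 2, the subsequence [2, 40]
print(anomaly_tst(ts, 2, 100))  # -1
```

Initialising best_mean to the threshold folds the "above the threshold" test and the "highest so far" test into a single comparison.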
