
MODEL IDENTIFICATION AND DATA ANALYSIS (Automation & Control Engineering), A.Y. 2018/2019
Prof. Sergio M. Savaresi, Prof. Simone Formentin - June 13, 2019

Surname Name University ID Number Signature

................................ ................................ ................................ ................................


===========================================================================================================
- Write the solutions (including procedures and intermediate steps) in the blank areas (use the back of the page, if needed)
- The number of pages is 7. Additional sheets will not be considered.
- Use of books, lecture notes or any other material is forbidden. Electronics (smartphones, calculators, etc.) are forbidden.
- Clarity, order and precision will be strongly considered in the final evaluation.
===========================================================================================================
1. [Fundamentals]
Explain why the following problems can or cannot be addressed by Machine Learning (ML) techniques:
1. Partition a set of employees of a large company;
2. Fortune-telling, i.e., guessing information about a person's private life;
3. Determine the truthfulness of a first-order logic formula;
4. Compute the stress on a structure given its physical model;
5. Provide temperature predictions.
If a problem can be addressed by ML, suggest the technique you would use to solve it.

Solution:
1. The problem of dividing the employees of a company into categories may or may not be a machine learning problem: it is one if we do not know the criterion used to partition them, and it is not if we are given the criteria. In the first case, we would use an unsupervised ML technique (e.g., clustering), considering for instance the personal data of each employee.
2. In principle, fortune-telling is not a science. If, however, we regard it as a process where a fortune-teller infers information about a person by looking at her/him, we could use one ML algorithm to determine the important features to extract from a person's picture (feature selection) and another one to couple the appropriate phrase to each person (classification).
3. If we have a logical formula and simply want to determine whether it is true or false, it would be unnecessarily complicated to use a ML algorithm, since a deterministic procedure exists that provides the answer quickly.
4. The computation of the stress is simply the application of the mathematical or physical model to a specific case. Here, we might resort to analytical solutions, in case they exist, or to techniques from the field of scientific computing, where an approximation is obtained by using a discretization scheme.
5. In this case, since we want to predict a continuous value (the temperature), we are facing a regression problem. One might consider linear models or neural networks, if the temperature records of each day are treated as independent from the previous ones. If we want to account for the fact that the temperature is a time series (there is correlation among days), we should resort to time-series prediction, e.g., Kolmogorov-Wiener or Kalman predictors based on identified models.

2. [Logistic regression]
Suppose we collect data for a group of workers with the following variables: hours spent working $x_1$ and number of completed projects $x_2$. The workers may receive a bonus or not (represented by the binary variable $y$) depending on their overall performance. We fit a logistic regression model using the sigmoidal function
$$\sigma(w_0 + w_1 x_1 + w_2 x_2) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2)}}$$
and produce the following estimated coefficients: $w_0 = -6$, $w_1 = 0.05$, $w_2 = 1$.
2.1 Estimate the probability that a worker who worked for 40 h and completed 3.5 projects gets a bonus.
2.2 How many hours would that worker need to spend working to have a 50% chance of getting a bonus?
2.3 Do you think that values of $z$ in the logistic function $\sigma(z)$ lower than $-6$ make sense in this problem? Why?
2.4 Describe the maximum likelihood principle typically used to infer a logistic regression model.

Solution:
2.1 The logistic model provides as output the probability of getting a bonus, thus:
$$P(y = 1 \mid x) = \sigma(w_0 + w_1 x_1 + w_2 x_2)$$
where $x_1 = 40$ and $x_2 = 3.5$. It follows that
$$P(y = 1 \mid x) = \sigma(-6 + 0.05 \cdot 40 + 1 \cdot 3.5) = \sigma(-0.5) = \frac{1}{1 + e^{0.5}} \approx 0.3775$$
2.2 We know that there is a 50% chance of getting a bonus when the argument of the sigmoid is equal to zero (in fact, $\sigma(0) = 0.5$), thus we look for $\hat{x}_1$ such that $w_0 + w_1 \hat{x}_1 + w_2 x_2 = 0$.
By substituting the known parameters and variables,
$$-6 + 0.05 \cdot \hat{x}_1 + 3.5 = 0 \;\rightarrow\; \hat{x}_1 = 50\,h$$
2.3 Since all the considered variables, as well as the parameters $w_1$ and $w_2$, are positive, it only makes sense to consider values of $z$ greater than $-6$, which is the value attained when $x_1 = x_2 = 0$.
2.4 See Chapter 6 of the lecture notes for the details about maximum likelihood inference. In short, the weights are chosen so as to maximize the likelihood of the observed labels, $L(w) = \prod_i \sigma(w^T x_i)^{y_i}\left(1 - \sigma(w^T x_i)\right)^{1 - y_i}$, which is equivalent to minimizing the cross-entropy loss; since no closed-form solution exists, the optimization is typically carried out numerically (e.g., by gradient methods).
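As a side note, a quick numerical check of 2.1 and 2.2 (a minimal sketch, not part of the official solution):

```python
import math

w0, w1, w2 = -6.0, 0.05, 1.0  # estimated coefficients from the exam text

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2.1: probability of a bonus for x1 = 40 h, x2 = 3.5 projects
p = sigmoid(w0 + w1 * 40 + w2 * 3.5)
print(f"P(y=1|x) = {p:.4f}")  # ~0.3775

# 2.2: hours giving a 50% chance, i.e., w0 + w1*x1 + w2*x2 = 0
x1_hat = -(w0 + w2 * 3.5) / w1
print(f"x1_hat = {x1_hat:.1f} h")  # 50.0
```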

3. [Data pre-processing]
Consider the following dataset:
[scatter plot of the two-dimensional dataset]

3.1. Draw the direction of the principal components and say whether the dimensionality of the data could be reduced.
3.2. Apart from PCA, how can data be preprocessed in order to be best prepared for learning?

Solution:
3.1 The computed principal component loadings are:
[principal component directions drawn on the scatter plot]
The variance of the data along the first principal component is larger than that along the second; thus a reduced (1-D) feature space could be considered, depending on the specific problem.
3.2 See Chapter 11 of the lecture notes about data preprocessing. Typical steps include standardization of the features (zero mean and unit variance), removal or imputation of missing values, outlier detection, and feature selection.
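For illustration, a minimal numpy sketch of how the principal component loadings of 3.1 can be computed; the dataset here is synthetic, since the original one is given only graphically:

```python
import numpy as np

# Synthetic 2-D dataset: 200 correlated points (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

# Center the data (standardize instead if the units differ)
Xc = X - X.mean(axis=0)

# PCA via eigendecomposition of the sample covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("explained variance ratio:", eigvals / eigvals.sum())
print("loadings (columns = principal components):\n", eigvecs)
```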

4. [Kalman filtering]
Consider the following system in state-space form:
$$x(t+1) = 3x(t) + w(t)$$
$$y(t) = \alpha x(t) + v(t)$$
where $\alpha$ is a real number, and $w \sim WN(0,1)$ and $v \sim WN(0,1)$ are uncorrelated.
4.1 For which values of $\alpha$ does the 1-step ahead predictor converge? Why?
4.2 Write the equations of the steady-state Kalman filter $\hat{x}(t|t)$ for $\alpha = 1$.
4.3 Compute the steady-state variance of the state filtering error in the above case.

Solution:
4.1 The state dynamics is unstable. For $\alpha = 0$, the output carries no information on the state (the system is not observable), so the estimation error inherits the unstable dynamics and the predictor diverges. For all other values of the parameter, the system is observable and reachable, hence the predictor converges (asymptotically stable error dynamics) by the second convergence theorem.
4.2 The steady-state variance of the 1-step ahead state prediction error is the positive definite solution of the ARE associated to the system, i.e.:
$$P = FPF' + V_1 - (FPH')(HPH' + V_2)^{-1}(FPH')'$$
where $F = 3$, $H = 1$, $V_1 = 1$ and $V_2 = 1$. Then,
$$P = 9P + 1 - (3P)^2(P + 1)^{-1}$$
which reduces to $P^2 - 9P - 1 = 0$ and is solved by $\bar{P} = \frac{9 + \sqrt{85}}{2} \approx 9.1$ (the other solution is negative, hence unacceptable).
The steady-state 1-step ahead Kalman predictor is given by
$$\hat{x}(t+1|t) = 3\hat{x}(t|t-1) + \bar{K}\left(y(t) - \hat{y}(t|t-1)\right)$$
$$\hat{y}(t|t-1) = \hat{x}(t|t-1)$$
where
$$\bar{K} = \frac{F\bar{P}H + V_{12}}{H^2\bar{P} + V_2} = \frac{3 \cdot 9.1 + 0}{1 \cdot 9.1 + 1} \approx 2.7$$
The steady-state Kalman filter follows from $\hat{x}(t+1|t) = 3\hat{x}(t|t)$ (there is no input, and $w(t)$ is unpredictable from the measurements up to time $t$), i.e., $\hat{x}(t|t) = \frac{1}{3}\hat{x}(t+1|t)$.
4.3 From the above relation, $\tilde{x}(t|t) = \frac{1}{3}\left(\tilde{x}(t+1|t) - w(t)\right)$, where $E[\tilde{x}(t+1|t)w(t)] = E[w(t)^2] = 1$, since $w(t)$ enters $x(t+1)$ but is orthogonal to its prediction. The steady-state variance of the state filtering error is therefore
$$E\left[\tilde{x}(t|t)^2\right] = \frac{1}{9}\left(\bar{P} - 2 + 1\right) = \frac{\bar{P} - 1}{9} = \frac{\bar{P}}{\bar{P} + 1} \approx 0.9$$
which agrees with the standard filtering formula $\bar{P} - \bar{P}^2(\bar{P} + V_2)^{-1}$.
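A minimal sketch (not part of the official solution) that checks the Riccati solution, the predictor gain and the filtering variance numerically, by iterating the Riccati recursion to convergence:

```python
# Scalar Riccati recursion for F=3, H=1, V1=V2=1 (alpha = 1)
F, H, V1, V2 = 3.0, 1.0, 1.0, 1.0

P = 1.0  # any positive initialization
for _ in range(200):
    P = F * P * F + V1 - (F * P * H) ** 2 / (H * P * H + V2)

K = F * P * H / (H * P * H + V2)              # steady-state predictor gain
P_filt = P - (P * H) ** 2 / (H * P * H + V2)  # filtering error variance

print(f"P_bar  = {P:.3f}")       # ~9.110, i.e. (9 + sqrt(85))/2
print(f"K_bar  = {K:.3f}")       # ~2.703
print(f"P_filt = {P_filt:.3f}")  # ~0.901 = P_bar / (P_bar + 1)
```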

5. [State-space identification]
Given the system in state-space representation
$$\begin{cases} x_1(t+1) = 0.5\,x_1(t) + u(t) \\ x_2(t+1) = 0.5\,x_1(t) \\ x_3(t+1) = 1.5\,x_3(t) + u(t) \\ y(t) = x_2(t) \end{cases}$$
• Check observability and reachability of the system and briefly discuss the result

$$O = \begin{bmatrix} H \\ HF \\ HF^2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 0 \\ 1/4 & 0 & 0 \end{bmatrix}$$
$O(3,:) = \frac{1}{2}\,O(2,:)$ → the last two rows are not linearly independent → $rank(O) = 2$

The system is not observable.

$$R = \begin{bmatrix} G & FG & F^2G \end{bmatrix} = \begin{bmatrix} 1 & 1/2 & 1/4 \\ 0 & 1/2 & 1/4 \\ 1 & 3/2 & 9/4 \end{bmatrix}$$
$rank(R) = 3$ → the system is reachable.
• Compute the first 6 samples of the impulse response ($t = 0, 1, \ldots, 5$)

$$F = \begin{bmatrix} 1/2 & 0 & 0 \\ 1/2 & 0 & 0 \\ 0 & 0 & 3/2 \end{bmatrix} \qquad G = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \qquad H = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \qquad D = 0$$
$$\omega(t) = \begin{cases} 0 & t = 0 \\ HF^{t-1}G & t > 0 \end{cases}$$
$$\omega(0) = 0,\;\; \omega(1) = 0,\;\; \omega(2) = 1/2,\;\; \omega(3) = 1/4,\;\; \omega(4) = 1/8,\;\; \omega(5) = 1/16$$

• Identify the system order from the impulse response and briefly discuss the result

$$H_1 = [\omega(1)] = [0] \;\rightarrow\; rank(H_1) = 0$$
$$H_2 = \begin{bmatrix} \omega(1) & \omega(2) \\ \omega(2) & \omega(3) \end{bmatrix} = \begin{bmatrix} 0 & 1/2 \\ 1/2 & 1/4 \end{bmatrix} \;\rightarrow\; rank(H_2) = 2$$
$$H_3 = \begin{bmatrix} \omega(1) & \omega(2) & \omega(3) \\ \omega(2) & \omega(3) & \omega(4) \\ \omega(3) & \omega(4) & \omega(5) \end{bmatrix} = \begin{bmatrix} 0 & 1/2 & 1/4 \\ 1/2 & 1/4 & 1/8 \\ 1/4 & 1/8 & 1/16 \end{bmatrix}$$
$H_3(3,:) = \frac{1}{2}\,H_3(2,:)$ → the last two rows are not linearly independent → $rank(H_3) = 2$

$$n = 2$$
Comment:
The identified order is lower than the order of the original system; this means that there is a hidden part of the system. This is confirmed by the non-observability of the system: the unstable state $x_3$ does not affect the output, so it cannot appear in the input-output (impulse response) description.
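A minimal numpy sketch (not part of the official solution) verifying the impulse-response samples and the Hankel ranks:

```python
import numpy as np

F = np.array([[0.5, 0, 0], [0.5, 0, 0], [0, 0, 1.5]])
G = np.array([[1.0], [0.0], [1.0]])
H = np.array([[0.0, 1.0, 0.0]])

# Impulse response: w(0) = D = 0, w(t) = H F^(t-1) G for t > 0
w = [0.0] + [(H @ np.linalg.matrix_power(F, t - 1) @ G).item()
             for t in range(1, 6)]
print("impulse response:", w)  # [0, 0, 0.5, 0.25, 0.125, 0.0625]

# Hankel matrices H_k built from w(1), w(2), ...
for k in (1, 2, 3):
    Hk = np.array([[w[i + j + 1] for j in range(k)] for i in range(k)])
    print(f"rank(H_{k}) = {np.linalg.matrix_rank(Hk)}")  # 0, 2, 2 -> n = 2
```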

6. [Analysis and prediction of stochastic processes]
Consider the process $y(t) = y_u(t) + y_e(t)$, where
$$y_e(t) = \frac{1 - \frac{3}{4}z^{-1}}{1 + \frac{1}{4}z^{-1}}\,e(t), \qquad e(t) \sim WN(1,1)$$
and $y_u$ is the output of the following state-space system:
$$\begin{cases} x_1(t+1) = -\frac{1}{4}x_1(t) + \frac{15}{4}u(t) \\ x_2(t+1) = x_1(t) + u(t) \\ y_u(t) = x_2(t) \end{cases}$$
• Write the process in the form $y(t) = \frac{B(z)}{A(z)}u(t-1) + \frac{C(z)}{A(z)}\tilde{e}(t) + c$, with $\tilde{e}(t) \sim WN(0,1)$

$y_u(t) = W(z)u(t)$, where
$$W(z) = H(zI - F)^{-1}G + D = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} z + 1/4 & 0 \\ -1 & z \end{bmatrix}^{-1}\begin{bmatrix} 15/4 \\ 1 \end{bmatrix} = \cdots = \frac{z + 4}{z^2 + \frac{1}{4}z}$$
Hence
$$y_u(t) = \frac{z + 4}{z^2 + \frac{1}{4}z}\,u(t) = \frac{1 + 4z^{-1}}{1 + \frac{1}{4}z^{-1}}\,u(t-1)$$
Writing $e(t) = \tilde{e}(t) + 1$ with $\tilde{e}(t) \sim WN(0,1)$, the unit mean of $e(t)$ propagates to the output through the static gain $\frac{C(1)}{A(1)} = \frac{1 - 3/4}{1 + 1/4} = \frac{1}{5}$, so that
$$y(t) = \underbrace{\frac{1 + 4z^{-1}}{1 + \frac{1}{4}z^{-1}}\,u(t-1)}_{y_u(t)} + \underbrace{\frac{1 - \frac{3}{4}z^{-1}}{1 + \frac{1}{4}z^{-1}}\,\tilde{e}(t) + \frac{1}{5}}_{y_e(t)}$$
i.e., $A(z) = 1 + \frac{1}{4}z^{-1}$, $B(z) = 1 + 4z^{-1}$, $C(z) = 1 - \frac{3}{4}z^{-1}$ and $c = \frac{1}{5}$.

• Find the 1-step optimal predictor and the associated prediction error variance

Rewriting the process by separating the zero-mean part $\bar{y}(t)$ from the constant $\mu = \frac{1}{5}$:
$$y(t) = \underbrace{\frac{B(z)}{A(z)}\,u(t-1) + \frac{C(z)}{A(z)}\,\tilde{e}(t)}_{\bar{y}(t)} + \underbrace{\frac{1}{5}}_{\mu}$$
The optimal 1-step predictor of the zero-mean part is
$$\hat{\bar{y}}(t|t-1) = \frac{B(z)}{C(z)}\,u(t-1) + \frac{C(z) - A(z)}{C(z)}\,\bar{y}(t) = \frac{1 + 4z^{-1}}{1 - \frac{3}{4}z^{-1}}\,u(t-1) - \frac{z^{-1}}{1 - \frac{3}{4}z^{-1}}\,\bar{y}(t)$$
since $C(z) - A(z) = -z^{-1}$. With $\bar{y}(t) = y(t) - \mu$ and $\hat{y}(t|t-1) = \hat{\bar{y}}(t|t-1) + \mu$:
$$\hat{y}(t|t-1) - \mu = \frac{1 + 4z^{-1}}{1 - \frac{3}{4}z^{-1}}\,u(t-1) - \frac{z^{-1}}{1 - \frac{3}{4}z^{-1}}\,(y(t) - \mu)$$
$$\hat{y}(t|t-1) = \frac{1 + 4z^{-1}}{1 - \frac{3}{4}z^{-1}}\,u(t-1) - \frac{z^{-1}}{1 - \frac{3}{4}z^{-1}}\,y(t) + \left(1 + \frac{z^{-1}}{1 - \frac{3}{4}z^{-1}}\right)\mu$$
The constant term equals $\left(1 + \frac{1}{1 - 3/4}\right)\frac{1}{5} = 5 \cdot \frac{1}{5} = 1$, hence
$$\hat{y}(t|t-1) = \frac{1 + 4z^{-1}}{1 - \frac{3}{4}z^{-1}}\,u(t-1) - \frac{z^{-1}}{1 - \frac{3}{4}z^{-1}}\,y(t) + 1$$
Since the representation is canonical ($C(z)$ has its root in $z = 3/4$, inside the unit circle), the prediction error is $y(t) - \hat{y}(t|t-1) = \tilde{e}(t)$, and the associated prediction error variance is $\lambda^2 = 1$.
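A minimal simulation sketch (not part of the official solution) checking that the predictor achieves unit prediction-error variance; the white-noise input signal is a hypothetical choice, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000
u = rng.normal(size=N)      # hypothetical exogenous input
e_t = rng.normal(size=N)    # e_tilde ~ WN(0,1), i.e. e(t) - 1

# Process: A(z) y(t) = B(z) u(t-1) + C(z) e_tilde(t) + A(1)/5, i.e.
# y(t) = -1/4 y(t-1) + u(t-1) + 4 u(t-2) + e~(t) - 3/4 e~(t-1) + 1/4
y = np.zeros(N)
for t in range(2, N):
    y[t] = (-0.25 * y[t - 1] + u[t - 1] + 4 * u[t - 2]
            + e_t[t] - 0.75 * e_t[t - 1] + 0.25)

# Predictor: (1 - 3/4 z^-1) yhat(t) = (1 + 4 z^-1) u(t-1) - y(t-1) + C(1)
yhat = np.zeros(N)
for t in range(2, N):
    yhat[t] = (0.75 * yhat[t - 1] + u[t - 1] + 4 * u[t - 2]
               - y[t - 1] + 0.25)

err = y[100:] - yhat[100:]  # discard the initial transient
print(f"prediction error variance ~ {err.var():.3f}")  # ~1.0
```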

7. [Identification of ARMA/ARMAX models]
Consider the stochastic process $y(t)$ generated by the following system:
$$S: \quad 2y(t) = y(t-1) + \frac{1}{2}e(t-1) + \frac{1}{4}e(t-2), \qquad e(t) \sim WN(0,4)$$
and the following model class (characterized by the parameter $\alpha$):
$$M: \quad y(t) = \frac{1}{2}y(t-1) + \xi(t) + \alpha\xi(t-1), \qquad \xi(t) \sim WN(0, \lambda^2)$$
Identify the value $\alpha^*$ of the parameter $\alpha$ which minimizes $\bar{J}(\alpha) = E\left[\left(y(t) - \hat{y}(t|t-1)\right)^2\right]$.

The system $S$ can be rewritten as:
$$y(t) = \frac{1}{2}y(t-1) + \frac{1}{4}e(t-1) + \frac{1}{8}e(t-2)$$
Defining
$$\eta(t) = \frac{1}{4}e(t-1), \qquad \eta(t) \sim WN\left(0, \frac{1}{4}\right)$$
the process can be written as:
$$y(t) = \frac{1}{2}y(t-1) + \eta(t) + \frac{1}{2}\eta(t-1)$$
The system therefore belongs to the model class, with $\eta$ playing the role of $\xi$ and $\lambda^2 = \frac{1}{4}$; hence the value of $\alpha$ which minimizes $\bar{J}$ is
$$\alpha^* = \frac{1}{2}$$
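A minimal Monte Carlo sketch (not part of the official solution) confirming that the empirical prediction-error criterion is minimized near $\alpha = 1/2$; the grid of candidate values is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50000
e = rng.normal(scale=2.0, size=N)  # e ~ WN(0,4)

# Simulate S: y(t) = 1/2 y(t-1) + 1/4 e(t-1) + 1/8 e(t-2)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 0.5 * y[t - 1] + 0.25 * e[t - 1] + 0.125 * e[t - 2]

# For model M, yhat = (C-A)/C y gives the recursion
# yhat(t) = -alpha*yhat(t-1) + (alpha + 1/2)*y(t-1)
def mse(alpha):
    yhat = np.zeros(N)
    for t in range(1, N):
        yhat[t] = -alpha * yhat[t - 1] + (alpha + 0.5) * y[t - 1]
    return np.mean((y[100:] - yhat[100:]) ** 2)

alphas = np.linspace(0.0, 0.9, 10)
best = min(alphas, key=mse)
print(f"best alpha ~ {best:.1f}, J = {mse(best):.3f}")  # ~0.5, J ~ 0.25
```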
