
Machine learning has become increasingly popular in recent years because it has helped engineers overcome many problems that previously seemed unsolvable. The development of the Internet of Things in the coming years will provide enormous amounts of data on which data scientists can build many applications. The rise of machine learning has also been made possible by the improvements in computing power we have witnessed in the recent period.

To solve an engineering problem with a conventional approach, we have to gain as much knowledge as we can about the questions our problem raises, in order to build a model that represents it faithfully enough. The next step is to come up with an efficient algorithm that lets us obtain numerical solutions to the problem. Sometimes, however, we are unable to follow this method, either because the problem is too complex, preventing us from devising a model that captures its properties satisfactorily, or because the best algorithm we can think of is not efficient enough to give us the numerical solutions we need in a reasonable amount of time. Osvaldo Simeone refers to these situations as a "model deficit" and an "algorithm deficit".

Machine learning can be very helpful in both of these situations, because it normally does not require the deep understanding of, for example, the physics underlying the problem that a conventional solution demands. However, such knowledge can be helpful when we want to choose the hypothesis class, or, to put it differently, the "machines" we want to train. In his article, Osvaldo Simeone gives the example of convolutional neural networks used in image processing. Another example is the use of LSTM networks to detect anomalies in a signal (obtained, for example, from a sensor). When using machine learning, however, data mining and cleaning are of fundamental importance, because the quality of the results we get from the algorithm depends heavily on the quality of the data used.
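As a rough illustration of that preparation step, here is a minimal data-cleaning sketch using pandas and scikit-learn on a small made-up sensor table (the column name, values, and plausibility threshold are my own illustrative assumptions, not taken from the article):

```python
# A minimal data-cleaning sketch on made-up sensor readings.
import pandas as pd
from sklearn.preprocessing import StandardScaler

readings = pd.DataFrame({"temperature": [20.1, None, 21.4, 19.8, 250.0]})

# Drop rows with missing values and filter an implausible sensor reading.
cleaned = readings.dropna()
cleaned = cleaned[cleaned["temperature"] < 100.0]

# Standardize the remaining feature so it is suitable for most learning algorithms.
scaled = StandardScaler().fit_transform(cleaned[["temperature"]])
print(scaled)
```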

Machine learning techniques can be divided into three main groups:

- Supervised learning: supervised learning problems consist of trying to find a relationship between sets of input and output data. In this case we have training data along with labels, which can be either continuous (giving a regression problem) or discrete (giving a classification problem). We can make the machine learn the relationship between the features and the labels. The next step is to make predictions about the labels of a data set the program has not yet seen, the test set, and to compute the generalization loss, which measures the difference between the true labels and the predicted labels and must be as small as possible. As far as supervised learning is concerned, we must pay attention to the problems of overfitting and underfitting. In the case of overfitting, the program tries too hard to minimize the training loss, at the cost of the generalization loss, while in the case of underfitting we generally have both a large training loss and a large generalization loss, either because of a lack of good-quality data or because of a poor choice of model. Examples of supervised learning problems include predicting stock market prices, estimating the rent of an apartment, predicting the outcome of an election, or classifying emails into spam and normal "safe" emails (a small sketch of this workflow is given after this list).
- Unsupervised learning: when dealing with unsupervised learning problems, we have no labels associated with our input data. The goal of unsupervised learning techniques is to find patterns within the data, cluster it into coherent groups, reduce its dimensionality, and extract useful and meaningful features. Although it has been more difficult to come up with a comprehensive formalism for unsupervised learning methods, they have many useful real-life applications: for example, a company can segment its clients into different clusters and target each cluster differently, according to its needs (see the clustering sketch after this list).
- Reinforcement learning: reinforcement learning techniques "apply to sequential decision-making problems in which the learner interacts with an environment by sequentially taking actions." (A toy sketch follows the list below.)
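For the supervised case, here is a minimal sketch on synthetic data (the data set, model, and loss are my own illustrative choices, not taken from the article): we split the data into a training set and a held-out test set, fit a model, and compare the training loss with the loss on the unseen data, which serves as an estimate of the generalization loss.

```python
# A minimal supervised-learning sketch on synthetic data:
# compare training loss with the loss on a held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)  # noisy quadratic target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_loss = mean_squared_error(y_train, model.predict(X_train))
test_loss = mean_squared_error(y_test, model.predict(X_test))
print(f"training loss: {train_loss:.3f}, estimated generalization loss: {test_loss:.3f}")
```

Because a straight line cannot capture the quadratic relationship, both losses stay large here, which is exactly the underfitting situation described above.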
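For the client-segmentation example, a hedged sketch with k-means clustering could look as follows (the two features and all numbers are purely illustrative assumptions):

```python
# A k-means sketch that groups synthetic "clients" into two segments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical features: yearly spending and number of purchases.
spending = np.concatenate([rng.normal(500, 50, 50), rng.normal(2000, 200, 50)])
purchases = np.concatenate([rng.normal(5, 1, 50), rng.normal(30, 5, 50)])
clients = np.column_stack([spending, purchases])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(clients)
print(segments[:10])  # cluster label assigned to each of the first ten clients
```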
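As a toy picture of sequential decision-making, here is a tabular Q-learning sketch (Q-learning is one possible reinforcement learning algorithm, not necessarily the one discussed in the article) on a made-up chain environment where the agent is rewarded for reaching the rightmost state:

```python
# Tabular Q-learning on a toy 5-state chain: move left or right,
# reward 1 for reaching the rightmost state.
import numpy as np

n_states, n_actions = 5, 2           # states 0..4, actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update rule.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)  # the learned values should favour moving right in every state
```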

Osvaldo Simeone insists on the fact that not every problem is suitable for resolution with machine learning techniques, and I agree with him on that. We should make sure that we are dealing with either a "model deficit" or an "algorithm deficit", and that the problem does not require us to give an explanation for every result we get from the program. Machine learning techniques are a sort of black box, taking input and delivering output, but in general we cannot access an explicit relationship between, for example, the features and the labels. When using data-driven approaches, we should keep in mind the limits inherent to these techniques.
