Try to avoid these mistakes
The things I learned the hard way as a data scientist
Assaad MOAWAD
Jul 20 · 6 min read
The only magical thing about ML is that there is no magic behind it. It’s based
on pure logic, math, and of course some randomness and luck…
Machine learning is magical but NOT magic
Any feature whose value would not actually be available in practice at the
time you’d want to use the model to make a prediction is a feature that can
introduce leakage into your model.
When the input data you are using to train a machine learning algorithm
happens to have the information you are trying to predict — Daniel
Gutierrez, Ask a Data Scientist: Data Leakage
So be sure that there is no easy way to cheat and find an easy but
meaningless correlation between inputs and outputs.
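One rough way to look for this kind of shortcut is to flag any feature that tracks the target almost perfectly, then ask whether it would really be known at prediction time. The sketch below is only an illustration: the churn data, the column names, and the 0.95 threshold are all made up, and a high correlation is a reason to investigate, not proof of leakage.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(columns, target, threshold=0.95):
    """Names of features whose |correlation| with the target is suspiciously high."""
    return sorted(
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    )


# Hypothetical toy dataset: did the customer churn (1) or not (0)?
churned = [0, 1, 0, 1, 1, 0, 0, 1]
columns = {
    "monthly_usage_hours": [30, 12, 25, 8, 20, 28, 15, 10],
    # Assumed to be recorded only AFTER a customer churns, so it would
    # not exist at the moment we want to predict churn.
    "exit_survey_sent": [0, 1, 0, 1, 1, 0, 0, 1],
}

flagged = suspicious_features(columns, churned)
print(flagged)
```

Here the usage feature is correlated with churn but not perfectly, while the exit-survey flag mirrors the label exactly, which is the pattern worth questioning before training.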
The lesson is not to assume that your initial idea for a model will work.
Your assumption might be mistaken, and the faster you discover that,
the faster you can try something else, before losing much time,
missing deadlines, or paying hefty cloud compute bills.
It’s always better to start with simple models that you can understand
and control first, then increase in complexity gradually to check if the
complexity brings any added value or not. Remember that at the end,
the more complex the model gets, the more data will be needed in order
to truly validate that the model didn’t overfit.
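As a minimal sketch of this workflow, the snippet below fits the simplest possible model (always predict the training mean) and a slightly more complex one (a straight line), then compares them on held-out data. The data, the split, and the two model choices are assumptions made for illustration only; the point is the habit of measuring what each step up in complexity actually buys.

```python
def mse(predictions, actual):
    """Mean squared error between two equal-length lists."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)


def fit_mean(xs, ys):
    """Simplest baseline: ignore the input and predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One step up in complexity: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical data: a linear trend plus small hand-written noise.
xs = list(range(12))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, 0.2, -0.2]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Hold out every second point for validation.
train_x, val_x = xs[::2], xs[1::2]
train_y, val_y = ys[::2], ys[1::2]

results = {}
for name, fit in [("mean baseline", fit_mean), ("linear fit", fit_line)]:
    model = fit(train_x, train_y)
    results[name] = mse([model(x) for x in val_x], val_y)

for name, error in results.items():
    print(f"{name}: validation MSE = {error:.3f}")
```

If a more complex model does not clearly beat the simpler one on the validation set, the extra complexity is probably not worth its cost.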
Loss functions can be seen as the punishment or reward the ML model
receives. So if you want the ML model to converge to a
specific behavior, you can do so by creating a loss function that rewards
this behavior while punishing the misbehavior for your specific problem
and specific dataset.
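To make this concrete, here is a small sketch of a custom loss that punishes under-prediction more heavily than over-prediction, as one might want when running out of stock costs more than holding a little extra. The demand figures, the weight of 5, and the grid search over constant predictions are illustrative assumptions, not a recommended training procedure.

```python
def asymmetric_loss(predictions, actual, under_weight=5.0):
    """Squared error where predicting too low costs `under_weight` times more."""
    total = 0.0
    for p, a in zip(predictions, actual):
        error = p - a
        weight = under_weight if error < 0 else 1.0
        total += weight * error ** 2
    return total / len(actual)


def best_constant(actual, loss):
    """Grid-search the single constant prediction that minimizes `loss`."""
    candidates = [c / 10 for c in range(0, 201)]  # 0.0, 0.1, ..., 20.0
    return min(candidates, key=lambda c: loss([c] * len(actual), actual))


# Hypothetical daily demand for some product.
demand = [8, 10, 12, 9, 11]

# With equal weights the loss is plain mean squared error.
symmetric = best_constant(
    demand, lambda p, a: asymmetric_loss(p, a, under_weight=1.0)
)
# With the default weight, under-prediction is punished five times harder.
cautious = best_constant(demand, asymmetric_loss)

print("symmetric loss picks:", symmetric)
print("asymmetric loss picks:", cautious)
```

The data is the same in both runs; only the loss changes, and the preferred prediction moves upward once under-prediction becomes more expensive.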
Machine learning is like a child: it needs guidance through loss functions
. . .