13 common mistakes aspiring fresher data scientists make and how to avoid them

Article written by Pranav Dar from Analytics Vidhya


Link: https://www.analyticsvidhya.com/blog/2018/07/13-common-mistakes-aspiring-fresher-data-scientists-make-how-to-avoid-them/#

Data science needs a mix of problem solving, structured thinking, coding and various
technical skills, among others, if you are to be truly successful.

1) Learning Theoretical Concepts without Applying Them - Pay Attention

When I was faced with a challenge or problem where I had the chance to apply
all that I had learned, I couldn’t remember half of it! There’s so much to learn –
algorithms, derivations, research papers, etc.

As soon as you learn a concept, head over to Google and find a dataset or problem
where you can use it.
Fill in the gaps as you practice and you will learn a whole lot more!

2) Heading Straight for Machine Learning Techniques without Learning the Prerequisites

You should get to know how techniques work before you apply them to a problem.
Learning the underlying theory will help you understand how an algorithm works, what
you can do to fine-tune it, and how to build on existing techniques.
You have to know advanced calculus and the other mathematical foundations (Linear
Algebra, Calculus, Statistics, Probability).
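
As a small illustration of why the prerequisites matter, here is a minimal sketch of fitting a straight line by least squares using only the Python standard library. The function name `fit_line` and the data points are made up for this example. The two formulas come from setting the derivative of the squared error to zero, which is where calculus and statistics come in.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Minimizing the sum of squared errors gives slope = S_xy / S_xx.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up points that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Once you can write this by hand, a library call that does the same thing stops being a black box.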

3) Relying Solely on Certifications and Degrees

Understanding how a data science project lifecycle works, how to design your
model to fit into the existing business framework – these are just some of the
things you will need to know to succeed as a data scientist.
Certifications are valuable, but only when you apply that knowledge outside the
classroom and put it out in the open.
Use real-world datasets and whatever analysis you do, make sure you write about
it. Create your own blog, post it on LinkedIn, and ask for feedback from the
community. This shows that you are willing to learn and are flexible enough to ask
for suggestions and work them into your projects.

4) Assuming that what you see in ML Competitions is what Real-Life Jobs are Like

You will almost always have to work with messy and unclean data. The old saying
about spending 70-80% of your time just collecting and cleaning data is true. It’s
the grueling part and you will (most likely) not enjoy it, but it’s something that
eventually becomes part of a routine.
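
To give a feel for what that cleaning work looks like, here is a minimal sketch using only the Python standard library. The records, field names and helper functions (`to_number`, `clean`) are all invented for illustration. Real data will have many more problems than the three handled here (stray whitespace and thousands separators, missing values, repeated rows).

```python
# Made-up records with typical problems: padding, commas, blanks, duplicates.
raw_rows = [
    {"id": "1", "age": " 34 ", "income": "52,000"},
    {"id": "2", "age": "n/a", "income": "61000"},
    {"id": "2", "age": "n/a", "income": "61000"},  # same id appears twice
    {"id": "3", "age": "29", "income": ""},
]

def to_number(text):
    """Return a float, or None when the field is blank or not numeric."""
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

def clean(rows):
    """Keep the first row seen for each id and convert fields to numbers."""
    seen, result = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        result.append({
            "id": int(row["id"]),
            "age": to_number(row["age"]),
            "income": to_number(row["income"]),
        })
    return result

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Missing values are kept as `None` rather than guessed, so the decision about how to fill them stays visible.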

Also, and we will cover this in more detail in the next point, the simpler model
will take precedence over any complex stacked ensemble model. Accuracy isn’t always
the end goal, and this is one of the biggest contrasts you’ll notice on the
job.

Interviewers will want to know how you can optimize your algorithm for impact,
not for the sake of increasing accuracy.

5) Focusing on Model Accuracy over Applicability and Interpretability in the Domain

As mentioned above, accuracy isn’t always what the business is after. Sure, a
model that predicts loan default with 95% accuracy is good, but if you can’t
explain how the model got there, which features led it there, and what your
thinking was when building the model, your client will reject it.

You will rarely, if ever, find a deep neural network being used in commercial
applications. It’s just not possible to explain to the client how a neural network
(let alone a deep one) works with hidden layers, convolutional layers, etc. The
first preference is, and will always be, on ensuring that we are able to understand
what’s going on underneath the model. If you can’t tell whether age, or number of
family members, or previous credit history went into rejecting a loan application,
how will the business run?
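
A rough sketch of what "being able to explain the model" can mean in practice: with a linear (logistic) scoring model, each feature's contribution to the decision can be read off directly. The coefficients, feature names and applicant below are hypothetical and were not fitted to any real data. They only show the mechanics.

```python
import math

# Hypothetical coefficients, invented for illustration (not fitted to data).
WEIGHTS = {"age": -0.02, "family_members": 0.30, "past_defaults": 1.20}
BIAS = -1.0

def explain(applicant):
    """Return the default probability and each feature's share of the log-odds."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

applicant = {"age": 40, "family_members": 4, "past_defaults": 2}
probability, contributions = explain(applicant)

# List the features from strongest push towards "default" to weakest.
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name:>15}: {value:+.2f}")
print(f"default probability: {probability:.2f}")
```

With a model of this shape you can tell the client which factor drove a rejection, which is exactly what a stacked ensemble or deep network makes hard.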

Another key aspect is whether your model will fit within the organization’s
existing framework. Using 10 different types of tools and libraries will fail
spectacularly if the production environment cannot support them. You will have to
redesign and retrain the model from scratch with a simpler approach.

Start with a simple model that you fully understand. Then add complexity to your
model and keep doing this until even you don’t understand what’s going on beneath.
This will teach you when to stop, and why simple models are always given preference
in real-life applications.

6) Using too Many Data Science Terms in your Resume

Your resume is a profile of what you have accomplished and how you did it – not a
list of things to simply jot down. When a recruiter looks at your resume, he/she
wants to understand your background and what you have accomplished in a neat
and summarized manner. If half the page is filled with vague data science terms
like linear regression, XGBoost, LightGBM, without any explanation, your resume
might not clear the screening round.

The simplest way to eliminate resume clutter is to use bullet points. Only list
the techniques which you have used to accomplish something (could be a project or a
competition). Write a line about how you used it – this helps the recruiter
understand your thinking.

When you’re applying for fresher or entry-level jobs, your resume needs to
reflect the potential impact you can have on the business. You will be applying to
roles in different domains, so perhaps having a set template will help – just change
the story to reflect your interest in that particular industry.

7) Giving Tools and Libraries Precedence over the Business Problem

Knowing how tools and libraries work is only a starting point. Combining that
knowledge with the business problem posed by the domain is where a true data
scientist steps in.

8) Not Spending Enough Time on Exploring and Visualizing the Data

Data visualization is such a wonderful facet of data science, yet a lot of
aspiring data scientists prefer to skim over it and get to the model building
stage. This approach might work out in competitions, but is bound to fail in a real
job. Understanding the data you’re given is the single most important thing you
will do, and your model’s results will reflect that.

By spending time on getting to know the dataset and trying out different charts,
you will gain a deeper knowledge of the challenge or problem you’ve been tasked
with solving. You’d be surprised to know how much insight you can gain just by
doing this! Patterns and trends emerge, stories are told, and the best part?
Visualizations are the best way to present your findings to the client.

As a data scientist, you need to be inherently curious. It’s one of the great
things about data science – the more curious you are, the more questions you’ll
ask. This leads to a much better understanding of the data you are given and also
helps solve problems you didn’t know existed in the first place!

Practice! Next time you work on a dataset, spend more time on this step. You will
be stunned at the amount of insight it will generate for you. Ask questions! Ask
your manager, ask domain experts, search for solutions on the internet and if you
don’t find any, ask on social media. So many options!
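
As a starting point for that practice, here is a minimal sketch of a first look at a single column, using only the Python standard library. The loan amounts and the helper `text_histogram` are made up for this example. In real work you would use a proper plotting tool, but even a text histogram shows the shape of the data.

```python
from collections import Counter
from statistics import mean, median

# Made-up loan amounts (in thousands); one value is suspiciously large.
amounts = [12, 15, 14, 13, 16, 15, 14, 250, 13, 15]

summary = {
    "count": len(amounts),
    "mean": mean(amounts),
    "median": median(amounts),
    "min": min(amounts),
    "max": max(amounts),
}

def text_histogram(values, bin_width):
    """Print one row of '#' marks per bin and return the bin counts."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    for start in sorted(bins):
        print(f"{start:>4}-{start + bin_width - 1:<4} {'#' * bins[start]}")
    return bins

print(summary)
bins = text_histogram(amounts, bin_width=50)
```

A mean far above the median, and a lone bar far to the right, are the kind of thing that should prompt a question: is that value an error, or a genuinely unusual customer?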

9) Not Having a Structured Approach to Problem Solving

When you go for a data science interview, you will inevitably be given a case
study, guesstimate and puzzle problem(s). Because of the pressure-filled
atmosphere in an interview room and the time constraint, the interviewer looks at
how well you structure your thoughts to arrive at a final result. In many cases,
this can be a deal breaker or deal sealer for getting the job.

10) Trying to Learn Multiple Tools at Once

Pick one tool and stick to it until you have mastery over it. If you’ve already
started learning R, then don’t be tempted by Python (yet). Stick with R, learn it
end-to-end and only then try to incorporate another tool into your skillset. You
will learn more with this approach.

11) Not Studying in a Consistent Manner

This one applies to all data scientists, not just freshers. We have a tendency to
get distracted easily. We study for a period of time (say, a month), then we give
it a break for the next 2 months. Trying to get back into the groove of things
after that is a nightmare. Most of the earlier concepts are forgotten, notes are
lost and it feels like we just wasted the last few months.

I have personally experienced this as well. Due to various things we have going
on, we find excuses and reasons not to get back to studying. But this is eventually
our loss – if data science were as easy as opening a textbook and cramming
everything, everyone would be a data scientist today. It demands consistent effort
and learning, something which people don’t appreciate until it’s too late.

Set goals for yourself. Map out a time table and stick it on your wall. Plan how
and what you want to study and set deadlines for yourself. For example, when I
wanted to learn about neural networks, I gave myself a couple of weeks and then
tested what I’d learned by competing in a hackathon.

You have decided to become a data scientist so you should be ready to put in the
hours. If you keep finding excuses not to study, this might not be the
field for you.

12) Shying Away from Discussions and Competitions

This is a combination of a few things we’ve seen in the above points. Aspiring
data scientists tend to shy away from posting their analysis online for fear of
being criticized. But if you don’t receive feedback from the community, you will
not grow as a data scientist.

Data science is a field where discussions, ideas and brainstorming are of utmost
importance. You cannot sit in a silo and work – you need to collaborate and
understand other data scientists’ perspectives. Similarly, people don’t take part in
competitions because they feel they won’t win. This is a wrong mindset! You
participate in these competitions to learn, not to win. Winning is a bonus,
learning is the goal.

13) Not Working on Communication Skills

You can learn all the latest techniques, master multiple tools and make the best
graphs, but if you cannot explain your analysis to your client, you will fail as a
data scientist.

One of the things I find most helpful is explaining data science terms to a non-
technical person. It helps me gauge how well I have articulated the problem. If
you’re working in a small to medium-sized company, find a person in the marketing
or sales department and do this exercise with them. It will help you immensely in
the long term.

There are plenty of free resources available on the internet to get you started
but remember, practice is key when it comes to soft skills. Ensure you start doing
this TODAY.

14) Other links

13 common mistakes aspiring fresher data scientists make and how to avoid them
Article written by Pranav Dar from Analytics Vidhya
Link: https://www.analyticsvidhya.com/blog/2018/07/13-common-mistakes-aspiring-fresher-data-scientists-make-how-to-avoid-them/#

Art of structured thinking analyzing
Article written by Kunal Jain from Analytics Vidhya
Link: https://www.analyticsvidhya.com/blog/2013/06/art-structured-thinking-analyzing/
