What is Data Science:

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from structured and unstructured data.[1][2] Data
science is related to data mining, machine learning and big data.

Twenty-five years ago, data science referred to gathering and cleaning datasets and then applying
statistical methods to that data. By 2018, data science had grown into a field that encompasses
data analysis, predictive analytics, data mining, business intelligence, machine learning, and
so much more.
Data science, 'explained in under a minute', looks like this.
You have data. To use this data to inform your decision-making, it needs to be relevant, well-
organized, and preferably digital. Once your data is coherent, you proceed with analyzing it,
creating dashboards and reports to understand your business’s performance better. Then
you set your sights on the future and start generating predictive analytics. With predictive
analytics, you assess potential future scenarios and predict consumer behavior in creative
ways.

Data Science Process:


Obtain:

In this step, you will need to query databases, which calls for technical skills such as
MySQL to retrieve the data. You may even start out with simple formats like Microsoft Excel to
obtain the data and then, later on, convert it into a usable form. If you are using Python or R, they
have specific packages that can read data directly from these sources into your programs.
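
As a rough illustration of this step, the sketch below pulls rows from a database
and from a spreadsheet export using only Python's standard library. It rests on several
assumptions: SQLite stands in for MySQL so that the example runs without a server, the
spreadsheet is assumed to have been exported as CSV, and the `orders` table, its columns
and all values are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# Query the database and pull the result set into Python.
db_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

# A spreadsheet exported as CSV can be read with the csv module.
# io.StringIO stands in for an open file here.
exported_sheet = io.StringIO("customer,region\nalice,north\nbob,south\n")
sheet_rows = list(csv.DictReader(exported_sheet))

print(db_rows)
print(sheet_rows)
```

With a real MySQL server you would swap `sqlite3` for a MySQL driver, but the pattern of
connecting, querying and fetching rows stays much the same.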

Scrub:

In this process, you need to convert the data from one format to another and consolidate
everything into one standardized format across all data. For example, if your data is
collected in several CSV files, you will need to load them into a common structure, such as a
database table you can query with SQL, so that you can work with it in programming
languages like Python or R.
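
To make the idea of one standardized format concrete, here is a minimal sketch that merges
two CSV sources with different layouts into a single list of records. Both sources, their
column names, the date formats and the decimal comma in the second file are assumptions
invented for the example.

```python
import csv
import io
from datetime import datetime

# Two hypothetical sources that record the same facts in different layouts.
source_a = io.StringIO("date,amount\n2018-03-01,10.50\n2018-03-02,7.00\n")
source_b = io.StringIO("Day;Total\n03/03/2018;12,25\n")


def scrub_a(row):
    # Source A already uses ISO dates and a decimal point.
    return {
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "amount": float(row["amount"]),
    }


def scrub_b(row):
    # Source B is assumed to use day/month/year dates and a decimal comma.
    return {
        "date": datetime.strptime(row["Day"], "%d/%m/%Y").date(),
        "amount": float(row["Total"].replace(",", ".")),
    }


# Consolidate everything into one list with identical keys and types.
records = [scrub_a(r) for r in csv.DictReader(source_a)]
records += [scrub_b(r) for r in csv.DictReader(source_b, delimiter=";")]

for record in records:
    print(record["date"].isoformat(), record["amount"])
```

Once every record has the same keys and types, the later steps no longer need to know
which file a row came from.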

Explore:
This step is where you examine the data before modelling it. First of all, you will need to inspect
the data and all its properties. There are different types of data, such as numerical,
categorical, ordinal and nominal data. Each type has different
characteristics which will require you to handle it differently.
Following that, the next step would be to compute descriptive statistics to extract
features and test significant variables. Testing significant variables is often done
with correlation. For example, you might explore the correlation between the risk of someone
getting high blood pressure and their height and weight. Do note that some variables
are correlated, but not significant in terms of the model.
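
The descriptive statistics and correlation check described above could look something like
the sketch below. The `pearson` helper is written out by hand for illustration, and the
weight and blood pressure figures are made-up numbers, not real measurements.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up sample: body weight in kg and systolic blood pressure in mmHg.
weight_kg = [60, 72, 80, 95, 110]
systolic = [112, 118, 125, 134, 145]

# Descriptive statistics for each variable.
print("mean weight:", statistics.mean(weight_kg))
print("stdev weight:", statistics.stdev(weight_kg))
print("mean systolic:", statistics.mean(systolic))

# A value close to +1 or -1 indicates a strong linear relationship.
r = pearson(weight_kg, systolic)
print("correlation:", round(r, 3))
```

A high coefficient only tells you that two variables move together in the sample, so it is
a starting point for choosing features rather than proof that a variable matters to the model.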
Model:

The purpose of this stage can include grouping data to understand the logic
behind those clusters. For example, you might like to group your e-commerce customers
to understand their behaviour on your website. This would require you to identify groups
of data points with clustering algorithms, using methods like k-means, or to make
predictions using regression models such as linear or logistic regression.
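
For the customer grouping example, a bare-bones version of k-means (Lloyd's algorithm) can
be sketched in a few lines. This is a teaching sketch rather than a production implementation:
the starting centroids are picked by hand instead of at random, and the customer data
(visits per month, average order value) is invented for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are supplied by the caller."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average order value).
customers = [(1, 20), (2, 25), (1, 22), (9, 80), (10, 95), (11, 90)]

# Hand-picked starting centroids, one from each apparent group.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

In practice you would more likely reach for a library implementation that also handles
random initialisation and convergence checks, but the two alternating steps are the same.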

Interpret:

You will need to visualise your findings accordingly, keeping them driven by your business
questions. It is very important to be able to present your findings in a way that is
useful to your organisation, or else they will be pointless to your stakeholders.
In this process, technical skills alone are not sufficient. One very important skill you need is to
be able to tell a very clear and actionable story. If your presentation does not trigger
actions in your audience, it means that your communication was not effective. Remember
that you will be presenting to an audience with no technical background, so the way you
communicate the message is key.
