See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/335380708

How To Become Data Scientist

Article · August 2019

1 author:

Vansh Jatana
SRM Institute of Science and Technology

All content following this page was uploaded by Vansh Jatana on 24 August 2019.


How To Become Data Scientist
Data science is the study of data, whether structured or unstructured. It involves
understanding data, extracting value from it and visualising it, using various machine
learning algorithms and statistical methods. It is one of the most talked-about fields of
the 21st century, and a central goal is to predict new information from existing data.
Business intelligence (BI), which analyses and reports on data, is a subset of data
science. Building predictive models helps businesses grow faster.

The following skills are required to become a data scientist:


1. Data Mining
2. Data Analysis
3. Data Visualisation
4. Statistics
5. Machine learning
6. Programming Language

Data Mining
Data mining is the technique of discovering patterns in data and extracting useful
information from it. It is also known as Knowledge Discovery in Databases (KDD).
In general, more data helps to build a more accurate model.

Stages of data mining


1. Data Exploration
This is the first stage of data mining. It consists of collecting data, then cleaning and
transforming it according to the needs of the problem. It can be done automatically or
manually. For manual data exploration, queries and scripts written in programming
languages can be used.

2. Modeling
Modeling means applying algorithms to the data, with the goal of choosing the best
model for the problem. Different models are applied to the same data set and compared
in order to choose the best one. Bagging, boosting and meta-learning are some popular
techniques.

3. Deploying Model
The final stage is deploying the model that performed best in the previous stage. This
is important because the whole study builds towards it. Before deployment, we make
sure the model is affected by as little noise as possible.
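
The three stages above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the two candidate models (a mean baseline and a 1-nearest-neighbour model) and all function names are hypothetical, and the candidates are simple stand-ins rather than the bagging or boosting techniques mentioned above.

```python
import statistics

# Hypothetical raw records: (hours_studied, exam_score); None marks a missing value.
raw = [(1.0, 52.0), (2.0, 55.0), (None, 61.0), (3.0, 61.0),
       (4.0, 64.0), (5.0, None), (6.0, 71.0), (7.0, 74.0), (8.0, 79.0)]

# Stage 1: data exploration. Clean the data by dropping incomplete rows.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold out the last two rows so models are judged on data they have not seen.
train, test = clean[:5], clean[5:]


def fit_mean(rows):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = statistics.fmean([y for _, y in rows])
    return lambda x: mean_y


def fit_nearest(rows):
    """1-nearest-neighbour model: reuse the target of the closest training x."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


def mean_abs_error(model, rows):
    """Average absolute difference between predictions and true targets."""
    return statistics.fmean([abs(model(x) - y) for x, y in rows])


# Stage 2: modeling. Fit different models on the same data and compare them.
candidates = {"mean baseline": fit_mean(train), "nearest neighbour": fit_nearest(train)}
errors = {name: mean_abs_error(model, test) for name, model in candidates.items()}

# Stage 3: deployment. The model with the lowest held-out error is the one to ship.
best_name = min(errors, key=errors.get)
best_model = candidates[best_name]
print(best_name, errors)
```

Judging the candidates on held-out rows rather than on the training rows is one common way to avoid picking a model that has only memorised noise.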

Data Analysis
Data analysis is the process of discovering useful results in data. Mined and cleaned
data goes into analytic tools, which find patterns in it. In simpler terms, it is the analysis
of data about the past or about expected future outcomes. Data analysts use various
techniques for analysing data, and the work can be done manually or automatically.
Programming languages and analytic tools such as R and Python are used.

Types of data analysis


1. Text Analysis
Analysis performed on text data is called text analysis. It is a method for converting
text into useful information that can be applied in many industries. Sentiment analysis
and lexical analysis are parts of text analysis. Text analysis also helps us to sort and
rank web pages.

2. Predictive Analysis
Predictive analysis is the analysis of unknown future outcomes. It uses many techniques
from machine learning and artificial intelligence, combining statistics with
computational intelligence to produce expected future values. Fraud detection and risk
management are some applications of predictive analysis.
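
As a small illustration of text analysis, the sketch below performs a lexical step (splitting text into word tokens) followed by a very simple sentiment step (counting positive and negative words). The word lists, the reviews and the function names are all hypothetical; real sentiment analysis relies on much larger lexicons or trained models.

```python
import re

# Hypothetical, hand-made word lists; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def tokenize(text):
    """Lexical step: lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sentiment step: number of positive words minus number of negative words."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def label(text):
    """Turn the numeric score into a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great phone, I love the camera",
    "Terrible battery and slow delivery",
    "It arrived on Tuesday",
]
labels = [label(review) for review in reviews]
print(labels)
```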

Data Visualisation
Data visualisation is the technique of presenting analysed data visually. Large amounts
of data are very difficult to understand, which is why we use data visualisation
techniques: graphs and charts make trends and patterns easier to see.

Types of Data Visualisation

● Charts 
● Tables 
● Graphs 
● Maps 

 
There are also many data visualisation tools, such as QlikView and FusionCharts,
which help us to visualise data without writing any program. Manual data
visualisation can be done with Python and R.
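
To give a feel for manual visualisation in Python, here is a minimal sketch that draws a horizontal bar chart out of plain text characters. It deliberately uses only the standard library so that it runs anywhere; in practice a dedicated plotting library would normally be used. The sales figures and the `bar_chart` function are hypothetical.

```python
def bar_chart(data, width=20):
    """Return a horizontal text bar chart, one line per category.

    The largest value gets a bar of `width` characters and the
    other bars are scaled in proportion to it.
    """
    largest = max(data.values())
    lines = []
    for name, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<8} {bar} {value}")
    return "\n".join(lines)


# Hypothetical monthly sales figures.
sales = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 60}
chart = bar_chart(sales)
print(chart)
```

Even in this crude form, the peak in March and the dip in April stand out faster than they would in a table of numbers.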

Statistics
Statistics is the building block of machine learning algorithms. It helps us to gain
deep and precise knowledge of the data we study. Without statistics, we cannot do
machine learning or data science.

Two categories of statistics


1. Descriptive Statistics
It provides a description of the data. Data is categorised and organised based on a
given parameter. The description can be presented as numerical values, tables or graphs.

2. Inferential Statistics
It uses past (sample) data to draw conclusions and make predictions beyond that data.
The methods of inferential statistics are based on estimating parameters and testing
hypotheses.
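
The difference between the two categories can be sketched with Python's standard library. The descriptive part summarises one group of measurements; the inferential part tests a hypothesis about two groups with an exact permutation test, chosen here only because it needs no statistical tables. The measurements and the function name `permutation_p_value` are hypothetical.

```python
import statistics
from itertools import combinations

# Hypothetical measurements for two groups (for example, page load times in seconds).
group_a = [12.1, 11.8, 12.6, 12.3, 11.9]
group_b = [13.0, 12.9, 13.4, 12.7, 13.1]

# Descriptive statistics: summarise the data we actually have.
summary = {
    "mean": statistics.mean(group_a),
    "median": statistics.median(group_a),
    "stdev": statistics.stdev(group_a),  # sample standard deviation
}


def permutation_p_value(a, b):
    """Inferential statistics: exact two-sided permutation test.

    Null hypothesis: the group labels do not matter. The p-value is the
    share of all possible relabellings whose difference in means is at
    least as large as the observed one.
    """
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b
    total = sum(pooled)
    size_a, size_b = len(a), len(b)
    extreme = count = 0
    for chosen in combinations(range(len(pooled)), size_a):
        sum_a = sum(pooled[i] for i in chosen)
        difference = abs(sum_a / size_a - (total - sum_a) / size_b)
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


p_value = permutation_p_value(group_a, group_b)
print(summary, p_value)
```

A small p-value suggests that a difference this large would rarely appear if the two groups really behaved the same way.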

Machine learning
Machine learning is a part of data science in which a computer learns from data.
Machine learning algorithms are used for classification, regression and clustering.

● Regression
It is a technique used to predict a continuous dependent variable from a set of
independent variables.

● Classification
It is a technique used for approximating a mapping function (f) from input variables
(X) to discrete output variables (y).

● Clustering
It is a technique for dividing a population or set of data points into a number of groups
such that data points in the same group are more similar to each other than to the data
points in other groups.
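
The three task types above can each be sketched with one small, standard-library function. These are illustrative choices, not the only algorithms for each task: a least-squares line for regression, a nearest-centroid rule for classification and a basic k-means loop for clustering. All data, function names and the fixed starting centres are assumptions made for the example (k-means usually starts from randomly chosen centres).

```python
import math
import statistics


def fit_line(xs, ys):
    """Regression: least-squares line y = a + b * x for a numeric target."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def nearest_centroid(points, labels):
    """Classification: map a point to the label whose class mean is closest."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = tuple(statistics.fmean(axis) for axis in zip(*members))
    return lambda p: min(centroids, key=lambda lab: math.dist(p, centroids[lab]))


def k_means(points, centres, rounds=10):
    """Clustering: repeatedly assign each point to its nearest centre,
    then move each centre to the mean of the points assigned to it."""
    for _ in range(rounds):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        centres = [tuple(statistics.fmean(axis) for axis in zip(*g)) if g else c
                   for g, c in zip(groups, centres)]
    return groups, centres


# Regression: the target is a number.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Classification: the target is one of a fixed set of labels.
predict = nearest_centroid([(1, 1), (1, 2), (5, 5), (6, 5)],
                           ["small", "small", "large", "large"])
predicted = [predict((2, 1)), predict((5, 6))]

# Clustering: there are no labels, only groups discovered from the data.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
groups, centres = k_means(data, centres=[(0, 0), (10, 10)])
print(a, b, predicted, groups)
```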

Programming language
Knowledge of a programming language is a must for writing the programs that carry
out data science. There are many languages we can use; Python and R are the most
popular and widely used.
