
A data scientist is a person employed to analyse and interpret complex digital data, such as the
usage statistics of a website, especially in order to assist a business in its decision-making. Many
data scientists began their careers as statisticians or data analysts. But as big data (and big data
storage and processing technologies such as Hadoop) began to grow and evolve, those roles
evolved as well. Data is no longer just an afterthought for IT to handle. It’s key information that
requires analysis, creative curiosity and a knack for translating high-tech ideas into new ways to
turn a profit.
Data scientists do several things. They collect large amounts of unruly data and transform it
into a more usable format. They stay on top of analytical techniques such as machine learning,
deep learning and text analytics. They also solve business-related problems using data-driven
techniques. For me, the most interesting thing data scientists do is looking for order and
patterns in data, as well as spotting trends that can help a business’s bottom line.
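As a rough illustration of two of these tasks, turning unruly data into a usable format and then spotting a trend, here is a minimal Python sketch. The records, field names and helper functions (`RAW_RECORDS`, `clean`, `trend_slope`) are all made up for this example and are not taken from any real dataset or library. It uses only the standard library.

```python
from statistics import mean

# Hypothetical raw usage records: inconsistent casing, stray whitespace,
# missing values and numbers stored as text.
RAW_RECORDS = [
    {"day": "1", "page": " Home ", "visits": "120"},
    {"day": "2", "page": "home", "visits": "135"},
    {"day": "3", "page": "HOME", "visits": ""},      # missing value
    {"day": "4", "page": "home", "visits": "160"},
    {"day": "5", "page": "Home", "visits": "n/a"},   # unparseable value
    {"day": "6", "page": "home", "visits": "185"},
]


def clean(records):
    """Normalise page names and keep only rows with a numeric visit count."""
    cleaned = []
    for row in records:
        visits = row["visits"].strip()
        if not visits.isdigit():
            continue  # drop rows that cannot be used
        cleaned.append({
            "day": int(row["day"]),
            "page": row["page"].strip().lower(),
            "visits": int(visits),
        })
    return cleaned


def trend_slope(rows):
    """Least-squares slope of visits against day (visits gained per day)."""
    xs = [r["day"] for r in rows]
    ys = [r["visits"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


usable = clean(RAW_RECORDS)
slope = trend_slope(usable)
print(f"{len(usable)} usable rows, trend: {slope:+.1f} visits per day")
```

A positive slope here would suggest that visits are growing over time. In practice a data scientist would more likely use a library such as pandas for the cleaning step, but the idea is the same.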
To be a data scientist, a person needs a solid understanding of statistics and machine
learning. They should also know Hadoop MapReduce and databases such as MySQL.
Moreover, they should know how to code in languages such as SAS, R or Python.
R and Python are both open-source languages used in a wide range of data analysis fields. Their
main difference is that R has traditionally been geared towards statistical analysis, while Python
is more generalist.
Parameter | R | Python
Objective | Data analysis and statistics | Deployment and production
Primary Users | Scholar and R&D | Programmers and developers
Task | Easy to get primary results | Good to deploy algorithm
Database size | Handle huge size | Handle huge size
IDE | RStudio | Spyder, IPython Notebook
Disadvantages | Slow; High learning curve; Dependencies between library | Not as many libraries as R
Advantages | Graphs are made to talk. R makes it beautiful; Large catalog for data analysis; GitHub interface; RMarkdown; Shiny | Jupyter notebook: Notebooks help to share data with colleagues; Mathematical computation; Deployment; Code Readability; Speed; Function in Python
