E-guide

5 Data Science Tools to Consider
In this e-guide:
Data scientists weigh in: 5 data science tools to consider
Further Reading

Data and analytics provide the fuel for digital transformation and disruption.
And the only way enterprises can make that fuel high-octane is if they arm
their teams of statisticians, math gurus and business analytics experts with
the right data science tools to squeeze insight out of the ever-growing pools
of corporate data.

But the market just keeps expanding!

That’s why we asked data scientists what tools they're using.

In this brief guide, learn what the top 5 data science tools are
and why they’re so popular right now.

Data scientists weigh in: 5 data science tools to consider
Ericka Chickowski, Business and Technology Writer, SearchBusinessAnalytics

Data and analytics provide the fuel for digital transformation and disruption.
And the only way enterprises can make that fuel high-octane is if they arm
their teams of statisticians, math gurus and business analytics experts with
the right data science tools to squeeze insight out of the ever-growing pools
of corporate data.

Whether they're for straight statistical analysis, machine learning modeling
or visualizations, a strong set of data science tools is essential for
developing a data-driven business culture.

We recently caught up with a number of experienced data scientists across
a range of industries to ask which tools they use the most. Here are the top
five picks that came up over and over again.

1. Python
Not so much a distinct piece of software as a programmatic means of creating
custom algorithms, Python is the go-to for many data scientists. In a recent
KDnuggets analytics/data science software poll of 2,052 users, the language
was cited as the top tool by 65.6% of respondents.

"We use Python both for data science and back end, which provides us with
rapid development and machine learning model deployment," said Alexander
Osipenko, lead data scientist at Cindicator Inc. "It's also of great importance
for us to ensure the security of implemented tools."

Katie Malone, who started out as a particle physicist before she moved on to
co-leading the data science research team at Civis Analytics Inc., said
Python was her data science tool of choice as a physicist, and she has kept
using it in the business world. For her, one of the big draws is the strong
open source ecosystem surrounding Python, which has led her to a wide variety
of data science libraries that help her solve specific analytical problems.

"It's just got a really, really vibrant community of open source folks who are
using Python to solve interesting data science problems," she said.

Leslie De Jesus, innovation director and lead data scientist at Wovenware,
agreed. She depends on Python libraries quite a bit.

"[We use] Python Libraries, including Scrapy, for web scraping and being
able to extract data from the internet and upload it into a data frame for
analysis," De Jesus said. "And [we use] Pandas and NumPy Python Libraries
for data analysis and matrix manipulation. They both help to create faster
code, and NumPy allows for complex broadcasting functions."
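
To make the broadcasting point concrete, here is a minimal sketch of what NumPy broadcasting looks like in practice. The `spend` array and its values are invented for illustration and do not come from Wovenware; the point is only that arrays of different shapes can be combined without writing explicit loops.

```python
import numpy as np

# Hypothetical data: 3 customers (rows) x 4 monthly spend figures (columns).
spend = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 5.0, 10.0, 20.0],
    [8.0, 12.0, 8.0, 12.0],
])

# The column means have shape (4,). Broadcasting stretches them across
# all 3 rows, so every value is centered on its own column's mean.
centered = spend - spend.mean(axis=0)

# The row totals are kept as shape (3, 1) via keepdims=True, so each row
# is divided by its own total and becomes a set of shares.
share = spend / spend.sum(axis=1, keepdims=True)

print(centered.shape, share.shape)
```

Both operations run as single vectorized expressions, which is one reason libraries like NumPy tend to produce shorter and faster code than hand-written Python loops.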

Niranjan Krishnan, head of data science and innovation at Tiger Analytics,
explained that the use cases for Python are pretty multifaceted.

"We have successfully deployed Python data science models for optimizing
direct-to-customer marketing campaigns and life insurance underwriting and
improving real-time bidding for online advertising," Krishnan said.

The drawback, obviously, is that Python is code-based and requires a high
level of programming and analytical skills to use.

"However, Knime and Alteryx are excellent menu-driven, low-code
alternatives that can be used by citizen data scientists and business
analysts, as well," he said.

2. R
In a similar vein to Python, R is another programming language that many
data science professionals depend on, though it is a little simpler and more
purpose-built for data science. It ranked third in the KDnuggets poll, with
48.5% of the respondents listing it as one of the leading data science tools.

Malone from Civis Analytics said R has very sophisticated capabilities for
machine learning and statistics, and it's another frequent pick by those on
her team in addition to Python.

"It depends on the context. We're polyglots here, so we like them both," she
said. "R comes a little bit more from the kind of statistics and quantitative
social sciences side."

According to Jon Krohn, chief data scientist at Untapt Inc., R is his go-to tool
for data exploration.

"I can quickly see summary stats like mean, median and quartiles; quickly
create different graphs; and create test data sets, which can be easily
shared and exported to CSV format," he said.
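
Krohn describes this workflow in R. Purely as an illustration of the same idea, the sketch below computes the summary statistics he mentions and exports them to CSV using only Python's standard library. The `response_times` values and the helper names `summarize` and `to_csv` are hypothetical, made up for this example, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from R's defaults.

```python
import csv
import io
import statistics

# Hypothetical sample values, invented purely for illustration.
response_times = [12, 15, 11, 19, 14, 22, 13, 17]


def summarize(values):
    """Return mean, median and first/third quartiles for a list of numbers."""
    # n=4 yields three cut points; the middle one equals the median.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


summary = summarize(response_times)
csv_text = to_csv([summary], fieldnames=list(summary))
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; swapping it for a file opened with `newline=""` would produce a CSV file that can be shared.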

3. Jupyter Notebook
For the sake of data visualization and data communication, many data
science teams include Jupyter Notebook on their list of data science tools.

"Jupyter Notebook supports R and Python with great library support for data
access and visualizations," said Sofus Macskássy, vice president of data
science at HackerRank. "This tool also enables teams to easily export
workbooks for presentations and is becoming a standard in the data science
field."

Jupyter's flexibility to use the most popular data science libraries is a perk
for Michael Golub, senior vice president of digital and analytics services at
Anexinet. Golub explained that Jupyter is his team's favorite collaborative
development environment.

"Jupyter Notebook is our go-to for collaborative data science project work
and is also very useful when engaging in endeavors that require education,"
Golub said.

In addition, Untapt's Krohn said Jupyter Notebook is a great tool to
prototype models interactively.

"At Untapt, we use Jupyter Notebooks to write prototype code, but also for
printing out tables of data, summary metrics and charts," he said.

4. Tableau
For crossing the chasm between hard data science teams and more
business-focused analytics folks, Tableau Software can provide a good
bridge.

"It is a fantastic tool for data scientists and noobs working on data science,"
said Pooja Pandey, senior executive for SEO at Entersoft Security. "[It's a]
quick dashboarding tool to visualize insights and analytical data with a very
short learning curve."

The speed at which Tableau's visualization and reporting functions can
provide insights to a range of users has drawn praise.

"It's the fastest data visualization tool and business intelligence [tool] in
evolution. It is very quick to implement, easy to learn and very intuitive to
use," said Sophie Miles, CEO of QuotesAdvisor.com. "With Tableau, the
different divisions of the company can customize the exhaustive report
according to their needs."

Miles explained that the number of ad hoc requests for data combinations has
decreased because the dashboarding is so flexible. As a result,
QuotesAdvisor.com has seen a 95% increase in efficiency.

"People spend much more time doing their thinking job than producing serial
reports," she said.

5. Keras
According to Wei Lin, chief data scientist for the CTO office at Hitachi
Vantara, his most-used data science tools are Python, R and Keras. He uses
Python and R for all the reasons mentioned above, and Keras for its deep
learning capabilities.

"Keras is an open source neural network library written in Python to enable
fast experimentation with deep neural networks, and [it] is capable of
running on top of TensorFlow, Microsoft Cognitive Toolkit or Theano," Lin
said.

Keras' sweet spot is in high-dimensional pattern matching, he said.

"For example, image and natural language processing and supporting
well-established deep learning analytic models, including convolutional neural
networks and [long] short-term memory," he said.

According to Cindicator's Osipenko, the big draw of Keras is that it's a huge
time-saver.

"The main criterion for adding a new tool is how much it can make your life
as a data scientist easier. [An] example of this is Keras, an open-source,
high-level wrapper that can dramatically speed up the process of developing
neural networks," he said. "Anyone who has written neural networks on
TensorFlow will understand what I'm talking about. And even though Keras
is not perfect, it can change the development process and make your code
much more readable for other developers."

Further Reading

Tableau vs. Qlik Sense: Pros and cons of the rival analytics tools
The core products of Tableau and Qlik are starting to resemble each other
more as the need for strong data visualization and scalability crystalizes. But
differences remain.

Building a data science team in today’s data-centric climate
Finding and training data scientists to build a data science team can be
challenging. But in a recent webinar, a Gartner analyst offered tips on how to
do it.

Images: Fotolia

© 2018 TechTarget. No part of this publication may be transmitted or reproduced in any form or by any means
without written permission from the publisher.
