
Data Science & Data Analytics

What is Data?
Facts and statistics collected together for reference or analysis; things known or assumed as facts,
forming the basis of reasoning or calculation.
What is Data Science?
Data science is an interdisciplinary field concerned with the processes and systems used to extract
knowledge or insights from data in various forms, whether structured or unstructured. It is a
continuation of several data analysis fields, such as statistics, machine learning, data mining, and
predictive analytics, and is similar to Knowledge Discovery in Databases (KDD).
A data scientist can handle raw data using the latest technologies and techniques, perform the
necessary analysis, and present the acquired knowledge to colleagues in an informative way.
Programming Languages used for Data Science
1. R Language
   Industrial usage: 60.9 %
   Where (industries): productive industry, performance monitoring, enterprise deployment, tech
   support (CRM), maintenance, training and consulting solutions.
   How (basic usage): all kinds of quantitative analysis: data import and cleaning, exploration and
   visualization.

2. Python
   Industrial usage: 35.8 %
   Where (industries): Yahoo Groups, Google, Zope Corp., Ultraseek, gaming industries, etc.
   How (basic usage): an embedding and scripting language for various testing, building, deployment,
   and monitoring frameworks, build scripts, and system monitoring and logging tools, etc.

3. SQL
   Industrial usage: 12.4 %
   Where (industries): banking and finance, health care, etc.
   How (basic usage): data entry and data mapping (manipulating data and producing reports).

4. Java & JavaScript
   Industrial usage: 8.8 %
   Where (industries): banking and finance, health care, etc.
   How (basic usage): Java-based frameworks, statistical modeling.

5. Unix Shell Script / AWK / SED
   Industrial usage: 8.5 %
   Where (industries): banking and finance, health care, etc.
   How (basic usage): modeling and scripting of data.

6. Pig Latin / Hive / other Hadoop-based languages
   Industrial usage: 8.5 %
   Where (industries): banking and securities; communications, media and entertainment; healthcare
   providers; education; manufacturing and natural resources; government; insurance; retail and
   wholesale trade; transportation; energy and utilities, etc.
   How (basic usage): creating data models, data management frameworks (HDFS) for large volumes of
   data, statistical model preparation using MapReduce, etc.

7. MATLAB
   Industrial usage: 6.3 %
   Where (industries): financial data servers, live and historical market data analysis, electronics
   product manufacturing.
   How (basic usage): multi-type data support (sensor, image, video, telemetry, binary, and other
   real-time formats); mainly focuses on machine learning, neural networks, and statistics.

8. Scala
   Industrial usage: 5.9 %
   Where (industries): real-time applications.
   How (basic usage): object-oriented and functional programming language.

9. Go (Golang, from Google)
   Industrial usage: 5.2 %
   Where (industries): finance and insurance, stock exchanges.
   How (basic usage): not designed specifically for statistical computing, but has gained a
   mainstream presence in data programming because of its speed and familiarity.
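The table credits R with data import and cleaning. As an illustration only, the same import, cleaning, and simple exploration steps can be sketched in Python with the standard csv module; the column names and sample rows below are hypothetical.

```python
import csv
import io

# Hypothetical raw data: one record is missing its amount, another has stray whitespace.
RAW = """region,amount
North, 120.5
South,
East,98
North,41.5
"""

def load_and_clean(text):
    """Import CSV text, drop records with a missing amount, and convert types."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        amount = (row["amount"] or "").strip()
        if not amount:  # cleaning step: skip incomplete records
            continue
        cleaned.append({"region": row["region"].strip(), "amount": float(amount)})
    return cleaned

def total_by_region(rows):
    """Simple exploration step: aggregate the amounts per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

rows = load_and_clean(RAW)
print(total_by_region(rows))
```

In practice, libraries such as pandas are common for this work; the standard library is used here only to keep the sketch self-contained.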

What is Data Analytics?


Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions
about that information. Data analytics is used in many industries to allow companies and organizations
to make better business decisions, and in the sciences to verify or disprove existing models or theories.
Data analysis is the process of systematically applying statistical and/or logical techniques to describe
and illustrate, condense and recap, and evaluate data.
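As a minimal sketch of "describe, condense and recap, and evaluate", Python's standard statistics module can summarize a small dataset. The daily order counts are hypothetical sample values, and the rule of flagging days more than one standard deviation above the mean is an arbitrary illustrative choice.

```python
import statistics

# Hypothetical sample: daily order counts for eight days.
orders = [12, 15, 11, 18, 20, 14, 13, 17]

def summarize(values):
    """Condense a list of numbers into a few descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = summarize(orders)

# Evaluate: flag the indexes of days more than one standard deviation above the mean.
threshold = summary["mean"] + summary["stdev"]
busy_days = [i for i, v in enumerate(orders) if v > threshold]

print(summary)
print("Busy days:", busy_days)
```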
Programming Languages / Tools used for Data Analytics
1. SAS
   Specialization: basic procedures and data management, statistical analysis, graphics and
   presentation, econometrics and time series analysis, quality control, clinical trial analysis, etc.
   o Business intelligence
   o Data management
   o Predictive analytics
   o Multivariate analysis

2. SPSS
   Specialization: survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS
   Modeler), text analytics, collaboration and deployment (batch and automated scoring services).

3. MATLAB
   Specialization: multi-type data support (sensor, image, video, telemetry, binary, and other
   real-time formats); mainly focuses on machine learning, neural networks, and statistics.

4. GNU Octave
   Specialization: primarily intended for numerical computations. Octave helps in solving linear and
   nonlinear problems numerically and in performing other numerical experiments using a language
   that is mostly compatible with MATLAB.

5. Ruby
   Specialization: supports multiple programming paradigms, including functional, object-oriented,
   and imperative. It also has a dynamic type system and automatic memory management.

6. Scala
   Specialization: a JVM-based programming language with support for functional programming and a
   very strong static type system.

7. Julia
   Specialization: a high-level dynamic programming language designed to address the requirements of
   high-performance numerical and scientific computing while also being effective for
   general-purpose programming.

8. R Language
   Specialization: all kinds of quantitative analysis: data import and cleaning, exploration and
   visualization.

9. Python
   Specialization: an embedding and scripting language for various testing, building, deployment,
   and monitoring frameworks, build scripts, and system monitoring and logging tools, etc.

10. SQL
   Specialization: data entry and data mapping (manipulating data and producing reports).

11. Java Scripting
   Specialization: Java-based frameworks, statistical modeling.

12. C/C++
   Specialization: object-oriented frameworks, statistical modeling.

13. Perl
   Specialization: a family of high-level, general-purpose, interpreted, dynamic programming
   languages. The Perl languages borrow features from other programming languages, including C,
   shell script (sh), AWK, and sed. They provide powerful text processing facilities without the
   arbitrary data-length limits of many contemporary Unix command-line tools, facilitating easy
   manipulation of text files.

14. Pig, Hive, or other Hadoop-based languages
   Specialization: creating data models, data management frameworks (HDFS) for large volumes of
   data, statistical model preparation using MapReduce, etc.
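The MapReduce model mentioned for Hadoop-based languages splits a job into map, shuffle, and reduce phases. The following single-machine Python sketch uses a word count, the usual teaching example, to show only the data flow; it does not use Hadoop, Pig, or Hive APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)

def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

documents = [
    "Hive and Pig run on Hadoop",
    "Hadoop stores data in HDFS",
    "MapReduce jobs read data from HDFS",
]
counts = map_reduce(documents)
print(counts["hadoop"], counts["hdfs"], counts["data"])
```

A real MapReduce framework runs many map and reduce tasks in parallel across a cluster and performs the shuffle over the network, reading input from and writing output to a distributed file system such as HDFS.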

DATA STORAGE AND DATA RETRIEVAL


What is data storage and data retrieval?
Data Storage: Computer data storage, often called storage or memory, is a technology consisting of
computer components and recording media used to retain digital data. It is a core function and
fundamental component of computers.
Data Retrieval:
Data retrieval means obtaining data from a database management system, such as an object database
management system (ODBMS).

In order to retrieve the desired data, the user presents a set of criteria in a query. The Database
Management System (DBMS), the software that manages the database, then selects the requested data
from the database. The retrieved data may be stored in a file, printed, or viewed on the screen.

A query language, such as Structured Query Language (SQL), is used to prepare the queries.
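A minimal sketch of this query-driven retrieval, using Python's built-in sqlite3 module as the DBMS. The customers table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [("Asha", "Chennai", 34), ("Ravi", "Mumbai", 58), ("Meena", "Chennai", 61)],
)

# The user's criteria are expressed as an SQL query; the DBMS selects the matching rows.
rows = conn.execute(
    "SELECT name, age FROM customers WHERE city = ? AND age > ? ORDER BY age",
    ("Chennai", 30),
).fetchall()

# The retrieved data is viewed on the screen (it could equally be written to a file).
for name, age in rows:
    print(name, age)

conn.close()
```

The `?` placeholders let sqlite3 bind the criteria values safely instead of pasting them into the SQL string.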

Platforms for Data Storage and Data Retrieval

1. Python
   Scope: object-oriented Python; basic Structured Query Language; data models and relational SQL;
   many-to-many relationships in SQL; databases and visualization.
   Detail: Structured Query Language (SQL) for basic database design with SQLite3; data retrieval
   (web crawlers and multi-step data gathering and visualization processes using Python scripting
   and XML).

2. Relational databases
   IBM System R, MICRO Relational Database Management System, Oracle Rdb, Paradox, Pick, PRTV, QBE,
   IBM SQL/DS, Sybase SQL Server.

3. Hadoop File Systems
   Scope: Hive and HBase for real-time data storage and retrieval; SQL on Hadoop.

4. MapReduce
   Scope: MapReduce programming using R / Python / Java; HDFS storage; MapReduce design patterns.

5. Spark
   Scope: a more advanced processing engine than MapReduce: Resilient Distributed Datasets (RDDs)
   and DataFrames; Spark application programming (Spark shell / PySpark shell / Java); Spark
   libraries; Spark configuration, monitoring, and tuning.
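The Python entry lists many-to-many relationships in SQL with SQLite3. A common approach is a junction table holding one row per link, queried with JOINs. The sketch below assumes hypothetical student, course, and member tables and sample enrolments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
-- Junction table: each row links one student to one course.
CREATE TABLE member (
    student_id INTEGER REFERENCES student(id),
    course_id  INTEGER REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)
);
""")

# Hypothetical enrolments: Ann takes two courses, Python has two students.
enrolments = [("Ann", "SQL"), ("Ann", "Python"), ("Bob", "Python")]
for name, title in enrolments:
    # Insert each student and course once; UNIQUE plus OR IGNORE skips duplicates.
    conn.execute("INSERT OR IGNORE INTO student (name) VALUES (?)", (name,))
    conn.execute("INSERT OR IGNORE INTO course (title) VALUES (?)", (title,))
    student_id = conn.execute("SELECT id FROM student WHERE name = ?", (name,)).fetchone()[0]
    course_id = conn.execute("SELECT id FROM course WHERE title = ?", (title,)).fetchone()[0]
    conn.execute("INSERT INTO member (student_id, course_id) VALUES (?, ?)", (student_id, course_id))

# Join through the junction table to list who takes which course.
rows = conn.execute("""
    SELECT student.name, course.title
    FROM member
    JOIN student ON member.student_id = student.id
    JOIN course  ON member.course_id  = course.id
    ORDER BY course.title, student.name
""").fetchall()
print(rows)
conn.close()
```

The composite primary key on the junction table prevents the same student from being linked to the same course twice.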
