
Data Science Online Training

Introduction to Data Science

Data science is the process of extracting knowledge from
data. The field is emerging to meet the challenges of
processing large data sets, which requires a versatile
skill set and specialized domain knowledge. Data
scientists analyse complex problems, ensure the
consistency of data sets, and create visualizations to
aid in understanding data. This Data Science online
training is designed to teach data mining techniques and
to build insight into the visualization and optimization
of data, so that you can become a successful Data
Scientist.

Course Curriculum
Unit-1: Introduction to Data Science
Topics- Introduction to Big Data, Roles played by a Data Scientist, Analysing
Big Data using Hadoop and R, Methodologies used for analysis, the
Architecture and Methodologies used to solve Big Data problems (for
example, Data Acquisition from various sources, Data preparation, Data
transformation using MapReduce (RMR), Application of Machine Learning
Techniques, Data Visualization etc.), Problem statements of a few data
science problems which we shall solve during the course
Unit-2: Basic Data Manipulation using R
Topics- Understanding vectors in R, Reading Data, Combining Data,
Subsetting Data, Sorting Data and some basic data generation functions
Unit-3: Machine Learning Techniques Using R Part-1
Topics- Machine Learning Overview, ML Common Use Cases, Understanding
Supervised and Unsupervised Learning Techniques, Clustering, Similarity
Metrics, Distance Measure Types: Euclidean, Cosine Measures, Creating
predictive models

Unit-4: Machine Learning Techniques Using R Part-2

Topics- Understanding K-Means Clustering, Understanding TF-IDF and Cosine
Similarity and their application to Vector Space Model, Implementing
Association rule mining in R.
Unit-5: Machine Learning Techniques Using R Part-3
Topics- Understanding the Process flow of Supervised Learning Techniques,
Decision Tree Classifier, How to build Decision Trees, Random Forest Classifier,
What is a Random Forest, Features of Random Forest, Out-of-Bag Error
Estimate and Variable Importance, Naive Bayes Classifier
Unit-6: Introduction to Hadoop Architecture
Topics- Hadoop Architecture, Common Hadoop commands, MapReduce and
Data loading techniques (directly in R, and in Hadoop using Sqoop, Flume,
and other data loading techniques), Removing anomalies from the data
Unit-7: Integrating R with Hadoop
Topics- Integrating R with Hadoop using RHadoop and the RMR package,
Exploring RHIPE (R and Hadoop Integrated Programming Environment), Writing
MapReduce Jobs in R and executing them on Hadoop

Unit-8: Mahout Introduction and Algorithm Implementation

Topics- Implementing Machine Learning Algorithms on larger Data Sets
with Apache Mahout

Unit-9: Additional Mahout Algorithms and Parallel Processing using R

Topics- Implementation of different Mahout algorithms, Random Forest
Classifier with a parallel processing library in R
Unit-10: Project
Topics- Project Discussion, Problem Statement and Analysis, Various
approaches to solve a Data Science Problem, Pros and Cons of different
approaches and algorithms

Our Data Science Online Training batches start every day.

You can attend a demo for free.

We Provide Online Training On

TIBCO Spotfire
SAP Hybris
Oracle DBA
Oracle SOA
Oracle Financials
iOS Development
Data Modeling- Erwin
Performance Testing
SAP HANA

We Offer You
1. Interactive Learning at the Learner's Convenience
2. Industry-Savvy Trainers
3. Real-Time Practical Scenarios
4. Learn Right from Your Place
5. Customized Course Curriculum
6. 24/7 Server Access
7. Support after Training and Certification Guidance
8. Resume Preparation and Interview Assistance
9. Recorded Versions of Sessions

Thank you
Your feedback is highly important to improve our course material.

For a Free Demo, Please Contact

INDIA: +91-9246333245,
US: +1-2013780518,
Email id: