
COMBINED COURSE OF

(Data Science + Artificial Intelligence + Machine Learning + Deep Learning + Python + SQL)

Break into the rapidly growing field of data science with Onspace Technologies
For Data Science Professional Program

Fundamentals

COURSE DURATION

Total Duration: 50 Days

Total Training Hours: 75 hours

Training Hours per Day: 1.5 hours

Training Type: Live Online

Course Materials &


A video recording of every class will be provided after the class

Batch Starting on: 27th Feb 2023


(Classes are held on weekdays only; leave on public holidays will
be decided by the students and the trainer.)

ABOUT DATA SCIENCE


MAKE BETTER DATA-DRIVEN BUSINESS DECISIONS

Today, organizations are struggling to extract the insights they need to make smarter
business decisions. To help uncover the true value of data, Onspace Technologies created
a live online Data Science course for students and IT professionals looking to harness data
in new and innovative ways.

Why learn Data Science?

· It’s in Demand: Data science is in high demand. It is the fastest growing job on
LinkedIn and is predicted to create 11.5 million jobs by 2026.

· Abundance of Positions: Data science is a broad field with many open positions and
opportunities.

· A Highly Paid Career: It is one of the most highly paid jobs. According to
Glassdoor, a Data Scientist makes an average of $116,100 per year.

· Data Scientists are Highly Regarded: Data Scientists allow companies to make
smarter business decisions. Companies rely on Data Scientists and use their expertise
to provide better results to their clients.

· Data Science is Versatile: Data science is widely used in health care, banking,
e-commerce and many other industries.

Understanding the Data Science Lifecycle



1. Business Understanding: It is extremely important to understand the business objective
clearly because that will be the final goal of the analysis.

2. Data Understanding: Here you need to work closely with the business team, as they know
what data is available, which data could be used for this business problem, and other relevant
information. This step involves describing the data: its structure, relevance and data types.
Explore the data using graphical plots.

3. Data Preparation: This includes steps like selecting the relevant data, integrating the data by
merging data sets, cleaning it, treating missing values by either removing or imputing them,
removing erroneous data, and checking for outliers using box plots and handling them.

4. Exploratory Data Analysis: This step involves getting some idea about the solution and the
factors affecting it before building the actual model. The distribution of values within each
feature is explored graphically using bar graphs. Relations between different features are
captured through graphical representations like scatter plots and heat maps.

5. Data Modeling: Data modeling is the heart of data analysis. A model takes the prepared data
as input and provides the desired output. This step includes choosing the appropriate type of
model, depending on whether the problem is a classification problem, a regression problem or a
clustering problem. We need to tune the hyperparameters of each model to achieve the
desired performance. We also need to make sure there is a correct balance between
performance and generalizability.

6. Model Evaluation: The model is tested on unseen data and evaluated on a carefully chosen
set of evaluation metrics. We also need to make sure that the model conforms to reality.
If we do not obtain a satisfactory result in the evaluation, we must repeat the entire
modeling process until the desired level of the metrics is achieved.

7. Model Deployment: After a rigorous evaluation, the model is finally deployed in the desired
format and channel. This is the final step in the data science life cycle. Each step in the data
science life cycle explained above should be worked through carefully.
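
As a rough illustration of steps 3 to 6 above, the following minimal sketch uses only the Python standard library on made-up numbers. The data, the `fit_line` helper, and the choice of median imputation with the 1.5 * IQR box-plot rule are illustrative assumptions, not part of the course material.

```python
import statistics

# Hypothetical raw data: (hours_studied, exam_score); None marks a missing value.
raw = [(1, 52), (2, 55), (3, None), (4, 61), (5, 64),
       (6, 67), (7, 70), (8, 300), (9, 76), (10, 79)]

# Data preparation: impute missing scores with the median of the known scores.
known = [y for _, y in raw if y is not None]
median_score = statistics.median(known)
rows = [(x, y if y is not None else median_score) for x, y in raw]

# Data preparation: drop outliers with the box-plot rule (1.5 * IQR beyond the quartiles).
q1, _, q3 = statistics.quantiles([y for _, y in rows], n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
clean = [(x, y) for x, y in rows if low <= y <= high]

# Hold out every third row as unseen data for evaluation.
train = [r for i, r in enumerate(clean) if i % 3 != 2]
test = [r for i, r in enumerate(clean) if i % 3 == 2]

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Data modeling: fit on the training rows only.
slope, intercept = fit_line(train)

# Model evaluation: mean absolute error on the held-out rows.
mae = statistics.fmean(abs(y - (slope * x + intercept)) for x, y in test)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={mae:.2f}")
```

The held-out error here comes mostly from the imputed row, which is a reminder that preparation choices affect the evaluation as much as the model does.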

COURSE OVERVIEW
All industries now utilize data, and data science and data analytics are increasingly recognized
as key industrial activities. The position of Data Scientist is rapidly becoming a required post for
any company that wishes to take full advantage of the data it collects. This course is
designed to give you the skills to step into a career as a Data Scientist in a wide range of industries
and companies.

AUDIENCE

· You should take this course if you want to become a Data Scientist or if you want to
learn about the field.
· This course is for you if you want a great career.
· The course is also ideal for beginners, as it starts from the fundamentals and gradually
builds up your skills.

PREREQUISITES

· A laptop with a good configuration, able to join video conferences for the online classes
· A good Internet connection
· Administrative privileges on the laptop for any software installation
· No other specific prerequisites are required.
· Training will be given on Google Colab.

WHAT YOU WILL LEARN

· Statistical analysis, Python programming with NumPy, pandas, Matplotlib, and Seaborn,
advanced statistical analysis, Machine Learning with statsmodels and scikit-learn, Deep
Learning with TensorFlow.
· Understand the mathematics behind Machine Learning.
· Learn how to pre-process data.
· Start coding in Python and learn how to use it for statistical analysis.
· Be able to create Machine Learning algorithms in Python, using NumPy and scikit-learn.
· Improve Machine Learning algorithms by studying underfitting, overfitting, training,
validation, n-fold cross validation, testing, and how hyperparameters could improve
performance.
· Unfold the power of deep neural networks.
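
To make the n-fold cross-validation item above concrete, here is a minimal sketch in plain Python. The helper name `k_fold_indices` and the toy sample count are illustrative assumptions; in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every sample lands in the validation set exactly once, so each model score is computed on data that model did not see during training.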

TOPICS
DATA SCIENCE + ARTIFICIAL INTELLIGENCE + MACHINE LEARNING

FOUNDATIONS
Python Programming and Computer Science
· Types
· Flow Control and
· Data Structures

SciPy Stack
· NumPy
· Pandas
· Matplotlib

Mathematics
· Statistics
· Probability and
· Linear Algebra

Data Analysis
· Getting, cleaning, analysing and visualizing raw data is the main responsibility of
industry data scientists.

Statistical Inference
· Probability
· Distributions and
· Hypothesis Testing

Summarizing and Visualizing Data


· Descriptive Statistics
· Univariate and Multivariate Exploratory Data Analysis

Machine Learning
You will learn how to explore new data sets, implement a comprehensive set of machine learning
algorithms from scratch, and master all the components of a predictive model, such as:
· Data pre-processing
· Feature engineering
· Model selection
· Performance metrics and hyperparameter optimization

Predictive Modeling
· Regression
· Classification
· Data Preprocessing
· Model Evaluation and Ensembles

Data Mining
· Dimensionality Reduction
· Clustering
· Association Rules
Specialty Topics
· Data Engineering
· Natural Language Processing and Neural Networks

PROGRAM CURRICULUM

MODULE 1

Fundamentals of Python

Vectors, Matrices, and Arrays


· Creating a Vector
· Creating a Matrix
· Creating a Sparse Matrix
· Selecting Elements
· Describing a Matrix
· Applying Operations to Elements
· Finding the Maximum and Minimum Values
· Calculating the Average, Variance, and Standard Deviation
· Reshaping Arrays
· Transposing a Vector or Matrix
· Flattening a Matrix
· Finding the Rank of a Matrix
· Calculating the Determinant
· Getting the Diagonal of a Matrix
· Calculating the Trace of a Matrix
· Finding Eigenvalues and Eigenvectors
· Calculating Dot Products
· Adding and Subtracting Matrices
· Multiplying Matrices
· Inverting a Matrix
· Generating Random Values

Loading Data
· Loading a Sample Dataset
· Creating a Simulated Dataset
· Loading a CSV File
· Loading an Excel File
· Loading a JSON File
· Querying a SQL Database

Data Wrangling
· Creating a Data Frame
· Describing the Data
· Navigating Data Frames
· Selecting Rows Based on Conditionals
· Replacing Values
· Renaming Columns
· Finding the Minimum, Maximum, Sum, Average, and Count
· Finding Unique Values
· Handling Missing Values
· Deleting a Column
· Deleting a Row
· Dropping Duplicate Rows
· Grouping Rows by Values
· Grouping Rows by Time
· Looping Over a Column
· Applying a Function Over All Elements in a Column
· Applying a Function to Groups
· Concatenating Data Frames
· Merging Data Frames

MODULE 2

Fundamentals of Exploratory Data Analysis

Working with Numerical Data


· Rescaling a Feature
· Standardizing a Feature
· Normalizing Observations
· Generating Polynomial and Interaction Features
· Transforming Features
· Detecting Outliers
· Handling Outliers
· Discretization of Features
· Grouping Observations Using Clustering
· Deleting Observations with Missing Values
· Imputing Missing Values

Working with Categorical Data


· Encoding Nominal Categorical Features
· Encoding Ordinal Categorical Features
· Encoding Dictionaries of Features
· Imputing Missing Class Values
· Handling Imbalanced Classes

Working with Text


· Introduction
· Cleaning Text
· Parsing and Cleaning HTML
· Removing Punctuation
· Tokenizing Text
· Removing Stop Words
· Stemming Words
· Tagging Parts of Speech
· Encoding Text as a Bag of Words
· Weighting Word Importance
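
A few of the text steps listed above (cleaning, removing punctuation, tokenizing, removing stop words, bag of words) can be sketched with the standard library alone. The stop-word list, helper names and sample sentences below are illustrative assumptions; real projects would typically rely on a dedicated NLP library.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "is", "a", "of", "and", "in"}

def tokenize(text):
    """Lowercase the text and keep only runs of letters (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Count each token that remains after stop-word removal."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)

docs = ["The model is trained.", "A model of the data, and the data is clean!"]
bags = [bag_of_words(d) for d in docs]
for bag in bags:
    print(dict(bag))
```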

Working with Dates and Times


· Converting Strings to Dates
· Handling Time Zones
· Selecting Dates and Times
· Breaking Up Date Data into Multiple Features
· Calculating the Difference Between Dates
· Encoding Days of the Week
· Creating a Lagged Feature
· Using Rolling Time Windows
· Handling Missing Data in Time Series

Working with Images


· Loading Images
· Saving Images
· Resizing Images
· Cropping Images
· Blurring Images
· Sharpening Images
· Enhancing Contrast
· Isolating Colors

· Binarizing Images
· Removing Backgrounds
· Detecting Edges
· Detecting Corners
· Creating Features for Machine Learning
· Encoding Mean Color as a Feature
· Encoding Color Histograms as Features

Visual Aids for EDA


· Line chart
· Steps involved
· Bar charts
· Scatter plot
· Bubble chart
· Scatter plot using seaborn
· Area plot and stacked plot
· Pie chart
· Table chart
· Polar chart
· Histogram
· Lollipop chart
· Pairplot using seaborn
· Heatmap using seaborn
· Choosing the best chart
· Other libraries to explore

Descriptive Statistics
· Understanding statistics
· Distribution function
· Uniform distribution
· Normal distribution
· Exponential distribution
· Binomial distribution
· Cumulative distribution function
· Descriptive statistics
· Measures of central tendency
· Mean/average
· Median
· Mode
· Measures of dispersion
· Standard deviation
· Variance
· Skewness

· Kurtosis
· Types of kurtosis
· Calculating percentiles
· Quartiles
· Visualizing quartiles
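
The measures of central tendency and dispersion listed above can be tried out with Python's built-in `statistics` module. This is a minimal sketch on a made-up sample; the variable names and numbers are assumptions for illustration only.

```python
import statistics

# Hypothetical sample: daily website visits over nine days.
visits = [12, 15, 15, 18, 20, 22, 22, 22, 34]

mean = statistics.mean(visits)          # arithmetic average
median = statistics.median(visits)      # middle value of the sorted sample
mode = statistics.mode(visits)          # most frequent value
variance = statistics.variance(visits)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(visits)      # sample standard deviation
q1, q2, q3 = statistics.quantiles(visits, n=4)  # the three quartile cut points

print(mean, median, mode, variance, std_dev, (q1, q2, q3))
```

Note that `statistics.quantiles` uses its "exclusive" method by default, so other tools may report slightly different quartiles for the same data.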

Correlation
· Introducing correlation
· Types of analysis
· Understanding univariate analysis
· Understanding bivariate analysis
· Understanding multivariate analysis
· Correlation does not imply causation

Hypothesis Testing
· Hypothesis testing principle
· statsmodels library
· Types of hypothesis testing
· T-test

MODULE 3
Fundamentals of Big Data
· Big Data Landscape
· Hadoop Eco System
· Spark Eco System

MODULE 4
Fundamentals of Machine Learning

Dimensionality Reduction Using Feature Extraction


· Reducing Features Using Principal Components
· Reducing Features When Data Is Linearly Inseparable
· Reducing Features by Maximizing Class Separability
· Reducing Features Using Matrix Factorization
· Reducing Features on Sparse Data

Dimensionality Reduction Using Feature Selection


· Thresholding Numerical Feature Variance
· Thresholding Binary Feature Variance
· Handling Highly Correlated Features
· Removing Irrelevant Features for Classification
· Recursively Eliminating Features
Model Evaluation
· Cross-Validating Models
· Creating a Baseline Regression Model
· Creating a Baseline Classification Model
· Evaluating Binary Classifier Predictions


· Evaluating Binary Classifier Thresholds


· Evaluating Multiclass Classifier Predictions
· Visualizing a Classifier’s Performance
· Evaluating Regression Models
· Evaluating Clustering Models
· Creating a Custom Evaluation Metric
· Visualizing the Effect of Training Set Size
· Creating a Text Report of Evaluation Metrics
· Visualizing the Effect of Hyperparameter Values

Model Selection
· Selecting Best Models Using Exhaustive Search
· Selecting Best Models Using Randomized Search
· Selecting Best Models from Multiple Learning Algorithms
· Selecting Best Models When Preprocessing
· Speeding Up Model Selection with Parallelization
· Speeding Up Model Selection Using Algorithm-Specific Methods
· Evaluating Performance After Model Selection
Machine Learning and Big Data
Various tools and technologies used in ML projects

MODULE 5
Machine Learning Algorithms
Supervised Learning
· Linear Regression
· Fitting a Line
· Multiple Regression
· Handling Interactive Effects
· Fitting a Nonlinear Relationship
· Reducing Variance with Regularization
· Reducing Features with Lasso Regression
· Assumptions in Regression Process
Trees and Forests
· Training a Decision Tree Classifier
· Training a Decision Tree Regressor
· Visualizing a Decision Tree Model
· Training a Random Forest Classifier
· Training a Random Forest Regressor
· Identifying Important Features in Random Forests
· Selecting Important Features in Random Forests
· Handling Imbalanced Classes
· Controlling Tree Size
· Improving Performance Through Boosting
· Evaluating Random Forests with Out-of-Bag Errors

K-Nearest Neighbors
· Finding an Observation’s Nearest Neighbors
· Creating a K-Nearest Neighbor Classifier
· Identifying the Best Neighborhood Size
· Creating a Radius-Based Nearest Neighbor Classifier

Logistic Regression
· Training a Binary Classifier
· Training a Multiclass Classifier
· Reducing Variance Through Regularization
· Training a Classifier on Very Large Data
· Handling Imbalanced Classes

Support Vector Machines


· Training a Linear Classifier
· Handling Linearly Inseparable Classes Using Kernels
· Creating Predicted Probabilities
· Identifying Support Vectors
· Handling Imbalanced Classes

Naive Bayes
· Training a Classifier for Continuous Features
· Training a Classifier for Discrete and Count Features
· Training a Naive Bayes Classifier for Binary Features
· Calibrating Predicted Probabilities

Comparison of Algorithms
· Model Construction with Various Algorithms
· Usage of Accuracy and Other Parameters
· Calibrating Results

Unsupervised Learning
· Clustering
· Clustering Using K-Means
· Speeding Up K-Means Clustering
· Clustering Using Mean Shift
· Clustering Using DBSCAN
· Clustering Using Hierarchical Merging

TOPICS
ARTIFICIAL INTELLIGENCE + DEEP LEARNING
MODULE 6

Neural Networks
· Introduction to neural prediction
· Forward propagation
· Gradient Descent
· Learning multiple weights at a time: generalizing gradient descent
· Full, batch, and stochastic gradient descent
· Backpropagation
· Regularization
· Three-layer network on MNIST
· Overfitting in neural networks
· Modeling probabilities and nonlinearities: activation functions
· Softmax computation
· Deep Learning for Computer Vision
· Convolutional Neural Networks
· Recurrent Neural Networks
· Deep Learning with PyTorch
· TensorFlow 2.0
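
To give a feel for the forward propagation and gradient descent topics above, here is a minimal sketch of a one-weight "network" trained in plain Python. The input, target, starting weight and learning rate are made-up values chosen for illustration.

```python
# Learn a weight w so that prediction = w * x matches the target (toy data).
x, target = 2.0, 0.8
w = 0.5
learning_rate = 0.1

for step in range(50):
    prediction = w * x                        # forward propagation
    error = (prediction - target) ** 2        # squared error
    gradient = 2 * (prediction - target) * x  # derivative of the error w.r.t. w
    w -= learning_rate * gradient             # gradient descent update

print(f"learned weight: {w:.4f}, final error: {error:.2e}")
```

With these values each update shrinks the distance to the ideal weight (target / x = 0.4) by a constant factor; a much larger learning rate would make the updates overshoot and diverge.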
Natural Language Processing
· Introduction to Natural Language Processing
· Text Analytics and NLP
· Various Steps in NLP
· Kick Starting an NLP Project
· Basic Feature Extraction Methods
· Bag of Words
· TF-IDF
· Feature Engineering
· Developing a Text classifier
· Topic Modeling
· Text Summarization and Text Generation
· Vector Representation
· Sentiment Analysis
Saving and Loading Trained Models
· Saving and Loading a scikit-learn Model
· Saving and Loading a Keras Model

TOPICS
STRUCTURED QUERY LANGUAGE
MODULE 7

Analysis with SQL


· What is data analysis?
· Why SQL
· What is SQL?
· Benefits of SQL
· SQL vs. R or Python
· SQL as part of the analysis workflow
· Database Types and How to Work with Them
· Row-store databases
· Column-store databases
· Other flavors of data infrastructure

Preparing Data for Analysis


· Types of Data
· Database data types
· Structured vs. Unstructured
· First-party, Third-party, and Cloud Vendor data
· Sparse data
· Quantitative vs. qualitative data
· Categorical vs. continuous
· Profiling: Distributions
· Histograms and frequencies
· Binning
· N-tiles
· Profiling: Data Quality
· Detecting duplicates
· Deduplication with GROUP BY and DISTINCT
· Missing data
· Data cleaning
· CASE transformations
· Dealing with nulls: COALESCE, NULLIF, NVL
· Casting and type conversions
· Shaping Data

· For which output: BI, Visualization, statistics, ML


· Pivoting with CASE statements
· Unpivot with UNION statements
· PIVOT and UNPIVOT
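
Several of the data-preparation techniques above can be tried in an in-memory SQLite database through Python's `sqlite3` module. This is a minimal sketch: the `orders` table and its rows are made up, and SQLite is an assumed choice of engine. SQLite has no `NVL` (an Oracle function), so `COALESCE` stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", "north", 10.0), ("ann", "north", 10.0),  # exact duplicate row
     ("bob", "south", None), ("cy", "", 7.5)],
)

# Deduplication with DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, region, amount FROM orders ORDER BY customer"
).fetchall()

# Detecting duplicates with GROUP BY ... HAVING.
duplicates = conn.execute(
    "SELECT customer, COUNT(*) FROM orders "
    "GROUP BY customer, region, amount HAVING COUNT(*) > 1"
).fetchall()

# COALESCE replaces NULL with a default; NULLIF turns a sentinel ('') into NULL.
cleaned = conn.execute(
    "SELECT customer, COALESCE(amount, 0), NULLIF(region, '') "
    "FROM orders ORDER BY customer"
).fetchall()

# Pivoting with CASE: one output column per region.
pivot = conn.execute(
    "SELECT SUM(CASE WHEN region = 'north' THEN amount ELSE 0 END), "
    "       SUM(CASE WHEN region = 'south' THEN amount ELSE 0 END) "
    "FROM orders"
).fetchone()

print(distinct_rows, duplicates, cleaned, pivot, sep="\n")
```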

Time Series Analysis


· Date, datetime, and time manipulations
· Time zone conversions
· Date and timestamp format conversions
· Date math
· Time math
· Joining data from different sources
· The retail sales data set
· Trending the data
· Simple trends
· Comparing components
· Percent of total calculations
· Indexing to see % change over time
· Rolling time windows
· Calculating rolling time windows
· Rolling time windows with sparse data
· Calculating cumulative values
· Analyzing with seasonality
· Period over period comparisons - YoY and MoM
· Period over period comparisons - Same month vs. last year
· Comparing to multiple prior periods
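
Cumulative values and rolling time windows from the list above can be sketched with a self-join in SQLite, again driven from Python's `sqlite3` module. The `sales` table and its figures are made up for illustration; on engines with window functions, `SUM(...) OVER (...)` is a common alternative to the self-join shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2022-01", 100), ("2022-02", 120), ("2022-03", 90), ("2022-04", 150)],
)

# Cumulative total: join each month to itself and all earlier months.
cumulative = conn.execute(
    "SELECT a.month, SUM(b.amount) FROM sales a "
    "JOIN sales b ON b.month <= a.month "
    "GROUP BY a.month ORDER BY a.month"
).fetchall()

# Rolling 3-month average: join only the current month and the two before it.
rolling = conn.execute(
    "SELECT a.month, AVG(b.amount) FROM sales a "
    "JOIN sales b ON b.month BETWEEN "
    "     strftime('%Y-%m', a.month || '-01', '-2 months') AND a.month "
    "GROUP BY a.month ORDER BY a.month"
).fetchall()

for (month, total), (_, average) in zip(cumulative, rolling):
    print(month, total, round(average, 2))
```

The first two months average over fewer than three rows because no earlier data exists, which is the same edge case that rolling windows over sparse data have to handle.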

Live Project

COURSE COMPLETION CERTIFICATE ISSUANCE

THANK YOU NOTE FROM ONSPACE TECHNOLOGIES


For more information www.onspaceglobal.com
www.innovatzglobal.com

For Enquiries
info@onspaceglobal.com
