(Data Science + Artificial intelligence + Machine Learning + Deep Learning + Python + SQL)
Break into the rapidly growing field of data science with Onspace Technologies
For Data Science Professional Program
Fundamentals
COURSE DURATION
Today, organizations struggle to extract the insights they need to make smarter
business decisions. To help uncover the true value of data, Onspace Technologies created
a live online Data Science course for students and IT professionals looking to harness data in
new and innovative ways.
· It's in Demand: Data science is in high demand. It is the fastest-growing job on
LinkedIn and is predicted to create 11.5 million jobs by 2026.
· Abundance of Positions: Data science is a broad field with many job
opportunities.
· A Highly Paid Career: It is one of the most highly paid jobs. According to
Glassdoor, a data scientist makes an average of $116,100 per year.
· Data Scientists are Highly Prestigious: Data scientists allow companies to make
smarter business decisions. Companies rely on data scientists and use their expertise
to provide better results to their clients.
· Data Science is Versatile: Data science is widely used in health care, banking,
e-commerce, and other industries.
3. Data Preparation: This step includes selecting the relevant data, integrating it by
merging data sets, cleaning it, treating missing values by either removing or
imputing them, removing erroneous data, and checking for outliers with box
plots and handling them.
4. Exploratory Data Analysis: This step involves getting some idea about the solution and the
factors affecting it before building the actual model. The distribution of values within each
feature is explored graphically using bar graphs, and relations between features are
captured through graphical representations such as scatter plots and heat maps.
5. Data Modeling: Data modeling is the heart of data analysis. A model takes the prepared data
as input and provides the desired output. This step includes choosing the appropriate type of
model, depending on whether the problem is a classification, regression, or
clustering problem. We need to tune the hyperparameters of each model to achieve the
desired performance. We also need to make sure there is a correct balance between
performance and generalizability.
6. Model Evaluation: The model is tested on unseen data and evaluated against a carefully
chosen set of evaluation metrics. We also need to make sure that the model conforms to reality.
If the evaluation does not give a satisfactory result, we must repeat the
modeling process until the desired level of the metrics is achieved.
7. Model Deployment: After rigorous evaluation, the model is finally deployed in the desired
format and channel. This is the final step in the data science life cycle. Each step in the
life cycle explained above should be worked on carefully.
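The preparation, modeling, and evaluation steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the course's own material: the toy data set, the column names (age, income, label), the prepare helper, and the choice of logistic regression are all made up for the example. It uses pandas and scikit-learn, both of which appear in the course outline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two numeric features and a binary label.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 8_000, n),
})
df["label"] = (df["age"] + df["income"] / 1_000 > 90).astype(int)

# Inject a few problems so the preparation step has work to do.
df.loc[[3, 17, 42], "age"] = np.nan      # missing values
df.loc[[5, 60], "income"] = 1_000_000    # obvious outliers


def prepare(frame: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values with the median and clip outliers
    using the box-plot (1.5 * IQR) rule.

    For brevity the statistics are computed on the full data set;
    in practice they should come from the training split only.
    """
    out = frame.copy()
    for col in ["age", "income"]:
        out[col] = out[col].fillna(out[col].median())
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


clean = prepare(df)

# Hold out unseen data for the evaluation step.
X_train, X_test, y_train, y_test = train_test_split(
    clean[["age", "income"]], clean["label"],
    test_size=0.25, random_state=0, stratify=clean["label"],
)

# Modeling: tune one hyperparameter (C) with 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LogisticRegression())
search = GridSearchCV(model, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation: score the tuned model on the held-out data.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["logisticregression__C"])
print("test accuracy:", round(test_accuracy, 3))
```

Accuracy is used here only because the toy labels are roughly balanced; a real project would choose its metrics to fit the business problem, as the evaluation step describes.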
COURSE OVERVIEW
All industries now use data, and data science and data analytics are increasingly identified
as key industrial activities. The position of Data Scientist is rapidly becoming a required post for
any company that wishes to take full advantage of the data it collects. This course is
designed to give you the skills to step into a career as a Data Scientist in a wide range of industries
and companies.
AUDIENCE
· You should take this course if you want to become a Data Scientist or if you want to
learn about the field.
· This course is for you if you want a great career.
· The course is also ideal for beginners, as it starts from the fundamentals and gradually
builds up your skills.
PREREQUISITES
· A laptop with a good configuration, able to join video conferences for the online sessions
· A good Internet connection
· Administrative privileges on the laptop for any installation
· No other specific prerequisites are required.
· Training will be given on Google Colab.
· Statistical analysis; Python programming with NumPy, pandas, Matplotlib, and Seaborn;
advanced statistical analysis; Machine Learning with statsmodels and scikit-learn; Deep
Learning with TensorFlow.
· Understand the mathematics behind Machine Learning.
· Learn how to pre-process data.
· Start coding in Python and learn how to use it for statistical analysis.
· Be able to create Machine Learning algorithms in Python, using NumPy and scikit-learn.
· Improve Machine Learning algorithms by studying underfitting, overfitting, training,
validation, n-fold cross validation, testing, and how hyperparameters could improve
performance.
· Unfold the power of deep neural networks.
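The ideas of underfitting, overfitting, and n-fold cross-validation listed above can be illustrated with a short sketch. Everything specific here is an assumption made for the example and does not come from the course material: the synthetic data from scikit-learn's make_classification, the decision tree model, and max_depth as the hyperparameter being varied.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y) so a perfect fit is suspicious.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4,
    flip_y=0.1, random_state=0,
)

results = {}
for depth in (1, 4, None):  # None lets the tree grow without a depth limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # 5-fold cross-validation: train on 4 folds, validate on the 5th, rotate.
    scores = cross_validate(tree, X, y, cv=5, return_train_score=True)
    results[depth] = (scores["train_score"].mean(), scores["test_score"].mean())
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"validation={results[depth][1]:.3f}")
```

A very shallow tree tends to score poorly on both training and validation folds (underfitting), while an unlimited tree fits the training folds perfectly yet scores lower on validation folds (overfitting). Comparing the two columns across hyperparameter values is one simple way to pick a setting that generalizes.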
TOPICS
DATA SCIENCE + ARTIFICIAL INTELLIGENCE + MACHINE LEARNING
FOUNDATIONS
Python Programming and Computer Science
· Types
· Flow Control
· Data Structures
SciPy Stack
· NumPy
· Pandas
· Matplotlib
Mathematics
· Statistics
· Probability
· Linear Algebra
Data Analysis
· Getting, cleaning, analysing, and visualizing raw data is the main responsibility of
industry data scientists.
Statistical Inference
· Probability
· Distributions
· Hypothesis Testing
Machine Learning
You will learn how to explore new data sets, implement a comprehensive set of machine learning
algorithms from scratch, and master all the components of a predictive model, such as:
· Data pre-processing
· Feature engineering
· Model selection
· Performance metrics and hyperparameter optimization
Predictive Modeling
· Regression
· Classification
· Data Preprocessing
· Model Evaluation and Ensembles
Data Mining
· Dimensionality Reduction
· Clustering
· Association Rules
Specialty Topics
· Data Engineering
· Natural Language Processing and Neural Networks
PROGRAM CURRICULUM
MODULE 1
Fundamentals of Python
Loading Data
· Loading a Sample Dataset
· Creating a Simulated Dataset
· Loading a CSV File
· Loading an Excel File
· Loading a JSON File
· Querying a SQL Database
Data Wrangling
· Creating a Data Frame
· Describing the Data
· Navigating Data Frames
· Selecting Rows Based on Conditionals
· Replacing Values
· Renaming Columns
· Finding the Minimum, Maximum, Sum, Average, and Count
· Finding Unique Values
· Handling Missing Values
· Deleting a Column
· Deleting a Row
· Dropping Duplicate Rows
· Grouping Rows by Values
· Grouping Rows by Time
· Looping Over a Column
· Applying a Function Over All Elements in a Column
· Applying a Function to Groups
· Concatenating Data Frames
· Merging Data Frames
MODULE 2
· Binarizing Images
· Removing Backgrounds
· Detecting Edges
· Detecting Corners
· Creating Features for Machine Learning
· Encoding Mean Color as a Feature
· Encoding Color Histograms as Features
Descriptive Statistics
· Understanding statistics
· Distribution function
· Uniform distribution
· Normal distribution
· Exponential distribution
· Binomial distribution
· Cumulative distribution function
· Descriptive statistics
· Measures of central tendency
· Mean/average
· Median
· Mode
· Measures of dispersion
· Standard deviation
· Variance
· Skewness
· Kurtosis
· Types of kurtosis
· Calculating percentiles
· Quartiles
· Visualizing quartiles
Correlation
· Introducing correlation
· Types of analysis
· Understanding univariate analysis
· Understanding bivariate analysis
· Understanding multivariate analysis
· Correlation does not imply causation
Hypothesis Testing
· Hypothesis testing principle
· statsmodels library
· Types of hypothesis testing
· T-test
MODULE 3
Fundamentals of Big Data
· Big Data Landscape
· Hadoop Eco System
· Spark Eco System
MODULE 4
Fundamentals of Machine Learning
Model Selection
· Selecting Best Models Using Exhaustive Search
· Selecting Best Models Using Randomized Search
· Selecting Best Models from Multiple Learning Algorithms
· Selecting Best Models When Preprocessing
· Speeding Up Model Selection with Parallelization
· Speeding Up Model Selection Using Algorithm-Specific Methods
· Evaluating Performance After Model Selection
Machine Learning and Big Data
Various tools and technologies used in ML projects
MODULE 5
Machine Learning Algorithms
Supervised Learning
· Linear Regression
· Fitting a Line
· Multiple Regression
· Handling Interactive Effects
· Fitting a Nonlinear Relationship
· Reducing Variance with Regularization
· Reducing Features with Lasso Regression
· Assumptions in Regression Process
Trees and Forests
· Training a Decision Tree Classifier
· Training a Decision Tree Regressor
· Visualizing a Decision Tree Model
· Training a Random Forest Classifier
· Training a Random Forest Regressor
· Identifying Important Features in Random Forests
· Selecting Important Features in Random Forests
· Handling Imbalanced Classes
· Controlling Tree Size
· Improving Performance Through Boosting
· Evaluating Random Forests with Out-of-Bag Errors
K-Nearest Neighbors
· Finding an Observation’s Nearest Neighbors
· Creating a K-Nearest Neighbor Classifier
· Identifying the Best Neighborhood Size
· Creating a Radius-Based Nearest Neighbor Classifier
Logistic Regression
· Training a Binary Classifier
· Training a Multiclass Classifier
· Reducing Variance Through Regularization
· Training a Classifier on Very Large Data
· Handling Imbalanced Classes
Naive Bayes
· Training a Classifier for Continuous Features
· Training a Classifier for Discrete and Count Features
· Training a Naive Bayes Classifier for Binary Features
· Calibrating Predicted Probabilities
Comparison of Algorithms
· Model Construction with Various Algorithms
· Usage of Accuracy and Other Parameters
· Calibrating Results
Unsupervised Learning
· Clustering
· Clustering Using K-Means
· Speeding Up K-Means Clustering
· Clustering Using Mean Shift
· Clustering Using DBSCAN
· Clustering Using Hierarchical Merging
TOPICS
ARTIFICIAL INTELLIGENCE + DEEP LEARNING
MODULE 6
Neural Networks
· Introduction to neural prediction
· Forward propagation
· Gradient Descent
· Learning multiple weights at a time: generalizing gradient descent
· Full, batch, and stochastic gradient descent
· Backpropagation
· Regularization
· Three-layer network on MNIST
· Overfitting in neural networks
· Modeling probabilities and nonlinearities: activation functions
· Softmax computation
· Deep Learning for Computer Vision
· Convolutional Neural Networks
· Recurrent Neural Networks
· Deep Learning with PyTorch
· TensorFlow 2.0
Natural Language Processing
· Introduction to Natural Language Processing
· Text Analytics and NLP
· Various Steps in NLP
· Kick Starting an NLP Project
· Basic Feature Extraction Methods
· Bag of Words
· TF-IDF
· Feature Engineering
· Developing a Text classifier
· Topic Modeling
· Text Summarization and Text Generation
· Vector Representation
· Sentiment Analysis
Saving and Loading Trained Models
· Saving and Loading a scikit-learn Model
· Saving and Loading a Keras Model
TOPICS
STRUCTURED QUERY LANGUAGE
MODULE 7
Live Project
For Enquiries
info@onspaceglobal.com