www.onthegomodel.com
Foundations of Data Science
In this course you’ll learn the fundamentals of data science
along with the terminology and tools widely used in the industry.
You’ll also learn the process for solving a data science problem
and see how some real-world problems are solved.

Introduction to Data Science

What is Data Science?
Applications of Data Science in various sectors
Components of Data Science
Python Libraries for Data Science

Mathematical Preliminaries

Basic Probability theory
Basic Statistics
Bayesian statistics
Probability distributions
Normal distribution
Poisson distribution
Exponential distribution
Mean, Median and Mode
Central Limit theorem

Handling Data with Python

Importing Data Sets into Python
DataFrame Object in Pandas
DataFrame Operations
Grouping Data
Aggregating Data

Data Cleaning

Data types
Handling Missing Values
Handling Duplicates
Handling Inconsistent Data
Handling Outliers

Data Visualization fundamentals

Principles of Information Visualization
Charting Fundamentals
Applied Visualizations

Matplotlib, Seaborn

2D Plots and Subplots
3D plots
Animations and Interactivity
Styling
Complex layouts with Pandas and Seaborn

Data Munging and Exploratory Data Analysis

Data Formatting and Normalization
Feature Scaling
Dimensionality Reduction
Summarizing Data Frames
Descriptive Statistics
Correlation
Scores and Rankings

Course Project: Ecommerce Sale - An Exploratory Analysis
Develop a model and study the top products with the best ratings and
sales performance on the e-commerce platform.

Advanced Data Science
In this course, you’ll learn advanced concepts in data
science using statistical models. You’ll apply the processes
you have learned to different data sets and draw insights
from the data.

Statistical Analysis

Inferential Statistics
Statistical significance
Z-test
T-test
ANOVA
F Test
Chi Square Test
Multivariate Analysis
Hypothesis Testing

Building and Validating Models

Linear regression
Logistic regression
Nearest neighbor methods
Clustering methods

Design of Experiments and Surveys - I

Data science in the ideal versus real life
Components of Experimental Design
Causality
Confounding
Sampling bias and random sampling
Blocking and adjustment
Multiplicity
Effect size, significance, & modelling

Design of Experiments and Surveys - II

Comparison with benchmark effects
Negative controls
Non-significance
Design Process and Guidelines
One Factor Experiments
Multi-factor Experiments
Taguchi Methods
Report writing

Course Project: Unemployment Trends due to Covid
Are young generations most affected by Covid-19?
Develop hypotheses to statistically analyze the
unemployment trends across various European countries.

Data Science at Scale
In this course, you will learn how to build predictive data
science models for product teams using different cloud
environment tools. Throughout the course, you’ll have
plenty of exercises that use a diverse range of tools.
You’ll also explore model workflows that move data between
different cloud environments and build real-time data
pipelines.

Fundamentals of Data Engineering

What is Data Engineering
Data Engineering Problems
Tools of a Data Engineer
Cloud Providers and Cloud Computing

Data Engineering Tools

Fundamentals of databases
Parallel computing frameworks
Spark, Hadoop and Hive
Workflow scheduling frameworks
Airflow DAGs

Building Data Engineering Pipelines in Python

Scalable computing with Python
Cloud Environments
Coding Environments
Data ingestion with Pandas
Data pipelines
ETL process

Data Modelling Tools

Data Modelling with PostgreSQL
Data Modelling with Apache Cassandra
Storing Data in cloud warehouses
Building Data Lakes

Prototyping Data Models

Linear Regression
Logistic Regression
Keras Regression
Automated Feature Engineering

Data Models as Web Endpoints

Web service
Echo service
Model persistence
Model Endpoints
Deploying Endpoints with Gunicorn and Heroku

Pyspark for Batch pipelines

Introduction to Pyspark
Resilient distributed data sets
Machine learning with Pyspark
Structured data processing with SparkSQL and Python

Cloud Computing

Fundamentals of Google Cloud Platform for Machine learning
Cloud Data warehouses with Google cloud platform
Overview of Apache Kafka
Sklearn Streaming
Streaming analytics systems

Batch Model pipelines in cloud

Introduction to streaming model workflows
