NDS Data Practitioner Degree Curriculum

ABOUT
The Data Practitioner degree is an online, immersive and practical
course that equips students with a holistic understanding of data, spanning
data engineering, data science, machine learning, generative AI, SQL,
traditional databases, and proficiency in programming languages such as
Python and R. In an era where data drives decisions across industries, this
course offers a unique opportunity to master these critical skills while learning
from industry professionals and real-world case studies.
Starting with the essentials of data engineering, students will gain
hands-on experience in collecting, cleaning, and transforming data for
real-world applications. Our curriculum emphasizes the significance of traditional
databases and SQL, offering a comprehensive view of data management. As
you progress, you’ll delve into data science, uncovering methodologies to
extract insights and effectively communicate your findings. The course further
extends into machine learning, with detailed explanations of training and
deploying models. A highlight of the program is our exploration of generative
AI, enabling you to create innovative solutions.
What sets this course apart is our commitment to real-world
relevance. Taught by industry professionals, you’ll engage in hands-on labs
and examinations based on actual scenarios encountered in data-driven
roles. By the end of the Comprehensive Data Practitioner degree, you will
emerge as a well-rounded data expert, ready to navigate the complexities of
today’s data landscape, apply your knowledge to real-world challenges, and
drive meaningful change in your career.
DATA PRACTITIONER DEGREE
CURRICULUM CONTENT
1. SQL and Relational Databases
MySQL, Postgres, SQL Server and Oracle
Database History and timeline
Introduction to ANSI SQL
SQL: Select Statements
WHERE clauses and operators
ORDER BY clause
Common SELECT statement errors and tips
DDL and DML
DDL: Create table, Alter table, Truncate and Drop statements
DML: Insert, Update and Delete statements
Keys: Primary and Foreign
Relational data model
Data types
JOINS
Subqueries
Normalization
DISTINCT vs DISTINCT ON
String functions for data transformation
Date functions
GROUP BY
Single Row functions
Multirow functions
Masking functions
Numeric functions and arithmetic
Advanced data types:
Transactions
Views
2. Data model
Data modeling introduction
XML data format
JSON data format
Entity relationship Model
Entity relationship Diagram
Slowly changing dimensions
Fact dimension model
Snowflake model
Data Vaults
Event driven architecture
3. Git for source control
Intro to Git and GitHub.com
Installing Git tools
Creating and instantiating a Git repository
Branches
Tracking and committing changes
Cloning a repository
Resolving conflicts manually
4. Analytical SQL
OLAP vs OLTP
Windowing functions
Create series
Materialized views
Common table expressions (CTE)
Introduction to GeoSpatial SQL
Pivoting results
5. Database administration
●Connecting to a database securely:
○ MySQL
○ Postgres
○ SQL Server
○ Oracle
●User creation
●User permissions
●Load data
●Replication
●Backup creation and monitoring
●Dumping data out of database
●Cloud databases:
○ Amazon RDS
○ Google Cloud SQL
○ Azure SQL instance
●Indexes:
○ Unique
○ Partial and multi-column
○ Postgres Indexes
○ MySQL Indexes
○ Oracle Indexes
○ SQL Server Indexes
6. Data Lakes and data warehouse
Introduction to data lake
Lake functionality and structure
Data warehousing principles
Introduction to data lake houses

7. Analytical Databases
Introduction to Columnar database systems
Column Store databases:
Redshift
BigQuery
Snowflake
Azure data warehouse
Databricks
8. Data Streaming
Introduction to data streams
Data streaming vs Batch
Kafka
Hosted and managed Kafka
9. Data pipelines
Introduction to ETL and data pipelines
ELT vs ETL
Batch vs Streaming
ETL with SQL
Athena and S3 as data lake and data warehouse
Database to Data lake:
AWS DMS
GCP Dataflow
Semi-structured/Unstructured to structured
Orchestration
Airflow
Argoflow
ETL works
Fivetran
Debezium
Continuous ingestion
Lift and move
10. Data migration
Introduction to data migration
Alembic
Flyway
schemachange (Snowflake)
11. NoSQL
Introduction to NoSQL: what it is and when to use it
NoSQL database types
MongoDB
12. PYTHON
Introduction to Python for data analytics
Python vs Java vs C#/C++
Virtual Environment
Python development environment
Dataframes
Performance considerations
Connecting to database systems
Data ingestion
Data export and conversion
SQL queries on Dataframes
NumPy
Web scraping
Python ETL:
PySpark
MLlib
Data plotting
Custom classes and objects
Jupyter Notebook
Python for ML and AI
13. R
Introduction to R and RStudio
R Syntax
Matrix, list, vectors and other data types
Control structures
Functions in R
Ingesting Data
Data cleaning
Data Transformation
Data manipulation and transformation
Data visualization
Probabilistic and Statistical Analysis
Hypothesis testing
ML:
Web Scraping
Text mining

14. MACHINE LEARNING
Introduction to ML models and algorithms
K-cluster
Linear regression
Logistic regression
Decision trees
SVM
Naive Bayes
KNN
K-means
Random forest
Dimensionality reduction
Gradient boosting
Image classification
Speech recognition
Sentiment analysis
15. Generative AI and ChatGPT
Introduction to Generative AI
Use cases
Attention and Recurrent Neural Networks
Encoder and decoder architecture
Tokenization
LLM:
ChatGPT
ChatGPT4 and Mixture of Model Architecture
Prompt Engineering
Prompt Tuning
Chain of thought Prompting
Multimodal AI
Program Aided Model
Responding-Action Framework
DALL-E
LLaMA and OpenLLaMA
Generative AI project life cycle
Challenges tuning models
Efficient model tuning
LLM performance metrics
PEFT Fine tuning
Human values alignment
Reinforcement learning:
Catastrophic forgetting
Post training optimization
LLM Pruning
LLM Distillation
16. Vector Databases
●Introduction to Vector databases
●Querying a Vector Database
●Pinecone walkthrough
17. Data visualization and Business Intelligence
Business intelligence introduction
Data visualization principles
Caching and performance
Visualization with Metabase
Visualization with Looker
Visualization with Data Studio
Visualization with PowerBI
Visualization with Tableau
18. Tools walkthrough
DataGrip
Fivetran
Domo
AutoGPT
19. Regulatory compliance

GDPR
CCPA
HIPAA
Data residency requirements
Privacy-preserving analytics and differential privacy

Copyright © 2023 by NERIO DATA SCHOOL
All rights reserved.
