
ABOUT

The Data practitioner degree is an online, immersive and practical
course that equips students with a holistic understanding of data, spanning
data engineering, data science, machine learning, generative AI, SQL,
traditional databases, and proficiency in programming languages such as
Python and R. In an era where data drives decisions across industries, this
course offers a unique opportunity to master these critical skills while learning
from industry professionals and real-world case studies.

Starting with the essentials of data engineering, students will gain
hands-on experience in collecting, cleaning, and transforming data for
real-world applications. Our curriculum emphasizes the significance of traditional
databases and SQL, offering a comprehensive view of data management. As
you progress, you’ll delve into data science, uncovering methodologies to
extract insights and effectively communicate your findings. The course further
extends into machine learning, providing detailed explanations of training and
deploying models. A highlight of the program is our exploration of generative
AI, enabling you to create innovative solutions.

What sets this course apart is our commitment to real-world
relevance. Taught by industry professionals, you’ll engage in hands-on labs
and examinations based on actual scenarios encountered in data-driven
roles. By the end of the Comprehensive Data Practitioner degree, you will
emerge as a well-rounded data expert, ready to navigate the complexities of
today’s data landscape, apply your knowledge to real-world challenges, and
drive meaningful change in your career.
DATA PRACTITIONER DEGREE
CURRICULUM CONTENT
1. SQL and Relational Databases:
MySQL, Postgres, SQL Server and Oracle
‹ Database History and timeline

‹ Introduction to ANSI SQL


‹ SQL: Select Statements

‹ WHERE clauses and operators

‹ ORDER BY clause

‹ Common SELECT statement errors and tips

‹ DDL and DML

‹ DML: Insert, Update and Delete statements

‹ DDL: Create table, Alter table, Drop and Truncate statements

‹ Keys: Primary and Foreign

‹ Relational data model

‹ Data types

‹ JOINS

‹ Subqueries

‹ Normalization:

‹ Distinct vs Distinct ON

‹ String functions for data transformation

‹ Date functions

‹ GROUP BY

‹ Single Row functions

‹ Multirow functions

‹ Masking functions

‹ Numeric functions and arithmetic

‹ Advanced data types:

‹ Transactions

‹ Views
2. Data model
‹ Data modeling introduction
‹ XML data format
‹ JSON data format
‹ Entity relationship Model
‹ Entity relationship Diagram
‹ Slowly changing dimensions
‹ Fact dimension model
‹ Snowflake model
‹ Data Vaults
‹ Event driven architecture

3. GIT for source control
‹ Intro to GIT and GitHub.com
‹ Installing GIT tools
‹ Creating and instantiating a git repository
‹ Branches
‹ Tracking and committing changes
‹ Cloning a repository
‹ Resolving conflicts manually

4. Analytical SQL
‹ OLAP vs OLTP
‹ Windowing functions
‹ Create series
‹ Materialized views
‹ Common table expressions (CTE)
‹ Introduction to GeoSpatial SQL
‹ Pivoting results

5. Database administration
‹ Connecting to a database securely:
○ MySQL
○ Postgres
○ SQL Server
○ Oracle
‹ User creation
‹ User permissions
‹ Load data
‹ Replication
‹ Backup creation and monitoring
‹ Dumping data out of database
‹ Cloud databases:
○ Amazon RDS
○ Google Cloud SQL
○ Azure SQL instance
‹ Indexes:
○ Unique
○ Partial and multi-column
○ Postgres Indexes
○ MySQL Indexes
○ Oracle Indexes
○ SQL Server Indexes

6. Data Lakes and data warehouse
‹ Introduction to data lake
‹ Lake functionality and structure
‹ Data warehousing principles
‹ Introduction to data lake houses


7. Analytical Databases
‹ Introduction to Columnar database systems

‹ Column Store databases:

○ Redshift
○ BigQuery
○ Snowflake
○ Azure data warehouse
○ Databricks

8. Data Streaming
‹ Introduction to data streams

‹ Data streaming vs Batch

‹ Kafka

‹ Hosted and managed Kafka

9. Data pipelines
‹ Introduction to ETL and data pipelines
‹ ELT vs ETL
‹ Batch vs Streaming
‹ ETL with SQL
‹ Athena and S3 as data lake and data warehouse
‹ Database to Data lake:
○ AWS DMS
○ GCP Dataflow
○ Semi-structured/Unstructured to structured
‹ Orchestration:
○ Airflow
○ Argoflow
‹ ETL works
‹ Fivetran
‹ Debezium
‹ Continuous ingestion
‹ Lift and move
10. Data migration
‹ Introduction to data migration
‹ Alembic
‹ Flyway
‹ Schemachange (Snowflake)

11. NoSQL
‹ Introduction to NoSQL, what is it and when to use it
‹ NoSQL database types
‹ MongoDB

12. PYTHON
‹ Introduction to Python for data analytics
‹ Python vs Java vs C#/C++
‹ Virtual Environment
‹ Python development environment
‹ Dataframes
‹ Performance considerations
‹ Connecting to database systems
‹ Data ingestion
‹ Data export and conversion
‹ SQL queries on Dataframes
‹ NumPy
‹ Web scraping
‹ Python ETL:
‹ PySpark
‹ MLlib
‹ Data plotting
‹ Custom classes and objects
‹ Jupyter Notebook
‹ Python for ML and AI

13. R
‹ Introduction to R and RStudio
‹ R Syntax
‹ Matrix, list, vectors and other data types
‹ Control structures
‹ Functions in R
‹ Ingesting Data
‹ Data cleaning
‹ Data Transformation
‹ Data manipulation and transformation
‹ Data visualization
‹ Probabilistic and Statistical Analysis
‹ Hypothesis testing
‹ ML:
‹ Web Scraping
‹ Text mining


14. MACHINE LEARNING
‹ Introduction to ML models and algorithms
‹ K-cluster
‹ Linear regression
‹ Logistic regression
‹ Decision trees
‹ SVM
‹ Naive Bayes
‹ KNN
‹ K-means
‹ Random forest
‹ Dimensionality reduction
‹ Gradient boosting
‹ Image classification
‹ Speech recognition
‹ Sentiment analysis

15. Generative AI and ChatGPT
‹ Introduction to Generative AI
‹ Use cases
‹ Attention and Recurrent Neural Networks
‹ Encoder and decoder architecture
‹ Tokenization
‹ LLM:
○ ChatGPT
○ ChatGPT4 and Mixture of Experts architecture
‹ Prompt Engineering
‹ Prompt Tuning
‹ Chain of thought Prompting
‹ Multimodal AI
‹ Program-aided language models (PAL)
‹ ReAct (reasoning and acting) framework
‹ Dall-E
‹ Llama and Open Llama
‹ Generative AI project life cycle
‹ Challenges tuning models
‹ Efficient model tuning
‹ LLM performance metrics
‹ PEFT Fine tuning
‹ Human values alignment
‹ Reinforcement learning:
‹ Catastrophic forgetting
‹ Post training optimization
‹ LLM Pruning
‹ LLM Distillation

16. Vector Databases
‹ Introduction to Vector databases

‹ Querying a Vector Database

‹ Pinecone walkthrough

17. Data visualization and Business Intelligence
‹ Business intelligence introduction

‹ Data visualization principles

‹ Caching and performance

‹ Visualization with Metabase

‹ Visualization with Looker

‹ Visualization with Data Studio

‹ Visualization with PowerBI

‹ Visualization with Tableau

18. Tools walkthrough
‹ DataGrip

‹ Fivetran

‹ Domo

‹ AutoGPT

19. Regulatory compliance


‹ GDPR

‹ CCPA

‹ HIPAA

‹ Data residency requirements

‹ Privacy preserving Analytics and differential privacy


Copyright © 2023 by NERIO DATA SCHOOL
All rights reserved.
