MAHE Manipal Academy of Higher Education Data Science and AI Syllabus

MANIPAL ACADEMY OF HIGHER EDUCATION, MANIPAL (MAHE)
Centre for Executive Education (CEE)

(Applicable to the candidates admitted to the M Tech programs from July 2019)
Master of Technology in Data Science and Artificial Intelligence
Syllabus
SL No Code Term 1 (5 Months) L T P C

1 DDS 501 Programming for Data Science 2 0 3 3
2 DDS 503 Statistical Techniques for Data Science 2 1 0 3
3 DDS 505 Data Scraping and Wrangling 2 0 3 3
4 DDS 507 Data Analysis and Visualization 2 1 3 4
5 DDS 509 Big Data Technologies 2 0 3 3
6 DDS 511 Machine Learning 2 1 3 4
SL No Code Term 2 (6 Months) L T P C
7 DDS 502 Artificial Intelligence 2 0 3 3
8 DDS 504 Elective 1 2 1 3 4
9 DDS 506 Elective 2 2 0 3 3
10 DDS 510 Mini Project - - - 10
Total credits 40
L – Lecture; T – Tutorial; P – Practical; C – Credits

Faculty will decide on the case studies, assignments and learning activities for each subject.
Electives
SL No Code Elective 1 L T P C
1 DDS 504.1 Financial Services Analytics 2 1 3 4
2 DDS 504.2 Marketing Analytics 2 1 3 4
SL No Code Elective 2 L T P C
1 DDS 506.1 Unstructured Data Analysis 2 0 3 3
2 DDS 506.2 Robotic Process Automation 2 0 3 3
1 MDA 509.1 H R Analytics 2 1 3 4
2 MDA 509.1 Supply Chain Analytics 2 1 3 4
Term 1
DDS 501 Programming for Data Science
Unit – 1 Starting Python & Basics of Python Language (4 hours)

What is Python? Why Python for Data Science?
Programming Model of Python. Python Installation, Simple Input/Output, Working with Numbers,
Basic Data Types, Variables, Control Structures: if Condition, while Loop, for Loop,
break and continue, Arithmetic & Logical Operators.
Unit – 2 Python Core Data Structures (4 hours)

Strings, Lists, Tuples, Dictionaries, Sets, List Comprehensions, Regular expressions. Lambda
Functions – Map, Filter, Reduce
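A minimal sketch of the list comprehension and lambda-based tools listed in this unit. The marks list and the pass threshold of 50 are hypothetical values chosen only for illustration.

```python
from functools import reduce

marks = [72, 45, 88, 39, 91]  # hypothetical sample data

# List comprehension: keep marks of 50 or more
passed = [m for m in marks if m >= 50]

# map with a lambda: express each mark as a fraction of 100
fractions = list(map(lambda m: m / 100, marks))

# filter with a lambda: the same selection as the comprehension above
passed_again = list(filter(lambda m: m >= 50, marks))

# reduce with a lambda: fold the list into a single total
total = reduce(lambda acc, m: acc + m, marks, 0)

print(passed, total)
```

The comprehension and the filter call select the same elements; the comprehension is usually the more readable choice in Python.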
Unit – 3 Functions, Modules and Object Oriented Programming (4 hours)

Introduction to Functions, Function Syntax, Introduction to Modules, Create Modules, Importing
Modules, Introduction to Object oriented concepts
Unit – 4 Exception Handling, File Input/Output (2 hours)

Exception handling, File operations: The open Function, Reading and Writing to Files
Unit – 5 NumPy (8 hours)

NumPy array creation, NumPy datatypes, NumPy indexing, slicing, Basic Reduction, statistical &
Logical operations, Array shape manipulation, Array sorting, copies and views
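A short sketch of the difference between copies and views mentioned in this unit, assuming NumPy is installed. The array contents are arbitrary illustration values.

```python
import numpy as np

a = np.arange(6)            # array of the integers 0..5
view = a[1:4]               # basic slicing returns a view onto a's data
copy = a[1:4].copy()        # .copy() allocates independent storage

view[0] = 100               # writes through to a[1]
copy[1] = -1                # leaves a unchanged

reshaped = a.reshape(2, 3)  # reshaping a contiguous array also gives a view
total = a.sum()             # basic reduction over all elements

print(a, np.shares_memory(a, view), np.shares_memory(a, copy))
```

Modifying a view changes the original array, which is a common source of bugs when a slice is passed to a function that updates it in place.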
Unit – 6 Introduction to Pandas, Operations in Pandas - Part I (4 hours)

Series creation, Operations on Series, DataFrame creation, Operations on DataFrames. Basic
Indexing using .loc, .iloc, .ix, Multi-Indexing, Boolean Indexing.
Unit – 7 Operations in Pandas - Part II (4 hours)

Grouping of data, Merging and joining data, pivots and reshaping data
References:
1) Python for Data Analysis by Wes McKinney, O’Reilly Publication
2) Python Data Science Handbook by Jake VanderPlas, O’Reilly Publication
3) Python Cookbook by Alex Martelli, Anna Martelli Ravenscroft, and David Ascher, O’Reilly
Publication
DDS 503 Statistical Techniques for Data Science
Unit 1: Introduction to Statistics and Linear Algebra (6 Hours)

• Collection, Categorization, and Presentation of Data
• Measures of Central Tendency such as Mean, median, mode
• Measures of Dispersion such as Range, variance, standard deviation
• Scalars, Vectors, Matrices and Operations on Matrices & Vectors - Addition,
Multiplication, Transpose and Inverse
Unit 2: Probability (5 Hours)
• Introduction to Probability
• Probability Distributions such as Normal, and Binomial
• Conditional Probability and Bayes Theorem
• Central Limit Theorem
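A worked sketch of Bayes' theorem for a diagnostic-test setting. The prior, sensitivity and false positive rate are hypothetical numbers, and the function name posterior is introduced here only for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of total probability: overall chance of a positive result
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(C | +) = P(+ | C) * P(C) / P(+)
    return sensitivity * prior / p_positive


# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false positives
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with an accurate test, the posterior stays well below the sensitivity because the condition is rare, which is the usual motivation for teaching conditional probability with this example.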
Unit 3: Sampling (3 Hours)

• Introduction to Sampling
• Common Sampling Techniques such as Random Sampling
• Estimation - Sample size and standard error
• Point Estimates, Interval Estimates, Confidence Intervals
• SMOTE – Under-sampling, Over-sampling
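A small sketch of a point estimate, its standard error and an approximate 95% confidence interval, using only the Python standard library. The sample values are hypothetical, and the normal critical value 1.96 is an assumption made for brevity; with a sample this small a t critical value would normally be used.

```python
import math
import statistics

sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]  # hypothetical measurements

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
sd = statistics.stdev(sample)    # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean

z = 1.96                         # normal critical value for roughly 95% coverage
ci = (mean - z * se, mean + z * se)

print(mean, ci)
```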
Unit 4: Testing of Hypothesis (5 hours)

• Introduction to Hypothesis Testing
• Parametric tests such as t-test, z-test, and ANOVA
• Non-parametric tests such as chi-square, Wilcoxon tests, Kolmogorov-Smirnov (KS)
test, Kruskal-Wallis test
Unit 5: Data Relationship through Correlation (2 Hours)

• Correlation, Pearson, Kendall, Spearman
• Correlation coefficient
• Correlation cautions due to outliers and causation
Unit 6: Regression (9 Hours)

• Introduction to Regression – Simple & Multiple Regression
• Estimation, Goodness of fit measures, Diagnostics
• Binary Logistic Regression
• Model validation
• Applications
Hands-on assignments to be conducted using Excel / SciPy
References:
1. Richard I. Levin, David S. Rubin, Statistics for Management, Pearson Education
2. Statistics for Management and Economics by Gerald Keller, Cengage Learning
3. Sampling Techniques by William G Cochran, Wiley and Sons
4. Bayesian Methods for Management and Business by Eugene D. Hahn, Wiley
5. Guide to Programming and Algorithms Using R, Ozgur Ergul, Springer
6. Python Data Science Essentials – PACKT Publication
7. Learning Python, Mark Lutz and David Ascher
8. Introduction to Statistical Learning with Applications in R, Springer
DDS 505 Data Scraping and Wrangling
Unit 1: Data Scraping (16 Hours)

• Introduction to Data Scraping and Wrangling
• Types of Data
• Finding data across sources
• Manual Scraping
• Scraping tables such as Wikipedia
• API-based scraping: Querying Twitter API using tweepy
• Browser-based scraping: Automating browsers, Identifying selectors
• Server-side scraping: scraping tables such as IMDb
• Scraping across pages
• Tools: Python, Selenium
Unit 2: Data Wrangling (4 Hours)

• Data quality detection
• Types of data quality issues
Unit 3: Introduction to Database Management Systems (8 Hours)

• Introduction to database, RDBMS
• RDBMS terminologies
• Concept of keys in RDBMS
• Conceptual Database Design Entity – Relationship Model: Relationship: Degree of
relationship, Cardinality, Participation, Key features of E-R Model
• Relational databases and SQL (Structured Query Language):
• Basic SQL queries, Integrity constraints on tables
• SQL querying to do operations such as identifying nulls, special characters, blank
rows/columns, get unique counts
• SQL Joins, Aggregate functions and GROUP BY, and sub queries.
• GROUP BY clause along with basic aggregations such as SUM, COUNT, AVG
• RANK(), ROWNUM & DENSE_RANK().
• UNION and UNION ALL and CASE statement
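A minimal sketch of GROUP BY aggregation, a CASE expression and a null check, run against an in-memory SQLite database through Python's built-in sqlite3 module. The orders table, its columns and the 200 threshold are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
  ('asha', 120.0), ('asha', 80.0), ('ravi', 50.0), ('ravi', NULL), ('meera', 300.0);
""")

# GROUP BY with COUNT and SUM; CASE labels each customer by total spend
rows = conn.execute("""
SELECT customer,
       COUNT(*)    AS n_orders,
       SUM(amount) AS total,
       CASE WHEN SUM(amount) >= 200 THEN 'high' ELSE 'low' END AS tier
FROM orders
GROUP BY customer
ORDER BY customer
""").fetchall()

# Data-quality check: count rows where amount is missing
null_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]

print(rows, null_count)
```

Note that SUM skips NULL values while COUNT(*) counts every row, so the two aggregates can disagree about how many orders contributed to a total.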
References:
1) Web Scraping with Python – Ryan Mitchell, O’Reilly Publishers
2) Data Wrangling with Python – Jacqueline Kazil and Katharine Jarmul, O’Reilly Publishers
3) Automated Data Collection with R, Simon Munzert et al, John Wiley & Sons
4) Database System Concepts, Abraham Silberschatz, Henry F. Korth, and S. Sudarshan
5) Fundamentals of Databases – Elmasri and Navathe.
DDS 507 Data Analysis and Visualization
Unit 1: Introduction to Data Science (2 hours):

• Key components in Data Science
• Use cases from different Domains such as Banking, Retail, Telecom or Healthcare
• Data Science life cycle
• The roles in a Data Science stream
• Challenges involved in Data Science work
• Ethics in Data Science
Unit 2: Data Analysis and Story Telling (6 Hours):

• Characteristics of Data – data at rest, data in motion, data of many types (structured,
unstructured, semi-structured)
• Types of Data Analysis – Descriptive, Exploratory, Predictive, Inferential
• Steps in Data Analysis
• Representing and visualizing multiple variables
• Types of plot - such as Scatter plot, Pie chart, Histogram, Boxplot
• Which graphs are most suitable?
Unit 3: Visualization and Communication using Descriptive Analysis (6 Hours):

• Data and Datasets
• Quantitative variables and Qualitative variables
• Data Analysis and using charts, basic and advanced formatting
• Connecting to data and Using Extracts
• Joining and Blending Data
• Building Views
• Visual Analytics; Formatting and dynamic data manipulations
• Data visualization of Numeric data versus Non-numeric data
• Design strategies for information visualization such as Tufte's design principles
Unit 4: Data cleansing and transformation, Building Dashboards (8 Hours):

• Why is data cleansing important?
• Treating missing values
• Treating outliers and errors
• Data virtualization – a unified view of business entities from multiple sources of data
• Building dashboards and automation
• Adding advanced interactive visualization capabilities – Slicers, Hierarchies, Pivot charts
Unit 5: Dimension Reduction and Visualisation (8 Hours):

• Challenges of high dimensionality
• Principal Component Analysis, Factor Analysis
• Building views, dashboards
• Matplotlib – Plotting tool in Python, Seaborn
• A visualization tool like Tableau or Power BI will be used.
References
1. An Introduction to Data Science, Jeffrey Stanton, Syracuse University
2. Exploratory Data Analysis with R, Roger Peng
3. Analysing Multivariate Data, James M Lattin; J Douglas Carroll; Paul E Green
4. Hands-On Data Science and Python Machine Learning, Frank Kane, Packt
DDS 509 Big Data Technologies
Unit 1: Motivation for Big Data (8 Hours)

• What Is Big Data?
• Big Data Programming Models: Massively Parallel Processing (MPP) Database Systems,
In-Memory Database Systems, MapReduce Systems, Bulk Synchronous Parallel (BSP) Systems
• Big Data and Transactional Systems, How Much Can We Scale?
• Big Data Technology options – Hadoop, NoSQL, SAP Hana
• Use-Cases for Big Data, Hadoop Concepts
Unit 2: Hadoop Components (8 Hours)

• Hadoop Framework and Architecture
• Overview of all Hadoop Ecosystem components
• Hadoop Distributed File System (HDFS) Architecture
• HDFS Commands
• MapReduce Architecture
• Word Count Implementation using map reduce
• Hadoop 2.0 - YARN, HDFS High Availability
• Precedence of Hadoop Configuration Files
• Preparing the Development Environment
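A single-process sketch of the word count pattern named in this unit, written in plain Python to show the map, shuffle and reduce phases. It does not use Hadoop; the function names mapper, shuffle and reducer and the input lines are hypothetical, and on a real cluster the framework performs the shuffle across machines.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


lines = ["big data big ideas", "data moves fast"]  # hypothetical input
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(word, values) for word, values in shuffle(mapped).items())

print(counts)
```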
Unit 3: Understanding HBase (2 Hours)

• HBase, Architecture and role of HBase
• HBase schema design, Basic programming for HBase
• Combining the capabilities of HBase and HDFS
• HBase Case Study
Unit 4: Analyzing Data with Hive and Pig (10 Hours)

• Hive Architecture and Concepts
• Data Definition Language
• Data Manipulation Language
• External Interfaces, Hive Scripts
• Performance, MapReduce Integration
• Creating Partitions
• Hive Case Study
• Scripting with Apache Pig and its use cases
Unit 5: Sqoop, Flume and Kafka (8 Hours)

Sqoop:
• Objectives of Sqoop
• Preview of MySQL
• Sqoop eval command
• Sqoop simple import
• Incremental Imports
• Sqoop Export
Flume:
• Data Ingestion - From non-traditional data sources
• Flume Agent and configuration files
• Ingest data using Flume - exec source and HDFS sink
Kafka:
• Kafka Architecture
• Producers and consumers
Unit 6: Understanding Spark (4 Hours)

• Spark Architecture
• Spark capabilities such as distributed datasets (RDD)
• In-memory caching
• Interactive shell
Unit 7: Spark programming (10 hours)

• Programming for simple batch jobs in PySpark
• Stream processing and machine learning using built-in libraries
• Introduction to Spark SQL
• Spark streaming
References:
1. Pro Apache Hadoop, 2nd Edition, Jason Venner, Sameer Wadkar, and Madhu Siddalingaiah
2. Big Data Analytics with R and Hadoop, Vignesh Prajapati
DDS 511 Machine Learning
Unit 1: Introduction to Machine Learning and Data Science (2 Hours)

• Introduction to Machine Learning
• Broad classification – Supervised vs Un-supervised Learning
• Overview of Regression
• Use cases of Machine Learning
Unit 2: Classification (6 Hours)

• Classification using Decision Trees, Random Forests
• Classification using Nearest Neighbours
• Classification using Naïve Bayes
• Goodness measures such as confusion matrix

• Ensemble Techniques (Bagging, Boosting, Extreme Boosting)
• Applications discussed with case studies
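A brief sketch of a binary confusion matrix and the goodness measures derived from it, using only the standard library. The actual and predicted label lists are hypothetical.

```python
from collections import Counter

actual = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical classifier output

# Count each (actual, predicted) combination
cells = Counter(zip(actual, predicted))
tp, fn = cells[(1, 1)], cells[(1, 0)]  # positives: caught vs missed
fp, tn = cells[(0, 1)], cells[(0, 0)]  # negatives: false alarms vs correct

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)             # share of predicted positives that are right
recall = tp / (tp + fn)                # share of actual positives that are found

print(tp, fn, fp, tn, accuracy, precision, recall)
```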
Unit 3: Validation Measures (4 Hours)

• ROC curves- comparison of distribution function/business measures
• Divergence
• Kolmogorov-Smirnov - difference in distribution function
• Gini coefficient/D concordance statistic etc.
Unit 4 Clustering (4 Hours)

• Introduction to Clustering
• K-means, Hierarchical Clustering
• Practical Issues in clustering
• Validation
Unit 5 Recommendation systems (4 Hours):

• Collaborative filtering such as User-based, Item-based, Matrix Factorization
• Evaluation of Recommenders such as Cumulative gain and discounted cumulative gain
• Applications of recommendation systems
• Association Rules such as Apriori Algorithm
Unit 6: Customer Analytics (4 Hours)

• Customer Life Cycle
• Segmentation
• Scoring
• Use cases
Unit 7: Time Series (6 hours)

• Introduction to time series with examples
• Trends and Cyclic Variations, normalization of data
• Stationary processes and ARIMA models
• Exponential Smoothing, ARCH/GARCH
• Forecasting Time series data
• Use cases
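A minimal sketch of simple exponential smoothing, one of the forecasting techniques in this unit. The sales series and the smoothing factor alpha = 0.5 are hypothetical, and initialising with the first observation is one common convention among several.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


sales = [10, 12, 14, 13]          # hypothetical observations
smoothed = exponential_smoothing(sales, alpha=0.5)
forecast = smoothed[-1]           # one-step-ahead forecast is the last level

print(smoothed, forecast)
```

A larger alpha tracks recent observations more closely, while a smaller alpha produces a smoother series that reacts more slowly.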
References:
1. Machine Learning with R Edition 2, Brett Lantz
2. Data Mining and Business Analytics with R
4. Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner
6. Hands-On Data Science and Python Machine Learning - PACKT Publication
7. The Analysis of Time Series – an Introduction by Chris Chatfield, Chapman & Hall/CRC
8. Time Series Analysis: Forecasting and Control by Box and Jenkins
Term 2
DDS 502 Artificial Intelligence
Unit 1: Introduction to AI (3 Hours)

• Recent Developments using AI: Sophia, AlphaGo and the rebirth of AI;
• Sneak peek into the future;
• Current trends in AI
Unit 2: Applications of AI (3 Hours)

• Enterprise Applications of AI-Industries,
• Consumer Applications - Gaming,
• Home Automation with AI
• Understanding Artificial intelligence,
• Machine learning and Deep learning.
• Use case driven comparison of AI, ML and DL
Unit 3: Programming with TensorFlow (12 Hours)

• Understanding what TensorFlow is
• Need for TensorFlow along with use cases
• Describe TensorFlow and its basic features
• Creating a computation graph
• Variables, Constants and Placeholders
• Evaluate how to create computation graphs in TensorFlow
• List various programming elements and relate their significance in TensorFlow
• Using TensorFlow on machine learning algorithms: Regression and Classification with TF
Unit 4: High-level TensorFlow APIs (3 Hours)

• Understanding the high-level TensorFlow APIs of Estimators,
• tf.layers and Keras APIs - Overview
• Describe the role and types of high-level TensorFlow APIs - Overview
• Compare different types of high-level TensorFlow APIs - Overview
Unit 5: TensorBoard (3 Hours)

• Understanding TensorBoard
• Need for TensorBoard
• Elaborate what TensorBoard is and how it helps evaluate TensorFlow runtime data
Unit 6: Deep Learning (DL) and Reinforcement learning (RL) (6 Hours)

• Introduction and Applications of Deep learning
• Image recognition Using Deep Learning
• Introduction to Reinforcement learning - Overview
• Types of Reinforcement learning: Value-based (Q-learning) and policy-based methods -
Overview
References:
1. Artificial Intelligence, A Modern Approach: Stuart J. Russell and Peter Norvig
2. Neural Networks – Satish Kumar
3. Neural Networks and Learning Machines – Simon Haykin
4. Specific papers for deep learning algorithms.
DDS 504.1 Financial Services Analytics
Unit 1: Understanding Financial Services: (6 hours)

• Time Preference Rate and Required Rate of Return
• Present Value and Future Value of Money
• Annuity and Growing Annuity
• Perpetuity
• Applications of Time value of money
• Introduction to Financial Statements (balance sheet, Income Statement, Cash flow
Statements)
• Risk Analysis in Capital Budgeting
• Understanding banking assets and liability products
• Basel Norms
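A small sketch of the time value of money formulas covered in this unit: future value, present value, an ordinary annuity and a perpetuity. The function names, the 10% and 5% rates and the cash flows are hypothetical illustration values, and payments are assumed to fall at the end of each period.

```python
def future_value(pv, rate, periods):
    """FV = PV * (1 + r)^n"""
    return pv * (1 + rate) ** periods


def present_value(fv, rate, periods):
    """PV = FV / (1 + r)^n"""
    return fv / (1 + rate) ** periods


def annuity_pv(payment, rate, periods):
    """PV of an ordinary annuity: C * (1 - (1 + r)^-n) / r"""
    return payment * (1 - (1 + rate) ** -periods) / rate


def perpetuity_pv(payment, rate):
    """PV of a level perpetuity: C / r"""
    return payment / rate


print(future_value(1000, 0.10, 2))   # 1000 invested for 2 periods at 10%
print(annuity_pv(100, 0.10, 2))      # two end-of-period payments of 100
print(perpetuity_pv(100, 0.05))      # 100 per period forever at 5%
```

The annuity formula is the discounted sum of each payment, so for two periods it equals 100/1.1 + 100/1.21.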
Unit 2: Predictive Analytics (6 hours)

• Valuation of Bonds
• Valuation of Shares
• Understanding and analyzing High Frequency Data
• Time Series Analysis on Stock prediction
• Types and sources of risk
Unit 3: Credit Risk Analytics (6 hours)

• Understanding Credit and its functions
• Understanding Scoring models
• Value-at-Risk (VaR)
• Probability of Default Model
• Loss Given Default Model
• Exposure at Default models
Unit 4: Customer Relationship Management (CRM) Analytics (6 hours)

• Customer Acquisition Modelling
• Collection and recovery analytics
• Propensity Model
• Cross selling and Up selling analytics
Unit 5: Operational Risk Analytics (6 Hours)

• Internal and External Fraud Analysis
• Regulatory risk analytics
• Cash flow prediction
• Optimizing Cross channel effectiveness
Case-studies will be used to demonstrate the above areas.
References:
1. Essentials of Business Analytics by Jeffrey D Camm (Author)
2. Damodaran, A., Corporate Finance: Theory and Practice, John Wiley & Sons.
6. Chandra, P., Financial Management, Tata McGraw Hill.
3. Srivastava, Rajiv and Misra. Anil, Financial Management, Oxford University Press.
4. Ross S.A., R.W. Westerfield and J. Jaffe, Corporate Finance, McGraw Hill.
5. Python for Finance, Yves Hilpisch, O’Reilly Publication
DDS 504.2 Marketing Analytics
Unit 1 Introduction to Marketing (4 Hours)

• Marketing overview
• Different Marketing Items
• Different Types of Markets
• Marketing Mix
• 4 P’s of Marketing
• Developing Marketing Strategies
• Competitive Analysis: Porter’s Five Forces Model, PESTEL Analysis
• Product Strategy
• Pricing Strategy
• Place (Distribution Strategy)
• Promotion Strategy
Unit 2 Introduction to Marketing Analytics (4 Hours)
• An overview of Marketing Data and its characteristics
• Preview of different marketing channels data (brief about multiple sources, e.g.
online, social, etc.)
• Integration for modelling purposes.
• Propensity Models
Unit 3 Price & Promotion Analytics (4 Hours)
• Modelling Price Elasticities: Understanding price-demand curves; Identifying and Optimizing
Price Points
• Modelling Promotional Effectiveness: Measuring Effectiveness of Promotional Measures
such as: i. Discount, ii. Feature, iii. Display;
Unit 4 Forecasting (6 Hours)
• Models for forecasting sales/demand: Understanding and capturing trends, seasonality and
cyclic variations. Smoothing Models, ARIMA Models
Unit 5 Conjoint and Scaling/Mapping Analytics (4 Hours)
• Conjoint analysis for market entry, and new product development.
• Positioning Brands/Companies using Multi-Dimensional Scaling, and Perceptual Mappings.
Unit 6 Campaign Analytics (4 Hours)
• Measuring Effectiveness of Advertising: TV, Online, Direct Marketing (Events, Sampling)
• Modelling, and ROI (decomposing and optimization).
Unit 7 Digital Marketing Analytics (4 Hours)
• Introduction to Digital Marketing: Channels (e.g. Social Media), Measurement Metrics.
• Attribution Modelling.
Case-studies will be used to demonstrate the above areas.
References:
1. Marketing Models (Kotler, Lilien, Moorthy)
2. Measuring Marketing: 101 key metrics every marketer needs (Davis)
3. Marketing Analytics, Wayne L Winston, Wiley
DDS 506.1 Unstructured Data Analysis
Unit 1: Introduction to Unstructured Data Analysis (8 Hours)

• Introduction to unstructured data,
• Differences in structured and unstructured data,
• Challenges posed due to lack of structure.
• Unstructured data encountered in various applications such as text, speech, multimedia (rich),
web and social media data
• Feature Extraction : text features, speech features, multimedia features, features in web and
social media
o Document Term Matrix, Term Frequency, Inverse Document Frequency
o Count Vectorizer, TF-IDF Vectorizer, Hashing Vectorizer
o Text Classification Techniques using Vectorizers
• Using the Bayes algorithm for text classification

• Using ‘Parts of speech’ to provide context to a statement
• Word embedding
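A from-scratch sketch of term frequency and inverse document frequency weighting (TF-IDF) over a tiny corpus, using only the standard library. The three documents are hypothetical, and the plain idf = log(N / df) variant is an assumption made for clarity; library vectorizers such as scikit-learn's apply smoothing and normalisation by default, so their numbers will differ.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog barked", "the cat purred"]  # hypothetical corpus
tokenized = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokenized for word in doc})

# Document frequency: number of documents containing each word
df = Counter(word for doc in tokenized for word in set(doc))
n_docs = len(docs)


def tfidf(doc):
    """Return one row of the document-term matrix, weighted by tf * idf."""
    tf = Counter(doc)
    return [tf[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab]


matrix = [tfidf(doc) for doc in tokenized]

for doc, row in zip(docs, matrix):
    print(doc, [round(v, 3) for v in row])
```

A word that appears in every document, such as "the" here, gets a weight of zero, which is how the weighting suppresses uninformative terms.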
Unit 2: Sentiment Analysis and Topic Modelling (12 hrs)

• Using the VADER algorithm for calculating sentiment polarity
• LDA Techniques
• Classification of words using K-Means clustering
• Document Classification using Cosine Similarity
• Customer Segmentation and Profiling
Unit 3: Introduction to NoSQL Databases (8 Hrs)

• Understanding storage architecture
• Column-oriented databases
• Understanding key-value stores
• Performing operations in a NoSQL database such as MongoDB
• Updating and deleting data
• Querying data
• Understanding Consistency, Partition tolerance, Availability
• Understanding indexing and aggregation
Unit 4: Audio and Video Classification (2 hrs)

• Introduction to Audio Data Classification
o Acoustic parameters from audio samples
o Classification & categorisation of audio samples (derive gender, age, singing
capability)
• Introduction to Video Data Classification
o Classification and Categorisation of YouTube videos & Analysing the feedback
comments (Political, Technical, Entertainment, etc.)
References:
1. Tan, Steinbach and Kumar, “Introduction to Data Mining”
2. Camastra and Vinciarelli “Machine Learning for Audio, Video and Image Analysis”
3. MongoDB, The Definitive Guide, O’Reilly
4. Python text processing, Jacob Perkins, PACKT publications
DDS 506.2 Robotic Process Automation
UNIT 1: Introduction to RPA (6 Hours)

• What is RPA
• Typical Benefits of RPA
• RPA Concepts and Implementation Approach
• Natural language processing and RPA
• How Robotic Process Automation works with Repetitive tasks
• RPA Solution Architecture Patterns
• Data Handling in RPA
UNIT 2: RPA Functionalities: (using open source / proprietary RPA tool) (12 Hours)
• Features and Benefits
• Using Task Editor
• Types of Variables
• Recording an Automation Task
• Recording, Editing and Running Tasks
• Creating an Automation Task
• Recording Web Actions with Web Recorder
• Extracting Data from Websites, Web Data, Pattern-Based Data, Table Data,
• Standard Recorder, Object Recorder
• Viewing and Setting General Properties
• Setting up Hotkeys for a Task and security Features
• Scheduling Tasks to Run, Adding Triggers to a Task and Run Remotely
UNIT 3: Implementation of Functionalities (12 Hours)

• Automation - Email Automation, FTP Automation and PDF Integration,
• Web Recorder with Database Automation
• Using MetaBots, Web Recorder, Smart Recorder
References:
1. Learning Robotic Process Automation: Create Software robots and automate business processes
with the leading RPA tool – UiPath, by Alok Mani Tripathi
2. Robotic Process Automation Tools, Process Automation and their benefits: Understanding RPA
and Intelligent Automation, by Srikanth Merianda (Author), Kiwa K (Editor)
DDS 510 Project / Internship

(Mini Project)
Students can undertake a project in a company in the second term and submit a report based on the
project. The project will have milestones, which students are required to submit to the
academy at regular intervals as notified by the Director/Head. The project will have a midterm review
and a final review (Viva Voce), during which students are expected to present their work to the
review panel and submit a report on which they are evaluated.