CPEE Big Data Analytics and Optimization Curriculum
Certificate Program in Engineering Excellence
LIST OF COURSES
Essential Business Skills for a Data Scientist
Planning and Thinking Skills for Architecting Data Science Solutions
Essential Engineering Skills in Big Data Analytics
Statistical Modeling for Predictive Analytics in Engineering and Business
Engineering Big Data with R and Hadoop Ecosystem
Text Mining, Social Network Analysis and Natural Language Processing
Methods and Algorithms in Machine Learning
Optimization and Decision Analysis
Communication, Ethical and IP challenges for Analytics Professionals
http://www.insofe.edu.in
CSE 7110c
Why should we build models or use data to run a business: The edge of evidence
over intuition
What kinds of models data scientists build and where they do not work
When you want a prediction
o How do you estimate how much to pay and how long to wait
o How do you precisely define for the teams what to deliver
o How do you evaluate how good their predictions are
When does big unstructured data become really important
When you want to build an analytics group
o What software or hardware should you invest in
o Several engagement models and the ideal teams for each
Business plan: Each team develops a complete business plan for setting up an
analytics organization and presents it.
Case analysis: Participants are divided into teams and given several high-level
business problems. They must identify the prediction problems with high ROI and
provide concise requirements.
CSE 7111c
Thinking tools
o Approximations and estimations
o Geometric visualization of data and models
o Probabilistic analysis of data and models
o Analyzing networks and graphs
o Analyzing transitions, Markov chains and unstructured data
o Estimating complexity of algorithms
Choosing the right models and architecting a solution
o Structure and anatomy of models
o Problematic data and choosing the best experimentation
Sources of errors in predictive models and techniques to minimize them
Interacting with technical and business teams
o Translating typical business problems into technical specifications
o Brainstorming and analyzing data and designing transformations
o Manual analysis of the models
Case study: Participants will be given business problems. They need to:
o Translate each problem into a specific technical solution
o Brainstorm for data and design transformations
o Architect a complete solution plan
CSE 7212c
CSE 7302c
Computing the properties of an attribute: Central tendencies (Mean, Median, Mode) and dispersion
(Range, Variance, Standard Deviation); Expectations of a Variable; Moment Generating Functions
Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric,
Binomial and Poisson distributions
Describing the relationship between attributes: Covariance; Correlation; Chi-Square
Describing a single variable continued: Exponential distribution; Special emphasis on Normal
distribution; Central Limit Theorem
Inferential statistics: How to learn about the population from a sample and vice versa; Sampling
distributions; Confidence Intervals, Hypothesis Testing
ANOVA
Regression (Linear, Multivariate Regression) in forecasting
Analyzing and interpreting regression results
Logistic Regression
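As a small illustration of the inferential statistics topics listed above (sampling distributions and confidence intervals), the following is a minimal sketch, not course material. It assumes a normal approximation with z = 1.96 for a 95% interval; the function name `mean_confidence_interval` and the sample values are hypothetical and invented only for this example.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the population mean.

    Uses the normal approximation (z = 1.96), which leans on the
    Central Limit Theorem and is only reasonable for larger samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation (n - 1 denominator) over sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * std_err, mean + z * std_err


# Hypothetical measurements, invented purely for illustration.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = mean_confidence_interval(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

For a sample this small, a t-distribution critical value would usually be preferred over z = 1.96; the normal value is used here only to keep the sketch dependency-free.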
CSE 7304c
SQL, Sqoop, Hive, Hive variants like Impala, Spark and Storm
CSE 7206c
CSE 7305c
Rule based knowledge: Logic of rules, evaluating rules, Rule induction and association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each
non-leaf node; Entropy; Information Gain
Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables;
Other measures of randomness
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules
Specialized decision trees (oblique trees)
Ensemble and Hybrid models
AdaBoost, Random Forests and Gradient boosting machines
K-Nearest Neighbor method
Wilson editing and triangulations
K-nearest neighbors in collaborative filtering, digit recognition
Motivation for Neural Networks and their applications
Perceptron and Single Layer Neural Network, and hand calculations
Learning in a Neural Net: Backpropagation and conjugate gradient techniques
Application of Neural Net in Face and Digit Recognition
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Algorithm of Support Vector Machines (SVM)
Connectivity models (hierarchical clustering)
Centroid models (k-means algorithm)
Distribution models (expectation maximization)
Trend analysis and Time Series
Cyclical and Seasonal analysis; Box-Jenkins method
Smoothing; Moving averages; Auto-correlation; ARIMA; Holt-Winters method
Bayesian analysis and Naive Bayes classifier
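To make the decision tree topics above more concrete (choosing the "best" attribute at a non-leaf node using entropy and information gain), here is a minimal sketch, not course material. The helper names `entropy` and `information_gain` and the toy weather-style records are hypothetical and invented for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy records, invented purely for illustration.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

# The attribute with the highest information gain is chosen for the split.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy data, "outlook" separates the classes perfectly while "windy" tells nothing about the label, so the split falls on "outlook".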
CSE 7213c
CSV 1103