1. Introduction
Data science combines the knowledge of mathematics, computer science and statistics to
solve exciting data-intensive problems in industry and in many fields of science. Data
scientists help organizations make sense of their data. As data is collected and analyzed in
all areas of society, demand for professional data scientists is high and will grow higher.
It is a full-time, two-year course of four semesters, the last of which is devoted to
industrial training.
PO2 Critical Thinking and Problem solving: Exhibit the skill of critical thinking and
understand scientific texts and place scientific statements and themes in contexts and
also evaluate them in terms of generic conventions. Identify the problem by observing
the situation closely, take actions and apply lateral thinking and analytical skills to
design the solutions.
PO3 Social competence: Exhibit thoughts and ideas effectively in writing and orally;
communicate with others using appropriate media, build effective interaction and
presentation skills to meet global competencies. Elicit views of others, present complex
information in a clear and concise manner and help reach conclusions in group settings.
PO4 Research-related skills and Scientific temper: Infer scientific literature, build sense of
enquiry and be able to formulate, test, analyse, interpret and establish hypotheses and
research questions; and to identify and consult relevant sources to find answers. Plan
and write a research paper/project while emphasizing academic and research ethics,
scientific conduct and creating awareness about intellectual property rights and issues of
plagiarism.
PO7 Effective Citizenship and Ethics: Demonstrate empathetic social concern and equity
centred national development, and ability to act with an informed awareness of moral
and ethical issues and commit to professional ethics and responsibility.
PSO2 Personal and Professional Competence: (i) Apply laboratory-oriented problem solving
and be capable in data visualization and interpretation. (ii) Solve case studies by applying
various technologies, comparing results and analysing inferences. (iii) Develop problem
solving approach and present output with effective presentation and communication skills
PSO3 Research Competence: (i) Design and develop tools and algorithms. (ii) Contribute to
existing open-source platforms (iii) Construct use-case-based models for various domains
for greater perspective
PSO4 Entrepreneurial and Social competence: (i) Provide solutions to particular
domain-specific problems through in-depth domain knowledge (ii) Exposure to emerging
trends and technologies to prepare students for industry (iii) Develop skills required for
social interaction.
Programme Structure
Note: The concepts listed below should be explained and executed on large datasets
using R.
I Descriptive Statistics:
1.1 Measures of Central Tendency: Mean, Median, Mode
1.2 Partition Values: Quartiles, Percentiles, Box Plot
1.3 Measures of Dispersion: Variance, Standard Deviation, Coefficient
of variation
1.4 Skewness: Concept of skewness, measures of skewness
1.5 Kurtosis: Concept of Kurtosis, Measures of Kurtosis
(All topics to be covered for raw data using R software. Manual calculations
are not expected.)
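The measures in this unit can be illustrated with a short script. The syllabus prescribes R for the coursework; the sketch below uses Python (also taught in this programme) purely as an illustration, and the data values are made up. Skewness and kurtosis are computed here from population moments, which is only one of several conventions.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]  # made-up raw data, for illustration only

mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
q1, q2, q3 = st.quantiles(data, n=4)  # quartiles (default "exclusive" method)
sd = st.stdev(data)                   # sample standard deviation
cv = sd / mean * 100                  # coefficient of variation, in percent

# Moment-based skewness and kurtosis (population moments; an assumed convention).
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5   # positive value indicates a longer right tail
kurtosis = m4 / m2 ** 2     # equals 3 for a normal distribution
```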
II Introduction to Probability:
2.1 Probability - classical definition, probability models, axioms of
probability, probability of an event.
2.2 Concepts and definitions of conditional probability, multiplication
theorem P(A∩B) = P(A) · P(B|A)
2.3 Bayes’ theorem (without proof)
2.4 Concept of Posterior probability, problems on posterior probability.
2.5 Definition of sensitivity of a procedure, specificity of a procedure.
Application of Bayes’ theorem to design a procedure for false positive and
false negative.
2.6 Concept and definition of independence of two events.
2.7 Numerical problems related to real life situations.
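As an illustration of topics 2.3 to 2.5, the posterior probability of a condition given a positive test follows from Bayes' theorem once sensitivity, specificity and prevalence are known. The sketch below is in Python rather than R, and the function name and numeric values are hypothetical.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(+ and condition)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+ and no condition)
    return true_pos / (true_pos + false_pos)

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 90% specificity.
p = posterior_positive(prevalence=0.01, sensitivity=0.95, specificity=0.90)
```

With these assumed numbers the posterior stays below 10 percent, because false positives from the large healthy group outnumber true positives.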
IV Special Distributions
4.1 Binomial Distribution
4.2 Uniform Distribution
4.3 Poisson Distribution
4.4 Negative Binomial Distribution
4.5 Geometric Distribution
4.6 Continuous Uniform Distribution
4.7 Exponential Distribution
4.8 Normal Distribution
4.9 Log Normal Distribution
4.10 Gamma Distribution
4.11 Weibull Distribution
4.12 Pareto Distribution
(For each probability distribution, its pmf/pdf, P-P plot, Q-Q plot, and the
generation of probabilities and random samples using R software are
expected.)
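A minimal sketch of what "generation of probabilities and random samples" can look like for a few of the listed distributions. It uses only the Python standard library (the course itself uses R), and the parameter values are arbitrary.

```python
import math
import random

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

random.seed(0)  # fixed seed so the sample is reproducible

# Binomial(10, 0.3) sample: count successes in 10 Bernoulli trials.
binom_sample = [sum(random.random() < 0.3 for _ in range(10)) for _ in range(1000)]

# Exponential sample with rate 2 and Normal(0, 1) sample.
exp_sample = [random.expovariate(2.0) for _ in range(1000)]
norm_sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
```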
regression.
5.5 Fitting of line Y = a+bX
5.6 Concept of residual plot and mean residual sum of squares.
5.7 Multiple correlation coefficient, concept, definition, computation
and interpretation.
5.8 Partial correlation coefficient, concept, definition, computation and
interpretation.
5.9 Multiple regression plane.
5.10 Identification and solution to Multicollinearity
5.11 Evaluation of the Model using R square and Adjusted R square
All topics to be covered for raw data using R software. Manual calculations
are not expected.
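Topics 5.5 and 5.11 can be sketched as follows: a least-squares fit of Y = a + bX, then R square and adjusted R square. This is an illustrative Python version (the course uses R); the data are made up and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def r_squared(x, y, a, b, p=1):
    """R square and adjusted R square for p predictors."""
    n = len(x)
    my = sum(y) / n
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

x = [1, 2, 3, 4, 5]                 # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
r2, adj_r2 = r_squared(x, y, a, b)
```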
VI Logistic Regression
6.1 Introduction to logistic regression
6.2 Difference between linear and logistic regression
6.3 Logistic equation
6.4 How to build logistic regression model in R
6.5 Odds ratio in logistic regression.
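The logistic equation and the odds ratio (topics 6.3 and 6.5) can be illustrated numerically. In the Python sketch below the coefficients b0 and b1 are hypothetical, not fitted from any data; the point is that a one-unit increase in x multiplies the odds by exp(b1).

```python
import math

def logistic(z):
    """Logistic function: maps log-odds z to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients of the model log-odds = b0 + b1 * x.
b0, b1 = -1.5, 0.8

def odds(x):
    p = logistic(b0 + b1 * x)
    return p / (1 - p)

odds_ratio = odds(3) / odds(2)  # effect of a one-unit increase in x
```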
Learning Resources:
1. Fundamentals of Applied Statistics (3rd Edition), Gupta and Kapoor, S.Chand and
Sons, New Delhi, 1987.
2. An Introductory Statistics, Kennedy and Gentle.
3. Statistical Methods, G.W. Snedecor, W.G. Cochran, John Wiley & sons, 1989.
4. Introduction to Linear Regression Analysis, Douglas C. Montgomery, Elizabeth A.
Peck, G. Geoffrey Vining, Wiley
5. Modern Elementary Statistics, Freund J.E., Pearson Publication, 2005.
6. Probability, Statistics, Design of Experiments and Queuing Theory with Applications of
Computer Science, Trivedi K.S., Prentice Hall of India, New Delhi, 2001.
7. A First course in Probability 6th Edition, Ross, Pearson Publication, 2006.
8. Introduction to Discrete Probability and Probability Distributions, Kulkarni M.B.,
Ghatpande S.B., SIPF Academy, 2007.
9. A Beginner's Guide to R, Alain Zuur, Elena Ieno, Erik Meesters, Springer, 2009
10. Statistics Using R, Sudha Purohit, S.D. Gore, Shailaja Deshmukh, Narosa Publishing
Company
CO4 Explain and analyze basic algorithms for massive data problems
CO5 Determine eigenvalues and eigenvectors of a given matrix and apply the
concept for various methods of matrix factorization.
CO6 Perform different matrix operations and integrate them to solve complex
data science problems.
I Vectors
Vector: Vector addition, Scalar Vector multiplication, Inner Product,
Complexity of Vector Computations
Linear Functions: Linear Functions, Taylor Approximation, Regression
Model
Norms and Distance: Norm distance, Standard deviation, Angle,
Complexity
Clustering: Clustering, a clustering Objective, The K means algorithm,
Examples and Applications
Linear Independence: Linear Dependence, Basis, Orthonormal Vectors,
Gram-Schmidt algorithm
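The Gram-Schmidt procedure listed under Linear Independence can be sketched in a few lines of Python. This is a minimal illustration, assuming vectors are given as plain lists; the tolerance and function names are choices made here, not part of the syllabus.

```python
import math

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis, or raise if the input is linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)                              # component of w along q
            w = [wi - c * qi for wi, qi in zip(w, q)]  # remove that component
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

q = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

A dependent input such as [[1, 2], [2, 4]] raises ValueError, which is how the procedure doubles as a test for linear independence.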
II Matrices
Matrices: Introduction to Matrices, Zero and identity Matrices, Transpose,
addition and norm, Matrix Vector Multiplication, Complexity
Matrix Examples: Geometric Transformation, Selectors, Incidence Matrix
and Convolution
Linear Equations: Linear and affine functions, Linear function models,
System of Linear Equations
Matrix Multiplication: Matrix Multiplication, Composition of Linear
Functions, Matrix Power and QR Factorization
Matrix Inverses: Left and right inverses, Inverse, Solving Linear
Equations,
Examples, Pseudo Inverse.
Eigenvalues, Eigenvectors, Orthogonalization
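One simple way to see eigenvalues and eigenvectors numerically is power iteration, sketched below in plain Python. It is an illustrative choice, not a method named in the syllabus, and it assumes the matrix has a single dominant eigenvalue and that the starting vector is not orthogonal to its eigenvector.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and a unit eigenvector of A."""
    x = [1.0] + [0.0] * (len(A) - 1)   # arbitrary starting vector
    for _ in range(iterations):
        y = mat_vec(A, x)
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    Ax = mat_vec(A, x)
    eigenvalue = sum(a * b for a, b in zip(Ax, x))  # Rayleigh quotient
    return eigenvalue, x

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric example with eigenvalues 3 and 1
lam, v = power_iteration(A)
```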
Learning Resources:
CO2 Explain programming constructs and apply them to build and package
python modules for reusability.
CO3 Use various data structures to gain suitable knowledge about their
implementation.
CO5 Evaluate patterns, compile expressions and write scripts to extract data.
I Introduction To Python
1.1 Introduction
1.2 Various IDEs
II Data Types
2.1 Numeric data types: int, float, complex
2.2 String, list and list slicing
2.3 Tuples
3.4 Packages
3.5 Programming using functions, modules and external packages
IV Data Structures
4.1 Lists as Stacks, Queues, Comprehensions
4.2 Tuples and sequences
4.3 Sets
4.4 Dictionaries
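The data structures in this unit can be illustrated with a short sketch. The variable names and values are arbitrary; `collections.deque` is used for the queue because removing from the front of a plain list is slow.

```python
from collections import deque

# List as a stack: push and pop at the end.
stack = [1, 2, 3]
stack.append(4)
top = stack.pop()

# Queue: append at the right, remove from the left.
queue = deque(["a", "b", "c"])
queue.append("d")
first = queue.popleft()

# Comprehensions for lists, sets and dictionaries.
squares = [n * n for n in range(6) if n % 2 == 0]
unique = {c for c in "banana"}
counts = {c: "banana".count(c) for c in unique}

# Tuple and sequence unpacking.
point = (3, 4)
x, y = point
```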
IX WEB Scraping
9.1 Basics of scraping,
9.2 Scrape HTML content from static / dynamic pages
9.3 Parsing HTML code using packages like requests and Beautiful Soup
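The parsing step can be sketched without any third-party package. The example below swaps in the standard-library `html.parser` in place of Beautiful Soup, and parses a hard-coded HTML string instead of fetching a page, so it runs offline; the class name and sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hard-coded sample page; a real scraper would download this text first.
html = '<html><body><a href="/a">A</a><p>text</p><a href="/b">B</a></body></html>'
parser = LinkCollector()
parser.feed(html)
links = parser.links
```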
Learning Resources:
1. Learning Python, O’Reilly publication
2. Programming Python, O’Reilly publication
3. https://docs.python.org/3/tutorial/
4. https://realpython.com/beautiful-soup-web-scraper-python
CO3 Apply the basic and advanced concepts of SQL language to solve the
queries in the databases.
CO4 Analyse database requirements and determine the entities involved in the
system and their relationship.
CO5 Compare traditional relational databases and NoSQL stores and explain
types of NoSQL databases.
I Introduction
1.1 Database-system Applications
1.2 Purpose of Database Systems
1.3 View of Data-Data Abstraction, Instance and Schemas
1.4 Relational Databases: Tables, DML, DDL
1.5 Data storage and querying: Storage Manager, The query processor
1.6 Database Architecture
1.7 Speciality Databases
Learning Resources:
CO1 Describe concepts of Data Science and its specialised branches. State the use
of R and RStudio's interactive environment
CO3 Apply the data manipulation and transformation techniques to prepare data
for further processing.
CO4 Analyze the nature of data with help of statistical methods, different tools and
visualization techniques.
CO6 Write R scripts to solve complex business problems from different domains.
5 Visualizations in R – 1
6 Visualizations in R – 2
10 Case Study
Title of the Course and Course Code: Data Science Practical - II (Data Structures and RDBMS) (CSD4106)
Number of Credits: 04
CO1 Identify the concepts of Data structures and RDBMS to design solutions for
different types of problems.
CO2 Explain the use of data structures and stored functions, views.
CO4 Analyse/explain database application scenarios in the form of E-R and transform the
ER-model to relational tables.
CO5 Test and validate the outputs of Data structures programs and SQL queries.
CO6 Write SQL queries to implement DDL, DML commands on relational databases to
create and manipulate the table data.
4 File Handling
9 Joins
11 Case Study
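The DDL, DML and join work named in the course outcomes can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the table names, columns and rows are hypothetical, and the course may use a different RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: create two related tables (hypothetical schema).
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES dept(id))"
)

# DML: insert and update rows.
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Stats"), (2, "CS")])
cur.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", 1), (2, "Ravi", 2), (3, "Meera", 2)],
)
cur.execute("UPDATE emp SET name = 'Ravi K' WHERE id = 2")

# Join: employee name with department name.
rows = cur.execute(
    "SELECT emp.name, dept.name FROM emp "
    "JOIN dept ON emp.dept_id = dept.id ORDER BY emp.id"
).fetchall()
conn.close()
```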
CO1 Identify sampling methods from the pattern of the observed data.
CO4 Analyze sample data and identify the parameters and their probability
distributions.
CO5 Validate the hypothesis to ensure that the entire research process remains
scientific and reliable.
I Sampling
1.1 Introduction to Sampling
1.2 Simple random Sampling
1.3 Stratified Random Sampling
1.4 Cluster Sampling
1.5 Concept of Sampling Error
II Sampling Distributions
2.1 Introduction to Sampling distributions
2.2 Student’s t distribution
2.3 Chi square distribution
2.4 Snedecor’s F distribution
2.5 Interrelations among t, chi-square and F distributions
2.6 Central Limit Theorem (Various Versions) and its applications.
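The Central Limit Theorem can be demonstrated by simulation: means of samples drawn from a non-normal distribution cluster around the population mean with standard deviation sigma / sqrt(n). The Python sketch below (the course uses R) draws from Uniform(0, 1); the sample size, replication count and seed are arbitrary choices.

```python
import random
import statistics as st

random.seed(42)  # fixed seed for reproducibility

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1)."""
    return sum(random.random() for _ in range(n)) / n

# 2000 sample means, each from a sample of size 30.
means = [sample_mean(30) for _ in range(2000)]

mean_of_means = st.mean(means)        # should be close to 0.5
sd_of_means = st.stdev(means)         # should be close to sqrt(1/12) / sqrt(30)
theoretical_sd = (1 / 12) ** 0.5 / 30 ** 0.5
```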
IV Analysis of Variance
4.1 One Way ANOVA
4.2 Two Way ANOVA
4.3 Application of ANOVA to test the overall significance of
Regression.
All topics to be covered using R software. Manual calculations are not
expected.
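The F statistic behind one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. A minimal Python sketch follows (the course uses R); the groups are made-up data and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA on a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For these groups the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13.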
V Time Series
5.1 Meaning and Utility.
5.2 Components of Time Series.
5.3 Additive and Multiplicative models.
5.4 Methods of estimating trend: moving average method, least squares
method and exponential smoothing method (single, double and triple).
5.5 Elimination of trend using additive and multiplicative models.
5.6 Simple time series models: AR (1), AR (2).
5.7 Introduction to ARIMA Modelling.
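Two of the trend-estimation methods in 5.4 can be sketched briefly in Python (the course uses R). The series, window length and smoothing constant below are arbitrary, and the exponential smoother is initialised with the first observation, which is one common convention among several.

```python
def moving_average(series, window):
    """Simple moving average over consecutive windows of the given length."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def exponential_smoothing(series, alpha):
    """Single exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 14, 13, 15, 17]   # made-up data
ma3 = moving_average(series, 3)
ses = exponential_smoothing(series, 0.5)
```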
Learning Resources:
1. Fundamentals of Applied Statistics (3rd Edition), Gupta and Kapoor, S.Chand and
Sons, New Delhi, 1987.
2. Time Series Methods, Brockwell and Davis, Springer, 2006.
3. Time Series Analysis, 4th Edition, Box and Jenkins, Wiley, 2008.
4. Modern Elementary Statistics, Freund J.E., Pearson Publication, 2005.
5. Probability, Statistics, Design of Experiments and Queuing Theory with Applications of
Computer Science, Trivedi K.S., Prentice Hall of India, New Delhi, 2001.
6. Common Statistical Tests, Kulkarni M.B., Ghatpande S.B., Gore S.D., Satyajeet
Prakashan,Pune, 1999.
7. Probability and Statistical Inference, 9th Edition, Robert Hogg, Elliot Tanis, Dale
Zimmerman, Pearson education Ltd, 2015
8. A Beginner's Guide to R, Alain Zuur, Elena Ieno, Erik Meesters, Springer, 2009
9. Statistics Using R, Sudha Purohit, S.D. Gore, Shailaja Deshmukh, Narosa Publishing
Company
CO2 Distinguish stochastic process for continuous or discrete time and state
space.
CO4 Integrate the knowledge of SVD to find the best k-dimensional subspace.
II Least Squares
Least Squares: Least Squares Problem, Solution, Solving Least Squares
Problems, Examples.
Least Squares Data Fitting: Least Squares data fitting, Validation, Feature
Engineering.
Least Squares Classification: Classification, Least Squares Classifier,
Estimation and Inversion, Regularised data fitting, Complexity.
Constrained Least Squares: Constrained Least Squares problem, Solution,
Solving constrained Least Squares problems.
Constrained Least Squares Applications: Portfolio Optimization, Linear
Quadratic control, Linear Quadratic State Estimation.
III Graph Theory, Random Graphs, Random Walks and Markov Chains
IV Building Blocks of Machine Learning
Learning Resources:
1. Foundations of Data Science, Avrim Blum, John Hopcroft and Ravindran Kannan
CO1 Define a problem to find appropriate solutions in the field of data science
and other interdisciplinary areas.
CO2 Classify and apply machine learning techniques to solve real world
problems.
CO6 Construct use case based models by analyzing datasets from various
domains.
IV Classification Techniques
4.1 Decision Tree
4.2 SVM
4.3 Naïve Bayes
4.4 KNN
V Clustering
5.1 K means clustering
5.2 Association Rule Mining
5.3 Apriori Algorithm
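Association rule mining and the Apriori algorithm both rest on two quantities, support and confidence. The Python sketch below computes them directly for a tiny made-up transaction set; it is not a full Apriori implementation, and the item names are hypothetical.

```python
# Made-up market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

s_bread = support({"bread"})
s_bread_milk = support({"bread", "milk"})
c_bread_to_milk = confidence({"bread"}, {"milk"})
```

Apriori prunes the search by using the fact that a superset can never have higher support than any of its subsets.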
VI Model Evaluation
6.1 Introduction
6.2 Performance Measures
6.3 Confusion Matrix
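The usual performance measures are all derived from the four cells of a confusion matrix. A minimal Python sketch for a binary classifier follows; the label vectors are made up and the function name is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

actual = [1, 1, 1, 0, 0, 0, 0, 1]      # made-up true labels
predicted = [1, 0, 1, 0, 1, 0, 0, 1]   # made-up model output

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                # also called sensitivity
f1 = 2 * precision * recall / (precision + recall)
```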
Learning Resources:
1. Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd
Edition
2. Margaret H. Dunham, S. Sridhar, Data Mining - Introductory and Advanced Topics,
Pearson Education
3. Tom Mitchell, Machine Learning, McGraw-Hill, 1997
4. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Second edition, John Wiley
and Sons, 2000.
5. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
6. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and
Techniques, Elsevier (Morgan Kaufmann)
7. Bing Liu: Web Data Mining: Exploring Hyperlinks, Contents and Usage Data,
Springer (2006).
8. Soumen Chakrabarti: Mining the Web: Discovering knowledge from hypertext data,
Elsevier (2003).
9. Christopher D Manning, Prabhakar Raghavan and Hinrich Schütze: An Introduction
to Information Retrieval, Cambridge University Press (2009)
CO5 Evaluate and select algorithmic design paradigms and methods of analysis.
I Introduction
Definition of Algorithm & its characteristics, Recursive and Non-recursive
Algorithms, Time & Space Complexity, Definitions of Asymptotic
Notations, Insertion Sort (examples and time complexity), Heaps & Heap
Sort (examples and time complexity)
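The two sorting algorithms named in this unit can be sketched as follows. Insertion sort is written out in full and runs in O(n^2) time in the worst case; the heap sort version leans on the standard-library `heapq` module instead of a hand-written heap and runs in O(n log n).

```python
import heapq

def insertion_sort(items):
    """Return a sorted copy using insertion sort."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger elements one place right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def heap_sort(items):
    """Return a sorted copy by building a min-heap and popping repeatedly."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [5, 2, 9, 1, 5, 6]
```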
IV Dynamic Programming
The General Method, Principle of Optimality, Matrix Chain Multiplication,
0/1 Knapsack Problem, Concept of Shortest Path, Single Source shortest
path, Dijkstra’s Algorithm, Bellman Ford Algorithm, Floyd-Warshall
Algorithm, Travelling Salesperson Problem
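As one instance of the principle of optimality, the 0/1 knapsack problem can be solved with a one-dimensional table indexed by remaining capacity. The Python sketch below is a standard bottom-up formulation; the weights, values and capacity are made up.

```python
def knapsack(weights, values, capacity):
    """Maximum total value of items that fit in the capacity (each used at most once)."""
    best = [0] * (capacity + 1)   # best[c] = best value achievable with capacity c
    for w, v in zip(weights, values):
        # Go through capacities downwards so each item is counted at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

result = knapsack(weights=[1, 3, 4, 5], values=[1, 4, 5, 7], capacity=7)
```

Here the best choice is the items of weight 3 and 4, for a total value of 9.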
Learning Resources:
CO1 Outline different basics and techniques of soft computing and its
importance.
IV Membership functions
4.1 Features of Membership Functions
4.2 Standard Forms and Boundaries
4.3 Fuzzification methods
4.4 Problems on Inference method of Fuzzification
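A triangular membership function is one of the standard forms covered in this unit. The Python sketch below assumes a < b < c, where a and c are the feet of the triangle and b is its peak; the "warm temperature" fuzzy set in the example is hypothetical.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0                 # outside the support
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# Hypothetical fuzzy set "warm" on a temperature scale: feet at 0 and 10, peak at 5.
mu_peak = triangular(5, 0, 5, 10)
mu_left = triangular(2.5, 0, 5, 10)
mu_right = triangular(7.5, 0, 5, 10)
```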
Learning Resources:
1. S. N. Sivanandam, S. N. Deepa, Principles of Soft Computing (With CD),
ISBN:9788126527410, Wiley India
2. Timothy J Ross, Fuzzy Logic: With Engineering Applications, ISBN: 978-81-265-
3126- Wiley India, Third Edition
3. Kumar Satish, Neural Networks: A Classroom Approach, ISBN:9780070482920,
2008 reprint, 1/e TMH
4. David E. Goldberg, Genetic Algorithms in search, Optimization & Machine Learning,
ISBN:81-7808-130-X, Pearson Education
Title of the Course and Course Code: Data Science Practical - III (Machine Learning using R) (CSD4207)
Number of Credits: 04
CO3 Interpret the data using data pre-processing techniques in machine learning
CO4 Analyze different machine learning models to get better accuracy and
results.
CO6 Construct models using machine learning algorithms and compare the
results.
1 Data Preprocessing – I
2 Data Preprocessing – II
Title of the Course and Course Code: Data Science Practical - IV (Python for Data Science) (CSD4208)
Number of Credits: 04
CO3 Apply the data transformation and data manipulation operations using
“pandas”.
CO4 Analyze nature of data with help of different tools and visualization
techniques.
CO6 Write scripts, follow data pipelining, build models and measure accuracy to
communicate the observations.
1.3 Aggregations
3.3 Specialized Visualization Tools – pie chart, box plot, scatter plot, bubble
plot
4.6 Tokenizing
V Data Pipelining
Learning Resources: