CSE3041 Programming for Data Science
L T P J C
0 0 6 0 3
Pre-requisite:
Syllabus version: v. 1.0
Course Objectives:
1. To provide the necessary knowledge on how to manipulate data objects using Python and R.
2. To provide knowledge on how to analyze data graphically.
3. To emphasize different statistical methods and ways to analyze data using Python and R.
4. To provide a solid understanding of programming in Scala.

Course Outcomes:
Upon completion of the course, the students will be able to:
1. Use the Python and R programming languages and Python libraries such as Pandas, NumPy, SciPy, etc., for solving analytical problems.
2. Import, export, visualize, and manipulate continuous and categorical data effectively using Python and R.
3. Solve problems using the Scala functional programming language.

Session:1 2 hours
Expressions, operators, matrices, and decision statements in Python

Session:2 2 hours
Control flow and functions in Python

Session:3 2 hours
Classes, objects, packages, and files in Python

Session:4 2 hours
Strings, lists, tuples, dictionaries, comprehensions

Session:5 2 hours
NumPy array objects: creating arrays, basic operations, indexing, slicing and iterating, copying arrays, shape manipulation, identity array, eye function, universal functions

Session:6 2 hours
Linear algebra with NumPy

Session:7 2 hours
Eigenvalues and eigenvectors with NumPy

Session:8 2 hours
Linear algebra using SciPy and basic functionality of SciPy

Session:9 2 hours
Pandas Series object, Pandas DataFrame

Session:10 2 hours
Pandas objects: data aggregation and joining

Session:11 2 hours
Pandas objects: concatenating and appending data frames, index objects

Session:12 2 hours
Data wrangling with Pandas

Session:13 2 hours
Handling Time series data using pandas

Session:14 2 hours
Handling missing values using pandas

Session:15 2 hours
Reading and writing data, including JSON data

Session:16 2 hours
Web scraping using Python, combining and merging datasets

Session:17 2 hours
Data transformations

Session:18 2 hours
Common plots for statistical analysis using matplotlib, seaborn, etc.

Session:19 2 hours
Common plots for statistical analysis using ggplot, ggvis, etc. in Python

Session:20 2 hours
Common plots for statistical analysis using Plotly, Altair, etc. in Python

Session:21 2 hours
Linear algebra using SciPy and basic functionality of SciPy

Session:22 2 hours
Data types, sequence generation, vectors and subscripts, random number generation, data frames in R

Session:23 2 hours
R functions, Data manipulation and Data Reshaping using plyr, dplyr, reshape2

Session:24 2 hours
Parametric and non-parametric statistics, continuous and discrete probability distributions using R, correlation and covariance, contingency tables

Session:25 2 hours
Overview of Sampling, different sampling techniques

Session:26 2 hours
R and database connectivity

Session:27 2 hours
Web application development with R using Shiny; approaches to dealing with missing data in R

Session:28 2 hours
Exploratory data analysis with simple visualizations using R

Session:29 2 hours
Feature or Attribute selection using R

Session:30 2 hours
Dimensionality Reduction with R

Session:31 2 hours
Time series data analysis with R

Session:32 2 hours
Variables, types, literals, operators in Scala

Session:33 2 hours
Classes and objects

Session:34 2 hours
Functional objects: choosing between val and var, class parameters, constructors, self references, method overloading in Scala

Session:35 2 hours
Conditional and loop statements in Scala

Session:36 2 hours
Functions in Scala

Session:37 2 hours
Control abstraction in Scala

Session:38 2 hours
Composition and Inheritance

Session:39 2 hours
Traits and Mixins

Session:40 2 hours
File I/O in Scala

Session:41 2 hours
Case Classes and Pattern Matching

Session: 42 2 hours
Packages and imports in Scala

Session: 43 2 hours
Working with Lists and Collections in Scala

Session: 44 2 hours
Working with XML, Implementing List

Session: 45 2 hours
Extractors and objects as modules

Total hours: 90 hours

Reference Books
1. James Payne, “Beginning Python: Using Python 2.6 and Python 3.1”, Wrox, 1st Edition, 2010.
2. Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser, “Data Structures and Algorithms in Python”, John Wiley & Sons, 2013.
3. Ivan Idris, “Python Data Analysis”, Packt Publishing Limited, 2014.
4. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O'Reilly Media, 1st Edition, 2012.
5. Michael Heydt, “Learning Pandas - Python Data Discovery and Analysis Made Easy”, Packt Publishing Limited, 2015.
6. Jacqueline Kazil, Katharine Jarmul, “Data Wrangling with Python: Tips and Tools to Make Your Life Easier”, O'Reilly Media, 1st Edition, 2016.
7. https://docs.scipy.org/doc/numpy-dev/reference/index.html#reference
8. http://www.python-course.eu/numpy.php
9. Michael J. Crawley, “The R Book”, Wiley, 2nd Edition, 2012.
10. Robert Kabacoff, “R in Action”, Manning Publications, 1st Edition, 2011.
11. Torsten Hothorn, Brian S. Everitt, “A Handbook of Statistical Analyses Using R”, Chapman and Hall/CRC, 2nd Edition, 2009.
12. Chris Beeley, “Web Application Development with R Using Shiny”, Packt Publishing, 2013.
13. Phil Spector, “Data Manipulation with R”, Springer, 2008.
14. Prabhanjan N. Tattar, Suresh Ramaiah, B. G. Manjunath, “A Course in Statistics with R”, Wiley, 2016.
15. Pawel Cichosz, “Data Mining Algorithms: Explained Using R”, Wiley, 2014.
16. Bater Makhabel, “Learning Data Mining with R”, Packt Publishing, 2015.
17. Martin Odersky, Lex Spoon, and Bill Venners, “Programming in Scala”, 3rd Edition.
18. Alvin J. Alexander, “Learning Functional Programming in Scala”, 2017.
Mode of Evaluation: Continuous Assessment, Final Assessment Test
Recommended by Board of Studies: 11.09.2019
Approved by Academic Council: No. 56, Date: 20.09.2019