
School of Computing Science and Engineering

Program: B.Tech Specialization


Course Code: BCSE3092
Course Name: Data Science

Course Outcomes:
• CO1 To acquire good introductory knowledge of the essential statistical fundamentals used in data science. (K1)
• CO2 Develop an ability to apply algorithmic principles and programming knowledge using Python and R to data science problems. (K2, K3)
• CO3 Develop the ability to visualise data for analysis. (K4)
• CO4 Apply and implement ML principles using probability and statistics. (K5)
• CO5 Understanding and recommending statistics and machine learning solutions. (K6)
• CO6 Gaining research insights into the latest solutions provided by researchers. (K6)
Program Name: BCSE3029 Program Code: Data Science

Course Prerequisites

• PYTHON BASICS
• STATISTICS
• LINEAR ALGEBRA


Syllabus


Recommended Books
Text books

Reference Book


CSV (comma-separated value) files are a common file format for transferring and storing data.
This section covers how to read, manipulate, and write CSV data using Python: how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files after analysis.
• Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data.
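A minimal sketch of the full round trip (write a DataFrame to CSV, then read it back) is shown below. The data and the file name scores.csv are hypothetical, and a temporary directory is used so the example does not depend on any existing file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example table, for illustration only
df = pd.DataFrame({"name": ["Asha", "Ravi"], "score": [88, 92]})

# Write to a CSV file in a temporary folder; index=False stops Pandas
# from writing the row index as an extra unnamed column
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
df.to_csv(path, index=False)

# Read the file back into a new DataFrame
df_back = pd.read_csv(path)
print(df_back)
```

Without index=False, reading the file back would produce an extra "Unnamed: 0" column holding the old index.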
If you run into issues with the data loading procedure, debug it by checking:

• File extensions and file types – what do the letters CSV actually mean? What is the difference between a .csv file and a .txt file?
• How data is represented inside CSV files – if you open a CSV file in a text editor, what does the data actually look like?
• The Python path and how to reference a file – what are the absolute and relative paths to the file you are loading? What directory are you working in?
• CSV data formats and errors – the common errors raised by the read_csv() function.
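The checklist above can be worked through in code before calling read_csv(). The sketch below is one possible approach using the standard pathlib module; it creates its own small file (the name example.csv and its contents are hypothetical) so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a small CSV file in a temporary folder so the example is
# self-contained; in practice this would be your own data file
folder = Path(tempfile.mkdtemp())
csv_path = folder / "example.csv"  # hypothetical file name
csv_path.write_text("name,age\nAsha,21\nRavi,23\n")

# 1. Which directory is Python working in? Relative paths such as
#    "data/example.csv" are resolved against this directory.
print("Working directory:", Path.cwd())

# 2. Does the file exist at the path being used? Show its absolute form.
print("Absolute path:", csv_path.resolve(), "exists:", csv_path.exists())

# 3. What does the raw data look like? A CSV file is plain text:
#    one row per line, with columns separated by commas.
print(csv_path.read_text())

# 4. Only then load it with Pandas
df = pd.read_csv(csv_path)
print(df)
```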
Creating Pandas DataFrames
We’ll examine two methods to create a DataFrame – manually, and from comma-separated value (CSV) files.
Manually entering data
Print the data
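One common way to enter data manually is to pass a dictionary to pd.DataFrame(), as sketched below; the names and values are hypothetical example data.

```python
import pandas as pd

# Each dictionary key becomes a column name and each list holds
# that column's values (hypothetical example data)
data = {
    "name": ["Asha", "Ravi", "Meera"],
    "age": [21, 23, 22],
    "city": ["Delhi", "Pune", "Noida"],
}
df = pd.DataFrame(data)

# Print the data; Pandas adds a default integer index (0, 1, 2) on the left
print(df)
```

All lists must have the same length, otherwise pd.DataFrame() raises a ValueError.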

You’ll notice that, by default, Pandas displays only 20 columns of a wide DataFrame, and truncates DataFrames that have more than 60 rows.

You can change these defaults through the Pandas display options (use pd.options.display.XX = value, or pd.set_option("display.XX", value), to set them):

• pd.options.display.width – the width of the display in characters; increase this if your display is wrapping rows over more than one line.
• pd.options.display.max_rows – maximum number of rows displayed.
• pd.options.display.max_columns – maximum number of columns displayed.
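The display options can be set with pd.set_option() or through the pd.options.display namespace. In the sketch below, the values 100 rows, 50 columns and 120 characters are arbitrary choices for illustration; pick values that suit your screen.

```python
import pandas as pd

# Show up to 100 rows and 50 columns before Pandas truncates the output
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 50)

# Attribute-style access to the same options also works
pd.options.display.width = 120  # characters per line before wrapping

set_values = (
    pd.get_option("display.max_rows"),
    pd.get_option("display.max_columns"),
    pd.get_option("display.width"),
)
print(set_values)

# Restore the defaults when you are done
for option in ("display.max_rows", "display.max_columns", "display.width"):
    pd.reset_option(option)
```

For a temporary change, pd.option_context("display.max_rows", 100) can be used in a with block; the previous value is restored when the block exits.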
Loading CSV data into Pandas

• Creating DataFrames from CSV (comma-separated value) files is made extremely simple with the read_csv() function in Pandas.
• In a CSV file, columns are separated using the ‘,’ comma character, and rows are on separate lines.
• If your data is in Excel (XLS / XLSX) files or a SQL database, look at the other functions for reading from these sources into DataFrames, namely read_excel() and read_sql().
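A minimal read_csv() sketch follows. So that it runs without an external file, io.StringIO stands in for a file on disk; the data and the file name students.csv mentioned in the comment are hypothetical.

```python
import io

import pandas as pd

# CSV text as it would appear inside a file: a header row first,
# then one row per line with comma-separated columns (hypothetical data)
csv_text = """name,age,city
Asha,21,Delhi
Ravi,23,Pune
Meera,22,Noida
"""

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With a real file you would pass its path instead, e.g.
# df = pd.read_csv("students.csv")   # hypothetical file name

# Files that use another delimiter can be read with the sep parameter
df_semicolon = pd.read_csv(io.StringIO("a;b\n1;2\n"), sep=";")
print(df_semicolon)
```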
DataFrame rows and columns with shape

Get the shape of your DataFrame – the number of rows and columns using
.shape, and the number of dimensions using .ndim.
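The sketch below shows .shape and .ndim on a small hypothetical DataFrame, and contrasts it with a single column (a Series).

```python
import pandas as pd

# Hypothetical example data: 4 rows, 3 columns
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Karan"],
    "age": [21, 23, 22, 24],
    "marks": [88.5, 92.0, 79.5, 85.0],
})

rows, cols = df.shape   # shape is a (rows, columns) tuple
print(df.shape)         # (4, 3)
print(df.ndim)          # 2: a DataFrame is two-dimensional
print(df["age"].ndim)   # 1: a single column (Series) is one-dimensional
```

Note that .shape and .ndim are properties, so they are written without parentheses.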
Preview DataFrames with head() and tail()

The DataFrame.head() function in Pandas, by default, shows you the top 5 rows of data in the DataFrame.
The opposite is DataFrame.tail(), which gives you the last 5 rows.
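Both functions also accept a number of rows n, as the short sketch below shows on hypothetical data with 10 rows.

```python
import pandas as pd

# Hypothetical data with 10 rows
df = pd.DataFrame({
    "id": list(range(1, 11)),
    "value": [x * 10 for x in range(1, 11)],
})

first = df.head()     # first 5 rows by default
last = df.tail()      # last 5 rows by default
first3 = df.head(3)   # pass n to change the number of rows

print(first)
print(last)
print(first3)
```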
Data types (dtypes) of columns
Check the data type of each column in your DataFrame using the .dtypes property.
Note that character/string columns appear as ‘object’ datatypes.
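A quick way to see .dtypes in action is a DataFrame mixing strings, integers, floats and booleans (hypothetical data below). With most pandas versions the name column is reported as object; recent pandas releases may instead report a dedicated string dtype, so check the output of your own installation.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi"],   # strings
    "age": [21, 23],            # integers
    "marks": [88.5, 92.0],      # floats
    "passed": [True, True],     # booleans
})

# One entry per column: its data type
print(df.dtypes)
```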
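Describing data with .describe()

• For numeric columns, describe() returns basic statistics: the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum.
• For string columns, describe() returns the count of non-null values, the number of unique values, the most frequent value (top) and its frequency (freq).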
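The sketch below applies describe() to a numeric column and to a string column of a small hypothetical DataFrame. By default, DataFrame.describe() summarises only the numeric columns, so the string column is described separately here.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Pune", "Delhi", "Noida"],
    "age": [21, 23, 22, 24],
})

# Numeric columns: count, mean, std, min, 25%, 50%, 75% and max
num_summary = df.describe()
print(num_summary)

# String column: count, unique, top (most frequent value) and freq
str_summary = df["city"].describe()
print(str_summary)
```

Passing include="all" to DataFrame.describe() combines both kinds of summary in one table, with NaN where a statistic does not apply.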
