
EXPLORATORY DATA ANALYSIS

M.Sc. (DATA ANALYST) Dissertation


Submitted to Christ University

Master of Science
By

A.LAHARI
Registration Number: 2139465

Submitted to
RAJESH.R

Department of Computer Science

CHRIST UNIVERSITY
Bangalore, Karnataka-560029

(DEEMED TO BE UNIVERSITY)


August, 2021
ASSIGNMENT-1
EXPLORATORY DATA ANALYSIS
Introduction:
The dataset chosen consists of 1000 rows and 8 columns. It records various
particulars of the students of one selected school: the gender of the student,
the race of the student, their parents' level of education and the type of lunch
they have at school. Along with these there are four more attributes. One records
whether the student has completed the test preparation course, and the other
three hold the scores the student secured in three different exams, namely Math,
Reading and Writing.
The table below shows the levels of each attribute in the selected data set.

Attribute                      Variables
Gender                         Male, Female
Race/Ethnicity                 Group A, Group B, Group C, Group D, Group E
Parental level of Education    Associate's degree, Bachelor's degree, High school,
                               Master's degree, Some college, Some high school
Lunch                          standard, free/reduced
Test preparation course        None, Completed
Math score                     0-100
Reading score                  0-100
Writing score                  0-100
The above-mentioned dataset is taken from GitHub, a well-known website for
hosting code and its communities. I chose this dataset because it meets the
requirements of my assignment in this subject, such as the size and quality of
the data. It is also one of the popular datasets on the website, and I believe
it will be smooth and flexible to work with when applying the concepts learnt in
class.
The dataset taken for this assignment can be found by clicking here; its file
name is “StudentPerformance.csv”.

Basic operations
1. What libraries are used to analyse the data set?
Ans. The libraries required for EDA are imported at the start of the analysis.
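The import step might look like the sketch below. The exact set of libraries is an assumption, since the original listing is not reproduced here; pandas and matplotlib are a common minimum for this kind of analysis.

```python
# Assumed library set for this assignment (not confirmed by the original listing).
import pandas as pd               # tabular data: reading, summarising, cleaning
import matplotlib.pyplot as plt   # plotting: boxplots, histograms, bar and scatter plots

# Printing the version is a quick check that the import worked.
print(pd.__version__)
```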

2. How do you read the data in the dataset?

Ans. To read the data in Python we use the pandas function read_csv()
(read.csv is the R spelling).
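A minimal sketch of reading the file, under stated assumptions: the three rows below are invented stand-ins so the example runs without the real file, and the column names are taken from the attribute table in the introduction. With the actual file the call would be pd.read_csv("StudentPerformance.csv").

```python
import io
import pandas as pd

# Invented stand-in for the real CSV file (three rows only).
sample_csv = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's degree,standard,none,72,72,74\n"
    "male,group C,some college,free/reduced,completed,69,90,88\n"
    "female,group A,high school,standard,none,47,57,44\n"
)

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(sample_csv)
print(df)
```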

3. Find the first 6 rows of the dataset in Python.


Ans. The first 6 rows can be found with the command head(6); head() without an
argument returns only 5 rows.
4. Find the last 6 rows of the dataset in Python.
Ans. The last 6 rows can be found with the command tail(6).
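As a small illustration of both commands, assuming a data frame named df (the ten scores below are invented stand-in values):

```python
import pandas as pd

# Invented stand-in data: ten rows with a single score column.
df = pd.DataFrame({"math score": [72, 69, 90, 47, 76, 71, 88, 40, 64, 38]})

first_six = df.head(6)   # first 6 rows; the default for head() is 5
last_six = df.tail(6)    # last 6 rows; the default for tail() is also 5

print(first_six)
print(last_six)
```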

5. What are the different data types present in the dataset?


Ans. To find the data type of each column we use the dtypes attribute.
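A short sketch of this check, using an invented three-row frame with two text columns and one numeric column:

```python
import pandas as pd

# Invented stand-in rows; column names follow the attribute table.
df = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "lunch": ["standard", "free/reduced", "standard"],
    "math score": [72, 69, 47],
})

# dtypes is an attribute (no parentheses): one dtype per column.
print(df.dtypes)
```

Text columns such as gender are reported with a non-numeric dtype, while the score columns are integers.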
6. How many rows and columns are present in the dataset, and what is the length
of the dataset?
Ans. We can use the .shape attribute to find the number of rows and columns of a
dataset.

Ans. The length of the dataset (its number of rows) can be determined by using len().
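Both checks can be sketched on a small invented frame as follows; on the real data set the same calls would be expected to report 1000 rows and 8 columns, as stated in the introduction.

```python
import pandas as pd

# Invented stand-in data: four rows, two columns.
df = pd.DataFrame({
    "math score": [72, 69, 47, 76],
    "reading score": [72, 90, 57, 78],
})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))              # len() counts rows only
```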

7. Find the mean, min, max and percentile values in the dataset.


Ans. To find the mean, min, max and percentile values we can directly
use the describe() command.
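A minimal sketch of describe() on four invented scores, chosen so the mean is easy to verify by hand:

```python
import pandas as pd

# Invented stand-in scores: (40 + 60 + 80 + 100) / 4 = 70.
df = pd.DataFrame({"math score": [40, 60, 80, 100]})

# describe() summarises numeric columns: count, mean, std, min,
# the 25%, 50% and 75% percentiles, and max.
summary = df.describe()
print(summary)
```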
8. Draw a boxplot of the dataset.
Ans. To draw a boxplot of the dataset we use boxplot().
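One possible way to draw the boxplot is sketched below. The scores are invented stand-ins, and the Agg backend line is only there so the sketch also runs without a display; in a notebook it can be omitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in scores; the real values come from the CSV file.
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# One box per score column on a shared axis.
ax = df.boxplot(column=["math score", "reading score", "writing score"])
ax.set_ylabel("score")
```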

9. Draw a histogram of the dataset.


Ans. To draw a histogram of the dataset we use hist().
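A sketch of the histogram step under the same assumptions (invented scores, Agg backend only for running without a display). The bin count of 5 is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in scores.
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76, 71],
    "reading score": [72, 90, 95, 57, 78, 83],
    "writing score": [74, 88, 93, 44, 75, 78],
})

# DataFrame.hist() draws one histogram per numeric column and
# returns an array of matplotlib Axes.
axes = df.hist(bins=5, figsize=(8, 6))
titles = {ax.get_title() for ax in axes.ravel()}
```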
10. How do you find the number of non-null values in the dataset?
Ans. To find the non-null count of each column we can use .info().
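The sketch below shows .info() on an invented frame with two deliberately missing values. The count() call is an addition for illustration: it returns the same non-null counts as a Series, which is convenient when the numbers are needed later.

```python
import pandas as pd

# Invented stand-in data with one missing value in each column.
df = pd.DataFrame({
    "math score": [72, None, 47],
    "lunch": ["standard", "standard", None],
})

df.info()               # prints column names, non-null counts and dtypes
non_null = df.count()   # non-null count per column, as a Series
print(non_null)
```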

11. Draw a bar graph of the students' math scores.


Ans. A bar plot can be drawn, for example, with the pandas plot.bar() method.
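One reading of this question is a bar per distinct math score showing how many students obtained it. The sketch below follows that reading; it is an assumption and not necessarily the plot in the original submission, and the scores are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in scores.
df = pd.DataFrame({"math score": [72, 69, 72, 47, 69, 72]})

# Count students per score, ordered by score, then draw one bar per score.
counts = df["math score"].value_counts().sort_index()
ax = counts.plot.bar()
ax.set_xlabel("math score")
ax.set_ylabel("number of students")
```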
Data cleaning:
1. How do you find missing values in the data frame?
Ans. Check each value in the dataframe for null and count the missing values;
a value that is null is displayed as True.
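A minimal sketch of this check, assuming the null test is done with isnull(); the missing values below are invented for illustration.

```python
import pandas as pd

# Invented stand-in data with two missing math scores and one missing lunch value.
df = pd.DataFrame({
    "math score": [72, None, None],
    "lunch": ["standard", None, "free/reduced"],
})

missing_mask = df.isnull()              # True wherever a value is missing
missing_per_column = missing_mask.sum() # True counts as 1, so this counts missing values

print(missing_mask)
print(missing_per_column)
```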
2. Drop the columns lunch and test preparation from the dataset.
Ans. To drop the columns lunch and test preparation we can use table_name.drop().
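The drop step can be sketched as follows. The column names are assumed from the attribute table, and the rows are invented.

```python
import pandas as pd

# Invented stand-in rows.
df = pd.DataFrame({
    "gender": ["female", "male"],
    "lunch": ["standard", "free/reduced"],
    "test preparation course": ["none", "completed"],
    "math score": [72, 69],
})

# drop() returns a new frame; the original df is left unchanged.
trimmed = df.drop(columns=["lunch", "test preparation course"])
print(trimmed.columns.tolist())
```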

3. How do you rename a column in the dataset?


Ans. To rename a column in the data set we use table_name.rename().
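A short sketch of renaming. The new name "race" is a hypothetical choice for illustration; the original does not say which column was renamed or to what.

```python
import pandas as pd

# Invented stand-in rows.
df = pd.DataFrame({
    "race/ethnicity": ["group A", "group B"],
    "math score": [47, 72],
})

# rename() takes a mapping of old name -> new name and returns a new frame.
renamed = df.rename(columns={"race/ethnicity": "race"})  # "race" is hypothetical
print(renamed.columns.tolist())
```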
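4. Draw a scatter plot using math score and writing score.
Ans. A scatter plot can be drawn, for example, with the pandas plot.scatter()
method, with math score on one axis and writing score on the other.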
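A possible sketch of this scatter plot, assuming math score on the x-axis and writing score on the y-axis (the original does not state the orientation). The scores are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in scores.
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "writing score": [74, 88, 93, 44, 75],
})

# One point per student.
ax = df.plot.scatter(x="math score", y="writing score")
ax.set_title("Math score vs writing score")
```

The same call with x="reading score" would give the reading/writing plot asked for later in this section.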

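5. Draw a bar plot for parental education.


Ans. A bar plot of the parental education levels can be drawn, for example, by
counting the students at each level and plotting the counts as bars.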
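A sketch of a count-based bar plot for a categorical column, assuming the column is named "parental level of education" as in the attribute table. The six rows are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in rows.
df = pd.DataFrame({
    "parental level of education": [
        "some college", "some college", "some college",
        "high school", "high school", "master's degree",
    ],
})

# value_counts() orders the levels from most to least frequent.
level_counts = df["parental level of education"].value_counts()
ax = level_counts.plot.bar()
ax.set_ylabel("number of students")
```

The same pattern applies to any other categorical column, such as race/ethnicity.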
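6. Draw a scatter plot using reading score and writing score.
Ans. A scatter plot can be drawn, for example, with the pandas plot.scatter()
method, with reading score on one axis and writing score on the other.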

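7. Draw a bar plot for race/ethnicity.


Ans. A bar plot of race/ethnicity can be drawn, for example, by counting the
students in each group and plotting the counts as bars.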

Conclusion:
EDA is primarily used to see what data can reveal beyond the formal modeling or
hypothesis testing task, and it provides a better understanding of data set
variables and the relationships between them. It can also help determine whether
the statistical techniques you are considering for data analysis are appropriate.
Exploratory Data Analysis is valuable to data science projects since it brings us
closer to the certainty that future results will be valid and correctly
interpreted. By performing the above operations on my data set I can clearly
analyse and visualize it, and can even detect where data is incorrect or missing.
EDA helps us perform our tasks efficiently, and it is also easy to use.
