
Project Report

On
LITERACY RATE ANALYSIS
Submitted in partial fulfillment of the requirements for the degree of
Bachelor of Computer Applications (BCA)
Guru Gobind Singh Indraprastha University, Delhi

Session 2019-20

Under the Guidance of:
Dr. Ruchi Agarwal
Head of Department (HOD)

Submitted by:
Akash Aggarwal
BCA VI Sem
Enroll. No.: 35225502017

JIMS ENGINEERING MANAGEMENT TECHNICAL CAMPUS


48/4 Knowledge Park III, Greater Noida-201306 (U.P.)
DECLARATION

I hereby declare that this Major Project Report titled "Literacy Rate Analysis", submitted by
me to JEMTEC, Greater Noida, is a bona fide work undertaken by me during the period from
01-January-2020 to 25-April-2020, and that it has not been submitted to any other University or
Institution for the award of any degree, diploma or certificate, nor published at any time before.

_____________________

(Signature of the Student) Date: -

Name: - AKASH AGGARWAL

Enroll. No.: - 35225502017

BONAFIDE CERTIFICATE

This is to certify that, to the best of my belief, the project entitled "Literacy Rate Analysis" is
the bona fide work carried out by AKASH AGGARWAL, student of BCA, JEMTEC,
Greater Noida, in partial fulfilment of the requirement for the major project report of the Degree
of Bachelor of Computer Applications.

He has worked under my guidance.

I wish him success in all his future career endeavours.

Name: Dr. Ruchi Agarwal

Designation: H.O.D., BCA Department

___________________

Signature with Date

ACKNOWLEDGEMENT

I offer my sincere thanks and humble regards to JEMTEC, Greater Noida, for imparting
valuable professional training to us in BCA.

I express my gratitude and sincere regards to Dr. Ruchi Agarwal, my project guide, for sharing
the best of her knowledge with me. I am thankful to her for being a constant source of advice,
motivation and inspiration, and for her suggestions and encouragement throughout the
project work.

I take this opportunity to express my gratitude to our computer lab staff and library
staff for giving me the opportunity to use their resources to complete the project.

I am also thankful to my family and friends for constantly motivating me to complete the project
and for providing an environment that enhanced my knowledge.

Date: -

Name: - AKASH AGGARWAL

Enroll. No.: - 35225502017

Course: - BCA (VI-Sem.)

_____________________

(Signature of the Student)

CONTENTS

S.NO. TOPIC

1. Declaration / Bonafide Certificate
2. Acknowledgements
3. Abstract
4. Chapter 1: Introduction (Project Description, Objective of the Study)
5. Chapter 2: Software Requirements Specification
6. Chapter 3: Hardware and Software Requirements
7. Chapter 4: Source Code and Output Snapshots
8. Chapter 5: Conclusion
9. References/Bibliography

ABSTRACT

Education is the most important tool for changing society and improving the nation.
Literacy and the level of education are basic indicators of the level of development
achieved by a society. The spread of literacy is generally associated with important
traits of modern civilization such as modernization, urbanization, trade and
industrialization. Literacy is a vital contribution to the overall development of
society, enabling people to understand their social and political environment better
and to respond to it appropriately. Better education and literacy lead to greater
awareness and also contribute to the improvement of economic and social
conditions. The Ministry of Human Resource Development releases data on
literacy rates every year through DISE (District Information System for Education);
this data can be very useful in examining the different factors that influence the
literacy rate of a state or an area. A well-structured dashboard that presents a proper
analysis of the data gives a clear picture of literacy in the different regions of India.
The data to be analyzed is processed and cleaned to extract the most important and
relevant features. The analyzed data then gives the final result, which is presented
on a dashboard so that it is easy to understand.

CHAPTER 1

INTRODUCTION

INTRODUCTION TO LITERACY RATE ANALYSIS

LITERACY is defined as the ability to read, write and think rationally. It
represents the lifelong intellectual process of gaining meaning from print. The key
to all literacy is reading development, which involves a progression of skills that
begins with the ability to understand spoken words and decode written words, and
culminates in a deep understanding of text.

Literacy has always been a global concern, and every country aims to achieve full
literacy. Although literacy rates have risen considerably, there is still a need to
identify the areas that are lagging behind.

Literacy is defined as the ability to read and write a simple message in any
language. A broader interpretation treats literacy as understanding of, and
competence in, a specific area. The key to literacy is a basic understanding of
written text, the ability to understand another person speaking, and the ability to
write. Reading and writing are foundation skills: not only are they needed for
further study, they are also crucial in helping us understand and interact with the
world around us. Literacy in India shows a great deal of regional variation. The
regional differences in literacy levels within the country result from regional
diversity in cultural, economic and social factors, as well as marked differences in
the historical experience of the various regions. Illiteracy is a prime concern in
India and has many contributing factors. It is closely linked to the disparities that
exist within the country, such as gender disparity, income variance, state-level
variation, caste disproportion and technological hurdles, all of which shape the
literacy rates observed in the country. A study and analysis of India's literacy data
is therefore needed to provide a timely and sound basis for the planning and
management of education services, and to help establish an education system for
the collection, organization and use of education data.

The following are the steps leading up to the analysis of the data.

COLLECTION OF DATASET

Data preprocessing is a data mining technique that transforms raw data into an
understandable format. Real-world data is often incomplete, inconsistent and
lacking in certain behaviours or trends, and is likely to contain many errors. Data
preprocessing is a proven method of resolving such issues: it prepares raw data for
further processing. During preprocessing, the data goes through a series of steps:

• Data Cleaning: data is cleaned through processes such as filling in missing
values, smoothing noisy data and resolving inconsistencies within the data.
• Data Integration: data with different representations is put together, and
conflicts within the data are resolved.
• Data Reduction: this step aims to present a reduced representation of the data
in a data warehouse.
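The three preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration on small made-up tables, not the project's actual cleaning code: the state names, the values and the second table `area` are hypothetical, and filling missing values with the column mean is just one of several possible cleaning choices.

```python
import numpy as np
import pandas as pd

# Toy state-wise records (hypothetical values, not the DISE data).
literacy = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C", "State C"],
    "MALE_LIT": [85.0, np.nan, 78.0, 78.0],
    "FEMALE_LIT": [70.0, 65.0, 60.0, 60.0],
})
# A second toy table that writes the same state names differently.
area = pd.DataFrame({
    "STATE": ["STATE A", "STATE B", "STATE C"],
    "AREA_SQKM": [1000, 2500, 4000],
})

# Data cleaning: drop exact duplicate rows, then fill missing values with the column mean.
clean = literacy.drop_duplicates().copy()
clean["MALE_LIT"] = clean["MALE_LIT"].fillna(clean["MALE_LIT"].mean())

# Data integration: bring both tables to one naming convention, then merge them.
area["STATNAME"] = area["STATE"].str.title()
merged = clean.merge(area[["STATNAME", "AREA_SQKM"]], on="STATNAME", how="left")

# Data reduction: keep only the columns needed for the analysis.
reduced = merged[["STATNAME", "MALE_LIT", "FEMALE_LIT"]]
print(reduced)
```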

OBJECTIVE

Education is the most important tool for changing society and improving the
nation. This study sets out to answer the following questions:

• Which state has the highest male and female literacy rates at the elementary level?

• Which state has the highest male and female literacy rates at the secondary level?

• Which state has the lowest male and female literacy rates at the elementary level?

• Which state has the lowest male and female literacy rates at the secondary level?

• Which states have the highest and lowest overall literacy rates at the elementary and secondary levels?

CHAPTER-2:

SOFTWARE REQUIREMENT SPECIFICATIONS

SOFTWARE REQUIREMENT SPECIFICATIONS

1.2. Language Used:

1.2.1. Python:

Python is a widely used general-purpose, high-level programming language. It was
created by Guido van Rossum, first released in 1991, and is developed by the
Python Software Foundation. It was designed with an emphasis on code readability,
and its syntax allows programmers to express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate
systems more effectively.

It is an interpreted, high-level, general-purpose programming language. Its
language constructs and object-oriented approach aim to help programmers write
clear, logical code for small and large-scale projects. Python is dynamically typed
and garbage-collected.

1.2.2. Why use Python?

• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi).
• Python has a simple syntax similar to the English language.
• Python has a syntax that allows developers to write programs with fewer lines
than some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as
soon as it is written, so prototyping can be very quick.
1.3. Tools Used:

1.3.1. Anaconda:

Anaconda is a free and open-source distribution of the Python and R
programming languages for scientific computing (data science, machine learning
applications, large-scale data processing, predictive analytics, etc.) that aims to
simplify package management and deployment. Package versions are managed by
the package management system conda. The Anaconda distribution is used by
over 15 million users and comes with more than 1,500 popular data-science
packages for Windows, Linux and macOS, as well as the conda package and
virtual environment manager. It also includes a GUI, Anaconda Navigator, as a
graphical alternative to the command-line interface (CLI).

1.3.2. Jupyter Notebook:

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive
computational environment for creating Jupyter notebook documents. The term
"notebook" can colloquially refer to several different entities, mainly the Jupyter
web application, the Jupyter Python web server, or the Jupyter document format,
depending on context. A Jupyter Notebook document follows a versioned schema
and contains an ordered list of input/output cells, which can contain code, text
(using Markdown), mathematics, plots and rich media; such documents usually end
with the ".ipynb" extension.

Jupyter Notebook can connect to many kernels to allow programming in many
languages. By default, Jupyter Notebook ships with the IPython kernel. As of the
2.3 release (October 2014), there were 49 Jupyter-compatible kernels for many
programming languages, including Python, R and Julia.

1.4. Packages Used:

1.4.1. Numpy:

NumPy is the fundamental package for scientific computing in Python. It is a
general-purpose array-processing package that provides a high-performance
multidimensional array object and tools for working with these arrays. Besides its
obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. NumPy's array class is called ndarray; it is also known by
the alias array. Note that numpy.array is not the same as the Standard Python Library
class array.array, which only handles one-dimensional arrays and offers less
functionality.
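The difference between a NumPy ndarray and the standard-library array.array can be seen in a few lines. The numbers below are arbitrary illustrative values, not figures from the dataset.

```python
import array

import numpy as np

# A 2-D NumPy ndarray: 2 rows x 3 columns of arbitrary percentages.
rates = np.array([[80.5, 72.1, 90.2],
                  [65.0, 58.4, 77.3]])
print(rates.shape)         # (2, 3)
print(rates.ndim)          # 2
# Vectorised arithmetic: one mean per row, computed without a Python loop.
print(rates.mean(axis=1))

# The standard-library array.array holds a single 1-D sequence of one numeric type.
std_arr = array.array("d", [80.5, 72.1, 90.2])
print(len(std_arr))        # 3
```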

1.4.2. Pandas:

Pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labelled" data both easy
and intuitive. It aims to be the fundamental high-level building block for doing
practical, real-world data analysis in Python. Additionally, it has the broader goal
of becoming the most powerful and flexible open-source data analysis /
manipulation tool available in any language. It is already well on its way toward
this goal.

1.4.3. Matplotlib:

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can
be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web
application servers and graphical user interface toolkits. It is low-level and provides a lot
of freedom. Matplotlib tries to make easy things easy and hard things possible: plots,
histograms, power spectra, bar charts, error charts, scatter plots, etc. can be generated
with just a few lines of code.

1.4.4. Seaborn:

Seaborn is a Python data visualization library based on Matplotlib. It provides a
high-level interface for drawing attractive and informative statistical graphics and
has good default styles. Seaborn aims to make visualization a central part of
exploring and understanding data. Its dataset-oriented plotting functions operate on
data frames and arrays containing whole datasets and internally perform the
necessary semantic mapping and statistical aggregation to produce informative plots.

FUTURE SCOPE

This project analyses the literacy rate in different states of India based on 680
factors. The dataset covers the year 2015-16 and was published by the Ministry of
Human Resource Development (HRD), Government of India. We focus on finding
the top five factors and the bottom five factors that influence the literacy rate of a
given state. In future, the government may use such an analysis of literacy rates to
compare past educational growth, make changes to new policies, and improve the
quality of education and facilities. This analysis may also help people living in
rural areas, who lead very different lives from people living in urban areas. There is
less motivation to go to school in rural areas, as many people tend to take up their
parents' profession or business, and this analysis can help highlight that gap.

CHAPTER 3:

HARDWARE AND SOFTWARE REQUIREMENTS


HARDWARE AND SOFTWARE REQUIREMENTS

1.5.1. Data Analysis Using Python:

Python is an increasingly popular tool for data analysis. In recent years, a number of
libraries have reached maturity, allowing R and Stata users to take advantage of the beauty,
flexibility and performance of Python without sacrificing the functionality these older
programs have accumulated over the years. Data analysis is the process of evaluating data
using analytical and statistical tools to discover useful information and aid in business
decision making. There are several data analysis methods, including data mining, text
analytics, business intelligence and data visualization.

1.5.1.1. Steps for Data Analysis:

Importing Data with Pandas

The first step is to read the data. The data is stored as a comma-separated values
(CSV) file, in which each row is on a new line and the columns are separated by
commas (,). To work with the data in Python, the CSV file has to be read into a
pandas DataFrame, which is a structure for representing and working with tabular
data.
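As a minimal sketch of this step, pd.read_csv can read CSV text and return a DataFrame. To keep the example self-contained, an in-memory string stands in for the CSV file; the column names and values are hypothetical, and in the project the argument would be the path of the downloaded file.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file on disk (hypothetical columns and values).
csv_text = """STATNAME,MALE_LIT,FEMALE_LIT
State A,90.0,86.0
State B,80.0,52.0
"""

# read_csv accepts a file path or any file-like object.
elementary = pd.read_csv(io.StringIO(csv_text))
print(elementary.shape)    # (2, 3)
print(elementary.head())
```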
Handling the Missing Data
The data analysis phase also includes handling missing data in the dataset, and
pandas lives up to that expectation as well: this is where the dropna and fillna
methods come into play. When dealing with missing data, a data analyst either
drops the rows or columns that contain NaN values (dropna method) or fills in the
missing entries with the mean or mode of the column (fillna method). This
decision is of great significance and depends on the data and on the effect it
would have on the results.
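The options described above look like this in pandas. This is a small sketch on a toy frame with made-up values; which option suits the real dataset depends, as noted, on the data itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MALE_LIT": [85.0, np.nan, 78.0],
    "FEMALE_LIT": [70.0, 65.0, np.nan],
    "STATCD": [1, 2, 3],
})

rows_dropped = df.dropna()                   # drop every row that contains a NaN
cols_dropped = df.dropna(axis=1)             # drop every column that contains a NaN
filled_mean = df.fillna(df.mean())           # fill each column with its own mean
filled_mode = df.fillna(df.mode().iloc[0])   # fill each column with its first mode

print(filled_mean)
```

Dropping is simplest but loses whole rows or columns; filling keeps every record but can blur real differences between states, which is why the choice matters for the results.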

CHAPTER 4:
SOURCE CODE AND OUTPUT SNAPSHOTS

This table is obtained with the statement print(elementary.head()).

The table above is the state-wise elementary dataset, which we imported for analysis
using pd.read_csv, the pandas function for reading a CSV file into a DataFrame.

This is the second dataset, the state-wise meta_elementary data, imported for the
analysis.
This is the state-wise secondary dataset that we have read.

In the code above, we check the shape of the dataset and the number of null
entries in the elementary dataset.
Here we print the first two rows of the elementary dataset imported above.
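A minimal sketch of these checks, run on a small hypothetical stand-in for the elementary dataset (the real one has many more columns):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the elementary dataset (hypothetical values).
elementary = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C"],
    "MALE_LIT": [85.0, np.nan, 78.0],
    "FEMALE_LIT": [70.0, 65.0, np.nan],
})

print(elementary.shape)                    # (rows, columns)
null_counts = elementary.isnull().sum()    # number of NaN entries per column
print(null_counts)
print(elementary.head(2))                  # first two rows
```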

Similarly, we print the meta_elementary data.

Here we examine the overall literacy rate using the built-in describe() method. From
this summary, the maximum literacy rate is 93.91% and the minimum is 63.82%.

This analysis will become clearer from the bar plots.
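The describe() summary used here can be reproduced on a toy column as follows. The values are made up, so the printed figures differ from the 93.91% and 63.82% reported above; only the column name OVERALL_LI is taken from the report.

```python
import pandas as pd

elementary = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C", "State D"],
    "OVERALL_LI": [93.0, 64.0, 75.5, 82.5],   # hypothetical values
})

# describe() returns count, mean, std, min, quartiles and max as a Series.
summary = elementary["OVERALL_LI"].describe()
print(summary)
print("Maximum:", summary["max"], "Minimum:", summary["min"])
```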
These are the area details, showing how much area is allocated to schools.

Here we analyze the growth rate of the states.

Lowest is: -0.47 Highest is:-0.47

Checking which state has the lowest growth rate, we find that Nagaland is the
state with the lowest growth rate.

Checking for the maximum growth rate, we find that Dadra and Nagar Haveli has
the highest growth rate.
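One way to find the states with the lowest and highest value of a column is Series.idxmin() and Series.idxmax(), which return the index label of the minimum and maximum. This sketch assumes a column named GROWTHRATE; that name and the values are hypothetical, since the report does not show this code.

```python
import pandas as pd

states = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C"],
    "GROWTHRATE": [0.12, -0.30, 0.45],   # hypothetical column name and values
}).set_index("STATNAME")

lowest = states["GROWTHRATE"].idxmin()    # index label of the smallest value
highest = states["GROWTHRATE"].idxmax()   # index label of the largest value
print("Lowest growth rate:", lowest, states.loc[lowest, "GROWTHRATE"])
print("Highest growth rate:", highest, states.loc[highest, "GROWTHRATE"])
```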

To support further analysis we create a new attribute, DIFF_LIT, from the existing
attributes MALE_LIT and FEMALE_LIT. Deriving a new attribute from existing ones
is attribute construction, a form of data transformation. DIFF_LIT is obtained by
subtracting FEMALE_LIT from MALE_LIT, so it measures how far the female
literacy rate lags behind the male literacy rate.
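A minimal sketch of constructing DIFF_LIT, assuming it is MALE_LIT minus FEMALE_LIT so that a larger positive value means a wider gap in favour of males. The rows are hypothetical.

```python
import pandas as pd

elementary = pd.DataFrame({
    "STATNAME": ["State A", "State B"],
    "MALE_LIT": [90.0, 80.0],     # hypothetical values
    "FEMALE_LIT": [86.0, 52.0],
})

# Attribute construction: a new column derived element-wise from two existing ones.
elementary["DIFF_LIT"] = elementary["MALE_LIT"] - elementary["FEMALE_LIT"]
print(elementary)
```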

Next, we look at OVERALL_LI, summarised earlier with describe(), in a plot.
In the code above, we call plot_barh, the function we defined at the start, to draw
the bar chart.
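The report does not reproduce plot_barh itself, so the following is a hypothetical version of such a helper: it takes a DataFrame indexed by state name and a column name, and draws one horizontal bar per state, sorted by value. The signature, the sorting and the Agg backend (used so the sketch also runs without a display) are assumptions, not the project's code.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_barh(df, column, title=None):
    """Hypothetical helper: one horizontal bar per row of df, sorted by `column`."""
    data = df[column].sort_values()
    fig, ax = plt.subplots(figsize=(8, max(2, 0.3 * len(data))))
    ax.barh([str(name) for name in data.index], data.values)
    ax.set_xlabel(column)
    ax.set_title(title or column)
    fig.tight_layout()
    return ax


states = pd.DataFrame(
    {"OVERALL_LI": [93.0, 64.0, 75.5]},   # hypothetical values
    index=["State A", "State B", "State C"],
)
ax = plot_barh(states, "OVERALL_LI", "Overall literacy rate by state")
```

Sorting before plotting puts the lowest and highest states at the two ends of the chart, which makes the extremes easy to read off.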

Here we compare the literacy rates of the different states.

From the bar graph above, it is easy to see that Kerala has the highest literacy rate
and Bihar the lowest.

This agrees with the summary produced earlier by describe(), which gave a
maximum of 93.91% and a minimum of 63.82%: in the bar graph, Kerala's bar lies
above 80%, consistent with the maximum of 93.91%, and Bihar's bar lies just above
60%, consistent with the minimum of 63.82%.

Comparison result of state-wise literacy rate:

HIGHEST: Kerala.

LOWEST: Bihar.
Here we compare female literacy rates state-wise. From this observation it can be
seen that:

Female Literacy Rate is Highest in KERALA.

Female Literacy Rate is Lowest in RAJASTHAN.


Here we compare male literacy rates state-wise.

Highest: Lakshadweep

Lowest: Bihar.
In this bar chart we compare male and female literacy rates state-wise.

From this observation we get:

Female Highest: Kerala.

Female Lowest: Rajasthan.

Male Highest: Lakshadweep.

Male Lowest: Bihar.

In the code above, we print the last few rows using the tail() method.

Next, we print the average of DIFF_LIT and compare it with the national average.
The north-eastern states have a smaller average difference between male and
female literacy rates than the country as a whole.
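A sketch of this comparison: average DIFF_LIT over a chosen group of states and over all states. The state labels, values and the membership of the hypothetical `north_east` list are placeholders, and the "national" figure here is a simple unweighted mean of states, not a population-weighted national average.

```python
import pandas as pd

elementary = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C", "State D"],
    "DIFF_LIT": [6.0, 8.0, 20.0, 18.0],   # hypothetical values
})
# Hypothetical grouping: which rows count as north-eastern states.
north_east = ["State A", "State B"]

ne_avg = elementary.loc[elementary["STATNAME"].isin(north_east), "DIFF_LIT"].mean()
national_avg = elementary["DIFF_LIT"].mean()   # unweighted mean over all states
print(f"North-east average gap: {ne_avg:.2f}")
print(f"National average gap:   {national_avg:.2f}")
```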

We then print the female literacy rate of Meghalaya together with the average
female literacy rate.

Next, we create a new data frame named top_bottom that contains only the top 3
(top_3) and bottom 3 (bottom_3) states by overall literacy rate.

We drop Telangana here because it was formed only recently (in 2014), so it may
not yet have had time to build schools and other facilities.

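In the next line of code, we concatenate top_3_elem and bottom_3_elem into
top_bottom using pd.concat with axis=0 and sort=False. There is no need to sort
again, because the rows were already sorted when top_3_elem and bottom_3_elem
were created; pd.concat keeps the rows in the order given, and sort=False only
affects the column order, and only when the two frames' columns are not aligned.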
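The selection and concatenation described above can be sketched as follows. nlargest and nsmallest are used here as one way to build top_3_elem and bottom_3_elem; the report does not show how it built them. The values, and all state labels other than Telangana, are hypothetical.

```python
import pandas as pd

elementary = pd.DataFrame({
    "STATNAME": ["State A", "State B", "State C", "State D",
                 "State E", "State F", "State G", "Telangana"],
    "OVERALL_LI": [93.0, 88.0, 85.0, 70.0, 66.0, 64.0, 80.0, 60.0],  # hypothetical
}).set_index("STATNAME")

# Leave Telangana out before ranking, as described above.
ranked = elementary.drop(index="Telangana")

top_3_elem = ranked.nlargest(3, "OVERALL_LI")      # highest first
bottom_3_elem = ranked.nsmallest(3, "OVERALL_LI")  # lowest first

# Stack the two frames vertically; rows keep the order in which they are passed.
top_bottom = pd.concat([top_3_elem, bottom_3_elem], axis=0, sort=False)
print(top_bottom)
```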

Here we see the data of the six states obtained from the concat operation above:

Kerala, Lakshadweep, Arunachal Pradesh, Bihar, Mizoram and Rajasthan.

These are the states considered as the top 3 and bottom 3.

In this code, the top_bottom data frame is multiplied by the total population and
then divided by the area in sq. km multiplied by 1000.

Here we look at the difference between male and female literacy rates in the
top_bottom data frame, i.e. across these six states, which gives the output above.

Rajasthan has the highest male-female difference in literacy rate and Kerala the
lowest. This is consistent with the earlier state-wise male and female analysis,
which showed very little gap between male and female literacy in Kerala: the
difference there is only 4.04%. In Rajasthan the difference is larger. The earlier
analysis shows that the female literacy rate there is only a little above 40% while
the male rate is close to 80%, and Rajasthan's female literacy rate is the lowest of
all the states. So 27.85% is the highest difference between male and female
literacy rates.

What we analysed above through textual output becomes easier to understand in
the visualization.

In the code above, we plot the same analysis discussed earlier: Rajasthan has the
highest difference and Kerala the lowest.

Here we simply display the total population.

In this code we plot P_URB_POP and P_RUR_POP.

P_URB_POP is the percentage of urban population.

P_RUR_POP is the percentage of rural population.

In the plotting code we specify the kind of bars or scatter markers used in the
graph; all of this is done with the top_bottom data frame.
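A minimal sketch of plotting the urban and rural shares side by side with the pandas plotting interface. A stacked bar is used here because the two percentages add up to 100 for each state; the report's own chart style may differ, and the values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd

top_bottom = pd.DataFrame(
    {"P_URB_POP": [40.0, 15.0], "P_RUR_POP": [60.0, 85.0]},   # hypothetical values
    index=["State A", "State B"],
)

# One bar per state, split into its urban and rural percentages.
ax = top_bottom[["P_URB_POP", "P_RUR_POP"]].plot(kind="bar", stacked=True, figsize=(8, 4))
ax.set_ylabel("Share of population (%)")
ax.figure.tight_layout()
```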
Next, we check the sex ratio in the top_bottom data frame. The sex ratio is the
number of females per 1,000 males in a state.

Kerala has the highest sex ratio and Bihar the lowest.

The visualization shows that Kerala has the highest sex ratio and Bihar the lowest.

Now we include more states for a better comparison:


These are the states included.

These are their sex ratios.

Here we plot SC_ST_POP, _POP and P_ST_POP.

SC and ST refer to the Scheduled Caste and Scheduled Tribe populations.

These columns are used to compare all the top_bottom states.

From the meta_secondary data we get SCHTOT, in the same way as above.

Above we display SCHTOT, which gives the number of schools by category.

These are some of the totals in the datasets above.

Here we look at how many schools there are, by category, in the top_bottom data
frame.

These are the schools by category in the six states: Rajasthan and Bihar have the
most schools, followed by Kerala, and so on.

Next, we print the number of school children: top_bottom['SCHKIDS'] is created by
adding two different attributes.

We then create top_bottom['KIDSPERSCH'] by dividing top_bottom.SCHKIDS by
top_bottom.SCHTOT.

Finally, we plot KIDSPERSCH.
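The two derived columns can be sketched as element-wise column arithmetic. The report does not name the two attributes that are added to form SCHKIDS, so ENROL_BOYS and ENROL_GIRLS below are hypothetical placeholders, as are all the values.

```python
import pandas as pd

top_bottom = pd.DataFrame(
    {
        "ENROL_BOYS": [1200, 3000],    # hypothetical attribute name and values
        "ENROL_GIRLS": [1100, 2600],   # hypothetical attribute name and values
        "SCHTOT": [10, 40],
    },
    index=["State A", "State B"],
)

# Total school children, then the average number of children per school.
top_bottom["SCHKIDS"] = top_bottom["ENROL_BOYS"] + top_bottom["ENROL_GIRLS"]
top_bottom["KIDSPERSCH"] = top_bottom["SCHKIDS"] / top_bottom["SCHTOT"]
print(top_bottom[["SCHKIDS", "KIDSPERSCH"]])
```

The same pattern gives kids per class: divide SCHKIDS by the number of classes instead of the number of schools.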

From the plot above, Bihar has the highest number of kids per school and Mizoram
the lowest. The top 3 states by KIDSPERSCH are Kerala, Lakshadweep and Bihar.

Here we print TOTCLS1G for the top_bottom data frame.

Bihar has the highest number of classes, compared with Rajasthan, Kerala and
Mizoram. Mizoram has very few schools, which likely explains why it has fewer
classes.

Next, we create KIDSPERCL by dividing SCHKIDS by TOTCLS1G in the top_bottom
data frame.

This gives the number of kids per class, which is highest in Bihar and lowest in
Mizoram.
Here we create the elem data frame with the columns SCHKIDS and KIDSPERCL, and
these values are visualized in a plot.
Here we compare KIDSPERCL with OVERALL_LI.
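Besides a visual comparison, one way to put a number on how KIDSPERCL relates to OVERALL_LI is the Pearson correlation coefficient from Series.corr. This is a swapped-in technique, not something the report states it used, and the toy values are chosen to be perfectly linear so the result is easy to check.

```python
import pandas as pd

elem = pd.DataFrame(
    {
        "KIDSPERCL": [20.0, 30.0, 40.0, 50.0],    # hypothetical values
        "OVERALL_LI": [90.0, 80.0, 70.0, 60.0],   # hypothetical values
    },
    index=["State A", "State B", "State C", "State D"],
)

# Pearson correlation (the default method): -1 to 1, negative means one falls as the other rises.
r = elem["KIDSPERCL"].corr(elem["OVERALL_LI"])
print(f"Correlation between KIDSPERCL and OVERALL_LI: {r:.2f}")
```

With only a handful of states, a correlation like this is descriptive at best and says nothing about cause.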

Next, we compare private and government schools, i.e. the types of schools, to see
how many schools of each type there are in each state.
This code displays private schools, government schools and madrasas across the
analysis. We prepare several attributes and then concatenate them so that they can
be plotted on the same graph, giving an accurate relative comparison.
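A sketch of concatenating per-type school counts into one frame and converting them to percentage shares per state, which is what a statement like "35% private schools" refers to. The series names GOVT, PRIVATE and MADRASA and the counts are hypothetical; the real dataset's column names may differ.

```python
import pandas as pd

# Hypothetical school counts per type for two states.
govt = pd.Series([60, 80], index=["State A", "State B"], name="GOVT")
private = pd.Series([30, 15], index=["State A", "State B"], name="PRIVATE")
madrasa = pd.Series([10, 5], index=["State A", "State B"], name="MADRASA")

# Side by side: one column per school type, one row per state.
school_types = pd.concat([govt, private, madrasa], axis=1)

# Divide each row by its own total to get percentage shares.
shares = school_types.div(school_types.sum(axis=1), axis=0) * 100
print(shares.round(1))
```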

Lakshadweep has only government schools.

Kerala has the highest number of private schools, more than any other state, while
the number of madrasas and government schools there is much lower than the
national average.

Bihar and Arunachal Pradesh have very few private schools compared with
government schools.

Rajasthan has 35% private schools, which is a large share given its literacy rate.
CONTIE is the attribute we use to represent schools under development or
expansion, i.e. schools that the government has granted permission to develop
and expand.

Here elem is created to compare CONTIE with the overall literacy rate, i.e. how
many schools are under development relative to each state's overall literacy rate.

These are the data on school development and the permissions granted.

Next, we find the dropout rate from class 8 to class 9 in the different states. We
first add C9_G and C9_B (girls' and boys' enrolment in class 9), and do the same
for the class 8 enrolments. We then subtract the class 9 total from the class 8 total
and divide by the class 8 total, which gives the proportion of students who dropped
out between class 8 and class 9.
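The dropout calculation described above, (class 8 total − class 9 total) / class 8 total, can be sketched as follows. C9_G and C9_B are named in the text; C8_G and C8_B are assumed to follow the same naming pattern, and all values are hypothetical.

```python
import pandas as pd

enrol = pd.DataFrame(
    {
        "C8_G": [500, 800], "C8_B": [500, 700],   # assumed column names, hypothetical values
        "C9_G": [480, 700], "C9_B": [470, 680],
    },
    index=["State A", "State B"],
)

class8 = enrol["C8_G"] + enrol["C8_B"]   # total class 8 enrolment
class9 = enrol["C9_G"] + enrol["C9_B"]   # total class 9 enrolment

# Proportion of class 8 students who do not appear in class 9.
enrol["DROPOUT_8_9"] = (class8 - class9) / class8
print(enrol["DROPOUT_8_9"].round(2))
```

If a state's class 9 enrolment exceeds its class 8 enrolment (for example because of students moving in or repeating), the result is negative. This is one possible reason a minimum such as -0.00 can appear after rounding.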

Using these, we calculate the percentage of children in the top_3 and bottom_3
states who drop out between class 8 and class 9.

These are the data required to find the maximum dropout rate.
We use them to compute the dropout rate; pd.concat is used to concatenate (merge)
the pieces into one data frame.

Now we look at the dropout rate in the top_bottom data frame.

The dropout rate in Kerala is very low; Mizoram, Lakshadweep and Kerala are
among the top_bottom states.

The dropout rate for top_bottom is shown above.
Here are the details of the enrolment columns for each class.

To get the total enrolment of each class, we store the enrolment columns for each
class and then total all the columns of the data frame.

Here we plot the dropout rate.

The maximum dropout rate is 0.08 and the minimum is -0.00.

CHAPTER-5:

CONCLUSION

CONCLUSION

From the above analysis we know which states have the highest and lowest literacy
rates overall, for females and for males:

State Wise:

Highest: Kerala

Lowest: Bihar

Female vs State name:

Highest: Kerala

Lowest: Rajasthan

Male vs State name:

Highest: Lakshadweep

Lowest: Bihar

Dropout rate:

Maximum: 0.08

Minimum: -0.00

Schools by Category:

Government: Lakshadweep

Private: Kerala

Madrasas: Kerala
From this analysis we learned many concepts and figures about why literacy rates
are low or high. Many states have low literacy rates because their schools do not
have what students require: some facilities may not be provided. In many other
states schools exist, but students are unable to make use of them because the
schools may be far from home, families may not be able to afford schooling, or the
schools may lack the features students want. The dropout rate also shows that
some students drop out between class 8 and class 9 because they are unable to
continue. These are some of the many reasons why literacy rates are low in many
states.

CHAPTER-6
BIBLIOGRAPHY/REFERENCES

BIBLIOGRAPHY

https://www.educationforallinindia.com/page167.html

https://www.ijstr.org/final-print/aug2019/Literacy-Rate-Analysis-Dashboard.pdf

http://dataworld.org

http://kaggle.com

http://census.com
